Stop blaming spreadsheets (and take a good look in the mirror)

This week, spreadsheets hit the news again, when data for a book written by economist Piketty turned out to contain spreadsheet errors. In response, Daniel Lemire wrote a blog post warning people not to use spreadsheets for serious work. This is useless advice; let me explain why.

1) Only a fool blames their tool

Healthcare.gov is built in Java. Did people go around the interwebs shouting we all should stop using Java? Of course not! Because it is easy to see that the problems stemmed from other areas: process, time pressure, lack of testing and probably other (human) factors. See how silly it is to blame a tool? The same goes for spreadsheets. Yes, they are not so easy to test, but they have many benefits, like ‘liveness’ (immediate feedback), having data, metadata and calculations in one view, and ease of deployment. In that sense, they have benefits over other programming environments.

2) No, we do not know better! 

I have been working on spreadsheets as a researcher for about 5 years now, and during that time I spoke with a lot of spreadsheet users. Of course, I asked them why they would run a bank/insurance company/airline on spreadsheets. You know what many of them say?

“We asked IT to build this, they said it would take 6 months and half a million euros. And most likely it will be more expensive and not what we want.”

The reality is that it is just not feasible to build software for all business processes. We don’t have the manpower and, frankly, in many cases we don’t have the domain knowledge either. So what we need to do isn’t ridicule spreadsheet users. They have been disappointed by us many times. I already mentioned healthcare.gov, and there are so many other IT screw-ups that it is almost arrogant to claim superiority over spreadsheet users.

3) Like democracy, spreadsheets are the worst, except for all others

Then, in addition to our ridicule, we try to push tools on them, “real” programming languages, that fit neither their needs nor their skills. They will never learn Java or C#. Python maybe, that seems to be easy to use for end-users, but certainly not for all of them. End-users are not programmers, they don’t want to be and they should not need to be.

Instead of shaming spreadsheet users, let’s focus on inventing better spreadsheet-killer tools (Tableau is my personal favorite at the moment). Or, and this is my line of research, help spreadsheet users to test, measure and refactor. No one, really, no one, is helped by your judgement if you have not thought about alternatives.
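To make “help spreadsheet users to test” a bit more concrete, here is a minimal sketch in Python of what a test for a spreadsheet could look like. Everything in it is an assumption made for illustration: the tiny sheet, the cell names, the made-up tax rate and the helpers `value` and `check_sheet` are hypothetical and do not come from any real spreadsheet tool. The idea is simply that the user writes down what a few cells should contain, and the check reports the cells that disagree.

```python
# Hypothetical mini-sheet: each cell holds a constant or a formula.
# A formula is a function that receives the sheet and reads other cells.
def value(sheet, cell):
    """Evaluate a cell: call it if it is a formula, else return the constant."""
    content = sheet[cell]
    return content(sheet) if callable(content) else content

sheet = {
    "A1": 100.0,                                      # net amount, item 1
    "A2": 250.0,                                      # net amount, item 2
    "A3": lambda s: value(s, "A1") + value(s, "A2"),  # =A1+A2
    "B1": 0.21,                                       # tax rate (made up)
    "B3": lambda s: value(s, "A3") * value(s, "B1"),  # =A3*B1
}

def check_sheet(sheet, expected, tolerance=1e-9):
    """Return the cells whose computed value differs from the expected one."""
    return [cell for cell, want in expected.items()
            if abs(value(sheet, cell) - want) > tolerance]

# Expected values written down by the user, independent of the formulas.
expected = {"A3": 350.0, "B3": 73.5}
failures = check_sheet(sheet, expected)
print("failing cells:", failures)
```

If someone later edits the formula in A3 so that it no longer includes A2, the same check lists A3 and B3 as failing, which is the kind of feedback a spreadsheet user rarely gets today.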

10 thoughts on “Stop blaming spreadsheets (and take a good look in the mirror)”

  • I agree! To add more on the “what to do” note, there is a whole subarea on how to help end users succeed with tools like spreadsheets. It’s surveyed here:
    The state of the art in end-user software engineering
    Andrew J. Ko, Robin Abraham, Laura Beckwith, Alan Blackwell, Margaret Burnett, Martin Erwig, Joseph Lawrance, Chris Scaffidi, Henry Lieberman, Brad Myers, Mary Beth Rosson, Gregg Rothermel, Mary Shaw, and Susan Wiedenbeck,
    ACM Computing Surveys 43(3), Article 21 (April 2011), 44 pages.

    Another relevant paper I recently encountered is:
    D. Jannach, T. Schmitz, B. Hofer, F. Wotawa, Avoiding, Finding and Fixing Spreadsheet Errors - A Survey of Automated Approaches for Spreadsheet QA, Journal of Systems and Software, 2014 (to appear). DOI: http://dx.doi.org/10.1016/j.jss.2014.03.058

  • Nobody is saying to not use Java after healthcare.gov, because those of us who know how bad a tool it is have been saying that for years already. Those who still in 2014 don’t want to believe it will never be convinced.

    There are languages that provide 5-10x improvement in code size. We’ve had them for years. They work great. If you’re still using Java today, either you don’t care about code size (and what it brings, like slowness of development, cost, and bugs), or you’re choosing to remain willfully ignorant of the state of the art.

    We could name some alternatives, but then the conversation turns to making us look like $LANGUAGE fanboys. And they’re not hard to find if you look for them.

  • “Only a fool blames his tools”

    This weekend, my dad and I were cutting down a tree. It was taking a long time and we were not making much progress. After arguing for a bit over technique, we decided to change the chainsaw blade. We took the tree down in 2 minutes.

    By blaming our tool and changing it, we solved our problem. Sometimes, a wise man blames his tools.

  • How did you get from “you shouldn’t use spreadsheets because they’re almost impossible to test” to “only a fool blames their tool”?

    “See how silly it is to blame a tool?”

    It isn’t silly to blame a tool. If you need help understanding this, ask any craftsperson the difference between this open table saw, and this table saw with the safety stops. Bragging that you don’t need the safe tools, and that choosing tools for safety is “silly”, is a good way to lose a thumb.

    Making excuses that researchers should not be expected to learn proper tools because they’re end users is the actual silly thing. A large part of being a professional is using the correct tools. You’re programming. It’s time to be an adult and listen to the programmers. Spreadsheets are a terrible tool, and there’s a reason that most legitimate researchers moved on 20 years ago.

    Unless you want to be a Piketty and be famous for releasing garbage work because nobody could audit it.

    By the by, more researchers use R than Excel these days by quite a large margin.

    Maybe someday you’ll catch up?

    • Introducing a dedicated statistics language like R is relevant and interesting, and I shall certainly consider it the next time I need a statistics tool.

      However, it would be more interesting if the arguments comparing spreadsheets and R were based on research of how end users (most of whom are non-programmers) perform with respect to ease-of-use and correctness.

      I have used spreadsheets happily for many tasks including household budgets, tax computations, and simple statistical analysis of research data. I believe from its description that R is great, but I also find that the spreadsheet programming paradigm is great and I’m unsure if R could have done any of this better (as in faster and more correct).

      My point here is that the previous paragraph is just one personal opinion based on personal preferences and experiences. I’m not saying that I would never use a dedicated tool like R (I do use dedicated tools), but that I would like to see data or at least qualified opinions supporting the hypothesis that guys like Piketty would make fewer errors using a dedicated, lazy, functional language like R with a command prompt than using a dedicated spreadsheet programming language with a graphical UI, within the same time frame. The answer might also depend on the precise tasks you consider.

      Further, it is not specified which spreadsheet tool to compare with. Excel? Calc? Gnumeric? Tableau? If any of these compares with R in the level of correctness by users, we are down to comparing ease-of-use, and if they also compare here, we are down to personal preferences. And why only R? The big math beast out there is Mathematica. And why not both paradigms? Mathematica can interact with spreadsheets, and in fact, Gnumeric cooperates with the R project and includes many of their statistical routines.

      Margaret Burnett’s remark with references to surveys on end users and spreadsheets (and why users do make errors in spreadsheets) is more constructive than statements like “Maybe someday you’ll catch up?” that are just somewhat derogatory expressions of personal preferences.

      Best regards,
      Hans

      Disclosure that I’m a programmer: I’ve a PhD degree in computer science in the area of programming languages (thesis report written in emacs and PlainTeX), and work as a chief developer at a software company developing its own programming language and using a host of other languages.

  • “They have many benefits, like … having data, metadata and calculations in one view”

    This is a source of huge problems. While all other programming environments encourage separation of code from data, spreadsheets actively encourage mixing code and data in the same place. This results in proliferation of copies of the “same” spreadsheet – because a copy is made for each data set. This promptly becomes a maintenance nightmare.

    Don’t get me wrong, I love Excel for all these reasons but there’s no denying it’s a source of huge operational risk for businesses.

  • I don’t think much of Dan’s arguments. Spreadsheets don’t make code review difficult. *Bad* spreadsheets make code review difficult. As does bad VBA, SAS, Python, R, or whatever language someone abuses.

    He keeps saying that you wouldn’t want the people who write software for the space shuttle or the banking industry to be writing code in spreadsheets. Funny thing is, they do. And sometimes they make mistakes. Mostly human error. And I’m pretty sure they make mistakes no matter *what* technology they employ.

    The only thing wrong with Excel is the low barrier to entry. Now I’m not saying I want that barrier to be raised. I just wish that organizations didn’t think you are qualified to drive Excel somewhere dangerous merely because you know how to start it up.
