My first solo paper (without my PhD supervisor) ‘Improving spreadsheet test practices’ will be published at CASCON ’13 in Toronto. This paper explores the idea of applying testing to spreadsheets.
The core observation I did for this paper is the fact that spreadsheet users do exhibit some basic testing. Many spreadsheet creator use formulas like IF(A1=100,”OK”,”Results should sum up to 100″) to validate input and calculations. If you think about such a formula, you can clearly see it is a different beast than a formula like SUM(A1:A6). This last one is a formula to calculate, the first one is a formula to validate.
Distinguishing between formula types
The idea of this paper, and the corresponding new tool Expector, is to scan a spreadsheet and decide on each formula whether it is a test formula. We define a test formula with the following.
- A test formula has the operation IF as root function
- A test formula contains at least one branch that can result in a string
- A test formula’s result is not used in other formulas, unless these formulas are test formulas too
Do spreadsheet users test?
They do! With our method, we found that 8.8% of all spreadsheets in the EUSES corpus (a well known open spreadsheet test set) contains such test formulas.
On average, the spreadsheets with tests had a ‘coverage’ of 35%. This means the test formulas referred to 35% of the other formulas, on average.
So what can be improved?
First of all, the coverage can be improved. From software engineering, we know that a higher coverage tends to improve the chance an error is spotted. This is what our tool Expector helps users with. Expector is an Excel add-on which visualized to the user what cells are currently tested, as follows:
With the ‘Mark Tests’ button, all test formulas are marked green. By selecting ‘Mark Covered Formulas’ also all dependents are marked. ‘Mark Non-Covered Formulas’ makes non tested formulas red. Selecting both on our example results in:
What else can we do?
We can also actually ‘test’ the formulas. In the example above, it is not really necessary because the spreadsheet is so small. But suppose we add some questions:
Now, if we change D6, we can no longer see the test as it is outside of the screen. Here, Expector helps us too, by continuously examining the test formulas and given the user feedback if one breaks:
We gather the name of the test from the labels, and the expected outcome from the formula.