New paper: BumbleBee, a tool for spreadsheet formula transformations

Some spreadsheets can be improved

While looking at spreadsheet and how they are used, over the past years I have noticed that many users don’t make their spreadsheets as easy as they could be. For instance, they use A1+A2+A3+A4+A5 instead of the simpler SUM(A1:A5) Sometimes because they are unaware of a simpler construct, or because the spreadsheet evolved over time. For instance, in used to be A1+A2, then A3 was added and so forth. Such complex formulas were exactly the aim of our previous work on smell detection.

If you say smell, you say… refactorings!

So in order to improve spreadsheets, we and other researchers have developed a number of refactorings to improve spreadsheet formulas. Over the last few months, I have been working on BumbleBee, a tool to perform not only refactorings, but more general transformations on spreadsheet formulas.

With our tool, that looks slightly little bit less impressive that its namesake shown above, you can perform transformations on spreadsheet formulas.

Spreadsheet refactoring with BumbleBee

In the add-in interface, shown below, a user can find transformations that can be applied on the current formula with the ‘Find applicable rewrites’ button. Once they are shown in the dropdown list, a user can select one. The result of this transformation in then shown in the preview box. In the example below, the SUM and COUNT to AVERAGE transformation is shown and B12, with =SUM(F3:F7)/COUNT(F3:F7) is selected, so the preview box shows AVERAGE(F3:F7). The user can now apply this transformation in any selected range, sheet or in the entire workbook.

Screenshot

A few refactorings come pre-loaded with BumbleBee, but you as a user can use our BumbleBee language to express your own transformation of choice.

The BumbleBee language

Our language is an extension of regular Excel formulas, so you can write a transformation like

A1+A2 <-> SUM(A1:A2)

which allows BumbleBee to transform exactly the formula A1+A2 to SUM(A1:A2) or vice versa. In addition to using exact cell references, you may also use variables instead of cell references:

{i,j} + {i,j+1} <-> SUM({i,j}:{i,j+1})

With this transformation you can not only rewrite A1+A2 buy any addition of two cells next to each other.

Variables are also allowed for ranges. The refactoring in the screenshot above, which transforms the SUM and COUNT into an average is expressed like this

SUM({r})/COUNT({r}) <-> AVERAGE({r})

Finally, we can express transformations for arbitrary formulas, using square brackets

[c] <-> ROUND([c],0)

Not just refactorings

The cool thing about BumbleBee transformations is that they do not have to be refactorings, we do not enforce that transformations preserve behavior, and that is by choice. The final transformation

[c] <-> ROUND([c],0)

is already not entirely behavior preserving, as it can change a value 3.6 in a 3. We foresee (at least) two different application for non-behavior preserving transformations:

1) Updating to new versions of Excel

When a new version of Excel appears, new formulas are added. For instance, in Excel 2007, the new function IFERROR was introduced. The syntax of this function quite simple:

IFERROR(value,value_if_error)

In previous versions of Excel, this could only be expressed with the more cumbersome

IF(ISERROR(value),value_if_error, value))

Therefore, many Excelfiles which were created with an earlier version of Excel still contain this construct. While it continues to function, it is not so easy to read because of the repeated ‘value’ which can be a complex expression itself. With BumbleBee, this can be easily expressed and consistently updated.

2) Changing business rules

CaptureIn our example spreadsheet for instance, students now need to take all three tests in order to get a grade. Suppose we would loosen that into at least 2 tests, then the formula above needs to change into IF(E3<=2, SUM(B3:D3)/E3,”-“)

For this, we cannot simply use search and replace, as we have to replace 3 with a different cell each time (on row 4, it will be E4 etc) This is where BumbleBee comes in. With the transformation

IF({i+3,j}=3,SUM({i,j}:{i+2,j})/3,”-“) <-> IF({i+3,j}<=2, SUM({i,j}:{i+2,j})/{i+3,j},”-“) we can perform this change everywhere.

That last transformation does not look like something a spreadsheet user can write

You are right, in the case of changing business rules, the transformation can become complex quickly. This is why, in future work, we plan to generate these transformations based on edits by the user. We envision a BumbleBee 2.0 that can ask a user that just updated a formula, whether he wants to perform the same edit throughout the spreadsheet.

That looks awesome, can I have it?

Absolutely. Here is an installer for Excel 2010. By adding your own transformations to the worksheet ‘Transformations’ and hitting ‘Initialize’ you can create your own rules and play around with them.

If you have questions, drop me a line or post below. If you want to know more, here is a preprint of our paper about BumbleBee.

4 Responses so far.

  1. […] and in the formulas. Recently she worked on testing spreadsheets and she built a tool to refactor […]

  2. […] and in the formulas. Recently she worked on testing spreadsheets and she built a tool to refactor […]

  3. Tim Josselyn says:

    Absolutely brilliant.

  4. […] at the moment) or, and this is my line of research, help spreadsheet users to test, measure and refactor. No one, really, no one, is helped with your judgement without thinking about […]

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>