Sometimes, people want to have their spreadsheet analyzed for errors or metrics, but they do not want to share their confidential data and formulas. This is where AnonymousXL comes in. With this tool you can remove all confidential stuff from your spreadsheet.
First, AnonymousXL removes all meta data. That is an easy step.
Then, numerical values are changed, by adding or removing up to 60% of the values. By changing the values with a small margin, Joeri makes sure that the spreadsheet does not change too much to do meaningful analysis. Integers remain integers by the way, by rounding them off after adding or removing. “Magic” numbers like 0,1,12,100 etc are left untouched.
Changing strings is a bit more involved, as changing all text values to “anonymous” messes up pivot tables, as they cannot contain the same value multiple times. Joeri tried this at first and then it crashed Excel 🙂 So, a mapping is made from every value unto a new string “text a”, “text b” “tect c” etc and all strings are mapped to the same values.
When working on AnonymousXL, Joeri found that sometimes the modifications resulted in errors. For instance, if there is a formula A2/A3 and AnonymousXL changes A3 into a 0, then the previously correct formula now results in #DIV/0. Because analyzing the errors is one of the analyses that Joeri wanted to perform in the first place. Therefore AnonymousXL keeps a list of all error cells that are in the spreadsheet initially resulted in an error and checks whether this list remains the same.
Currently, range names are not taken into account yet (question from the audience) but that is future work. Also, links to files are not anonymized, which would also be interesting to work on, as they might contain sensitive info too. Another idea (by Bas Jansen) is to create a forbidden list with client names, project names etc. and only replace them. Maybe we could extract the list of all strings from the spreadsheet and have the user select the forbidden words.