Implication of Data Quality for Spreadsheet Analysis

by Donald P. Ballou, Harold L. Pazer, Salvatore Belardo and Barbara Klein, School of Business, State University of New York at Albany

This paper describes the implications that errors in spreadsheet data can have, and the authors kick off with some nice observations in the intro:

Intro

"However, the obvious problem of the impact of faulty data on spreadsheet computations and projections has been largely ignored." Okay, this paper is 25 years old, so this might have changed, but my recent background search is in line with this statement. Many papers on spreadsheets focus on the quality of the calculations, but "errors in the operational data can influence the determination of the most appropriate forecasting model" and "The manager is unlikely, however, to study the implications of errors in the data that are being projected." Clearly such errors have an impact, but it is not necessarily obvious which are potentially serious and which less so.

Idea

The idea of this paper is quite interesting, although it is not very well written in my opinion. The authors assume there is a certain distribution of errors in the data of a spreadsheet and calculate how those errors propagate through the given calculations. They assume a normal distribution of errors, and mention that in certain cases enough may be known to specify the error distribution functions for some of the variables. Their contribution is not so much the calculation on this concrete spreadsheet as the method of calculating the propagation.
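To make the idea of propagating input errors a bit more concrete, here is a minimal sketch in Python. It is not the authors' method: it uses plain Monte Carlo simulation rather than any analytic derivation, and the `propagate` helper, the `profit` formula and all numbers are made up for illustration. The only thing taken from the paper is the assumption that each input carries a normally distributed error.

```python
import random
import statistics

def propagate(formula, inputs, rel_sd, trials=10_000, seed=0):
    """Estimate mean and standard deviation of a formula's output when
    every input carries an independent, normally distributed error.

    rel_sd is the error's standard deviation as a fraction of the input
    value (an assumption made for this sketch only).
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(trials):
        # Draw one noisy version of every input cell.
        noisy = {name: rng.gauss(value, abs(value) * rel_sd)
                 for name, value in inputs.items()}
        outputs.append(formula(**noisy))
    return statistics.mean(outputs), statistics.stdev(outputs)

# Hypothetical spreadsheet cell: profit = revenue - cost
def profit(revenue, cost):
    return revenue - cost

mean, sd = propagate(profit, {"revenue": 1000.0, "cost": 900.0}, rel_sd=0.02)
print(f"profit: mean {mean:.1f}, standard deviation {sd:.1f}")
```

For a difference of two independent normal variables the standard deviations combine as sqrt(20^2 + 18^2), which is about 26.9, so a 2% error in each input turns into an error of roughly 27% on a profit of 100. The simulation should land close to that value.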

Conclusions

  1. The smaller the aggregation (for instance a week vs a year) the bigger the chance of errors, since aggregation dampens errors.
  2. The seriousness of errors is highly dependent upon the manipulations that the data undergo. Calculations with denominators, especially, should be treated with care.
  3. Errors in financial data can have a serious impact on projected values and in many cases alter the optimal forecasting model.
  4. Data quality control procedures should be introduced with the goal of enhancing the quality of those data that most significantly affect spreadsheet results.
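The first two conclusions can be illustrated with a small simulation. This is a sketch under my own assumptions, not a reproduction of the paper's analysis: the errors are independent and normally distributed, and the weekly figure, the error sizes and the `relative_sd` helper are all hypothetical. Note that the dampening only shows up because the errors are independent; a systematic error that hits every week in the same direction would not average out.

```python
import random
import statistics

rng = random.Random(1)
TRIALS = 5_000
WEEKLY_SALES = 100.0  # hypothetical true weekly value
ERROR_SD = 5.0        # hypothetical sd of the error in each weekly figure

def relative_sd(samples, true_value):
    """Standard deviation of the samples as a fraction of the true value."""
    return statistics.stdev(samples) / abs(true_value)

# Conclusion 1: one noisy week vs. the sum of 52 noisy weeks.
week = [rng.gauss(WEEKLY_SALES, ERROR_SD) for _ in range(TRIALS)]
year = [sum(rng.gauss(WEEKLY_SALES, ERROR_SD) for _ in range(52))
        for _ in range(TRIALS)]
week_rel = relative_sd(week, WEEKLY_SALES)
year_rel = relative_sd(year, 52 * WEEKLY_SALES)

# Conclusion 2: the same absolute error (sd 1.0) in a small and in a
# large denominator of the ratio 100 / d.
small_ratio = [100.0 / rng.gauss(10.0, 1.0) for _ in range(TRIALS)]
large_ratio = [100.0 / rng.gauss(100.0, 1.0) for _ in range(TRIALS)]
small_rel = relative_sd(small_ratio, 100.0 / 10.0)
large_rel = relative_sd(large_ratio, 100.0 / 100.0)

print(f"relative error, single week:       {week_rel:.4f}")
print(f"relative error, yearly total:      {year_rel:.4f}")
print(f"relative error, small denominator: {small_rel:.4f}")
print(f"relative error, large denominator: {large_rel:.4f}")
```

With independent errors the relative error of the yearly total shrinks by a factor of about sqrt(52) compared to a single week, and the ratio with the small denominator is hit far harder by the same absolute error than the ratio with the large one.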

BibTeX

@article{DBLP:journals/db/BallouPBK87,
  author    = {Donald P. Ballou and
               Harold L. Pazer and
               Salvatore Belardo and
               Barbara D. Klein},
  title     = {Implication of Data Quality for Spreadsheet Analysis},
  journal   = {DATA BASE},
  volume    = {18},
  number    = {3},
  year      = {1987},
  pages     = {13-19},
  ee        = {http://doi.acm.org/10.1145/27544.27546},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}