Empirical Evidence of Large-Scale Diversity in API Usage of Object-Oriented Software

The idea of this paper, presented by Martin Monperrus, is to abstract over source code so that it can be analyzed at a coarser granularity.


With this abstraction, we can measure how diversely a given API is used: is it always used in the same way?

The authors mined 9,022,262 type-usages referring to 382,774 Java classes. To their surprise, they observed a lot of diversity in API usage: 748 classes were used in more than 100 different ways. They speculate on the reasons for this, such as a correlation with reusability. According to the authors, “[diversity] can reflect the fact that client code was able to use the class in ways that were unanticipated by the class designer.”
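
To make the measure concrete, here is a minimal sketch of how such a diversity count could be computed. It assumes that a type-usage is the set of methods called on a single variable of a given class, which is my reading and not something the numbers above depend on. The `observations` data and the `diversity` function are hypothetical illustrations, not the authors' tooling or data format.

```python
from collections import defaultdict

# Hypothetical mined data: (class name, methods called on one variable of that class).
# This representation is an assumption for illustration only.
observations = [
    ("java.lang.String", ["length", "charAt"]),
    ("java.lang.String", ["charAt", "length"]),  # same set of calls, different order
    ("java.lang.String", ["trim", "isEmpty"]),
    ("java.util.List", ["add"]),
    ("java.util.List", ["add"]),
]


def diversity(observations):
    """Map each class name to its number of distinct type-usages."""
    distinct = defaultdict(set)
    for class_name, calls in observations:
        # A frozenset ignores call order and repeated calls.
        distinct[class_name].add(frozenset(calls))
    return {name: len(usages) for name, usages in distinct.items()}


result = diversity(observations)
print(result)
```

Under this assumption, a class with a high count is one that client code uses in many different ways. Treating a usage as a set rather than a sequence is a design choice of the sketch; counting ordered call sequences would report more diversity.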

The paper ends with a number of open questions on the impact of this finding, for instance:

  • Should we support or encourage the diversity in object-oriented software?
  • How can we ensure that all possible type-usages are correct? Should there be one test per observed API usage? (This would mean 2,460 test cases for Java’s String.)

Interesting topic. I expect (and hope) that future SCAMs will feature papers that address these questions. Pre-print is here.