Gang Zhang starts off with a joke: cloning is evil!
However, developers keep cloning. Gang cites a study showing that 5 to 20% of code still consists of clones. So the authors started to wonder: why do people clone, and what can we do to improve the situation?
We already know several reasons:
- Simple reuse by copy-paste
- Design reuse
So far, studies have relied on the source code alone, but the source code cannot always tell the whole story. Therefore, we should also ask the developers. This is hard too, since people forget or even lie. The best option is therefore to combine both sources.
The authors studied a system of 14 million LOC that has been in use for 10 years and has had 2,000 developers in total (400 currently).
They used the following study setup:
- Code base analysis (for preliminary understanding)
- Questionnaire (to learn the reasons for cloning)
- Interview (root cause analysis)
For step 1, the authors used CCFinder, which identified 22,000 clones; 141 of these were selected for study. The questionnaire was designed around the life cycle of a code clone, and it covered organizational and personal reasons in addition to technical ones.
The results showed that important reasons for cloning include the lack of a framework and developers' confidence that cloning will get a problem solved easily. Developers are afraid to remove clones, often because they lack the historical context, which makes them hesitant to change the code.
Initially, a clone can be useful (for learning and for getting things to work), but after that it can become harmful.
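CCFinder is a token-based clone detector. To give a feel for what "identifying clones" means in step 1, here is a toy sketch of the general token-based idea. It is not CCFinder's actual algorithm, which is considerably more elaborate. It assumes a hypothetical keyword set and tokenizer, replaces identifiers and literals with placeholders so that renamed copies still match, and reports every window of `window` normalized tokens that occurs in more than one place. The names `normalize`, `find_clones`, `KEYWORDS` and the default window size are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword set for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}
# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Tokenize source and replace identifiers/literals with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)          # keep keywords as-is
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")         # renamed variables still match
        elif tok[0].isdigit():
            tokens.append("LIT")        # changed constants still match
        else:
            tokens.append(tok)          # operators and punctuation
    return tokens


def find_clones(files, window=8):
    """Group (file, token_offset) locations that share an identical
    normalized token window. Overlapping windows are not merged."""
    index = defaultdict(list)
    for name, source in files.items():
        toks = normalize(source)
        for i in range(len(toks) - window + 1):
            index[tuple(toks[i:i + window])].append((name, i))
    return [locs for locs in index.values() if len(locs) > 1]
```

For example, `find_clones({"a.c": "int total = 0; for (i = 0; i < n; i++) total += x[i];", "b.c": "int sum = 0; for (j = 0; j < m; j++) sum += y[j];"})` reports the two loops as clones even though every variable name differs. Real detectors also merge overlapping windows into maximal clone pairs, which is one reason raw clone counts (like the 22,000 above) need manual filtering before a study.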
Things that can be changed:
- Refactor in time
- Define a clear strategy for experimentation
- Prevent the third clone instance (broken window syndrome)
- Organizational changes, such as dropping LOC-based evaluation
Unfortunately, I was unable to find the paper online.