Cloning Practices: Why Developers Clone and What can be Changed

Gang Zhang starts off with a joke: cloning is evil!

However, developers keep cloning. Gang cites a study showing that 5 to 20% of code still consists of clones. So the authors set out to answer two questions: why do people clone, and what can we do to improve the situation?

We already know many of the reasons:

  • Simple reuse by copy-paste
  • Forking
  • Design reuse

So far, clone studies have analyzed the source code, but source code alone cannot always tell the whole story. Therefore, we should also ask the developers. That has its own pitfalls, since people forget or even lie, so the best option is to combine both sources.

The authors studied a system of 14 million LOC that has been in use for 10 years and has had 2,000 developers in total (400 currently).

They used the following study setup:

  1. Code base analysis (for a preliminary understanding)
  2. Questionnaire (to learn the reasons)
  3. Interview (for root cause analysis)

For step 1, the authors used CCFinder, which identified 22,000 clones; 141 of these were selected for closer study. The questionnaire was designed around the life cycle of a code clone, and it covered organizational and personal reasons in addition to technical ones.
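
To give a rough idea of what the code base analysis step involves, here is a minimal Python sketch of token-based clone detection, the general family CCFinder belongs to. It is not CCFinder's algorithm: CCFinder applies language-specific token transformations and suffix-tree matching, whereas this sketch only replaces identifiers and literals with placeholders and looks up fixed-size token windows in a dictionary. The helper names (normalized_tokens, find_clones), the 12-token window size, and the sample input are illustrative assumptions.

```python
import io
import keyword
import tokenize
from collections import defaultdict

# Hypothetical helper: lex Python source and normalize it, so that fragments
# differing only in names or constants produce the same token sequence.
def normalized_tokens(source):
    tokens = []  # list of (normalized token, line number)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append(("ID", tok.start[0]))
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append(("LIT", tok.start[0]))
        elif tok.type in (tokenize.NAME, tokenize.OP):
            # keywords and operators are kept verbatim
            tokens.append((tok.string, tok.start[0]))
        # NEWLINE, NL, INDENT, DEDENT, COMMENT and ENDMARKER are ignored
    return tokens

# Hypothetical helper: report every run of `window` normalized tokens that
# occurs more than once, as the list of starting lines of its occurrences.
def find_clones(source, window=12):
    tokens = normalized_tokens(source)
    occurrences = defaultdict(list)
    for i in range(len(tokens) - window + 1):
        key = tuple(kind for kind, _ in tokens[i:i + window])
        occurrences[key].append(tokens[i][1])
    return [lines for lines in occurrences.values() if len(lines) > 1]

# Illustrative input: two functions that differ only in names and constants.
SAMPLE = '''
def total_price(items):
    result = 0
    for item in items:
        result += item.price * 2
    return result

def total_weight(parcels):
    acc = 0
    for p in parcels:
        acc += p.weight * 3
    return acc
'''

if __name__ == "__main__":
    for lines in find_clones(SAMPLE):
        print("clone candidates starting at lines", lines)
```

Running it reports several overlapping groups such as [2, 8] and [3, 9], because every 12-token window of the first function reappears in the second once names and constants are normalized away. A real detector would merge such overlapping matches into maximal clone pairs.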

The results showed that important reasons for cloning include the lack of a framework and developers' confidence that cloning will solve a problem easily. Developers are reluctant to remove clones, often because they lack the historical context and are therefore afraid to change the code.

Initially, a clone can be useful (for learning and for getting things to work), but later on it can become harmful.

Things that can be changed:

  • Refactor in time
  • Define a clear strategy for experimentation
  • Prevent the third clone instance (broken window syndrome)
  • Organizational changes, such as no longer evaluating developers on LOC

I really liked this presentation (great slides too). Somehow, it feels crazy that after years of clone analysis, this had not been done earlier. Interesting work and great results.

Unfortunately, I was unable to find the paper online.

2 Comments

  1. Adrian

    Consider me a sceptic, but I don’t buy into it. Clones are not evil. Clones are pragmatic code reuse, and as a research community we should build tools that support engineers in creating more clones rather than fewer. There is an evil, and it is that clones are not linked! Hence we need hot clones, as proposed in http://scg.unibe.ch/bib/Schw10b

    1. Felienne (Post author)

      Thanks for replying. I understand your skepticism. I am not against cloning; I just liked the fact that these authors tried to understand (rather than analyze) the phenomenon.

      Have you seen this post? http://www.felienne.com/?p=851 They propose reverse engineering these links, which seems relevant to your hot clones.
