Of course, we should refactor when code clones emerge, but that does not always happen. There is no empirical result for this, but Stefan estimates that clones make up about 20 to 30% of most systems.
Sometimes the clones are not exactly the same, and research so far has not really agreed on whether or not that is harmful.
Stefan’s hypothesis is that such inconsistent clones are only harmful when developers are not aware of them. To test this, he inspected three systems from the automotive domain, together about 1 million lines of Java, and also interviewed developers from the projects.
A first interesting result is that most clones are type-3, that is, copies in which statements have been added, removed, or changed.
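To make the term concrete, here is a minimal sketch of what a type-3 clone pair can look like in Java. It is an illustration only and not taken from the systems Stefan studied; the class and method names (CloneExample, sumPositive, sumPositiveOrders) are made up for this example.

```java
import java.util.List;

// Hypothetical example, not from the studied systems.
class CloneExample {

    // Original fragment: sums the positive values in a list.
    static int sumPositive(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            if (v > 0) {
                total += v;
            }
        }
        return total;
    }

    // Type-3 clone of sumPositive: same structure, renamed identifiers,
    // plus one added statement (the null guard) that the original lacks.
    static int sumPositiveOrders(List<Integer> orders) {
        if (orders == null) { // added statement, missing in the original
            return 0;
        }
        int sum = 0;
        for (int o : orders) {
            if (o > 0) {
                sum += o;
            }
        }
        return sum;
    }
}
```

The copy received a null guard that the original never got. A developer who does not know the two fragments belong together has no reason to apply the fix to both, which is the kind of situation the hypothesis is about.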
Furthermore, the developers who were more aware of clones turned out to have fewer clones in general and fewer faulty ones (it is a small study, so there is no p-value).
So we should focus on more awareness!
There is a preprint.