Fixing bugs is expensive. Couldn’t we automate that? Yes, we can! Given a buggy program and a test suite, we mutate the program until the tests pass. Many of the existing techniques (GenProg, PAR) tend to overfit to the test suite and sometimes generate a nonsensical patch.
For example, GenProg will sometimes remove a throw statement. Yes, that will fix the bug, but not in the way we want. GenProg is also quite slow: it can take hours to fix a bug, because there are so many different candidates. We as humans learn from bug fixing, so we know how to fix some classes of bugs. PAR does take history into account, but only through the manual effort of the PAR authors, who put in a number of patterns by hand.
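To make the "mutate until the tests pass" idea and the overfitting problem concrete, here is a minimal sketch. It is not how GenProg, PAR or HDRepair are implemented: real tools mutate the program's syntax tree, while this toy just tries a fixed list of candidate expressions. The function names, the candidate list and the test suites are all made up for illustration.

```python
# Toy generate-and-validate repair loop (illustrative only, not a real tool).
# The "program" is a single expression over a and b that should compute
# the absolute difference; the buggy original is "a - b".

def make_program(expr):
    # Turn a candidate expression into a callable program.
    return eval(f"lambda a, b: {expr}")

def passes(program, tests):
    # A candidate is accepted only if every test passes.
    return all(program(a, b) == expected for a, b, expected in tests)

def repair(candidates, tests):
    # Try candidates in order and return the first one the tests accept.
    for expr in candidates:
        if passes(make_program(expr), tests):
            return expr
    return None

# Hypothetical candidate patches, starting with the buggy original.
CANDIDATES = ["a - b", "b - a", "a + b", "abs(a - b)"]

WEAK_TESTS = [(3, 5, 2)]                # only one failing case is covered
STRONG_TESTS = [(3, 5, 2), (5, 3, 2)]   # both argument orders are covered

overfitted = repair(CANDIDATES, WEAK_TESTS)
correct = repair(CANDIDATES, STRONG_TESTS)
```

With the weak suite the loop settles on `b - a`, which passes the one test but is still wrong for other inputs; that is the overfitting problem in miniature. Only the stronger suite pushes it to `abs(a - b)`. The loop also shows why the search can be slow: every candidate has to be run against the tests.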
David extracted data from a number of popular Java projects and kept only the commits that looked like bug fixes, for example by looking for words like ‘fix’ and ‘bug fix’ while discarding ‘non fix’ and the like. He also only looked at very small changes of just a few lines, as a bigger change might also contain unrelated edits (such as added comments) or be two fixes in one commit.
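The filtering step could look roughly like the sketch below. The exact keywords, the line threshold and the `Commit` structure are assumptions made for illustration; the talk only mentioned words like ‘fix’ and ‘bug fix’, the exclusion of ‘non fix’, and "a few lines".

```python
import re
from dataclasses import dataclass

# Assumed keyword patterns; the real filter may use a different list.
FIX_WORDS = re.compile(r"\b(fix|fixes|fixed|bug fix|bugfix)\b", re.IGNORECASE)
NON_FIX = re.compile(r"\bnon[- ]fix\b", re.IGNORECASE)

# Assumed threshold for "only a few lines".
MAX_CHANGED_LINES = 3

@dataclass
class Commit:
    # Hypothetical minimal view of a commit: its message and diff size.
    message: str
    changed_lines: int

def looks_like_small_bug_fix(commit, max_changed_lines=MAX_CHANGED_LINES):
    # Discard commits that explicitly say they are not a fix.
    if NON_FIX.search(commit.message):
        return False
    # Keep only commits whose message mentions a fix.
    if not FIX_WORDS.search(commit.message):
        return False
    # Keep only very small changes, to avoid tangled or double fixes.
    return 0 < commit.changed_lines <= max_changed_lines

commits = [
    Commit("Fix NPE in parser", 2),
    Commit("non fix: reformat imports", 1),
    Commit("Fix bug and refactor module", 40),
    Commit("Add new feature", 2),
]
kept = [c.message for c in commits if looks_like_small_bug_fix(c)]
```

Here only the first commit survives: the second is explicitly a non fix, the third is too large, and the fourth does not mention a fix at all.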
David tested his system against PAR and GenProg on 90 bugs from the Defects4J dataset, within a 90-minute limit and on commodity hardware. HDRepair fixed 23 bugs, while GenProg fixed 1 and PAR fixed 4. Cool! Why?
The existing techniques did indeed sometimes generate plausible but incorrect fixes, sometimes they would time out, and sometimes the fix needed code that was not already inside the program, and then GenProg is lost, since it can only reuse code that is already there.
All their data is here: https://github.com/xuanbachle/bugfixes