Belief and Evidence in Empirical Software Engineering – Premkumar Devanbu

Belief is important! In medicine, belief is often itself a topic of research. For example, doctors long believed that ulcers were caused by stress, and when a paper appeared showing that bacteria caused them, many doctors initially refused to believe it.

There has even been research on this!

Belief is interesting in software engineering (SE) because programmers are highly skilled and highly opinionated professionals. And, as in medicine, there is evidence. But are developers looking at it?

Prem’s approach was to combine two kinds of data from Microsoft: on the belief side, questionnaires; on the evidence side, bug reports, commits, etc. They sent out 2,500 surveys and got 564 responses (497 male, 53 female; 368 from the US). Here goes:

What do programmers believe?

This is a broad question, of course, so it had to be narrowed down. They selected beliefs that there was evidence about, that could be checked against the Microsoft data, and that were actionable and realistic.

A few non-controversial statements:

  • Code reviews improve quality
  • Static analysis helps find bugs

Most controversial:

  • Code quality depends on programming language
  • Fixing defects is more risky than adding features
  • Geographically distributed teams produce code as good as colocated teams

Why do they believe it?

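Next, Prem wanted to know why people held these beliefs. Respondents were given a list of possible reasons and asked to rank them. The answers showed that beliefs were formed primarily from personal experience rather than from the findings of empirical research.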

So, wow! Imagine your doctor saying: “If you have a headache, just eat a carrot; that’s my personal opinion.” That would not be good.

Comparison with evidence

Take geographically distributed teams. Within Microsoft, there were two teams, A and B.

As it turns out, these two teams held different opinions on this question. From the paper:

“We also selected a specific question, regarding the quality effects of Geographic distribution, where respondents from one team tended to believe that Geographic distribution was bad for software quality, and from a different team tended to believe it had no bad effect.

Based on a quantitative analysis of the project repositories of both, we found that geographic distribution had a barely measurable effect on quality; it was statistically significant, but only because of very large sample sizes (in the hundreds of thousands). Furthermore, the effect was not always in the expected direction; sometimes the effect was good, and sometimes bad. Thus, we found that one team’s beliefs were consistent with the evidence, and another team’s wasn’t. This finding illustrates the risks that programmers might face, by relying too much on their personal experience; subjective, personal recollection is notoriously error-prone.”
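The point about significance coming “only because of very large sample sizes” is worth unpacking: with hundreds of thousands of observations, even a tiny difference clears a significance threshold. Here is a minimal Python sketch using a two-proportion z-test on made-up defect rates. The rates, sample sizes, and function names are all hypothetical and not taken from the paper, whose own analysis may well use different models.

```python
import math

# Hypothetical defect rates, NOT from the paper: 10.0% of files from
# distributed teams vs 10.3% from colocated teams have a post-release bug.
P_DISTRIBUTED = 0.100
P_COLOCATED = 0.103


def two_proportion_z(p1: float, p2: float, n: int) -> float:
    """z statistic for p1 - p2, assuming both groups have n observations."""
    # Pooled rate; a plain average is exact here because the groups are equal-sized.
    pooled = (p1 + p2) / 2
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p1 - p2) / std_err


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions; it does not depend on n."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


if __name__ == "__main__":
    h = cohens_h(P_DISTRIBUTED, P_COLOCATED)
    for n in (1_000, 100_000, 500_000):
        z = two_proportion_z(P_DISTRIBUTED, P_COLOCATED, n)
        print(f"n={n:>7,}  z={z:6.2f}  p={two_sided_p(z):.2g}  h={h:.3f}")
```

With these invented rates, the p-value falls from roughly 0.8 at n = 1,000 to roughly 0.03 at n = 100,000 and below 0.000001 at n = 500,000. Meanwhile h stays around -0.01, far below the 0.2 that Cohen labels a “small” effect. The difference is “significant” yet barely measurable, which is the situation the quote describes.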

There is a preprint.