Eric Bouwers PhD defense

Philippe Kruchten (PK) is the first to ask a question. He starts off by asking:

PK: What is analyzability?

Eric Bouwers (EB): We use the ISO definition, which states that it is a subcharacteristic of maintainability.

PK: I have a question on section 6.6.6. I was involved in systems larger than what you describe. If I have a 3 million LOC system, is 8 still a useful number? Shouldn’t it be relative to the size of the system?

EB: The number 8 comes from the benchmark. If you have a bigger system, you should look at components or modules instead of classes.

PK is not yet convinced: I have seen systems that have 30 or 40 components and people can work with them fine.

EB: In such systems, there is often a (sometimes implicit) hierarchy above that, like folders or prefixes in the name, or there should be one.

PK: But this is not in the thesis.

EB: That is true.

PK: The model that you describe, is that a checklist or are they metrics? Because in your thesis, page 42 and beyond, we see ‘check’ a lot. Regarding your proposition that it is important to replicate research, this does not seem very precise.

EB: Well, on page 38, we describe the model but later on, we do provide the metrics.

PK: What exactly is the difference between personal and environmental?

EB: Environmental concerns the context in which people work, or the amount of information needed to understand the system. Personal has to do with how one person understands a system.

Next up: Andreas Zeller (AZ)

AZ: Regarding Table 2.3, is the mapping just the opinion of two experts?

EB: No, this is also based on other data (interviews and static analysis).

AZ: Couldn’t it be the case that the experts share a common understanding of the perfect mapping?

EB: That would be just what we want! This is not a risk, but something we desire.

AZ: What was the most surprising fact that came out of this analysis?

EB: Logic in the database. I did not think this would be important, but apparently, it is so common that experts mention it.

AZ: Isn’t this a representation of what your company, SIG, is doing?

EB: Well, these reports were written 6 or 7 years ago, when we did not have such a deep understanding of the model that we (SIG) apply. Now we do.

AZ: When somebody suggests a metric, the first thing we think is: there already are so many metrics. So to what extent do your metrics correlate with existing metrics? How did you look into that (for the component balance metric in particular)?

EB: We did not find any reference to a paper that listed an ideal number of components, although there were papers that mentioned there should or could be an ideal number. So basically, we have done a basic literature study.

AZ: Did you check for standard metric attributes, like monotonicity?

EB: No, because we do not consider these to be complexity metrics (we state this explicitly in the thesis).

AZ: Chapter 8, encapsulation metrics. Looking at the amount of information flowing between components, it would also be interesting to look at the outdegree of a component. Did you look at that?

EB: Yes, this is partly captured by one of our metrics.

Committee member three: Patricia Lago (PL)

PL: Taking Chapter 2, Table 2.2 as an example, you claim that, to avoid bias, the interviews do not include system properties, but there are architectural properties mentioned, like high-level design. How did this influence the results?

EB: We did not give them the list on the left, we gathered those from the interviews. We did not ask them to list them or to check them.

PL: So what is your opinion on the bias you introduced? Do you expect all systems to be organized in the same way? How generalizable is your approach?

EB: Indeed, we do suppose there is some level of high-level design and decomposition, so it is definitely geared towards how we were working at SIG at the time. However, these terms are very common. In a different setting, I would expect to find at least 80% of those attributes.

PL: How did you check the definitions that the interviewees had against the definitions you use? How do you know the subjects had the same interpretation of terms that you did?

EB: We wrote the interviews down ourselves and then asked the interviewees whether we had done so correctly. We also presented the list of attributes to other experts (informally).

Next up: Inald Lagendijk (IL)

IL: Regarding the statement on page 27: to what mathematical deduction techniques are you referring?

EB: Summing them and seeing what occurs more often.

IL: Counting, basically?

EB: Yes, this is a mathematical technique.

IL: How does this relate to importance?

EB: If it occurs more often, it is more important.

IL: I disagree, I think the outliers are the most important.

EB: We want to evaluate systems consistently.

IL: I would have liked to see the 1’s in Table 2.2 in the follow-up analysis.

EB: We do take them into account, but not for the definition of high-level design. We want to make it repeatable and avoid one expert explaining it differently from another.

IL: What is the Gini coefficient? You do have a reference, but no definition.

EB: It measures inequality. [For the interested reader: http://en.wikipedia.org/wiki/Gini_coefficient]
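[Also for the interested reader: a minimal Python sketch of how such an inequality measure can be computed. The function name `gini` and the sample component sizes are made up for illustration and are not taken from the thesis. It uses the standard rank-weighted formula on sorted values, where 0 means all values are equal and values closer to 1 mean a few items hold most of the total.]

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values; ranks start at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical component sizes (e.g. lines of code per component).
balanced = [5, 5, 5, 5]
skewed = [0, 0, 0, 10]

print(gini(balanced))  # equal sizes give 0.0
print(gini(skewed))    # one component holds everything: (n - 1) / n
```

With n items, the largest value this formula can return is (n - 1) / n, so the result is best compared between systems with a similar number of components.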

IL: Would there be other options, for instance entropy? This is more common and better understood.

EB: I think this is only better understood among researchers and not among practitioners.

IL: Have you considered it?

EB: No.
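[For readers wondering what the entropy alternative could look like: a minimal Python sketch, assuming it would be applied to the share each item has in the total. The function name `normalized_entropy` and the sample sizes are hypothetical; nothing like this appears in the thesis. The result is Shannon entropy divided by its maximum, so it lies between 0 and 1.]

```python
import math


def normalized_entropy(sizes):
    """Shannon entropy of the size shares, scaled to the range 0..1."""
    total = sum(sizes)
    if len(sizes) < 2 or total == 0:
        return 0.0
    # Zero-sized items contribute nothing to the entropy.
    shares = [s / total for s in sizes if s > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    # log2(n) is the entropy of n perfectly equal shares.
    return entropy / math.log2(len(sizes))


# Hypothetical component sizes.
print(normalized_entropy([5, 5, 5, 5]))   # evenly spread
print(normalized_entropy([0, 0, 0, 10]))  # fully concentrated
```

Note that the direction is the opposite of an inequality measure: an even spread scores high and a concentrated one scores low.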

IL: Why Spearman (rank-based) and not Pearson?

EB: Because the data is never normally distributed.

IL: But later on you do calculate p-values, and for this you need a distribution, which one did you use?

EB: The normal distribution.

IL: So at one point you are precise, and then you do decide to use a distribution.

EB: True, I am not a trained statistician.
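[For the interested reader: a minimal Python sketch of the difference between the two coefficients discussed above. Spearman's coefficient is Pearson's coefficient computed on ranks, so it only looks at the ordering of the data. The helper names and the sample data are made up for illustration; this is not the analysis from the thesis, and it does not compute p-values.]

```python
import math


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: perfectly monotonic, but far from linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

print(pearson(xs, ys))   # below 1, the relationship is not linear
print(spearman(xs, ys))  # 1 up to rounding, the ordering matches exactly
```

Because only ranks are used, the Spearman value is unaffected by skew or extreme values in the raw data.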

Number 5, from our own floor: Geert-Jan Houben (GJH)

GJH: Why are you only considering the median for information availability?

EB: We are assuming that there will always be people available that can understand the system.

GJH: But you only have a problem if there is a mismatch between the people and the system. If there are people available that know the system, they can maintain it.

EB: We only consider the system, we do not place that in the context of available people.

GJH: What is your definition of actionable? I wonder because you describe in your thesis that your metrics are actionable, but also that it is hard for people to use your results.

EB: Actionability is geared towards action inside the system.

GJH: I agree

Proposition number 3 is read aloud by one of the paranymphs: “If software engineering PhD students spend 20% of their time ‘in the field’, their research will be based on more realistic assumptions.”

GJH: But you spend more than 20% in industry.

EB: I did not say exactly 20%, it could be more.

Next up, the promotor Joost Visser (JV) asks about proposition #1: “To enable the effective application of software metrics, a pattern catalog based on real-world usage scenarios must be developed.”

JV: If this is so important, why doesn’t it exist yet? What is hindering this?

EB: I think that people do not have time for this. Also, it is hard to describe not what problems exist, but what problems could be detected.

Finally, the other promotor Arie van Deursen (AvD)

AvD: In chapter 8, you drop the people setting and switch to the controlled setting. How could this controlled setting be improved? In this chapter, you looked at 10 different systems and their version history (in Subversion). But the new way of doing version control is on GitHub. How would using GitHub instead of Subversion influence your methods?

EB: One of the threats to validity we have now is that one commit can contain changes related to different features.

AvD: Indeed, and even more so, in GitHub you would have a pull request consisting of different commits, all belonging together in a sense.

EB: The problem is that Git has not been around long enough to do this kind of analysis.

Hora est!