Analysing Programming Data – Neil Brown

Neil works on BlackBox, an opt-in plugin for BlueJ (which I blogged about before). The dataset now contains about 300 million (!) compilations. Neil will present some results from his TOCE paper on the BlackBox dataset.

The plan of this paper was to ask teachers what they thought were the hardest things to learn, and then look at the data to prove they were wrong 🙂 Some mistakes that teachers believed were hard:

  • mismatched brackets
  • = versus ==
  • == versus equals()

He then looked into the data to see what errors are most common and which take longest to fix.
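The post does not describe how "time to fix" was measured. As a rough sketch, assuming a simplified compile log (a hypothetical format, not BlackBox's real schema) with at most one error per compilation, both questions can be answered in one pass: count each new occurrence of an error, and treat it as fixed at the first later compilation by the same student where it no longer appears.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical log format (an assumption, not BlackBox's real schema):
# (student_id, timestamp_in_seconds, error_type), where error_type is None
# when the compilation succeeded. Assumes at most one error per compilation.


def summarise(log):
    """Count error occurrences and the median seconds taken to fix each type."""
    counts = Counter()              # error_type -> number of occurrences
    fix_times = defaultdict(list)   # error_type -> list of seconds-to-fix
    open_errors = {}                # student_id -> (error_type, first_seen_ts)

    # Walk each student's compilations in chronological order.
    for student, ts, error in sorted(log, key=lambda e: (e[0], e[1])):
        pending = open_errors.get(student)
        if pending is not None and pending[0] != error:
            # The pending error no longer appears, so treat it as fixed.
            fix_times[pending[0]].append(ts - pending[1])
            del open_errors[student]
        if error is not None and student not in open_errors:
            # A new occurrence of this error starts here.
            counts[error] += 1
            open_errors[student] = (error, ts)

    # Errors that were never fixed have no fix time and are left out.
    medians = {err: median(times) for err, times in fix_times.items()}
    return counts, medians


# Made-up example data, for illustration only.
example_log = [
    ("s1", 0, "mismatched brackets"),
    ("s1", 30, "mismatched brackets"),   # still not fixed
    ("s1", 50, None),                    # fixed after 50 s
    ("s1", 100, "wrong types in method call"),
    ("s1", 400, None),                   # fixed after 300 s
    ("s2", 0, "mismatched brackets"),
    ("s2", 20, None),                    # fixed after 20 s
]
counts, medians = summarise(example_log)
print(counts.most_common())
print(medians)
```

The median is used rather than the mean because a few students who leave an error for hours would otherwise dominate the average; that choice is ours, not necessarily the paper's.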

The most common mistakes are mismatched brackets (C) and wrong types in a method call (I). Neil also studied how users' mistakes change over time: do novice programmers get better?

The errors related to mismatched brackets, including <, ( and {, go down after a few weeks of programming, and the time to fix them also goes down over time. For other errors, the time to fix them does not really go down.

This could be confounded by the fact that programmers do more difficult exercises over time, and so make more errors.

And then… the teacher part! What do they think is hard?

This graph is a bit hard to read, but the general idea is that educators are split into four categories of experience, and each educator ranked the errors. The line shows the actual ranking of the errors from the data.

The graph basically shows that there is very little consensus among the teachers, and also that they do not agree with the data. Neil then analyzed whether agreement with the data correlated with experience in teaching in general, in teaching intro CS, or in teaching Java, and none of these showed a correlation!
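The post does not say which statistics were used. As a hedged sketch, one way to test "agreement vs. experience" is in two steps: score each educator's ranking against the data's ranking with Spearman's rank correlation (the simple formula below assumes rankings without ties), then correlate those scores with years of experience (Pearson here). The error labels `E1` to `E5`, the rankings and the years are invented placeholders, not data from the paper.

```python
import math

# Hypothetical inputs: each ranking maps an error label to its rank (1 = hardest).
# Labels, rankings and years of experience are all made up for illustration.


def spearman(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(rank_a)
    d_squared = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


data_ranking = {"E1": 1, "E2": 2, "E3": 3, "E4": 4, "E5": 5}
educators = [
    # (years of teaching experience, that educator's ranking)
    (2, {"E1": 2, "E2": 1, "E3": 3, "E4": 5, "E5": 4}),
    (10, {"E1": 5, "E2": 4, "E3": 3, "E4": 2, "E5": 1}),
    (25, {"E1": 3, "E2": 5, "E3": 1, "E4": 4, "E5": 2}),
]

# Step 1: how well does each educator's ranking agree with the data?
agreement = [spearman(ranking, data_ranking) for _, ranking in educators]
# Step 2: does agreement go up with experience?
years = [y for y, _ in educators]
print(agreement, pearson(years, agreement))
```

With only three made-up educators the final number means nothing; the sketch only shows the shape of the two-step analysis.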

In summary:

  • students do not really learn to fix mistakes apart from brackets
  • educators do not know what is hard
  • experience in teaching does not help teachers get better at guessing

OK, so now what? asks Neil. Staying in the frame of medicine, maybe we need programming epidemiology? We cannot study every pattern with control groups; we also need to look at the big patterns.

For example, can we use this data to answer questions like: what problems do students have with static typing, or with recursion? Maybe this data is not the right tool for these questions; maybe it is. That is still an open question.
