In the 1800s the state of medicine was a bit like the state of programming now, says Stefik. There was a man called Wilhelm von Hoven who did not believe that homeopathy worked, and at the time homeopaths were considered the experts in medicine.
He wrote a long, ranting letter to a newspaper saying that he did not trust them, which ultimately led to the Nuremberg Salt Test, the first randomized controlled trial. For more information on the Nuremberg story, see this paper.
The situation got better after this, but there were still challenges. For example, doctors hesitated to give an experimental medicine to people, and therefore the process of randomization differed between groups. Also, in addition to lab studies, field studies were done, sometimes on ethical grounds (for example, because syphilis and gonorrhea killed so many people, doctors would give the antibiotics without a control group because they were pretty sure they worked).
A big revolution came from Austin Bradford Hill. He developed procedures that ultimately led to the phased randomized controlled trial model, combining lab studies with data from field studies, and his approach is now the way of running studies that is required by law in the US. Stefik especially recommends this paper by Bradford Hill.
Things were a lot better, but there was still the problem of the evidence standard. When is there ‘enough’ evidence? A good strategy is to start small and gradually scale up the group size. One of the famous studies around this was the tolbutamide study. In the control group people died, but not enough to be really sure, and people did not want to repeat the study because doing so might kill people.
This led to a focus on replication. By now, by the way, we are already in the 90s, so this is relatively recent! In 1993, a meeting of 30 experts in Ottawa, Canada, led to CONSORT, a discipline-wide effort to improve reporting standards in experiments.
The current status of the process is about this:
And there is one more thing that is really important, and which is now starting to get some traction in psychology and education: trial registration. You have to register an experiment before you run it, so that you cannot ‘hide’ studies.
So now what? Do we simply copy this into computer science? No, says Stefik. We are not ready as a field! But here are some things we could do:
- Work together across institutions
- Share data and hypotheses and run replications
- Hold each other to evidence standards
Andy Ko adds a great point, namely that medicine has a unified model of human anatomy, something we still very much lack in computer science. Johannes adds that another difference is that medicine agrees that dying is bad, but do we really have such agreement in computer science? Sometimes there are ‘real world effects’ like exploding rockets, gambling machines or cheating diesel engines, and those need regulation, but otherwise, isn’t it okay to have some freedom?
Ciera argues that medicine can also have other goals, like increasing quality of life (which I would say still largely falls under the same umbrella).