Make science, not (worm) wars

Warning: long post ahead!

I have no interest in opening the can of - well, you know - that has taken the development economics twitterverse by storm this week. (If you are interested in the debate over the efficacy of deworming, I highly recommend starting off by reading the original Miguel and Kremer piece in Econometrica. Next on your list should be the 3ie replication plan - too many commentators that are writing on this topic this week seem to have ignored the details of this document. It seems to me that something that should come out of this entire kerfuffle is more careful thinking about how we use pre-analysis plans, and how we treat deviations from these plans, especially in the replication context. After you've read what the 3ie researchers pre-specified they would do, your next stop should be the two 3ie replication articles, Aiken et al and Davey et al, out this week in the International Journal of Epidemiology. Before reading anything else about worms on the internet, make sure you also read the original authors' reply, also published in the IJE this week. You might have noticed that what I'm recommending is all technical academic articles, rather than blog overviews or media articles. The reason for this is that this debate at this point is highly technical - and figuring out who is right and who is wrong demands a careful, thorough look at the actual evidence being presented by the original authors and the replication team. Once you've read all of the aforementioned research, you have my blessing to read a couple of useful posts on the subject. I recommend in particular Chris Blattman's original post and his follow-up; Berk Ozler's (updated) Development Impact post; and GiveWell's original reanalysis of Miguel and Kremer, as well as their commentary on the recent replication exercises.)

Longest parenthetical section of a blog post of all time. Without getting into the relative merits of the original study and the replication, though, I think there are lessons that the social science community can and should take away. There's a lot going on here - many of these sub-sections will likely be the topics of further posts, but this is a nice setting to discuss all of them together.

 These two photos tell a dramatically different story about deworming. Guess which one the media likes?


First and foremost, the scientific method and processes are incredibly important and valuable. We should all be taking a leaf out of Miguel and Kremer's book and making our data and code publicly available. More and more economics journals are making this a requirement for publication, as well it should be (next step - make sure the replication code actually works. But that's a topic for another post). Peer review, seminar presentations about unpublished work, and working paper circulation are key pieces of how economics bolsters the credibility of our findings, and should continue. 

Second, replication is a tricky thing. In an ideal world, we'd be able to take any study in the social sciences and simply re-do it and see whether the results matched up. In the natural sciences, we really are at the place where if one lab publishes a result, another lab should be able to replicate the procedure and come up with the same result. In the social sciences, this is harder - context matters hugely, and if an experiment is done in everyone's favorite RCT motherland, Busia, Kenya, we might not get exactly the same result if we try it in Punjab, India, instead. Is that a failure to replicate, indicating that the original study is "debunked"? (Ugh.) Or does that simply mean that we've learned something about external validity, and maybe something else about what particular social and cultural phenomena were required for the original result?

Given that this type of pure replication is a little nebulous, one thing that's being done (as in the worms debate) is a re-analysis of the original data and code. In principle, this is a great idea - coding errors happen, and we want to make sure that our understanding of the world isn't based on a misplaced "!=". In addition, it is definitely possible that slicing the data in new ways can lead to new insights. But we should be a little bit cautious: the incentives for replication are not set up very well in economics. Re-analyzing a big paper's result with its own data, only to find that the original authors got it right, gives you zero credit, whereas overturning a result that's established in the literature often creates huge waves, so it's in the replicator's best interest to find a chink in the study's armor. On one hand, this is good, because it means that replicators are setting out to push hard at established results, and the ones that hold up are strengthened. On the other hand, replicators have the incentive to push the data in ways that don't necessarily make sense in order to turn that p<0.05 into a p>0.05.
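To put a rough number on that last worry, here is a back-of-the-envelope sketch (mine, not anything taken from the papers in the debate). Suppose the effect is real, and each specification a replicator tries rejects the null with probability `power`. If we assume, unrealistically, that the specifications are independent of each other, the chance that at least one of them comes back "not significant" is 1 - power^k. The function name and the 0.9 power figure are made up for illustration; real specifications run on the same data are correlated, so treat the output as an illustration of the direction of the problem rather than an estimate.

```python
def prob_some_spec_not_significant(power: float, n_specs: int) -> float:
    """Chance that at least one of n_specs specifications fails to reject,
    when each one rejects a true effect with probability `power`.

    ASSUMPTION: the specifications are treated as independent draws. That is
    a simplification; specifications run on the same data are correlated.
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must be between 0 and 1")
    if n_specs < 0:
        raise ValueError("n_specs must be non-negative")
    # P(every spec rejects) = power ** n_specs, so the complement is the
    # chance that at least one spec shows p > 0.05.
    return 1.0 - power ** n_specs


if __name__ == "__main__":
    # Hypothetical 90% power per specification.
    for k in (1, 5, 10, 20):
        print(k, round(prob_some_spec_not_significant(0.9, k), 3))
```

Even with high power in every single specification, the odds of finding at least one "null" climb quickly as the number of attempts grows, which is why an unconstrained search through specifications can be misleading.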

So how do we ensure that we learn from replications? I see three potential solutions: 1) a replication repository, 2) journals leaving some space for replications, and 3) replication plans, meet peer review.

Replication repository: Every graduate student in economics has probably (and if they haven't, they should!) replicated a paper at one time or another, and turned this replication in as a problem set. At this point, the replication gets a check-plus, and is never heard from again. This seems like a waste - it would be cool to have a place where these replications are collected, so that an interested party could pull up the replication rap sheet on a paper. (There are problems with this - who hosts the repository? What happens if things don't replicate?) 

Edit: It turns out that this already exists. It's not huge yet (212 replications so far), but this is a great thing. h/t to Matt Woerman for pointing this out.

Journals: It would be great to see top journals have a small section once a year where they commission economists in a relevant field to replicate papers published in that journal. They could guarantee publication of the replication regardless of the result, which would improve the incentives for successfully replicating existing results, and this could be a good way of strengthening the science.

Replication plans + peer review: Just as with regular research, it's important to make sure that replicators aren't specification mining or data mining to get the results that look flashy on paper. This is especially tricky to prevent in replication-land, because often the data/code are already publicly available to a potential replicator. In order to make these replications as credible as possible, a potential replicator should be required to submit a very detailed replication plan (akin to a pre-analysis plan) for careful peer review before producing any replication results. This should help to ensure that replicators only push the data in ways that make sense to the broader community, rather than getting free passes to do whatever they want. Importantly, replications also need to highlight what their replication plan said, and any deviations from this plan need to be made clear and be defended. 

Finally, there's a lesson here in the interaction between the science and the media. A flashy press release goes a long way in getting newspapers and online media outlets to write about your work - but at the end of the day, the goal should be advancing the science, rather than making one set of researchers look better than another. On the researcher side, making technical results understandable to the public is not always easy - but for research that has important implications for public policy, it is essential. On the media side, making sure to present a balanced account of the science is also important. Writing a headline that says "Big study debunked!" without speaking to the original authors (who have undoubtedly submitted an academic reply to the replication) is irresponsible journalism.

At the end of the day, the goal should be to make the best science possible. It's not always easy to check egos at the door, but it's essential that we do so in order to actually learn about the world.