Forget weatherization - how do we make evaluation work?

The New York Times published an important article yesterday on the value of credible policy evaluation, featuring work by the all-star team of Fowlie, Greenstone, and Wolfram. The upshot? When a program is evaluated by people with a stake in seeing that same program succeed, we have reason to worry about the conclusions. If a non-independent evaluation team reports that a program is great, it's hard to know whether the program is actually great or whether the evaluators' incentives distorted the results.

The Weatherization Assistance Program is a large effort by the US government to weatherize low-income households and make them more energy efficient. The aforementioned all-star team of economists put out a paper in June (now R&R at the QJE!) using a state-of-the-art randomized controlled trial to measure the energy savings from the program. They concluded, much to the chagrin of many energy efficiency advocates, that the costs of the program are twice the energy savings benefits among Michigan households.


The DOE recently released their own study of the program, in over 4,000 pages spread across 36 documents. If you're cynical like me, you're perhaps not shocked that the DOE's report finds that overall, the program benefit-cost ratio is 4:1. This takes into account non-energy benefits such as health that Meredith, Michael, and Catherine did not directly include in their original study (though to be fair, they did look at indoor temperature set-points, and find no evidence of changes - suggesting that there is little propensity for large health effects to result from reduced exposure to extreme temperatures among their sample).

What even a cynical reader might be surprised by is the magnitude of the problems with DOE's reports. From the Energy Institute blog (I also highly recommend that you read the accompanying deep dive into thermal-stress-related benefits):

We have spent many hours poring over these opaque documents. Our judgment is that many of the DOE’s conclusions are based on dubious assumptions, invalid extrapolations, the invention of a new formula to measure benefits that does not produce meaningful results, and no effort to evaluate statistical significance. Using the DOE’s findings, we show below that costs exceed energy savings by a significant margin. We also document major problems with the valuation of non-energy benefits, which comprise the vast majority of estimated program benefits.

Overall, the poor quality of the DOE’s analysis fails to provide a credible basis for the conclusion that the benefits of energy efficiency investments conducted under WAP substantially outweigh the costs. This blog summarizes our assessment of the DOE’s analysis for a general audience. We provide a more technical discussion, using the important example of benefits relating to thermal stress, here.

Eduardo Porter, author of the excellent New York Times article described above, also conducted a Q&A with Bruce Tonn, head of the DOE evaluation team. If you ask me, this is almost more damning than the original article - but I'll leave you to judge for yourself.

Full disclosure: I provided research assistance on the economists' response, helping to read over the thousands of pages of documents from DOE. So maybe I'm a less-than-impartial commentator. But I will say this: I would have been thrilled to see a DOE report that, using modern empirical techniques and direct measurements, was able to provide definitive proof of the existence of large and real health benefits from WAP. I'm disappointed that the evidence that DOE did provide appears unconvincing and flawed. Getting climate policy right, and furthermore, getting low-income assistance right, in situations where governments have limited budgets, demands honest, sometimes hard-to-stomach, independent evaluation. Getting these policies right now will pay off in the long run - as will moving towards an institutional culture of proper ex post evaluation.

