data visualization

Lies, damn lies, and data visualization

A redundant trifecta? First of all, happy holidays, if you'd like, aka one of the few times of year I feel guilt-free about saying that I'm taking (shock and horror) two full days off! And for whoever of you is out there thinking "wait, but you're blogging...probably about economics"...hush. This blog isn't work (and as pretty much any academic economist will tell you, unless you're Chris Blattman, blogging won't help your career either)...[Lie or damn lie? You decide. I think we can clearly rule out data visualization on this one.]

I went to Edward Tufte's "Presenting Data and Information" one-day course in the city last week. If you're not already familiar with Tufte, you should be! He's one of the best writers out there on how to present data in an honest and compelling way, which is often under-appreciated by academic economists, I think.

The course was largely geared toward a corporate audience, but several of the general principles hold true in academia-land as well:

  • "Know your content" rather than "know your audience". In general, a guiding Tufte principle is that good content speaks for itself. Work to declutter your graphics/presentation so that your data/results are what pop out.
  • Be intellectually honest. I thought this went without saying, but we've had a few recent egregious examples (see below) of badly misleading figures. Don't be a damn liar, and think about what you're presenting and how: rules of thumb aren't always good.
  • Don't be afraid to integrate data with words. Good figure labels are essential, and annotative text can often help.
  • Social science is harder than real science [and also potentially has more transparency problems.]


Yikes, part 2.


Maybe none of these points seem super deep to you, but I think they're worth engaging with.

Tufte also spent a long time talking about presenting information in a way that allows the audience to take in your content in a (semi-)unguided manner before you begin to talk. The idea is to present a meeting audience with a handout when you walk in, and let them read before you start clicking through slides. This helps get everyone on the same page, and also gives them an opportunity to engage with the material at their own pace rather than making lots of people wait. Presenting information this way also allows people to dive deeper if they want - and a carefully crafted handout can do a lot in a page. Then when you actually start talking, you can have a more substantive, less hand-holdy discussion. Definitely interesting. Probably not going to be implemented in job talks any time soon, but the point about a carefully crafted, fluff-less written document is appealing in a field where papers are often 35+ pages (scientists manage to publish major work in 2 or 3, so...)

Other (random) things of note:

  • A paper I've been part of (with Catherine, Dave Rapson, Mar Reguant, and Chris Knittel) is being presented at the AEA meetings in January, in the "Evaluating Energy Efficiency Programs" session at (gulp) 8 AM on Jan. 3. Still very preliminary, but I think we're doing some cool things in incorporating machine learning with traditional econometric methods to exploit high-frequency electricity data. If you can suffer the early morning, come check it out!
  • Star Wars: The Force Awakens is awesome. Go see it if you haven't already! And if you have, shut up about the "they remade Episode IV" stuff. Similar plots? Sure. Was it still super awesome? 100%.
  • I basically missed out on all of the COP coverage, but hopefully this agreement represents some steps in the right direction.
  • My new Anova sous vide cooking toy has been a smashing success so far (neither Dana nor I has been harmed yet). 
  • Grist has some compelling maps.
  • The Economist has an infographic advent calendar. 
  • And, finally, on an optimistic note, Quartz's Chart of the Year shows the dramatic decline in people living in poverty over the last 200 years or so. We're doing a lot of things wrong these days, but it's nice to see clear graphical evidence that we're actually doing some stuff right.

That's it - stop reading this and go do something fun/outside/etc!


No, really, climate change will be bad!

I would be remiss to purport to blog about the economics of energy, the environment, and the developing world if I failed to highlight a new (important) study that came out in Nature this week.

The all-star team of Marshall Burke, Sol Hsiang (who has a fancy new website), and Ted Miguel is at it again, with a paper on the effects of temperature on GDP around the world. Before they even get to the empirics, they provide some really nice insight into why, even when there are sharp non-linearities in micro temperature response functions, we shouldn't expect to see these same kinks in macro response functions. The idea is basically the following: a micro response function tells us the marginal effect of having an additional (hour, day) in a given temperature range. Imagine, as with US maize and lots of other things, that output is increasing in temperature up to a point and then decreases sharply beyond that point. The macro response function aggregates these days or hours up to a longer time period (a year, say), meaning that the overall effect of annual temperature on annual output will be a weighted average of the two slopes of the micro response, weighted by the number of days in each temperature range. As the annual average warms, days shift gradually from one side of the kink to the other, so the macro response bends smoothly rather than kinking. Was that confusing? Check out Figure 1, panels d, e, and f (the math to derive this is all in the supplement to the paper as well):

This key insight is really important in allowing us to understand how we should expect micro responses to differ from macro ones. Cool. 
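If the weighted-average argument is easier to see in code, here is a minimal sketch of it. Everything in it is my own assumption for illustration: the kink at 29C, the two slopes, and the sinusoidal seasonal cycle standing in for a real distribution of daily temperatures. None of these numbers come from the paper.

```python
import numpy as np

KINK_C = 29.0        # hypothetical threshold where the daily response turns
SLOPE_BELOW = 0.01   # hypothetical gain per degree below the kink
SLOPE_ABOVE = -0.05  # hypothetical loss per degree above the kink

def micro_response(temp_c):
    """Piecewise-linear daily response with a sharp kink at KINK_C."""
    gap = np.asarray(temp_c, dtype=float) - KINK_C
    return np.where(gap <= 0.0, SLOPE_BELOW * gap, SLOPE_ABOVE * gap)

def macro_response(annual_mean_c, seasonal_amplitude=10.0, n_days=365):
    """Annual output: the daily responses summed over one year of temperatures."""
    days = np.arange(n_days)
    # Assumed seasonal cycle around the annual mean (a stand-in for real weather)
    daily = annual_mean_c + seasonal_amplitude * np.sin(2.0 * np.pi * days / n_days)
    return float(micro_response(daily).sum())

annual_means = np.arange(10.0, 35.0, 1.0)
macro = np.array([macro_response(m) for m in annual_means])

# Effect on annual output of one more degree of annual average temperature
macro_slopes = np.diff(macro)

# Size of the slope change at the micro kink, scaled up to a full year of days
micro_jump = (SLOPE_BELOW - SLOPE_ABOVE) * 365
# Largest change in the macro slope between neighbouring annual means
largest_macro_step = float(np.max(np.abs(np.diff(macro_slopes))))
```

Under these assumptions the macro slope starts at the "all days below the kink" value, ends up negative in hot places, and gets there in many small steps, because each extra degree of annual average only pushes some of the days across the threshold.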

The authors then go on to empirically estimate the global macro temperature response function, settling on (after many robustness checks) a quadratic in temperature. What they come up with is a strong inverted-U-shaped relationship, with an optimum around 55°F (that might seem low, but remember that we're talking about annual average temperature here). This suggests that some (colder) countries might benefit from global warming, and hotter countries have a lot to lose. They tackle several points that are often brought up in this literature. First, they end up unable to reject that the rich and poor country responses are the same (though the confidence intervals are quite large as well. Minor gripe: 90% confidence intervals are shown in the paper. Yes, I know that 95% is arbitrary too, but it is the empirical economics standard...). Second, they show that agriculture takes a big hit in both poor and rich countries, and that non-ag GDP seems to take slightly less of one in richer countries, but the relationship between temperature and non-ag GDP is still downward sloping. And finally, they show that the response functions in 1960-1989 look almost identical to the 1990-2010 response functions, suggesting that there hasn't been a ton of adaptation during the time period of their data.
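For anyone who wants the arithmetic behind "optimum" and "colder countries might benefit": with a quadratic response, the peak sits where the marginal effect of temperature crosses zero. A small sketch follows. The coefficients are hypothetical, picked only so that the peak lands at 13C (about 55°F); they are not the paper's estimates.

```python
def quadratic_optimum(beta1, beta2):
    """Temperature that maximizes beta1*T + beta2*T**2 (needs beta2 < 0)."""
    if beta2 >= 0:
        raise ValueError("an inverted U needs a negative coefficient on T squared")
    return -beta1 / (2.0 * beta2)

def marginal_effect(temp_c, beta1, beta2):
    """Derivative of the quadratic response with respect to temperature."""
    return beta1 + 2.0 * beta2 * temp_c

def c_to_f(temp_c):
    return temp_c * 9.0 / 5.0 + 32.0

# Hypothetical coefficients for illustration only (not from the paper)
beta1, beta2 = 0.013, -0.0005

t_star_c = quadratic_optimum(beta1, beta2)
t_star_f = c_to_f(t_star_c)

# Sign of the effect of a little warming in a cold country and in a hot one
effect_cold = marginal_effect(5.0, beta1, beta2)
effect_hot = marginal_effect(27.0, beta1, beta2)
```

With these made-up numbers, a country averaging 5C sits on the upward-sloping side and one averaging 27C sits on the downward-sloping side, which is the whole "winners and losers" story in two lines.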

Using these estimates, they go on to make some beautiful figures showing climate damage projections out to 2100 (IMHO, as much as I know that they like Figure 3, I think it's aesthetically pleasing but not the most legible). They find that, using fairly reasonable assumptions about growth and emissions paths, global GDP is projected to be approximately 25% lower in 2100 with climate change than without -- a much larger effect than any of the three current IAMs used in US policy (DICE, FUND, and PAGE) would suggest. There are wide confidence intervals around this estimate to be sure - but it's also worth noting that the majority of the uncertainty here comes from Europe and North America. These are large economies, and so have a large effect on GDP per capita overall, but are also close to the estimated global optimum, meaning that if the optimum is off by a little bit, the effects for these countries could even flip in sign.

I think this paper is a really important contribution to the climate-economics space. The effects are huge, and the paper (and supplemental information, and stuff that got left out of the supplemental information but was in an earlier non-circulating working paper version) is very thorough.

A few small comments: it is worth noting that there's a ton of statistical uncertainty floating around here. Panel C of the first extended data figure shows the estimated marginal effects with lags included - and in every estimate that includes lags, the confidence interval includes zero (and I think these are still the 90% CIs?). The confidence intervals on Figure 5a, the main estimate, also sit squarely on top of zero. And, as with every projection exercise, we should take this one with a giant brick of salt. These guys do a good job, but remember that they're also using short-run fluctuations in temperatures to trace out this response function. This is nice because, conditional on the right fixed effects, we generally think that this variation is as good as randomly assigned, but it does make plugging the estimates into a projection a little tricky to interpret. It's standard in this literature to do this kind of thing - and the fact that they find no evidence of adaptation in the 50+ year period they're looking at helps shore up the argument for doing so - but it's worth keeping in mind that that's what's being done.

It's also really important to think carefully (in all of these papers - not just BHM) about what's actually being used for identification. We know from Wolfram and Craig McIntosh that using higher-order polynomials in fixed effects models re-introduces cross sectional variation (and any omitted variable bias that comes with it!). I think in an earlier version of the paper, I saw a binned model floating around, which removes this concern, and had similar point estimates, but this general point is something that's under-appreciated, I think. (And, even with binned models, we need to be really careful when presenting something as the aggregate temperature response function, if there are only a few countries that ever end up in the really hot bins. That's a soapbox for another day.)
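Here is a toy way to see the polynomial point. It is my own illustration, not the derivation from Wolfram and Craig's work or anything in BHM. Take two hypothetical places with exactly the same year-to-year temperature shocks, one cold on average and one hot, and apply the fixed-effects (within) transformation to temperature and to temperature squared.

```python
import numpy as np

def within_demean(x):
    """Within transformation for one unit: subtract that unit's own mean."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()

# Hypothetical data: identical shocks, different average temperatures
shocks = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
cold_mean, hot_mean = 5.0, 25.0
cold = cold_mean + shocks
hot = hot_mean + shocks

# Linear term: once the unit mean is removed, the two places look identical
lin_cold = within_demean(cold)
lin_hot = within_demean(hot)

# Squared term: the demeaned values still depend on each place's average level
sq_cold = within_demean(cold ** 2)
sq_hot = within_demean(hot ** 2)

# The gap between them is 2 * (difference in means) * shock
gap = sq_hot - sq_cold
```

The demeaned linear terms match exactly, but the demeaned squared terms differ by twice the difference in average temperature times the shock. So the identifying variation in the squared term is scaled by how hot a place is on average, which is cross-sectional information that the fixed effects were supposed to absorb.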

Also, as I mentioned above a little bit, while it's true that these guys aren't able to statistically reject that the poor and rich country responses are the same, that doesn't mean that the true responses aren't different - it could be that there's not enough statistical power to detect a difference in the data. That's going to be especially true at the colder end of the distribution - there are so few poor countries there that it's really hard to say anything concrete.

All that said, I think this is a super interesting and important paper, and I'm glad that it's out in time for Paris. I've already learned a lot from these guys, and I continue to do so - they're some of the most careful, thorough, and productive researchers out there working on really policy-relevant topics. Plus, they make beautiful figures. This is a paper that's really worth diving into - I highly recommend actually reading the paper, the extended data, and the supplemental information (which is something I won't say very often)!

One last thing before I close: Marshall, Sol, and Ted have put up a really good companion site to their paper, that makes the results accessible and digestible. Plus, they've put up replication code - very important when you're working on such hot (ha) issues as climate and GDP. Take a look!

Edited to add: Marshall just posted a response to some frequent criticism on his blog. Worth a read.

WWP: Choleric intake is bad for housing prices

I have a confession to make: I haven't had a chance to fully read this week's excellent-looking paper from Attila Ambrus, Erica Field, and Robert Gonzalez. But! A quick skim convinced me that it's cool enough that it's absolutely worth mentioning. This paper invokes several of my favorite things:

1) Regression discontinuity design - I'm up to my ears in an RD paper with a classmate, and I'm coming more and more to appreciate the benefits of this design. Yes, only having a LATE around a certain threshold can be frustrating from an external validity perspective - but the gains in terms of identification seem to strongly outweigh the costs in many cases.

2) John Snow's 19th century medical work! I'm a little bit of a data visualization nerd, and John Snow's original map of London's cholera outbreak, which helped provide convincing evidence that the Broad Street water pump was contaminated and was the source of the contagion, is a classic. Beautiful figure, and incredibly clever.

3) A surprising result. This paper finds that the areas that were negatively affected by the cholera-bearing pump in the 19th century still have lower house prices today. Not what I would've expected.

One of the money figures of the paper: prices are dramatically lower inside the pump's catchment area.


All in all, this seems like a really cool paper that I wish I had more time to actually dig into at the moment. I'll leave you with the authors' abstract:

How do geographically concentrated income shocks influence the long-run spatial distribution of poverty within a city? We examine the impact on housing prices of a cholera epidemic in 19th century London in which one in seven families living in one neighborhood experienced the death of a wage earner. Ten years after the epidemic, housing prices are significantly lower just inside the catchment area of the water pump that transmitted the disease, despite being the same before the epidemic. Moreover, differences in housing prices persist and grow in magnitude over the following century. Census data reveal that price changes coincide with a sharp increase in population density at the border, consistent with anecdotes of impoverished residents taking in subtenants to make ends meet. To illustrate a mechanism through which idiosyncratic shocks to individuals that have no direct effect on infrastructure can have a permanent effect on housing prices, we build a model of a rental market with frictions, with poor tenants exerting a negative externality on their neighbors, in which a locally concentrated negative income shock can permanently change the tenant composition of the affected areas.

What's my current excuse for not reading as much as I should?

Up at Sibley Park before the super blood moon eclipse / fantastic sunset.


Wedding in t-minus 13 (12?) days!

PS: You know you love that pun in the title. You might not admit it out loud, but you do.

WWP: Ocean risks

Maybe I have holdover inspiration from my weekend in the beautiful Pacific Northwest, but this week's WWP features two papers that are forthcoming in the AEJ: Applied (but not out yet - so I can still count them as working papers, right? Okay, maybe not, but at the very least, they're new work!) with oceanic themes. 

The first is a really cool new paper by Sebastian Axbard, a PhD candidate at Uppsala University in Sweden (turns out he was also a visiting scholar here at Berkeley a few years ago. I swear I didn't know until after I found the paper!). Axbard combines two of my favorite topics: piracy (or crime) and environmental (climate) economics. His paper uses some neat new remotely sensed data on sea surface temperature and chlorophyll-a concentrations to construct a measure of fishing conditions, which he then combines with fish market price data, labor outcomes from one of Indonesia's (many great) datasets, SAKERNAS, and finally, geocoded piracy data. He uses exogenous variation in fishing conditions to show that piracy responds to local incomes, and then goes on to show that a military exercise targeted at piracy (which is called, I kid you not, Operation Octopus!!!!!) did reduce attacks. He finds that the operation had a strong effect on piracy in locations with bad fishing conditions, but that the effects persist longer when the military operation is accompanied by good fishing conditions (and therefore, we surmise, a good outside option for these potential pirates). This paper is super cool - here's Axbard's abstract:

The effect of climatic variation on conflict and crime is well established, but less is known about the mechanism through which this effect operates. This study contributes to the literature by exploiting a new source of exogenous variation in climate to study the effect of fishermen’s income opportunities on sea piracy. Using satellite data to construct a monthly measure of local fishing conditions it is found that better income opportunities reduce piracy. A wide range of approaches are employed to ensure that these effects are driven by income opportunities rather than other mechanisms through which climate could affect piracy.

The most recent ungated copy of the paper I could find is here

Cool display of the attacks (red dots) and fishing conditions (blue squares) from Axbard's paper. From Figure 5. 


This week's second paper is also about the ocean - sort of (okay, this is a little bit of a stretch - but typhoons come from the ocean, so I'm going to claim that this post is cohesive). André Gröger at Goethe Universitaet Frankfurt and Yanos Zylberberg at Bristol also use remotely sensed data, this time to look at the effect of a huge typhoon in Vietnam on migration. These guys use satellite information on coastal inundation (I had to look this up - NOAA's definition is "Water covering normally dry land is a condition known as inundation"), which they construct from MODIS images and match to another great panel dataset, this one from Vietnam. They find that the typhoon caused a large decrease in incomes in affected areas, and that households respond by sending migrants out of the home or, for homes that already have a migrant, by receiving increased remittances from that migrant. The body of work so far on migration as a response to (climatic) shocks is small, but rapidly growing - this is a really cool new addition. The authors themselves write:

We analyze how internal labor migration facilitates shock coping in rural economies. Employing high precision satellite data, we identify objective variations in the inundations generated by a catastrophic typhoon in Vietnam and match them with household panel data before and after the shock. We find that, following a massive drop in income, households cope mainly through labor migration to urban areas. Households with settled migrants ex-ante receive more remittances. Nonmigrant households react by sending new members away who then remit similar amounts than established migrants. This mechanism is most effective with long-distance migration, while local networks fail to provide insurance.

I again found an ungated copy here

Again, more awesome satellite data. Figure 2 from the Gröger and Zylberberg paper.


Really excited to see remote sensing data being used in (soon-to-be) published papers! I consider myself lucky to be an economist now in our current era of amazing data availability - and it gets better every day. I hope that in a few years I'll look back at this blog post and laugh at what I used to think was data abundance.

Back on track

I was going to make this post a Wednesday Working Paper, but because of my fantastic Seattle vacation (and less fantastic return to 2 vet trips in as many days with my cat), I haven't actually read anything new. Sorry I'm not sorry. To get the ball rolling again, I want to highlight two great websites that were brought to my attention this week (both via Twitter. Have I mentioned my ongoing love affair with Twitter yet?)

Great data visualization or the greatest data visualization? Proof that both analysis and presentation of (social science) data are hard.


First, FiveThirtyEight has a really nice piece on the state of science. Like the Economist article I blogged about a little while ago (first link to my own blog - oooh, meta), this post has an interactive infographic where you can play with p-hacking, this time using actual data to show statistically significant effects of Republicans/Democrats on the economy. The article does a nice job explaining potentially complex issues, like p-hacking, differences in methodological approaches across disciplines, and the degree to which science is self-correcting, in a digestible way. As a (social) scientist myself, I appreciate the article's headline and subtitle: "Science Isn't Broken - It's just a hell of a lot harder than we give it credit for." Truth. One important thing is missing from this article, though: the author spends essentially zero time talking about causality. The p-hacking exercise (and, as far as I can tell, the fascinating soccer player example...which includes an author from BITSS, Garret Christensen) deals only with correlations. Figuring out whether something is causal or merely correlational might be the biggest part of my job as a young economist - and actually nailing down causality is really hard to do. So consider that yet another (extremely large) item on the why-(social)-science-is-hard list. We would also benefit from more media highlighting the differences between causal and correlational work - both are very important and should have different policy implications, but they're often treated as one and the same in newspaper or online articles about research. Kudos overall, though, to FiveThirtyEight for a detailed but readable piece on the challenges of doing science currently (and how far we've come at doing better science - we've got a ways to go, but I'm optimistic that a great deal of progress has already been made).
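If you'd rather see p-hacking in code than in an infographic, here is a rough simulation. It is my own sketch, not FiveThirtyEight's exercise or data: the outcome and every candidate explanatory variable are pure noise, and the question is how often at least one of them looks "significant" if you get to try several. The sample size, number of trials, and the large-sample 5% cutoff for a correlation are all assumptions of the sketch.

```python
import numpy as np

def share_with_false_positive(n_specs, n_trials=2000, n_obs=200, seed=0):
    """Share of simulated studies in which at least one of n_specs pure-noise
    regressors is correlated with a pure-noise outcome at roughly the 5% level."""
    rng = np.random.default_rng(seed)
    # Large-sample approximation: |r| > 1.96 / sqrt(n) is about a 5% two-sided test
    critical_r = 1.96 / np.sqrt(n_obs)
    hits = 0
    for _ in range(n_trials):
        y = rng.standard_normal(n_obs)
        x = rng.standard_normal((n_specs, n_obs))
        # Standardize, then take the sample correlation of each regressor with y
        y_std = (y - y.mean()) / y.std()
        x_std = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
        correlations = x_std @ y_std / n_obs
        hits += int(np.any(np.abs(correlations) > critical_r))
    return hits / n_trials

share_one = share_with_false_positive(n_specs=1)
share_twenty = share_with_false_positive(n_specs=20)
```

With one pre-specified test, the false positive rate should sit near the nominal 5%. With twenty tries, it should be somewhere near 1 - 0.95^20, or roughly two in three, even though nothing is related to anything. And note that this, like the article, is still only about correlations.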

On a lighter note (and not to be outdone), here's what might be my new favorite time-waster website: bad data presentation from WTF Visualization. Seems like the creators of these awful graphics need to read some Tufte.