Tuesday, January 14, 2014

How many ways can you be right?

I was reading a little bit today about the Reinhart-Rogoff saga, and it got me thinking about robust conclusions. For those of you who don’t know, Reinhart and Rogoff are a couple of macroeconomists who wrote an influential paper on the relationship between debt and growth. Basically, they argue that debt in excess of 90% of GDP is associated with significantly lower economic growth, as though there were some sort of threshold effect. Pro-austerity politicians then seized upon this research as a key piece of evidence in favor of austerity measures worldwide (causality be damned!). Now *that* is a high-impact paper! The brouhaha began when a UMass graduate student (Thomas Herndon) attempted to replicate the results for a class project. Turns out there were a few issues with the original paper. One of the funniest/scariest was an error in their spreadsheet that omitted some data from their calculations. I don’t know which is funnier or scarier: 1. that nobody thought to double-check the spreadsheet before, you know, shaping global economic policy, or 2. that serious academics are actually using Excel (Excel!) for this sort of “quantitative” work. Whatever, that’s a whole other can of worms that has been written about to death elsewhere.

To me, a more interesting issue had to do with the particulars of how they treated their data. The question is whether to average by country or by country-year: some of the countries in the data set had many more data points than others. Basically, if you average the R&R way (by country), you see a sharp decrease in GDP growth above a 90% debt load. Averaging by country-year (which seems to make more sense to me), the effect disappears, which is one of the points of the Herndon re-analysis. Note that R&R always took care to emphasize the median and not the mean, presumably because of outliers or something. Here’s the R&R data. It’s the -0.1 number at 90% and above that goes away in the Herndon paper.
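To see how much the choice of weighting can matter, here’s a toy sketch (completely made-up numbers, nothing to do with the actual R&R data set) where one country contributes a single terrible year and another contributes several mild ones:

```python
# Toy illustration: growth observations above some debt threshold.
# Country A has 1 country-year; country B has 5.
data = {
    "A": [-7.9],
    "B": [2.4, 2.5, 2.6, 2.5, 2.5],
}

# Weighting by country: average each country first, then average those averages.
# A's single bad year now carries half the total weight.
country_means = [sum(v) / len(v) for v in data.values()]
by_country = sum(country_means) / len(country_means)

# Weighting by country-year: pool every observation equally.
# A's single bad year carries 1/6 of the total weight.
pooled = [x for v in data.values() for x in v]
by_country_year = sum(pooled) / len(pooled)

print(f"by country:      {by_country:.2f}")       # -2.70
print(f"by country-year: {by_country_year:.2f}")  #  0.77
```

Same ten numbers, and the sign of the answer flips depending on which seemingly reasonable averaging scheme you pick.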

Now, I suppose that R&R had strong reasons for doing the averaging their way (and have said so in public), although it sounds to me like reasonable people could disagree. And I suppose that reasonable people could argue about whether you should use the median or the mean. The point I think is interesting is that the conclusions can change remarkably depending on which seemingly reasonable thing you do. What got me thinking about it is this line from Tyler Cowen’s blog on the subject:
In the paper by the critics, the pp.7-9 discussion of “weighting by country” vs. “weighting by country-year” is very interesting, but the fact that it matters as much as it does makes me more skeptical about the entire enterprise.
Indeed, it does make me question whether these results have any merit at all in either direction! Usually, when we see this sort of stuff in our data, the first things we check are the number of data points and the error bars. Honestly, I’m amazed that they can get away without putting any sort of error estimates, or even a discussion of confidence intervals, on their data. See this link for a much more honest portrayal of this data (I got the link from some article by Paul Krugman).
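And putting error bars on this sort of thing doesn’t even require any fancy statistics. Here’s a minimal percentile-bootstrap sketch (again, hypothetical numbers I made up for illustration) of the kind of confidence interval you could attach to a mean growth figure:

```python
import random

random.seed(0)

# Hypothetical growth observations for one debt bucket (made-up numbers).
growth = [2.1, -0.5, 3.0, 1.2, -7.9, 2.4, 0.8, 1.9, 2.2, -1.1]

def bootstrap_ci(xs, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for the mean: resample with replacement,
    recompute the mean each time, take the middle (1 - alpha) of them."""
    means = sorted(
        sum(random.choices(xs, k=len(xs))) / len(xs) for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

point = sum(growth) / len(growth)
lo, hi = bootstrap_ci(growth)
print(f"mean = {point:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

With an interval that wide sitting next to the point estimate, it would be a lot harder to sell a single number like -0.1 as a policy-shaping threshold effect.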

Makes me wonder about similar sorts of problems in molecular biology, particularly in the age of deep sequencing. I’ve definitely become very suspicious whenever I hear reports of something that relies very heavily on some newfangled analysis of otherwise run-of-the-mill data, and there have been a number of such high-profile reports in recent years that ended up being bogus (which shall of course remain nameless…). Although to be clear, there are also a bunch of non-sequencing examples, such as how people quantify alternative splicing, etc. I just feel like robust results should be fairly clear and not particularly dependent on some weird normalization or what have you.

Uri Alon’s network motifs are a great example of something very robust to the particulars. For those of you who are not familiar with it, here’s the idea: given the transcriptional network of E. coli, it seems like particular subnetwork “motifs” are highly overrepresented compared to what a random network would give you. An example is negative autoregulation, where a transcription factor downregulates its own transcription. Now, one sticky point is what one means by a “random” network. There are many ways to construct random networks–do you maintain the scaling of the connectivity? The number of edges? Whatever. The point is that the results were so significant that the p-values were something like 10^-20 or less no matter what sort of random network you chose as a null model. So I believe it! I think it also illustrates a good general practice when you are faced with decisions in analysis: if you have to make a choice upon which reasonable (or even moderately unreasonable) people could disagree, just do both and see what happens. If the results are consistent with your interpretation either way, all the better. If not, well, you better make a strong case for your particular analysis method... or perhaps it’s time to revisit your model.
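The motif-enrichment logic is simple enough to sketch in a few lines. This is a toy version with a deliberately naive null model (same node and edge counts, endpoints drawn uniformly at random) and made-up numbers, not the actual E. coli network or Alon’s procedure; the point is just that you can swap in different nulls and see whether the p-value survives:

```python
import random

random.seed(1)

def count_self_loops(edges):
    """Count autoregulatory edges (a TF regulating itself)."""
    return sum(1 for u, v in edges if u == v)

def random_network(n_nodes, n_edges):
    """One simple null model: same node and edge counts, endpoints uniform
    at random. Stricter nulls (e.g. degree-preserving edge swaps) would be
    other reasonable choices -- the point is to try several."""
    return [(random.randrange(n_nodes), random.randrange(n_nodes))
            for _ in range(n_edges)]

def empirical_p(observed, n_nodes, n_edges, n_trials=10_000):
    """Fraction of random networks with at least as many self-loops."""
    hits = sum(
        count_self_loops(random_network(n_nodes, n_edges)) >= observed
        for _ in range(n_trials)
    )
    return hits / n_trials

# Toy "observation": a 50-node, 100-edge network with 8 self-loops,
# where the naive null expects only about 2.
p = empirical_p(observed=8, n_nodes=50, n_edges=100)
print(f"P(>= 8 self-loops under null) = {p:.4f}")
```

If the enrichment is real and strong, the p-value stays tiny no matter which of these null models you plug into `random_network` — which is exactly the kind of robustness that made the motif result believable.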

For instance, in Marshall’s paper on chromosome structure, we saw one really strong gene interaction pair and several more weak interactions. There were multiple ways of generating a p-value for each pair, and they all gave slightly different answers. But only one pair really stuck no matter what we did with the data, and that’s the one we reported.

Of course, things like chromosome structure are fairly esoteric, and so flimsy claims there are considerably less influential in the world at large than the Reinhart-Rogoff paper. Now, I don’t think that austerity politics, be they right or wrong, have their roots in this one paper, so it’s perhaps a bit unfair to hold these two solely responsible for austerity measures globally. But it’s disingenuous for them to say that they had no role in the debate, since they were literally in the room while the political types were deciding on policy. Honestly, I personally would not have been comfortable or confident giving advice in such a situation based on the data they show in their paper. But I’ve certainly never been in such a position. And we’re all publicity hounds in this line of work. Consider the following line from the same Tyler Cowen blog post:
My own view, as you can read in The Great Stagnation, is that the primary mechanism is slow growth causing high debt/gdp ratios, not vice versa. In any case this is by far the most important issue, whether or not you agree with my take on it.
Shameless book plug! Is that an economist thing? Anyway, can I tell you some more about Marshall’s paper now? ;)
