Sunday, August 23, 2015

Top 10 signs that a paper/field is bogus

These days, there has been a lot of hang-wringing about how most papers and wrong and reproducibility and so forth. Often this is accompanied with some shrill statements like “There’s a crisis in the biomedical research system! Everything is broken! Will somebody please think of the children?!” And look, I agree that these are all Very Bad Things. The question is what to do about it. There are some (misguided, in my view) reproducibility efforts out there, with things like registered replication studies and publishing all negative results and so forth. I don’t really have too much to say about all that except that it seems like a pretty boring sort of science to do.

So what to do about this supposed crisis? I remember someone I know telling me that when he was in graduate school, he went to his (senior, pretty famous) PI with a bunch of ideas based on what he'd been reading, and the PI said something along the lines of "Look, don't confuse yourself by reading too much of that stuff, most of it’s wrong anyway". I've been thinking for some time now that this is some of the best advice you can get.

Of course, that PI had decades of experience to draw upon, whereas the trainee obviously didn't. And I meet a lot of trainees these days who believe in all kinds of crazy things. I think that learning how to filter out what is real from the ocean of scientific literature is a skill that hopefully most trainees get some exposure to during their science lives. That said, there’s precious little formalized advice out there for trainees on this point, and I believe that a little knowledge can go a long way: for trainees, following up on a bogus result can lead to years of wasted time. Even worse is choosing a lab that works on a bogus field–a situation from which escape is difficult. So I actually think it is fair to ask “Will somebody please think of the trainees?”.

With this in mind, I thought it might be useful to share some of the things I've learned over the last several years. A lot of this is very specific to molecular biology, but maybe useful beyond. Sadly, I’ll be omitting concrete examples for obvious reasons, but buy me a beer sometime and then maybe I'll spill the beans. If you’re looking for a general principle underlying these thoughts, it’s to have a very strong underlying belief system based in Golden Era molecular biology. Like: DNA replication, yeah, I’m pretty sure that’s a thing. Golgi Apparatus, I think that exists. Transcription and translation, pretty sure those really happen. Beyond that, well…
  1. Run the numbers. One consistent issue in molecular biology is that because it tends to be so qualitative, we have little sense for magnitudes and plausibility of various mechanisms. That said, we now are getting to the point where we have a lot more quantitative data that lets us run some basic sanity checks (BioNumbers is a great resource for this). An example that I’ve come across often is mRNA localization. Many people I’ve met have, umm, fairly fanciful notions of the degree to which mRNA is localized. From what we’ve seen in the lab, almost every mRNA seems to just be randomly distributed around the cytoplasm, with the exception being ER-localized ones, which are, well, localized to the ER. Ask yourself: why should there be any mRNA localization? Numbers indicate that proteins diffuse quite rapidly around the cell, on a timescale that is likely faster than mRNA transport. So for most cells, the numbers say that you shouldn’t localize mRNA–rather, just localize proteins. And, uh, that’s what we see. There are of course exceptions, like lncRNA, that show interesting localization patterns–again, this makes sense because there is no protein to localize downstream. There are other things that people say about lncRNA that don’t make sense, though. I’ll leave that as an exercise for the reader… :) (Also should point out that these considerations can actually help make the case for mRNA localization in neurons, which I think is a thing.)
  2. Consider why nobody has seen this Amazing New Phenomenon before. Was it a lack of technology? Okay, then it might be real. Was it just brute force? Also possible that it's real. Was it just waiting for someone to think of the idea? Well, in my experience, nutty ideas are relatively cheap. So I'd be very suspicious if this result was just apparently sitting there without anyone noticing. Ask yourself: should this putative set of genes have shown up in a genetic screen? Should this protein have co-purified with this other protein? Did people already do similar experiments a long time ago and come up empty handed? What are other situations in which people may have inadvertently seen the same thing before? It’s also possible that the result is indeed true, but represents a “one-off” special case: consider this exchange about a recent paper (I have to say that I was surprised that some people in the lab didn’t even find this result surprising!). Whether you choose to pursue one-offs is I think a largely aesthetic choice.
  3. Trust your brain, not stats. If looking at an effect makes you wonder what the p-value is, you’re already on thin ice, so tread carefully. Also, beware of new statistical methods that claim to extract results from the same data where none existed before. Usually, these will at best find only marginally interesting new examples. More often, they just find noise. If there was something really obvious, probably the original authors would have found it by manual inspection of the data. Also, if there’s a clear confounding factor that the authors claim to have somehow controlled for, be suspicious.
  4. Beware of the "dynamic process". Sometimes, when you press someone on the details of a particular entity or process in the cell whose existence is dubious, they will respond with "Well, it's a dynamic object/process." Often (though certainly not always), this is an excuse for lazy thinking. Remember that just because something is "dynamic" doesn't mean that you should not be able to see it! Equilibrium, people.
  5. For some crazy new proposed mechanism, ask yourself if that is how you think the cell would do it. We often hear that nothing in biology makes sense except in light of evolution. In this context, I think it's worth wondering whether the proposed mechanism would be a reasonable way for the cell to do something it was not otherwise able to do. Otherwise, maybe it’s some sort of artifact. As a (made up) example, cells have many well-established mechanisms for communicating with each other. For a new mechanism of communication to be plausible (in my book), it should offer some additional functionality beyond these existing mechanisms. Evolution can do weird stuff, though, so this line of reasoning is inherently somewhat suspect.
  6. Check for missing obvious-next-step experiments. Sometimes you’ll find a paper describing something cool and new, and you’ll immediately wonder “Hmm, if what they’re saying is true, then shouldn’t those particles also be able to…”. Well, if you thought of it after reading a paper for 30 minutes, then presumably the authors had the same idea as some point as well. And presumably tried it. And it presumably didn’t work. (Oh, wait, sorry, I meant the results were “inconclusive”.) Or they tried to get RNA from those cells to profile. And they just didn’t get enough RNA. And so on. Keep an eye out for these, especially if multiple papers are missing these key experiments.
  7. For methods, look for validation with known biology. The known positives should be positive and presumed negatives should be negative. Let’s say you have some new sequencing method for measuring all RNA-protein interactions (again, completely hypothetical). Have a list of known interactions that should show up and a list of ones for which there’s no plausible reason to expect an interaction. Most people think about the positives, but less often about the negatives. Think carefully about them.
  8. Dig carefully into validation studies. I remember reading some paper in which they claimed to have detected a bunch of new molecules and then “validated” their existence. Then the validation had things like blots exposed for weeks to show signals and PCRs run for 80 cycles and stuff like that. Hmm. Often this data is buried deep in supplements. Spend the time to find it.
  9. Be suspicious of the interpretation of biological perturbations. Cells are hard to manipulate. And so it’s perhaps unsurprising that most perturbations can lead you astray. Off-target effects for knockdown are notoriously difficult to control for. And even if you do have target specificity, another problem is that as our measurements get better, biological complexity means that virtually all hypotheses will be true at least 50% of the time. Overexpression often leads to hugely non-biological protein levels and can lead to artifacts. Cloning out single cells leads to weird variability. Frankly, playing with cells is so difficult that I’m sort of amazed we understand anything!
  10. Know the limitations of methods. If you’re looking for differential gene expression, how much can you trust RT-PCR? Have you heard of the MIQE guidelines for RT-PCR? I hadn't, but they are extensive. For RNA-seq, how well-validated is it in your expression regime? If you’re analyzing sequence variants, how do you know it’s not sequencing error (since largely discredited claims of extensive RNA editing are one widely-publicized example of this issue). ChIP-seq hotspots? The list goes on. If you don’t know much about a method, ask someone who does.
  11. Bonus: autofluorescence. Enough said.
I offer these more as a set of guidelines for how I like to think about new results, and I’m sure we can all think of several counterexamples to virtually every one of these. My point is that high-level thinking in molecular biology requires making decisions, and making a real decision means leaving something else on the table. Making decisions based on the literature means deciding what avenues not to follow up on, and I think that most good molecular biologists learn this early on. Even more importantly, they develop the social networks to get the insider’s perspective on what to trust and what to ignore. As a beginning trainee, though, you typically will have neither the experience nor the network to make these decisions. My advice would be to pick a PI who asks these same sorts of questions. Then keep asking yourself these questions during your training. Seek out critical people and bounce your ideas off of them. At the same time, don’t become one of those people who just rips every paper to shreds in journal club. The point is to learn to exhibit sound judgement and find a way forward, and that also means sifting out the good nuggets and threading them together across multiple papers.

As a related point, as I mentioned earlier, there’s a lot of fuss out there about the “reproducibility crisis”. There’s two possible models: one in which we require every paper to be “true”, and the more laissez-faire model I am advocating for in which we just assume a lot of papers are wrong and train people to know the difference. Annoying reviewers often say that extraordinary claims require extraordinary evidence. I think this is wrong, and I’m not alone (hat tip: Dynamic Ecology). I think that in any one paper, you do your best, and time will tell whether you were right or wrong. It’s just not practical or efficient for a paper to solve every potential problem, with every result cross-validated with every method. Science is bigger than any one paper, and I think it’s worth providing our trainees with a better appreciation of that fact.

Update, 8/25/2015:
Couple nice suggestions from various folks. One from anonymous suggests "#12: Whenever the word 'modulating' appears anywhere in title/abstract." Well said!

Another point from Casey Bergman:


  1. This is a fine piece of writing. I've though the same for years, but not expressed it as cogently. Copied to my labbies. Thanks.

  2. True academic papers are now hard to find. It is really a sad thing.

  3. Your overarching point about the importance of good judgment is very important, and very well put.

  4. Very practical but easy to be ignored points to avoid being bogus. Thanks for the sharing.
    BOC Sciences

  5. Very cogent guidelines. I propose NanoFlares/SmartFlares and now StickyFlares (see PMID: 26195734 ) as a subject for applying these guidelines.

  6. I'd add to the list any title which has the word "regulate" in it...WAY over used, and far too determinisitic. Great read!

  7. Arun, i took me a while! but I finally replied as promised. Find long version here:
    In brief:
    Top 10 signs RNA is awesome
    1. Degree of RNA localization is not overrated but essentially unknown!
    2. RNAs not always randomly distributed!
    3. Why should RNAs be localized?
    • For starters, diffusion is in fact very limited!
    • Then also: RNAs don't diffuse well at all!
    • And: the cytoplasm is not water!
    • Cells are not small and round.
    • Are all localized RNAs encoding localized proteins?
    • Localized RNAs = silenced pool?
    • Localized RNAs = not translated?
    • Evolution!

    Best wishes!