Saturday, April 8, 2017

The hater’s guide to (experimental) reproducibility

(Thanks to Caroline Bartman and Lauren Beck for discussions.)

Okay, before I start, I just want to emphasize that my lab STRONGLY supports computational reproducibility, and we have released data + code (code all the way from raw data to figures) for all papers primarily from our lab for quite some time now. Just sayin’. We do it because a. we can; b. it enforces a higher standard within the lab; c. on balance, it’s the right thing to do.

All right, that said, I have to say that I find, like many others, the entire conversation about reproducibility right now to be way off the rails, mostly because it’s almost entirely dominated by the statistical point of view. My opinion is that this is totally off base, at least in my particular area of quantitative molecular biology; like I said before, “If you think that github accounts, pre-registered studies and iPython notebooks will magically solve the reproducibility problem, think again.” Yet, it seems that this statistically-dominated perspective is not just a few Twitter people sounding off about Julia and Docker. This "science is falling apart" story has taken hold in the broader media, and the fact that someone like Ioannidis was even being mentioned for director of NIH (!?) shows how deeply and broadly this narrative has taken hold.

Anyway, I won’t rehash all the ways I find this annoying, wrongheaded and in some ways dangerous; I’ll just sum up by saying I’m a hater. But like all haters, deep down, my feelings are fueled by jealousy. :) Jealousy because I actually deeply admire the fact that computational types have spent a lot of time thinking about codifying best practices, and have developed a culture and sense of community standards that embodies those practices. And while I do think that a lot of the moralistic grandstanding from computational folks around these issues is often self-serving, that doesn’t mean that talking about and encouraging computational/statistical reproducibility is a bad thing. Indeed, the fact that statisticians dominate the conversation is not their fault, it’s ours: why is there no experimental equivalent to the (statistical/computational) reproducibility movement?

So first off, the answer is that there is, with lists of validated antibodies and an increased awareness of things like cell line and mycoplasma contamination and so forth. That is all great, but in my experience, these things journals make you check are not typically the reasons for experimental irreproducibility. Fundamentally, these efforts suffer from what I consider a “checklist problem”, which is the idea that reproducibility can be codified into a simple, generic checklist of things. Like, the thought is that if I could just check off all the boxes on mycoplasma and cell identification and animal protocols, then my work would be certified as Reproducible™. This is not to say that we shouldn’t have more checklists (see below), but I just don’t think it’s going to solve the problem.

Okay, so if simplistic checklists aren’t the full solution, then what is? I think the crux of the issue actually comes back to a conversation we had with the venerable Warren Ewens a while back about how to analyze some data we were puzzling over, and he said something to the effect of “There are all these statistical tests we can think about, but it also has to pass the smell test.” This resonated with me, because I realize that at least some of us experimentalists DO teach reproducibility, but it’s more of an experiential learning to try and impart an intuitive sense of what discrepancies to ignore and which to lose sleep over. In particular in molecular biology, where our tools are imprecise and the systems are (hopelessly?) complex, this intuition is, in my opinion, the single most important skill we can teach our trainees.

Thing is, some do a much better job of teaching this intuition than others. I think that where we can learn from the computational/statistical reproducibility movement is to try and at least come up with some general principles and guidelines for enhancing the quality of our science, even if they can’t be easily codified. And within a particular lab, I think there are some general good practices, and maybe it’s time to have a more public discussion about them so that we can all learn from each other. So, with all that in mind, here’s our attempt to start a discussion with some ideas for experimental reproducibility, ranging from day-to-day to big picture:
  1. Keep an online lab notebook that is searchable with links to protocols and is easily shared with other lab members.
  2. Organize protocols in an online doc that allows for easy sharing and commenting. Avoid protocol "fragmentation"; if a variation comes up, spend the time to build that in as a branch point in the protocol. Otherwise, there will be protocol drift, and others may not know about new improvements.
  3. Annotate protocols carefully, explaining, where possible, which elements of the protocol are critical and why (and ideally have some documentation). This helps to avoid protocol cruft, where new steps get introduced and reified without reason. Often, leading a new trainee through a protocol is a good time to annotate, since it exposes all the unwritten parts of the protocol. Note: this is also a good way to explore protocol simplification!
  4. Catalog important lab-generated reagents (probes, plasmids, etc.) with unique identifiers and develop a system for labeling. In the lab, we have a system for labeling and cataloging probes, which helps us figure out post-facto what the difference is between "M20_probe_Cy3" and "M20_probe_Cy3_usethis". What is hard with this is to develop a system for labeling enforcement. Not sure how best to do this. My system is that I won't order any new probes for a person until all their probes are appropriately cataloged.
  5. Carefully track biologic reagents that are known to suffer from lot variability, including dates, lot numbers, etc. Things like matrigel, antibodies, R-spondin.
  6. Set up a system for documenting little experiments that establish a little factoid in the lab. Like "Oh, probe length of 30 works best for expansion microscopy based on XYZ…". These can be invaluable down the line, since they're rarely if ever published—and then turn from lab memory into lab lore.
  7. Journal length limits have led to a culture of very short and non-detailed methods, but there's this thing called the internet that apparently can store and share a lot of information. I think we need to establish a culture of publicly sharing detailed protocols, including annotating all the nuances and so forth. Check out this from Feng Zhang about CRISPR (we also have made an extensive single molecule RNA FISH page here).
  8. (Lauren) Track experiments in a log, along with all relevant (or even seemingly irrelevant) details. This could be, for instance, a big Google Doc with a list of all similar types of experiments, pointing to where the data is kept, and critically, all the little details. These tabulated forms of lab notebooks can really help identify patterns in those little details, but also serve to show other members of the lab which details matter and deserve their attention.
  9. Along those lines, record all your failures, along with the type of failure. We've definitely had times when we could have saved a lot of time in the lab if we had kept track of that. SHARE FAILURES with others in the lab, especially the PI.
  10. (Caroline) Establish an objective baseline for an experiment working, and stick to it. Sort of like pre-registering your experiment, in a way. If you take data, what will allow you to say that it worked or didn't work. If it didn't work, is there a rationalization? If so, discuss with someone, including the PI, to make sure you aren't deluding yourself and just ignoring data you don't like. There are often good reasons to drop bits of data, and sometimes we make mistakes in our judgement calls, but at least get a second opinion.
  11. Develop lab-specific checklists. Every lab has its own set of things it cares about and that people should check, like microscope light intensity or probe HPLC trace or whatever. Usually these are taught and learned through experience, but that strikes me as less efficient than it could be.
  12. Replicates: What constitutes a biological replicate? Is it the same batch of cells grown in two wells? Is it two separate passages of the same cell line? If so, separated by how much time? Or do you want to start each one fresh from a frozen vial? Whatever your system, it's important to come up with some ground rules for what a replicate means, and then stick to them. I feel like one aspect of replication is that you don't want the conditions to be necessarily exactly the same, so a little variability is good. After all, that's what separates a biological replicate (which is really about capturing systematic but unknown variability) from a technical replicate (which is about statistical variability).
  13. Have someone else take a look at your data without leading them too much with your hypothesis. Do they follow the same logic to reach the same conclusion? Many times, people fall so in love with their crazy hypothesis that they fail to see the simpler (and far more plausible) boring explanation instead. (Former postdoc Gautham Nair was so good at finding the simple boring explanation that we called it the "Gautham transform" in the lab!)
  14. Critically examine parts that don't fit in the story. No story is perfect, especially in molecular biology, which has a serious "everything affects everything" problem. Often times there is no explanation, and there's nothing you can really do about it. Okay, but resist the urge to sweep it under the rug. Sometimes there's new science in there!
  15. Finally, there is no substitute for just thinking long and hard about your work with a critical mindset. Everything else is just, like I said, a checklist, nothing more, nothing less.
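
For the computationally inclined, here is one way the cataloging idea in item 4 could look in code. This is just a minimal sketch, not what we actually run in the lab: the `ProbeCatalog` class, the `P0001`-style identifiers and the owner name "alex" are all made up for illustration, and a real version would presumably sit on top of a shared spreadsheet or database rather than live in memory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Probe:
    probe_id: str      # hypothetical unique identifier, e.g. "P0001"
    label: str         # whatever is written on the tube
    owner: str
    description: str
    created: date


class ProbeCatalog:
    """In-memory stand-in for a shared lab catalog (illustrative only)."""

    def __init__(self):
        self._by_id = {}

    def register(self, label, owner, description, created=None):
        # Assign the next sequential identifier and store the entry.
        probe_id = f"P{len(self._by_id) + 1:04d}"
        probe = Probe(probe_id, label, owner, description, created or date.today())
        self._by_id[probe_id] = probe
        return probe

    def uncataloged(self, owner, tube_labels):
        # Labels on this person's tubes that have no catalog entry yet.
        known = {p.label for p in self._by_id.values() if p.owner == owner}
        return sorted(set(tube_labels) - known)

    def may_order(self, owner, tube_labels):
        # The enforcement rule from item 4: no new probes until
        # everything this person already has is cataloged.
        return not self.uncataloged(owner, tube_labels)


catalog = ProbeCatalog()
catalog.register("M20_probe_Cy3", "alex", "original design")
tubes = ["M20_probe_Cy3", "M20_probe_Cy3_usethis"]
print(catalog.uncataloged("alex", tubes))
print(catalog.may_order("alex", tubes))
```

With only the first tube registered, the "usethis" tube shows up as uncataloged and the order check comes back negative, which is the whole point: the catalog, not anyone's memory, decides whether the labeling is up to date.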
Anyway, some thoughts, and I'm guessing most people already do a lot of this, implicitly or explicitly. We'd love to hear the probably huge list of other ideas people out there have for improving the quality/reproducibility of their science. Point is, let's have a public discussion so that everyone can participate!


  1. As a computing guy in and out of machine learning, and most likely earning the derision of biologists everywhere for nursery school level oversimplifications...... don't admire us computational types too much - far too often that's what we say, not what we do :-) That said, I agree wholeheartedly with what appears to be your goal, from a non-biologist's perspective

    1. Haha, yes, not that we do all of this stuff either! Agree that every step in the right direction is a good one, no matter how small.

  2. For point 1), what do y'all use for online notebook? Been doing google docs, but unhappy with loading / download-PDF times.

    1. Well, for a while, people were using Google Docs in the lab, but frankly, I think most of the lab has moved back to paper, sadly. I think Google Docs just ended up being a bit too cumbersome in the end… though I miss them dearly as a PI.

    2. :-( But paper is not computationally fungible :-(

  3. My 2c - Jupyter - cell oriented notebook in browser, hosts a wide variety of back ends - my bestiary includes python 2,3, R, SBCL, and some even odder back ends - go here --

  4. I'm curious how it handles images, I'm assuming that's trivial to load and print one, but can it store them standalone in the output? So that it can be distributed independent of the filesystem?

    How do you keep these stored, synced, and shared with people?

    1. Well, we're about to go way off topic on this thread - I can respond here or give you an email, Arjun Raj can make the call - in short, it's a browser based solution, it's excellent not only at pictures but also whizzy things like animated charts and even volume visualizations :-) In the interim, check the link I gave to Jupyter, it has a very active community

    2. You're right, I should just give it a go. Thanks for the pointer/plug though.

    3. Interesting, images for a notebook? Like gel images, etc.? We usually just stick those directly in a Google Doc. For large-scale microscopy images, we've just started using Dropbox Business with Smart Sync and have been pretty happy with it so far…

    4. You can do images, interactive 2d plots, even 3D volume visualization, you can interact with it, make it barf up a frozen pub ready paper whenever you want- the catch is you need to tell it what to do in code, but it's pretty easy to get a hold of the basics - here's an app somebody did for gels -- and here's a set of living demo notebooks, including a couple of biology focused ones (yeah, we geeks had a head start :-) ) - we can talk more via email if you want

  5. As another hater of the prevailing reproducibility discussion, I endorse all of your ideas for lab reproducibility. My first experience was in a yeast lab back before modern genomics - we had two computers in the lab, one to run Vector NTI, and another to write papers – and the good labs I knew were basically doing some version of your recommendations on paper, before the era of Google docs. Getting a cutting-edge experimental system up and running consistently is not trivial, and it requires serious effort & lab discipline to keep going. That also means it's not trivial for another lab that doesn't have the same system up and running to easily replicate a particular result.

    About intuition: One thing I find infuriating in overly broad discussions of reproducibility, dominated by psychologists and statisticians, is the lack of a historical perspective. Over the past 70 years, one of the most successful scientific fields - really one of the most successful in history - is molecular biology. And when you look back at how the field worked, people didn't rely much on statistical hypothesis testing, they didn't use large N in their experiments, and they did very little of what the Reproducibility crowd is arguing should be standard. And yet this has been one of the most historically successful fields. I have yet to hear anyone who claims that 'science' generally is facing a reproducibility crisis explain why molecular biology has been so successful without adopting the proposed remedies - or explain what has changed to make molecular biology supposedly less reproducible today.

    This is where I think your argument for intuition comes in. The pioneering molecular biologists had killer intuition, and they were relentlessly self-critical. That's how they made progress without adopting formulaic reproducibility standards.

    1. Agree 100% about the historical perspective. I've brought this up a few times on Twitter, and people say, well, the future is this statistical perspective, but I've seen little evidence to suggest that so far, even though we've had big data and stats in biology for a while now.

      When it comes down to doing good quality science, as you say, there's just no substitute for hard, careful, critical thinking. No set of checklists can codify that. The best they can do is help out a bit with logistics.