Tuesday, June 28, 2016

Reproducibility, reputation and playing the long game in science

Every so often these days, something will come up again about how most research findings are false. Lots of ink has already been spilled on the topic, so I won’t dwell on the reproducibility issue too long, but the whole thing has gotten me thinking more and more about the meaning and consequences of scientific reputation.

Why reputation? Reputation and reproducibility are somewhat related but clearly distinct concepts. In my field (I guess?) of molecular biology, I think that reputation and reproducibility are particularly strongly correlated because perceived reproducibility is heavily tied to the large number of judgement calls you have to make in the course of your research. As such, perhaps reputation has evolved as the best way to measure reproducibility in this area.

I think that this stands in stark contrast with the more common diagnosis one sees these days for the problem of irreproducibility, which is that it's all down to statistical innumeracy. Every so often, I’ll see tweets like this (names removed unless claimed by owner):

The implication here is that the problem with all this “cell” biology is that the Ns are so low as to render the results statistically meaningless. The implicit solution to the problem is then “Isn’t data cheap now? Just get more data! It’s all in the analysis, and all we need to do is make that reproducible!” Well, if you think that GitHub accounts, pre-registered studies and iPython notebooks will magically solve the reproducibility problem, think again. Better statistical and analysis-management practices are of course good, but the excessive focus on these solutions to me ignores the bigger point, which is that, especially in molecular and cellular biology, good judgement about your data and experiments trumps all. (I do find it worrying that statistics has somehow evolved to the point of absolving us of responsibility for the scientific inferences we make ("But look at the p-value!"). I think this statistical primacy is perhaps part of a bigger—and, in my opinion, ill-considered—attempt to systematize and industrialize scientific reasoning, but that’s another discussion.)

Here’s a good example from the (infamous?) study claiming to show that aspartame induces cancer. (I looked this over a while ago given my recently acquired Coke Zero habit. Don’t judge.) Here’s a table summarizing their results:

The authors claim that this shows an effect of increased lymphomas and leukemias in the female rats through the entire dose range of aspartame. And while I haven’t done the stats myself, looking at the numbers, the claim seems statistically valid. But the whole thing really hinges on the one control datapoint for the female rats, which is (seemingly strangely) low compared to virtually everything else. If that number was, say, 17% instead of 8%, I’m guessing essentially all the statistical significance would go away. Is this junk science? Well, I think so, and the FDA agrees. But I would fully agree that this is a judgement call, and in a vacuum would require further study—in particular, to me, it looks like there is some overall increase in cancers in these rats at very high doses, and while it is not statistically significant in their particular statistical treatment, my feeling is that there is something there, although probably just a non-specific effect arising from the crazy high doses they used.
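To make that judgement call concrete, here's a quick sketch with entirely made-up numbers (the group sizes and tumor counts below are hypothetical, not the study's actual data): a hand-rolled two-sided Fisher's exact test on a 2x2 tumor table, showing how nudging a single low control value from 8% to 17% can wipe out the significance.

```python
# Hypothetical sketch: how one low control value can carry all the
# statistical significance in a 2x2 comparison. Counts are invented.
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    n1, k, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x "successes" in row 1, margins fixed
        return comb(n1, x) * comb(n - n1, k - x) / comb(n, k)

    p_obs = prob(a)
    lo, hi = max(0, k - (n - n1)), min(k, n1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# 100 rats per group (made-up): 20 tumors in treated animals, versus...
p_low = fisher_exact_two_sided(8, 92, 20, 80)    # control at 8%
p_mid = fisher_exact_two_sided(17, 83, 20, 80)   # control at 17%

print(f"control at 8%:  p = {p_low:.3f}")   # comfortably below 0.05
print(f"control at 17%: p = {p_mid:.3f}")   # nowhere near significance
```

The point isn't the particular p-values; it's that the entire statistical conclusion can hinge on one suspicious datapoint, which is exactly the sort of thing the p-value alone won't flag.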

Hey, you might say, that’s not science! Discarding data points because they “seem off” and pulling out statistically weak “trends” for further analysis? Well, whatever, in my experience, that’s how a lot of real (and reproducible) science gets done.

Now, it would be perfectly reasonable of you to disagree with me. After all, in the absence of further data, my inklings are nothing more than an opinion. And in this case, at least we can argue about the data as it is presented. In most papers in molecular biology, you don’t even get to see the data from all experiments they didn’t report for whatever reason. The selective reporting of experiments sounds terrible, and is probably responsible for at least some amount of junky science, but here’s the thing: I think molecular biology would be uninterpretable without it. So many experiments fail or give weird results for so many different reasons, and reporting them all would leave an endless maze that would be impossible to navigate sensibly. (I think this is a consequence of studying complex systems with relatively imprecise—and largely uncalibrated—experimental tools.) Of course, such a system is ripe for abuse, because anyone can easily leave out a key control that doesn’t go their way under the guise of “the cells looked funny that day”, but then again, there are days where the cells really do look funny. So basically, in the end, you are stuck with trust: you have to trust that the person you’re listening to made the right decisions, that they checked all the boxes that you didn’t even know existed, and that they exhibited sound judgement. How do you know what work to follow up on? In a vacuum, hard to say, but that’s where reputation comes in. And when it comes to reputation, I think there’s value in playing the long game.

Reputation comes in a couple different forms. One is public reputation. This is the one you get from the talks you give and the papers you publish, and it can suffer from hype and sloppiness. People do still read papers and listen to talks (well, at least sometimes), and eventually they will notice if you cut corners and oversell your claims. Not much to say about this except that one way to get a good public reputation is to, well, do good science! Another important thing is to just be honest. Own up to the limitations of your work: it’s pretty easy to sniff out someone who’s being disingenuous (as the lawyerly answers from Elizabeth Holmes have shown), and I’ve found that people will actually respect you more if you just straight up say what you really think. Plus, it makes people think you’re smart if you show you’ve already thought about all the various problems.

Far more murky is the large gray zone of private reputation, which encompasses all the trust in the work that you don’t see publicly. This is going out to dinner with a colleague and hearing “Oh yeah, so-and-so is really solid”… or “That person did the same experiment 40 times in grad school to get that one result” or “Oh yeah, well, I don’t believe a single word out of that person’s mouth.” All of which I have heard, and don’t let me forget my personal favorite “Mr. Artifact bogus BS guy”. Are these just meaningless rumors? Sometimes, but mostly not. What has been surprising to me is how much signal there is in this reputational gossip relative to noise—when I hear about someone with a shady reputation, I will often hear very similar things independently from multiple sources.

I think this is (rightly) because most scientists know that spreading science gossip about people is generally something to be done with great care (if at all). Nevertheless, I think it serves a very important purpose, because there’s a lot of reputational information that is just hard to share publicly. There are many reasons for this: the burden of proof for calling someone out publicly is very high, the potential for negative fallout is large, and you can easily develop your own now-very-public reputation for being a bitter, combative pain in the ass. A world in which all scientists called each other out publicly on everything would probably be non-functional.

Of course, this must all be balanced against the very significant negatives to scientific gossip. It is entirely possible that someone could be unfairly smeared in this way, although honestly, I’m not sure how many instances of this I’ve really seen. (I do know of one case in which one scientist supposedly started a whisper campaign against another scientist about their normalization method or something suitably petty, although I have to say the concerns seemed valid to me.)

So how much gossip should we spread? For me, that completely depends on the context. With close friends, well, that’s part of the fun! :) With other folks, I’m of course far more restrained, and I try to stick to what I know firsthand, although it’s impossible to give a straight up rule given the number of factors to weigh. Are they asking for an evaluation of a potential collaborator? Are we discussing a result that they are planning to follow up on in the lab, thus potentially harming a trainee? Will they even care what I say either way? An interesting special case is trainees in the lab. I think they actually stand to benefit greatly from this informal reputational chatter. Not only do they learn who to avoid, but even just knowing the fact that not everyone in science can be trusted is a valuable lesson.

Which leads to another important problem with private reputations: if they are private, what about all the other people who could benefit from that knowledge but don’t have access to it? This failure can manifest in a variety of ways. If you have less access to the scientific establishment (you’re in a smaller or poorer country, say), you basically just have to take the literature at face value. The same can be true even within the scientific establishment; for example, in interdisciplinary work, you’ll often have one community that doesn’t know the gossip of another (there are lots of examples where I’ll meet someone who talks about a whole bogus subfield without realizing it’s bogus). And sometimes you just don’t get wind in time. The damage in terms of time wasted is real. I remember a time when our group was following up on a cool-seeming result that ended up being bogus as far as we could tell; I met a colleague at a conference, told her about it, and she said they had seen the same thing. Now two people know, and perhaps the handful of other people that I’ve mentioned this to. That doesn’t seem right.

At this point, I often wonder about a related issue: do these private reputations even matter? I know plenty of scientists with widely-acknowledged bad reputations who are very successful. Why doesn’t it stick? Part of it is that our review systems for papers and grants just don’t accommodate this sort of information. How do you give a rational-sounding review that says “I just don’t believe this”? Some people do give those sorts of reviews, but they come across as, again, bitter and combative, so most don’t. Not sure what to do about this problem. In the specific case of publishing papers, I often wonder why journal editors don’t get wind of these issues. Perhaps they’re just in the wrong circles? Or maybe there are unspoken union rules about ratting people out to editors? Or maybe it’s just really hard not to send a paper to review if it looks strong on the face of it, and at that point, it’s really hard for reviewers to do anything about it. Perhaps preprints and more public discussion could help with this? Of course, then people would actually have to read each other’s papers…

That said, while the downsides of a bad private reputation may not materialize as often as we feel they should, the good news is that I think the benefits to a good private reputation can be great. If people think you do good, solid work, I think that people will support you even if you’re not always publishing flashy papers and so forth. It’s a legitimate path to success in science, and don’t let the doom and gloomers and quit-lit types tell you otherwise. How to develop and maintain a good private reputation? Well, I think it’s largely the same as maintaining a good public one: do good science and don’t be a jerk. The main difference is that you have to do these things ALL THE TIME. There is no break. Your trainees and mentors will talk. Your colleagues will talk. It’s what you do on a daily basis that will ensure that they all have good things to say about you.

(Side point… I often hear that “Well, in industry, we are held to a different standard, we need things to actually work, unlike in academia.” Maybe. Another blog post on this soon, but I’m not convinced industry is any better than academia in this regard.)

Anyway, in the end, I think that molecular biology is the sort of field in which scientific reputation will remain an integral part of how we assess our science, for better or for worse. Perhaps we should develop a more public culture of calling people out like in physics, but I’m not sure that would necessarily work very well, and I think the hostile nature of discourse in that field contributes to a lack of diversity. Perhaps the ultimate analysis of whether to spread gossip or do something gossip-worthy is just based on what it takes for you to get a good night’s sleep.


  1. Arjun,

    Great points! I agree that reputation matters and should matter, perhaps more. However, I think that in some cases reputations built just based on private comments should matter much less, because I do know people who try to smear their colleagues/competitors. Maybe that backfires. Maybe it does not.

    We make comments privately either because (i) we want to avoid retaliation or because (ii) the evidence is weak/unconvincing. I find (i) much more valid than (ii) and think that (ii) should be treated with much scepticism. A well known fallacy of human thinking is that we can interpret multiple instances of very weak evidence as being cumulatively much, much stronger than it actually is. Not to mention that multiple accounts are not necessarily independent and uncorrelated.

  2. My guess is that private reputation doesn't work very well in a huge field like molecular biology.

  3. Ferdinando Pucci, July 2, 2016 at 6:50 PM

    This is ridiculous. You're basically saying that subjective evaluation and mafia-like nepotism are good things to be praised. This is the most absurd argument I've read on the subject.

    Reputation will not and should not substitute for full disclosure of experimental results. It's that simple. And journal editors are fully guilty for that.

    1. I get where you're coming from. First off, I'm not making a moral judgement on whether it's good or bad to do things via private reputation. My point is merely to show why it has become particularly important in molecular biology. I once met a chemist who, on coming to cell biology, was amazed that people would tell him "oh, so and so's work is all junk". Before that, he had never really questioned whether someone had drawn solid conclusions. As I say, I think this is because molecular biology involves the study of a very complex, highly self-interacting system with imprecise tools, leading to a lot of potential artifacts.

      Also, yes, I agree about full disclosure of experimental results in theory. In practice, this is very difficult. Do you report every single piece of data you ever took? What about the brand of the Eppendorf tubes you use? Or the specific lot numbers of the FBS? All of these things can and do matter. We don't report them because we trust that people have considered those things. Blaming journal editors for not requiring this is a cop-out. In this day and age, it's pretty easy to post to the internet as much experimental detail as you would like for your own work. Not to personalize this too much, but have you done so for your papers?

  4. Hi Arjun, very interesting post. I'd like to weigh in from the Editor's perspective. I think there are at least two reasons "junk" work gets past editors.

    First, converting the gut reaction "I just don't believe this BS" into a cogent scientific argument is time consuming and deeply, deceptively difficult. A favorite moment of mine in graduate school: I was walking out of a Big Name seminar behind Jonathan Weissman, and I heard him say, "Wow, that was so wrong-headed as to be virtually unassailable." It was a career-altering "A-ha!" moment for me. This quote hits on a key truth: if you deliver a story well, the further it is from ground truth or best practices, the harder it is to attack. Sometimes that's because the task is just so overwhelming, sometimes it's because it can be epically hard to identify the core premises that sit under a (faulty but internally consistent) argument, sometimes studies are constructed in a way that's appealing-yet-bizarre and gold-standard ways to interrogate data are not quite relevant. Dealing with any of these difficulties takes a lot of time, effort, and insight, and those are understandably in short supply, especially among reviewers. As a result, lots of things essentially get a pass. I deeply wish that wasn't true, and I'm infinitely grateful to the reviewers who do take that sort of time. It kills me when I'm handling a paper and my gut tells me it needs that sort of attention, but I can't find a reviewer who will supply it.

    Second, there's a messy reality: many people suggest their friends as reviewers. At our journal, it's standard practice to send a manuscript to one of the reviewers the authors suggest because that's a respectful thing to do. We do look for conflicts of interest, but many of these conflicts are (in your parlance) private: there's no PubMed or Google trail. Groups of people have each other's backs, and as far as I can tell, there aren't community-accepted standards for what constitutes a conflict of interest. This is probably trickier than it seems, because scientists who have intellectual affinities for each other's work tend to become friendly, and drawing a strict line between colleague (no conflict of interest) and friend (conflict of interest) is hard. A downstream consequence of this: it's probably true that while, among your group, Prof. X's work is BS, Prof. X probably has friends who think his/her work is forward thinking, or game changing, or etc. etc. The challenge for the editor is to get a paper out of its clique and in front of objective eyes. I can do that in my home field (at least now, while my knowledge of the sociology is reasonably fresh) but I handle papers that are far outside of my field, too. In all honesty, it's a recipe for lower standards when I'm working outside of my field and an author abuses the system and suggests friends as reviewers. We always choose reviewers on our own as well (to guard against this), but it's hard to know whether our choices really are objective if nobody is talking to us.

    Finally, I'd like to echo what you said about Eppy tubes/FBS lot numbers etc. Scientific projects are turtles-all-the-way-down re: judgement calls about which details are important. In a very fundamental way, journals have to respect the authors' position as The Experts and trust them to report important details. I simply don't have the expertise to draw that line in an informed way for all the papers I handle, nor do I think that that line is static or universal across fields. The current system definitely has problems, but it does put the ball in the author's court out of respect for your expertise.

    1. Very interesting points, thanks Quincey! I definitely can only imagine how frustrating this is to deal with at the editorial level--it's just really hard to assess all that work, and then afterward you're probably getting endless comments from scientists like "How could you publish that junk?". I've adopted a "Well, junk happens" stance, and I think that's the only reasonable thing to do. If we adopt a very stringent "quality" filter, I think we'll stamp out what little really new and exciting work is out there, which is already disadvantaged for a variety of reasons.

    2. Arjun, I completely agree with you. The challenge is to protect the folks who are earnestly trying to think in new ways (which may strike some as BS because, just like kids eating new foods, it can take repeated exposure to develop a taste for new ideas/approaches) vs. people who distract from core illegitimacy with the scientific equivalent of jazz hands and snazzy production.

      Finally, just a note to Yoseph below-- I agree that traditional peer review has its limitations and that it's healthy to question its legitimacy, but I don't think technology (in this regard) is the answer. There will always be a natural human tendency to protect one's own. On the other hand, I do think that technology may help re: reputation. If, as a community, we were to require the deposition of raw data and very clear accounting of how all of our data sausage gets made (e.g. linking versioned, time-stamped electronic notebooks to the raw data published in papers), some of this private knowledge may well enter the public sphere.

  5. Hi Arjun, thank you for a terrific post. I enjoyed reading it and the comments as well. Special thanks to Quincey who gave a great "behind the scenes" view as an editor struggling with this. This is something I have thought about a lot. I recall my reaction after reading my first Biology paper when I started my PhD in ML/CompBio. Coming from CS and Physics I came back to my advisor, Nir Friedman, horrified: "What is this language? 'This implies...', 'we conclude...', 'strongly suggests....' - I want facts! proofs!" Nir just smiled and said: "I probably should not have started you on this project with this paper, maybe try this one....". I later learned this is part of how biology papers are written. Sometimes to cover the author's a**, but also because many times it's really hard to completely prove something or rule out all the alternatives, yet it is still very "publishable". This comes back to the complexity of the systems you describe, the limits of our current tools, and why we tend to rely on the reputation of a PI for deciding what we can actually trust. And so it is possible that in such settings we need a fuzzy system in place for assessing reputation, and it might vary between valuable information and harmful gossip. All we can do is strive to promote solid, reproducible Science as researchers (or editors!). Coming back to Quincey's comment, I wonder if some things will change if the way we evaluate/comment on/score will change. In a way, we are using a very archaic system of usually 2-3 anonymous reviewers of limited capacity/knowledge (and biases!) to gate/evaluate research/papers, and I wonder how the new technologies we have at hand may help improve things (obviously to improve reproducibility as well, a topic you touched upon in other posts)..... Anyway, thanks again for a great thought provoking post!

    1. Great point about technology and review. Cell Systems has a new "Tool" format in which reviewers actually test the software out: http://www.cell.com/cell-systems/pdf/S2405-4712(16)30228-9.pdf

  6. This sounds like a horrible idea, judging someone by the gossip of your friends? In any lab I've worked in, there's been a person who hates me, a person who loves me, and 3 who are neutral. So you're going to take the 20% chance that I'm a saint and the 20% chance that I'm a hack, and judge me accordingly? That's a terrible idea.

    We like to think that as scientists we are above petty feelings, but you seem to have forgotten that humans are still human, and personal, non-scientific feelings can color a judgement of a person's work, and you'd never know it because you decided to listen to them.
