Friday, January 23, 2015

Some thoughts on Tomasetti and Vogelstein (and post-publication review)

Interesting paper from Tomasetti and Vogelstein entitled “Variation in cancer risk among tissues can be explained by the number of stem cell divisions” (screw the paywall). This paper has generated a lot of controversy on Twitter and blogs, which is in many ways a preview of what a post-publication review environment might look like. I worry that it’s been largely negative, so here are my (admittedly relatively uninformed) thoughts.

Here is the abstract:
Some tissue types give rise to human cancers millions of times more often than other tissue types. Although this has been recognized for more than a century, it has never been explained. Here, we show that the lifetime risk of cancers of many different types is strongly correlated (0.81) with the total number of divisions of the normal self-renewing cells maintaining that tissue’s homeostasis. These results suggest that only a third of the variation in cancer risk among tissues is attributable to environmental factors or inherited predispositions. The majority is due to “bad luck,” that is, random mutations arising during DNA replication in normal, noncancerous stem cells. This is important not only for understanding the disease but also for designing strategies to limit the mortality it causes.
Basically, the idea is that part of the reason that some tissues are more prone to cancer is that they have a lot of stem cell divisions–an idea supported by the data they present. I think this is a really important point, in particular because in some ways it establishes what I consider an important null: when considering cancer incidence, it seems reasonable to expect that the more proliferative tissues will be more prone to cancer just because of the increased number of cell divisions. Darryl Shibata (USC) has a series of really nice papers on this point, focusing on colorectal cancer. In particular, in this paper, he points out that such models would predict that taller (i.e., bigger) people would have more stem cells and thus should have a higher incidence of cancer. And that’s actually what they find! I saw Shibata give an (excellent) talk on this at a Physics of Cancer workshop, and afterwards, a cancer biologist criticised this height result, incredulously saying “Well, but there are so many other factors associated with being tall!” Fair enough. But I think that Darryl’s is an economical model that explains the data, and would be what I would consider an important null that deviations should be measured against. I think this is a nice point that Tomasetti and Vogelstein make as well.

What are the consequences of such a null? Tomasetti and Vogelstein frame their discussion around stochastic, environmental and genetic influences on cancer incidence between tissues. Emphasis on between tissues. What exactly does this mean? Well, what they are saying is that if you compare lung cancer rates in smokers vs. non-smokers (environmental effect), then the rate of getting cancer is around 10-20 times higher in smokers, but your chances of getting lung cancer even as a non-smoker are still much higher than your chances of getting, say, head osteosarcoma, and a plausible reason for this is that there are way more stem cell divisions in lung than in the bones in your head. Similarly, colorectal cancer incidence rates are much higher in people with a genetic predisposition (APC mutation), but again, even without the genetic predisposition, the incidence is still many orders of magnitude higher than in other tissues with much lower rates of stem cell divisions. I think this is pretty interesting! Of course, as with Shibata’s height association, the association with stem cell divisions is not proof that the stem cell divisions are per se the cause of this association, but one of the nice things about Shibata’s work is that he shows that a model of stem cell divisions and number of genetic “hits” required for a particular cancer can match the actual cancer incidence data. So I think this is a plausible null model for a baseline of how much certain tissues will get cancer. Incidentally, this made me realize a perhaps obvious point on the genetic determinants of cancer: if you find an association of a gene with cancer incidence, then it may be that the association is because the gene is associated with, e.g., height, in which case, yes, there is technically a genetic underpinning for that variation, but it is hard to imagine designing any sort of drug based on this finding. Tomasetti and Vogelstein make this point in their paper.
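To make the "divisions plus hits" idea concrete, here is a toy sketch of that kind of null model. To be clear, this is my own simplified illustration and not the model from Shibata's papers or from Tomasetti and Vogelstein: the function name, the independence assumptions, and every number below are made up for the example.

```python
import math

def lifetime_risk(n_stem_cells, divisions, hits_required, mutation_rate):
    """Toy probability that at least one stem cell lineage collects all hits.

    Assumptions (illustrative only): each of `hits_required` driver genes is
    hit independently with probability `mutation_rate` per division, and the
    `n_stem_cells` lineages are independent of one another.
    """
    # P(a given driver gene is hit at least once over all divisions)
    p_gene = -math.expm1(divisions * math.log1p(-mutation_rate))
    # P(a single lineage has accumulated every required hit)
    p_lineage = p_gene ** hits_required
    if p_lineage >= 1.0:
        return 1.0
    # P(at least one lineage in the tissue has every required hit)
    return -math.expm1(n_stem_cells * math.log1p(-p_lineage))

# Hypothetical tissues that differ only in the number of divisions.
low_turnover = lifetime_risk(1e6, 1000, 3, 1e-6)
high_turnover = lifetime_risk(1e6, 2000, 3, 1e-6)
# Hypothetical "taller person": same tissue, 10% more stem cells.
taller = lifetime_risk(1.1e6, 1000, 3, 1e-6)

print(low_turnover, high_turnover, taller)
```

Even in this crude version, doubling the number of divisions raises the risk much more than adding 10% more stem cells does, because the number of divisions enters through every one of the required hits.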

The authors then go on to further analyze their data and separate cancers into ones in which the variance in incidence is dominated by “stochastic” effects vs. “deterministic” effects. I can’t say I’ve gone into the details of this analysis, but it seems interesting–and a natural question to ask with these data. Here are a few thoughts on the ideas this analysis explores. One question that has come up a lot is why is this correlation not so strong, especially on a linear scale? I think that one issue is that the division into stochastic, environmental and genetic is missing a big component, which is the tissue, cell and molecular biology of cancer. Some tissues may require more genetic “hits” than others, or a long series of epigenetic effects, or have structures that enable rapid removal of defective stem cells, and so even tissues with the same number of divisions, in the absence of any genetic or environmental factors, will have different rates of cancer. Another issue is that these data are imperfect, and so you will get some spread no matter what. Still, I think the association is real and interesting.

Anyway, I think this “null model” is pretty cool. I wonder if one of the reasons that we focus so much on environmental and genetic effects is that we can do “experiments” on them, whereas the causal links in the stem cell division hypothesis are hard to prove.

There was a very interesting critique from Yaniv Erlich that said that the authors’ analysis implicitly assumes that there is no interaction between the number of stem cell divisions and genetic and environmental factors. A good point, although I do think that Tomasetti and Vogelstein have thought about this–as I mentioned, they say explicitly:
The total number of stem cells in an organ and their proliferation rate may of course be influenced by genetic and environmental factors such as those that affect height or weight.
Their example about the mouse vs. human incidence of colon vs. small intestine cancer in the case of the APC mutation is, I think, a nice piece of evidence suggesting that the number of divisions is a very important factor in determining cancer incidence. Although again, there are many alternative explanations here.

I think some of the confusion out there about this paper can be summed up as follows:
“You are a smoker and I am not, so I have a lower rate of getting lung cancer.”
“Yeah, but you still have a much higher rate of getting lung cancer than bone cancer.”
“Uhh… okay… sure… don’t think I’m gonna take up smoking anytime soon, though.”
It’s just a weird comparison to make. That said, I don’t think the authors really make this comparison anywhere in their manuscript. What I think they are saying at the end is that for cancers that have strong determinants due to environmental factors, lifestyle changes and other such interventions could be useful (like quitting smoking), whereas for other cancers that arise more randomly, we should just focus on detection. I have to admit that perhaps I’m missing something, but this seems like a point one could make even without this analysis.

There has been a lot of discussion out there about how weak the correlation is and whether it’s appropriate to use log-log or linear scales and so forth. I think the basic point they are trying to make is that more highly proliferative tissues are more prone to cancer. I think the data they present are consistent with this conclusion. Whether the specific amount of variance they quote in the abstract is right or not is an important technical matter that I think other people are already talking about a lot, but I think the fundamental conclusion is sound.
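For anyone who wants to poke at the scale question themselves, here is a minimal sketch. The first part is just arithmetic: squaring the 0.81 correlation from the abstract gives about 0.66, which is presumably where the "only a third" left over for other factors comes from. The second part uses entirely synthetic data (the slope, noise level and number of "tissues" are invented, and are not the paper's data) to show that the same points can give quite different Pearson correlations on log-log and linear scales.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Fraction of variance implied by the correlation quoted in the abstract.
r_abstract = 0.81
variance_explained = r_abstract ** 2  # roughly two thirds

# Synthetic "tissues": divisions span several orders of magnitude and risk
# tracks divisions with multiplicative noise. All values are made up.
rng = random.Random(0)
log_divisions = [rng.uniform(6, 13) for _ in range(30)]
log_risk = [0.5 * ld - 8 + rng.gauss(0, 0.7) for ld in log_divisions]
divisions = [10 ** v for v in log_divisions]
risk = [10 ** v for v in log_risk]

r_loglog = pearson(log_divisions, log_risk)
r_linear = pearson(divisions, risk)

print(f"variance explained at r = 0.81: {variance_explained:.2f}")
print(f"log-log r: {r_loglog:.2f}, linear r: {r_linear:.2f}")
```

On a linear scale the few largest values dominate the calculation, which is one reason the choice of scale can change the answer so much when the data span many orders of magnitude.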

A note about the reaction to this paper: in principle, I like the concept of moving from pre-publication anonymous peer review to a post-publication peer review world. I think that pre-publication anonymous peer review is slow, arbitrary, and (most importantly) demoralizing, especially for trainees. That said, now that I’ve seen a bit of post-publication peer review happen online, I think the sad thing I must report is that in many cases, the culture seems to be one of the hardcore takedown, often in a rather accusatorial tone. And I thought it was hard to get a positive review from a journal! Here are some nice thoughts from Kamoun, who recently responded (admirably) to an issue raised on Pubpeer.

My view is that in any paper with real-world data, there will be points that are solid and points that are weak. In post-publication peer review, we run the risk of reducing a paper to a negative soundbite that propagates very fast, and thus throwing out the baby with the bathwater, not to mention putting the author (often a trainee) under very intense public scrutiny that they might not be equipped to handle. I think we should be very careful in how we approach post-publication review because of its viral nature online. Anyway, those are my two cents.

PS: Apropos of discussions of log-log correlations vs. linear correlations, we have a fairly extensive comparison of RNA-seq data to RNA FISH data. More very soon.

Friday, January 16, 2015

Gordon Conference turns graduate student into crazy reptile lady

Just got back from a cool Gordon conference on Stochastic Physics in Biology with a couple students in the lab. Lots of interesting science, and lots of cool people to talk with as well!

The food was overall really good, but one day, we decided to go get some Mexican food from a local taco shack. Delicious! On the way back, we noticed a little store on the side of the road called "Exotic Emporium". When we went inside, what did we find but a reptile pet store. Olivia fell in love with those little critters, and here's the evidence:

"Hello, strange lizard":

"That is a large snake!"

"I think I like snakes."

"Okay, put the snake around your neck then." "Umm, okay..."

"The colors! The colors!"

"Can I keep it?"

Saturday, December 27, 2014

Three observations about anonymity in peer review

I made a vow to myself to not blog about peer review ever again. Oh well. Anyway, I have been thinking about a few things related to anonymity in the review process that I don’t think I’ve heard discussed elsewhere:
  1. Everyone I talk to who has published there has raved about eLife. Like, literally everyone–in fact, they have all said it was one of their best publication experiences, with a swift, fair, and responsive review process. I was wondering what it was in particular that made the review process so much less painful. Then somebody told me something that made a ton of sense (I forget who, but thanks, Dr. Insight, wherever you are!). The referees confer to reach a joint verdict on the paper. In theory, this is to build a scientific consensus to harmonize the feedback. In practice, Dr. Insight pointed out that the main benefit is that it’s a lot harder to give those crazy jackass reviews we all get because you will be discussing it with your fellow reviewers, who are presumably peers in some way or another. You don’t want to look like a complete tool or someone with an axe to grind in front of your peers. And so I think this process yields many of the benefits of non-anonymous peer review while still being anonymous (to the author). Well played, eLife!
  2. One reimagining of the publishing system that I definitely favor is one in which every paper gets published in a journal that only publishes based on technical veracity, like PLOS ONE. Then the function of the “selective journal” is just to publish a “Best of…” list of the papers they like the best. I think that a lot of people like this idea, one which decouples assessments of whether the paper is technically correct from assessments of “impact”. In theory, sounds good. One issue, though, is that it ignores the hierarchy on the reviewer side of the fence. Editors definitely do not just randomly select reviewers, nor select them just based on field-specific knowledge. And not every journal gets the same group of reviewers–you better believe that people who are too busy to review for Annals of the Romanian Plant Society B will somehow magically find time in their schedule to review for Science. Perhaps what might happen is that this new version of “Editor” (i.e., literature curator) might commission further post-publication reviews from a trusted critic before putting this paper on their list. Anyway, it’s something to work out.
  3. I recently started signing all my reviews (not sure if they ever made it to the authors, but I can at least say I tried). I think this makes sense for a number of reasons, most of which have been covered elsewhere. As I had noted here, though, there is “Another important factor that gets discussed less often, which is that in the current system, editors have more information than you as an author do. Sometimes you’ll get 2/3 good reviews and it’s fine. Sometimes not. Whether the editor is willing to override the reviewer can often depend on relative stature more than the content of the review–after all, the editor is playing the game as well, and probably doesn’t want to override Prof. PowerPlayer who gave the negative review. This definitely happens. The editor can have an agenda behind who they send reviews to and who they listen to. So no matter how much blinding is possible (even double blind doesn’t really seem plausible), as long as we have editors choosing reviewers and deciding who to listen to, there will be information asymmetry. Far better, in my mind, to have reviewer identities open–puts a bit of the spotlight on editors, also.” Another interesting point: as you work your way down the ladder, if you get a signed negative review, you will know who to exclude next time around. Not sure of all the implications of that.
Anyway, that’s it–hopefully will never blog about peer review again until we are all downloading PDFs from BioRxiv directly to our Google self-driving cars.

Friday, December 26, 2014

Posting comments on papers

For many years, people have wondered why most online forums for comments result in hundreds of comments, but even the most exciting scientific results lead to the sound of crickets chirping. Lots of theories as to why, like fear of scientific reprisal or fear of saying something stupid or lack of anonymity.

Perhaps. But I wonder if part of it is just that it feels… incongruous to post comments on scientific papers. To date, I have posted exactly two comments on papers. My first owed its genesis (I think) to the fact that I had just read something about how nobody comments on papers, and so I was determined to post a comment on something. And it was a nice paper on something I found interesting and so I wanted to say something. I just now wrote my second comment. It was on this AWESOME paper (hat tip to Sri Kosuri) comparing efficiency of document preparation using Word vs. LaTeX (verdict: LaTeX loses, little surprise to me). Definitely something I found interesting, and so I somehow felt the urge to comment.

And then, as I started writing my comment, something just felt… wrong. Firstly, the process was annoying. I had to log in to my PLOS account, which I of course forgot all the details of. Then, as I was leaving my comment, I noticed a radio button at the bottom to say whether I had a competing interest. The whole process was starting to feel a whole lot more official than I had anticipated. Suddenly, the relatively breezy and light-hearted nature of my comment felt very out of place. It’s just very hard to escape the feeling that any commentary on a scientific paper must be couched in the stultifying language and framework of the typical peer review, which is just so different from the far more informal commentary that you get on, for instance, blog posts. And heaven forbid if you actually posted a joke or something like that.

I feel like part of the reason nobody comments is that publishing a paper seems like a Very Serious Business™, and so any writing or commentary associated with it seems like it should be just as serious. Well, I agree that publishing a paper is a very tedious business, but I think making scientific discourse a bit more lighthearted would be a good thing overall. And who knows, one side-effect could be that maybe someone might actually read the paper for a change!

Tuesday, December 23, 2014

Fortune cookies and peer review

Ever play that game where you take the fortune from a fortune cookie and then add “in bed” to the end of it for a funny reinterpretation? I’ve found it works pretty well if you just replace “in bed” with “in peer review”. Behold (from some recent fortune cookies I got):

Look for the dream that keeps coming back. It is your destiny in peer review.

Wiseness makes for oneself an island which no flood can overwhelm in peer review.

Ignorance never settles a question in peer review.

In the near future, you will discover how fortunate you are in peer review.

Every adversity carries with it the seed of an equal or greater benefit in peer review.

You will find luck when you go home in peer review.

Also reminds me of the weirdest fortune I ever got: “Alas! The onion you are eating is someone else’s water lily.” Not sure exactly what that means, in peer review or otherwise…

Saturday, December 20, 2014

Time-saving tip–make a FAQ for almost anything

One of the fundamental tenets of programming is DRY: Don’t Repeat Yourself. If you find yourself writing the same thing multiple times, you’re creating a problem in that you have to maintain consistency if you ever make a change, and you’ve had to write it twice.
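For the non-programmers, a minimal sketch of what DRY looks like in practice. The function names and the label format here are hypothetical, invented just for this illustration.

```python
# Repeated version: the same formatting logic is written out twice, so any
# change to the label format has to be made (and kept in sync) in two places.
def label_for_gene(name, count):
    return name.upper() + ": " + str(count) + " spots"

def label_for_probe(name, count):
    return name.upper() + ": " + str(count) + " spots"

# DRY version: one function holds the format, and everything else calls it.
def make_label(name, count):
    return f"{name.upper()}: {count} spots"

print(make_label("gapdh", 3))
```

If the format ever needs to change, the DRY version means editing one line instead of hunting down every copy.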

In thinking about what I have to do in my daily life, a lot of it also involves repetitive tasks. The most onerous of these are requests for information that require somewhat lengthy e-mails or what have you. Yet, many times, I end up answering the same questions over and over. Which brings up a solution: refer to a publicly available FAQ.

I first did this for RNA FISH because I was always getting similar questions about protocols and equipment, etc. So I made this website, which I think has been useful both for researchers looking for answers and for me in terms of saving me time writing out these answers for every person I meet.

I also recently saw a nice FAQ (can’t find the link, darn!) where someone had put together a letter of recommendation FAQ. As in, if you want a letter of recommendation from this person, here’s a list of details to provide and a list of criteria to determine whether they would be able to write a good one for you.

Another senior professor I met recently said that she got sick of getting papers from her trainees that were filled with various errors. So she set up a list of criteria and told everyone that she wouldn’t look at anything that didn’t pass that bar. Strikingly, she said that the trainees actually loved it–it made a nice checklist for them and they knew exactly what was expected of them.

I think all of these are great, and I think I might make up such documents myself. I’m also thinking of instituting an internal FAQ for our data management in the lab. Any other ideas?

Sunday, December 14, 2014

Origin and impact of stories in life sciences research: is it all Cell’s fault?

I found this article by Solomon Snyder to be informative:

Quick summary: Benjamin Lewin realized in the 80s that the tools of molecular biology had matured to the point where one could answer a question “soup to nuts”. So his goal was to start a journal that would publish such “stories” that aimed to provide a definitive resolution to a particular problem. That journal was Cell, and, well, the rest is history–Cell is the premier journal in the field of molecular and cellular biology, and is home to many seminal studies. Snyder then says that Nature and Science and the other journals quickly picked up on this same ideal, with the result that we now have a pervasive desire to “tell a story” in biomedical research papers.

I was talking with Olivia about this, and we agreed that this is pretty bad for science. Many issues, the most obvious of which is that it encourages selective omission of data and places undue emphasis on “packaging” of results. Here are some thoughts from before that I had on storytelling.

I also wonder if the era of the scientific story is drawing to a close in molecular biology. The 80s were dominated by the “gene jock”: phenotype, clone, biochemistry, story, Cell paper. I feel like we are now coming up on the scientific limitations of that approach. Molecular biology has in many ways matured in the sense that we understand many of the basic mechanisms underlying cellular function, like how DNA gets replicated and repaired, how cells move their chromosomes, and elements of transcription, but we still have a very limited understanding of how all this fits together for overall cellular function. Maybe these problems are too big for a single Cell paper to contain the “story”–in fact, maybe it’s too big to be just a single story. Maybe we’re in the era of the molecular biology book.

As an example, take cancer biology. It seems like big papers often run from characterizing a gene to curing mice to looking for evidence for the putative mechanism in patient samples. Yet, I think it is fair to say that we have not made much progress overall in using molecular biology to cure cancer in humans. What then is the point of those epic papers crammed full of an incredible range of experiments? Perhaps it would be better to have smaller, more exploratory papers that nibble away at some much larger problems in the field.

In physics, it seems like theorists play a role in defining the big questions that many people then go about trying to answer. I wonder if an approach like this might have some place in modern molecular biology. What if we had people define a few big problems and really think about them, and then we all tried to attack different parts of them experimentally based on that hard thinking? Maybe we’re not quite there yet, but I wouldn’t be surprised if this happened in the next 10-20 years.

(Note: this is most certainly not an endorsement for ENCODE-style “big science”. Those are essentially large-scale stamp collecting expeditions whose value is wholly different. I’m talking about developing a theory like quantum mechanics and then trying to prove it, which is a very different thing–and something largely missing from molecular biology today. Of course, whether such theories even exist in molecular biology is a valid question…)