Friday, January 23, 2015

Some thoughts on Tomasetti and Vogelstein (and post-publication review)

Interesting paper from Tomasetti and Vogelstein entitled “Variation in cancer risk among tissues can be explained by the number of stem cell divisions” (screw the paywall). This paper has generated a lot of controversy on Twitter and blogs, which is in many ways a preview of what a post-publication review environment might look like. I worry that it’s been largely negative, so here are my (admittedly relatively uninformed) thoughts.

Here is the abstract:
Some tissue types give rise to human cancers millions of times more often than other tissue types. Although this has been recognized for more than a century, it has never been explained. Here, we show that the lifetime risk of cancers of many different types is strongly correlated (0.81) with the total number of divisions of the normal self-renewing cells maintaining that tissue’s homeostasis. These results suggest that only a third of the variation in cancer risk among tissues is attributable to environmental factors or inherited predispositions. The majority is due to “bad luck,” that is, random mutations arising during DNA replication in normal, noncancerous stem cells. This is important not only for understanding the disease but also for designing strategies to limit the mortality it causes.
Basically, the idea is that part of the reason some tissues are more prone to cancer is that they have a lot of stem cell divisions–an idea supported by the data they present. I think this is a really important point, in particular because it establishes what I consider an important null: when considering cancer incidence, it seems reasonable to expect that more proliferative tissues will be more prone to cancer just because of the increased number of cell divisions. Darryl Shibata (USC) has a series of really nice papers on this point, focusing on colorectal cancer. In particular, in this paper, he points out that such models would predict that taller (i.e., bigger) people would have more stem cells and thus should have a higher incidence of cancer. And that’s actually what they find! I saw Shibata give an (excellent) talk on this at a Physics of Cancer workshop, and afterwards, a cancer biologist criticised this height result, incredulously saying “Well, but there are so many other factors associated with being tall!” Fair enough. But I think that Darryl’s is an economical model that explains the data, and it is what I would consider an important null against which deviations should be measured. I think this is a nice point that Tomasetti and Vogelstein make as well.

What are the consequences of such a null? Tomasetti and Vogelstein frame their discussion around stochastic, environmental and genetic influences on cancer incidence between tissues. Emphasis on between tissues. What exactly does this mean? Well, what they are saying is that if you compare lung cancer rates in smokers vs. non-smokers (environmental effect), then the rate of getting cancer is around 10-20 times higher in smokers, but your chances of getting lung cancer even as a non-smoker are still much higher than your chances of getting, say, head osteosarcoma, and a plausible reason for this is that there are way more stem cell divisions in lung than in the bones in your head. Similarly, colorectal cancer incidence rates are much higher in people with a genetic predisposition (APC mutation), but again, even without the genetic predisposition, the incidence is still many orders of magnitude higher than in other tissues with much lower rates of stem cell divisions. I think this is pretty interesting! Of course, as with Shibata’s height association, the association with stem cell divisions is not proof that the stem cell divisions are per se the cause of this association, but one of the nice things about Shibata’s work is that he shows that a model of stem cell divisions and number of genetic “hits” required for a particular cancer can match the actual cancer incidence data. So I think this is a plausible null model for a baseline of how much certain tissues will get cancer. Incidentally, this made me realize a perhaps obvious point on the genetic determinants of cancer: if you find an association of a gene with cancer incidence, then it may be that the association is because the gene is associated with, e.g., height, in which case, yes, there is technically a genetic underpinning for that variation, but it is hard to imagine designing any sort of drug based on this finding. Tomasetti and Vogelstein make this point in their paper.
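
To make the "divisions plus hits" idea concrete, here is a toy sketch in Python. To be clear, this is not Shibata's actual model or anything from the Tomasetti and Vogelstein paper. It is a deliberately crude illustration under my own assumptions: each required "hit" arises independently with a fixed probability `mu` per division, every stem cell lineage is independent, and all the numbers (`mu`, the number of stem cells, the number of hits) are made up.

```python
import math


def lineage_risk(divisions, hits, mu):
    """Chance that one stem cell lineage carries all `hits` required
    mutations after `divisions` divisions (toy assumption: each hit
    arises independently with probability `mu` per division)."""
    # P(a given hit has occurred at least once) = 1 - (1 - mu)^divisions,
    # computed with log1p/expm1 so tiny probabilities are not rounded away.
    p_hit = -math.expm1(divisions * math.log1p(-mu))
    return p_hit ** hits


def tissue_risk(stem_cells, divisions, hits, mu):
    """Chance that at least one of `stem_cells` independent lineages
    has accumulated every required hit."""
    p = lineage_risk(divisions, hits, mu)
    return -math.expm1(stem_cells * math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the trend.
    for d in (50, 500, 5000):
        print(d, tissue_risk(stem_cells=1e6, divisions=d, hits=3, mu=1e-6))
```

Even in this cartoon, the risk climbs steeply with the number of divisions and drops as more hits are required, which is the qualitative behavior the null model relies on.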

The authors then go on to further analyze their data and separate cancers into ones in which the variance in incidence is dominated by “stochastic” effects vs. “deterministic” effects. I can’t say I’ve gone into the details of this analysis, but it seems interesting–and a natural question to ask with these data. Here are a few thoughts on the ideas this analysis explores. One question that has come up a lot is why is this correlation not so strong, especially on a linear scale? I think that one issue is that the division into stochastic, environmental and genetic is missing a big component, which is the tissue, cell and molecular biology of cancer. Some tissues may require more genetic “hits” than others, or a long series of epigenetic effects, or have structures that enable rapid removal of defective stem cells, and so even tissues with the same number of divisions, in the absence of any genetic or environmental factors, will have different rates of cancer. Another issue is that these data are imperfect, and so you will get some spread no matter what. Still, I think the association is real and interesting.

Anyway, I think this “null model” is pretty cool. I wonder if one of the reasons that we focus so much on environmental and genetic effects is that we can do “experiments” on them, whereas the causal links in the stem cell division hypothesis are hard to prove.

There was a very interesting critique from Yaniv Erlich that said that the authors’ analysis implicitly assumes that there is no interaction between the number of stem cell divisions and genetic and environmental factors. A good point, although I do think that Tomasetti and Vogelstein have thought about this–as I mentioned, they say explicitly:
The total number of stem cells in an organ and their proliferation rate may of course be influenced by genetic and environmental factors such as those that affect height or weight.
Their example about the mouse vs. human incidence of colon vs. small intestine cancer in the case of the APC mutation is, I think, a nice piece of evidence suggesting that the number of divisions is a very important factor in determining cancer incidence. Although again, there are many alternative explanations here.

I think some of the confusion out there about this paper can be summed up as follows:
“You are a smoker and I am not, so I have a lower rate of getting lung cancer.”
“Yeah, but you still have a much higher rate of getting lung cancer than bone cancer.”
“Uhh… okay… sure… don’t think I’m gonna take up smoking anytime soon, though.”
It’s just a weird comparison to make. That said, I don’t think the authors really make this comparison anywhere in their manuscript. What I think they are saying at the end is that for cancers that have strong determinants due to environmental factors, lifestyle changes and other such interventions could be useful (like quitting smoking), whereas for other cancers that arise more randomly, we should just focus on detection. Perhaps I’m missing something, but this seems like a point one could make even without this analysis.

There has been a lot of discussion out there about how weak the correlation is and whether it’s appropriate to use log-log or linear scales and so forth. I think the basic point they are trying to make is that more highly proliferative tissues are more prone to cancer. I think the data they present are consistent with this conclusion. Whether the specific amount of variance they quote in the abstract is right or not is an important technical matter that I think other people are already talking about a lot, but I think the fundamental conclusion is sound.
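
For anyone who wants to poke at the scale question themselves, here is a small Python sketch. The first function just does the arithmetic behind the abstract: a correlation of 0.81 squares to about 0.66, which leaves roughly a third of the variance unexplained. The rest compares a Pearson correlation computed on log-transformed values with one computed on the raw values. The data are entirely synthetic (I made up the range, the offset and the noise level), so this says nothing about the paper's actual numbers. It only shows that the two correlations are different quantities and need not agree when the values span many orders of magnitude.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def explained_fraction(r):
    """Fraction of variance accounted for by a linear fit with correlation r."""
    return r * r


# Synthetic "tissues": log10(divisions) from 6 to 12, and a log10(risk)
# that tracks it with Gaussian scatter. All values are hypothetical.
rng = random.Random(0)
log_div = [6 + 0.2 * i for i in range(31)]
log_risk = [v - 14 + rng.gauss(0, 0.5) for v in log_div]

r_log = pearson(log_div, log_risk)
r_lin = pearson([10 ** v for v in log_div], [10 ** v for v in log_risk])

if __name__ == "__main__":
    print("explained by r = 0.81:", explained_fraction(0.81))
    print("left over:", 1 - explained_fraction(0.81))
    print("log-log r:", r_log, "linear r:", r_lin)
```

On the linear scale, the handful of points with the largest values carry most of the weight in the correlation, which is one reason the choice of scale matters so much for data like these.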

A note about the reaction to this paper: in principle, I like the concept of moving from pre-publication anonymous peer review to a post-publication peer review world. I think that pre-publication anonymous peer review is slow, arbitrary, and (most importantly) demoralizing, especially for trainees. That said, now that I’ve seen a bit of post-publication peer review happen online, I think the sad thing I must report is that in many cases, the culture seems to be one of the hardcore takedown, often in a rather accusatorial tone. And I thought it was hard to get a positive review from a journal! Here are some nice thoughts from Kamoun, who recently responded (admirably) to an issue raised on Pubpeer.

My view is that in any paper with real-world data, there will be points that are solid and points that are weak. In post-publication peer review, we run the risk of reducing a paper to a negative soundbite that propagates very fast, and thus throwing out the baby with the bathwater, not to mention putting the author (often a trainee) under very intense public scrutiny that they might not be equipped to handle. I think we should be very careful in how we approach post-publication review because of its viral nature online. Anyway, those are my two cents.

PS: Apropos of discussions of log-log correlations vs. linear correlations, we have a fairly extensive comparison of RNA-seq data to RNA FISH data. More very soon.


  1. I think this is spot on, both on the paper and the twitter responses. I'm fairly new to Twitter, but I'm pretty dismayed at the sometimes pretty vitriolic takedowns of colleagues.

  2. "What are the consequences of such a null? Tomasetti and Vogelstein frame their discussion around stochastic, environmental and genetic influences on cancer incidence between tissues."

    Since they're looking at average cancer rates by tissue, doesn't that imply an averaging of differences between individuals (except for a few cases, such as lung cancer, where they compared smokers and non-smokers)? How can they say that their results speak to the effect of genetic and environmental variation (which are differences between individuals) when that variation has been averaged out in their data points?

    1. I'm really not an expert on this, but I think the key point, which is sort of tricky to think about, is that we normally think about genetic or environmental differences between individuals. In this paper, they are talking about genetic or environmental differences between *tissues*. What is an environmental difference between tissues? Well, some tissues get a lot more exposure than others. That doesn't explain all of it, though, and they give the nice example of the two parts of the GI tract that get cancer at very different rates but have similar environmental exposures. The one that gets much more cancer also has many more divisions, though. Genetic differences are sort of a weird thing to even think about in this context, but the idea is that even a genetic predisposition to a particular cancer does not cause the rate of cancer for a particular tissue to change as much as comparing that tissue to another tissue with a much larger or smaller rate of stem cell divisions. I think it's a relatively unfamiliar comparison to make and was fairly confusing for me–and I'm still probably getting something wrong!