Thursday, June 4, 2015

Gene expression by the numbers, verdict on day 1: awesome!

(Day 0 | Day 1 | Day 2 | Day 3 (take Rorschach test at end of Day 3!))

Yesterday was day 1 of Gene expression by the numbers, and it was everything I had hoped it would be! Lots of discussion about big ideas, little ideas, and everything in between. Jane Kondev said at some point that we should have a “controversy meter” based on the loudness of the discussion. Some of the discussions would definitely have rated highly, which is great! Here are some thoughts, very much from my own point of view:

We started the day with a lively discussion about how I am depressed (scientifically) :). I’m depressed because I’ve been thinking lately that maybe biology is just hopelessly complex, and we’ll never figure it out. At the very least, I’ve been thinking we need wholly different approaches. More concretely for this meeting, will we ever truly be able to have a predictive understanding of how transcription is regulated? Fortunately (?), only one other person in the room admitted to such feelings, and most people were very optimistic on this count. I have to say that at the end of the day, I’m not completely convinced, but the waters are muddier.

Who is an optimist? Rob Phillips is an optimist! And he made a very strong point. Basically, he’s been able to take decades of data on transcriptional regulation in E. coli and reduce it to a single, principled equation. Different conditions, different concentrations, whatever, it all falls on a single line. I have to say, this is pretty amazing. It’s one thing to be an optimist, another to be an optimist with data. Well played.
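I won't try to reproduce Rob's actual equation here, but to give the flavor: the textbook thermodynamic-model fold-change for simple repression (the kind of expression the Phillips lab works with) fits in a few lines. The parameter names and values below are illustrative, not taken from any particular dataset.

```python
import math

def fold_change(R, delta_eps_kT, N_ns=4.6e6):
    """Thermodynamic-model fold-change for simple repression in E. coli:
    expression relative to the unrepressed promoter.
    R: repressor copy number
    delta_eps_kT: repressor binding energy in units of kT (negative = favorable)
    N_ns: number of nonspecific genomic binding sites (~genome length in bp)
    """
    # Boltzmann weight of the repressor occupying the operator
    repressor_weight = (R / N_ns) * math.exp(-delta_eps_kT)
    return 1.0 / (1.0 + repressor_weight)
```

The appeal is exactly what Rob argued: one equation, a couple of biophysical parameters, and data from many conditions and concentrations collapse onto the same curve when plotted against the repressor weight.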

And then… over to eukaryotes. I don’t think anyone can say with a straight face that we can predict eukaryotic transcription. Lots of examples of a lot of effects that don’t resolve with simple models, and Angela DePace gave a great talk highlighting some of the standard assumptions that we make that may not actually hold. So what do we do? Just throw our hands in the air and say “Complexity, yipes!”?

Not so fast. First, what is the simple model? The simplest model is the thermodynamic model. Essentially, each transcription factor binds to the promoter independently of the others, and their effects on transcription are independent as well. Um, duh, that can’t work, right? I was of the opinion that decades of conventional promoter bashing haven’t really provided much in the way of general rules, and more quantitative work along these lines hasn’t really done so either.
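To make “independent binding” concrete, here’s a toy two-site partition function (notation and weights are my own, not from any talk). Each weight is a Boltzmann factor of the form (copy number / nonspecific sites) × exp(−binding energy / kT), and omega couples the activator to polymerase.

```python
def p_bound(P_weight, A_weight, omega=1.0):
    """Thermodynamic-model probability that polymerase occupies the promoter,
    with one activator site. omega is the activator-polymerase interaction
    term; omega = 1 is the fully independent-binding caricature."""
    # Partition function over the four states: empty, P only, A only, both
    Z = 1 + P_weight + A_weight + omega * P_weight * A_weight
    # Sum the weights of states with polymerase bound
    return (P_weight + omega * P_weight * A_weight) / Z
```

Note that with omega = 1 the activator literally has no effect on polymerase occupancy (the partition function factorizes), so in this framework everything interesting lives in the interaction terms.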

But Barak brought up an extremely good point, which is that a lot of these approaches to seeing how promoter changes affect transcription suffer from being very statistically underpowered. He also made the point (with data) that once you really start sampling, maybe things are not so bad, and amazingly enough, maybe some of the simplest and “obviously wrong” caricatures of transcriptional regulation are not all that far off. Maybe with sufficient sampling, we can start to see rules and exceptions, instead of a big set of exceptions. Somehow, this really resonated with me.

I’m also left a bit confused. So do we have a good understanding of regulation or not? I saw some stuff that left me hopeful that maybe simple models may be pretty darn good, and maybe we’re not all that far off from the point where if I wanted to dial up a promoter that expressed at a certain level, I just type in this piece of DNA and I’ll get close. I also saw a lot of other stuff that left me scratching my head and sent me back to wondering how we’ll ever figure it all out.

There was also an interesting difference in style here. Some approach things from a very statistical point of view (do a large number of different things and look for emergent patterns). Some approach things from a very mechanistic point of view (tweak particular parameters we think are important, like distances and individual bases, and see what happens). I usually think it’s very intellectually lazy to say things like “we need both approaches, they are complementary”, but in this case, I think it’s apt, though if I had to lean one way, personally, I think I favor the statistical approach. Deriving knowledge from the statistical approach is a tricky matter, but that’s a bigger question. How much variance do we need to explain? As yet unanswered; see later for some discussion about the elephant in the room.

Models: some cool talks about models. One great point: “No such thing as validating a model. We can only disprove models.” A point of discussion was how to deal with models that don’t fit all the data. Do we want to capture everything? How many exceptions to the rule can you tolerate before it’s no longer a rule?

Which brings me to a talk that was probably highest on the controversy meter. In this one, the conferee who shares my depression showed some results that struck me as very familiar. The idea was to build a quantitative model, then run some experiments measuring transcriptional response, and show that the model fits nicely. Then you change something in the growth medium, and suddenly, the model is out the window. We’ve all seen this: day to day variability, batch variability, “weird stuff happened that day”, whatever. So does the model really reflect our understanding of the underlying system?

This prompted a great discussion about what our goals are as a community. Is the goal really to predict everything in every condition? Is that an unreasonable thing to expect from a model? This got down to understanding vs. predicting. Jane brought up the point that these are different: Google can predict traffic, but it doesn’t understand traffic. A nice analogy, but I’m not sure that it works the other way around. I think understanding means prediction, even if prediction doesn’t necessarily mean understanding. Perhaps this comes down to an aesthetic choice. Practically speaking, for the quantitative study of transcription, I think that the fact that the model failed to predict transcription in a different condition is a problem. One of my big issues with our field is that we have a bunch of little models that are very context specific, and the quantitative (and sometimes qualitative) details vary. How can we put our models together if the sands are shifting under our feet all the time? I think this is a strong argument against modularity. Rob made the solid counter that perhaps we’re just not measuring all the parameters–if we could measure transcription factor concentration directly, maybe that would explain things. Perhaps. I’m not convinced. But that’s just, like, my opinion, man.

So to me the big elephant in the room that was not discussed is what exactly matters about transcription? As quantitative scientists, we may care about whether there are 72 transcripts in this cell vs. 98 in the one next door, but does that have any consequences? I think this is an important question because I think it can shape what we measure. For instance, this might help us answer the question about whether explaining 54% of the variance is enough–maybe the cell only cares about on vs. off, in which case, all the quantitative stuff is irrelevant (I think there is evidence for and against this). Maybe then all we should be studying is how genes go from an inactive to an active state and not worry about how much they turn on. Dunno, all I’m saying is that without any knowledge of the functional consequences, we’re running the risk of heading down the wrong path.

Another benefit to discussing functional consequences is that I think it would allow us to come up with useful definitions that we can then use to shape our discussion. For instance, what is cross-talk? (Was the subject of a great talk.) We always talk about it like it’s a bad thing, but how do we know that? What is modularity? What is noise? I think these are functional concepts that must have functional definitions, and armed with those definitions, then maybe we will have a better sense of what we should be trying to understand and manipulate with regard to transcriptional output.

Anyway, looking forward to day 2!


  1. Thanks for covering this! Here's what I might have said if I were there today:

    The problem with using statistical mechanics to model transcription in eukaryotes is that it doesn't make much sense mechanistically. Here’s a collection of things that we’ve observed about transcription in eukaryotes:
    1) Transcription occurs in short bursts rather than “analog” smooth production
    2) None of the transcription factors measured via imaging so far seem to bind to promoters for longer than a few seconds (binding energy predicts much longer)
    3) Competition between proteins that bind the same sequence isn’t always observed; in fact, sometimes facilitation is what’s seen
    4) Histone occupancy of promoters has been shown to be essential for both repressing and expressing genes
    5) ATP-consuming DNA translocases can push and eject other proteins off DNA
    6) Killing chromatin remodeling factors has been shown to increase TF promoter occupancy, but decrease expression of the same gene
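    To put a rough number on point 2: at equilibrium, residence time follows directly from the dissociation constant and the on-rate, since K_d = k_off / k_on. The numbers below are illustrative back-of-the-envelope values, not from any specific measurement.

```python
def predicted_residence_time(K_d_molar, k_on_per_molar_per_s=1e6):
    """Equilibrium prediction for a TF's residence time on a specific site.
    K_d = k_off / k_on, so tau = 1 / k_off = 1 / (K_d * k_on).
    k_on is taken near the diffusion limit; both inputs are illustrative."""
    k_off = K_d_molar * k_on_per_molar_per_s  # per second
    return 1.0 / k_off  # seconds

# A 1 nM site predicts ~1000 s of residence, versus the seconds
# (or less) seen in live-cell imaging.
```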

    Clearly all of these observations are in direct contradiction with the statistical mechanical view where transcription is driven by a transcription factor outcompeting histones and other proteins to bind a promoter and recruit a polymerase, all at equilibrium.

    Regarding modeling - sure, we can make measurements to show where these models break down, but what’s the point? We could push the limits of measurement and extend the complexity of the model to incorporate some of the mechanistic details, et cetera, but are we really going to learn any new biology?

    The models don’t represent reality as much as they represent our assumptions about reality. Making a model with one set of assumptions may let you investigate/explain different phenomena than a model with a different set of assumptions, even if both are derived from the same data.

    The real question then becomes: what aspects of biology can we not explore when we think in terms of the equilibrium statistical mechanics? What does bringing back the energy dissipation of ATP dependent processes allow us to explore?

    What’s worth exploring? (In my opinion, this is what we ought to be devoting most of the discussion towards)

    My thoughts are that there is a fundamental connection between the heat dissipation in eukaryotic transcription and computation. In Hopfield's kinetic proofreading, heat must be dissipated to achieve lower error rates in translation than would be achievable by an equilibrium mechanism. Considering complex life has the same order of magnitude number of genes as simple prokaryotes, the complexity must arise through complex regulation. How do we achieve complex regulation without making mistakes? Does it require more than an equilibrium mechanism could achieve?
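    The Hopfield arithmetic, in toy form: with a binding-energy gap of delta_e between right and wrong substrate, equilibrium discrimination bottoms out at exp(-delta_e), and each irreversible, ATP-driven proofreading step can at best square the discrimination. This is the idealized limit, assuming every step achieves full discrimination.

```python
import math

def error_rate(delta_e_kT, proofreading_steps=0):
    """Minimum achievable error fraction when discriminating substrates
    separated by delta_e_kT (in units of kT). At equilibrium the best you
    can do is exp(-delta_e); each idealized proofreading step multiplies
    the error by another factor of exp(-delta_e) (Hopfield's limit)."""
    f = math.exp(-delta_e_kT)
    return f ** (1 + proofreading_steps)

# With a ~2 kT gap: equilibrium error ~13%; one proofreading step ~1.8%.
```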

    Jeremy Gunawardena and Gasper Tkacik's group are both doing interesting work in this area - would love to get an update if it comes up at the conference!

    1. Thanks for the great commentary. I agree that there are definitely a lot of instances in which it is clear that the equilibrium model is simply no good. There were some great examples at the conference, both from Barak and from Angela.

      I really like the idea of thinking about how complex regulation can ensure the lack of mistakes. This strikes me as a very important functional question that we should be addressing. There was some discussion on this point, but to me, the biggest issue is that measuring transcription as an output in and of itself is problematic because we don't know whether the levels matter. Because of that issue, it's hard to say whether a "mistake" is really a mistake or not–all a matter of interpretation.

      Heat dissipation is an interesting idea. Of course, literally speaking, there are probably many aspects that burn ATP (and may not matter for the organism), so maybe what we need is the regulatory equivalent of "energy".

      Regarding the issues about models: I still think there are some very deep philosophical concepts at play. Ultimately, I think we're in a bit of a "put up or shut up" situation. Someone's gotta prove that their viewpoint is making headway. Rob's work does this, I think, in a powerful way, as does Barak's. If I want to argue for complexity, I have to contend with that.

  2. Just to contrast -- I think there are more than a few examples to suggest even eukaryotic transcription can be pretty simple. For example, Zenklusen et al. and Gandhi et al. show that a good fraction of yeast transcripts exhibit Poisson distributions. Poisson-distributed RNA counts imply a single rate-determining step. These observations lend credibility to statistical mechanical methods, which (implicitly) assume a single rate constant for transcription. Of course, there are more complex mechanisms, but maybe "more complicated" will just mean two crucial steps, or three!
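    This is easy to play with in simulation. A minimal Gillespie sketch (toy parameters of my own, not from either paper): a one-step birth-death process gives a Fano factor (variance/mean) of 1, i.e. Poisson, while a bursty two-state promoter is strongly over-dispersed.

```python
import random

def gillespie_mrna(rates, t_max=5000.0, seed=0):
    """Time-averaged mRNA mean and Fano factor for a birth-death process
    gated by a two-state (telegraph) promoter.
    rates = (k_on, k_off, k_tx, k_deg); k_off = 0 gives a constitutive gene."""
    k_on, k_off, k_tx, k_deg = rates
    rng = random.Random(seed)
    t, gene_on, m = 0.0, 1, 0
    total_t = mean_acc = sq_acc = 0.0
    while t < t_max:
        # Propensities: gene turns on, gene turns off, mRNA made, mRNA degraded
        a = [k_on * (1 - gene_on), k_off * gene_on, k_tx * gene_on, k_deg * m]
        a_tot = sum(a)
        dt = rng.expovariate(a_tot)
        # Accumulate time-weighted moments over the interval before the jump
        total_t += dt
        mean_acc += m * dt
        sq_acc += m * m * dt
        r = rng.uniform(0.0, a_tot)
        if r < a[0]:
            gene_on = 1
        elif r < a[0] + a[1]:
            gene_on = 0
        elif r < a[0] + a[1] + a[2]:
            m += 1
        else:
            m = max(m - 1, 0)
        t += dt
    mean = mean_acc / total_t
    fano = (sq_acc / total_t - mean * mean) / mean
    return mean, fano
```

With (0, 0, 10, 1) the gene is always on and the Fano factor sits near 1; with something like (0.1, 0.9, 100, 1) the same mean expression comes in big infrequent bursts and the Fano factor blows up, which is exactly the kind of signature that lets you discriminate the two mechanisms from single-cell RNA counts.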

    1. Hi Marc,
      Yeah, I just don't know. I feel like it comes down to a matter of taste, on some level. Is a model that explains most of what happens good enough? At what point is it a viable "theory"? I saw examples at the conference that gave me hope, and others that got me back to thinking that complexity reigns. Even a single rate limiting step can be complex if the regulation of that one step is very complex. To me, a fundamental issue with the synthetic approach is how to construct a "null" piece of DNA. Is there any DNA with *no* regulation? Based on my understanding of Barak's work, I would argue that's hard to come by.