(Day 0, Day 1, Day 2, Day 3 (take Rorschach test at end of Day 3!))
Yesterday was day 1 of Gene expression by the numbers, and it was everything I had hoped it would be! Lots of discussion about big ideas, little ideas, and everything in between. Jane Kondev said at some point that we should have a “controversy meter” based on the loudness of the discussion. Some of the discussions would definitely have rated highly, which is great! Here are some thoughts, very much from my own point of view:
We started the day with a lively discussion about how I am depressed (scientifically) :). I’m depressed because I’ve been thinking lately that maybe biology is just hopelessly complex, and we’ll never figure it out. At the very least, I’ve been thinking we need wholly different approaches. More concretely for this meeting, will we ever truly be able to have a predictive understanding of how transcription is regulated? Fortunately (?), only one other person in the room admitted to such feelings, and most people were very optimistic on this count. I have to say that at the end of the day, I’m not completely convinced, but the waters are muddier.
Who is an optimist? Rob Phillips is an optimist! And he made a very strong point. Basically, he’s been able to take decades of data on transcriptional regulation in E. coli and reduce it to a single, principled equation. Different conditions, different concentrations, whatever, it all falls on a single line. I have to say, this is pretty amazing. It’s one thing to be an optimist, another to be an optimist with data. Well played.
And then… over to eukaryotes. I don’t think anyone can say with a straight face that we can predict eukaryotic transcription. Lots of examples of effects that don’t resolve with simple models, and Angela DePace gave a great talk highlighting some of the standard assumptions we make that may not actually hold. So what do we do? Just throw our hands in the air and say “Complexity, yipes!”?
Not so fast. First, what is the simple model? The simplest is the thermodynamic model: essentially, each transcription factor binds the promoter independently of the others, and their effects on transcription are independent as well. Um, duh, that can’t work, right? I was of the opinion that decades of conventional promoter bashing haven’t really provided much in the way of general rules, and that more quantitative work along these lines hasn’t really done so either.
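To make the simplest version concrete: for a gene repressed by a single transcription factor, the thermodynamic model (of the kind Rob’s “single equation” is built on, if I understand the Phillips-school work correctly) boils down to a fold-change expression. Here’s a minimal sketch assuming the standard simple-repression form; the function name and parameter values are my own illustration, not anything presented at the meeting:

```python
import math

def fold_change(repressor_copies, binding_energy_kbt, n_ns=4.6e6):
    """Thermodynamic-model fold-change for simple repression:

        fold-change = 1 / (1 + (R / N_NS) * exp(-dE))

    where R is the repressor copy number, N_NS the number of nonspecific
    genomic binding sites (~4.6e6 for the E. coli genome), and dE the
    specific binding energy in units of kBT (negative = favorable).
    """
    return 1.0 / (1.0 + (repressor_copies / n_ns) * math.exp(-binding_energy_kbt))

# More repressors, or stronger (more negative) binding, means more repression:
for r in (0, 10, 100):
    print(r, fold_change(r, -15.0))
```

The appeal, as I took Rob’s point, is exactly the data collapse: measurements across many conditions and concentrations fall on this one curve once you plug in R and the binding energy.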
But Barak brought up an extremely good point, which is that a lot of these approaches to seeing how promoter changes affect transcription suffer from being very statistically underpowered. They also made the point (with data) that once you really start sampling, maybe things are not so bad–and amazingly enough, maybe some of the simplest and “obviously wrong” caricatures of transcriptional regulation are not all that far off. Maybe with sufficient sampling, we can start to see rules and exceptions, instead of a big set of exceptions. Somehow, this really resonated with me.
I’m also left a bit confused. So do we have a good understanding of regulation or not? I saw some stuff that left me hopeful that simple models may be pretty darn good, and that maybe we’re not all that far from the point where, if I wanted to dial up a promoter that expressed at a certain level, I could just type in a piece of DNA and get close. I also saw a lot of other stuff that left me scratching my head and sent me back to wondering how we’ll ever figure it all out.
There was also an interesting difference in style here. Some approach things from a very statistical point of view (do a large number of different things and look for emergent patterns). Others approach things from a very mechanistic point of view (tweak particular parameters we think are important, like distances and individual bases, and see what happens). I usually think it’s very intellectually lazy to say things like “we need both approaches, they are complementary”, but in this case, I think it’s apt, though if I had to lean one way, I personally favor the statistical approach. Deriving knowledge from the statistical approach is a tricky matter, but that’s a bigger question. How much variance do we need to explain? As yet unanswered; see later for some discussion about the elephant in the room.
Models: some cool talks about models. One great point: “No such thing as validating a model. We can only disprove models.” A point of discussion was how to deal with models that don’t fit all the data. Do we want to capture everything? How many exceptions to the rule can you tolerate before it’s no longer a rule?
Which brings me to the talk that was probably highest on the controversy meter. In this one, the conferee who shares my depression showed some results that struck me as very familiar. The idea: build a quantitative model, then run some experiments measuring the transcriptional response, and the model fits nicely. Then you change something in the growth medium, and suddenly the model is out the window. We’ve all seen this: day-to-day variability, batch variability, “weird stuff happened that day”, whatever. So does the model really reflect our understanding of the underlying system?
This prompted a great discussion about what our goals are as a community. Is the goal really to predict everything in every condition? Is that an unreasonable thing to expect from a model? This got down to understanding vs. predicting. Jane brought up the point that these are different: Google can predict traffic, but it doesn’t understand traffic. A nice analogy, but I’m not sure that it works the other way around. I think understanding means prediction, even if prediction doesn’t necessarily mean understanding. Perhaps this comes down to an aesthetic choice. Practically speaking, for the quantitative study of transcription, I think that the fact that the model failed to predict transcription in a different condition is a problem. One of my big issues with our field is that we have a bunch of little models that are very context specific, and the quantitative (and sometimes qualitative) details vary. How can we put our models together if the sands are shifting under our feet all the time? I think this is a strong argument against modularity. Rob made the solid counter that perhaps we’re just not measuring all the parameters–if we could measure transcription factor concentration directly, maybe that would explain things. Perhaps. I’m not convinced. But that’s just, like, my opinion, man.
So to me, the big elephant in the room that went undiscussed is this: what exactly matters about transcription? As quantitative scientists, we may care about whether there are 72 transcripts in this cell vs. 98 in the one next door, but does that have any consequences? I think this is an important question because it can shape what we measure. For instance, it might help us answer whether explaining 54% of the variance is enough–maybe the cell only cares about on vs. off, in which case all the quantitative stuff is irrelevant (I think there is evidence both for and against this). Maybe then all we should be studying is how genes go from an inactive to an active state, and not worry about how much they turn on. Dunno, all I’m saying is that without any knowledge of the functional consequences, we’re running the risk of heading down the wrong path.
Another benefit to discussing functional consequences is that I think it would allow us to come up with useful definitions that we can then use to shape our discussion. For instance, what is cross-talk? (This was the subject of a great talk.) We always talk about it like it’s a bad thing, but how do we know that? What is modularity? What is noise? I think these are functional concepts that must have functional definitions, and armed with those definitions, maybe we will have a better sense of what we should be trying to understand and manipulate with regard to transcriptional output.
Anyway, looking forward to day 2!