Friday, June 5, 2015

Gene expression by the numbers, day 2: take me out to the ballgame

(Day 0, Day 1, Day 2, Day 3 (take the Rorschach test at the end of Day 3!))

First off, I just want to thank a commenter for an interesting and thoughtful response to some of the topics we discussed on day 1. Highly recommended reading.

Day 2 started with Rob trying to stir the pot by placing three bets (the stakes are dinner at a fancy restaurant in Paris, yummy!). The first bet was actually with me, or really a bet against pessimism. He claimed that he would be able to explain Hana's complicated data on transcription in different conditions once we measured the relevant parameters, like, say, transcription factor concentration (I wrote about this in the day 1 post). My response was, well, even if you could explain that with all the transcription factor concentrations, that's not really my problem. My problem is that it is impossible to build a simple predictive model of transcription here: the input-output relationship depends on so many other factors that we end up with a mess, with no well-defined modules. To which Rob rightfully responded that that's moving the goalposts: I said he can't do X, he does X, and then I say now you have to do Y. Fair enough. I accept the original challenge: I claim that he will not be able to explain the differences in Hana's data using transcription factor concentration alone.

The next bet was with Barak. In the day 1 post, I mentioned the statistical approach vs. the mechanistic approach. Rob and Barak still have to formulate the bet precisely (and I think they actually mostly agree), but basically, it is a bet against the statistical approach. Hmm. Personally, I don't know where I come down on this. I am definitely sympathetic to Rob's point of view, and I don't like the overemphasis on statistics these days (my thoughts). But my thoughts are evolving. Rob asked, "Would it really have been possible to derive gravitation with a bunch of star charts and machine learning?" To which I responded with something along the lines of, "Well, we are machines, and we learned it." Sort of silly, but sort of not.

The final bet was with Ido (something about the universality of noise scaling laws). Ido also had a bet on this point, in this case offering up a bottle of Mezcal for a resolution. More on this some other time. I am going to try to get that bottle!

The talks were again great (I mean really great), if perhaps a bit more topically diffuse than yesterday's. We started with evolution: very cool, with beautiful graphs of clonal sweeps. An interesting point was that experimental evolution arrives at answers different from what you initially expect. They are (or can be) rational, but not what you would predict up front, amazingly even in pathways as well worked out as the metabolic pathways. I'm wondering if we could leverage this to understand pathways better in some way.

On to the "tech development" section, which was only partly about tech development. Stirling gave a great talk about human NET-seq. What I really liked about it was that in the end, there was a simple answer to a simple question (is transcription different over exons when they're skipped? over exons vs. introns?). I think it's awesome to see that genome-wide data can give such clear results.

So far, everything had been about control of the mean level of transcription. Both Ido and I talked about the variance around that mean, with Ido providing beautiful data on input-output functions. On the Mezcal bet: Ido showed that there is a strong relationship between the Fano factor (variance divided by mean) and the mean. I am wondering whether this is due to volume variation. Olivia's paper has some data on this. Probably the subject of another blog post at some point in the future.
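The volume-variation idea can be made quantitative with a minimal sketch. It assumes (hypothetically) that each cell's mRNA count is Poisson with a mean proportional to that cell's volume, and that volume varies from cell to cell with coefficient of variation `cv_volume`. By the law of total variance, Var(n) = μ + μ²·CV², so the Fano factor is 1 + μ·CV²: it grows linearly with the mean even though each cell is individually Poissonian. The function names, the Gaussian volume distribution, and the parameter values are illustrative assumptions, not taken from Ido's data or Olivia's paper.

```python
import math
import random


def fano_volume_model(mean, cv_volume):
    """Analytic Fano factor when n | V ~ Poisson(mean * V), with E[V] = 1.

    Law of total variance: Var(n) = mean + mean**2 * cv_volume**2,
    so Var(n) / mean = 1 + mean * cv_volume**2 (hypothetical model).
    """
    return 1.0 + mean * cv_volume ** 2


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_fano(mean, cv_volume, n_cells=20000, seed=1):
    """Simulate cells with variable volume and return the sample Fano factor."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        # Volume ~ Normal(1, cv_volume), clipped to stay positive (illustrative choice).
        volume = max(rng.gauss(1.0, cv_volume), 1e-3)
        counts.append(poisson(rng, mean * volume))
    m = sum(counts) / n_cells
    var = sum((x - m) ** 2 for x in counts) / (n_cells - 1)
    return var / m


if __name__ == "__main__":
    for mu in (2, 20, 50):
        print(mu, round(fano_volume_model(mu, 0.3), 2), round(simulate_fano(mu, 0.3), 2))
```

With `cv_volume = 0.3`, the analytic Fano factor goes from 1.18 at a mean of 2 to 5.5 at a mean of 50, and the simulation should land close to those values. Comparing the measured slope of Fano factor vs. mean to an independently measured volume CV would be one way to check whether volume variation is enough to explain the trend.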

Theory: great discussion about Hill coefficients with Jeremy! How can you actually get thresholds in transcriptional regulation? A couple of ideas. There's conventional cooperativity, and there could also be other mechanisms, like titration via dummy binding sites, as in Nick Buchler's work. It's surprising that we still have so many questions about the mechanisms of thresholds after all this time.
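To see how titration can produce a threshold without cooperativity, here is a minimal sketch of a generic tight-binding sequestration model (an assumption for illustration, not the specific model from Buchler's papers). Decoy sites soak up transcription factor until they are nearly saturated; only then does free factor rise and turn on a target that itself binds with no cooperativity (Hill n = 1). Steepness is summarized by an effective Hill coefficient, log(81) / log(x90 / x10), where x10 and x90 are the inputs giving 10% and 90% output. All parameter values (`decoys=100`, `K_decoy=0.1`, `K_target=1`, in arbitrary concentration units) are hypothetical.

```python
import math


def hill(x, K=1.0, n=1.0):
    """Hill function: fraction of maximal output at input x."""
    return x ** n / (K ** n + x ** n)


def free_tf(total, decoys, K_decoy):
    """Free TF after binding a pool of decoy sites with dissociation constant K_decoy.

    Positive root of f**2 + f*(decoys - total + K_decoy) - K_decoy*total = 0.
    """
    b = total - decoys - K_decoy
    return 0.5 * (b + math.sqrt(b * b + 4.0 * K_decoy * total))


def titration_output(total, decoys, K_decoy, K_target=1.0):
    """Target responds to free TF with no cooperativity (n = 1)."""
    return hill(free_tf(total, decoys, K_decoy), K=K_target, n=1.0)


def total_for_free(free, decoys, K_decoy):
    """Inverse of free_tf: total TF needed to leave `free` unbound."""
    return free + decoys * free / (free + K_decoy)


def effective_hill(x10, x90):
    """Effective Hill coefficient from the inputs giving 10% and 90% output."""
    return math.log(81.0) / math.log(x90 / x10)


if __name__ == "__main__":
    # Conventional cooperativity: Hill n = 2 reaches 10% at x = 1/3 and 90% at x = 3.
    print(effective_hill(1 / 3, 3))  # about 2
    # Titration: a target with K_target = 1 is at 10% / 90% when free TF = 1/9 and 9.
    decoys, K_decoy = 100.0, 0.1
    t10 = total_for_free(1 / 9, decoys, K_decoy)
    t90 = total_for_free(9.0, decoys, K_decoy)
    print(effective_hill(t10, t90))  # roughly 6, with no cooperativity anywhere
```

The design choice here is to compute the 10% and 90% points by inverting the model analytically rather than by numerical root-finding; the sharpness comes entirely from the decoys, and it fades as `K_decoy` approaches the decoy concentration.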

Conversation with Jeremy and Harinder: how much do we know about whether sequence fully predicts binding? An idea for an experiment: if you sweep through transcription factor concentrations, what happens to binding as measured by, e.g., ChIP-seq? Has anyone done this experiment?
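One way to frame that experiment is to write down the null prediction if sequence alone sets binding. This minimal sketch assumes each site binds independently at equilibrium with a fixed, sequence-encoded Kd, that free concentration equals added concentration, and that ChIP signal is proportional to occupancy; all of these, and the site names and Kd values, are hypothetical simplifications.

```python
def fraction_bound(c, Kd):
    """Equilibrium occupancy of one independent site at free TF concentration c."""
    return c / (c + Kd)


def sweep_occupancy(concentrations, site_kds):
    """Null model: each site's occupancy depends only on its own Kd."""
    return {
        site: [fraction_bound(c, kd) for c in concentrations]
        for site, kd in site_kds.items()
    }


if __name__ == "__main__":
    # Hypothetical sites and a descending concentration sweep (arbitrary units).
    concentrations = [100.0, 10.0, 1.0]
    sites = {"strong": 1.0, "weak": 100.0}
    for site, occupancy in sweep_occupancy(concentrations, sites).items():
        print(site, [round(o, 3) for o in occupancy])
```

Under this null, lowering the concentration can only lower occupancy at every site: strong sites stay near saturation until the concentration approaches their Kd, while weak sites drop roughly in proportion to concentration. A site whose binding stays flat or rises as concentration falls would point to something the sequence-only model leaves out.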

Then, off to the Red Sox vs. the Twins. Biked over on Hubway with Ron, which was perfect on a really lovely day in Cambridge. The game was super fun! Apparently there were some people playing baseball there, but that didn't distract me too much. Had a great time chatting with various folks, including two really awesome students from Angela's lab, Clarissa Scholes and Ben Vincent, who joined in the fun. Talked with them about the leaky pipeline, which is something I will never, ever discuss online for various reasons. Also about crying in lab: someone at the conference told me that they've made everyone in their lab cry, which is so surprising if you know this person. Someone also told me that I'm weird. Like, they said, "Arjun, you are weird." Which is true.

Oh, and the Twins won, which made me happy, not because I know the first thing about baseball, but because I hate the Red Sox, mostly thanks to their very annoying fans. Oops, did I say that out loud?

Okay, fireworks are happening here on day 3. More soon!


  1. Thanks for the great summary!

    I've heard Rob make that challenge about Brahe's/Kepler's data and gravitation a few times before. Being someone who's interested in understanding mechanism, at first I'd nod my head, but now I'm not so sure that gravitation is the best comparison. The underlying law is simple and there was a nearly comprehensive data set. In those areas where physics deals with fairly complex, heterogeneous systems, physicists turn to statistics.

    One thing that we need to clarify as a field is what exactly statistical thermodynamic models are capturing. They are right now, hands down, the most predictive models relating sequence to expression. But it's important to recognize they're really genetic models dressed up as biophysical models: these are models that predict interactions between binding sites (which is what is varied in the experiments), using a hypothesized TF occupancy that is rarely measured. (We've done it once in our lab.)

    Because these are really models of binding sites, not TFs, they manage to do a decent job capturing cooperative & anti-cooperative interactions (in a genetic sense) between sites. My take is that these interactions are robust enough to make it through the clearly questionable assumption of equilibrium. In other words, thermo models, despite their assumptions and simplistic representations of mechanism, are good at finding interactions between binding sites.

    1. I feel deeply ambivalent about the approach to science we must take. I feel like the statistical approach is perhaps the only way forward, but Rob made a pretty convincing argument that maybe we just need to stay the course. Dunno.

      I agree that the big question with the thermo model is what exactly it corresponds to. I think you're right to point out that these are just dressed up genetic models. Unless we can rigorously show that these things are really working the way we want, then they aren't really all that different from statistical models, right?

    2. Even though they are largely models of genetic experiments, I disagree with Rob about them being purely statistical. The models make explicit predictions about how regulatory sequence should affect TF occupancy - and regardless of what else is going on, we all believe TF occupancy is a big part of cis-regulation, right?

      My perspective is colored, obviously, by my experience. We built a thermo model that made an extremely non-intuitive, but successful prediction, in advance of the experiment, about what would happen when you eliminate two binding sites from an enhancer.

      That model led to a purely analytical, non-statistical result about how cooperativity and competition between activators and repressors play out. That result is a mechanistic prediction, one which I'm planning to test by building a system from scratch based on the predicted mechanism.

      So I think thermo models really do have a lot of what Rob is asking for. We've just rarely followed up experimentally at the level of biochemical detail we need to test the occupancy predictions - though we do have some data.

    3. I actually think that Rob would agree with the thermodynamic model–I think maybe I somewhat oversimplified this discussion. It's true that thermodynamic models can make predictions and that they have basis in reality. Whether the parameters actually correspond to reality is maybe an open question, dunno.

  2. Thanks for the summary! I believe that someone in Mike Eisen's lab did ChIP-seq on flies with different levels of bicoid a couple of years back, but I'm not sure where the project stands at the moment.

  3. Re ChIP-seq after trans perturbations, one of my favorite posters from the fly meeting was from Colleen Hannon from the Wieschaus lab, who did Bicoid ChIP-seq on embryos that ubiquitously express different levels of Bicoid (not a trivial genetic experiment at all). She found that enhancers that are normally exposed to high levels of Bicoid bind less Bicoid when levels are decreased, whereas enhancers that are normally exposed to low levels of Bicoid either bind the same or more Bicoid when levels are decreased. Sorry I just used 'Bicoid' four times in one sentence. The effect doesn't seem to be dependent on changes in other TFs.

    Also, Arjun failed to mention our incredibly revealing discussion on karaoke / walk-in music choices, which included the fact that members of his lab 1) have never heard and 2) fail to appreciate Cher's Believe. Clarissa and I are still trying to get our heads around that ;)

    1. Very interesting! One of the themes of the conference is wondering if we'll ever make sense of all these weird phenomena. Hmm.

      Oh, and Ally Cote from my lab just wants to make clear that she knows every word to Believe.
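The first comment's point that thermodynamic models really capture interactions between binding sites can be made concrete with a minimal two-site sketch. It assumes one TF, two sites with dissociation constants `K1` and `K2`, a cooperativity factor `omega` (greater than 1 for cooperative, less than 1 for anti-cooperative), and equilibrium, the very assumption the comment calls questionable. Deleting site 2, the kind of genetic perturbation these models are fit to, is modeled by removing its states. All names and values are hypothetical.

```python
def site1_occupancy(c, K1, K2, omega, site2_present=True):
    """Probability that site 1 is bound in a two-site equilibrium model.

    States and statistical weights: empty 1, site 1 only x1,
    site 2 only x2, both sites omega * x1 * x2.
    """
    x1 = c / K1
    x2 = c / K2 if site2_present else 0.0  # deleting site 2 removes its states
    Z = 1.0 + x1 + x2 + omega * x1 * x2
    return (x1 + omega * x1 * x2) / Z


def deletion_effect(c, K1, K2, omega):
    """Site-1 occupancy with site 2 present vs. deleted; 1.0 means no interaction."""
    return site1_occupancy(c, K1, K2, omega) / site1_occupancy(
        c, K1, K2, omega, site2_present=False
    )


if __name__ == "__main__":
    for omega in (0.1, 1.0, 10.0):
        print(omega, round(deletion_effect(1.0, 1.0, 1.0, omega), 3))
```

With `omega = 1` the sites are independent and deleting site 2 leaves site 1 untouched; with `omega > 1` the deletion lowers site-1 occupancy, and with `omega < 1` it raises it. That site-by-site interaction is what such a model infers from deletion data, whether or not the assumed TF occupancies are ever measured directly.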