Saturday, June 6, 2015

Gene expression by the numbers, day 3: the breakfast club

(Day 0, Day 1, Day 2, Day 3)

So day 3 was… pretty wild! And inspiring. A bit hard to describe. There was one big session. The session had some dancing. A chair was thrown. Someone got a butt in the face. I’m not kidding.

How did such nuttiness come to pass? Well, today the 15 of us all gave exit talks, where each of us had the floor to discuss a point of our choosing. On the heels of the baseball game, we decided (okay, someone decided) that everyone should choose a walk-up song, and we’d play the song while the speaker made their way up for the exit talk. Later, I’ll post the playlist and the conference attendees and set up a matching game. The playlist was so good!

(Note: below is a fairly long post about various things we talked about. Even if you don’t want to read it all, check out the scientific Rorschach test towards the end.)

I was somehow up first. (See if you can guess my song. People in my lab can probably guess my song.) The question I posed was “does transcription matter?” More specifically, if I changed the level of transcription of a gene from, say, 196 transcripts per cell to 248 transcripts per cell, does that change anything about the cell? I think the answer depends on the context. That led me to my main point, which I kind of mentioned in an earlier post: (I think) we need strong definitions based on functional outcomes in order to shape how we approach studying transcriptional regulation. I personally think this means that we really need to have much better measurements of phenotype so we can see what the consequences are of, say, a 25% increase in transcription. If there is no consequence, then should we bother studying why transcription is 25% higher in one situation vs. the other? Along these lines, Mo Khalil made the point that maybe we can turn to experimental evolution to help us figure out what matters, and maybe that could help guide our search for what matters in regulation.

Barak raised another great point about definitions. He started his talk by posing the question “Can someone please give me a good definition of an enhancer?” In the ensuing discussion, folks seemed to converge on the notion that in molecular biology, definitions of entities are often vague and shaped much more by the experiments that we can do. Example: is an enhancer a stretch of DNA that affects a gene independently of its position? At a distance? These notions often come from experiments in which researchers move the enhancer around and find that it still drives expression. Yet from the quantitative point of view, the tricky thing with experimentally based definitions is that the underlying experiments were often qualitative. If moving the enhancer changes expression by 50%, then is that “location independent”?

Justin made an interesting point: can we come up with “fuzzy” definitions? Is there a sense in which we can build models that incorporate this fuzziness that seems to be pervasive in biology? I think this idea got everyone pretty excited: the idea of a new framework is tantalizing, although we still have no idea exactly what this would look like. I have to admit that personally, I’m not so sure that dispensing with the rigidity of definitions is a good thing–without rigid definitions, we run the risk of not saying anything useful and concrete at all. Perhaps having flexible definitions is actually similar to just saying that we can parametrize classes of models, with experiments eliminating some fraction of those model classes.

Jané brought in a great perspective from physics, saying that actually having a lot of arguments about definitions is a great thing. Maybe having a lot of competing definitions, with all of us trying to prove ours and contrast them with others, will eventually lead us to the right answer, whereas myopia in science can really lead to stagnation. I really like this thought. I feel like “big science” endeavors often fail to provide real progress because of exactly this problem.

The discussion of definitions also fed into a somewhat more meta discussion about interdisciplinary science and different approaches. Rob is strongly of the opinion that physicists should not need to get the permission of biologists to study biology, nor should they let biologists dictate what’s “biologically relevant”. I think this is right, and I also find myself often annoyed when people tell us what’s important or not.

Al made a great point about the role of theory in quantitative molecular biology. The point of theory is to say, “Hey, look at this, this doesn’t make sense. When you run the numbers, the picture we have doesn’t work–we need a new model.” Jané echoed this point, saying that at least with a model, we have something to argue about.

He also said that it would be great if we could formulate “no-go” models. Can we place constraints on the system in the abstract? Gasper put this really nicely: let’s say I’m a cell in a bicoid gradient trying to make a decision on what to do with my life. Let’s say I had the most powerful regulatory “computer” in the world in that cell. What’s the best that that computer could do with the information it is given? How precisely can it make its decision? How close do real cells get to this? I think this is a very powerful way to look at biology, actually.
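To make Gasper’s framing concrete, here’s a toy sketch of my own (not anything presented at the meeting): suppose a cell at one of two positions along a gradient reads out a single Gaussian-noisy concentration and must decide which position it’s at. The most powerful “regulatory computer” can do no better than the maximum-likelihood decoder, whose error rate has a closed form we can check against simulation. All numbers below are made up for illustration.

```python
import math
import random

def optimal_error_rate(c1, c2, sigma):
    """Error probability of the ideal (maximum-likelihood) decoder, which
    thresholds a single Gaussian-noisy readout halfway between the two
    mean concentrations c1 and c2. No real decoder can beat this."""
    d = abs(c2 - c1) / (2 * sigma)
    # P(error) = Phi(-d), written via the complementary error function
    return 0.5 * math.erfc(d / math.sqrt(2))

def simulated_error_rate(c1, c2, sigma, n=100000, seed=1):
    """Monte Carlo check: draw noisy readouts from cells at each position
    and apply the midpoint threshold."""
    rng = random.Random(seed)
    thresh = (c1 + c2) / 2
    errors = 0
    for _ in range(n):
        # a cell at position 1 mistakenly reads above threshold...
        if rng.gauss(c1, sigma) > thresh:
            errors += 1
        # ...or a cell at position 2 mistakenly reads below it
        if rng.gauss(c2, sigma) < thresh:
            errors += 1
    return errors / (2 * n)
```

The point of a bound like this is exactly the “no-go” use Gasper described: if real cells in a gradient decide more reliably than the bound allows for plausible noise levels, something in the picture (single readout, Gaussian noise, no averaging) must be wrong.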

Some of the discussions on theory and definitions brought up an important meta point relating to interdisciplinary work. I think it’s important that we learn to speak each other’s languages. I’ve very often heard physicists give a talk where they garble the name of a protein or something like that, and when a biologist complains, the response is sort of “well, whatever, it doesn’t matter”. Perhaps it doesn’t matter, but it can be grating to the ear, and the attitude can come across as somewhat disrespectful. I think that if a biologist were to give a talk and said “oh, this variable here called p… oh, yes, you call it h-bar, but whatever, doesn’t matter, I call it p”, it would not go over very well. I think we have to be respectful and aware of each other’s terminology, definitions, and worldview if we want to get each other to care about what we are both doing. And while I agree with Rob that physicists shouldn’t need permission to study biology, I also think it would be nice to have their blessings. Personally, I like to be very connected to biologists, and I feel like it has opened my mind up a lot. But I also think that’s a personal choice, perhaps informed by my training with Sanjay Tyagi, a biologist whom I admire tremendously.

Another point about communicating across fields came up in discussing synthetic biology approaches to transcriptional regulation. If you take a synthetic approach to regulatory DNA, you will often encounter fierce resistance that you’re studying a “toy model” and not the real system. The counter, which I think is a reasonable argument, is that if you study just the existing DNA, you end up throwing your hands in the air and saying “complexity, who knew!”. (One conferee even said complexity is a waste of time: it’s not a feature but rather a reflection of our ignorance. I disagree.) So the synthetic approach may allow us to get at the underlying principles in a controlled and rigorous manner. I think that’s the essence of mechanistic molecular biology: make a controlled environment and then see if we can boil something down to its parts. Sort of like working in cell extracts. I think this is a sensible approach and one that deserves support in the biological community–as Angela said, it’s a “hearts and minds” problem.

That said, personally, I’m not so sure that it will be so easy to boil things down to their parts–partly because it's clearly very hard to find non-regulatory DNA to serve as the "blank slate" for synthetic biology to work with. I'm thinking lately that maybe a more data-first approach is the way to go, although I weirdly feel quite strongly against this view at the same time (much more on this in a perspective piece we are writing right now in lab). But that world is fundamentally scary, and for many scientists, it may not be a world they want to live in. Let me subject you to a scientific Rorschach test:

Image from here
What do you see here?
  1. A catalog of data points.
  2. A rule with an exception.
  3. A best fit line that explains, dunno, 60% of the variance, p = 0.002 (or whatever).
If you said #1, then you live in the world of truth and fact, which is admirable. You are also probably no fun at dinner parties.

Which leads us to #2 vs. #3. I posit that worldview #2 is science as we traditionally know it. A theory is a matter of belief, and doesn’t have a p-value. It can have exceptions, which point to places where we need some new theory, but in and of itself, it is a belief that is absolute. #3 is a different world, one in which we have abandoned understanding as we traditionally define it (and there is little right now to lead us to believe that #3 will give us understanding like #2, sorry omics people).

I would argue that the complexity of biological regulation may force us out of #2 and into #3. At this meeting, I saw some pretty strong evidence that a simple thermodynamic model can explain a fair amount of transcriptional regulation. So is that a theory, a simple explanation that most of us believe? And we just need some additional theory to explain the exceptions? Or, alternatively, can we just embrace the exceptions, come up with some effective theory based on regression, and then say we’ve solved it totally? The latter sounds “wrong” somehow, but really, what’s the difference between that and the thermodynamic model? I don’t think that any of us can honestly say that the thermodynamic model is anything other than an effective representation of molecular processes that we are not capturing fully. So then how different is that from an SVM telling us there are 90 features that explain most of the variance? How much variance do you explain before it’s a theory and not a statistical model? 90%? How many features before it’s no longer science but data science? 10? I think that where we place these bars is a matter of aesthetics, but also defines in some ways who we are as scientists.
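For readers who haven’t seen one, here is a minimal sketch of what “a simple thermodynamic model” of regulation can look like, in the spirit of equilibrium simple-repression models; the specific function and parameter values below are my own illustrative assumptions, not results from the meeting.

```python
import math

def fold_change(R, delta_eps, N_ns=4.6e6):
    """Equilibrium 'simple repression' fold-change: the probability that
    the promoter is free of repressor, relative to the no-repressor case.

    R         -- repressor copy number per cell (illustrative)
    delta_eps -- repressor binding energy in kT, relative to nonspecific
                 genomic background (more negative = tighter binding)
    N_ns      -- number of nonspecific genomic binding sites (default is
                 roughly a bacterial genome length, an assumption here)
    """
    return 1.0 / (1.0 + (R / N_ns) * math.exp(-delta_eps))
```

The appeal is that a one-line formula with two or three interpretable parameters can be fit to expression measurements across repressor counts and binding-site strengths. Whether that makes it a “theory” in sense #2, or just a very small effective model in sense #3, is exactly the question above.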

Personally, I feel like complexity is making things hopeless and we have to have a fundamental rethink transitioning from #2 to #3 in some way. And I say this with utmost fear and trepidation, not to mention distaste. And I’m not so sure I’m right. Rob holds very much the opposite view, and we had a conversation in which he said, well, this field is messy right now and it might take decades to figure it out. He could be right. He also said that if I’m right, then it’s essentially saying that his work on finding a single equation for transcription is not progress. Did I agree that that was not progress? I felt boxed in by my own arguments, and so I had to say “Yeah, I guess that’s not progress”. But I very much believe that it is progress, and it’s objectively hard to argue otherwise. I don’t know, I’m deeply ambivalent on this myself.

Whew. So as you can probably tell, this conference got pretty meta by the end. Ido said this meeting was not a success for him, because he hadn’t come away with any tangible, actionable items. I agree and disagree. This meeting was sort of like The Breakfast Club. It was a bunch of us from different points of view, getting together and arguing, and over time getting in touch with our innermost hopes and anxieties. Here’s a quote from Wikipedia on the ending of the movie:
Although they suspect that the relationships would end with the end of their detention, their mutual experiences would change the way they would look at their peers afterward.
I think that’s where I am. I actually learned a lot about regulatory DNA, about real question marks in the field, and got some serious challenges to how I’ve been thinking about science these days. It’s true that I didn’t come away with a burning experiment that I now have to do, but I would be surprised if my science were not affected by these discussions in the coming months and years (in fact, I am now resolved to work out a theory together with Ian in the lab by the end of the summer).

At the end, Angela put up Ann Friedman’s Disapproval Matrix:

She remarked, rightly, that even when we disagreed, we were all pretty much in the top half of the matrix. I think this speaks to the level of trust and respect everyone had for each other, which was the best part of this meeting. For my part, I just want to say that I feel lucky to have been a part of this conference and a part of this community.

Walk-up song match game coming soon, along with a playlist!


  1. Thanks again for taking the time to write all this up - great coverage and commentary! This last day sounded super interesting, both in content and in organization. I hope others adapt this style for any meetings they organize!

    I agree with you that a good approach is to start with functional consequences of transcription and relate them to the different aspects of transcription. For example, for function at the organism, tissue, or cellular level - what are the constraints on:
    - dynamic range
    - speed of response
    - noise (extrinsic and intrinsic)
    - evolvability in a system of similar binding sequences (I think this is super important for eukaryotes) / robustness (still functions under mutation?)
    - channel capacity (related to quantities above - in Gasper's sense of how reliably it can decode a signal).

    We might find clues by looking at the evolutionary record, where we see new functions evolve at the organism/tissue/cellular level, and then look for corresponding changes to machinery of transcriptional control.

    As we work out these constraints, in my opinion it's not useful to distinguish the ways of learning the models (mechanism-based bottom-up vs. statistical learning, as you discussed with Rob and Barak) - the only things that matter are the model classes we are trying to learn. For example, models from equilibrium stat mech will all have a similar form - and maybe it doesn't matter if we measure the parameters first and then build the model (Rob's case) or learn the model first and then go find the proteins that give the parameters (Barak's case).

    If we can break the different model classes into their minimally complex / fewest-parameter forms, then for a given functional consequence we can ask which model class it requires. This requires the no-go approach that you discussed with Gasper. What is the minimal complexity we need to add to explain the new function? e.g. going up one rung in the model class hierarchy.

    If we think about it this way - function first, then the minimally complex models needed to explain the function - then we also do away with the problem of different definitions. The words become irrelevant - only the different model classes that explain the behavior matter.

    Perhaps in developing this approach we need to bring in more people from mathematics.

    Anyway, thanks for the great coverage and look forward to discussing more with you next time you're here.

    - Tony Kulesa

    1. Hi Tony,
      I really like these ideas about functional consequences. I'm not sure how exactly to figure out what these constraints are. I think one big issue is that we have a very poor knowledge of phenotype in most model organisms. I think that's a big part of what we need to do moving forward. Lucas Pelkmans has some interesting work on this, I think.

      I still feel like there is something important about the differences between the mechanistic and statistical approaches. They are important for what we think science is all about. Although perhaps this distinction has less practical relevance.

      Anyway, thanks so much for the thoughtful comments! Sorry we didn't get a chance to hang out (busy couple of days), but maybe next time.

  2. not to be picky, but am actually curious about the intended ending to the sentence starting:
    "we run the risk of ..."

    it seems to be hanging in space.

    thanks, m

  3. I think that the difference between #2 and #3 is less in (i) the size of the model and the number of explanatory variables and more in (ii) the extent to which the model is based on a plausible physical mechanism and gives us a satisfying conceptual understanding (e.g., thermodynamics), not just a computational/predictive tool that captures measured variance (e.g., SVM). The two factors (i and ii) are not independent since very large models tend to distract from a simple and pleasing conceptual underpinning, but they are distinct. I agree that #2 is much more desirable. Since #2 is very hard in the context of large, incompletely measured and poorly understood systems, many researchers go for #3 since it seems more interesting than #1 alone.

  4. Thanks again for a great and provocative summary. It sounds like a successful meeting, in the sense that it helped to clarify questions and disagreements in the field, even if Ido and others don't have anything tangible to bring into the lab.

    You raise a great point that all of us need to consider more - what transcriptional differences matter? Sometimes we get lucky and have a system where a good phenotype is coupled with an interesting set of enhancers/promoters.

    However, I'd argue that we can still make important progress without answering that question. If we have a model/mechanism/theoretical understanding that lets us accurately identify enhancers from their sequence, and accurately predict the transcriptional effects of mutating/adding/subtracting binding sites to native cis-regulatory elements in the genome, then we've succeeded in answering the question, how is cis-regulatory function encoded in DNA sequence? We may not know how much cis-regulatory function is phenotypically important, but we will understand how different sequences encode different patterns of transcription.

    1. I agree that measuring transcription is a reasonable midway point between genotype and phenotype, and effects on transcription may qualify as a phenotype in and of itself. I also think it's hard to escape the fact that we still don't know in what context the level of transcription matters. I think it does, but differently in different situations.

    I think the good thing about a mechanistic model is that it can give you a set of assumptions that you can experimentally test for, which, if true, would imply that your model is correct (e.g., the equivalence principle in general relativity). So it's easy to define a domain of applicability for a model and identify when the model may start to break down.

    With statistical models like SVMs, PGMs, etc., you often don't get that.

    1. Dunno, is it not possible to test a statistical model? Perhaps not quite as satisfying, but should still be possible, no?