Monday, May 6, 2019

Wisdom of crowds and open, asynchronous peer review

I am very much in favor of preprints and open review, but something I listened to on Planet Money recently gave me some food for thought, along with a recent poll I tweeted about re-reviewing papers. The episode was about wisdom of the crowds, and how magically if you take a large number of non-expert guesses about, say, the weight of an ox, the average comes out pretty close to the actual value. Pretty cool effect!

But something in the podcast caught my ear. They talked about how, when they asked some kids, you had to watch out: once one kid said, say, 300 pounds (wildly inaccurate), then if the other kids heard it, they would all start saying 300 pounds. Maybe with some minor variations, but the point is that they were strongly influenced by that initial guess rather than picking something essentially at random. The thing is that if you have no point of reference, then even a guess provides that point of reference.
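Just to make the fragility of the averaging magic concrete, here's a quick toy simulation (all the numbers are made up): independent guessers versus guessers anchored on that first wildly-off 300 pound guess.

```python
import random

random.seed(0)
TRUE_WEIGHT = 1200  # pretend the ox actually weighs 1200 pounds

# Independent guessers: individually way off, but unbiased, so the average lands close.
independent = [random.gauss(TRUE_WEIGHT, 400) for _ in range(1000)]

# Anchored guessers: everyone heard "300 pounds" first and sticks near it.
anchored = [random.gauss(300, 50) for _ in range(1000)]

print("independent average:", round(sum(independent) / len(independent)))  # ~1200
print("anchored average:   ", round(sum(anchored) / len(anchored)))        # ~300
```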

Okay, so what does this have to do with peer review? What got me thinking about it was the tweet about re-reviewing a paper you had already seen but for a different journal. I'm like nah not gonna do it because it's a waste of time, but some people said, well, you are now biased. So… in a world where we openly and asynchronously review papers (preprints, postpub, whatever), we would have the same problem that the kids guessing the weight of the ox did: whoever gives the first opinion would potentially strongly influence all subsequent opinions. With conventional peer review, everyone does it blind to the others, and so reviews could be considered more independent samplings (probably dramatically undersampled, but that's another blog post). But imagine someone comments on a preprint with some purported flaw. That narrative is very likely to color subsequent reviews and discussions. I think we've all seen this coloring: take eLife collaborative peer review, or even grant review. Everyone harmonizes their scores, and it's often not an averaging. One could argue that unlike randos on the internet guessing an ox's weight, peer reviewers are all experts. Maybe, but I am somehow not so sure that, once we are in the world of experts reviewing what is hopefully a reasonably decent paper, there's much signal beyond noise.

What could we do about this? Well, we could commission someone to hold all the open reviews in confidence and then publish them all at once… oh wait, I think we already have some annoying system for that. I dunno, not really sure, but anyway, it's something I was wondering about recently; thoughts welcome.

Sunday, April 28, 2019

Reintegrating into lab following a mental health leave

[From AR] These days, there is a greatly increased awareness and decreased stigmatization of mental health issues amongst trainees (and faculty, for that matter), which is great. For mentors, understanding mental health issues amongst trainees is super important, and something we have until recently not gotten a lot of training on. More recently, it is increasingly common to get some training or at least information on how to recognize the onset of mental health issues, and in graduate groups here at Penn at least, it is fairly straightforward to initiate a leave of absence to deal with the issue, should that be required. However, one aspect of handling mental health leaves for which there appears to be precious little guidance out there is what challenges trainees face when returning from a mental health leave of absence, and what mentors might do about it. Here, I present a document written by four anonymous trainees with some of their thoughts (and I will chime in at the end with some thoughts from the mentor perspective).


[From trainees] This article is a collection of viewpoints from four trainees on mental health in academia. We list a collection of helpful practices on the part of the PI and the lab environment in general for cases when the trainees return to lab after recovering from mental health issues.

A trainee typically returns either because they feel recovered and ready to get back to normalcy, or because they are feeling better than before and have self-imposed goals (e.g. finishing their PhD), or because they just miss doing science. Trainees in these situations are likely to have spent time introspecting on multiple fronts, and they often return with renewed drive. However, it is very difficult to shake off the fear of recurrence of the episode (here we use episode broadly to refer to a phase of very poor mental health), which can make trainees more vulnerable and sensitive to external circumstances than the average person; for instance, minor stresses can appear much larger. In particular, an off day after a mental health episode can make one think they are already slipping back into it. In some cases, students may find it more difficult to start a new task, perhaps due to a latent fear of not being able to learn afresh. Support from the mentor and the lab environment in general can be crucial in both building and sustaining the confidence of the trainee. It is important that the mentor recognize that the act of returning to the lab is an act of courage in itself. The PI’s interactions with the trainee have a huge bearing on how the trainee re-integrates into their work. Here are some steps that we think can help:

Explicitly tell trainees to seek the PI out if they need help. This can be important for all trainees to hear, because the default assumption is that these are personal problems to be dealt with entirely on one's own. In fact, advisors should do this with every trainee -- explicitly tell them that they can reach out, should their mental health be compromised or affected in any way. Restating this to a returning trainee can help create a welcoming and safe environment.

Reintegrating the trainee into the lab environment. The PI should have an open conversation with the trainee about how much information they want divulged to the rest of the group/department, and how they communicate the trainee’s absence to the group, if at all.

Increased time with the mentee. More frequent meetings with a returning student for the first few months help immensely, for multiple reasons: a. They can help quell internal fears through regular reinforcement; b. They can get the student back on track with their research faster; c. Academically stimulating conversations can provide the gradual push needed to get back to thinking at the level they were used to before the episode. Having said that, individuals have their own preferred ways of handling re-entry, and a frank conversation about how they want to proceed goes a long way.

Help rebuild the trainee’s confidence. One of the authors of this post recounts her experience of getting back on her feet. Her advisor unequivocally told her: “Your PhD will get done; you are smart enough. You just need to work on your mental health, and I will work with you to make that the first priority.” Words of encouragement can go a long way -- there is ample anecdotal evidence that people can fully recover from mental health episodes if proper care is taken by all stakeholders.

Create small, well-defined goals or team goals. One of the authors of this article spent her first few months working on a fairly easy and straightforward project with a clear message, one that was easy to keep pushing on as she settled into lab again. While this may not be the best way forward for everyone, depending on where they are with their research, a clearly defined goal can come as a quick side project, or as a deliberate breaking-down of a large project into very actionable smaller ones. Another alternative is to allow the trainee to work with another student or postdoc, which allows constant back-and-forth and quicker validation, leaving less room for self-doubt.

Remember that trainees may need to come back for a variety of other reasons as well. There are costs associated with a prolonged leave of absence, and some trainees may need to come back before they are totally done with their mental health work. It's likely that some time needs to be set aside to continue that work, and it's helpful if PIs can work with students to accommodate that, within reason.

Finally, it is important for all involved parties to realize that the job of a PI is not to be the trainee’s parent, but to help the student along in their professional journey. Facilitating a lab environment where one feels comfortable, respected, and heard goes a long way, even if that means going an extra mile on the PI’s part to ensure such conditions, case-by-case.

[Back to AR] Hopefully this article is helpful for mentors and also for trainees as they try to reintegrate into the lab. For my part as a mentor, I think that a little extra empathy and attention can go a long way. I think it's important for all parties to realize that mentors are typically not trained mental health professionals, but some common sense guidelines could include increased communication, reasonable expectations, and in particular a realization that tasks that would have seemed quite easy for a trainee to accomplish before might be much harder at first, especially anything out of the usual comfort zone, like a new technique.

Comments more than welcome; it seems this is a relatively under-reported area. And a huge thank you to the anonymous writers of this letter for starting the discussion.

Monday, February 18, 2019

Dear me, I am awesome. Sincerely, me… aka How to write a letter of rec for yourself

Got an email from someone who got asked to write a letter for themselves by someone else and was looking for guidance… haha, now that PI has made work for me! :) Oh well, no problem, I actually realize how hard this is for the letter drafter, and it’s also something for which there is very little guidance out there for obvious reasons. So I thought I’d make a little guide. Oh, first a couple things. First off, I don’t really know all that much about doing this, having written a few for myself and having asked for a couple, so comments from others are most welcome. Secondly, if you’re one of those sanctimonious types who thinks the PIs should write every letter and never ask for a draft, well, this blog post is probably not for you so don’t bug me about it. Third, if the PI is European, maybe just like turn everything down a notch, ya know? ;)

Anyhoo: so I figure the best way to describe how to do this is to describe how I write a letter. I’ll aim it at how I write letters for, say, a former trainee applying for a postdoc fellowship, maybe with some notes about how this might change for faculty applying for some sort of award or something.

Okay. I usually use the first paragraph to give an executive summary. Here’s an example of what I might write:
“It is my pleasure to provide my strongest possible recommendation for Dr. Nancy Longpaper. Nancy is simply an incredible scientist: she has developed, from scratch and by combining both experimental and computational skills, a system that has led to fundamental new insights into the evolution of frog legs. She has all the tools to become a stellar independent scientist and a superstar in her field: talent, intellectual brilliance, work ethic, and a raw passion for science. I look forward to watching her career unfold in the coming years.”
Or whatever, something like that. The key part of this that you will want to leave blank is the first sentence, i.e., the “strongest possible recommendation” part. That’s an important part that the letter writer will fill in.

Okay, second (optional) paragraph. This one depends a bit on personality. For some letter writers, they like to include a bit about how awesome they are and thus how qualified they are to write the letter. This is important for things like visas and so forth. This could be something like “First, I would like to introduce myself and my expertise. My laboratory studies XYZ, and I am an expert in ABC. I have published several peer reviewed articles in renowned journals such as Proceedings of the Canadian Horticultural Society B and our work has been continuously funded by the NIH.” I personally don’t include things like this for regular (non-visa) recommendations, but I have seen it.

Third paragraph: I usually try and put in some context about how I met the person I’m recommending. Like, “I first met Nancy when she was looking for labs to rotate in. She rotated in my lab and worked on project ABC. Even in her short time in the lab, she managed to accomplish XYZ. I immediately offered her a spot, and while I was disappointed for her to join Prof. Goodgrant’s lab, I was very pleased when she asked me to chair her thesis committee.” If you are a junior PI, this might be replaced with something about how the letter writer knows about your work and any interactions you may have had.

Next several paragraphs: a bunch of scientific meat. This is where you are REALLY going to save your letter writer some time. I usually break it into two parts. First paragraph or two, I describe the person’s work. What specifically did they do? PROVIDE CITATIONS, including journal names. Sorry, they matter, too bad. Try and aim for a very general audience, stressing primarily the impact of the findings. But if you don’t, don’t worry, people probably either know the work already or not. Still, try. Emphasize specific contributions. Like, “Nancy herself conceived of the critical set of controls that was required to establish the now well accepted ‘left leg bias estimator’ statistical methodology that was the key to making the discovery that XYZ.” At all times, emphasize why what you did was special. Don’t be shy! If you’re too ridiculous, don’t worry, your letter writer will fix it.

Next part of the science-meat section: in my letters, I usually try and zoom out a bit. Like, what are the specific attributes of the person that led them to be successful in the aforementioned science. Like, “This is a set of findings that only someone of Nancy’s caliber could have discovered. Her intellectual abilities and broad command of the literature enabled her to rapidly ask important questions at the forefront of the field…” Be careful to emphasize big picture important qualities and not just list out your specific skills here. Like, don’t say “Nancy was really good at qPCR and probably ran about 4.32 million of them.” Makes you sound like a drone. At the trainee level, something about how rapidly you picked up skills could be good, but definitely not at the junior faculty level. Just try and be honest about the qualities you have that you think are most important and relevant. Be maybe a little over the top but not too crazy and then maybe your letter writer will embellish as needed.

Second to last paragraph: I try and fill in a bit more personal characteristics here. Like, what are the personal qualities that helped them shine. E.g. “Nancy also is an excellent communicator of her science, and already has excellent visibility. She gives great talks and has generated a lot of enthusiasm……” Also, if relevant, can add the standard “On a personal note, Nancy is a wonderful person to have in the lab……” Probably like 4-5 sentences max. Make it sound like you belong at the level you are applying for. If it’s for a faculty position, make it sound like you are faculty, not a student.

Finally, I end my letters with an “In sum, Nancy is the perfect candidate for XYZ. I have had the privilege of watching many star scientists develop into independent scientists in this field at top institutions over the years, and I consider Nancy to be of that caliber. I cannot recommend her more strongly.” This one can be sort of a skeleton and the letter writer can fill this in with whatever gushy verbiage they want. For some things, there might be some sort of “comparables” statement here that they can put in if they want.

Tips:
  • Don’t ever say anything bad. If you say something bad, it’s a huge red flag. If the letter writer wants to say something bad, they will. That would be a pretty jerky thing to do, though.
  • Length: There are three things that matter in a letter: the first paragraph, the last paragraph, and how long the letter is in between. For a postdoc thingy, aim for 1.5-2 pages for a strong letter. 2-3 for faculty positions. 1-2 for other stuff after that.
  • Duplication: What do you do if two letter writers ask for a draft? Uhhhh… not actually sure. I have tried to make a few edits, but sometimes I just send it and say hey already sent this and they can kinda edit it up a bit. I dunno, weird situation.
Anyway, that’s my template for whatever it’s worth, and comments welcome from anyone who knows more!

Sunday, February 3, 2019

The sad state of scientific talks (and a thought on how we might help fix it)

Just got back from a Keystone meeting, and I’m just going to say it (rather than subtweet it): most of the talks were bad. I don’t mean to offend anyone, and certainly it was no worse than most other conferences, but come on. Talks that run over time, filled with jargon and unexplained data incomprehensible to those even slightly outside the field, long rambling introductions… it doesn’t have to be this way, people! Honestly, it also raises the question of why people bother going to these meetings just to play around on their computers because the talk quality is so poor. I’ve heard so many people say the informal interactions are the most useful thing at conferences. I actually think this is partly because the formal part is so bad.

Why? After all, there are endless resources out there on how to give a good talk. While some tips conflict (titles? no titles? titles? no titles?), mostly they agree on some basic tenets of slide construction and presentation. I wrote this blog post with some tips on structuring talks and also links to a few other resources I think are good. And most graduate programs have at least some sort of workshop or something or other on giving a talk. So why are we in this situation?

I think the key thing to realize is that giving a good talk actually requires working on your talk. A good talk requires more than taking a couple minutes to throw some raw data onto a slide and winging it with how you present that data. For most of us, when we write a paper, it is a long iterative process to achieve clarity and engagement. Why would a talk be any different? (Oh, and by the way, practice is critical, but is not in and of itself sufficient—have to work on the right things; see aforementioned blog post.)

I think the fundamental issue is the nature of feedback and incentives for giving research talks. Without having these structured well, there is little push to do the work required to make a talk good, and they are currently structured very poorly. For incentives, the biggest problem is that the structure to date is all about what you don’t get in the long term, which are often things you don’t know you could get in the first place. Giving a good talk has huge benefits and opens the door to various opportunities long term, but it’s not like someone is going to tell you, “Hey, I had this job opening, but I’m not going to tell you about it now because your talk stunk." Partly, the issue is that the visible benefits of good presentations are often correlated to some extent with brilliance. Take, for instance, Michael Elowitz’s talk at this conference, which my lab hands down voted as the best talk of the conference. Amazing science, clear, and exciting. Michael is a brilliant and deservedly highly successful scientist. Does it help that he is an excellent communicator of his work? Of course! To what extent? I don’t know. What I can say is that many of the best scientists presented their work very well. Where do cause and effect begin and end? Hard to say, but it’s clearly not an independent variable.

Despite this correlation, I still firmly believe that you don’t have to be Michael Elowitz-level brilliant to give a great talk. So then why are all these talks so bad? The other element beyond vague incentives is feedback. The most common feedback, regardless of anything about the talk you give, is “Hey, great talk!” Maybe, if you really stunk it up, you’ll get “interesting talk”. And that’s about it. I have many times gotten “Hey, great talk” followed by a question demonstrating that I totally did a terrible job explaining things. I mean, how is anybody ever going to get better if they don’t even get a thumbs-up/down on their presentation? The reason we don’t get that feedback is obviously the social awkwardness of telling someone that something they did publicly was bad. The main place where people feel safe to give feedback is in lab meeting, which while somewhat helpful is also one of the worst places to get feedback. Asking a bunch of people already intimately familiar with your story and conversant in your jargon about what is clear or not is not going to get you all that far, generally. Also, the person with the most authority in that context (the PI) probably also gives terrible talks and so is not a good person to get feedback from. (Indeed, I have heard many, many stories of PIs actively giving their trainees bad advice.) Generally, the fact that most of the people you are getting feedback from aren’t themselves good at giving talks is a big problem.

Okay, fine…

WHAT CAN WE DO ABOUT IT?

Again, I think the key missing element is honest feedback—I think most talk-givers don’t even realize just how bad their talks are. As I said, few people are going to tell someone to their face that their talk sucks. So how about the following: what if people preregister their talk on a website, and then people can anonymously submit a rating with comments? Basically like a teacher rating, but for speakers at a conference. You could even provide the link to the rating website on the first slide of your talk or something. This would have a number of advantages. First off, if you don’t want to do it, fine, no problem. Second, all feedback is anonymous, thus allowing people to be honest. Also, the comments allow people to give some more detailed feedback if they so choose. And, there is a strong positive incentive. With permission, you could have your average rating posted. This rating could be compared to e.g. the overall average, and if it’s good—which presumably it is if you decided to share it :)—then that’s great publicity, no?

One problem with this, though, is it doesn’t necessarily provide specific feedback. Like, what was clear or not? Comments could provide this to some extent. Also, if you, as the speaker, are willing, you could even imagine posting some questions related to your talk and seeing how well people got those particular points. Of course completely optional and just for those who really care about improving. Which should be all of us, right? :)

Oh, and one suggestion from Rita Strack was to promote the 15 minute format, which is short enough to either require concision and clarity, or, should that not happen, is over fast! :)

Some suggested (e.g. Katie Whitehead) that we incentivize good talks by doing Skype interviews or having them submit YouTubes, etc. for contributed talks. In principle I like this, but I think it's just a LOT of work and also conflates scientific merit with presentation merit, so people who don't get a spot have something other than their presentation skills to blame. Still, could work maybe.

Another, perhaps more radical idea, is to do away with the talk format entirely. Most scientists are far more clear when answering questions (probably for the simple reason that the audience drives it). Perhaps we could limit talks to 5 minutes followed by some sort of structured Q&A? Not sure how to do that exactly, but anyway, a thought.

Anybody want to give this a try?

Wednesday, August 8, 2018

On mechanism and systems biology

(Latest in a slowly unfolding series of blog posts from the Paros conference.)


Mechanism. The word fills many of us with dread: “Not enough mechanism.” “Not particularly mechanistic.” "What's the mechanism?" So then what exactly do we mean by mechanism? I don’t think it’s an idle question—rather, I think it gets down to the very essence of what we think science means. And I think there are some practical consequences for everything from how we report results to the questions we may choose to study (and consequently to how we evaluate science). So I’ll try and organize this post around a few concrete proposals.

To start: I think the definition I’ve settled on for mechanism is “a model for how something works”.

I think it’s interesting to think about how the term mechanism has evolved in our field from something that really was mechanism once upon a time into something that is really not mechanism. In the old days, mechanism meant figuring out e.g. what an enzyme did and how it worked, perhaps in conjunction with other enzymes. Things like DNA polymerase and ATP synthase. The power of the hard mechanistic knowledge of this era is hard to overstate.

What can we learn about the power of mechanism/models from this example?

As the author of this post argues, models/theories are “inference tickets” that allow you to make hard predictions in completely new situations without testing them. We are used to thinking of models as being written in math and making quantitative predictions, but this need not be the case. Here, predictions of how these enzymes function have led to, amongst other things, our entire molecular biology toolkit: add this enzyme, it will phosphorylate your DNA; add this other enzyme, it will ligate that to another piece of DNA. That these enzymes perform certain functions is a “mechanism” that we used to predict what would happen if we put these molecules in a test tube together, and those predictions largely bore out, with huge practical implications.

Mechanisms necessarily come with a layer of abstraction. Perhaps we are more used to talking about these in models, where we have a name for them: “assumptions”. Essentially, there is a point at which we say, who knows, we’re just going to say that this is the way it is, and then build our model from there. In this case, it’s that the enzyme does what we say it will. We still have quite a limited ability to take an unknown sequence of amino acids and predict what it will do, and certainly very limited ability to take a desired function and just write out the sequence to accomplish said function. We just say, okay, assume these molecules do XYZ, and then our model is that they are important for e.g. transcription, or reverse transcription, or DNA replication, or whatever.

Fast forward to today, when a lot of us are studying biological regulation, and we have a very different notion of what constitutes “mechanism”. Now, it’s like oh, I see a correlation between X and Y, the reviewer asks for “mechanism”, so you knock down X and see less Y, and that’s “mechanism”. Not to completely discount this—I mean, we’ve learned a fair amount by doing these sorts of experiments, but I think it’s pretty clear that this is not sufficient to say that we know how it works. Rather, this is a devolution to empiricism, which is something I think we need to fix in our field.

Perhaps the most salient question is what it means to know “how it works”. I posit that mechanism is an inference that connects one bit of empiricism to another. Let’s illustrate in the case of something where we do know the mechanism/model: a lever.






“How it works” in this context means that we need a layer of abstraction, and have some degree of inference given that layer of abstraction. Here, the question may be “how hard do I have to push to lift the weight?”. Do we need to know that the matter is composed of quarks to make this prediction, or how hard the lever itself is? No. Do we need to know how the string works? No. We just assume the weight pulls down on the string and whatever it’s made of is irrelevant because we know these to be empirically the case. We are going to assume that the only things that matter are the locations of the weight, the fulcrum, and my finger, as well as the weight of the, uhh, weight and how hard I push. This is the layer of abstraction the model is based on. The model we use is that of force balance, and we can use that to predict exactly how hard to push given these distances and weights.
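To make that concrete, here's a minimal sketch of the lever model in code (the function name and all the numbers are made-up illustrations); the "force balance" here is really a torque balance about the fulcrum:

```python
def force_to_lift(weight, d_weight, d_finger):
    """Law of the lever: torques about the fulcrum must balance,
    so weight * d_weight = force * d_finger."""
    return weight * d_weight / d_finger

# Made-up example: a 10 kg weight 0.5 m from the fulcrum,
# with my finger pushing 2 m from the fulcrum on the other side.
print(force_to_lift(weight=10, d_weight=0.5, d_finger=2.0))  # 2.5 kg-equivalents of force
```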

How would a modern data scientist approach this problem? Probably take like 10,000 levers and discover Archimedes' Law of the Lever by making a lot of plots in R. Who knows, maybe this is basically how Archimedes figured it out in the first place. It is perhaps often possible to figure out a relationship empirically, and even make some predictions. But that’s not what we (or at least I) consider a mechanism. I think there has to be something beyond pure empiricism, often linking very disparate scales or processes, sometimes in ways that are simply impossible to investigate empirically. In this case, we can use the concepts of force to figure out how things might work with, say, multiple weights, or systems of weights on levers, or even things that don’t look like levers at all. Wow!
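And, just for contrast, here's a sketch of the "take 10,000 levers" version (again, everything here is made up): simulate a pile of lever experiments and recover the law purely by fitting, with no notion of force or torque anywhere in sight.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulate 10,000 "lever experiments" with random weights and distances,
# recording the (slightly noisy) force that was just enough to lift the weight.
weight = rng.uniform(1, 100, n)
d_weight = rng.uniform(0.1, 1.0, n)
d_finger = rng.uniform(0.5, 3.0, n)
force = weight * d_weight / d_finger * rng.normal(1.0, 0.05, n)

# Fit log(force) against log(weight), log(d_weight), log(d_finger):
X = np.column_stack([np.log(weight), np.log(d_weight), np.log(d_finger), np.ones(n)])
coef, *_ = np.linalg.lstsq(X, np.log(force), rcond=None)
print(np.round(coef[:3], 2))  # ~[1, 1, -1]: the law of the lever, rediscovered with no physics
```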

Okay, so back to regulatory biology. I think one issue that we suffer from is that what we call mechanism has moved away from true “how it works” models and settled into what is really empiricism, sort of without us noticing it. Consider, for instance, development. People will say, oh, this transcription factor controls intestinal development. Why do they say that? Well, knock it out and there’s no intestine. Put it somewhere else and now you get extra intestine. Okay, but that’s not how it works. It’s empirical. How can you spot empiricism? One telltale is an excessive obsession with statistics: effect sizes and p-values are often a sign that you didn’t really figure out how it works. Another sign is that we aren’t really able to apply what we learned outside of the original context. If I gave you a DNA typewriter and said, okay, make an intestine, you would have no idea how to do it, right? We can make more intestine in the original context, but the domain of applicability is pretty limited.

Personally, I think that these difficulties arise partially because of our tools, but mostly because we are still focused on the wrong layers of abstraction. Probably the most common current layers of abstraction are those of genes/molecules, cells, and organisms. Our most powerful models/mechanisms to date are the ones where we could draw straight lines connecting these up. Like, mutate this gene, make these cells look funny, now this person has this disease. However, I think these straight lines are more the exception than the norm. Mostly, I think these mappings are highly convoluted in interwoven systems, making it very hard to make predictions based on empiricism alone (future blog post coming on the Omnigenic Model to discuss this further).

Which leads me to a proposal: let’s start thinking about other layers of abstraction. I think that the successes of the genes/molecules -> cells paradigm have led to a certain ossification of thought centered around thinking of genes and molecules and cells as being the right layers of abstraction. But maybe genes and cells are not such fundamental units as we think they are. In the context of multicellular organisms, perhaps cells themselves are passive players, and rather it is communities of cells that are the fundamental unit. Organoids could be a good example of this, dunno. Also, it is becoming clear that genetics has some pretty serious limits in terms of determining mechanism in the sense I’ve defined. Is there some other layer involving perhaps groups of genes? Sorry, not a particularly inspired idea, but whatever, something like that maybe. Part of thinking this way also means that we have to reconsider how we evaluate science. As Rob pointed out, we have gotten so used to equating “mechanism” to “molecules and their effects on cells” that we have become closed-minded to other potential types of mechanism while also deceiving ourselves into allowing empiricism to pose as mechanism under the guise of statistics. We just have to be open to new abstractions and not hold everyone to the "What's the molecule?" standard.

Of course, underlying this is an open question: do such layers of abstraction that allow mechanism in the true sense exist? Complexity seems to be everywhere in biology, and my reaction so far has been to just throw up my hands and say “it’s complicated!”. But (and this is another lesson learned from Rob), that’s not an excuse—we have to at least try. And I do think we can find some mechanistic wormholes through the seemingly infinite space of empiricism that we are currently mired in.

Regardless of what layers of abstraction we choose, however, I think that it is clear that a common feature of these future models will be that they are multifactorial, meaning that they will simultaneously incorporate the interactions of multiple molecules or cells or whatever the units we choose are. How do we deal with multiple interactions? I’m not alone in thinking that our models need to be quantitative, which as noted in my first post, is an idea that’s been around for some time now. However, I think that a fair charge is that in the early days of this field, our quantitative models were pretty much window dressing. I think (again a point that I’ve finally absorbed from Rob) that we have to start setting (and reporting) quantitative goals. We can’t pick and choose how our science is quantitative. If we have some pretty model for something, we better do the hard work to get the parameters we need, make hard quantitative predictions, and then stick to them. And if we don’t quantitatively get what we predict, we have to admit we were wrong. Not partly right, which is what we do now. Here’s the current playbook for a SysBio paper: quantitatively measure some phenomenon, make a nice model, predict that removal of factor X should send factor Y up by 4x, measure that it went up 2x, and put a bow on it and call it a day. I think we just have to admit that this is not good enough. This “pick and choose” mix of quantitative and qualitative analyses is hugely damaging because it makes it impossible to build upon these models. The problem is that qualitative reporting in, say, abstracts leads to people seeing “X affects Y” and “Y affects Z” and concluding “thus, X affects Z” even though the effects for X on Y and Y on Z may be small enough to make this conclusion pretty tenuous.
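To make the danger of that chaining concrete, here's a toy example (a simple linear chain with made-up numbers, not anyone's real data): if X explains only part of the variation in Y, and Y only part of the variation in Z, the implied effect of X on Z shrinks multiplicatively.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy linear chain X -> Y -> Z where X explains ~40% of the variance in Y
# and Y explains ~40% of the variance in Z.
x = rng.normal(size=n)
y = np.sqrt(0.4) * x + np.sqrt(0.6) * rng.normal(size=n)
z = np.sqrt(0.4) * y + np.sqrt(0.6) * rng.normal(size=n)

def r2(a, b):
    return np.corrcoef(a, b)[0, 1] ** 2

print(r2(x, y), r2(y, z), r2(x, z))  # ~0.4, ~0.4, ~0.16: "X affects Z" is a much weaker claim
```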

So I have a couple proposals. One is that in abstracts, every statement should include some sort of measure of the percentage of effect explained by the putative mechanism. I.e., you can’t just say “X affects Y”. You have to say something like “X explains 40% of the change in Y”. I know, this is hard to do, and requires thought about exactly what “explains” means. But yeah, science is hard work. Until we are honest about this, we’re always going to be “quantitative” biologists instead of true quantitative biologists.

Also, as a related grand challenge, I think it would be cool to try and be able to explain some regulatory process in biology out to 99.9%. As in, okay, we really now understand in some pretty solid way how something works. Like, we actually have mechanism in the true sense. You can argue that this number is arbitrary, and it is, but I think it could function well as an aspirational goal.

Any discussion of empiricism vs. theory will touch on the question of science vs. engineering. I would argue that—because we’re in an age of empiricism—most of what we’re doing in biology right now is probably best called engineering. Trying to make cells divide faster or turn into this cell or kill that other cell. And it’s true that look, whatever, if I can fix your heart, who cares if I have a theory of heart? One of my favorite stories along these lines is the story of how fracking was discovered, which was purely by accident (see Planet Money podcast): a desperate gas engineer looking to cut costs just kept cutting out an expensive chemical and seeing better yield until he just went with pure water and, voila, more gas than ever. Why? Who cares! Then again, think about how many mechanistic models went into, e.g., the design of the drills, transportation, everything else that goes into delivering energy. I think this highlights the fact that just like science and engineering are intertwined, so are mechanism and empiricism. Perhaps it’s time, though, to reconsider what we mean by mechanism to make it both more expansive and rigorous.

Monday, August 6, 2018

The biologist's arrow


Guest post by Caroline Bartman

How do we understand biology? “Mutant IDH2 → 2-hydroxyglutarate → hypermethylation → cell proliferation (?),” I scribbled at the top of a paper I read this week. My mind requires linear relationships, direct chains of cause and effect, to retain the findings of a paper I read.

Evidence suggests that this is not how biology in general operates. For example, Pritchard’s ‘omnigenic theory’ synthesizes many years of work to show that most polymorphisms contribute to the total phenotype in a significant but barely detectable way. Identifying each genetic variant that contributes to a phenotype requires many years of costly effort and will culminate with a long list of polymorphisms that incrementally contribute to a phenotype. (Exceptions to this rule, such as PCSK9, are valuable but rare.) Not only are most contributions minuscule (median contribution of significant height SNPs is 0.00143 meters according to Pritchard), but many polymorphisms play a role in a wide range of traits, by influencing broadly expressed genes. Our search for cause → effect reveals a tangled thicket of partial causes and modest effects.

Human genetic studies are not the only realm where such complexity dominates. We perform RNA sequencing of wild-type and knockout cells, find a thousand differentially expressed genes, and then focus on a single target gene. We do a screen and follow up on a single hit. It boggles the mind to understand that all of the hits, probably even some below the significance threshold, contribute to that biological process every time it occurs. So we ignore this tangle in order to tell a story, to write a paper, to give a talk that other scientists will appreciate.

This struggle to understand continues as we try to finish a study. Many scientific projects reach an uncomfortable stage where we have a phenotype in hand, a dramatic finding with some relevance to an open biological question, but we require a bit of mechanism for the last figure. (We use the phrase ‘bit of mechanism’ with a half-ashamed laugh.) A bit of mechanism? A handle to give readers, to reassure them that biology is not random, there is a reason for our finding, there is ultimately something to understand? How many of these last figure gambits are quickly abandoned by the relevant subfield as future studies fail to support these ‘mechanisms,’ or change their interpretation beyond recognition?

How do we as humans with limited intelligence, limited bandwidth, limited attention span understand complex biological processes?

Does understanding biology even matter? Don’t we do biology to help patients, to solve problems, to cure disease? But one of the most attractive things about biology for me was that there is a truth outside oneself. Unlike consulting, or writing, or reporting, which are all ways humans can talk about humans, or operate in artificial systems constructed by humans, I believed that science was the way to escape from navel-gazing, the way out of the closed loop. It is not all about humans and feelings and opinions! There are truths outside our selves that we can understand! Just look at ribosomes, or whales, or frogs, or the lac operon and you see a truth that does not require humans as an origin but that humans could find a logic behind. But can we actually understand that logic?

This concern does not lend itself well to selecting and starting a new biological project. The papers that are most beautiful and elegant to me are the simplest. But they leave me with a disquieting feeling that they have achieved beauty by denying complexity.


Thursday, June 14, 2018

Notes from Frontiers in Biophysics conference in Paros, episode 1 (pilot): Where's the beef in biophysics?

Long blog post hiatus, which is a story for another time. For now, I’m reporting from what was a very small conference on the Frontiers of Biophysics held on Paros, a Greek island in the Aegean, organized by Steve Quake and Rob Phillips. The goals of the conference were two-fold:
  1. Identify big picture goals and issues in biophysics, and
  2. Consider ways to alleviate suffering and further human health.
Regarding the latter, I should say at the outset that this conference was very generously supported by Steve through the foundation he has established in memory of his mother-in-law Eleftheria Peiou, who sounds like she was a wonderful woman, and suffered through various discomforts in the medical system, which was the inspiration behind trying to reduce human suffering. I actually found this directive quite inspiring, and I’ve personally been wondering what I could do in that vein in my lab. I also wonder whether the time is right for a series of small Manhattan Projects on various topics so identified. But perhaps I’ll leave that for a later post.

Anyway, it was a VERY interesting meeting in general, and so I think I’m going to split this discussion up based on themes across a couple different blog posts, probably over the course of the next week or two. Here are some topics I’ll write about:

Exactly what is all this cell type stuff about

Exactly what do we mean by mechanism

I need a coach

What are some Manhattan Projects in biology/medicine

Maybe some others

So the conference started with everyone introducing themselves and their interests (research and otherwise) in a 5 minute lightning talk, time strictly enforced. First off, can I just say, what a thoughtful group of folks! It is clear that everyone came prepared to think outside their own narrow interests, which is very refreshing.

The next thing I noticed was a lot of hand-wringing about what exactly we mean by biophysics, which is what I’ll talk about for the rest of this blog post. (Please keep in mind that this is very much an opinionated take and does not necessarily reflect the views of the conferees.) To me, basically, biophysics as seemingly defined at this meeting needs a pretty fundamental rebranding as a whole. Raise your hand if biophysics means one of the following to you:
  1. Lipid rafts
  2. Ion channels
  3. A bunch of old dudes trying to convince each other how smart they are (sorry, cheap shot intended for all physicists) ;)
If you have not raised your hand yet, then perhaps you’re one of the lonely self-proclaimed “systems biologists” out there, a largely self-identified group that has become very scattered since around 2000. What is the history of this group of people? Here’s a brief (and probably offensive, sorry) view of molecular biology. Up until the 80s, maybe 90s, molecular biology had an amazing run, working out the genetic code, signaling, aspects of gene regulation, and countless other things I’m forgetting. This culminated in the “gene-jock” era in which researchers could relate a mutation to a phenotype in mechanistic detail (this is like the Cell golden era I blogged about earlier). Since that era, well… not so much progress, if you ask me—I’m still firmly of the opinion that there haven’t really been any big conceptual breakthroughs in 20-30 years, except Yamanaka, although one could argue whether that’s more engineering. I think this is basically the end of the one-gene-one-phenotype era. As it became clear that progress would require the consideration of multiple variables, it also became clear that a more quantitative approach would be good. For ease of storytelling, let’s put this date around 2000, when a fork in the road emerged. One path was the birth of genomics and a more model-free statistical approach to biology, one which has come to dominate a lot of the headlines now; more on that later. The other was “systems biology”, characterized by an influx of quantitative people (including many physicists) into molecular biology, with the aim of building a quantitative mechanistic model of the cell. I would say this field had its heyday from around 2000-2010 (“Hey look Ma, I put GFP on a reporter construct and put error bars on my graph and published it in Nature!”), after which folks from this group have scattered towards more genomics-type work or have moved towards more biological applications. I think that this version of "systems biology" most accurately describes most of the attendees at the meeting, many of whom came from single molecule biophysics.

I viewed this meeting as a good opportunity to maybe take stock and see how well our community has done. I think Steve put it pretty concisely when he said “So, where’s the beef?” I.e., it's been a while, and so what does our little systems biology corner of the world have to show for itself in the world of biology more broadly? Steve posed the question at dinner: “What are the top 10 contributions from biophysics that have made it to textbook-level biology canon?” I think we came up with three: Hodgkin and Huxley’s model of action potentials, gene expression “noise”, and Luria and Delbrück’s work on genetic heritability (and maybe kinetic proofreading; other suggestions more than welcome!). Ouch. So one big goal of the meeting was to identify where biophysics might go to actually deliver on the promise and excitement of the early 2000s. Note: Rob had a long list of examples of cool contributions, but none of them has gotten a lot of traction with biologists.

I’ll report more on some specific ideas for the future later, but for now, here’s my personal take on part of the issue. With the influx of physicists came an influx of physics ideas. And I think this historical baggage mostly distracts from the problems we might try to solve (Stephan Grill made this point as well, that we need some fundamentally new ways of thinking about problems). This baggage from physics is I think a problem both strategically and tactically. At the most navel-gazy level, I feel like discussions of “Are we going to have Newton’s laws for biology” and “What is going to be the hydrogen atom of the cell” and “What level of description should we be looking at” never really went anywhere and feel utterly stale at this point. On a more practical level, one issue I see is trying to map quantitative problems that come up in biology back to solved problems in physics, like the renormalization group or Hamiltonian dynamics or what have you. Now, I’m definitely not qualified to get into the details of these constructs and their potential utility, but I can say that we’ve had physicists who are qualified for some time now, and I think I agree with Steve: where’s the beef?

I think I agree with Stephan that perhaps we as a community need to take stock of what it is that we value about the physics part of biophysics and then maybe jettison the rest. To me, the things I value about physics are quantitative rigor and the level of predictive power that goes with it (more on that in the blog post on mechanism). I love talking to folks who have a sense for the numbers, and can spot when an argument doesn’t make quantitative sense. Steve also mentioned something that I think is a nice way to come up with fruitful problems, which is looking at existing data through a quantitative lens to be able to find paradoxes in current qualitative thinking. To me, these are important ways in which we can contribute, and I believe will have a broader impact in the biological community (and indeed already have through the work of a number of “former” systems biologists).

To me, all this raises a question that I tried to bring up at the meeting but that didn’t really gain much traction in our discussions, which is how do we define and build our community? So far, it’s been mostly defined by what it is not: well, we’re quantitative, but not genomics; we’re like regular biology, but not really; we’re… just not this and that. Personally, I think our community could benefit from a strong positive vision of what sort of science we represent. And I think we need to make this vision connect with biology. Rob made the point, which is certainly valid, that maybe we don’t need to care about what biologists think about our work. I think there’s room for that, but I feel like building a movement would require more than us just engaging in our own curiosities.

Which of course raises the question of why we would need to have a “movement” anyway. I think there are a few lessons to learn from our genomics colleagues, who I think have done a much better job of creating a movement. I think there are two main benefits. One is attracting talent to the field and building a “school of thought”. The other is attracting funding and so forth. Genomics has done both of these extremely well. There are dangers as well. Sometimes genomics folks sound more like advocates than scientists, and it’s important to keep science grounded in data. Still, overall, I think there are huge benefits. Currently, our field is a bunch of little fiefdoms, and like it or not, building things bigger than any one person involves a political dimension.

So how do we define this field? One theme of the conference that came up repeatedly was the idea of Hilbert Problems, which, for those who don’t know, is a list of open math problems set out in 1900 by David Hilbert that proved very influential. Can we perhaps build a field around a set of grand challenges? I find that idea very appealing. Although, given that I’ve increasingly come to think of biology as engineering instead of science, I wonder if maybe phrasing these questions in engineering terms would be better, sort of like a bunch of biomedical Manhattan Projects. I’ll talk about some ideas we came up with in a later blog post.

Anyway, more in the coming days/weeks…