Wednesday, August 21, 2019

I <3 Adobe Illustrator (for scientific figure-making) and I hope that you will too

Guest post by Connie Jiang

As has been covered somewhat extensively (see here, here, and here), we are a lab that really appreciates the flexibility and ease with which one can use Illustrator to compile and annotate hard-coded graphical data elements to create figures. Using Illustrator to set things like font size, marker color, and line weight is often far more intuitive and time-efficient than trying to do so programmatically. Furthermore, you can easily re-arrange and re-align graphics and create beautiful vector schematics, with far more flexibility than hard-coded options or PowerPoint.

So why don’t more people use Illustrator?

For one, it’s not cheap. We are lucky to have access to relatively inexpensive licenses through Penn. If expense is your issue, I’ve heard good things about Inkscape and GIMP, but unfortunately I have minimal experience with these and this document will not discuss them. Furthermore, as powerful and flexible as Illustrator is, its interface can be overwhelming. Faced with the activation energy and cognitive burden of having to learn how to do even basic things (drawing an arrow, placing and reshaping a text box without distorting the text it contains), maybe it’s unsurprising that so many people continue to use PowerPoint, a piece of software that most people in our lab first began experimenting with prior to 8th grade [AR editor’s note: uhhh… not everyone].

Recently, I decided to try to compile a doc with the express purpose of decreasing that activation energy of learning to use Illustrator to accomplish tasks that we do in the lab setting. Feel free to skip to the bottom if you’d just like to get to that link, but here were the main goals of this document:
  1. Compile a checklist to run through for each figure before submission. This is a set of guidelines and standards we aim to adhere to in lab to maintain quality and consistency of figures.
  2. Give a basic but thorough rundown of essentially everything in Illustrator that you need to begin to construct a scientific figure. Furthermore, impart the Illustrator “lingo” necessary to empower people to search for more specific queries.
  3. Answer some of what I feel are the most frequently asked questions. Due to my love of science-art and general artistic/design experimentation, I’ve spent a lot of time in Illustrator, so people in lab will sometimes come to me with questions. These are questions like: “my figure has too many points and is slowing my Illustrator down: how can I fix it?” and “what’s the difference between linked and embedded images?”. Additionally, there are cool features that I feel like every scientist should be able to take advantage of, like “why are layers super awesome?” and “how can I select everything of similar appearance attributes?”.
Finally, a disclaimer: This document will (hopefully) give you the tools and language to use Illustrator as you see fit. It does not give any design guidance or impart aesthetic sense (aside from heavily encouraging you to not use Myriad Pro). Make good judgments~

Full Raj lab basic Illustrator guide can be found here.

Sunday, August 4, 2019

I need a coach

I’ve been ruminating over the course of the last several years on a conversation I had with Rob Phillips about coaches. He was saying (and hopefully he will forgive me if I’m mischaracterizing this) that he has had people serve the role of coach in his life before, and that that really helped push him to do better. It’s something I keep coming back to over and over, especially as I get further along in my career.

In processing what Rob was saying, one of the first questions that needed answering was: what exactly is a coach? I think most of us think about formal training interactions (i.e., students, postdocs) when we think of coaching in science, and I think this ends up conflating two actually rather disparate things, which are mentoring and coaching. At least for me, mentorship is about wisdom that I have accumulated about decision making that I can hopefully pass on to others. These can be things like “Hmm, I think that experiment is unlikely to be informative” or “That area of research is pretty promising” or “I don’t think that will matter much for a job application, I would spend your time on this instead”. A coach, on the other hand, is someone who will help push you to focus and implement strategies for things you already know, but are having trouble doing. Like “I think we can get this experiment done faster” or “This code could be more cleanly written” or “This experiment is sloppy, let’s clean it up”. Basically, a mentor gives advice on what to do, a coach gives advice on how to actually do it.

Why does this decoupling matter, especially later in your career? When in a formal training situation, you will often get both of these from the same people—the same person, say, guiding your research project is the same person pushing you to get things done right. But after a few years in a faculty position, the N starts to get pretty small, and as such I think the value of mentorship per se diminishes significantly; basically, everybody gives you a bunch of conflicting advice on what to do in any given situation, which is frankly mostly just a collection of well-meaning but at best mildly useful anecdotes. But while the utility of mentorship (or perhaps the availability of high-quality mentorship) decreases, I have found that I still have a need for someone to hold me accountable, to help me implement the wisdom that I have accumulated but am sometimes too lazy or scared to put into practice. Like, someone to say “hey, watch a recording of your lecture finally and implement the changes” or “push yourself to think more mechanistically, your ideas are weak” or “that writing is lazy, do better” or “finish that half-written blog post”. To some extent, you can get this from various people in your life, and I desperately seek those people out, but it’s increasingly hard to find the further along you are. Moreover, even if you do find someone, they may have a different set of wisdom that they would be trying to implement for you, like, coaching you towards what they think is good, not what you yourself think is good (“Always need a hypothesis in each specific aim” whereas maybe you’ve come to the conclusion that that’s not important or whatever). If you have gotten to the point where you’ve developed your own set of models of what matters or doesn’t in the world, then you somehow need to be able to coach yourself in order to achieve those goals.

Is it possible to self-coach? I think so, but I’ve always struggled to figure out how. I guess the first step is to think about what makes a good coach. To me, the role of a good coach is to devise a concrete plan (often with some sort of measurable outcome) that promotes a desired change in default behavior. For example, when working with people in the lab in a coaching capacity, one thing I’ve tried to do is to propose concrete goals to try and help overcome barriers. If someone could be participating more in group meeting and seminars, I’ll say “try to ask at least 3 questions at group meeting and one at every seminar” and that does seem to help. Or I’ll push someone to make their figures, or write down their experiment along with results and conclusions. Or make a list of things to do in a day and then search for one more thing to add. Setting these sorts of rules can help provide the structure to achieve these goals and model new behaviors.

How do you implement these coaching strategies for yourself? I think there are a few steps, the first of which are relatively easy. The first step is to identify the issue, which is usually fairly clear: “I want to reduce time spent on email”, “I want to write clean code”, “I want to construct a set of alternative hypotheses every time I come up with some fun new idea”, “Push myself to really think in a model-based fashion”. Next is reduction to a concrete set of goals, which is also usually pretty easy: “Read every email only once and batch process them for a set period of time” or “write software that follows XYZ design pattern” or “write down alternative hypotheses”. The biggest struggle is accountability, which is where having a coach would be good. How do I enforce the rules when I’m the only one following them?

I’m not really sure, but one thing that works for me (which is perhaps quite obvious) is to rely on something external for accountability. For example, I am always looking for ways to improve my talks, and value being able to do a good job. However, it was hard to get feedback, and even when I did, I often didn’t follow through to implement said feedback. So I did this thing where I show the audience a QR code which leads them to a form for feedback. Often, people point out things I didn’t realize were unclear, which is of course helpful. But what is also helpful is when they point out things that I already knew were unclear, but had been lazy about fixing. This provides me with a bit of motivation to finally fix the issue, and I think it’s improved things overall. Another externalization strategy I’ve tried is to imagine that I’m trying to model behavior for someone else. Example: I was writing some software a while back for the lab, and there were times where I could have done something in the quick, lazy, and wrong way, rather than in the right way. What helped motivate me to do it right was to say to myself, “Hey, people in the lab are going to look at this software as an example of how to do things, and I need to make sure they learn the right things, so do it right, dummy”.

Some things are really hard to externalize, like making sure you stress test your ideas with alternative hypotheses and designing the experiments that will rigorously test them. One form of externalization that works for me is to imagine former lab members who were really smart and critical and just imagine them saying to me “but what about…”. Just imagining what they might say somehow helps me push myself to think a bit harder.

Any thoughts on other ways to hold yourself accountable when nobody else is looking?

Monday, May 6, 2019

Wisdom of crowds and open, asynchronous peer review

I am very much in favor of preprints and open review, but something I listened to on Planet Money recently gave me some food for thought, along with a recent poll I tweeted about re-reviewing papers. The episode was about wisdom of the crowds, and how magically if you take a large number of non-expert guesses about, say, the weight of an ox, the average comes out pretty close to the actual value. Pretty cool effect!

But something in the podcast caught my ear. They talked about how when they asked some kids, you had to watch out, because once one kid said, say, 300 pounds (wildly inaccurate), then if the other kids heard it, then they would all start saying 300 pounds. Maybe some minor variations, but the point is that they were strongly influenced by that initial guess, rather than just picking something essentially purely random. The thing was that if you had no point of reference, then even a guess provides that point of reference.

Okay, so what does this have to do with peer review? What got me thinking about it was the tweet about re-reviewing a paper you had already seen but for a different journal. I'm like nah not gonna do it because it's a waste of time, but some people said, well, you are now biased. So… in a world where we openly and asynchronously review papers (preprints, postpub, whatever), we would have the same problem that the kids guessing the weight of the ox did: whoever gives the first opinion would potentially strongly influence all subsequent opinions. With conventional peer review, everyone does it blind to the others, and so reviews could be considered more independent samplings (probably dramatically undersampled, but that's another blog post). But imagine someone comments on a preprint with some purported flaw. That narrative is very likely to color subsequent reviews and discussions. I think we've all seen this coloring: take eLife collaborative peer review, or even grant review. Everyone harmonizes their scores, and it's often not an averaging. One could argue that unlike randos on the internet guessing an ox's weight, peer reviewers are all experts. Maybe, but I am somehow not so sure that once we are in the world of experts reviewing what is hopefully a reasonably decent paper that there's much signal beyond noise.
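As a toy illustration of why independent guesses average out but anchored ones don't, here's a quick simulation. All the numbers are invented (the true weight, the noise levels, the anchor) and it's obviously a cartoon of how reviewers behave, but it shows the basic statistical point: unbiased noise cancels with enough samples, whereas a shared anchor is a bias that no amount of averaging removes.

```python
import random

random.seed(0)
TRUE_WEIGHT = 1200  # hypothetical ox weight, in pounds
N = 1000            # number of guessers

# Independent crowd: each guess is noisy but unbiased around the truth.
independent = [random.gauss(TRUE_WEIGHT, 300) for _ in range(N)]

# Anchored crowd: everyone hears the first (wildly low) guess and
# stays close to it, with only minor variations.
anchor = 300
anchored = [random.gauss(anchor, 50) for _ in range(N)]

def mean(xs):
    return sum(xs) / len(xs)

# The independent mean should land near 1200; the anchored mean
# stays near 300 no matter how many guessers you add.
print(f"independent crowd mean: {mean(independent):.0f}")
print(f"anchored crowd mean:    {mean(anchored):.0f}")
```

The first public review plays the role of the anchor here: averaging more anchored opinions just gives you a more precise estimate of the anchor, not of the truth.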

What could we do about this? Well, we could commission someone to hold all the open reviews in confidence and then publish them all at once… oh wait, I think we already have some annoying system for that. I dunno, not really sure, but anyway, was something I was wondering about recently, thoughts welcome.

Sunday, April 28, 2019

Reintegrating into lab following a mental health leave

[From AR] These days, there is a greatly increased awareness and decreased stigmatization of mental health amongst trainees (and faculty, for that matter), which is great. For mentors, understanding mental health issues amongst trainees is super important, and something we have until recently not gotten a lot of training on. More recently, it is increasingly common to get some training or at least information on how to recognize the onset of mental health issues, and in graduate groups here at Penn at least, it is fairly straightforward to initiate a leave of absence to deal with the issue, should that be required. However, one aspect of handling mental health leaves for which there appears to be precious little guidance out there is what challenges trainees face when returning from a mental health leave of absence, and what mentors might do about it. Here, I present a document written by four anonymous trainees with some of their thoughts (and I will chime in at the end with some thoughts from the mentor perspective).

[From trainees] This article is a collection of viewpoints from four trainees on mental health in academia. We list a collection of helpful practices on the part of the PI and the lab environment in general for cases when the trainees return to lab after recovering from mental health issues.

A trainee typically returns either because they feel recovered and ready to get back to normalcy, or they are better than before and have self-imposed goals (e.g. finishing their PhD), or they just miss doing science. Trainees in these situations are likely to have spent time introspecting on multiple fronts and they often return with renewed drive. However, it is very difficult to shake off the fear of recurrence of the episode (here we use episode broadly to refer to a phase of very poor mental health), which can make trainees more vulnerable and sensitive to external circumstances than an average person; for instance, minor stresses can appear much larger. In particular, an off day after a mental health episode can make one think they are already slipping back into it. In some cases, students may find it more difficult to start a new task, perhaps due to the latent fear of not being able to learn afresh. Support from the mentor and lab environment in general can be crucial in both providing and sustaining the confidence of the trainee. It is important that the mentor recognize that the act of returning to the lab is an act of courage in itself. The PI’s interactions with the trainee have a huge bearing on how the trainee re-integrates into his/her work. Here are some steps that we think can help:

Explicitly tell trainees to seek the PI out if they need help. This can be important for all trainees to hear because the default assumption is that these are personal problems to be dealt with entirely on one's own. In fact, advisors should do this with every trainee -- explicitly tell them that they are there to be reached out to, should their mental health be compromised/affected in any way. Restating this to a returning trainee can help create a welcoming and safe environment.

Reintegrating the trainee into the lab environment. The PI should have an open conversation with the trainee about how much information they want divulged to the rest of the group/department, and how they communicate the trainee’s absence to the group, if at all.

Increased time with the mentee. More frequent meetings with a returning student for the first few months help immensely for multiple reasons: a. It can help quell internal fears by a process of regular reinforcement; b. It can get the students back on track with their research faster; c. The academically stimulating conversations can provide the gradual push needed to think at a level they were used to before mental health issues. Having said that, individuals have their preferred way of dealing with the re-entering situation and a frank conversation about how they want to proceed helps immensely.

Help rebuild the trainee’s confidence. One of the authors of this post recounts her experience of getting back on her feet. Her advisor unequivocally told her: “Your PhD will get done; you are smart enough. You just need to work on your mental health, and I will work with you to make that the first priority.” Words of encouragement can go a long way -- there is ample anecdotal evidence that people can fully recover from mental health episodes if proper care is taken by all stakeholders.

Create a small, well-defined goal/team goals. One of the authors of this article spent her first few months working on a fairly easy and straightforward project with a clear message, one that was easy to keep pushing on as she settled into lab again. While this may not be the best way forward for everyone depending on where they are with their research, a clearly-defined goal can come as a quick side-project, or a deliberate breaking-down of a large project into very actionable smaller ones. Another alternative is to allow the trainee to work with another student/postdoc, something which allows a constant back-and-forth and quicker validation, leaving less room for mental doubt.

Remember that trainees may need to come back for a variety of other reasons as well. There are costs associated with a prolonged leave of absence, and for some trainees, they may need to come back before they are totally done with their mental health work. It's likely that some time needs to be set aside to continue that work, and it's helpful if PIs can work with students to accommodate that, within reason.

Finally, it is important for all involved parties to realize that the job of a PI is not to be the trainee’s parent, but to help the student along in their professional journey. Facilitating a lab environment where one feels comfortable, respected, and heard goes a long way, even if that means going an extra mile on the PI’s part to ensure such conditions, case-by-case.

[Back to AR] Hopefully this article is helpful for mentors and also for trainees as they try to reintegrate into the lab. For my part as a mentor, I think that a little extra empathy and attention can go a long way. I think it's important for all parties to realize that mentors are typically not trained mental health professionals, but some common sense guidelines could include increased communication, reasonable expectations, and in particular a realization that tasks that would seem quite easy for a trainee to accomplish before might be much harder now at first, in particular anything out of the usual comfort zone, like a new technique, etc.

Comments more than welcome; it seems this is a relatively under-discussed area. And a huge thank you to the anonymous writers of this letter for starting the discussion.

Monday, February 18, 2019

Dear me, I am awesome. Sincerely, me… aka How to write a letter of rec for yourself

Got an email from someone who got asked to write a letter for themselves by someone else and was looking for guidance… haha, now that PI has made work for me! :) Oh well, no problem, I actually realize how hard this is for the letter drafter, and it’s also something for which there is very little guidance out there for obvious reasons. So I thought I’d make a little guide. Oh, first a couple things. First off, I don’t really know all that much about doing this, having written a few for myself and having asked for a couple, so comments from others are most welcome. Secondly, if you’re one of those sanctimonious types who thinks the PIs should write every letter and never ask for a draft, well, this blog post is probably not for you so don’t bug me about it. Third, if the PI is European, maybe just like turn everything down a notch, ya know? ;)

Anyhoo: so I figure the best way to describe how to do this is to describe how I write a letter. I’ll aim it at how I write letters for, say, a former trainee applying for a postdoc fellowship, maybe with some notes about how this might change for faculty applying for some sort of award or something.

Okay. I usually use the first paragraph to give an executive summary. Here’s an example of what I might write:
“It is my pleasure to provide my strongest possible recommendation for Dr. Nancy Longpaper. Nancy is simply an incredible scientist: she has developed, from scratch and by combining both experimental and computational skills, a system that has led to fundamental new insights into the evolution of frog legs. She has all the tools to become a stellar independent scientist and a superstar in her field: talent, intellectual brilliance, work ethic, and raw passion for science. I look forward to watching her career unfold in the coming years.”
Or whatever, something like that. The key part of this that you will want to leave blank is the first sentence, i.e., the “strongest possible recommendation” part. That’s an important part that the letter writer will fill in.

Okay, second (optional) paragraph. This one depends a bit on personality. For some letter writers, they like to include a bit about how awesome they are and thus how qualified they are to write the letter. This is important for things like visas and so forth. This could be something like “First, I would like to introduce myself and my expertise. My laboratory studies XYZ, and I am an expert in ABC. I have published several peer reviewed articles in renowned journals such as Proceedings of the Canadian Horticultural Society B and our work has been continuously funded by the NIH.” I personally don’t include things like this for regular (non-visa) recommendations, but I have seen it.

Third paragraph: I usually try and put in some context about how I met the person I’m recommending. Like, “I first met Nancy when she was looking for labs to rotate in. She rotated in my lab and worked on project ABC. Even in her short time in the lab, she managed to accomplish XYZ. I immediately offered her a spot, and while I was disappointed for her to join Prof. Goodgrant’s lab, I was very pleased when she asked me to chair her thesis committee.” If you are a junior PI, this might be replaced with something about how the letter writer knows about your work and any interactions you may have had.

Next several paragraphs: a bunch of scientific meat. This is where you are REALLY going to save your letter writer some time. I usually break it into two parts. First paragraph or two, I describe the person’s work: specifically what they did. PROVIDE CITATIONS, including journal names. Sorry, they matter, too bad. Try and aim for a very general audience, stressing primarily the impact of the findings. But if you don’t, don’t worry, people probably either know the work already or not. Still, try. Emphasize specific contributions. Like, “Nancy herself conceived of the critical set of controls that was required to establish the now well accepted ‘left leg bias estimator’ statistical methodology that was the key to making the discovery that XYZ.” At all times, emphasize why what you did was special. Don’t be shy! If you’re too ridiculous, don’t worry, your letter writer will fix it.

Next part of the science-meat section: in my letters, I usually try and zoom out a bit. Like, what are the specific attributes of the person that led them to be successful in the aforementioned science. Like, “This is a set of findings that only someone of Nancy’s caliber could have discovered. Her intellectual abilities and broad command of the literature enabled her to rapidly ask important questions at the forefront of the field…” Be careful to emphasize big picture important qualities and not just list out your specific skills here. Like, don’t say “Nancy was really good at qPCR and probably ran about 4.32 million of them.” Makes you sound like a drone. At the trainee level, something about how rapidly you picked up skills could be good, but definitely not at the junior faculty level. Just try and be honest about the qualities you have that you think are most important and relevant. Be maybe a little over the top but not too crazy and then maybe your letter writer will embellish as needed.

Second to last paragraph: I try and fill in a bit more personal characteristics here. Like, what are the personal qualities that helped them shine. E.g. “Nancy also is an excellent communicator of her science, and already has excellent visibility. She gives great talks and has generated a lot of enthusiasm……” Also, if relevant, can add the standard “On a personal note, Nancy is a wonderful person to have in the lab……” Probably like 4-5 sentences max. Make it sound like you belong at the level you are applying for. If it’s for a faculty position, make it sound like you are faculty, not a student.

Finally, I end my letters with an “In sum, Nancy is the perfect candidate for XYZ. I have had the privilege of watching many star scientists develop into independent scientists in this field at top institutions over the years, and I consider Nancy to be of that caliber. I cannot recommend her more strongly.” This one can be sort of a skeleton and the letter writer can fill this in with whatever gushy verbiage they want. For some things, there might be some sort of “comparables” statement here that they can put in if they want.

  • Don’t ever say anything bad. If you say something bad, it’s a huge red flag. If the letter writer wants to say something bad, they will. That would be a pretty jerky thing to do, though.
  • Length: There are three things that matter in a letter: the first paragraph, the last paragraph, and how long the letter is in between. For a postdoc thingy, aim for 1.5-2 pages for a strong letter. 2-3 for faculty positions. 1-2 for other stuff after that.
  • Duplication: What do you do if two letter writers ask for a draft? Uhhhh… not actually sure. I have tried to make a few edits, but sometimes I just send it and say hey already sent this and they can kinda edit it up a bit. I dunno, weird situation.
Anyway, that’s my template for whatever it’s worth, and comments welcome from anyone who knows more!

Sunday, February 3, 2019

The sad state of scientific talks (and a thought on how we might help fix it)

Just got back from a Keystone meeting, and I’m just going to say it (rather than subtweet it): most of the talks were bad. I don’t mean to offend anyone, and certainly it was no worse than most other conferences, but come on. Talks running over time, filled with jargon and unexplained data incomprehensible to those even slightly outside the field, long rambling introductions… it doesn’t have to be this way, people! Honestly, it also raises the question as to why people bother going to these meetings just to play around on their computers because the talk quality is so poor. I’ve heard so many people say the informal interactions are the most useful thing at conferences. I actually think this is partly because the formal part is so bad.

Why? After all, there are endless resources out there on how to give a good talk. While some tips conflict (titles? no titles?), mostly they agree on some basic tenets of slide construction and presentation. I wrote this blog post with some tips on structuring talks and also links to a few other resources I think are good. And most graduate programs have at least some sort of workshop or something or other on giving a talk. So why are we in this situation?

I think the key thing to realize is that giving a good talk actually requires working on your talk. A good talk requires more than taking a couple minutes to throw some raw data onto a slide and winging it with how you present that data. For most of us, when we write a paper, it is a long iterative process to achieve clarity and engagement. Why would a talk be any different? (Oh, and by the way, practice is critical, but is not in and of itself sufficient—have to work on the right things; see aforementioned blog post.)

I think the fundamental issue is the nature of feedback and incentives for giving research talks. Without having these structured well, there is little push to do the work required to make a talk good, and they are currently structured very poorly. For incentives, the biggest problem is that the structure to date is all about what you don’t get in the long term, which are often things you don’t know you could get in the first place. Giving a good talk has huge benefits and opens the door to various opportunities long term, but it’s not like someone is going to tell you, “Hey, I had this job opening, but I’m not going to tell you about it now because your talk stunk." Partly, the issue is that the visible benefits of good presentations are often correlated to some extent with brilliance. Take, for instance, Michael Elowitz’s talk at this conference, which my lab hands down voted as the best talk of the conference. Amazing science, clear, and exciting. Michael is a brilliant and deservedly highly successful scientist. Does it help that he is an excellent communicator of his work? Of course! To what extent? I don’t know. What I can say is that many of the best scientists presented their work very well. Where do cause and effect begin and end? Hard to say, but it’s clearly not an independent variable.

Despite this correlation, I still firmly believe that you don’t have to be Michael Elowitz-level brilliant to give a great talk. So then why are all these talks so bad? The other element beyond vague incentives is feedback. The most common feedback, regardless of anything about the talk you give, is “Hey, great talk!” Maybe, if you really stunk it up, you’ll get “interesting talk”. And that’s about it. I have many times gotten “Hey, great talk” followed by a question demonstrating that I totally did a terrible job explaining things. I mean, how is anybody ever going to get better if they don’t even get a thumbs-up/down on their presentation? The reason we don’t get that feedback is obviously the social awkwardness of telling someone something they did publicly was bad. The main place where people feel safe to give feedback is in lab meeting, which while somewhat helpful is also one of the worst places to get feedback. Asking a bunch of people already intimately familiar with your story and conversant in your jargon about what is clear or not is not going to get you all that far, generally. Also, the person with the most authority in that context (the PI) probably also gives terrible talks and so is not a good person to get feedback from. (Indeed, I have heard many, many stories of PIs actively giving their trainees bad advice.) Generally, the fact that most people you are getting feedback from aren’t themselves typically good at it is a big problem.

Okay, fine…

Again, I think the key missing element is honest feedback—I think most talk-givers don’t even realize just how bad their talks are. As I said, few people are going to tell someone to their face that their talk sucks. So how about the following: what if people preregister their talk on a website, and then people can anonymously submit a rating with comments? Basically like a teacher rating, but for speakers at a conference. You could even provide the link to the rating website on the first slide of your talk or something. This would have a number of advantages. First off, if you don’t want to do it, fine, no problem. Second, all feedback is anonymous, thus allowing people to be honest. Also, the comments allow people to give some more detailed feedback if they so choose. And, there is a strong positive incentive. With permission, you could have your average rating posted. This rating could be compared to e.g. the overall average, and if it’s good—which presumably it is if you decided to share it :)—then that’s great publicity, no?

One problem with this, though, is it doesn’t necessarily provide specific feedback. Like, what was clear or not? Comments could provide this to some extent. Also, if you, as the speaker, are willing, you could even imagine posting some questions related to your talk and seeing how well people got those particular points. Of course completely optional and just for those who really care about improving. Which should be all of us, right? :)

Oh, and one suggestion from Rita Strack was to promote the 15 minute format, which is short enough to either require concision and clarity, or, should that not happen, is over fast! :)

Some suggested (e.g. Katie Whitehead) that we incentivize good talks by doing Skype interviews or having them submit YouTubes, etc. for contributed talks. In principle I like this, but I think it's just a LOT of work and also conflates scientific merit with presentation merit, so people who don't get a spot have something other than their presentation skills to blame. Still, could work maybe.

Another, perhaps more radical idea, is to do away with the talk format entirely. Most scientists are far more clear when answering questions (probably for the simple reason that the audience drives it). Perhaps we could limit talks to 5 minutes followed by some sort of structured Q&A? Not sure how to do that exactly, but anyway, a thought.

Anybody want to give this a try?

Wednesday, August 8, 2018

On mechanism and systems biology

(Latest in a slowly unfolding series of blog posts from the Paros conference.)

Mechanism. The word fills many of us with dread: “Not enough mechanism.” “Not particularly mechanistic.” "What's the mechanism?" So then what exactly do we mean by mechanism? I don’t think it’s an idle question—rather, I think it gets down to the very essence of what we think science means. And I think there are some practical consequences on everything from how we report results to the questions we may choose to study (and consequently to how we evaluate science). So I’ll try and organize this post around a few concrete proposals.

To start: I think the definition I’ve settled on for mechanism is “a model for how something works”.

I think it’s interesting to think about how the term mechanism has evolved in our field from something that really was mechanism once upon a time into something that is really not mechanism. In the old days, mechanism meant figuring out e.g. what an enzyme did and how it worked, perhaps in conjunction with other enzymes. Things like DNA polymerase and ATP synthase. The power of the hard mechanistic knowledge of this era is hard to overstate.

What can we learn about the power of mechanism/models from this example?

As the author of this post argues, models/theories are “inference tickets” that allow you to make hard predictions in completely new situations without testing them. We are used to thinking of models as being written in math and making quantitative predictions, but this need not be the case. Here, our predictions of how these enzymes function have led to, amongst other things, our entire molecular biology toolkit: add this enzyme, it will phosphorylate your DNA; add this other enzyme, it will ligate that to another piece of DNA. That these enzymes perform certain functions is a “mechanism” that we used to predict what would happen if we put these molecules in a test tube together, and those predictions largely bore out, with huge practical implications.

Mechanisms necessarily come with a layer of abstraction. Perhaps we are more used to talking about these in models, where we have a name for them: “assumptions”. Essentially, there is a point at which we say, who knows, we’re just going to say that this is the way it is, and then build our model from there. In this case, it’s that the enzyme does what we say it will. We still have quite a limited ability to take an unknown sequence of amino acids and predict what it will do, and certainly very limited ability to take a desired function and just write out the sequence to accomplish said function. We just say, okay, assume these molecules do XYZ, and then our model is that they are important for e.g. transcription, or reverse transcription, or DNA replication, or whatever.

Fast forward to today, when a lot of us are studying biological regulation, and we have a very different notion of what constitutes “mechanism”. Now it’s like: oh, I see a correlation between X and Y, the reviewer asks for “mechanism”, so you knock down X and see less Y, and that’s “mechanism”. Not to completely discount this—I mean, we’ve learned a fair amount by doing these sorts of experiments—but I think it’s pretty clear that this is not sufficient to say that we know how it works. Rather, this is a devolution to empiricism, which is something I think we need to fix in our field.

Perhaps the most salient question is: what does it mean to know “how it works”? I posit that mechanism is an inference that connects one bit of empiricism to another. Let’s illustrate in the case of something where we do know the mechanism/model: a lever.

“How it works” in this context means that we need a layer of abstraction, and have some degree of inference given that layer of abstraction. Here, the question may be “how hard do I have to push to lift the weight?”. Do we need to know that the matter is composed of quarks to make this prediction, or how hard the lever itself is? No. Do we need to know how the string works? No. We just assume the weight pulls down on the string and whatever it’s made of is irrelevant because we know these to be empirically the case. We are going to assume that the only things that matter are the locations of the weight, the fulcrum, and my finger, as well as the weight of the, uhh, weight and how hard I push. This is the layer of abstraction the model is based on. The model we use is that of force balance, and we can use that to predict exactly how hard to push given these distances and weights.

How would a modern data scientist approach this problem? Probably take like 10,000 levers and discover Archimedes’ Law of the Lever by making a lot of plots in R. Who knows, maybe this is basically how Archimedes figured it out in the first place. It is often possible to figure out a relationship empirically, and even make some predictions. But that’s not what we (or at least I) consider a mechanism. I think there has to be something beyond pure empiricism, often linking very disparate scales or processes, sometimes in ways that are simply impossible to investigate empirically. In this case, we can use the concept of force to figure out how things might work with, say, multiple weights, or systems of weights on levers, or even things that don’t look like levers at all. Wow!
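To make the lever example concrete, here is a minimal sketch of the force-balance model in code (the function name and all numbers are invented for illustration): the push required depends only on the weight and the two lever-arm distances, which is exactly the layer of abstraction described above.

```python
# Force balance for an ideal lever: weight * d_weight = push * d_finger.
# Everything else (the material, the quarks, the string) is abstracted away.

def force_to_lift(weight, d_weight, d_finger):
    """Minimum push at distance d_finger from the fulcrum needed to lift
    `weight` sitting at distance d_weight on the other side."""
    return weight * d_weight / d_finger

# A 10 N weight 0.2 m from the fulcrum, pushed 1.0 m out on the other side:
print(force_to_lift(10.0, 0.2, 1.0))  # -> 2.0
```

The same abstraction is the “inference ticket”: summing torques on each side handles multiple weights, or configurations no dataset of levers ever covered.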

Okay, so back to regulatory biology. I think one issue that we suffer from is that what we call mechanism has moved away from true “how it works” models and settled into what is really empiricism, sort of without us noticing it. Consider, for instance, development. People will say, oh, this transcription factor controls intestinal development. Why do they say that? Well, knock it out and there’s no intestine. Put it somewhere else and now you get extra intestine. Okay, but that’s not how it works. It’s empirical. How can you spot empiricism? One telltale sign is an excessive obsession with statistics: heavy reliance on effect sizes and p-values often means that you didn’t really figure out how it works. Another sign is that we aren’t really able to apply what we learned outside of the original context. If I gave you a DNA typewriter and said, okay, make an intestine, you would have no idea how to do it, right? We can make more intestine in the original context, but the domain of applicability is pretty limited.

Personally, I think that these difficulties arise partially because of our tools, but mostly because I think we are still focused on the wrong layers of abstraction. Probably the most common current layers of abstraction are those of genes/molecules, cells, and organisms. Our most powerful models/mechanisms to date are the ones where we could draw straight lines connecting these up. Like, mutate this gene, make these cells look funny, now this person has this disease. However, I think these straight lines are more the exception than the norm. Mostly, I think these mappings are highly convoluted in interwoven systems, making it very hard to make predictions based on empiricism alone (future blog post coming on Omnigenic Model to discuss this further).

Which leads me to a proposal: let’s start thinking about other layers of abstraction. I think that the successes of the genes/molecules -> cells paradigm have led to a certain ossification of thought centered around thinking of genes and molecules and cells as being the right layers of abstraction. But maybe genes and cells are not such fundamental units as we think they are. In the context of multicellular organisms, perhaps cells themselves are passive players, and rather it is communities of cells that are the fundamental unit. Organoids could be a good example of this, dunno. Also, it is becoming clear that genetics has some pretty serious limits in terms of determining mechanism in the sense I’ve defined. Is there some other layer involving perhaps groups of genes? Sorry, not a particularly inspired idea, but whatever, something like that maybe. Part of thinking this way also means that we have to reconsider how we evaluate science. As Rob pointed out, we have gotten so used to equating “mechanism” to “molecules and their effects on cells” that we have become both closed-minded to other potential types of mechanism while also deceiving ourselves into allowing empiricism to pose as mechanism under the guise of statistics. We just have to be open to new abstractions and not hold everyone to the "What's the molecule?" standard.

Of course, underlying this is an open question: do such layers of abstraction that allow mechanism in the true sense exist? Complexity seems to be everywhere in biology, and my reaction so far has been to just throw my hands up and say “it’s complicated!”. But (and this is another lesson learned from Rob), that’s not an excuse—we have to at least try. And I do think we can find some mechanistic wormholes through the seemingly infinite space of empiricism that we are currently mired in.

Regardless of what layers of abstraction we choose, however, I think that it is clear that a common feature of these future models will be that they are multifactorial, meaning that they will simultaneously incorporate the interactions of multiple molecules or cells or whatever the units we choose are. How do we deal with multiple interactions? I’m not alone in thinking that our models need to be quantitative, which as noted in my first post, is an idea that’s been around for some time now. However, I think that a fair charge is that in the early days of this field, our quantitative models were pretty much window dressing. I think (again a point that I’ve finally absorbed from Rob) that we have to start setting (and reporting) quantitative goals. We can’t pick and choose how our science is quantitative. If we have some pretty model for something, we better do the hard work to get the parameters we need, make hard quantitative predictions, and then stick to them. And if we don’t quantitatively get what we predict, we have to admit we were wrong. Not partly right, which is what we do now. Here’s the current playbook for a SysBio paper: quantitatively measure some phenomenon, make a nice model, predict that removal of factor X should send factor Y up by 4x, measure that it went up 2x, and put a bow on it and call it a day. I think we just have to admit that this is not good enough. This “pick and choose” mix of quantitative and qualitative analyses is hugely damaging because it makes it impossible to build upon these models. The problem is that qualitative reporting in, say, abstracts leads to people seeing “X affects Y” and “Y affects Z” and concluding “thus, X affects Z” even though the effects for X on Y and Y on Z may be small enough to make this conclusion pretty tenuous.
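The “X affects Y, Y affects Z, thus X affects Z” trap can be illustrated with a toy calculation. All numbers here are invented: if X accounts for only a modest fraction of the variation in Y, and Y for a modest fraction of Z, the implied X-to-Z effect can be small enough to make the qualitative chained conclusion pretty tenuous.

```python
# Toy illustration of effect attenuation along a chain (made-up numbers).
# Suppose X accounts for 40% of the variation in Y, and Y for 40% of Z,
# and (a simplifying assumption) the unexplained parts are independent.
# Then X accounts for only about 0.4 * 0.4 = 16% of Z -- a much weaker
# claim than the qualitative "X affects Z" suggests.

frac_xy = 0.4  # fraction of Y's variation attributable to X (invented)
frac_yz = 0.4  # fraction of Z's variation attributable to Y (invented)

frac_xz = frac_xy * frac_yz
print(f"X accounts for roughly {frac_xz:.0%} of Z")  # -> roughly 16%
```

This is exactly the information that qualitative abstract-speak throws away, and why a chain of two modest effects should not be reported as if it were one solid one.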

So I have a couple proposals. One is that in abstracts, every statement should include some sort of measure of the percentage of effect explained by the putative mechanism. I.e., you can’t just say “X affects Y”. You have to say something like “X explains 40% of the change in Y”. I know, this is hard to do, and requires thought about exactly what “explains” means. But yeah, science is hard work. Until we are honest about this, we’re always going to be “quantitative” biologists instead of true quantitative biologists.
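As a sketch of what the proposal might look like in practice, here is one possible (and certainly not the only) way to operationalize “X explains 40% of the change in Y”: the fraction of variance in Y captured by a simple linear fit on X. The data below are simulated, and taking variance-explained as the definition of “explains” is an assumption of this sketch, not a claim from the post.

```python
# One candidate definition of "X explains N% of Y": variance explained
# (R^2) by a linear fit of Y on X. Data are fabricated: Y is partly
# driven by X and partly by independent noise, so the true fraction
# explained is 0.8^2 / (0.8^2 + 1) ~ 39%.

import random

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.8 * xi + random.gauss(0, 1) for xi in x]

mx = sum(x) / len(x)
my = sum(y) / len(y)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
    (xi - mx) ** 2 for xi in x
)
resid = [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]
r2 = 1 - sum(r**2 for r in resid) / sum((yi - my) ** 2 for yi in y)
print(f"X explains about {r2:.0%} of the variation in Y")
```

The hard part the post alludes to is not the arithmetic but deciding what “explains” should mean for a given system (variance, causal effect under intervention, something else); any abstract-level claim would need to state which definition it used.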

Also, as a related grand challenge, I think it would be cool to try and be able to explain some regulatory process in biology out to 99.9%. As in, okay, we really now understand in some pretty solid way how something works. Like, we actually have mechanism in the true sense. You can argue that this number is arbitrary, and it is, but I think it could function well as an aspirational goal.

Any discussion of empiricism vs. theory will touch on the question of science vs. engineering. I would argue that—because we’re in an age of empiricism—most of what we’re doing in biology right now is probably best called engineering. Trying to make cells divide faster or turn into this cell or kill that other cell. And it’s true that look, whatever, if I can fix your heart, who cares if I have a theory of heart? One of my favorite stories along these lines is the story of how fracking was discovered, which was purely by accident (see Planet Money podcast): a desperate gas engineer looking to cut costs just kept cutting out an expensive chemical and seeing better yield until he just went with pure water and, voila, more gas than ever. Why? Who cares! Then again, think about how many mechanistic models went into, e.g., the design of the drills, transportation, everything else that goes into delivering energy. I think this highlights the fact that just like science and engineering are intertwined, so are mechanism and empiricism. Perhaps it’s time, though, to reconsider what we mean by mechanism to make it both more expansive and rigorous.