Tuesday, December 29, 2015

Is the academic work ethic really toxic?

Every so often, I’ll read something or other about how the culture of work in academia is toxic, encouraging people to work 24/7/52 (why do people say 24/7/365?) and thus ignore all other aspects of their existence, destroying their lives in the process. As I’ve written before, I think this argument gets it backwards. I think most academics work hard because they want to and are immersed in what they are doing, not because of the “culture”. It is the conflation of hours and passion that leads to confusion.

Look, I know people who are more successful than I am and work less than I do. Good for them! That doesn’t mean I’m going to start working less hard. To me, if you’re thinking “I need to work X hours to get job Y/award Z”, well, then you’re in the wrong line of work. If you’re thinking “I really need to know about X because, uh, I just need to know” then academia might be for you. Sure, sometimes figuring out X requires a lot of work, and there is a fair amount of drudgery and discipline required to turn an idea into a finished paper. Most academics I know will make the choice to do that work. Some will do it at a pace I would find unmanageable. Some will do it at a pace I find lethargic. I don’t think it really matters. I read a little while ago that Feng Zhang goes back to work every day after dinner and works until 3am doing experiments himself in the lab (!). I couldn’t do that. But again, reading about Zhang, I think it’s pretty clear that he does it because he has a passion for his work. What’s wrong with that? If he wants to work that way, I don’t see any reason he should be criticized for it. Nor, conversely, lionized for it. I think we can praise his passion, though. Along those lines, I know many academics who are passionate about their work and thus very successful, all while working fairly regular hours (probably not 40/week, but definitely not 80/week), together with long vacations. Again, the only requirement for success in science is a desire to do it, along with the talent and dedication to finish what you start.

I think this conflation of hours and passion leads to some issues when working with trainees. I most enjoy working with people who have a passion for their work. Often, but not always, this means that they work long-ish hours. If someone is not motivated, then a symptom is sometimes working shorter hours–or, other times, working long hours but not getting as much done. If we’re to the point where I’m counting someone’s hours, though, then it’s already too late. For trainees, if your PI is explicitly counting hours, then either you should find a new PI or you should think carefully about why your PI is counting your hours. What’s important is that both parties realize that hours are the symptom, not the underlying condition.

Monday, December 28, 2015

Is all of Silicon Valley on a first name basis?

One very annoying software trend I've noticed in the last several years is the use of just first names in software. For instance, iOS shows first names only in messages. Google Inbox has tons of e-mail conversations involving me and someone named "John". Also, my new artificially intelligent scheduling assistant (which is generally awesome) will put appointments with "Jenn" on my calendar. Hmm. For me, those variables need a namespace.

I'm assuming this is all part of an effort to make software more friendly and conversational, and it demos great for some Apple exec to say "Ask Tim if he wants to have lunch on Wednesday" into his phone and have it automatically know he meant Tim Cook. Great, but in my professional life (and, uh, I'm guessing maybe Tim Cook's also), I interact with a pretty large number of people, some only occasionally, making this first-name-only convention pretty annoying.

Which makes me wonder if the logical next step is just to refer to people by their e-mail or Twitter. I'm sure that would generate a lot of debate as to which is the identifier of choice, but I'm guessing that ORCID is probably not going to be it. :)
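To belabor the namespace point with a toy sketch (the contacts here are made up, and this isn't any real contacts API): keying people by first name alone collides the moment you know two Johns, whereas an e-mail address works fine as a unique identifier.

```python
contacts = [
    {"name": "John Smith", "email": "jsmith@example.com"},
    {"name": "John Doe",   "email": "jdoe@example.com"},
]

# Keyed by first name: the second "John" silently clobbers the first.
by_first_name = {}
for c in contacts:
    by_first_name[c["name"].split()[0]] = c

# Keyed by e-mail: globally unique, no collisions.
by_email = {c["email"]: c for c in contacts}

print(len(by_first_name), len(by_email))  # 1 2
```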

Wednesday, December 23, 2015

Bragging about data volume is lame

I've noticed a trend in some papers these days of bragging about the volume of data you collect. Here's an example (slightly modified) from a paper I was just looking at: "We analyzed a total of 293,112 images." Oftentimes, these numbers serve no real purpose except to highlight that you took a lot of data, which I think is sort of lame.

Of course, numbers in general are good and are an important element in describing experiments. Like "We took pictures of at least 5000 cells in 592 conditions." That gives a sense of the scale of the experiment and is important for the interpretation. But if you just say "We imaged a total of 2,948,378 cells", then that provides very little useful information about why you imaged all those cells. Are they all the same? Is that across multiple conditions? What is the point of this number except to impress?

And before you leave a comment, yes, I know we did that in this paper. Oops. I feel icky.

Tuesday, December 22, 2015

Reviewing for eLife is... fun?

Most of the time, I find reviewing papers to be a task that, while fun-sounding in principle, often becomes a chore in practice, especially if the paper is really dense. Which is why I was sort of surprised that I actually had some fun reviewing for eLife just recently. I've previously written about how the post-review harmonization between reviewers is a blessing for authors because it's a lot harder to give one of those crummy, ill-considered reviews when your colleagues know it's you giving them. Funny thing is that it's also fun for reviewers! I really enjoy discussing a paper I just read with my colleagues. I feel like that's an increasingly rare occurrence, and I was happy to have the opportunity. Again, well done eLife!

Sunday, December 20, 2015

Impressions from a couple weeks with my new robo-assistant, Amy Ingram

Like many, I both love the idea of artificial intelligence and hate spending time on logistics. For that reason, I was super excited to hear about X.ai, which is some startup in NYC that makes an artificially intelligent scheduler e-mail bot. It takes care of this problem (e-mail conversation):
“Hey Arjun, can we meet next week to talk about some cool project idea or another?”
“Sure, let’s try sometime late next week. How about Thursday 2pm?”
“Oh, actually, I’ve got class then, but I’m free at 3pm.”
“Hmm, sorry, I’ve got something else at 3pm. Maybe Friday 1pm?”
“Unfortunately I’m out of town on Friday, maybe the week after?”
“Sure, what about Tuesday?”
“Well, only problem is that…”
And so on. X.ai’s solution works like this:
“Hey Arjun, can we meet next week to talk about some cool project idea or another?”
“Sure, let’s try sometime late next week. I’m CCing my assistant Amy, who will find us a time.”
And that’s it! Amy will e-mail back and forth with whoever wrote to me and find a time to meet that fits us both, putting it straight on my calendar without me having to lift another finger. Awesome.

So how well does it work? Overall, really well. It took a bit of finagling at first to make sure that my calendar was appropriately set up (like making sure I’m set to “available” even if my calendar has an all day event) and that Amy knew my preferences, but out of the several meetings attempted so far, only one of them got mixed up, and to be fair, it was a complicated one involving multiple parties and some screw-ups on my part due to it being the very first meeting I scheduled with Amy. Overall, Amy has done a great job removing scheduling headaches from my life–indeed, when I analyzed a week in my e-mail, I was surprised how much time was spent on scheduling, and so this definitely reduces some overhead. Added benefit: Amy definitely does not drop the ball.

One of the strangest things about using this service so far has been my psychological responses to working with it (her?). Perhaps the most predictable one was that I don’t feel like a “have your people call my people” kind of person. I definitely feel a bit uncomfortable saying things like “I’m CCing my assistant who will find us a time”, like I’m some sort of Really Busy And Important Person instead of someone who teaches a class and jokes around with twenty-somethings all day. Perhaps this is just a bit of misplaced egalitarian/lefty anxiety, or imposter syndrome manifesting itself as a sense that I don’t deserve admin support, or the fact that I’m pretty sure I’m not actually busy enough to merit real human admin support. Anyway, whatever, I just went for it.

So then this is where it starts getting a bit weird. So far, I haven’t been explicitly mentioning that Amy is a robot in my e-mails (like “I’m CCing my robo-assistant Amy…”). That said, for the above reasons of feeling uncomfortably self-important, I actually am relieved when people figure out that it’s a robot, since it somehow seems a bit less “one-percenty”. So why didn’t I just say she’s a robot right off the bat? To be perfectly honest, when I really think about it, it’s because I didn’t want to hurt her feelings! It’s so strange. Other examples: for the first few meetings, Amy CCs you on the e-mail chain so you can see how she handles it. I felt a strong compulsion to write back saying “Thank you!” at the end of the exchange. Same when I write to her to change preferences. Like
“Amy, I prefer my meetings in the afternoon.”
“Okay, I have updated your preferences as follows…”
… “Thank you?!?!?”
Should I bother with the formalities of my typical e-mails, with a formal heading and signature? I think I’ve been doing it, even though it obviously (probably?) doesn’t matter.

Taking it a bit further, should I be nice? Should I get angry if she messes something up? Will my approval or frustration even register? Probably not, I tell myself. But then again, what if it’s part of her neural network to detect feelings of frustration? Would her network change its algorithms somewhat in response? Is that what I would want to happen? I just don’t know. I have to say that I had no idea that this little experiment would have me worrying about the intricacies of human/AI relations.

In some sense, then, I was actually a bit relieved at the outcome of the following exchange. As a test, Sara worked with Amy to set up an appointment for us to get a coffee milkshake (inside joke). She then told Amy to tell me that I should wear camouflage to the appointment, a point that Amy dutifully relayed to me:
Hi Arjun,
I just wanted to pass along this message I received from Sara. It doesn’t look like it’s a message I can provide an answer to, so I suggest you follow up with Sara directly.
Thanks, Amy! 2 o'clock would be great. And please make sure he wears camouflage. Sara
To which I responded:
Hi Amy, 
Thanks for the note. Can you please tell Sara that I don’t own any camouflage?
And then I heard this back:
Hi Arjun,
Right now I can't pass requests like this to your guests. I think your message would have a stronger impact if you sent it directly to Sara.
Ah, a distinctly and reassuringly non-human, form-like response. What a relief! Looks like we've still got a little way to go before we have to worry about her (its?) feelings. Still, the singularity is coming, one meeting at a time!

Saturday, December 19, 2015

Will reproducibility reduce the need for supplementary figures?

One constant refrain about the kids these days is that they use way too much supplementary material. All those important controls, buried in the supplement! All the alternative hypotheses that can’t be ruled out, buried in the supplement! All the “shady data” that doesn’t look so nice, buried in the supplement! Now papers are just reduced to ads for the real work, which is… buried in the supplement! The answer to the ultimate question of life, the universe and everything? Supplementary figure 42!

Whatever. Overall, I think the idea of supplementary figures makes sense. Papers have more data and analyses in them than before, and supplementary figures are a good way to keep important but potentially distracting details out of the way. To the extent that papers serve as narratives for our work as well as documentation of it, it’s important to keep that narrative as focused as possible. Typically, if you know the field well enough to know that a particular control is important, then you likely have sufficient interest to go to the trouble of digging it up in the supplement. If the purpose of the paper is to reach people outside of your niche–which most papers in journals with big supplements are attempting to do–then there’s no point in having all those details front and center.

(As an extended aside/supplementary discussion (haha!), the strategy we’ve mostly adopted (from Jeff Gore, who showed me this strategy when we were postdocs together) is to use supplementary figures like footnotes, like “We found that protein X bound to protein Y half the time. We found this was not due to the particular cross-linking method we used (Supp. Fig. 34)”. Then the supplementary figure legend can have an extended discussion of the point in question, no supplementary text required. This is possible because unlike regular figure legends, you can have interpretation in the legend itself, or at least the journal doesn’t care enough to look.)

I think the distinction between the narrative and documentary role of a paper is where things may start to change with the increased focus on reproducibility. Some supplementary figures are really important to the narrative, like a graph detailing an important control. But many supplementary figures are more like data dumps, like “here’s the same effect in the other 20 genes we analyzed”. Or showing the same analysis but on replicate data. Another type of supplementary figure has various analyses done on the data that may be interesting, but not relevant to the main points of the paper. If not just the data but also the analysis and figures are available in a repository associated with the paper, then is there any need for these sorts of supplementary figures?

Let’s make this more concrete. Let’s say you put up your paper in a repository on github or the equivalent. The way we’ve been doing this lately is to have all processed data (like spot counts or FPKM) in one folder, all scripts in another, and when you run the scripts, it takes the processed data, analyzes it, and puts all the outputted graphical elements into a third folder (with subfolders as appropriate). (We also have a “Figures” folder where we assemble the figures from the graphical elements in Illustrator; more in another post.) Let’s say that we have a side point about the relative spatial positions of transcriptional loci for all the different genes we examined in a couple different datasets; e.g., Supp Figs. 16 and 21 of this paper. As is, the supplementary figures are a bit hard to parse because there’s so much data, and the point is relatively peripheral. What if instead we just pointed to the appropriate set of analyses in the “graphs” folder? And in that folder, it could have a large number of other analyses that we did that didn’t even make the cut for the supplement. I think this is more useful than the supplement as normally presented and more useful than just the raw data, because it also contains additional analyses that may be of interest–and my guess is that these analyses are actually far more valuable than the raw data in many cases. For example, Supp Fig. 11 of that same paper shows an image with our cell-cycle determination procedure, but we had way more quantitative data that we just didn’t show because the supplement was already getting insane. Those analyses would be great candidates for a family of graphs in a repository. Of course, all of this requires these analyses being well-documented and browsable, but again, not sure that’s any worse than the way things are now.
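To sketch what that layout looks like in practice, here is a minimal, made-up driver script (the folder names follow the scheme above, but the file names and the toy analysis are hypothetical, not our actual pipeline): it reads processed data from one folder, runs an analysis, and deposits the output into a graphs folder, so every figure element can be regenerated from data plus scripts.

```python
from pathlib import Path
import csv

ROOT = Path("paper_repo")
DATA = ROOT / "data"      # processed data (e.g., spot counts, FPKM)
GRAPHS = ROOT / "graphs"  # all outputted graphical elements

def run_analysis():
    """Regenerate one output in graphs/ from the processed data."""
    DATA.mkdir(parents=True, exist_ok=True)
    GRAPHS.mkdir(parents=True, exist_ok=True)
    # Toy stand-in for processed data: spot counts per cell.
    (DATA / "spot_counts.csv").write_text("cell,count\n1,12\n2,7\n3,19\n")
    with (DATA / "spot_counts.csv").open() as f:
        counts = [int(row["count"]) for row in csv.DictReader(f)]
    mean = sum(counts) / len(counts)
    out = GRAPHS / "mean_spot_count.txt"
    out.write_text(f"mean spot count: {mean:.2f}\n")
    return out

if __name__ == "__main__":
    print(run_analysis())
```

A supplementary "figure" then becomes a pointer into graphs/, and the analyses that didn't make the cut for the supplement can live in that same folder.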

Now, I’m not saying that all supplementary figures would be unnecessary. Some contain important controls and specific points that you want to highlight, e.g., Supp. Fig. 7–just like an important footnote. But analyses of data dumps, replicates, side points and the such might be far more efficiently and usefully kept in a repository.

One potential issue with this scheme is hosting and versioning. Most supplementary information is currently hosted by journals. In this repository-based future, it’s up to Bitbucket or GitHub to stick around, and the authors are free to modify or remove the repository if they wish. Oh well, nothing’s permanent in this world anyway, so I’m not so worried about that personally. I suppose you could zip up the whole thing and upload it as a supplementary file, although journals typically place size restrictions on supplementary files. Not sure about the solution to that.

Part of the reason I’ve been thinking about this lately is because Cell Press has this very annoying policy that you can’t have more supplementary figures than main figures. This wreaked havoc with our “footnote” style we originally used in Olivia’s paper because now you have to basically agglomerate smaller, more focused supplementary figures into huge supplementary mega-figures that are basically a hot figure mess. I find this particularly ironic considering that Cell’s focus on “complete stories” is probably partially to blame for the proliferation of supplementary information in our field. I get that the idea is to reduce the amount of supplementary information, but I don’t think the policy accomplishes this goal and only serves to complicate things. Cell Press, please reconsider!

Sunday, December 13, 2015

Cheese spread from the lab holiday party

Seriously awesome collection of cheeses from yesterday's lab holiday party! La Tur and Morbier (More Beer!) are some of my favorites, along with the brie Rohit brought. Eduardo's super stinky cheese was decent and definitely less funky tasting than smelling.

Blog hiatus (hopefully) over, and a few thoughts on writing grants

Had at least a couple folks ask why I haven’t written any blog posts lately. The answer is some combination of real-life work leading to writing fatigue and some degree of lack of inspiration. On a related note, writing grants sucks.

There have been some grants that I’ve had fun writing, but I would say they are in the distinct minority. I am of course not alone in that, but one of the reasons often trotted out for hating writing grants is that many scientists just hate writing in general, and I think that is true for a number of scientists that I know. Personally, though, I actually really like writing, and I typically enjoy writing papers and am reasonably fast at it, so it’s not the writing per se. So what is it that makes me leave the page empty until just days before the deadline while patiently waiting for the internet to run out of videos of people catching geoducks?

Part of it is definitely that grantwriting makes you sit and think about your work, how it fits into what we already know, and how it will tell us something new. It is certainly the case that grants can force you to critically evaluate ideas–writing is where weak ideas go to die, and that death can be painful. But I don’t think this is the whole story, either. I would say that the few grants I’ve really enjoyed writing are the ones where the process focused on thinking about the science I really want to do (or more likely already did) and explaining it clearly. So what is it about the other grants that make me try to find virtually any excuse to avoid writing them?

After a bit of reflection, I think that for me, the issue is that writing a grant often just feels so disingenuous. This is because I’m generally trying to write something that is “fundable” rather than what I really want to do. And I find it really REALLY hard to get motivated to do that. I mean, think about it. I’ve got to somehow come up with a “plausible plan” for research for 5 years, in a way that sounds exciting to a bunch of people who are probably not experts in the area and have a huge stack of applications to read. First off, if I ever end up in a situation where I’m actually doing what I thought I was going to do 5 years ago, I should probably be fired for complete lack of imagination. Secondly, the scope of what one typically proposes in a grant is often far less imaginative to begin with. Nobody ever proposes the really exciting thing they want to do; instead, they just propose what reviewers will think is safe and reasonable. Not that these are some brilliant insights on my part; I think most applicants and reviewers are acutely aware of this, hence the maxim “An NIH grant should have 3 aims: 2 you’ve already done and 1 you’re never going to do”. So to the extent that everyone already knows all this, why do we bother with the whole charade?

I should say that I feel far less disingenuous writing “people” grants, by which I mean the fund-the-person-not-the-project grants like many junior investigator awards, HHMI and those in the NIH high-risk/high-reward program. At least there, I’m focusing more on the general area we’re interested in as a lab, describing what makes us think it’s exciting, and why we’re well positioned to work on this topic, which is far more realistic than detailing specific experiments I’ll use to evaluate a particular hypothesis that I’ll probably realize is hopelessly naive after year 1. Of course, I think these are basically the criteria that people are judging “project” grants on as well for the most part, but at least I don’t have to pretend that I know what cell type I’m going to use in our RNA FISH validation studies in year 4… nor will I get dinged for a “bad choice” in this regard, either. This is not to say that writing people-grants is easy–it is particularly tricky to write confidently about yourself without sounding silly or boastful–but I’m just saying that the whole exercise of writing a people-grant involves writing about things that feel more aligned with the criteria by which I think grants should actually be evaluated.

(Sometimes I wonder if this whole system exists mainly to give reviewers a way out of directly criticizing people. If you’re not too excited about a grant, you can harp on technical issues, of which there are always plenty. I think this is an issue with our culture of criticism, which is probably a topic for another blog post.)

Carried a bit further, the logical conclusion to this line of argument is that we shouldn’t be judged on prospective plans at all, but rather just on past performance. Personally, I would much rather spend time writing about (and being judged on) science I actually did than some make-believe story about what I’m going to do. I remember a little while ago that Ron Germain wrote a proposal that was “people-oriented” in the sense that it suggested grants for all young investigators, with renewal based on productivity. His proposal engendered a pretty strong backlash from people saying that people-based grants are just a way to help the rich get richer (“Check your privilege!”). Hmm, don’t know that I’m qualified to delve into this too deeply, but I'm not sure I buy the argument that people-based grants would necessarily disfavor the non-elite, at least any more than the current system already does. Of course the current people-based grant system looks very elitist–it’s very small, and so naturally it will mostly go to a few elites. I don’t think that we can necessarily draw any conclusions from that about what people-based funding might look like on a broader scale. I also think that it might be a lot easier to combat bias if we can be very explicit about it, which I think may actually be easier in people-based grants.

As to the backlash against these sorts of proposals, I would just say that many scientists have an inherent (and inherently contradictory) tendency towards supreme deification on the one hand and radical egalitarianism on the other. I think a good strategy is probably somewhere in between. Some people-based grants to encourage a bit more risk-taking and relieve some of the writing burden. Some project-based grants to keep programmatic diversity (because it would help fund important areas that are maybe not fashionable at the moment). I don’t know where this balance is, but my feeling is that we’re currently skewed too far towards projects. For this reason, I was really excited about the NIH R35 program–until you follow this eligibility flowchart and find out that most roads lead to no. :(

Oh, and about the actual mechanics of writing a grant: my personal workflow is to write the text in Google Docs using PaperPile, then export to Pages, Apple’s little-used-but-perfect-for-grants word processing tool. The killer feature of Pages is that it’s SO much better than Word/Google Docs at allowing you to move figures to exactly where you want them and have them stay there, and as an added bonus, they will keep their full PDF-quality resolution. Only problem is that there are a grand total of around 18 people in the continental United States who use Pages, and none of them are writing a grant with you. Sad. Still better than LaTeX, though. ;)

Saturday, December 12, 2015

Diet Coke vs. Coke Zero: Lab holiday party challenge!

A couple weeks ago, I was remarking how I like Coke Zero much more than Diet Coke, and Uschi said, "Oh, isn't it just the same thing, but just with a different label on it?" To which I responded "No way, it's completely different!" So we did a little blind taste test right there on the spot and I of course got them mixed up. :)

Ah, but that was just n=1. So with that in mind, we decided to take a more, umm, scientific approach at our lab holiday party today. Uschi set up two cups for all 13 participants, one with Coke Zero and one with Diet Coke. Everyone then had to say which they thought was which. Results? 11 of 13 got it right! (And I was one of them, thankfully...)

Verdict: Coke Zero and Diet Coke are distinguishable, with a one-sided binomial p of about 0.011.
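For the curious, here is one way to put a p-value on that result: the exact probability of 11 or more correct identifications out of 13 under pure guessing (a quick sketch; that this one-sided binomial test is the calculation behind the quoted p is my assumption).

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that k or more
    of n tasters guess correctly if they are purely guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One-sided tail for 11 of 13 correct under 50/50 guessing.
print(round(binom_tail(11, 13), 4))
```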

Monday, October 26, 2015

What happens if you think too much about RNA editing

This is what happens if you spend too much time thinking about RNA editing:

Don't let this happen to you.

PS: joking–I *love* editing. ;)

Thursday, October 1, 2015

Fun new perspective paper with Ian Mellis

Wrote a perspective piece with Ian Mellis that just came out today:


tl;dr: Where is systems biology headed? The singularity, of course... :)

(Warning: purposefully inflammatory.)

Monday, September 21, 2015

Some observations from Single Cell Genomics 2015

Just got back from a really great conference on Single Cell Genomics at Utrecht in the Netherlands. The lead organizer, Alexander van Oudenaarden (my postdoc mentor) was an absolutely terrific host, with excellent speakers, a superb venue, and a great dance party with live music in an old church (!) to cap it all off.

Here are some observations on the field from a relative outsider:

1. Single cell genomics is becoming much more democratic. As the tools have developed, the costs and complexity have gone way down in terms of preparing the libraries of cells, and it seems like the field has achieved some degree of consensus on barcoding and molecular identifiers. The droplet techniques are particularly remarkable in terms of the numbers of cells, and look relatively inexpensive and easy to set up (we are close to having it working in our lab, and we just started a little while ago).

2. Meanwhile, the quality of the data overall seems to have increased. Earlier on, I think there was a lot of talk about how much better one method for e.g. single cell RNA-seq was than another, and the question on everyone's mind from the outside was which one to use. Nowadays, it doesn’t seem like any one method leads to radically different biological claims than anyone else’s. That’s not to say that there aren’t any differences, but rather that there are fewer practical differences between methods that I could see, especially compared to a few years ago. Then again, I'm completely naive to this area, so I could be way off base here.

3. tSNE was everywhere. Very cool method! It's important to remember, though, that it's just a new-fangled dimensionality-reduction method that tries to preserve local structure in the projection. Can't necessarily ascribe any biological significance to it–it's just a visualization method to help look at high-dimensional datasets. I think most folks at the conference realize that, but perhaps people from outside might not.

4. This was undoubtedly a technology meeting. That said, while the technology is still rapidly advancing, I feel that we have to start asking what new conceptual insights one might get from single cell sequencing. I think this question is in the air, and I think some clever folks will start coming up with something, especially now that the methods are maturing. But it will require some deep thinking.

5. Along those lines, one thing that sort of bugs me is when people start their talk with a statement like "It is clear that ABC is important for the cell because of XYZ" as motivation for developing the method. Sometimes I would disagree with those statements. I think that it's important to really dig into the evidence for some phenomenon being important and present that fairly at the beginning of a talk.

6. At the same time, one amazing talk highlighted some actual, real, clinical personalized medicine using single cell sequencing. Now THAT is real-world impact. I don’t think it’s published yet, but when it is, I’m pretty sure you’ll hear about it.

7. Imaging is making a comeback. For a while, I was sort of bummed that sequencing was the new hotness and imaging was old and busted. But Long Cai and Xiaowei Zhuang showed off some very nice recent results on multiplexing RNA FISH to get us closer to image-based transcriptomics. Still a ways to go, but it has a number of advantages, spatial information of course being the most obvious one, sensitivity being another. One big issue is cost reduction for oligonucleotides, though. That may take some creative thinking.

8. This field has a lot of young, energetic people! As Alexander remarked, the poster session was huge, and the quality was very high. Clearly a growth area. It is also friendly but rather competitive. At this stage, though, I think the methods are all sort of blending together, and I get the sense that the big game has already been hunted in terms of flashy papers purely based on methods. So maybe the competitiveness will diminish a bit now, or at least transfer elsewhere.

9. Speaking of growth, Dutch people are indeed really tall. Like, really tall. I had to use the kids urinal in the bathroom at the airport when I got off the plane.

Next year this meeting will be at the Sanger Institute–should be fun!

Monday, September 7, 2015

Another option for how to shop your paper around

I had a very interesting conversation with a journal editor recently. Normally, when your paper gets solid reviews but gets rejected for “impact” reasons or whatever, the journal will try to funnel you into one of their family journals (“just click the link… just click the link…”). Good deal for them: they get to keep a solid paper, boost a new journal, maybe collect revenue from their open access honeypot, all without much additional work. Good deal for you? Maybe yes, maybe no. But here’s the thing the editor told me: if you got good reviews from some other journal, just take those reviews and send them in with your paper to our journal! Often, if there are no technical flaws, they can accept right away, maybe send for one additional reviewer just to double check. Sort of a personally-managed transfer.

There probably are some thorny ethical or legal issues with doing this, and I have not done it myself. Then again, my feeling, which is completely a guess based on anecdotes, is that some journals are increasingly sending out papers to review that they have no intention of publishing themselves, but want to capture into their family journals. (One thing is that it’s probably easier for editors to get good reviewers that way.) So I'm not sure anyone's hands are clean. Publishing is so demoralizing these days that I think you just do what you have to do.

Anyway, just another option to pass the time until a future of pre-print awesomeness arrives. Maybe we can then just send community feedback to the journal and be done with it!

Tuesday, August 25, 2015

New York Gyro lost... and found!

We had some serious issues in the lab over the last several months. In mid-June, our favorite food truck (okay, Ally's and mine), New York Gyro, disappeared from the corner of 38th and Walnut, only to be replaced by some other New York Gyro guy. Okay, no big deal, one New York Gyro is as good as the next, right? Oh, how so very naive, my friend. There is this New York Gyro (affectionately referred to as "Far Gyro" because it's sort of, well, far), and then there's everyone else. His chicken over rice is hands-down the best. And a free drink?! What more can you ask for? So we thought, okay, he's leaving for Ramadan, no big deal. Well, Ramadan came and went, but no Far Gyro. At first, well, maybe he tacked on a vacation. Then maybe it was some family thing. Then... well, let's just say we were hoping that he had somehow gotten bumped off his spot and was out there, somewhere, waiting for us to find him. It got to the point where Martha (from the Winkelstein lab) made this sign:

Meanwhile, some usurper had taken his place, driving down the ratings on Yelp for New York Gyro on 38th and Walnut. Blasphemy! And then...

Found Gyro

Far Gyro is now Super Far Gyro, having resurfaced on 41st and Walnut! Rohit just happened to see him because he lives right there. Well worth the additional 3 blocks of walking. Whew.

Anyway, if you have never been to this particular truck, give it a try sometime. Chicken over rice, all the sauces, hold the lettuce (well, that last part is just for me–I have boycotted lettuce as a waste of precious time and space).

Sunday, August 23, 2015

Top 10 signs that a paper/field is bogus

These days, there has been a lot of hand-wringing about how most papers are wrong, about reproducibility, and so forth. Often this is accompanied by some shrill statements like “There’s a crisis in the biomedical research system! Everything is broken! Will somebody please think of the children?!” And look, I agree that these are all Very Bad Things. The question is what to do about it. There are some (misguided, in my view) reproducibility efforts out there, with things like registered replication studies and publishing all negative results and so forth. I don’t really have too much to say about all that except that it seems like a pretty boring sort of science to do.

So what to do about this supposed crisis? I remember someone I know telling me that when he was in graduate school, he went to his (senior, pretty famous) PI with a bunch of ideas based on what he'd been reading, and the PI said something along the lines of "Look, don't confuse yourself by reading too much of that stuff, most of it’s wrong anyway". I've been thinking for some time now that this is some of the best advice you can get.

Of course, that PI had decades of experience to draw upon, whereas the trainee obviously didn't. And I meet a lot of trainees these days who believe in all kinds of crazy things. I think that learning how to filter out what is real from the ocean of scientific literature is a skill that hopefully most trainees get some exposure to during their science lives. That said, there’s precious little formalized advice out there for trainees on this point, and I believe that a little knowledge can go a long way: for trainees, following up on a bogus result can lead to years of wasted time. Even worse is choosing a lab that works on a bogus field–a situation from which escape is difficult. So I actually think it is fair to ask “Will somebody please think of the trainees?”.

With this in mind, I thought it might be useful to share some of the things I've learned over the last several years. A lot of this is very specific to molecular biology, but maybe useful beyond. Sadly, I’ll be omitting concrete examples for obvious reasons, but buy me a beer sometime and then maybe I'll spill the beans. If you’re looking for a general principle underlying these thoughts, it’s to have a very strong underlying belief system based in Golden Era molecular biology. Like: DNA replication, yeah, I’m pretty sure that’s a thing. Golgi Apparatus, I think that exists. Transcription and translation, pretty sure those really happen. Beyond that, well…
  1. Run the numbers. One consistent issue in molecular biology is that because it tends to be so qualitative, we have little sense for magnitudes and plausibility of various mechanisms. That said, we now are getting to the point where we have a lot more quantitative data that lets us run some basic sanity checks (BioNumbers is a great resource for this). An example that I’ve come across often is mRNA localization. Many people I’ve met have, umm, fairly fanciful notions of the degree to which mRNA is localized. From what we’ve seen in the lab, almost every mRNA seems to just be randomly distributed around the cytoplasm, with the exception being ER-localized ones, which are, well, localized to the ER. Ask yourself: why should there be any mRNA localization? Numbers indicate that proteins diffuse quite rapidly around the cell, on a timescale that is likely faster than mRNA transport. So for most cells, the numbers say that you shouldn’t localize mRNA–rather, just localize proteins. And, uh, that’s what we see. There are of course exceptions, like lncRNA, that show interesting localization patterns–again, this makes sense because there is no protein to localize downstream. There are other things that people say about lncRNA that don’t make sense, though. I’ll leave that as an exercise for the reader… :) (Also should point out that these considerations can actually help make the case for mRNA localization in neurons, which I think is a thing.)
  2. Consider why nobody has seen this Amazing New Phenomenon before. Was it a lack of technology? Okay, then it might be real. Was it just brute force? Also possible that it's real. Was it just waiting for someone to think of the idea? Well, in my experience, nutty ideas are relatively cheap. So I'd be very suspicious if this result was just apparently sitting there without anyone noticing. Ask yourself: should this putative set of genes have shown up in a genetic screen? Should this protein have co-purified with this other protein? Did people already do similar experiments a long time ago and come up empty handed? What are other situations in which people may have inadvertently seen the same thing before? It’s also possible that the result is indeed true, but represents a “one-off” special case: consider this exchange about a recent paper (I have to say that I was surprised that some people in the lab didn’t even find this result surprising!). Whether you choose to pursue one-offs is, I think, a largely aesthetic choice.
  3. Trust your brain, not stats. If looking at an effect makes you wonder what the p-value is, you’re already on thin ice, so tread carefully. Also, beware of new statistical methods that claim to extract results from the same data where none existed before. Usually, these will at best find only marginally interesting new examples. More often, they just find noise. If there was something really obvious, probably the original authors would have found it by manual inspection of the data. Also, if there’s a clear confounding factor that the authors claim to have somehow controlled for, be suspicious.
  4. Beware of the "dynamic process". Sometimes, when you press someone on the details of a particular entity or process in the cell whose existence is dubious, they will respond with "Well, it's a dynamic object/process." Often (though certainly not always), this is an excuse for lazy thinking. Remember that just because something is "dynamic" doesn't mean that you should not be able to see it! Equilibrium, people.
  5. For some crazy new proposed mechanism, ask yourself if that is how you think the cell would do it. We often hear that nothing in biology makes sense except in light of evolution. In this context, I think it's worth wondering whether the proposed mechanism would be a reasonable way for the cell to do something it was not otherwise able to do. Otherwise, maybe it’s some sort of artifact. As a (made up) example, cells have many well-established mechanisms for communicating with each other. For a new mechanism of communication to be plausible (in my book), it should offer some additional functionality beyond these existing mechanisms. Evolution can do weird stuff, though, so this line of reasoning is inherently somewhat suspect.
  6. Check for missing obvious-next-step experiments. Sometimes you’ll find a paper describing something cool and new, and you’ll immediately wonder “Hmm, if what they’re saying is true, then shouldn’t those particles also be able to…”. Well, if you thought of it after reading a paper for 30 minutes, then presumably the authors had the same idea at some point as well. And presumably tried it. And it presumably didn’t work. (Oh, wait, sorry, I meant the results were “inconclusive”.) Or they tried to get RNA from those cells to profile. And they just didn’t get enough RNA. And so on. Keep an eye out for these, especially if multiple papers are missing these key experiments.
  7. For methods, look for validation with known biology. The known positives should be positive and presumed negatives should be negative. Let’s say you have some new sequencing method for measuring all RNA-protein interactions (again, completely hypothetical). Have a list of known interactions that should show up and a list of ones for which there’s no plausible reason to expect an interaction. Most people think about the positives, but less often about the negatives. Think carefully about them.
  8. Dig carefully into validation studies. I remember reading some paper in which they claimed to have detected a bunch of new molecules and then “validated” their existence. Then the validation had things like blots exposed for weeks to show signals and PCRs run for 80 cycles and stuff like that. Hmm. Often this data is buried deep in supplements. Spend the time to find it.
  9. Be suspicious of the interpretation of biological perturbations. Cells are hard to manipulate. And so it’s perhaps unsurprising that most perturbations can lead you astray. Off-target effects for knockdown are notoriously difficult to control for. And even if you do have target specificity, another problem is that as our measurements get better, biological complexity means that virtually all hypotheses will be true at least 50% of the time. Overexpression often leads to hugely non-biological protein levels and can lead to artifacts. Cloning out single cells leads to weird variability. Frankly, playing with cells is so difficult that I’m sort of amazed we understand anything!
  10. Know the limitations of methods. If you’re looking for differential gene expression, how much can you trust RT-PCR? Have you heard of the MIQE guidelines for RT-PCR? I hadn't, but they are extensive. For RNA-seq, how well-validated is it in your expression regime? If you’re analyzing sequence variants, how do you know it’s not sequencing error? (The now largely discredited claims of extensive RNA editing are one widely-publicized example of this issue.) ChIP-seq hotspots? The list goes on. If you don’t know much about a method, ask someone who does.
  11. Bonus: autofluorescence. Enough said.
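To make the "run the numbers" point concrete, here's the kind of back-of-envelope script I have in mind for the mRNA localization example. All the numbers are rough, BioNumbers-style order-of-magnitude estimates that I'm assuming for illustration, not measurements from any particular paper:

```python
# Back-of-envelope check: is protein diffusion faster than mRNA transport?
# All parameter values below are rough order-of-magnitude assumptions.

def diffusion_time(length_um, d_um2_per_s):
    """Characteristic time to diffuse a distance L in 3D: t ~ L^2 / (6D)."""
    return length_um ** 2 / (6 * d_um2_per_s)

CELL_RADIUS_UM = 10.0   # typical mammalian cell, order of magnitude
D_PROTEIN = 10.0        # um^2/s, typical cytoplasmic protein
D_MRNA = 0.1            # um^2/s, mRNP particles diffuse far more slowly
MOTOR_SPEED_UM_S = 1.0  # um/s, typical motor-driven transport speed

t_protein = diffusion_time(CELL_RADIUS_UM, D_PROTEIN)
t_mrna_diff = diffusion_time(CELL_RADIUS_UM, D_MRNA)
t_mrna_transport = CELL_RADIUS_UM / MOTOR_SPEED_UM_S

print(f"protein diffusion across cell: ~{t_protein:.1f} s")
print(f"mRNA diffusion across cell:    ~{t_mrna_diff:.0f} s")
print(f"mRNA motor transport:          ~{t_mrna_transport:.0f} s")
```

With these estimates, a protein crosses the cell by diffusion in a couple of seconds, while the mRNA would take tens of seconds by motor transport or minutes by diffusion. So for a typical-sized cell, localizing the mRNA buys you little over just letting the protein diffuse, which is the sanity check behind the claim above.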
I offer these more as a set of guidelines for how I like to think about new results, and I’m sure we can all think of several counterexamples to virtually every one of these. My point is that high-level thinking in molecular biology requires making decisions, and making a real decision means leaving something else on the table. Making decisions based on the literature means deciding what avenues not to follow up on, and I think that most good molecular biologists learn this early on. Even more importantly, they develop the social networks to get the insider’s perspective on what to trust and what to ignore. As a beginning trainee, though, you typically will have neither the experience nor the network to make these decisions. My advice would be to pick a PI who asks these same sorts of questions. Then keep asking yourself these questions during your training. Seek out critical people and bounce your ideas off of them. At the same time, don’t become one of those people who just rips every paper to shreds in journal club. The point is to learn to exhibit sound judgement and find a way forward, and that also means sifting out the good nuggets and threading them together across multiple papers.

As a related point, as I mentioned earlier, there’s a lot of fuss out there about the “reproducibility crisis”. There are two possible models: one in which we require every paper to be “true”, and the more laissez-faire model I am advocating for, in which we just assume a lot of papers are wrong and train people to know the difference. Annoying reviewers often say that extraordinary claims require extraordinary evidence. I think this is wrong, and I’m not alone (hat tip: Dynamic Ecology). I think that in any one paper, you do your best, and time will tell whether you were right or wrong. It’s just not practical or efficient for a paper to solve every potential problem, with every result cross-validated with every method. Science is bigger than any one paper, and I think it’s worth providing our trainees with a better appreciation of that fact.

Update, 8/25/2015:
Couple nice suggestions from various folks. One from anonymous suggests "#12: Whenever the word 'modulating' appears anywhere in title/abstract." Well said!

Another point from Casey Bergman:

Wednesday, August 19, 2015

A fun day in the lab

Yesterday was a really great day in the lab! Olivia, who graduated in May, came back from her trip to dance camp to have lunch with us... and showed off her new engagement ring! Awesome! And with one of the best engagement speeches ever from her fiancé Derek (as seen on Facebook). Olivia's answer to the big question? "Okay, fine."

Also, it was Maggie's birthday, and Ally made this fantastic cake out of fruit and yogurt:

Happy birthday, Maggie!

Then we went to Taco Tuesday at Wahoo's and enjoyed a lovely summer evening outside with beer, tacos and cornhole.

Sigh, now back to writing a grant...

Tuesday, August 11, 2015

The impact-factor introduction

Last week, I went to the Penn MSTP retreat (for MD/PhD and VMD/PhD students), which was really cool. It truly is The Best MSTP Program in the Galaxy™, with tons of very talented students, including, I'm proud to say, four in our lab! There was lots of interesting and inspiring science in talks and posters throughout the day, and I also got to meet with a couple of cool incoming students, which is always a pleasure.

One thing I noticed several times, however, was the pernicious habit of mentioning what journals folks in the program were publishing in or somehow associated with, emphasizing, of course, the fancy ones like Nature, etc. I noticed this in particular in the introduction of the keynote speaker, Chris Vakoc (Penn alum from Gerd Blobel's lab), because the introduction only mentioned where his work was published and didn't say anything about what science he actually did! I feel it bears mentioning that Chris gave a magnificent talk about his work on chromatin and cancer, including finding an inhibitor that actually seems to have cured a patient of leukemia. That's real impact.

I've seen these "impact-factor introductions" outside of the MSTP retreat a few times as well, and it really rubs me the wrong way. Frankly, being praised for the journals you've published in is just about the worst praise one could hope for. In a way, it's like saying "I don't even care enough to learn about what you do, but it seems like some other people think it's good". Remember, "where" we publish is just something we invent to separate out the mostly uninteresting science from the perhaps-marginally-less-likely-to-be-uninteresting-but-still-mostly-uninteresting science. If you actually are lucky enough to do something really important, it won't really matter where it's published.

What was even more worrisome was that the introduction for the speaker came from a (very well-intentioned) trainee. I absolutely do not want to single out this trainee, and I am certain the trainee knows about Chris's work and holds it in high regard. Rather, I think the whole thing highlights a culture we have fostered in which trainees have come to value perceived "impact" more than science itself. As another example, I remember bumping into a (non-MSTP) student recently and mentioning that we had recently published a paper, and rather than first asking what it was about, they only asked about where it was published! I think that's frightening, and shows that our trainees are picking up the worst form of scientific careerism from us. Not that I'm some sort of saint, either. I found it surprising to read BioRxiv recently and feel a bit disoriented without a journal name on the paper to help me know whether a paper was worth reading. Hmm. I'm clearly still in recovery.

Now, I'm not an idealist, nor particularly brave. I still want to publish papers in glossy journals for all the same reasons everyone else does, mostly because it will help ensure someone actually reads our work, and because (whether I like it or not) it's important for trainees and also for keeping the lab running. I also personally think that this journal hierarchy system has arisen for reasons that are not easy to fix, some of which are obvious and some less so. More ideas on that hopefully soon. But in the meantime, can we all at least agree not to introduce speakers by where they publish?

Incidentally, the best introduction I've ever gotten was when I gave a talk relatively recently and the introducer said something like "... and so I'm excited to hear Dr. Raj talk about his offbeat brand of science." Now that's an introduction I can live up to!

Friday, July 24, 2015

The editors at Cell Systems are awesome!

I just had a really lovely experience reviewing a paper at Cell Systems. Started out fairly standard: got an e-mail asking to review, paper looked good (showing good taste), so I accepted, then submitted my review. Then I got a personal e-mail from the editor thanking me for my review, remarking how nice the paper was, and asking a question or two. Then they issued the decision letter and I got a personal thank you for my review. It just really helped me feel involved and connected to the process, much more so than the auto-generated form e-mails I typically get. It also makes me way more inclined to see the editors at Cell Systems as thoughtful people who care about science, and thus more likely to recommend it to others and to submit there myself. Note to editors: a little humanity goes a long way. (I have had similarly good experiences with the editors at Nature Methods.)

Also want to say for the record that I still think that the current journal system is generally not good for how we do science. But in the meantime while we wait for preprints or whatever to become the standard, I think it’s nice to acknowledge a job well done when you see it.

Thursday, July 23, 2015

RNA integrity in tissues

Been thinking a lot about expression in tissue these days. Funny quote in a post from the always quotable Dan Graur: “They looked at the process of transcription in long-dead tissues? Isn’t that like studying the [...] circulation system in sushi?” He also points to this study from Yoav Gilad about RNA degradation, which is really great–wish there were more such studies.

We have been doing a fair amount of RNA FISH in tissues (big thanks to help from Shalev Itzkovitz and Long Cai), and while we haven’t done a formal study, I can say that RNA degradation is a huge problem in tissue. We’ve seen RNA in some tissues disappear literally within minutes after tissue harvest. This seems somewhat correlated with RNase content of the tissue, but it’s still unclear. We’ve also worked in fresh frozen human samples, all collected ostensibly the same way, and found huge variability in RNA recovery, with some samples showing great signals and other, seemingly identical samples showing no RNA whatsoever. This is true even for GAPDH. No clue whether the variability is biological or not, but I'm inclined to think it's technical. Most likely culprit is ischemic time, over which we had no control in the human samples.

We’ve also found that we’ve been able to get decent signals in formaldehyde-fixed, paraffin-embedded samples, even though those are thought to be generally worse than fresh frozen. If I had to guess, I’d say it’s all about sample handling before freezing/fixing. I would be very hesitant to make any strong claims about gene expression without being absolutely certain about the sample quality. Problem is, I don't know what it means to be absolutely certain... :)

Anyway, so far, all we have is the sum of anecdotes, which I share here in case anyone’s interested. We really should do a more formal study of this at some point.

Wednesday, July 15, 2015

RNA-seq vs. RNA FISH, part 2: differential expression of 19 genes

On the heels of RNA FISH vs. RNA-seq (absolute abundance), here's cells in two different conditions, differential expression of 19 genes, RNA FISH vs. RNA-seq:
A few are way off, but not bad, on the whole.

Saturday, July 11, 2015

How should we do script review to spot errors?

Sydney just thought up a great idea for the lab: she was wondering if someone could review all her analysis scripts to look for errors before we finalize them and submit a manuscript. Sort of like a code review, I guess. I think this is awesome, and can definitely reduce the potential for getting some very serious egg on your face after publication. (Note: I'm not talking about infrastructure-type software, which I think has a very different set of problems and solutions. This is about analysis scripts for the science itself.)

We all discussed briefly at group meeting how this might work in practice, which took on a very practical significance because Chris was going over figures for the paper he's putting together. Here were some of the points of discussion, much revolving around the time it takes for someone to go over someone else's code.

  1. When should the review happen? In the ideal world, the reviewer would be involved each step of the way, spotting errors early on in the process. In practice, that's a pretty big burden on the reviewer, and there's the potential to spend time reviewing analyses that never see the light of day. So I think we all thought it's better done at the end. Of course, doing it at the bitter end could be, well, bitter. So we're thinking maybe doing it in chunks when specific pieces of the analysis are finalized?
  2. Who should do it? Someone well-versed in the project would obviously be able to go through it faster. Also, they may be better able to suggest "sanity checks" (additional analyses to demonstrate correctness) than someone naive to the project. Then again, might their familiarity blind them to certain errors? I'm just not sure at this stage how much work it is to go through this.
  3. Related: How actively should the code author be involved? On the one hand, looking at raw code without any guidance can be very intimidating and time-consuming. On the other hand, having someone lead you through the code might inadvertently steer the reviewer away from problem areas.
  4. Who should do it, part 2? Some folks in the lab are a bit more computationally savvy than others. I worry that the more computationally savvy folks might get overburdened. It could be a training exercise for others to learn, but the quality of the review itself might suffer somewhat.
  5. How should we assign credit? Acknowledgement on the paper? Co-authorship? I could see making a case either way, guess it probably depends on the specifics.
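As a concrete example of the "sanity checks" mentioned in point 2, one thing we've talked about is baking the checks directly into the analysis scripts, so the reviewer has something to run rather than just code to read. Here's a minimal sketch in Python; the function names and the toy normalization step are hypothetical, just to show the shape of the idea:

```python
# Sketch of reviewer-friendly sanity checks built into an analysis script.
# Names and the normalization step are illustrative, not from a real pipeline.

def normalize_counts(counts):
    """Scale each sample's counts so they sum to 1 (toy normalization step)."""
    totals = [sum(sample) for sample in counts]
    return [[c / t for c in sample] for sample, t in zip(counts, totals)]

def check_normalization(normalized, tol=1e-9):
    """Sanity check a reviewer can run on its own: every sample sums to 1."""
    for i, sample in enumerate(normalized):
        assert abs(sum(sample) - 1.0) < tol, f"sample {i} does not sum to 1"

raw = [[10, 30, 60], [5, 5, 10]]  # toy count data, two samples
norm = normalize_counts(raw)
check_normalization(norm)
print("all sanity checks passed")
```

The design choice here is that the checks live next to the analysis but don't depend on understanding all of it, so a reviewer can verify intermediate results without being led through every line by the code's author.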

Anyway, don't know if anyone out there has tried something like this, but if so, we'd love to hear your thoughts. I think it's increasingly important to think about these days.

Some of my favorite meta-science posts from the blog

I recently was asked to join a faculty panel on writing for Penn Bioengineering grad students, and in doing so, I realized that this blog already has a bunch of thoughts on "meta-science", like how to do science, manage time, give a talk, write. Below are some vaguely organized links to various posts on the subject, along with a couple outside links. I'll also try and maintain this Google Doc with links as well.

Time and people management:
Save time with FAQs
Quantifying the e-mail in my life, 1/2
Organizing the e-mail in my life, 2/2
How to get people to do boring stuff
The Shockley model of academic performance
Use concrete rules to change yourself
Let others organize your e-mail for you
Some thoughts on time management
Is my PI out to get me?
How much work do PIs do?
What I have learned since being a PI

How to do science:
The Shockley model of academic performance
What makes a scientist creative?
Why there is no journal of negative results
Why does push-button science push my buttons
Some thoughts on how to do science
Storytelling in science
Uri Alon's cloud
The magical results of reviewer experiments
Being an anal scientist
Statistics is not science
Machine learning, take 2

Giving talks:
How to structure a talk
Figures for talks vs. figures for papers
Simple tips to improve your presentations
Images in presentations
A case against laser pointers for talks
A case against color merges to show colocalization

Writing:
The most annoying words in scientific discourse
How to write fast
Passive voice in scientific writing
The principle of WriteItAllOut
Figures for talks vs. figures for papers
What's the point of figure legends?
Musing on writing
Another short musing on writing

Papers and peer review:
The eleven stages of academic grief
A taxonomy of papers
Why there is no journal of negative results
How to review a paper
How to re-review a paper
What not to worry about when you submit a manuscript
Storytelling in science
The cost of a biomedical research paper
Passive-aggressive review writing
The magical results of reviewer experiments
Retraction in the age of computation

Career development:
Why are papers important for getting faculty positions?
Is academia really broken? Or just really hard?
How much work do PIs do?
What I have learned since being a PI
Is my PI out to get me?
Why there's a great crunch coming in science careers
Change yourself with rules
The royal scientific jelly

Coding:
The hazards of commenting code
Why don't bioinformaticians learn how to run gels?

Thursday, July 9, 2015

Notes from a Chef Watson lab party

I recently read about Chef Watson, which is a website that is the love child of IBM's Watson (the one that won at Jeopardy) and Bon Appetit Magazine. Basically, you put in ingredients and out come crazy recipes, generated by Watson's artificial intelligence. Note: it doesn't give you existing recipes. No, it actually generates the recipe based on its silicon-based knowledge of what tastes good with what.

After reading this awesome review (Diner Cod Pizza?!?), I had to try it for myself. After doing a trial run with some deviled eggs (made with soy sauce, tahini, white miso, mayonnaise, onion–yum!), I somehow convinced everyone in the lab to hold a Chef Watson-inspired lab dish-to-pass. I thought it would be a good idea because it combines our love of food with our love of artificial intelligence. And here are the results:

Maggie: Appetizer Rhubarb Tartlets
Made with polenta, rhubarb, orange juice, boursin, tamarind paste, shallots, basil. I actually really like this one, although it was a bit tart.

Ally: Grapefruit potato salad
Didn't get the complete recipe details, but, umm, it had grapefruit and potato. I actually thought it was not too bad, considering I'm not a huge grapefruit fan.

Andrew: Bean... thingy
Made with kidney beans, pecorino romano, salami, tahini, pepper sauce, capers, chicken, green chiles, onions, mint syrup. This one totally rocked!  Consensus winner!

Paul: Crab soup
Hmm. Don't remember what all was in this, but there was some crab. And a bunch of other random stuff. This recipe was interesting. Very interesting. The crazy thing was how the flavors evolved in every bite. Started sort of like crab soup and ended with the taste of Indian food (to me). Did I mention interesting? It won the prize for most interesting.

Paul taking a sip:

"That was interesting!":

Sara: Banana Lime Macaroni and Cheese
Yep, that just about sums it up, ingredients-wise. This dish was fairly polarizing (sorry, didn't get a picture). I actually thought it was pretty good. Sara herself was somewhat less enthusiastic. Meanwhile, she was busy blowing bubbles for her son Jonah. Meanwhile, Jonah drank a bottle of bubble mixture.

Claire: Corn bread
All in all, this was really good, and relatively normal. Only "weird" thing was honey, which I thought added a nice sweetness, although Claire thought it was a bit much. This picture is great, as much for the food as for the highly sceptical look on Todd's face!

Lauren: Chips and Salsa
Non-Watson, hence less outlandish. But tasty!

Stefan: Sausages
Non-Watson, but homemade and delicious.

Me: Asian Sesame Oil Pasta Salad
Japanese noodles, tahini, mayo, thyme, sherry vinegar, peanut, green pepper, yellow pepper, broccoli, sweet onions, apple. Forgot to take a picture, but not bad. Perhaps a bit bland, but tasted good with some chili pepper oil.

Verdict: Overall, I think Chef Watson is great! It definitely suggests flavor combinations that you would likely never think of otherwise. I think one lesson was that you probably want to flip through a bunch of recipes until you come across one that sort of makes sense. The other lesson is that Watson isn't so great at getting proportions and cooking times right. You definitely have to use your own judgement or things could get ugly. Anyway, I for one welcome our robotic cooking overlords.

Update 7/11/2015: Some people strongly disagree with my sentiment that Chef Watson is great. I view it as glass half-full: Watson gives you interesting ingredients to use, but the first time you combine them you probably won't get the proportions right. But they are definitely combinations you would not have chosen otherwise. The glass half-empty version is that we already have tried and true recipes. Why mess with success? Well, I guess I'm just an optimist! Rhymes with futurist! :)

Thursday, June 25, 2015

When to say yes

As a junior PI, you get a lot of advice about when to say “no”. One PI I know told me that he and his other junior PIs have a rule that they have to say no to at least one thing a day. And it is sage advice. The demands on our time are huge, and so every minute counts.

Sometimes I worry, though, that the pendulum may have swung too far in the other direction, to the point where the received wisdom to say no to everything prevents us from saying yes every once in a while. Say yes and you might just end up on a new adventure you may not have anticipated with cool and interesting people. Say no and you will never know.

I started thinking about this when I read this excellent blog post with advice for new PIs. All great tips, and one that really resonated with me was the tip to “Be a good colleague”. Basically, the point is that while there are some reasons you might think it wise to do a bad job on something so that nobody asks you again, it’s far better to do a good job. I think the same holds for interpersonal interactions. I think it’s important to make time for the people you care about in your work life. Sometimes you might do a favor for a senior (or junior) colleague. Then you might have lunch and end up with an awesome collaboration. Or maybe the favor doesn’t get repaid. That’s okay, too; it happens. And some people are just not going to make fun collaborators, and you might get burned. It takes time to get better at identifying those beforehand, and I know I still have much to learn about that. But I’m also learning not to be quite as suspicious of every request, and also trying to just go with the flow a bit. It’s led to some really great collaborations from which I've learned a lot.

My point is that by reflexively saying no to everything, I think we’re denying ourselves some of the richness of the life of a PI that comes through interactions with colleagues and their trainees, which I’ve found to be very valuable. And enjoyable. That’s the point, right?

Biking in a world of self-driving cars will be awesome

While I was biking home the other day, I had a thought: this ride would be so much safer if all these cars were Google cars. I think it’s fair to say that most bikers have had some sort of a run-in with a car at some point in their cycling lives, and the asymmetry of the situation makes it very dangerous for bikers. Thing is, we can (and should) try to raise bike awareness in drivers, but the fact is that bikes can often come out of nowhere and in places that drivers don’t expect, and it’s just hard for drivers to keep track of all these possibilities. Whether it’s “fair” or “right” or not is beside the point: when I’m biking around, I just assume every driver I meet is going to do something stupid. It’s not about being right, it’s about staying alive.

But with self-driving cars? All those sensors mean that the car would be aware of bikers coming from all angles. I think this would result in a huge increase in biker safety. I think it would also greatly increase ridership. I know a lot of people who at least say they would ride around a lot more if it weren’t for their fear of getting hit by a car. It would be great to get all those people on the road.

Two further thoughts: self-driving car manufacturers, if you are reading this, please come up with some sort of idea for what to do about getting “doored” (when someone opens a door in the bike lane). Perhaps some sort of warning, like “vehicle approaching”? Not just for bikes, actually: it would be good to keep cars from getting doored (or taking off a door) as well.

Another thing I wonder about is whether bike couriers and other very aggressive bikers will take advantage of cautious and safe self-driving cars to completely disregard traffic rules. I myself would never do that :), but I could imagine it becoming a problem.

Wednesday, June 24, 2015

And you thought Tim Hunt was bad?

I’ve been sort of following the ink trail on Tim Hunt’s comments (which, incidentally, seems to have made the trail following Alice Huang go cold (just like in politics!)), so the topic of sexism in academia has been on my mind. I don’t think I have anything useful to say on the Hunt thing beyond what’s already out there. I have a lot of female trainees in my lab, and even before Tim Hunt, I thought I had a sense of the sort of serious obstacles they face, including comments like those from Hunt. And yes, comments like those are a serious obstacle. Disappointing and damaging, but not entirely surprising to hear, although perhaps not in such a public forum.

It is in that context that I was absolutely shocked to hear someone I know tell me about her experiences at a major US institution. Seriously inappropriate comments in the workplace, including heavy-handed sexual advances. Women being groped and physically pushed around behind closed doors. Men in power using that power to touch women inappropriately, and as was intimated without details, worse. Worse to the point that a woman has a physical reaction when a certain man enters the room. And an institution that essentially protects these predators.

My jaw was on the floor. And the response to my shock was “Arjun, you have no idea, this stuff is happening all the time.” All the time. (To be clear, this institution is not Penn.)

The woman said that it seems to be much more of a problem with the older generation of men. I suppose we can wait for them to retire and go away. My sense of justice makes me feel like these people should have to pay for what has likely been a career of preying on women. And any institution that enables this sort of behavior needs some pretty deep soul searching. Even if such behavior is less prevalent in the newer generation, that is no guarantee that it is eliminated. And even having one such person around is one too many.

I am purposefully not naming any names because this is some pretty serious stuff, and ultimately, it’s not really my story to tell. I just wanted to bring it up because while I think we have come a long way, for me, it was a wake-up call that we still have a very long way to go.

Also, I want to make sure that this post isn’t misconstrued as some sort of minimization of the negative impact of Tim Hunt’s frankly bewildering statements. Words matter. Actions matter. It all matters. Indeed, I see the reaction to Tim Hunt’s comments as a strongly positive indicator of how far the discussion has come. Rather, I want to point out that the reason the discussion is where it is comes from the tireless efforts of women through the decades who have put up with things I couldn’t even imagine, whose very decision to stay in science can be regarded as an act of deep courage and bravery. The thing that blew my mind is that women are still making those decisions to this day.

Sunday, June 14, 2015

RNA-seq vs. RNA FISH for 26 genes

Been meaning to post this for a while. Anyway, in case you're interested, here is a comparison of the mean number of RNA molecules per cell measured by RNA FISH to FPKM as measured by RNA-seq for 26 genes (bulk and also combined single-cell RNA-seq). Experimental details are in Olivia's paper. We used a standard RNA-seq library prep kit from NEB for the bulk samples, and the Fluidigm C1 for the single-cell RNA-seq. Cells are primary human foreskin fibroblasts.

[Figures: bulk RNA-seq vs. RNA FISH and single-cell RNA-seq vs. RNA FISH (avg. # molecules per cell), each on log and linear scales]

Probably could be better with UMIs and so forth, but anyway, for whatever it's worth.
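For the curious, the comparison above boils down to correlating two per-gene measurements on a log scale. Here is a minimal sketch of that computation; the gene values below are fabricated purely for illustration (the real numbers are in Olivia's paper), and the pseudo-count is an assumption to handle zeros.

```python
# Hedged sketch: correlate RNA FISH mean molecules per cell with RNA-seq
# FPKM on a log scale. The data values below are made up for illustration;
# they are NOT the measurements from the paper.
import math

def log_pearson(fish_counts, fpkm, pseudo=1.0):
    """Pearson correlation of log10-transformed values.

    A pseudo-count avoids taking log of zero (an assumption, not
    necessarily what was done for the plots in the post).
    """
    xs = [math.log10(x + pseudo) for x in fish_counts]
    ys = [math.log10(y + pseudo) for y in fpkm]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative (fabricated) values for a handful of genes:
fish = [5, 40, 200, 800, 30]   # avg. molecules per cell by RNA FISH
fpkm = [1, 12, 60, 300, 8]     # bulk RNA-seq FPKM

r = log_pearson(fish, fpkm)
```

The log transform matters here because expression spans orders of magnitude, so a linear-scale correlation would be dominated by the few highest-expressed genes.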

Saturday, June 6, 2015

Gene expression by the numbers, day 3: the breakfast club

(Day 0, Day 1, Day 2, Day 3)

So day 3 was… pretty wild! And inspiring. A bit hard to describe. There was one big session. The session had some dancing. A chair was thrown. Someone got a butt in the face. I’m not kidding.

How did such nuttiness come to pass? Well, today the 15 of us all gave exit talks, where we each had the floor to discuss a point of our choosing. On the heels of the baseball game, we decided (okay, someone decided) that everyone should choose a walk-up song, and we’d play the song while the speaker made their way up for the exit talk. Later, I’ll post the playlist and the conference attendees and set up a matching game. The playlist was so good!

(Note: below is a fairly long post about various things we talked about. Even if you don’t want to read it all, check out the scientific Rorschach test towards the end.)

I was somehow up first. (See if you can guess my song. People in my lab can probably guess my song.) The question I posed was “does transcription matter?” More specifically, if I changed the level of transcription of a gene from, say, 196 transcripts per cell to 248 transcripts per cell, does that change anything about the cell? I think the answer depends on the context. That led me to my main point, which I kind of mentioned in an earlier post: (I think) we need strong definitions based on functional outcomes in order to shape how we approach studying transcriptional regulation. I personally think this means that we really need to have much better measurements of phenotype so we can see what the consequences are of, say, a 25% increase in transcription. If there is no consequence, then should we bother studying why transcription is 25% higher in one situation vs. the other? Along these lines, Mo Khalil made the point that maybe we can turn to experimental evolution to help us figure out what matters, and maybe that could help guide our search for what matters in regulation.

Barak raised another great point about definitions. He started his talk by posing the question “Can someone please give me a good definition of an enhancer?” In the ensuing discussion, folks seemed to converge on the notion that in molecular biology, definitions of entities are often very vague and typically shaped much more by the experiments that we can do. Example: is an enhancer a stretch of DNA that affects a gene independently of its position? At a distance? These notions often come from experiments in which they move the enhancer around and find that it still drives expression. Yet from the quantitative point of view, the tricky thing with experimentally based definitions is that these were often qualitative experiments. If moving the enhancer changes expression by 50%, then is that “location independent”?

Justin made an interesting point: can we come up with “fuzzy” definitions? Is there a sense in which we can build models that incorporate this fuzziness that seems to be pervasive in biology? I think this idea got everyone pretty excited: the idea of a new framework is tantalizing, although we still have no idea exactly what this would look like. I have to admit that personally, I’m not so sure that dispensing with the rigidity of definitions is a good thing–without rigid definitions, we run the risk of not saying anything useful and concrete at all. Perhaps having flexible definitions is actually similar to just saying that we can parametrize classes of models, with experiments eliminating some fraction of those model classes.

Jané brought in a great perspective from physics, saying that actually having a lot of arguments about definitions is a great thing. Maybe having a lot of competing definitions, with all of us trying to prove our own and contrast it with the others, will eventually lead us to the right answer; myopia in science can really lead to stagnation. I really like this thought. I feel like “big science” endeavors often fail to provide real progress because of exactly this problem.

The discussion of definitions also fed into a somewhat more meta discussion about interdisciplinary science and different approaches. Rob is strongly of the opinion that physicists should not need to get the permission of biologists to study biology, nor should physicists let biologists dictate what’s “biologically relevant”. I think this is right, and I also find myself often annoyed when people tell us what’s important or not.

Al made a great point about the role of theory in quantitative molecular biology. The point of theory is to say, “Hey, look at this, this doesn’t make sense. When you run the numbers, the picture we have doesn’t work–we need a new model.” Jané echoed this point, saying that at least with a model, we have something to argue about.

He also said that it would be great if we could formulate “no-go” models. Can we place constraints on the system in the abstract? Gasper put this really nicely: let’s say I’m a cell in a bicoid gradient trying to make a decision on what to do with my life. Let’s say I had the most powerful regulatory “computer” in the world in that cell. What’s the best that that computer could do with the information it is given? How precisely can it make its decision? How close do real cells get to this? I think this is a very powerful way to look at biology, actually.

Some of the discussions on theory and definitions brought up an important meta point relating to interdisciplinary work. I think it’s important that we learn to speak each other’s languages. I’ve very often heard physicists give a talk where they garble the name of a protein or something like that, and when a biologist complains, the response is sort of “well, whatever, it doesn’t matter”. Perhaps it doesn’t matter, but it can be grating to the ear, and the attitude can come across as somewhat disrespectful. I think that if a biologist were to give a talk and said “oh, this variable here called p… oh, yes, you call it h-bar, but whatever, doesn’t matter, I call it p”, it would not go over very well. I think we have to be respectful and aware of each other’s terminology and definitions and world view if we want to get each other to care about what we are both doing. And while I agree with Rob that physicists shouldn’t need permission to study biology, I also think it would be nice to have their blessings. Personally, I like to be very connected to biologists, and I feel like it has opened my mind up a lot. But I also think that’s a personal choice, perhaps informed by my training with Sanjay Tyagi, a biologist who I admire tremendously.

Another point about communicating across fields came up in discussing synthetic biology approaches to transcriptional regulation. If you take a synthetic approach to regulatory DNA, you will often encounter fierce resistance that you’re studying a “toy model” and not the real system. The counter, which I think is a reasonable argument, is that if you study just the existing DNA, you end up throwing your hands in the air and saying “complexity, who knew!”. (One conferee even said complexity is a waste of time: it’s not a feature but rather a reflection of our ignorance. I disagree.) So the synthetic approach may allow us to get at the underlying principles in a controlled and rigorous manner. I think that’s the essence of mechanistic molecular biology: make a controlled environment and then see if we can boil something down to its parts. Sort of like working in cell extracts. I think this is a sensible approach and one that deserves support in the biological community–as Angela said, it’s a “hearts and minds” problem.

That said, personally, I’m not so sure that it will be so easy to boil things down to their parts, partly because it's clearly very hard to find non-regulatory DNA to serve as the "blank slate" to work with for synthetic biology. I'm thinking lately that maybe a more data-first approach is the way to go, although I weirdly feel quite strongly against this view at the same time (much more on this in a perspective piece we are writing right now in lab). But that’s fundamentally scary, and for many scientists, this may not be a world they want to live in. Let me subject you to a scientific Rorschach test:

Image from here
What do you see here?
  1. A catalog of data points.
  2. A rule with an exception.
  3. A best fit line that explains, dunno, 60% of the variance, p = 0.002 (or whatever).
If you said #1, then you live in the world of truth and fact, which is admirable. You are also probably no fun at dinner parties.

Which leads us to #2 vs. #3. I posit that worldview #2 is science as we traditionally know it. A theory is a matter of belief, and doesn’t have a p-value. It can have exceptions, which point to places where we need some new theory, but in and of itself, it is a belief that is absolute. #3 is a different world, one in which we have abandoned understanding as we traditionally define it (and there is little right now to lead us to believe that #3 will give us understanding like #2, sorry omics people).

I would argue that the complexity of biological regulation may force us out of #2 and into #3. At this meeting, I saw some pretty strong evidence that a simple thermodynamic model can explain a fair amount of transcriptional regulation. So is that a theory, a simple explanation that most of us believe? And we just need some additional theory to explain the exceptions? Or, alternatively, can we just embrace the exceptions, come up with some effective theory based on regression, and then say we’ve solved it totally? The latter sounds “wrong” somehow, but really, what’s the difference between that and the thermodynamic model? I don’t think that any of us can honestly say that the thermodynamic model is anything other than an effective representation of molecular processes that we are not capturing fully. So then how different is that than a SVM telling us there are 90 features that explain most of the variance? How much variance do you explain before it’s a theory and not a statistical model? 90%? How many features before it’s no longer science but data science? 10? I think that where we place these bars is a matter of aesthetics, but also defines in some ways who we are as scientists.
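To make the “how much variance do you explain” framing concrete, here is a minimal sketch of what worldview #3 actually computes: fit a least-squares line and report R², the fraction of variance explained. All numbers are fabricated toy values for illustration, not data from the meeting.

```python
# Hedged sketch: ordinary least squares fit and R^2 ("variance explained").
# The data below are fabricated toy values, purely for illustration.
def r_squared(xs, ys):
    """R^2 of the best-fit line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# A noisy, roughly linear relationship (toy numbers):
r2 = r_squared([1, 2, 3, 4], [2, 4, 5, 9])
```

The philosophical question in the post is exactly where on the R² axis a "statistical model" starts counting as a "theory"; the arithmetic itself is the easy part.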

Personally, I feel like complexity is making things hopeless and we have to have a fundamental rethink transitioning from #2 to #3 in some way. And I say this with utmost fear and trepidation, not to mention distaste. And I’m not so sure I’m right. Rob holds very much the opposite view, and we had a conversation in which he said, well, this field is messy right now and it might take decades to figure it out. He could be right. He also said that if I’m right, then it’s essentially saying that his work on finding a single equation for transcription is not progress. Did I agree that that was not progress? I felt boxed in by my own arguments, and so I had to say “Yeah, I guess that’s not progress”. But I very much believe that it is progress, and it’s objectively hard to argue otherwise. I don’t know, I’m deeply ambivalent on this myself.

Whew. So as you can probably tell, this conference got pretty meta by the end. Ido said this meeting was not a success for him, because he hasn’t come away with any tangible, actionable items. I agree and disagree. This meeting was sort of like The Breakfast Club. It was a bunch of us from different points of view, getting together and arguing, and over time getting in touch with our innermost hopes and anxieties. Here’s a quote from Wikipedia on the ending of the movie:
Although they suspect that the relationships would end with the end of their detention, their mutual experiences would change the way they would look at their peers afterward.
I think that’s where I am. I actually learned a lot about regulatory DNA, about real question marks in the field, and got some serious challenges to how I’ve been thinking about science these days. It’s true that I didn’t come away with a burning experiment that I now have to do, but I would be surprised if my science were not affected by these discussions in the coming months and years (in fact, I am now resolved to work out a theory together with Ian in the lab by the end of the summer).

At the end, Angela put up Ann Friedman’s Disapproval Matrix:

She remarked, rightly, that even when we disagreed, we were all pretty much in the top half of the matrix. I think this speaks to the level of trust and respect everyone had for each other, which was the best part of this meeting. For my part, I just want to say that I feel lucky to have been a part of this conference and a part of this community.

Walk-up song match game coming soon, along with a playlist!