Tuesday, December 29, 2015

Is the academic work ethic really toxic?

Every so often, I’ll read something or other about how the culture of work in academia is toxic, encouraging people to work 24/7/52 (why do people say 24/7/365?), ignore all other aspects of their existence, and destroy their lives in the process. As I’ve written before, I think this argument gets it backwards. I think most academics work hard because they want to and are immersed in what they are doing, not because of the “culture”. It is the conflation of hours and passion that leads to the confusion.

Look, I know people who are more successful than I am and work less than I do. Good for them! That doesn’t mean I’m going to start working less hard. To me, if you’re thinking “I need to work X hours to get job Y/award Z”, well, then you’re in the wrong line of work. If you’re thinking “I really need to know about X because, uh, I just need to know” then academia might be for you. Sure, sometimes figuring out X requires a lot of work, and there is a fair amount of drudgery and discipline required to turn an idea into a finished paper. Most academics I know will make the choice to do that work. Some will do it at a pace I would find unmanageable. Some will do it at a pace I find lethargic. I don’t think it really matters. I read a little while ago that Feng Zhang goes back to work every day after dinner and works until 3am doing experiments himself in the lab (!). I couldn’t do that. But again, reading about Zhang, I think it’s pretty clear that he does it because he has a passion for his work. What’s wrong with that? If he wants to work that way, I don’t see any reason he should be criticized for it. Nor, conversely, lionized for it. I think we can praise his passion, though. Along those lines, I know many academics who are passionate about their work and thus very successful, all while working fairly regular hours (probably not 40/week, but definitely not 80/week), together with long vacations. Again, what success in science really requires is a desire to do it, along with the talent and dedication to finish what you start.

I think this conflation of hours and passion leads to some issues when working with trainees. Personally, I most enjoy working with people who have a passion for their work. Often, but not always, this means that they work long-ish hours. If someone is not motivated, then a symptom is sometimes working shorter hours–or, other times, working long hours but not getting as much done. If we’re to the point where I’m counting someone’s hours, though, then it’s already too late. For trainees: if your PI is explicitly counting your hours, then you should either find a new PI or think carefully about why your hours are being counted. What’s important is that both parties realize that hours are the symptom, not the underlying condition.

Monday, December 28, 2015

Is all of Silicon Valley on a first name basis?

One very annoying trend I've noticed in software over the last several years is the use of first names only. For instance, iOS shows first names only in messages. Google Inbox has tons of e-mail conversations involving me and someone named "John". Also, my new artificially intelligent scheduling assistant (which is generally awesome) will put appointments with "Jenn" on my calendar. Hmm. For me, those variables need a namespace.
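To put that namespace joke in code, here's a toy sketch (the names and addresses are all made up) of why a first name alone is an ambiguous identifier, while qualifying it with an e-mail address resolves it:

    # Toy illustration: two different contacts collapse to the same display name.
    contacts = {
        "john.smith@example.com": "John",
        "john.doe@example.com": "John",
    }

    def display_name(email):
        # What the software shows: just the first name -- ambiguous.
        return contacts[email]

    def qualified_name(email):
        # Using the e-mail address as the namespace removes the ambiguity.
        return "{} <{}>".format(contacts[email], email)

    print(display_name("john.smith@example.com"))    # "John" -- but which John?
    print(qualified_name("john.smith@example.com"))  # "John <john.smith@example.com>"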

I'm assuming this is all in some effort to make software more friendly and conversational, and it demos great for some Apple exec to say "Ask Tim if he wants to have lunch on Wednesday" into his phone and have it automatically know he meant Tim Cook. Great, but in my professional life (and, uh, I'm guessing maybe Tim Cook's also), I interact with a pretty large number of people, some only occasionally, making this first-name-only convention pretty annoying.

Which makes me wonder if the logical next step is just to refer to people by their e-mail address or Twitter handle. I'm sure that would generate a lot of debate as to which is the identifier of choice, but I'm guessing that ORCID is probably not going to be it. :)

Wednesday, December 23, 2015

Bragging about data volume is lame

I've noticed a trend in some papers these days of bragging about the volume of data you collect. Here's an example (slightly modified) from a paper I was just looking at: "We analyzed a total of 293,112 images." Oftentimes, these numbers serve no real purpose except to highlight that you took a lot of data, which I think is sort of lame.

Of course, numbers in general are good and are an important element in describing experiments. Like "We took pictures of at least 5000 cells in 592 conditions." That gives a sense of the scale of the experiment and is important for the interpretation. But if you just say "We imaged a total of 2,948,378 cells", then that provides very little useful information about why you imaged all those cells. Are they all the same? Is that across multiple conditions? What is the point of this number except to impress?
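(Note, too, that the informative version implies the bragging number anyway: at least 5000 cells times 592 conditions is roughly 3 million cells. So reporting the numbers that reflect the experimental design loses nothing.)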

And before you leave a comment, yes, I know we did that in this paper. Oops. I feel icky.

Tuesday, December 22, 2015

Reviewing for eLife is... fun?

I usually find reviewing papers to be a task that, while fun-sounding in principle, becomes a chore in practice, especially if the paper is really dense. Which is why I was sort of surprised that I actually had some fun reviewing for eLife just recently. I've previously written about how the post-review harmonization between reviewers is a blessing for authors, because it's a lot harder to give one of those crummy, ill-considered reviews when your colleagues know it's you giving them. Funny thing is that it's also fun for reviewers! I really enjoy discussing a paper I just read with my colleagues. I feel like that's an increasingly rare occurrence, and I was happy to have the opportunity. Again, well done, eLife!

Sunday, December 20, 2015

Impressions from a couple weeks with my new robo-assistant, Amy Ingram

Like many, I both love the idea of artificial intelligence and hate spending time on logistics. For that reason, I was super excited to hear about X.ai, a startup in NYC that makes an artificially intelligent e-mail scheduling bot. It takes care of this problem (e-mail conversation):
“Hey Arjun, can we meet next week to talk about some cool project idea or another?”
“Sure, let’s try sometime late next week. How about Thursday 2pm?”
“Oh, actually, I’ve got class then, but I’m free at 3pm.”
“Hmm, sorry, I’ve got something else at 3pm. Maybe Friday 1pm?”
“Unfortunately I’m out of town on Friday, maybe the week after?”
“Sure, what about Tuesday?”
“Well, only problem is that…”
And so on. X.ai’s solution works like this:
“Hey Arjun, can we meet next week to talk about some cool project idea or another?”
“Sure, let’s try sometime late next week. I’m CCing my assistant Amy, who will find us a time.”
And that’s it! Amy will e-mail back and forth with whoever wrote to me and find a time to meet that fits us both, putting it straight on my calendar without me having to lift another finger. Awesome.

So how well does it work? Overall, really well. It took a bit of finagling at first to make sure that my calendar was appropriately set up (like making sure I’m set to “available” even if my calendar has an all-day event) and that Amy knew my preferences, but overall, out of the several meetings attempted so far, only one of them got mixed up, and to be fair, it was a complicated one involving multiple parties and some screw-ups on my part due to it being the very first meeting I scheduled with Amy. Overall, Amy has done a great job removing scheduling headaches from my life–indeed, when I analyzed a week in my e-mail, I was surprised how much time was spent on scheduling, and so this definitely reduces some overhead. Added benefit: Amy definitely does not drop the ball.

One of the strangest things about using this service so far has been my psychological responses to working with it (her?). Perhaps the most predictable one was that I don’t feel like a “have your people call my people” kind of person. I definitely feel a bit uncomfortable saying things like “I’m CCing my assistant who will find us a time”, like I’m some sort of Really Busy And Important Person instead of someone who teaches a class and jokes around with twenty-somethings all day. Perhaps this is just a bit of misplaced egalitarian/lefty anxiety, or imposter syndrome manifesting itself as a sense that I don’t deserve admin support, or the fact that I’m pretty sure I’m not actually busy enough to merit real human admin support. Anyway, whatever, I just went for it.

This is where it starts getting a bit weird. So far, I haven’t been explicitly mentioning that Amy is a robot in my e-mails (like “I’m CCing my robo-assistant Amy…”). That said, for the above reasons of feeling uncomfortably self-important, I'm actually relieved when people figure out that it’s a robot, since it somehow seems a bit less “one-percenty”. So why didn't I just say she’s a robot right off the bat? To be perfectly honest, when I really think about it, it’s because I didn't want to hurt her feelings! It’s so strange. Other examples: for the first few meetings, Amy CCs you on the e-mail chain so you can see how she handles it. I felt a strong compulsion to write “Thank you!” at the end of the exchange. Same when I write to her to change preferences. Like
“Amy, I prefer my meetings in the afternoon.”
“Okay, I have updated your preferences as follows…”
… “Thank you?!?!?”
Should I bother with the formalities of my typical e-mails, with a formal heading and signature? I think I’ve been doing it, even though it obviously (probably?) doesn’t matter.

Taking it a bit further, should I be nice? Should I get angry if she messes something up? Will my approval or frustration even register? Probably not, I tell myself. But then again, what if it’s part of her neural network to detect feelings of frustration? Would her network change its algorithms somewhat in response? Is that what I would want to happen? I just don’t know. I have to say that I had no idea that this little experiment would have me worrying about the intricacies of human/AI relations.

In some sense, then, I was actually a bit relieved at the outcome of the following exchange. As a test, Sara worked with Amy to set up an appointment for us to get a coffee milkshake (inside joke). She then told Amy to tell me that I should wear camouflage to the appointment, a point that Amy dutifully relayed to me:
Hi Arjun,
I just wanted to pass along this message I received from Sara. It doesn’t look like it’s a message I can provide an answer to, so I suggest you follow up with Sara directly.
---
Thanks, Amy! 2 o'clock would be great. And please make sure he wears camouflage. Sara
---
Amy
To which I responded:
Hi Amy, 
Thanks for the note. Can you please tell Sara that I don’t own any camouflage?
Thanks,
Arjun
And then I heard this back:
Hi Arjun,
Right now I can't pass requests like this to your guests. I think your message would have a stronger impact if you sent it directly to Sara.
Amy
Ah, a distinctly and reassuringly non-human, form-like response. What a relief! Looks like we've still got a little way to go before we have to worry about her (its?) feelings. Still, the singularity is coming, one meeting at a time!

Saturday, December 19, 2015

Will reproducibility reduce the need for supplementary figures?

One constant refrain about the kids these days is that they use way too much supplementary material. All those important controls, buried in the supplement! All the alternative hypotheses that can’t be ruled out, buried in the supplement! All the “shady data” that doesn’t look so nice, buried in the supplement! Now papers are just reduced to ads for the real work, which is… buried in the supplement! The answer to the ultimate question of life, the universe and everything? Supplementary figure 42!

Whatever. Overall, I think the idea of supplementary figures makes sense. Papers have more data and analyses in them than before, and supplementary figures are a good way to keep important but potentially distracting details out of the way. To the extent that papers serve as narratives for our work as well as documentation of it, it’s important to keep that narrative as focused as possible. Typically, if you know the field well enough to know that a particular control is important, then you likely have sufficient interest to go to the trouble of digging it up in the supplement. If the purpose of the paper is to reach people outside of your niche–which most papers in journals with big supplements are attempting to do–then there’s no point in having all those details front and center.

(As an extended aside/supplementary discussion (haha!), the strategy we’ve mostly adopted (from Jeff Gore, who showed it to me when we were postdocs together) is to use supplementary figures like footnotes, as in: “We found that protein X bound to protein Y half the time. We found this was not due to the particular cross-linking method we used (Supp. Fig. 34)”. Then the supplementary figure legend can have an extended discussion of the point in question, no supplementary text required. This is possible because, unlike in regular figure legends, you can include interpretation in the legend itself–or at least the journal doesn’t care enough to look.)

I think the distinction between the narrative and documentary roles of a paper is where things may start to change with the increased focus on reproducibility. Some supplementary figures are really important to the narrative, like a graph detailing an important control. But many supplementary figures are more like data dumps, like “here’s the same effect in the other 20 genes we analyzed”, or the same analysis run on replicate data. Another type of supplementary figure presents various analyses of the data that may be interesting but are not relevant to the main points of the paper. If not just the data but also the analysis and figures are available in a repository associated with the paper, then is there any need for these sorts of supplementary figures?

Let’s make this more concrete. Let’s say you put up your paper in a repository on GitHub or the equivalent. The way we’ve been doing this lately is to have all processed data (like spot counts or FPKM) in one folder and all scripts in another; when you run the scripts, they take the processed data, analyze it, and put all the outputted graphical elements into a third folder (with subfolders as appropriate). (We also have a “Figures” folder where we assemble the figures from the graphical elements in Illustrator; more in another post.) Let’s say that we have a side point about the relative spatial positions of transcriptional loci for all the different genes we examined in a couple different datasets; e.g., Supp. Figs. 16 and 21 of this paper. As is, those supplementary figures are a bit hard to parse because there’s so much data, and the point is relatively peripheral. What if instead we just pointed to the appropriate set of analyses in the “graphs” folder? And that folder could contain a large number of other analyses that didn’t even make the cut for the supplement. I think this is more useful than the supplement as normally presented and more useful than just the raw data, because it also contains additional analyses that may be of interest–and my guess is that these analyses are actually far more valuable than the raw data in many cases. For example, Supp. Fig. 11 of that same paper shows an image with our cell-cycle determination procedure, but we had way more quantitative data that we just didn’t show because the supplement was already getting insane. Those analyses would be great candidates for a family of graphs in a repository. Of course, all of this requires that these analyses be well documented and browsable, but again, I'm not sure that’s any worse than the way things are now.
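For concreteness, here's a minimal sketch of what one of those scripts might look like under this data/scripts/graphs convention (the file names, column names, and folder layout are hypothetical, just for illustration):

    # scripts/plot_spot_counts.py
    # A minimal sketch of the convention described above: read processed data,
    # write every graphical element to the graphs folder. All names hypothetical.
    import os
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # write files directly; no display needed
    import matplotlib.pyplot as plt

    # Processed data (e.g., spot counts) lives in its own folder.
    data = pd.read_csv(os.path.join("data", "spot_counts.csv"))

    # Graphical elements go into a subfolder of "graphs", one file per gene,
    # whether or not they end up in a main or supplementary figure.
    outdir = os.path.join("graphs", "spot_counts")
    os.makedirs(outdir, exist_ok=True)

    for gene, subset in data.groupby("gene"):
        fig, ax = plt.subplots()
        ax.hist(subset["spots"], bins=30)
        ax.set_xlabel("spots per cell")
        ax.set_ylabel("number of cells")
        ax.set_title(gene)
        fig.savefig(os.path.join(outdir, "{}.pdf".format(gene)))
        plt.close(fig)

The point being that a reader (or reviewer) could browse the graphs folder directly, rather than us having to agglomerate every peripheral analysis into a supplementary figure.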

Now, I’m not saying that all supplementary figures would be unnecessary. Some contain important controls and specific points that you want to highlight, e.g., Supp. Fig. 7–just like an important footnote. But analyses of data dumps, replicates, side points and such might be far more efficiently and usefully kept in a repository.

One potential issue with this scheme is hosting and versioning. Most supplementary information is currently hosted by journals. In this repository-based future, it’s up to Bitbucket or GitHub to stick around, and the authors are free to modify or remove the repository if they wish. Oh well, nothing’s permanent in this world anyway, so I’m not so worried about that personally. I suppose you could zip up the whole thing and upload it as a supplementary file, although most journals place size restrictions on supplementary files. Not sure about the solution to that.

Part of the reason I’ve been thinking about this lately is because Cell Press has this very annoying policy that you can’t have more supplementary figures than main figures. This wreaked havoc with the “footnote” style we originally used in Olivia’s paper, because now you have to agglomerate smaller, more focused supplementary figures into huge supplementary mega-figures that are basically a hot figure mess. I find this particularly ironic considering that Cell’s focus on “complete stories” is probably partially to blame for the proliferation of supplementary information in our field. I get that the idea is to reduce the amount of supplementary information, but I don’t think the policy accomplishes this goal, and it only serves to complicate things. Cell Press, please reconsider!

Sunday, December 13, 2015

Cheese spread from the lab holiday party

Seriously awesome collection of cheeses from yesterday's lab holiday party! La Tur and Morbier (More Beer!) are some of my favorites, along with the brie Rohit brought. Eduardo's super stinky cheese was decent and definitely less funky tasting than smelling.

Blog hiatus (hopefully) over, and a few thoughts on writing grants

Had at least a couple folks ask why I haven’t written any blog posts lately. The answer is some combination of writing fatigue from real-life work and a general lack of inspiration. On a related note, writing grants sucks.

There have been some grants that I’ve had fun writing, but I would say they are in the distinct minority. I am of course not alone in that, but one of the reasons often trotted out for hating grant writing is that many scientists just hate writing in general, and I think that is true for a number of the scientists I know. Personally, though, I actually really like writing, and I typically enjoy writing papers and am reasonably fast at it, so it’s not the writing per se. So what is it that makes me leave the page empty until just days before the deadline while patiently waiting for the internet to run out of videos of people catching geoducks?

Part of it is definitely that grant writing makes you sit and think about your work, how it fits into what we already know, and how it will tell us something new. It is certainly the case that grants can force you to critically evaluate ideas–writing is where weak ideas go to die, and that death can be painful. But I don’t think this is the whole story, either. I would say that the few grants I’ve really enjoyed writing are the ones where the process focused on thinking about the science I really want to do (or more likely already did) and explaining it clearly. So what is it about the other grants that makes me try to find virtually any excuse to avoid writing them?

After a bit of reflection, I think that, for me, the issue is that writing a grant often just feels so disingenuous. This is because I’m generally trying to write something that is “fundable” rather than what I really want to do. And I find it really REALLY hard to get motivated to do that. I mean, think about it. I’ve got to somehow come up with a “plausible plan” for research for 5 years, in a way that sounds exciting to a bunch of people who are probably not experts in the area and have a huge stack of applications to read. First off, if I ever end up in a situation where I’m actually doing what I thought I was going to do 5 years ago, I should probably be fired for complete lack of imagination. Secondly, the scope of what one typically proposes in a grant is often far less imaginative to begin with. Nobody ever proposes the really exciting thing they want to do; instead, they just propose what reviewers will think is safe and reasonable. Not that these are some brilliant insights on my part; I think most applicants and reviewers are acutely aware of this, hence the maxim “An NIH grant should have 3 aims: 2 you’ve already done and 1 you’re never going to do”. So to the extent that everyone already knows all this, why do we bother with the whole charade?

I should say that I feel far less disingenuous writing “people” grants, by which I mean fund-the-person-not-the-project grants like many junior investigator awards, HHMI, and those that are part of the NIH high-risk/high-reward program. At least there, I’m focusing more on the general area our lab is interested in, describing what makes us think it’s exciting, and explaining why we’re well positioned to work on this topic, which is far more realistic than detailing specific experiments I’ll use to evaluate a particular hypothesis that I’ll probably realize is hopelessly naive after year 1. Of course, I think these are basically the criteria that people are judging “project” grants on as well for the most part, but at least I don’t have to pretend that I know what cell type I’m going to use in our RNA FISH validation studies in year 4… nor will I get dinged for a “bad choice” in this regard, either. This is not to say that writing people-grants is easy–it is particularly tricky to write confidently about yourself without sounding silly or boastful–but the whole exercise of writing a people-grant involves writing about things that feel more aligned with the criteria by which I think grants should actually be evaluated.

(Sometimes I wonder if this whole system exists mainly to give reviewers a way out of directly criticizing people. If you’re not too excited about a grant, you can harp on technical issues, of which there are always plenty. I think this is an issue with our culture of criticism, which is probably a topic for another blog post.)

Carried a bit further, the logical conclusion to this line of argument is that we shouldn’t be judged on prospective plans at all, but rather just on past performance. Personally, I would much rather spend time writing about (and being judged on) science I actually did than some make-believe story about what I’m going to do. I remember that a little while ago, Ron Germain wrote a proposal that was “people-oriented” in the sense that it suggested grants for all young investigators, with renewal based on productivity. His proposal engendered a pretty strong backlash from people saying that people-based grants are just a way to help the rich get richer (“Check your privilege!”). Hmm, I don’t know that I’m qualified to delve into this too deeply, but I'm not sure I buy the argument that people-based grants would necessarily disfavor the non-elite, at least any more than the current system already does. Of course the current people-based grant system looks very elitist–it’s very small, and so naturally it will mostly go to a few elites. I don’t think we can necessarily draw any conclusions from that about what people-based funding might look like on a broader scale. I also think that it’s a lot easier to combat bias when we can be very explicit about it, and being explicit may actually be easier in people-based grants.

As to the backlash against these sorts of proposals, I would just say that many scientists have an inherent (and inherently contradictory) tendency towards supreme deification on the one hand and radical egalitarianism on the other. I think a good strategy is probably somewhere in between: some people-based grants to encourage a bit more risk-taking and relieve some of the writing burden, and some project-based grants to keep programmatic diversity (because they would help fund important areas that are maybe not fashionable at the moment). I don’t know where this balance is, but my feeling is that we're currently skewed too far towards projects. For this reason, I was really excited about the NIH R35 program–until you follow this eligibility flowchart and find out that most roads lead to no. :(

Oh, and about the actual mechanics of writing a grant: my personal workflow is to write the text in Google Docs using PaperPile, then export to Pages, Apple’s little-used-but-perfect-for-grants word processor. The killer feature of Pages is that it’s SO much better than Word/Google Docs at letting you move figures to exactly where you want them and have them stay there; as an added bonus, figures keep their full PDF-quality resolution. Only problem is that there are a grand total of around 18 people in the continental United States who use Pages, and none of them are writing a grant with you. Sad. Still better than LaTeX, though. ;)

Saturday, December 12, 2015

Diet Coke vs. Coke Zero: Lab holiday party challenge!

A couple weeks ago, I was remarking how I like Coke Zero much more than Diet Coke, and Uschi said, "Oh, isn't it just the same thing, but just with a different label on it?" To which I responded "No way, it's completely different!" So we did a little blind taste test right there on the spot and I of course got them mixed up. :)

Ah, but that was just n=1. So with that in mind, we decided to take a more, umm, scientific approach at our lab holiday party today. Uschi set up two cups for all 13 participants, one with Coke Zero and one with Diet Coke. Everyone then had to say which they thought was which. Results? 11 of 13 got it right! (And I was one of them, thankfully...)

Verdict: Coke Zero and Diet Coke are distinguishable, with a p of about 0.011 (one-sided binomial test: the chance of 11 or more correct out of 13 under pure guessing).
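For anyone who wants to check the arithmetic, here's a quick sanity check in Python (math.comb requires Python 3.8+):

    from math import comb

    n, k = 13, 11
    # One-sided binomial test: probability of getting k or more correct
    # out of n under the null hypothesis of random guessing (p = 0.5).
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    print(p)  # ~0.0112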