Saturday, December 20, 2014

Time-saving tip–make a FAQ for almost anything

One of the fundamental tenets of programming is DRY: Don’t Repeat Yourself. If you find yourself writing the same thing multiple times, you’re creating a problem in that you have to maintain consistency if you ever make a change, and you’ve had to write it twice.

In thinking about what I have to do in my daily life, a lot of it also involves repetitive tasks. The most onerous of these are requests for information that require somewhat lengthy e-mails or what have you. Yet, many times, I end up answering the same questions over and over. Which brings up a solution: refer to a publicly available FAQ.

I first did this for RNA FISH because I was always getting similar questions about protocols and equipment, etc. So I made this website, which I think has been useful both for researchers looking for answers and for me in terms of saving me time writing out these answers for every person I meet.

I also recently saw a nice FAQ (can’t find the link, darn!) where someone had put together a letter of recommendation FAQ. As in, if you want a letter of recommendation from this person, here’s a list of details to provide and a list of criteria to determine whether they would be able to write a good one for you.

Another senior professor I met recently said that she got sick of getting papers from her trainees that were filled with various errors. So she set up a list of criteria and told everyone that she wouldn’t look at anything that didn’t pass that bar. Strikingly, she said that the trainees actually loved it–it made a nice checklist for them and they knew exactly what was expected of them.

I think all of these are great, and I think I might make up such documents myself. I’m also thinking of instituting an internal FAQ for our data management in the lab. Any other ideas?

Sunday, December 14, 2014

Origin and impact of stories in life sciences research: is it all Cell’s fault?

I found this article by Solomon Snyder to be informative:

http://www.pnas.org/content/110/7/2428.full

Quick summary: Benjamin Lewin realized in the 80s that the tools of molecular biology had matured to the point where one could answer a question “soup to nuts”. So his goal was to start a journal that would publish such “stories” that aimed to provide a definitive resolution to a particular problem. That journal was Cell, and, well, the rest is history–Cell is the premier journal in the field of molecular and cellular biology, and is home to many seminal studies. Snyder then says that Nature and Science and the other journals quickly picked up on this same ideal, with the result that we now have a pervasive desire to “tell a story” in biomedical research papers.

I was talking with Olivia about this, and we agreed that this is pretty bad for science. Many issues, the most obvious of which is that it encourages selective omission of data and places undue emphasis on “packaging” of results. Here are some thoughts from before that I had on storytelling.

I also wonder if the era of the scientific story is drawing to a close in molecular biology. The 80s were dominated by the “gene jock”: phenotype, clone, biochemistry, story, Cell paper. I feel like we are now coming up on the scientific limitations of that approach. Molecular biology has in many ways matured in the sense that we understand many of the basic mechanisms underlying cellular function, like how DNA gets replicated and repaired, how cells move their chromosomes, and elements of transcription, but we still have a very limited understanding of how all this fits together for overall cellular function. Maybe these problems are too big for a single Cell paper to contain the “story”–in fact, maybe it’s too big to be just a single story. Maybe we’re in the era of the molecular biology book.

As an example, take cancer biology. It seems like big papers often run from characterizing a gene to curing mice to looking for evidence for the putative mechanism in patient samples. Yet, I think it is fair to say that we have not made much progress overall in using molecular biology to cure cancer in humans. What then is the point of those epic papers crammed full of an incredible range of experiments? Perhaps it would be better to have smaller, more exploratory papers that nibble away at some much larger problems in the field.

In physics, it seems like theorists play a role in defining the big questions that then many people go about trying to answer. I wonder if an approach like this might have some place in modern molecular biology. What if we had people define a few big problems and really think about them, and then we all tried to attack different parts of it experimentally based on that hard thinking? Maybe we’re not quite there yet, but I wouldn’t be surprised if this happened in the next 10-20 years.

(Note: this is most certainly not an endorsement for ENCODE-style “big science”. Those are essentially large-scale stamp collecting expeditions whose value is wholly different. I’m talking about developing a theory like quantum mechanics and then trying to prove it, which is a very different thing–and something largely missing from molecular biology today. Of course, whether such theories even exist in molecular biology is a valid question…)

Saturday, December 13, 2014

The Shockley model of academic performance

I just came across a very interesting post from Brian McGill about William Shockley’s model for why graduate student performance varies so much. Basically, the point is that being successful (in this case, publishing papers) requires clearing several distinct hurdles, and thus requires the following skills:
  1. ability to think of a good problem
  2. ability to work on it
  3. ability to recognize a worthwhile result
  4. ability to make a decision as to when to stop and write up the results
  5. ability to write adequately
  6. ability to profit constructively from criticism
  7. determination to submit the paper to a journal
  8. persistence in making changes (if necessary as a result of journal action).
Now, as Brian points out, if you were 50% better at all of these (not way beyond the norm, but just a little bit better), then your probability of succeeding in your assigned task (which is the product of the individual probabilities) is roughly 25 times better. This is huge! And it’s also to me a reason for great hope. The reason is that if, alternatively, being 25 times better required being 25 times better at any one particular thing, then it seems to me that it would require at least some degree of unusually strong innate ability in that one area. Like, if it was all about writing fast, then someone who was a supernaturally fast writer would just dominate and there’s nothing you could really do to improve yourself to that extent. But 50%? I feel like I could get 50% better at a lot of things! And so can you. Here are some thoughts I had about creativity, writing with speed, execution and rejection, and there are tons of other ways to get better at these things. Note that by this model, by far the most important quality in a person is the ability to reflect on their strengths and weaknesses and improve themselves in all of these categories.
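To make the arithmetic concrete, here is a minimal sketch of the multiplicative model in Python. It assumes all eight hurdles count equally and that the same improvement applies to each one; the function names are mine, not Shockley's or Brian's.

```python
from math import prod

N_HURDLES = 8  # the eight skills listed above


def overall_advantage(per_skill_factor: float, n_skills: int = N_HURDLES) -> float:
    """Overall advantage when every skill improves by the same factor."""
    return prod([per_skill_factor] * n_skills)


def factor_needed(target_advantage: float, n_skills: int = N_HURDLES) -> float:
    """Per-skill factor needed to reach a given overall advantage."""
    return target_advantage ** (1 / n_skills)


print(round(overall_advantage(1.5), 1))  # 1.5 ** 8, about 25.6
print(round(factor_needed(25), 2))       # about 1.5 per skill for a 25x gain
```

So "roughly 25 times better" is just 1.5 multiplied by itself eight times, whereas getting the same gain from a single skill would mean being 25 times better at that one thing.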

I think this multiplicative model becomes even more interesting when you talk about working together with people in a lab. One point is that establishing a lab culture in which everyone pushes each other in all regards is critical and will have huge payoffs. Practically, this means having everyone buy in to what we collectively think of as a worthwhile idea, how we approach execution, how to write, what our standards of rigor are, and sharing stories of rejection and subsequent success through perseverance. This also provides some basis for the disastrous negative consequences of having a toxic person in lab: even if the effects on each other person in the lab in all or even some of these qualities are small, in aggregate, it can have a huge effect.
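The same arithmetic runs in reverse for the toxic-person case. As a rough illustration only (the 10% figure is a made-up number, not something from Brian's post or from data), suppose one person drags down each of the eight qualities by a small amount for someone else in the lab:

```python
# Hypothetical numbers: the post does not quantify the effect.
N_SKILLS = 8
small_drag = 0.90  # assume each skill ends up 10% worse

relative_output = small_drag ** N_SKILLS
print(f"{relative_output:.2f}")  # 0.9 ** 8, about 0.43
```

Under that assumption, a 10% hit across the board cuts expected output by more than half.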

The other point is delegation strategy. It’s clear that in this model, one must avoid bottlenecks at all costs. This means that if you are unable to do something for reasons of time or otherwise and the person you are working with is also unable to do that task, things are going to get ugly. The most obvious case is that most PIs have only a limited capacity (if any) to actually work on a project. So if a trainee is unable to work on the project, nothing will happen. Next most obvious case is inability to write. If the trainee is unable to write and you as a PI have no time or desire to write, papers will not get written, period. Deciding how much time to invest in developing a trainee’s skills to shore up particular weaknesses is a related but somewhat different matter, and one that I think depends on the context.

This model also maybe provides some basis for the importance of “grit” or resilience or motor or drive or whatever it is you want to call it. These underlie those items on the list that are the hardest to change through mentorship. If someone just doesn’t have an ability to work on a project, then there’s not a whole lot you can do about it. If someone does not have the determination to do all the little things required to finish a project or to stick to it in the face of rejection, it will be hard to make progress, and there’s not much that you can do to alleviate these deficiencies as a mentor. I think many PIs have made this realization, and I have often gotten the advice that the most important thing they look for in a person is enthusiasm and drive. I would add to this being open to reflection and self-improvement. Everything else is just gravy.

Sunday, November 23, 2014

The most annoying words in scientific discourse

Most scientific writing and discourse is really bad. Like, REALLY bad. How can we make it better? There are some obvious simple rules, like avoiding passive voice, avoiding acronyms, and avoiding jargon.

I wanted to add another few items to the list, this time in the form of words that typically signify weak writing (and sometimes weak thinking). Mostly, these are either ambiguous, overused, or pointless meta-content just used to mask a lack of real content. Here they are, along with my reasons for disliking them:

Novel. Ugh, I absolutely hate this word. It’s just so overused in scientific discourse, and it’s taken on this subtext relating to how interesting a piece of work is. Easily avoided. Like “Our analysis revealed novel transcript variants.” Just say “new transcript variants”.

Insight. One of the best examples of contentless meta-content. If any abstract says the word insight, nine times out of ten it’s to hide a complete lack of insight. For example: “Our RNA-seq analysis led to many novel insights.” Wait, so there are insights? If so, what are these insights? If those insights were so insightful, I’m pretty sure someone would actually spell them out. More than likely, we’re talking about “novel transcript variants” here.

Landscape. Example of a super imprecise word. What does this mean anyway? Do you mean an arrangement of shrubbery? Or do you mean genome-wide? In which case, say genome-wide. Usually, using the word landscape is an attempt to evoke some images like these:


Now exactly what do these images mean? Speaking of which…

Epigenetic. Used as a placeholder for “I have no idea what’s going on here, but it’s probably not genetic”. Or even just “I have no idea what’s going on here whatsoever”. Or chromatin modifications. Or all of this at once. Which is too bad, because it actually is a useful word with an interesting meaning.

Paradigm. Need I say more?

Robust. Use of the word robust is robust to perturbations in the actual intended meaning upon invoking robustness. :)

Impact. As in “impact factor”. The thing that bugs me about this word is that its broad current usage really derives from the Thomson/Reuters calculation of Impact Factor for journal “importance”. People now use it as a surrogate for importance, but it’s always sort of filtered through the lens of impact factor, as though impact factor is the measure of whether a piece of work is important. So twisted has our discourse become that I’ve even heard the word impactful thrown about, like "that work was impactful". It's a word, but a weird one. If something is influential, then say influential. If it’s important, then say important. If an asteroid hits the moon, that’s impact.

These words are everywhere in science, providing muddied and contentless messages wherever they are found. For instance, I’m sure you’ve seen some variant of this talk title before: “Novel insights into the epigenetic landscape: changing the paradigm of gene regulation.”

To which I would say: “Wow, that sounds impactful.”

[Updated to include Paradigm, forgot that one.]
[Updated 12/13: forgot Robust, how could I?]

Saturday, November 22, 2014

Verdict on a (mostly) Bacn-free week of e-mail: totally awesome!

It’s been one week since I tabulated my e-mail and decided to run a few experiments based on the results. Quick recap: I found that I got a lot of Bacn (solicited but often unimportant e-mail, like tables of contents and seminar announcements), and this was contributing to a sense of being overwhelmed by e-mail. So I resolved to do the following:
  1. Filter out primary conveyors of Bacn to a Bacn folder that I would skim through rapidly just a few times a day.
  2. Deal decisively with the e-mail when I read it–either reply or get off the pot, so to speak.
Quick summary is that this experiment has been a great success! I feel much more efficient, less overwhelmed, and less likely to miss important things. Highly recommended.

Here’s a few more details. So I have two e-mail addresses. For the most part, one of them gets all my work e-mail, and the other one is mostly personal, but has a lot of Bacn and spam in it. Before, I had been combining both into my inbox. So that was easy: just check my work e-mail and separate out the personal one to check over on an as needed basis. Of course, I’m still getting a lot of Bacn on my work e-mail, so I then made filters to automatically file Bacn into a separate folder. I initially thought this was going to be super simple. Turns out it was a bit more work than I thought: there are MANY different Bacn providers at Penn. So it took a while to set up a filter for each of them. But it worked: almost all the Bacn went to a specific folder.
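For what it's worth, the logic of each filter is very simple: look at the sender, and if it's a known Bacn distributor, file the message away. I set mine up through the mail client's own filter settings, but here is a rough Python sketch of the same idea. The addresses and folder names are made-up placeholders, not the actual senders.

```python
from email.utils import parseaddr

# Hypothetical Bacn senders; placeholders, not the real list.
BACN_SENDERS = {
    "seminars@example.edu",
    "toc-alerts@journal.example.com",
}


def route(from_header: str) -> str:
    """Return the folder a message should be filed under, based on its sender."""
    _, address = parseaddr(from_header)
    return "Bacn" if address.lower() in BACN_SENDERS else "Inbox"


print(route("Seminar Office <seminars@example.edu>"))  # Bacn
print(route("Olivia <olivia@example.org>"))            # Inbox
```

The tedious part is not the rule itself but building up the list of senders, one Bacner at a time.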

The results were glorious! I found I spent much less time looking through all these unimportant e-mails during the day, and then I could batch process them much more efficiently during a period of downtime. There is little better than selecting a huge block of e-mail and deleting them all at once! A few times, I would get a real e-mail from a Bacner that I needed to respond to, but it turns out that they were never urgent or terribly important, and I could deal with them during this downtime period (which is probably when I should be dealing with them anyway).

I didn’t anticipate how much this e-mail filtering would engender peace of mind. I guess I was expending more mental energy than I thought processing all these different e-mails in a single stream. The steady stream of notifications that we all know we should ignore but don’t thinned out considerably, and I felt like my focus was better. I didn’t quantify any actual productivity gains (although I suspect there were some), but I can definitely say that the perceived quality of e-mail life went up considerably. Definitely felt like I was in much more control over what I was doing. Basically, it made it much easier to process e-mail the way I always knew I should in theory but rarely actually did in practice.

I think this filtering also really helped with the other aspect of my experiment, which was to be decisive (actually something I have been working on in general). The idea here was to read each e-mail only once before doing something with it, which means either marking as read or replying. Or at least getting as close to this ideal as possible. Since all the e-mails in front of me now have a similar status, I found it a bit easier to do this, because I’m not changing “modes” from one e-mail to the next.

Decisiveness is hard, and something I’ve struggled with for a long time, both in the context of e-mail and otherwise. And being deliberate is not necessarily a bad thing. But I think most of us tend to undervalue our time, and I feel like being decisive is making a tradeoff between making the best possible decision slowly and making a good enough decision quickly. Or, as is more often the case, making the best possible decision slowly and making the best possible decision quickly–indeed, I feel like much of the time, the “decision making process” is really more like a slow process of rationalizing a decision you’ve already essentially made. So I’m trying to just go with my instincts and then think, well, if I made a mistake, so be it. The key thing is to think to myself “Well, am I going to get any new information that might change my decision? If not, then go for it.” That actually takes care of a lot of situations, e-mail or otherwise.

UPDATE: Forgot to mention that I got two e-mails this past week from close collaborators with the subject line "Not Bacn". :)

Sunday, November 16, 2014

A week in my e-mail life

[Note: This is a longish post, so here’s an “abstract” that gets across the main points: Academics get a lot of e-mail. I decided to catalog my e-mails for the week to see if I could identify any patterns. I found that a large amount of my e-mail was “Bacn”, meaning e-mails that I am in some way supposed to get, but are typically not very important, like seminar announcements, etc. A lot of the more research-oriented e-mail was related to logistics, like shipping, etc. As for what to do about it, I think the number one thing is to pre-filter a bunch of the Bacn, which typically just comes from a relatively limited number of easily identified people and only very very rarely requires any sort of immediate action. This will help make it easier to process it in batch mode, which is another area where I could really improve how I handle e-mail, rather than replying in a more "real time" fashion. And I will try to be more decisive in handling e-mail. An update on how all this worked next week.]

As is the case for most academics these days, I get a lot of e-mail. And as is the case for most academics, I love to complain about how much time it takes up. I was thinking about this recently when I came across the line “E-mail is everyone else’s to do list for you.” Which I thought was an interesting way of thinking about it. I mean, just because someone has my e-mail address doesn’t necessarily give them the right to command my attention, right? But then I thought a bit more, and I wondered if my attention really is being dragged unnecessarily in unwanted directions, or is it primarily spent on things that I want to pay attention to. Are there ways that I can make myself more efficient?

So I decided to catalog all the e-mail I got in the last week. First, a couple notes on methodology. I basically just looked through my e-mail for the past week and tried not to delete anything (which I normally don’t do, except for spam). Going through, I categorized the e-mail (more on that later), kept track of whether I replied or forwarded the e-mail, and how long it took me to reply. I also kept track of whether the e-mail was initiated by myself or came from someone else and whether the e-mail was directed to me specifically or whether it was just a general broadcast (some judgement calls in this).
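If you wanted to do the same bookkeeping in code rather than by hand, a log along these lines would be enough. This is only a sketch of the record-keeping described above; the field names are my own invention and the three entries are made up purely to show the shape of the log, not my actual data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class LoggedEmail:
    category: str                       # e.g. "Penn Bacn", "Logistics"
    received: datetime
    replied: Optional[datetime] = None  # None if I never replied
    initiated_by_me: bool = False
    broadcast: bool = False             # sent to a list rather than to me


def summarize(log):
    """Count e-mails per category and compute the median reply delay in minutes."""
    counts = Counter(m.category for m in log)
    delays = [
        (m.replied - m.received).total_seconds() / 60
        for m in log
        if m.replied is not None
    ]
    return counts, (median(delays) if delays else None)


# Made-up entries, for illustration only.
log = [
    LoggedEmail("Penn Bacn", datetime(2014, 11, 10, 9, 0), broadcast=True),
    LoggedEmail("Logistics", datetime(2014, 11, 10, 9, 5),
                replied=datetime(2014, 11, 10, 9, 8)),
    LoggedEmail("Scheduling", datetime(2014, 11, 10, 13, 0),
                replied=datetime(2014, 11, 10, 17, 0)),
]

counts, median_delay = summarize(log)
print(counts.most_common())
print(median_delay)
```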

Here's what I found:

Good news is that I don't instigate a lot of e-mail, which makes me feel better about myself–in fact, so few that I didn’t really think it was worth doing a similar analysis on my sent e-mail. But I did reply to a relatively large number of e-mails. But now that I think about it, I would guess this is the case for most academics. Most of their e-mail misery comes from others randomly bugging them, and I think it’s usually just a handful of others.

As for speed of reply, I’m generally quite fast, but there’s a long tail:
Zooming in on the short time-scale:

A pretty substantial number of replies actually happened within minutes, sort of like texting or something, then a tail of longer times to reply. I actually expected this to be a bit more bimodal, but it's pretty unimodal, with a long tail. I did notice that I have chunks of reply e-mail at the beginning and end of the day, which is good–my intention lately has definitely been to try and do as much batch processing as possible. I think I could be more disciplined about this, though.

Of course, the key piece of data is what different sorts of e-mail I get. Here’s how I broke it down:
  1. Spam
    1. Spam spam. Like, Nigerian Bankers who have a great deal on Viagra for you. 
    2. Science spam. This is various marketing for HPLC equipment or strange journals or whatever. I get a lot of this, presumably because various vendors have sold my e-mail to direct marketers.
  2. Bacn. Bacn is a very interesting category. It is like spam, but a level up: it’s something where there is some sort of relationship there, including perhaps direct solicitation of the e-mail. Here is how I broke that down:
    1. Personal. e.g. NYtimes.com table of contents.
    2. TOC. Tables of contents of various journals.
    3. Science. ResearchGate, Nature Publishing Group
    4. Penn Bacn. Seminar announcements, thesis defenses, visitors, latest fund-raising drive.
  3. Scheduling. This includes setting up a meeting or lunch or whatever with someone, thesis committee meeting times, etc.
    1. Scheduling Bacn. These are scheduling e-mails in which you’re just sort of along for the ride. You don’t have to do anything, but the e-mail is there, perhaps asking you if you want to meet with so and so.
  4. Teaching. Students asking for help or whatever.
  5. Evaluations and Letters. Someone asking for you to evaluate a person or paper or whatever in some way, shape or form. An important part of our lives. I’m of course happy to do this for people who have been in my life in the lab. Less exciting is...
    1. Evaluations and Letters Bacn. This is any sort of evaluation of someone or something from outside. This includes, but is not limited to, reviewing papers.
  6. Research. This is what we’re supposed to be doing, right? Well, that all depends…
    1. Logistics. This is all stuff about orders, handling of manuscripts, lab organization, etc.
    2. Collaborations. This is managing various collaborations with other groups. This does not include close collaborators with whom we are doing real science together. It’s more just like people whom we’re doing a one-off experiment with. Often, there is overlap with the Logistics category.
    3. Research Bacn: Seems like a weird category, right? These are what I would consider relatively unsolicited e-mails that are random and tangential to your research effort, but are science related. Like, someone sends you a link to a paper they wrote. Or someone had a thought after meeting with you. Or something. This is not quite Bacn in the sense that you may not necessarily be able to ignore all of it, but it’s not quite important enough not to be Bacn.
    4. Actual Research: This is, you know, actual research. Also a proxy for what I consider the most important to me. Mostly conversations with people whom we are working with closely about science. This can include making decisions about scientific goings-on in the lab, or thoughts on an experiment, or how to interpret something–basically, the fun part of it.

So what’s the breakdown? Here are some pie-charts (I’ll get to strategies I’m thinking about implementing later).








Let’s start with spam. Turns out I don’t get that much of it. It certainly doesn’t take that long to get rid of them. In fact, I have to say that I sometimes rather enjoy them for their humorous qualities. Here are four of my favorite examples:

Message 1:
Subject: WE WILL RESTORE YOUR NEGLECTED BOOKKEEPING [translated from Russian]
You’re the boss and your accountant has suddenly quit!
Betrayed? Set up? Tax inspection tomorrow?


ACCOUNTING MAYHEM!!!

Message 2:

Subject: The best New Year’s gift is safety for you and your loved ones! [translated from Russian]

Message 3:
Subject: Your  Account Was Banned
This is a joke :)

Than trying to work mounted on clumsy, long webfeet by the
ecriture artiste which the french writers that hears. Similarly,
employing the eye, it is a moment without devoting his heart
upon mahadeva. Towards the abode of bhishma, casting aside

their.

Message 4:
Subject: Mandy - 100% results.
Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lolGy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.

I think the Gmail spam filters do a pretty good job of getting rid of most of this cruft.

Bacn. This was perhaps the biggest surprise. Most of what I get is Bacn. And it’s super annoying to sort through, due primarily to the very nature of Bacn, which is something that you might conceivably be interested in. And one of the worst offenders is Penn! The amount of Penn Bacn I get is crazy. It’s primarily seminar announcements (and reannouncements (and re-reannouncements)) and various other random stuff that I may in theory want to know about, but I typically won’t. And it typically comes from a few prime Bacn distributors. The only problem is that I will sometimes get something important from these Bacners, and so I can’t just automatically filter them out into the trash. Hmm. These typically come mostly in the morning, which is when I try and get real work done.

Funny note about Bacn: I made some Bacn myself! Had to send out an e-mail to the graduate group about something or other. I feel sort of bad about it now. Even funnier, I even managed to send research-related Bacn to myself in the form of an e-mail to myself of a paper I thought I should read. Of course, I paid it about as much attention as all my other research-related Bacn… :)

Scheduling. Surprisingly large amount of e-mail just to schedule appointments. This was actually a relatively tame week in that regard, so I was sort of surprised how much e-mail circulated about that.

Research. Large number of logistic e-mails, often about shipping, etc. The shipping and ordering stuff doesn’t take up too much time, honestly, perhaps because we have a relatively small operation. It was interesting to see how much Research “collaborations” took up. To me, this is partly a matter of how much you invest in your scientific community, sort of like being a good citizen. That said, it is clear that this can suck your brain quite easily. Research Bacn is I think something that I get a lot more of than I imagine most people getting, for various reasons. Surprisingly (unsurprisingly?) little time spent on actual Research Research e-mails. Which I actually regard overall as a good thing: for most research discussions, I talk with the people in my lab directly. I think that is a far more efficient way to get things done, generally, and avoids those super long e-mails that take hours to craft.

So what to do with this data? I think I came to a few primary conclusions:
  1. I need to organize my e-mail so that the Bacn is out of sight most of the time. I try my best to ignore Bacn most of the time, but in practice, it takes a lot of discipline to avoid looking at all those e-mails during the day, especially when there are sometimes other interesting e-mails interspersed in my inbox as well that I may very well want to deal with. To do this, I’ve implemented filters on Gmail to just send most of these to a specific folder that I will check once a day or so, hopefully in a really fast batch mode. There is some slight chance that I might miss a timely e-mail, but whatever. Looking at it now, perhaps this is obvious, but somehow I just didn't think of it before.
  2. I get a lot of research-related logistical e-mails about ordering and such that I should probably be delegating. These are not quite Bacn, because I (or someone in the lab) do need to give some input or really read them, sometimes in a timely manner. But just as often not. I also noticed I got a few more of these this week than usual.
  3. Teaching: I didn’t get a lot of teaching e-mail this week, which is nice, but somewhat unusual. I actually have a specific teaching gmail account that I ask students to send to–this organization is very useful, and it allows me to make others do some of the organizing for me. Of course, you have to actually tell your students about it, which I of course forgot to do this term in my grad class. But I will definitely remember next term in my big required undergrad class. I will also be sure to have a policy that I only respond to student e-mails at one particular time of the week, no exceptions.
  4. Perhaps the most important lesson is to BE DECISIVE. Someone (and I’m so sorry, I forget who, and the comments got deleted) left an awesome comment on the blog somewhere about a simple rule, which is to read each e-mail only once. I think that’s absolutely right. I definitely found myself reading an e-mail and then mulling it over and then mulling it over again. I have to not do that. If it requires thought, I should just make a (prioritized) to-do list item for it and then mark it as read and be done with it. Otherwise, I’m just cycling over and over again.
Anyway, those are some thoughts. I will try and implement this this week and post again once the results of this reorganization are in.

Sunday, November 9, 2014

My favorite quote about LaTeX

Argh, just finished struggling through submitting a LaTeX document to a journal. And I think I still screwed up and will have to do some more fussing. My only hope (and a fading one at that) is that things will not devolve to the point where I just have to copy the whole damn thing into Google Docs, where you can actually spend your time on, you know, doing real work.

So I just Googled around and found the following page, which has my new favorite quote about LaTeX:
Latex ("LaTeX" if you're pretentious as hell) is the biggest piece of shit in the history of both pieces and shit.
Yes.

(And yes, before you say it, I know what you are going to say.)