Saturday, August 23, 2014

Is academia really broken? Or just really hard?

(Second contrarian post in a row. Need to do some more positive thinking!)

Scarcely a day goes by when I don’t read something somewhere on the internet about how academia is broken. Usually, this centers around peer review of papers, getting an academic job, getting grants and so forth. God knows I’ve contributed a fair amount of similar internet-fodder myself. And just for the record, I absolutely do think that many of the systems that we have in place are deeply flawed and could do with a complete overhaul.

But what do all these hot-button meta-science topics have in common? Why do they engender such visceral reactions? I think they are all about the same basic underlying issue, namely competition for limited resources (spots in high impact journals, academic jobs, grant funding). I think we can and should fix the processes by which these resources are apportioned. But there’s also no getting around the fact that there are limited resources, and as such, there will be a large number of people dissatisfied with the results no matter what system we choose to use.

Take peer review of papers. Colossal waste of time, I agree. Personally, the best system I can envision is one where everyone publishes their work in PLOS ONE or equivalent with non-anonymous review (or, probably better, no review), then “editors” just trawl through that and publish their own “best of” lists. I’m sure you have a favorite vision for publishing, too, and I’m guessing it doesn’t look much like the current system–and I applaud people working to change this system. In the end, though, I anticipate that even if my system were adopted, everyone (including me) would still be complaining about how such and such hot content aggregator is not paying attention to the particular groundbreaking results they put up on bioRxiv. The bottom line is that we are all competing for the limited attention of our fellow scientists, and everyone thinks their own work is more important than it probably is, and they will inevitably be bummed when their work is not recognized as the unique and beautiful snowflake they are so sure it is. Groundbreaking, visionary papers will still typically be under-recognized at the time precisely because they are breaking new ground. Most papers will still be ignored. Fashionable and trendy papers will still be popular for the same reason that fashionable clothes are–because, umm, that’s the definition of fashion. Politics will still play a role in what people pay attention to. We can do pre-publication review, post-publication review, no review, more review, alt-metrics, old-metrics, whatever: these underlying truths will remain. It’s worth noting that the same sorts of issues are present even in fields with strong traditions of using pre-print servers and far less fetishization of publishing in The Glossies.

I think it's the fear and heartbreak associated with rejection by one's peers (either by reviewers or by potential readers) that is the primary underlying motivation for people to consider alternative approaches to publishing–it certainly is for me. We should definitely consider and implement alternatives, but I think it's worth considering that the anguish that comes from nobody appearing to appreciate your work will always be present, because other people's attention is a limited and precious resource that we are all fighting for. [Update 8/25: same points made here and here by Jeremy Fox]

For trainees, the other “great filter event” they probably experience is getting a faculty position. Yes, the system is probably somewhat broken (in particular with gender/racial disparities that we simply must address), although compared to peer review of papers, search committees are far more deliberate in their decision making, precisely because the stakes are so much higher. Yes, we can and should encourage and support students considering other career paths. I guess what I’m saying is that even if everyone went into science with their eyes wide open, with all the best mentoring in the world, the reality is that there are more dreamers than dream jobs available. That means many people who feel like they deserve such a position (and certainly many of them do) are not going to get one. And they probably won’t be happy about it.

(Sidebar about career path stuff: to be frank, most of the trainees I’ve met are pretty realistic about their chances of getting a faculty position and have many other plans they are considering as well, and so I think some of the “I’m not getting support and advice about other career choices” meme is overblown, especially these days. We can blame the “system” for somehow making it seem like doing something other than academia is a failure, and there is definitely some truth to that. At the same time, I think it’s fair to say that many people do a PhD because being a scientist was a long-held dream from childhood, and so if we’re being totally honest, at least some of the sense of failure comes from within. It’s a lot easier to say abstractly that we should be realistic with trainees and manage expectations and so forth than to actually look someone in the eye and tell them to their face that they should give up on their dream. I agree that this is the sort of hard stuff PIs should do as part of their jobs–I’m just saying it’s not as easy as it is sometimes made out to be. And yes, I’ve personally experienced both sides of this particular coin.)

Look, nobody likes this stuff. Rejecting is about as much fun as being rejected, and I FULLY support all efforts to make our scientific processes better in every possible way. All I’m saying is that even the best, most utopian system we can think of will suffer from inequities, politics, fashions, etc. because that is just human nature. The current systems are largely run by scientists, after all, and so we really have nobody to blame but ourselves. I realize it’s much easier to blame Spineless Editor From Fancy Journal, Nasty Reviewer with a Bone to Pick, Crusty Old Guy on the Hiring Committee, or Crazy Grant Reviewer with a Serious Mental Health Issue, and I’ve for sure blamed all those people myself when I have failed at something. Maybe I was right, or maybe I was wrong. I’m pretty sure it’s mostly a rationalization that lets me keep my chin up in what can sometimes be a fairly demoralizing line of work. Science is a human endeavor. It will be as good and as bad as humans are. And when the chips are down and there’s not enough to go around, that can bring out both the best and the worst in us.

Sunday, August 17, 2014

Another approach to having data available, standardized and accessible: who cares?

I once went to a talk by someone who spent most of their seminar talking about a platform they had created for integrating and processing data of all different kinds (primarily microarray). After the talk, a Very Wise Colleague of mine and I were chatting with the speaker, and I said something to the effect of “Yeah, it’s so crazy how much effort it takes to deal with other people’s datasets” and both the speaker and I nodded vigorously while Very Wise Colleague smiled a little. Then he said, “Well, you know, another approach to this problem is to just not care.” Now, Very Wise Colleague has forgotten more about this field than I’ve ever learned (times 10), so I have spent the last several years pondering this statement. And as time has gone on and I’ve become at least somewhat less unwise, I find I largely agree with Very Wise Colleague.

I realize this is a less than fashionable point of view these days, especially amongst the “open everything” crowd (heavy overlap with the genomics crowd). I think this largely stems from some very particular aspects of genomics data that are dangerous to generalize to the broader scientific community. So let’s start with a very important exception to my argument and then work from there: the human genome. I think our lab uses the human genome on pretty much a daily basis. Mouse genome as well. As such, it is super handy that the data is available and easily accessed and manipulated because we need the data as a resource of specific important information that does not change or (substantially) improve with time or technology.

I think this is only true of a very small subset of research, though, and leads to the following bigger question: when The Man is paying for research, what are they paying for? In the case of the genome, I think the idea is that they are paying for a valuable set of data that is reasonably finalized and important to the broader scientific endeavor. Same could be said for genomes of other species, or for measuring the melting points of various metals, crystal structures, motions of celestial bodies, etc.–basically anything in which the data yields a reasonably final value of interest. For most other research, though, The Man is paying us to generate scientific insight, not data. Think about virtually every important result in biomedical science from the past however long. Like how mutations to certain genes cause cells to proliferate uncontrollably (i.e., genes cause cancer). Do we really need the original data for any reason? At this point, no. Would anyone at the time have needed the original data for any reason? Maybe a few people who wanted to trade notes on a thing or two, but that’s about it. The main point of the work is the scientific insight one gains from it, which will hopefully stand the test of time. Standing the test of time, by the way, means independent verification of your conclusions (not data) in other labs in other systems. Whether or not you make your data standardized and easily accessible makes no real difference in this context.

I think it’s also really important before building any infrastructure to first think pretty carefully about the "reasonably final" part of reasonably final value of interest. The genome, minor caveats aside, passes this bar. I mean, once you have a person’s genome, you have their sequence, end of story. No better technology will give them a radically better version of the sequence. Such situations in biology are relatively rare, though. Most of the time, technology will march along so fast that by the time you build the infrastructure, the whole field has moved on to something new. I saw so many of those microarray transcriptome profile compendiums and databases that came out just before RNA-seq started to catch on–were those efforts really worthwhile? Given that experience, is it worth doing the same thing now with RNA-seq? Even right now, although I can look up the HeLa transcriptome in online repositories, do I really trust that it’s going to give me the same results that I would get on my HeLa cells growing in my incubator in my lab? Probably just sequence it myself as a control anyway. And by the time someone figures this whole mess out, will some new tech have come along making the whole effort seem hopelessly quaint?

Incidentally, I think the same sort of thinking is a pretty strong argument that if a line of research is not going to give a reasonably final value of interest for something, then you better try and get some scientific insight out of it, because purely as data, the whole thing will likely be obsolete in a few years.

Now, of course, making data available and easily shared with others via standards is certainly a laudable goal, and in the absence of any other factors, sure, why not, even for scientific insight-oriented studies. But there are other factors. Primary amongst them is that most researchers I know maintain all sorts of different types of data, often custom to the specific study, and sharing it means having to in effect write a standard for that type of data. That’s a lot of work, and likely useless as the standards will almost certainly change over time. In areas where the rationale for interoperable data is very strong, then researchers in the field will typically step up to the task with formats and databases, as is the case with genomes and protein structures, etc. For everything else, I feel like it’s probably more efficient to handle it the old fashioned way by just, you know, sending an e-mail–I think personal engagement with the data is more productive than just randomly downloading the data anyway. (Along those lines, I think DrugMonkey was right on with this post about PLOS’s new and completely inane data availability policy.) I think the question really is this: if someone for some reason wants to do a meta-analysis of my work, is the onus on me or them to wrangle with the data to make it comparable with other people’s studies? I think it’s far more efficient for the meta-analyzer to wrangle with the data from the studies they are interested in rather than make everyone go to a lot of trouble to prepare their data in pseudo-standard formats for meta-analyses that will likely never happen.

All this said, I do definitely personally think that making data generally available and accessible is a good thing, and it’s something that we’ve done for a bunch of our papers. We have even released a software package for image analysis that hopefully someone somewhere will find useful outside of the confines of our lab. Or not. I guess the point is that if someone else doesn’t want to use our software, well, that’s fine, too.

Thursday, August 14, 2014

An argument as to why the great filter may be behind us

A little while back, I read a great piece on the internet about the Fermi Paradox and the possibility of other life in our galaxy (blogged about it here). To quickly summarize, there are tons of earth-like planets out there in our galaxy, and so a pretty fair number of them likely have the potential to harbor life. If we are just one amongst the multitudes, then some civilizations must have formed hundreds of millions or billions of years ago. Now, there’s a credible argument to be made that a civilization that is a few hundred million years more advanced than we are should actually have developed into a “Type III” civilization that has colonized the entire galaxy (this gets into the somewhat spooky concept of the von Neumann probe). The question, then, is why we haven’t actually met any aliens in a galaxy that seemingly should be teeming with life.

There are two general answers. One is that life is out there, but we just haven’t detected it yet, and that online piece does a good job of going through all the possible reasons why we might not have. But the other possibility, and the one that I think is frankly a bit more plausible, is that there aren’t any Type III civilizations out there. Yet. Will we be the first? That’s what this piece by Nick Bostrom is all about. The idea is that somewhere in the history of a Type III civilization is an event known as the great filter. This is some event during the course of civilization development that is exceptionally rare, thus providing a great filter between the large number of potential life-producing worlds out there and the complete and utter radio silence of the galaxy as we know it. What are candidates for the great filter? Well, the development of life itself is one. Maybe the transition from prokaryotic life to eukaryotic life. Or maybe all civilizations are doomed to destroy themselves. So in many ways, the existential question facing humanity is whether this great filter is behind us (yay!) or ahead of us (uh-oh!). One fun point that Nick Bostrom makes is that it’s a good thing we haven’t yet found life on Mars. If we did find life on Mars, then that would mean that the formation of life is not particularly rare, meaning that it cannot be a great filter event. The more complex the life we found on Mars, the worse it would be, because that would eliminate an ever greater number of potential great filter candidates behind us, meaning it is likely that the great filter is ahead of us. Ouch! So don’t be rooting for life on Mars. But while the presence of life on Mars would likely indicate that the great filter is ahead of us, the absence of such life doesn’t say anything, and certainly doesn’t prove that the great filter is behind us. Hmm.

So for a while, I thought this was a classic optimist/pessimist divide: if you’re an optimist, you believe the filter is behind us; if a pessimist, ahead of us. But I think there’s actually a rational argument to be made that it’s behind us. Why? Well, I think there are two possible categories of great filter events ahead of us. One is destruction of all life by outside forces. These could be asteroid impacts, gamma ray bursts, etc. Bostrom makes a good argument against these being great filters, because a great filter has to be something that is almost vanishingly rare to get past. So even if only 1/1000 civilizations made it past these asteroids and bursts and whatever, then it’s still not a great filter, given the enormous number of potentially life-sustaining planets out there. The other category of filter events (which is in some ways more depressing) comprises those that basically say that intelligent life is inherently self-destructive, along the lines of “we cause global warming and kill ourselves”, or global thermonuclear war, etc. This is the pessimist’s line of argument.

Here’s a statistical counterargument in favor of the filter being behind us, or at least against the self-destruct scenario. Suppose that civilizations are inherently self-destructive and that the filter event is ahead of us. Then I would argue that we should see the remnants of previous civilizations on our planet. The idea is that as long as a civilization’s self-destruction doesn’t cause the complete and total annihilation of our planet (which I think is unlikely, more on that in a bit), then conditions would be favorable for life to again evolve until it hits the filter again. And again. And again. Statistically speaking, it would be very unlikely for us, right now, to be the very first in this series of civilizations. Possible, but unlikely.
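
A quick toy simulation makes those odds concrete (this is just my own back-of-the-envelope illustration of the argument, not anything from Bostrom's paper; the planet count and per-civilization self-destruction probability `q` are made up for the sketch):

```python
import random

# Toy model: each civilization self-destructs with probability q, and
# self-destruction never sterilizes the planet, so another civilization
# eventually evolves. Each planet thus hosts a series of civilizations,
# and we can ask: what fraction of all civilizations that ever arise
# were the FIRST on their planet?

def fraction_first(num_planets=100_000, q=0.99, seed=0):
    rng = random.Random(seed)
    total = 0  # every civilization that ever arises, across all planets
    first = 0  # the subset that were first in their planet's series
    for _ in range(num_planets):
        index = 1
        while rng.random() < q:   # this civilization self-destructs...
            total += 1
            first += (index == 1)
            index += 1            # ...and another one evolves after it
        total += 1                # this one finally gets past the filter
        first += (index == 1)
    return first / total

# With q = 0.99, each planet averages ~100 civilizations, so only about
# 1 in 100 civilizations ends up being the first on its planet.
print(fraction_first())
```

In other words, if the self-destruct filter were ahead of us and non-sterilizing, we should expect to find ourselves somewhere in the middle of a long series of civilizations, not at the very start of one.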

Now, this argument relies on the notion that whatever these potential future filter events are, they don’t prevent the re-evolution of intelligent life. I think this is likely to be the case. Virtually every such candidate we can think of would probably destroy us, maybe even most life, but it’s hard to imagine them killing off all life on earth, permanently. Global warming? It’s been hot in the earth’s past, with much higher levels of CO2, and life thrived, probably would again. Nuclear war or extreme pollution? Might take a billion or two more years, but eventually, intelligent cockroaches would be wandering the earth in our place. Remember, it doesn’t have to happen overnight. I think there are very few self-destruct scenarios that would lead to COMPLETE destruction–all I can think of are events that literally destroy the planet, like making a black hole that eats up the solar system or something like that. I feel like those are unlikely.

So where does that leave us? Well, I can think of two possibilities. One is that we are not destined for self-destruction, but that the “filter” event is one that just prevents us from colonizing the galaxy. Given our current technological trajectory, I don’t think this is the case. Thank god! Stasis would just feel so… ordinary. The other much more fun possibility is that we are the first ones past the great filter, and we’re going to colonize the galaxy! Awesome! Incidentally, I’m an optimist and an (unofficial) singularitarian. So keep that in mind as well.

So what was the great filter, if it really is behind us? I personally feel like the strongest candidate is the development of eukaryotic life (basically, the development of cells with nuclei). You can get some sense for how rare something is by seeing how long it took to happen, given that conditions aren’t changing. This is hard, because conditions are always changing, but still. Take the development of life itself. Maybe a couple hundred million years? That’s a long time, but not that long, and moreover, conditions on early Earth were changing a lot, so it could be that it didn’t take very long at all once the conditions were right. But eukaryotic life? Something like 1.5-2 billion years! Now that’s a long time, no matter how cosmic your timescale. And the “favorable conditions” issue doesn’t really apply: presumably the conditions favorable to eukaryotic life aren’t really any different than those for prokaryotic life, since it's just different rearrangements of the same basic stuff. So prokaryotic life just sat around for billions of years until the right set of cells bumped into each other to make eukaryotic life. Seems like a good candidate for a great filter to me.

Incidentally, one of the things I like about thinking about this stuff is how it puts life on earth in perspective. Given all the conflicts in the news these days, I can’t help but wonder that if we all thought more about our place in the universe, maybe we’d stop fighting with each other so much. We should all be working to better humanity and become a Type III civilization! The wisdom of a fool, I suppose...

Friday, August 8, 2014

A taxonomy of papers

As the years go by, I feel like I’ve seen enough papers go through the meat grinder that they’ve started to fall into a variety of categories, based not so much on content as on the process of getting them out the door. Here are a few I could think of.

The Slam Dunk: This is the paper that you just know is going to make it into Nature. It’s in a hot field, the results are clean, the story is all in place. And kaboom! It slides right in. Yes, these papers do exist. They are pretty rare, though, because most of these actually end up being…

The Face Plant: This is the paper that you just know is going to make it into Nature. It’s in a hot field, the results are clean, the story is all in place. And kaboom! “Thanks for submitting your paper. Unfortunately, we think it’s too boring to even ask anyone else to read it. Perhaps a more boring journal would thus be more appropriate.” This can sometimes lead to…

The Snowball: These papers start life as nice, baseball-sized lumps of snow. Then they start rolling down the hill, accumulating more and more experiments until, at the end, nobody can control them, and you better get the hell out of the way. I suppose some of these papers fall apart under their own weight, leaving behind a set of smaller papers for people to pick up. I guess. Mostly, I just see these things get published with a 75 page supplement under the title “The collected works of Jane Doe, PhD”.

The Van Gogh [via Nikolai Slavov]: This is a variant of the Face Plant in which the results are truly visionary new science that actually matters, and that just makes everyone... uncomfortable. Might even make it to review, in which case you get reviews that say stuff like "These results cannot possibly be true" or "I just don't believe these claims" or "These results are inconsistent with everything we know about X". Yeah, well, that's just, like, your opinion, man! These papers are so depressing. [Wonderful collection of "Van Goghs" from Nikolai.]

The Baby Bird with the Broken Wing: Ah, the baby bird with the broken wing! So frail and delicate, with results held together by the thinnest of threads. If only everyone could see your inner beauty! And so it is nursed back to health with tons of experiments, each aiming to shore up the results and stamp out all those pesky inconsistencies. But it may ultimately never fly again. And even if it does, it’s sort of messed up looking. At which point it’s officially…

The Glass of Lemonade: This is the paper where all the results are somewhat ambiguous, every experiment requires months of debugging, every question ends with a new twist. And what do you do when life gives you a bunch of lemons? You make lemonade! The secret to making lemonade is to add sugar. Lots of it. The scientific equivalent of sugar is the z-score, the p-value, and so forth.

The Ugly Duckling: This is the paper that starts out with a simple little idea that just keeps on working out until the paper basically just writes and publishes itself. Sort of like the slam dunk, but it was never intended to be a flashy paper, it just somehow ended up that way. Yes, these papers exist as well. They are so awesome!

The Heart Attack: This is one of the most dreaded types of paper: the paper that has a fatal flaw. You know it, and even worse, the reviewers know it. It’s weird how this happens, but it does. Take heart, though, at the following rejection letter a colleague told me they received from the editor of a Nice™ journal [paraphrased]: “We have received the reviews, and based on the reviewers comments, we sadly cannot accept your paper. In particular, they note a serious flaw in your work, and believe that this flaw is irrecoverable. [...] We would, however, like to offer for you to transfer this manuscript to our lesser sister journal…” Seriously, you can’t make this stuff up.

The Frankenstein: This type of paper results from the merger of two projects into one. Often, this happens when two different lab members are working on different aspects of the same topic, and then somewhere, someone decides to put these two papers together. Nothing quite matches up, and if two people are involved, it can definitely lead to some hard feelings. The Frankenstein can also spontaneously arise from The Snowball after it collects enough additional experiments.

The Anchor: This is the stupid little review paper nobody will ever read that you agreed to write 8 months ago and was due 2 months ago and now you are looking at the screen ready to blow your brains out for having agreed to write it. Until you remember that there’s Twitter. Ah, Twitter!

Any others?

Sunday, August 3, 2014

How much do PIs work?

Just read this very astute blog post by Meghan Duffy about how much academics say they work, how much they actually work, and how much they should work. The gist of it is that there’s a myth that you need to work 80 hours a week to get tenure, that virtually all academics don’t work that much, and that working that much would be counterproductive anyway. I agree completely! I personally don’t work 80 hours a week, and I don’t think I’m working particularly more or less than anyone else. The upshot of the post is that we should stop promulgating the myth of 80 hours a week, and stop believing it when other people say it.

Hmm. I’m not 100% sold that it’s the myth itself that’s really the problem. In talking with my junior prof friends, we don’t really talk about how many hours we work or anything like that, and I certainly haven’t had anyone tell me they work 80 hours a week. But I think it’s safe to say that most of us feel overwhelmed by our jobs–it FEELS like we’re working 80 hours a week. I really think it is the perception that we’re up to our eyeballs that creates problems more than the reality of the number of hours we work. Why do we have this feeling?

I think there are two reasons (aside from the obvious one that yes, this is objectively a busy time of life). One is that once I started as a PI, my time was no longer my own in the same way it was before. As a postdoc, it’s very easy to measure your daily productivity: I needed to collect this data, and I collected it; write this paper, and I wrote it–checkbox ticked. Now, my day is filled with a large number of diffuse scientific tasks and specific administrative tasks, neither of which yields the same sort of fulfillment that actually doing science does. I think this contributes to the feeling of not “getting anywhere”, which leads to a constant sense that we need to do something, hence always feeling busy.

What to do about that? I don’t really do many (if any) experiments myself in lab anymore–once I started teaching, I was too overwhelmed to keep it up, and once that became more manageable, it was hard to get back on the experimental treadmill (mostly, I just fix stuff around the lab now). I was lamenting this fact with a friend who started his lab around the same time as me, and he said something that really stuck with me: “The best use of our time is to help our students get THEIR experiments working.” I think that’s very true, and that sentiment really helped me get over the guilt associated with not doing experiments anymore. But what it doesn’t help with is generating the feeling of accomplishment that lets you sleep well at night knowing that you, personally, made some tangible contribution to the progress of humanity (or whatever it is that we do).

So lately, I’ve tried to assign myself a reasonable set of science tasks to do, like analyze a dataset or solve some computational problem. This has given me a real feeling of satisfaction, and has made me feel more productive. The amazing thing is that it also makes me feel less harried at the end of the day, because I can point to something I care about and say “I did that!”, so I feel much less like I’m behind on every single thing (although I’m still just as behind as ever). I guess the point was to inject a little bit of positive reinforcement into my life, and I think it’s given me more energy to tackle all the other stuff I need to get done. Of course, the key is to set reasonable expectations and set aside some time to complete the task, but that’s a whole other blog post… :)

The second reason that it feels like we’re working 80 hours a week is that on some level, we actually are–I think it’s just the nature of being a scientist. I’m into what I do, and I think about it all the time. I often think about ideas and projects in the lab before I go to sleep and when I wake up. Not every time, but a lot of the time. I’m running into the lab right now on Sunday morning because someone smelled burning plastic and I want to make sure the lab doesn’t burn down (update: issue with fan coil). Sometimes I’ll meet a colleague for lunch. We might talk about kids or teaching or research. Does that count as work? Maybe I’ll have an idea that springs from that conversation. Maybe get an idea in the shower. Is that work? What about writing this blog post?

Don’t get me wrong, it’s not like science is the sole purpose of my existence. I have small children, am married to a non-scientist, have non-scientist friends, and watch tons of crummy action movies. All I’m saying is that I love science. It is a pervasive part of my life, and I don’t feel a need to apologize for that–certainly no more than a need to apologize for watching virtually every Steven Seagal movie ever made (check out this one, where he plays a Russian mobster named–wait for it–Ruslan!). Perhaps this leads to feeling a bit overwhelmed sometimes, but overall, I am passionate about what I do and enjoy it tremendously.

Anyway, I think what I’m trying to say is that I feel like it’s the perception of how busy we are that matters more than the actual hours, and maybe the best way to improve our well being in that regard is to not focus on how long we work but rather how to make those hours as meaningful and fulfilling as possible.

Friday, August 1, 2014

How to write fast

Being a scientist means being able to write effectively about your science: papers, grants, e-mails, reviews, blogs, twitters, facebooks, whatever. And being an efficient scientist means being able to write about your science both effectively and fast. Striking this balance is a struggle for most people, and solutions are likely highly personal, but here are a few things I’ve found have worked for me (more interesting/less generic ones towards the end):
  1. Deadlines are your friend. Wait until the last minute and write in a big spurt. I personally feel that the last 10% takes way more than 10% of the time, but actually makes much less than 10% difference in the final outcome (grant, paper, etc.). Being up against a deadline is unpleasant, but cuts down on this relatively low-payoff time.
  2. If you do have to write early for whatever reason, set an artificial early deadline and try to finish it by then as though it were a hard deadline. This has another bonus…
  3. … which is to put the piece of writing away for a week and not think about it, then come back to it. This distance gives you a sufficiently long break that editing your own writing will be much more effective and efficient than if you just edit it continuously.
  4. Don’t be afraid of the blank page. For me, the blank page is a period of reflection and thought. Often, I will look at a blank page for a week, during which time I’ve really thought about what I wanted to say, at which point it all comes out very quickly and relatively coherently. Whenever I force myself to write before I'm ready, I just end up rewriting it anyway.
  5. If you’re having a hard time explaining something in writing, just try to explain it to someone verbally. For me, this really helps me clearly formulate something. Then just write that down and see what happens. Much faster than struggling endlessly with that one troublesome sentence.
  6. Don’t worry about word limits while you’re writing. I’ve found that writing with the word limit in mind makes my writing very confusing and overly compressed because I try to squeeze in too many thoughts in as few words as possible. I find it’s more efficient to just write what I want to say as clearly as possible and then come back and cut as necessary. And be brutal about trimming and don’t look back.
  7. Watch out for “track changes wars”. If you’re writing with other people (who doesn’t these days?), there is a natural tendency to push back against other people’s edits. This can lead to a lot of back and forth about minor points. One way to handle this is to just accept all changes in the document and read it clean. If whatever it was is a real problem, it will still stand out.
  8. Learn the “templates” for scientific writing. Most scientific writing has a particular form to it, and once you learn that, it makes for easy formulas for getting ideas out of your mind and onto the page. These templates vary from format to format. For instance, in a paper, often the results section will go something like “Our findings suggested that X. For X to be true, we reasoned that Y could be either A or B. In order to test for Y, we performed qPCR on…” Rinse and repeat. If you find it sounding repetitive, just use your thesaurus, and learn the 3-4 common variants for the given sentiment (e.g., “we reasoned”, “we hypothesized”, “we considered whether”) and cycle through them. It’s all rather prosaic, but it will get words on the page. You can channel your inner Shakespeare in revision. Same thing for grants.
  9. Regarding templates for grants, I have basically found it much easier to work from someone else’s grant. Many grants have very vague outlines for overall structure, and so ask a friend for theirs and try to stick with that. It will save you hours of wondering whether this or that structure or style can be funded. Which reminds me: be sure to ask people who, you know, actually got the grant… :)
  10. Some people really like writing out an outline of the whole thing first. I’ve never really been able to get into that myself. But a few times lately when I’ve really been up against a deadline, I tried what I can perhaps best call a “short form temporary outline”. The idea is that I have to write a paragraph, and it has to say 4 things. Write out a very quick outline just below the cursor with bullet points of these 4 things in a reasonable order. This should just take a couple minutes. Then, well, just start writing them out. If a thought comes to you while writing, just add it to the outline so you remember. It’s sort of like a to-do list for the paragraph. I’ve found this made writing faster because I didn’t feel like I had to try to remember a lot of stuff in my head, thus freeing my mind to just write. Next paragraph, next outline.
  11. [Updated, 8/15]: Forgot this really important one–don't be afraid to just rewrite something wholesale. Sometimes I'll write something that just... sucks. But at least I got it out of my system. Often, in the course of writing it, I will discover what I really meant to say. Better to just start fresh and write it again the right way. It's like renovating an old house–often it would be easier to just tear it down and start over.
Oh, and avoid passive voice. The question of how to reduce the crushing writing load we all are facing to begin with is perhaps a topic for another blog post... :)

Wednesday, July 23, 2014

The hazards of commenting code

- Gautham

It is commonly thought that good code should be thoroughly commented. In fact, this is the opposite of good practice. A coding strategy that does not allow the programmer to use comments as a crutch is better: programs should be legible on their own.

Here are the most common scenarios:


  • Bad. The comment is understandable and it precedes an un-understandable piece of code. When the maintainer of the code goes through this, they still have to do a lot of work to figure out how to change the code, or to figure out where the bug might be.
  • Better. The comment is understandable, and the line of code is also understandable. Now you are making the reader read the same thing twice. This also dilutes the code into a sea of words.
  • Best. There is no comment. Only an understandable piece of code due to good naming, good abstractions, and a solid design. Good job!
  • Terrible. The comment is understandable. The code it describes does not do what the comment says. The bug hides in here. The maintainer has to read every piece of your un-understandable code because they have realized they can't trust your comments, which they shouldn't anyway. And so all your commenting effort was for nothing. This scenario is surprisingly common. 
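To make the contrast concrete, here is a minimal sketch in Python (my own invented example, not from the post–the names, `BG`, and the fold-change threshold are all hypothetical) of the “bad” versus “best” scenarios:

```python
BG = 1.0  # hypothetical background expression level

# Bad: the comment is understandable; the code it sits on is not.
def frac(xs):
    # fraction of cells expressing above twice background
    return len([x for x in xs if x > 2 * BG]) / len(xs)

# Best: no comment needed; the names carry the meaning.
def fraction_expressing_above_background(expression_levels, background=BG, fold=2):
    threshold = fold * background
    expressing = [level for level in expression_levels if level > threshold]
    return len(expressing) / len(expression_levels)
```

Note that the “terrible” scenario is just the “bad” one after someone changes the code (say, the threshold) without touching the comment.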

When are comments acceptable?
  • Documentation. If you have a mature set of tools, you might have developed them to the point that the user can just read the manual rather than the code. This is intended for users, not maintainers, and usually takes the form of a large comment that automated documentation-generation tools can interpret.
  • Surprising/odd behavior of libraries you are using. Matlab has some weird things it does, and sometimes I like to notify the maintainer that a line of code looks the way it does for a reason (especially if the line is more complex than a naive implementation would appear to require because of subtleties of the programming language or the libraries/packages being used). The counter-argument is that rather than put in a comment, you could write a set of unit tests that explore all the edge-case behavior, and encapsulate the byzantine code into functions whose names describe the requirements the code is trying to meet.
  • When your program is best explained with pictures. Programs are strings, but sometimes they represent or manipulate essentially graphical entities. For example, a program implementing a balanced binary search tree involves tree rotations–manipulations that are very difficult to describe in prose, and similarly difficult to convey in code. Some ASCII art can be a real life saver in this kind of situation, because code is a poor representation of diagrams. So think of it this way: don't let yourself write text in comments, but it's okay to draw figures in them.
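As a sketch of that last point (a hypothetical example in Python, not from the post), here is a left rotation on a binary search tree, where an ASCII diagram in the docstring conveys what neither prose nor the code alone does well:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(x):
    """Left-rotate the subtree rooted at x; return the new subtree root.

          x                  y
         / \                / \
        a   y      -->     x   c
           / \            / \
          b   c          a   b
    """
    y = x.right
    x.right = y.left  # subtree b moves under x
    y.left = x        # x becomes y's left child
    return y
```

The three lines of code are terse and hard to visualize on their own; the figure makes the before/after shapes of the tree immediately clear.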

For more on these ideas, please just get Robert Martin's book on Clean Code.