Sunday, November 23, 2014

The most annoying words in scientific discourse

Most scientific writing and discourse is really bad. Like, REALLY bad. How can we make it better? There are some obvious simple rules, like avoiding passive voice, avoiding acronyms, and avoiding jargon.

I wanted to add another few items to the list, this time in the form of words that typically signify weak writing (and sometimes weak thinking). Mostly, these are either ambiguous, overused, or pointless meta-content just used to mask a lack of real content. Here they are, along with my reasons for disliking them:

Novel. Ugh, I absolutely hate this word. It’s just so overused in scientific discourse, and it’s taken on this subtext relating to how interesting a piece of work is. Easily avoided. Like “Our analysis revealed novel transcript variants.” Just say “new transcript variants”.

Insight. One of the best examples of contentless meta-content. If any abstract says the word insight, nine times out of ten it’s to hide a complete lack of insight. For example: “Our RNA-seq analysis led to many novel insights.” Wait, so there are insights? If so, what are these insights? If those insights were so insightful, I’m pretty sure someone would actually spell them out. More than likely, we’re talking about “novel transcript variants” here.

Landscape. Example of a super imprecise word. What does this mean anyway? Do you mean an arrangement of shrubbery? Or do you mean genome-wide? In which case, say genome-wide. Usually, using the word landscape is an attempt to evoke some images like these:


Now exactly what do these images mean? Speaking of which…

Epigenetic. Used as a placeholder for “I have no idea what’s going on here, but it’s probably not genetic”. Or even just “I have no idea what’s going on here whatsoever”. Or chromatin modifications. Or all of this at once. Which is too bad, because it actually is a useful word with an interesting meaning.

Paradigm. Need I say more?

Robust. Use of the word robust is robust to perturbations in the actual intended meaning upon invoking robustness. :)

Impact. As in “impact factor”. The thing that bugs me about this word is that its broad current usage really derives from the Thomson Reuters calculation of Impact Factor for journal “importance”. People now use it as a surrogate for importance, but it’s always sort of filtered through the lens of impact factor, as though impact factor is the measure of whether a piece of work is important. So twisted has our discourse become that I’ve even heard the word impactful thrown about, like "that work was impactful". It's a word, but a weird one. If something is influential, then say influential. If it’s important, then say important. If an asteroid hits the moon, that’s impact.

These words are everywhere in science, providing muddied and contentless messages wherever they are found. For instance, I’m sure you’ve seen some variant of this talk title before: “Novel insights into the epigenetic landscape: changing the paradigm of gene regulation.”

To which I would say: “Wow, that sounds impactful.”

[Updated to include Paradigm, forgot that one.]
[Updated 12/13: forgot Robust, how could I?]

Saturday, November 22, 2014

Verdict on a (mostly) Bacn-free week of e-mail: totally awesome!

It’s been one week since I tabulated my e-mail and decided to run a few experiments based on the results. Quick recap: I found that I got a lot of Bacn (solicited but often unimportant e-mail, like tables of contents and seminar announcements), and this was contributing to a sense of being overwhelmed by e-mail. So I resolved to do the following:
  1. Filter out primary conveyors of Bacn to a Bacn folder that I would skim through rapidly just a few times a day.
  2. Deal decisively with the e-mail when I read it–either reply or get off the pot, so to speak.
Quick summary is that this experiment has been a great success! I feel much more efficient, less overwhelmed, and less likely to miss important things. Highly recommended.

Here are a few more details. So I have two e-mail addresses. For the most part, one of them gets all my work e-mail, and the other one is mostly personal, but has a lot of Bacn and spam in it. Before, I had been combining both into my inbox. So that was easy: just check my work e-mail and separate out the personal one to check on an as-needed basis. Of course, I was still getting a lot of Bacn on my work e-mail, so I then made filters to automatically file Bacn into a separate folder. I initially thought this was going to be super simple. Turns out it was a bit more work than I thought: there are MANY different Bacn providers at Penn. So it took a while to set up a filter for each of them. But it worked: almost all the Bacn went to a specific folder.
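The logic of a sender-based filter is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not how Gmail filters are actually configured; the addresses and the function name `folder_for` are hypothetical, since the actual senders aren't listed here.

```python
# Hypothetical sender addresses; the real list of Bacn senders is not given in the post.
BACN_SENDERS = {
    "seminar-announce@example.edu",
    "toc-alerts@journal.example.com",
}


def folder_for(sender: str) -> str:
    """Return the folder a message should be filed into, based only on its sender."""
    return "Bacn" if sender.strip().lower() in BACN_SENDERS else "Inbox"


print(folder_for("Seminar-Announce@example.edu"))
print(folder_for("collaborator@example.org"))
```

Each new Bacn provider is then just one more entry in the set, which matches the experience of having to set up one filter per sender.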

The results were glorious! I found I spent much less time looking through all these unimportant e-mails during the day, and then I could batch process them much more efficiently during a period of downtime. There is little better than selecting a huge block of e-mail and deleting them all at once! A few times, I would get a real e-mail from a Bacner that I needed to respond to, but it turns out that they were never urgent nor terribly important, and I could deal with them during this downtime period (which is probably when I should be dealing with them anyway).

I didn’t anticipate how much this e-mail filtering would engender peace of mind. I guess I was expending more mental energy than I thought processing all these different e-mails in a single stream. The steady stream of notifications that we all know we should ignore but don’t thinned out considerably, and I felt like my focus was better. I didn’t quantify any actual productivity gains (although I suspect there were some), but I can definitely say that the perceived quality of e-mail life went up considerably. I definitely felt like I was much more in control of what I was doing. Basically, it made it much easier to process e-mail the way I always knew I should in theory but rarely actually did in practice.

I think this filtering also really helped with the other aspect of my experiment, which was to be decisive (actually something I have been working on in general). The idea here was to read each e-mail only once before doing something with it, which means either marking as read or replying. Or at least getting as close to this ideal as possible. Since all the e-mails in front of me now have a similar status, I found it a bit easier to do this, because I’m not changing “modes” from one e-mail to the next.

Decisiveness is hard, and something I’ve struggled with for a long time, both in the context of e-mail and otherwise. And being deliberate is not necessarily a bad thing. But I think most of us tend to undervalue our time, and I feel like being decisive is making a tradeoff between making the best possible decision slowly and making a good enough decision quickly. Or, as is more often the case, between making the best possible decision slowly and making the best possible decision quickly–indeed, I feel like much of the time, the “decision making process” is really more like a slow process of rationalizing a decision you’ve already essentially made. So I’m trying to just go with my instincts and then think, well, if I made a mistake, so be it. The key thing is to ask myself “Well, am I going to get any new information that might change my decision? If not, then go for it.” That actually takes care of a lot of situations, e-mail or otherwise.

UPDATE: Forgot to mention that I got two e-mails this past week from close collaborators with the subject line "Not Bacn". :)

Sunday, November 16, 2014

A week in my e-mail life

[Follow up post here: Verdict on a (mostly) Bacn-free week of e-mail: totally awesome!]

[Note: This is a longish post, so here’s an “abstract” that gets across the main points: Academics get a lot of e-mail. I decided to catalog my e-mails for the week to see if I could identify any patterns. I found that a large amount of my e-mail was “Bacn”, meaning e-mails that I am in some way supposed to get, but are typically not very important, like seminar announcements, etc. A lot of the more research-oriented e-mail was related to logistics, like shipping, etc. As for what to do about it, I think the number one thing is to pre-filter a bunch of the Bacn, which typically just comes from a relatively limited number of easily identified people and only very very rarely requires any sort of immediate action. This will help make it easier to process it in batch mode, which is another area where I could really improve how I handle e-mail, rather than replying in a more "real time" fashion. And I will try to be more decisive in handling e-mail. An update on how all this worked next week.]

As is the case for most academics these days, I get a lot of e-mail. And as is the case for most academics, I love to complain about how much time it takes up. I was thinking about this recently when I came across the line “E-mail is everyone else’s to do list for you.” Which I thought was an interesting way of thinking about it. I mean, just because someone has my e-mail address doesn’t necessarily give them the right to command my attention, right? But then I thought a bit more, and I wondered whether my attention really is being dragged unnecessarily in unwanted directions, or whether it is primarily spent on things that I want to pay attention to. Are there ways that I can make myself more efficient?

So I decided to catalog all the e-mail I got in the last week. First, a couple notes on methodology. I basically just looked through my e-mail for the past week and tried not to delete anything (which I normally don’t do, except for spam). Going through, I categorized the e-mail (more on that later), kept track of whether I replied or forwarded the e-mail, and how long it took me to reply. I also kept track of whether the e-mail was initiated by myself or came from someone else and whether the e-mail was directed to me specifically or whether it was just a general broadcast (some judgement calls in this).
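For anyone who wants to try the same exercise, a tabulation like this can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the way the numbers in this post were produced: the `Email` record, its field names, and the sample rows are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class Email:
    category: str                        # hypothetical label, e.g. "Bacn" or "Research"
    received: datetime
    replied: Optional[datetime] = None   # None if no reply was sent
    broadcast: bool = False              # True if sent to a list rather than to me alone


def summarize(emails: List[Email]) -> Tuple[Counter, List[float]]:
    """Count e-mails per category and collect reply delays in minutes."""
    counts = Counter(e.category for e in emails)
    reply_minutes = [
        (e.replied - e.received).total_seconds() / 60
        for e in emails
        if e.replied is not None
    ]
    return counts, reply_minutes


# Made-up sample records, not real data.
emails = [
    Email("Bacn", datetime(2014, 11, 10, 9, 0), broadcast=True),
    Email("Bacn", datetime(2014, 11, 10, 9, 5), broadcast=True),
    Email("Research", datetime(2014, 11, 10, 10, 0), datetime(2014, 11, 10, 10, 4)),
    Email("Scheduling", datetime(2014, 11, 10, 11, 0), datetime(2014, 11, 10, 13, 0)),
]

counts, reply_minutes = summarize(emails)
print(counts.most_common())     # categories, most frequent first
print(median(reply_minutes))    # typical reply delay in minutes
```

The median is used rather than the mean because reply times tend to have a long tail, so a handful of slow replies would otherwise dominate the summary.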

Here's what I found:

Good news is that I don't instigate a lot of e-mail, which makes me feel better about myself–in fact, so little that I didn’t really think it was worth doing a similar analysis on my sent e-mail. But I did reply to a relatively large number of e-mails. Now that I think about it, I would guess this is the case for most academics. Most of our e-mail misery comes from others randomly bugging us, and I think it’s usually just a handful of others.

As for speed of reply, I’m generally quite fast, but there’s a long tail:
Zooming in on the short time-scale:

A pretty substantial number of replies actually happened within minutes, sort of like texting or something, followed by a tail of longer times to reply. I actually expected this to be a bit more bimodal, but it's pretty unimodal, just with a long tail. I did notice that I have chunks of reply e-mail at the beginning and end of the day, which is good–my intention lately has definitely been to try and do as much batch processing as possible. I think I could be more disciplined about this, though.

Of course, the key piece of data is what different sorts of e-mail I get. Here’s how I broke it down:
  1. Spam
    1. Spam spam. Like, Nigerian Bankers who have a great deal on Viagra for you. 
    2. Science spam. This is various marketing for HPLC equipment or strange journals or whatever. I get a lot of this, presumably because various vendors have sold my e-mail to direct marketers.
  2. Bacn. Bacn is a very interesting category. It is like spam, but a level up: it’s something where there is some sort of relationship there, including perhaps direct solicitation of the e-mail. Here is how I broke that down:
    1. Personal. e.g. NYtimes.com table of contents.
    2. TOC. Tables of contents of various journals.
    3. Science. ResearchGate, Nature Publishing Group
    4. Penn Bacn. Seminar announcements, thesis defenses, visitors, latest fund-raising drive.
  3. Scheduling. This includes setting up a meeting or lunch or whatever with someone, thesis committee meeting times, etc.
    1. Scheduling Bacn. These are scheduling e-mails in which you’re just sort of along for the ride. You don’t have to do anything, but the e-mail is there, perhaps asking you if you want to meet with so and so.
  4. Teaching. Students asking for help or whatever.
  5. Evaluations and Letters. Someone asking for you to evaluate a person or paper or whatever in some way, shape or form. An important part of our lives. I’m of course happy to do this for people who have been in my life in the lab. Less exciting is...
    1. Evaluations and Letters Bacn. This is any sort of evaluation of someone or something from outside. This includes, but is not limited to, reviewing papers.
  6. Research. This is what we’re supposed to be doing, right? Well, that all depends…
    1. Logistics. This is all stuff about orders, handling of manuscripts, lab organization, etc.
    2. Collaborations. This is managing various collaborations with other groups. This does not include close collaborators with whom we are doing real science. It’s more just like people with whom we’re doing a one-off experiment. Often, there is overlap with the Logistics category.
    3. Research Bacn: Seems like a weird category, right? These are what I would consider relatively unsolicited e-mails that are random and tangential to your research effort, but are science related. Like, someone sends you a link to a paper they wrote. Or someone had a thought after meeting with you. Or something. This is not quite Bacn in the sense that you may not necessarily be able to ignore all of it, but it’s not quite important enough not to be Bacn.
    4. Actual Research: This is, you know, actual research. Also a proxy for what I consider the most important to me. Mostly conversations with people whom we are working with closely about science. This can include making decisions about scientific goings-on in the lab, or thoughts on an experiment, or how to interpret something–basically, the fun part of it.

So what’s the breakdown? Here are some pie-charts (I’ll get to strategies I’m thinking about implementing later).








Let’s start with spam. Turns out I don’t get that much of it. It certainly doesn’t take that long to get rid of them. In fact, I have to say that I sometimes rather enjoy them for their humorous qualities. Here are four of my favorite examples:

Message 1 (translated from Russian):
Subject: WE WILL RESTORE YOUR NEGLECTED BOOKKEEPING
You are the boss and your accountant has suddenly quit!
Were you betrayed? Were you set up? Tax inspection tomorrow?

ACCOUNTING MAYHEM!!!

Message 2 (translated from Russian):

Subject: The best New Year's gift: safety for you and your loved ones!

Message 3:
Subject: Your  Account Was Banned
This is a joke :)

Than trying to work mounted on clumsy, long webfeet by the
ecriture artiste which the french writers that hears. Similarly,
employing the eye, it is a moment without devoting his heart
upon mahadeva. Towards the abode of bhishma, casting aside

their.

Message 4:
Subject: Mandy - 100% results.
Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lolGy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.Gy, lol.

I think the Gmail spam filters do a pretty good job of getting rid of most of this cruft.

Bacn. This was perhaps the biggest surprise. Most of what I get is Bacn. And it’s super annoying to sort through, due primarily to the very nature of Bacn, which is something that you might conceivably be interested in. And one of the worst offenders is Penn! The amount of Penn Bacn I get is crazy. It’s primarily seminar announcements (and reannouncements (and re-reannouncements)) and various other random stuff that I may in theory want to know about, but I typically won’t. And it typically comes from a few prime Bacn distributors. The only problem is that I will sometimes get something important from these Bacners, and so I can’t just automatically filter them out into the trash. Hmm. These typically come mostly in the morning, which is when I try and get real work done.

Funny note about Bacn: I made some Bacn myself! Had to send out an e-mail to the graduate group about something or other. I feel sort of bad about it now. Even funnier, I even managed to send research-related Bacn to myself in the form of an e-mail to myself of a paper I thought I should read. Of course, I paid it about as much attention as all my other research-related Bacn… :)

Scheduling. Surprisingly large amount of e-mail just to schedule appointments. This was actually a relatively tame week in that regard, so I was sort of surprised how much e-mail circulated about that.

Research. Large number of logistic e-mails, often about shipping, etc. The shipping and ordering stuff doesn’t take up too much time, honestly, perhaps because we have a relatively small operation. It was interesting to see how much Research “collaborations” took up. To me, this is partly a matter of how much you invest in your scientific community, sort of like being a good citizen. That said, it is clear that this can suck your brain quite easily. Research Bacn is I think something that I get a lot more of than I imagine most people getting, for various reasons. Surprisingly (unsurprisingly?) little time spent on actual Research Research e-mails. Which I actually regard overall as a good thing: for most research discussions, I talk with the people in my lab directly. I think that is a far more efficient way to get things done, generally, and avoids those super long e-mails that take hours to craft.

So what to do with this data? I think I came to a few primary conclusions:
  1. I need to organize my e-mail so that the Bacn is out of sight most of the time. I try my best to ignore Bacn most of the time, but in practice, it takes a lot of discipline to avoid looking at all those e-mails during the day, especially when there are sometimes other interesting e-mails interspersed in my inbox as well that I may very well want to deal with. To do this, I’ve implemented filters on Gmail to just send most of these to a specific folder that I will check once a day or so, hopefully in a really fast batch mode. There is some slight chance that I might miss a timely e-mail, but whatever. Looking at it now, perhaps this is obvious, but somehow I just didn't think of it before.
  2. I get a lot of research-related logistical e-mails, about ordering and such, that I should probably be delegating. These are not quite Bacn, because I (or someone in the lab) do need to give some input or really read them, sometimes in a timely manner. But just as often not. I also noticed I got a few more of these this week than usual.
  3. Teaching: I didn’t get a lot of teaching e-mail this week, which is nice, but somewhat unusual. I actually have a specific teaching gmail account that I ask students to send to–this organization is very useful, and it allows me to make others do some of the organizing for me. Of course, you have to actually tell your students about it, which I of course forgot to do this term in my grad class. But I will definitely remember next term in my big required undergrad class. I will also be sure to have a policy that I only respond to student e-mails at one particular time of the week, no exceptions.
  4. Perhaps the most important lesson is to BE DECISIVE. Someone (and I’m so sorry, I forget who, and the comments got deleted) left an awesome comment on the blog somewhere about a simple rule, which is to read each e-mail only once. I think that’s absolutely right. I definitely found myself reading an e-mail and then mulling it over and then mulling it over again. I have to not do that. If it requires thought, I should just make a (prioritized) to-do list item for it and then mark it as read and be done with it. Otherwise, I’m just cycling over and over again.
Anyway, those are some thoughts. I will try and implement this this week and post again once the results of this reorganization are in.

Sunday, November 9, 2014

My favorite quote about LaTeX

Argh, just finished struggling through submitting a LaTeX document to a journal. And I think I still screwed up and will have to do some more fussing. My only hope (and a fading one at that) is that things will not devolve to the point where I just have to copy the whole damn thing into Google Docs, where you can actually spend your time, you know, doing real work.

So I just Googled around and found the following page, which has my new favorite quote about LaTeX:
Latex ("LaTeX" if you're pretentious as hell) is the biggest piece of shit in the history of both pieces and shit.
Yes.

(And yes, before you say it, I know what you are going to say.)

Saturday, November 8, 2014

“Edge of Tomorrow” and the case for better education

I just watched Edge of Tomorrow, a recent action movie with Tom Cruise, and it got me thinking about education. In case you haven’t seen it, it’s basically an action movie version of Groundhog Day, where Tom Cruise lives the same day over and over until he saves the world from alien invaders. Umm, well, that last sentence sounded pretty stupid, but I actually thought it was a pretty good movie.

Anyway, in the movie, Tom Cruise (Sgt. Cage) enlists the help of Emily Blunt (Rita Vrataski, super badass alien killer), and every time he relives the same day, he makes it a little further towards killing the aliens with her. He remembers everything that happened, but she remembers nothing. That means that he has to teach her everything that he has collectively learned every day, which is of course limited by the capacity that she has to absorb all that information. It occurs to me that our own lives are a lot like Rita’s day. Each of us is born knowing nothing, and we have exactly one lifetime to learn the collective knowledge of the world (and hopefully add to it) before we die. As our civilization’s knowledge burgeons, we have to get better at cramming this stuff into our kids' brains, because they still just have one lifetime to learn an ever increasing sum of knowledge and then to use it. Somehow, thinking about it this way makes me think that it’s really sad that we haven’t paid as much attention to how we educate as we should. I mean, I guess I already knew that, but it just seems to take on a bit more urgency for me when I think about it this way.

Hmm. I can’t believe I just made an analogy between Tom Cruise and the collective knowledge of the world. I need a drink.

Friday, November 7, 2014

My water heater is 100% efficient (in the winter)

Just had a thought while taking a shower the other day. These days, there's lots of effort to rate appliances by their efficiency. But it occurs to me that inefficiency leads to heat, and if you are heating your home, then you are basically using all that "wasted" energy. So even if some of the gas used for our water heater doesn't actually heat the water, as long as it's in the basement and the heat travels upward, that heat is not going to waste. So the effective efficiency of the appliance is actually higher than expected. Conversely, in summer, if you use the air conditioner, the opposite is true. I guess the overall efficiency would depend on your mix of heating and cooling.
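To put rough numbers on this, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the rated efficiency of 0.6, the fraction of waste heat that actually reaches living space (`recovered`), and the air conditioner's coefficient of performance (`cop`). The summer case also lumps gas and electrical energy into the same units, which is a simplification.

```python
def winter_efficiency(eta: float, recovered: float = 1.0) -> float:
    """Effective efficiency when some of the 'waste' heat usefully warms the house.

    eta: rated efficiency (fraction of input energy that heats the water)
    recovered: assumed fraction of the waste heat that ends up heating living
               space (1.0 is the idealized case; heat lost up a flue or
               through basement walls makes it lower)
    """
    return eta + (1.0 - eta) * recovered


def summer_efficiency(eta: float, cop: float = 3.0) -> float:
    """Effective efficiency when an air conditioner has to remove the waste heat.

    Assumes all waste heat stays in the house and that removing it costs
    (waste heat) / cop in extra energy. cop is an assumed value.
    """
    extra_input = (1.0 - eta) / cop
    return eta / (1.0 + extra_input)


# 0.6 is a made-up rated efficiency, not a measured figure for any appliance.
print(winter_efficiency(0.6))                  # idealized: all waste heat is useful
print(winter_efficiency(0.6, recovered=0.5))   # only half the waste heat is useful
print(summer_efficiency(0.6))                  # waste heat costs extra to remove
```

Under these assumptions the winter number sits at or above the rated efficiency and the summer number sits below it, so the year-round figure depends on the mix of heating and cooling days.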

I was also thinking about this a while ago when I installed a bunch of LED lightbulbs. Although they use much less energy, they are producing much less heat to warm up the house. I mentioned this to Gautham, and he pointed out that using electricity to heat your house may be considerably less efficient than, say, natural gas, and so that means it's not 100% efficient, relatively speaking. Still, it's better than what one would naively expect.

Of course, the best thing about LED lightbulbs is not so much the electricity or cost savings (which are pretty modest, frankly), but the fact that they don't burn out. If you have a bunch of 50W halogen spotlights, you know what I mean. By the way, just got a "TorchStar UL-listed 110V 5W GU10 LED Bulb - 2700K Warm White LED Spotlight - 320 Lumen 36 Degree Beam Angle GU10 Base for Home" from Amazon, and it looks great (better than the other one I got from Amazon for sure).

Thursday, November 6, 2014

Why are papers important for getting faculty positions?

Loved Lenny's post about how a high profile paper out of your postdoc is not required for many positions in academia. The list he has is pretty good proof of that fact, and I know firsthand from my own experience–I think my "big postdoc paper" was just submitted by the time I had my last interview.

I think it's important to keep in mind, though, that the existence of such examples is not a proof that there are no causal connections between the two. I think a lot of this is field dependent as well as institution dependent. For instance, I definitely feel like my job search might have been easier with a published paper, especially in biology/medical departments. And I have definitely heard of places, for example in other countries, in which applicants have been explicitly told that the job is theirs if and only if their postdoc paper is accepted. And I have heard this multiple times, so it was not a one-off.

Why? If the search committee understands the work and the researcher and believes in them both, then why does the existence of an accepted high profile paper matter so much in and of itself? A big part of the answer is that visibility matters.

One thing I realized after starting my faculty job was that starting a lab is a hard business, and part of that business is getting people interested in your research. There are tons of people out there doing science. Why should someone want to join your lab? Why should anyone care about your work? Why should anyone give you funding to do this work? Why should you be the one to succeed when everyone else is out there doing good science as well? Having a high profile paper when you start is undeniably a part of the answer to these questions. And it’s also a simple metric of success that is readily interpreted by people across disciplines.

Departments generally want the people they hire to succeed. There are many reasons why it's a lot easier to succeed if you have a fancy paper as you are starting your lab. It helps in recruiting students and postdocs, and in getting grants and getting invited to talks. Same thing goes for coming from the lab of a big-name PI. The big-name PI will be out talking about your work at venues and forums that you can only dream about as a junior PI. These are all different pieces of the puzzle, and nice papers are for sure an important piece of that puzzle, for better or for worse. And the fact is that there is at least some correlation between where you publish (especially averaging over time) and the quality and importance of your work. Not a perfect one for various reasons, and I hate the current publishing system, but it is disingenuous to pretend that this is not the case.

I'm sure many people out there are saying "It should all be about the science, not where it's published or who you worked with or all that other stuff." Sure, sounds nice in theory, but in practice, it's harder than people think. Imagine you are in the market for a washing machine. You go to the store and there are hundreds of washing machines to choose from. Some come from name brands, some are completely unknown. Some of the name-brand ones are rated in Consumer Reports by a handful of "washing machine experts", and some are rated much higher than others. Which one would you buy? Now imagine you are in the market for a colleague for at least the next 6-7 years, hopefully the next 30-40+ years, and you will be investing millions of dollars in this person and be interacting with them regularly on a professional basis. Their success or failure will reflect directly on your department. You better believe people make a pretty considered decision here. And yes, visibility matters. Personal connections matter. Papers matter. Your personality matters. Your science matters. EVERYTHING matters. Seriously, think about it: how could it possibly be otherwise?

Monday, November 3, 2014

Why don’t bioinformaticians learn how to run gels?

Just read an interesting post from Sean Eddy about genomics. Lots of points there about sequencing and big science and other stuff that seems well above my pay grade. But the post also brings up the notion that biologists should be able to do their own data analysis, in particular scripting with Perl/Python. I’ve heard this subject debated many times before, and I’m sure I’ll hear it again. But I don't think it's the right way to think about it.

First off, I want to say that I agree with the underlying premise in theory. Yes, it would be great for everyone to have some basic skills in quantitative analysis and programming. It would certainly be useful for biologists to be able to analyze their own data, and we do all our own analysis at the command line in the lab, typically using tools graciously and freely provided by others. For others with different skills and interests, there is finite time in the day, and maybe they don’t have the time and inclination to learn this stuff. To require biologists to learn to do things at the command line is I think missing a huge opportunity, and is also a bit unfair.

Consider the following: how many bioinformaticians are required to learn and perform library prep to do their work? And what if we told them to “just figure it out by Googling around”? I’m not even talking about understanding all the various technical aspects of library prep, I mean even just doing the basic protocols. Probably not very many have been required to do this. I’m sure they could do it and figure it out, but why should they, you might ask? A reasonable question. Well, then why should biologists be subjected to the pain of shell/Perl scripting just to figure out if some genes’ expression went up or down? Why does this work in only one direction? Remember, scripting is NOT SCIENCE. It is just a tool. I see no reason why everyone should have to learn about all the details of every tool in order to do their science. This even applies just within the realm of computation: how many people who use the log function know anything about how to implement it? Going up the chain, I don’t need to know why MATLAB uses Householder transformations to compute a QR factorization instead of Gram-Schmidt or even that it does so at all–I can just call it and trust that MATLAB does the best thing by default. That is the nature of a mature tool.
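To make the log example concrete, here is roughly the kind of thing that hides behind a single call. This is only a sketch of one textbook approach (split off the power of two, then sum a fast-converging series), written in Python; the name `my_log` and the number of terms are arbitrary choices, and real math libraries use more carefully tuned range reduction and polynomial approximations.

```python
import math

LN2 = 0.6931471805599453  # natural log of 2, hard-coded


def my_log(x: float, terms: int = 20) -> float:
    """Natural logarithm via range reduction plus a series for atanh."""
    if x <= 0:
        raise ValueError("log is only defined for positive x")
    # Write x = m * 2**e with 0.5 <= m < 1, so ln(x) = e*ln(2) + ln(m).
    m, e = math.frexp(x)
    # ln(m) = 2 * atanh(z) with z = (m - 1) / (m + 1); here |z| <= 1/3,
    # so the series z + z**3/3 + z**5/5 + ... converges quickly.
    z = (m - 1.0) / (m + 1.0)
    z2 = z * z
    total = 0.0
    power = z
    for k in range(terms):
        total += power / (2 * k + 1)
        power *= z2
    return e * LN2 + 2.0 * total


for value in (0.001, 0.5, 2.718, 10.0, 1e6):
    # Compare against the library implementation.
    print(value, my_log(value), math.log(value))
```

None of this needs to be known in order to use log correctly, which is the point.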

Indeed, it is particularly ironic to hear these calls for DIY learning from genomic informaticians, when the experimental side of that same work is amongst the most commoditized and standardized bench work in existence (funnily enough, to a point where bioinformaticians might actually be able to do it with only minimal training!). Basically, add and remove liquids to/from each other for 1-2 days, squirt it in some sequencing chip and say go, then download the data. It’s pretty close to the big green “GO” button that everyone dreams about. And it comes from years of careful thought and consideration about the needs of the USER of the tool, not of the provider. Make no mistake, the technology underlying sequencing is very complicated and sophisticated. But the reason sequencing has taken off the way it has is because USING the (hardware/wetware) tool is very simple. Just like scripting/data processing, sequencing is not science, but a tool. It is, at this point, a much easier to use one than analysis software, in my opinion.

I of course appreciate that part of the reason that sequencing itself is so well developed is because there are huge companies with tremendous resources backing the effort. Fair enough. Perhaps it will require a commercial effort to build an easy to use pipeline for analysis. Maybe not. Either way, though, I think the main thing to keep in mind if you are in the tool business is that if you want people to use your tool, you will get a lot further by LISTENING (and I mean actually listening) to your users and their needs than you will by simply telling them about all the things that they ought to do and ought to know. It’s hard work, and requires a lot of thought and attention, and I certainly understand the sentiment that it may not fall within the purview of academic work. But I think it needs to happen one way or another. In the same way that simplified mobile operating systems brought computation to many more people than before, so will easy to use bioinformatics pipelines bring sequencing tools to many more biologists, which is a good thing.

This is most certainly not to say that biologists shouldn't be getting some more quantitative training, especially in computers. There is no doubt that learning some principles of programming and quantitative/statistical analysis can be hugely beneficial, given the way science as a whole is headed. Again, that is not the same thing as learning scripting. In fact, being able to script is completely unrelated to quantitative thinking and only moderately related to any high level concepts in programming. It is busywork, plain and simple. In my lab, we do quantitative work, and writing these scripts is still basically what I would consider a big waste of time. We can do it, but it has nothing to do with science, quantitative or otherwise, and most of us would much rather not have to bother. Even worse for science is that the requirement of scripting leaves those who can’t do it because of limited time or whatever out in the cold.

Oh, and by the way, I think Galaxy is a great step in this direction. Bravo to the developers, and thank you for your hard work!

Update, 11/4: In case you're wondering if we practice what we preach, we have two versions of our image analysis software. One is open source, very powerful, completely extensible, fancy software engineering, etc. The other one is super limited, but designed for use by scientists, not programmers. Both are freely available, but guess which one gets used by orders of magnitude more people...