RajLab: Why don’t bioinformaticians learn how to run gels?

Monday, November 3, 2014

Just read an interesting post from Sean Eddy about genomics. Lots of points there about sequencing and big science and other stuff that seems well above my pay grade. But the post also brings up the notion that biologists should be able to do their own data analysis, in particular scripting with Perl/Python. I’ve heard this subject debated many times before, and I’m sure I’ll hear it again. But I don't think it's the right way to think about it.

First off, I want to say that I agree with the underlying premise in theory. Yes, it would be great for everyone to have some basic skills in quantitative analysis and programming. It would certainly be useful for biologists to be able to analyze their own data, and we do all our own analysis at the command line in the lab, typically using tools graciously and freely provided by others. But for others with different skills and interests, there is finite time in the day, and maybe they don’t have the time or inclination to learn this stuff. To require biologists to learn to do things at the command line is, I think, missing a huge opportunity, and is also a bit unfair.

Consider the following: how many bioinformaticians are required to learn and perform library prep to do their work? And what if we told them to “just figure it out by Googling around”? I’m not even talking about understanding all the various technical aspects of library prep, I mean even just doing the basic protocols. Probably not very many have been required to do this. I’m sure they could do it and figure it out, but why should they, you might ask? A reasonable question. Well, then why should biologists be subjected to the pain of shell/Perl scripting just to figure out if some genes’ expression went up or down? Why does this work in only one direction? Remember, scripting is NOT SCIENCE. It is just a tool. I see no reason why everyone should have to learn about all the details of every tool in order to do their science. This even applies just within the realm of computation: how many people who use the log function know anything about how to implement it? Going up the chain, I don’t need to know why MATLAB uses Householder transformations to compute a QR factorization instead of Gram-Schmidt or even that it does so at all–I can just call it and trust that MATLAB does the best thing by default. That is the nature of a mature tool.

Indeed, it is particularly ironic to hear these calls for DIY learning from genomic informaticians, when the experimental side of that same work is amongst the most commoditized and standardized bench work in existence (funnily enough, to a point where bioinformaticians might actually be able to do it with only minimal training!). Basically, add and remove liquids to/from each other for 1-2 days, squirt it in some sequencing chip and say go, then download the data. It’s pretty close to the big green “GO” button that everyone dreams about. And it comes from years of careful thought and consideration about the needs of the USER of the tool, not of the provider. Make no mistake, the technology underlying sequencing is very complicated and sophisticated. But the reason sequencing has taken off the way it has is that USING the (hardware/wetware) tool is very simple. Just like scripting/data processing, sequencing is not science, but a tool. It is, at this point, a much easier tool to use than analysis software, in my opinion.

I of course appreciate that part of the reason that sequencing itself is so well developed is that there are huge companies with tremendous resources backing the effort. Fair enough. Perhaps it will require a commercial effort to build an easy-to-use pipeline for analysis. Maybe not. Either way, though, I think the main thing to keep in mind if you are in the tool business is that if you want people to use your tool, you will get a lot further by LISTENING (and I mean actually listening) to your users and their needs than you will by simply telling them about all the things that they ought to do and ought to know. It’s hard work, and requires a lot of thought and attention, and I certainly understand the sentiment that it may not fall within the purview of academic work. But I think it needs to happen one way or another. In the same way that simplified mobile operating systems brought computation to many more people than before, so will easy-to-use bioinformatics pipelines bring sequencing tools to many more biologists, which is a good thing.

This is most certainly not to say that biologists shouldn't be getting some more quantitative training, especially in computers. There is no doubt that learning some principles of programming and quantitative/statistical analysis can be hugely beneficial, given the way science as a whole is headed. Again, that is not the same thing as learning scripting. In fact, being able to script is completely unrelated to quantitative thinking and only moderately related to any high level concepts in programming. It is busywork, plain and simple. In my lab, we do quantitative work, and writing these scripts is still basically what I would consider a big waste of time. We can do it, but it has nothing to do with science, quantitative or otherwise, and most of us would much rather not have to bother. Even worse for science is that the requirement of scripting leaves those who can’t do it because of limited time or whatever out in the cold.

Oh, and by the way, I think Galaxy is a great step in this direction. Bravo to the developers, and thank you for your hard work!

Update, 11/4: In case you're wondering if we practice what we preach, we have two versions of our image analysis software. One is open source, very powerful, completely extensible, fancy software engineering, etc. The other one is super limited, but designed for use by scientists, not programmers. Both are freely available, but guess which one gets used by orders of magnitude more people...

10 comments:

@drnjfawcett, November 4, 2014 at 2:05 PM
Agree it's as unrealistic to expect biologists to be able to script complex analyses, as it is to expect bioinformaticians to conduct perfect library preps (still bring me out in cold sweats..)
Coming as a medic, who has had to learn both lab and basic bioinformatics... I think it's really important in the discussion to differentiate between 'scripting as the equivalent of learning to use windows and interact with data/use Excel to analyse' and 'scripting as in designing and writing an analysis tool'. The former is, I think, achievable and realistic to expect. I agree that to expect the beginner to learn the latter is nuts.
We've set up an in-house Unix/Informatics for beginners course in our group- and massively benefited from having a (very very patient) person to guide us through - reduces learning time by about a hundredfold from blind googling...

Unknown, November 6, 2014 at 9:21 PM
Whenever this debate comes up, I wonder why we don't train biologists more like physicists. In physics, even experimentalists are trained in math and programming way beyond what biologists get. They don't use all of the fancy math that theorists use, but they know how to quantitatively analyze their data.

It seems like in physics this is a solved problem. The comp bio/ experimental bio split should be a lot more like the split in physics.
Jonathan Badger, November 10, 2014 at 2:47 PM
A lot more bioinformaticians than you realize can run gels just fine if we want to -- many of us were originally *trained* as bench biologists, Eddy included. Not to mention that these days bench work is becoming easier and easier as pre-poured gels are available (in my day we had to pour them ourselves), plus there are all these new lab robots that automate so much, like picking colonies.
Anonymous, July 5, 2015 at 5:37 PM
This post really resonates with me, especially as someone with a purely experimental background trying to slog through this DIY programming learning curve in my second year of grad school. "Just googling" takes up a lot of time, and it can be really overwhelming (especially when you don't get many continuous chunks of time away from the bench). It would help if these tools were easier to use/modify/develop.