Wednesday, April 16, 2014

Machine learning, take 2

As mentioned earlier, one of my favorite Gautham quotes is "Would Newton have discovered gravitation by machine learning?" I think the point is a solid one: a bunch of data plus statistics is not science.

At least not yet. Technically, Newton's brain was a machine, and it came up with gravitation. So it is formally possible for a machine to come up with a theory. And I don't think this argument rests on a mere technicality. I was chatting with Gautham yesterday about what a theory is, and doesn't it start with observing a pattern of some kind? Newton had access to centuries (millennia?) of star charts; people had misinterpreted them into epicycles, but the data were there for him. In response to my previous post on statistics, Shankar Mukherji mentioned the work of Hod Lipson, in which they are able to deduce physical laws directly from data. Very cool. It seems that progress towards this goal is already underway. My guess is that as we make more progress on machine learning (my completely uninformed bet is on neural network approaches), computers will start to make more and more seemingly incredible inferences about the world. My other guess is that this will happen a lot sooner than we think.

In the meantime, though, I still think we are pretty far from having Newton in silico, and I think that Gautham's point about real learning vs. (the current state of) machine learning is still a valid one. Until this future of intelligent machines arrives, I think most fields of science will still require a lot more thinking to make sense of the data, and simple classifiers may not yield what we consider scientific insight.
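As an aside, here is a minimal sketch of what "deducing a law from data" looks like in its most stripped-down form. This is a toy I made up for illustration (plain Python, nothing to do with Lipson's actual algorithms): given the planetary orbits, an ordinary least-squares fit in log space recovers the exponent in Kepler's third law. The hard part, the part that still looks a lot more like Newton than like machine learning, is coming up with the form of the law and the concepts behind it in the first place.

# Toy illustration only: "discover" the exponent in Kepler's third law
# (T^2 proportional to a^3, i.e. T = a^1.5) by least squares.
import numpy as np

# Semi-major axis (AU) and orbital period (years) for six classical planets
semi_major_axis = np.array([0.39, 0.72, 1.00, 1.52, 5.20, 9.58])
orbital_period = np.array([0.24, 0.62, 1.00, 1.88, 11.86, 29.46])

# Fit log(T) = p * log(a) + c; Kepler's third law says p should be 1.5
p, c = np.polyfit(np.log(semi_major_axis), np.log(orbital_period), 1)
print(f"fitted exponent: {p:.3f} (Kepler's third law predicts 1.5)")

Of course, the fit only "finds" the law because I told it to look for a power law, which is exactly the point.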

Monday, April 14, 2014

Papers are a lot of work, and some of it is even worth the effort

I often say that the current model for publishing is a complete waste of time, and I still think that's true of many parts of the publishing process, like dealing with reviews and so on. It's hard for young faculty and even harder for trainees, for whom so much rides on the seemingly arbitrary whims of reviewers and editors. Wouldn't it just be better to post on a blog, I often wonder?

I think deep down I know the answer is no. Not that publishing in a particular journal is really important. But there is something to putting together a well-constructed, high-quality paper that makes it a worthy use of time. Often it feels like finishing a paper is just a bunch of i-dotting and t-crossing. Yet I've often found that it's in those final stages that we make the most crucial insights. Hedia's lincRNA paper is a good example: it was only towards the end, when we were writing it up, that we figured out what was really going on with the siRNA vs. antisense oligonucleotide (ASO) treatment. The details aren't so important, but the point is that this was in some ways the most important finding of the paper, and it was lurking within our data almost until the very end.

I've found the last few weeks before submission to be a stressful period: you really want to get the paper out the door, and at the same time you feel like you're putting a lot on the line and want to get it right. It's exciting but scary to put something out there. And it's especially scary to look at your data again, here at the end of the road, and wonder what it all means after years of hard work. But I feel like this mental incubation period is a necessary part of doing good science, and it is where many new ideas are born.

Thursday, April 10, 2014

Why is everyone piling on that poor STAP stem cell woman?

I just read a little news feature in Nature today that made me very sad. For those of you who don't know, it's about the researcher from Japan who came up with the STAP method (stimulus-triggered acquisition of pluripotency), in which squeezing cells or putting them in acid can make them into pluripotent stem cells. This is a huge discovery, because it means you can make stem cells without the usual manipulations (such as genetic ones) required to convert cells into stem cells.

Nature published these studies to huge fanfare a little while ago, but then, within a month or so, many people started to publicly question whether the results were true, including even one of the coauthors (one of those "victory has a thousand fathers, defeat is an orphan" situations). People started saying that nobody could replicate the findings, and also found some errors in the manuscript, including a plagiarized materials and methods section, an old image of a teratoma, and some gel-lane mix-ups. Her institute started an investigation, and she's had to hire a lawyer, defend herself to the press, and (from this little Nature article) she now appears to be in the hospital.

This whole situation strikes me as having gotten completely out of hand. Seriously, people, it's just a paper. First, to the method itself: it seems weird to me that people are criticizing this method so soon after publication. Honestly, if I had a nickel for every time someone couldn't do RNA FISH and said our method doesn't work, I'd have, well, a lot of nickels. And that's something so easy to do that undergrads routinely do it on their first day in lab. Something tells me that this method must be fairly tricky, otherwise someone would probably have gotten it to work by now. So let's give her the benefit of the doubt, at least for a couple of years.

All the investigations into the little errors and discrepancies in her paper strike me as silly and vindictive. Would all of your papers survive such deep scrutiny? Yes, her paper is very important, significantly more so than anything I've ever done, but remember that she's still just a scientist working in a lab, like you and me. Any paper is such a huge mess of data and figures that little errors will creep in from time to time. To discount her work because of them is utterly ridiculous. And plagiarism of the materials and methods? Come on! How many ways can you describe how you culture cells?

And if her work doesn't end up panning out? SO WHAT! Again, it's just a paper! If I had a nickel for every Nature paper that ended up being wrong, well, you know what I'm saying. I personally know of several examples of big Cell, Science, and Nature papers that are wrong but that got people fancy jobs at top institutions, grants, tenure, etc. Some of these are cases in which people grossly overstated the effect of something through some sort of tricky analysis. Some are cases in which the authors greatly overinterpreted the data, leading them to the wrong conclusion, often because of some sloppy science. Some are in the fraud gray zone, where the authors cover up particular discrepant results that either confuse or refute the main conclusions, or do experiments over and over again until they get the "right" outcome. Those people have their jobs and everyone's happy; they're certainly not being investigated by their own institutions. Why is this woman being taken down so hard? Is it because what she's doing is so important? In that case, the lesson is clear: don't do anything important. Is that the message we want to be sending?

Wednesday, April 9, 2014

Terminator 1 and 2 were the first great comic book movies

Just watched Terminator 1 again–how awesome! Not quite as good as Terminator 2, which is probably one of the top action movies of all time, but still great, maybe top 10-20. As I was watching it, I was thinking that a lot of what made the movie so appealing is the character of an unstoppable superman (or in this case, robot). Much better as a bad guy than as a good guy, because an unstoppable good guy is boring (see: Superman). Isn't this the prototype for all the modern-day comic book movies? One of the things that makes comic book movies exciting is the epic battles between the comic book characters, both doing incredible things, while you wait to see who breaks first. Terminator 2 is still amongst the best (if not the best) in this regard. Another cool thing is that the Terminator movies did this with much worse special effects than we have today, especially Terminator 1, which looks prehistoric. I practically expected claymation at times. But it's still awesome. Compelling movie action is more about engendering fear, suspense, and relief than about special effects. Still, Terminator 2 would just not have been as awesome without the (for its time) unprecedented special effects, which have aged remarkably well.

NB: Yes, I realize that the original Superman movies came out before T1. But they just weren't as good. And that's a fact. You know it, too.

Sunday, April 6, 2014

The principle of WriteItAllOut

After Gautham's thoughts about code and clarity, and with lots of paper writing and grant writing these days, a couple of conclusions. First, grant writing is boring. Second, when in doubt, write it all out. For computer code, this means using long, descriptive variable names. If you have the choice between a variable name of "mntx" and "meanTranscriptionSiteIntensityInHeterokaryon", go for the latter (a quick code sketch below shows the difference). Yes, it takes a little more effort, but not much, and it's a MUCH better idea in the long run. I wish we could do this in math and physics also. The same holds for papers and grants, both in figures and in text. In figures, if you can give an informative axis label, do it. "Mean (CRL)" is much less informative than "Mean transcript abundance per gene in human foreskin fibroblasts". It's longer, but with some creativity you can make it work. In main text, AVOID ALL ACRONYMS! People rarely read papers straight through from beginning to end these days, and if someone looks at a paragraph halfway through the text and sees something like:
Similarly, we find that 9.3% of autosomally expressed accessible novel TARs show ASE, we expect this number to be lower than genes as novel TARs correspond to exons of genes.
then they will be lost. And I don't think the space taken by expanding out these acronyms is a legitimate excuse. For the record, though, I do use DNA, RNA, SNP and FISH. Actually, I'd probably be well served to expand out the latter two, although they are fairly standard.
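To make the variable-name point concrete, here's a minimal, made-up sketch (in Python, with hypothetical names in the spirit of the mntx example above) of the same little computation written both ways:

# A made-up example; the names are hypothetical, the point is readability.

def m(x, t):
    # cryptic version: easy to type now, opaque six months from now
    return sum(v for v in x if v > t) / len([v for v in x if v > t])

def mean_intensity_of_spots_above_threshold(spot_intensities, detection_threshold):
    # written all out: the code explains itself
    bright_spots = [i for i in spot_intensities if i > detection_threshold]
    return sum(bright_spots) / len(bright_spots)

# Both do exactly the same thing; only one will still make sense when
# you revisit the analysis a year from now.
print(mean_intensity_of_spots_above_threshold([120.0, 85.0, 430.0, 95.0], 100.0))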

Remember, the main point of a paper is not to make little puzzles for your readers to decipher, but to convey information, both accurately and as efficiently as possible. For grants, well, after getting some... strange reviews, I'm honestly not sure what the goal is. Except to get money.

Figures for talks and figures for papers

We've been working on writing up Olivia's paper, and I've also been giving some talks about the work, which has given me a chance to compare those two modes of communication. There are of course many differences, but one of the most striking is that the figures you use for papers seldom work well in a talk. Paper figures tend to be WAY too information dense for a talk. I noticed this recently when I gave a talk on this material and lazily just dropped in one of our nicely constructed paper figures, only to realize while I was up there talking about it that it would probably take me a good 5 minutes to explain everything in that one picture. Note that this is not just about conveying too much data; in this case, the figure was just a diagram illustrating the comparison between two hypotheses. There is a fundamental conflict: in a talk, you can really only present one concept at a time and need to make sure people are coming along for the ride. In a paper, you can (and, for space reasons, often must) layer multiple concepts on top of each other. Hence the high cognitive density of those figures.

Anyway, I reconfigured the talk with some rather different figures, and it went much better (or at least I thought so). Maybe something to keep in mind when preparing a talk.

Saturday, April 5, 2014

Publishing survives partly because of our egos

The internet abounds with discussions about how the scientific publishing system as it currently stands is completely ridiculous: somehow, we scientists do all the work, from the blood, sweat, and tears of creating the content to reviewing it, not to mention writing the review articles, perspectives, and news and views pieces, and the little protocols… and we typically pay for the "privilege", often directly with page charges on top of institutional subscription fees. It's a tax, and it happens up and down the food chain. Yes, the system is pretty messed up. But a lot of people have already written about that, so I won't bother writing any more on that point.

Instead, I wanted to point out some of the social aspects of how the system maintains itself. Why do we scientists do all this work for free? Yes, partly out of a desire to maintain the scientific enterprise. But I think another big part of it is appeals to our egos, and that gets exploited throughout the publishing ecosystem. Who hasn't had that warm feeling the first time they got asked to review a paper? After that wears off, the first time they got asked to review a paper at Nature or Science? Or to write a news and views? Or a review article? Or to guest edit a paper? Or to be on the editorial board? Or to assemble a collection of reviews or protocols? At which point, you probably go out and ask some young investigators to write little pieces for you, and they will probably be honored that you asked them. Note that I don't think that at any of these stages the scientists involved are purposefully trying to take advantage of anyone, at least I hope not. Nor are all of the publishers who manage the content, especially the bigger players. But I'm pretty sure at least some of those publishers are.

Consider those little reviews that people are always asking you to write, like a chapter in a review book or encyclopedia or whatever. Typically some (probably well-meaning) senior professor in the field will ask you to write it, and you spend time on it and NOBODY reads it. For the author, it's basically just a chance to bump the citation count on your papers by one. The only solace is that at least nobody is wasting their time reading them. So who gains? Certainly science gains very little from this enterprise, I can tell you that. Said senior professor gets to list editing this review journal as a line item on their CV, so there's that. But my guess is the real winner is the publisher, who gets to say that they have all this content when negotiating with the universities. There's a whole content industry out there built on scientific fluff, fueled by our hard work, and enabled by appeals to our sense of self-importance within the scientific social hierarchy.

So what to do? I can only speak from the perspective of a junior faculty member, but I'm trying to be more judicious about what I choose to do with my time. Of course, I've done plenty of time-wasting content generation in the past, and will probably continue to do some, sometimes against my better judgement. And I'm guessing I'll be presented with tantalizing-sounding opportunities in the future. I just hope that if I do decide to pursue those opportunities, I do so for the right reasons. I feel like, as a community, when we are faced with such a choice, we should remember that we're highly skilled scientists being asked to do free work for someone, and that someone is probably not working for free. Many companies would pay dearly for access to our expertise. We shouldn't sell ourselves short, even when people try to make us feel tall.