RajLab: December 2013

Friday, December 27, 2013

Cringeworthy

Did you know that Jodie Foster gave Penn's commencement address in 2006? And did you know that she raps? Take a look:

Oh man, makes me embarrassed just watching it...

Saturday, December 21, 2013

The over-mathification of Wikipedia

Wikipedia is an amazing resource. Truly amazing. Aside from personal interests, I think I use it at least several times a day for work-related science information. For biology, it's proven to be an invaluable resource for just getting a quick overview of a particular gene or disorder or other biological topic. It could also be a great resource for more theoretical stuff, and sometimes it is. For instance, I teach a fluid mechanics class, and one of the main confusions for students is the difference between streamlines, streaklines and pathlines. The Wikipedia page on this topic is just great, and has an animated GIF that gets the point across way better than I can on the chalkboard.

But for many of the theoretical topics, there's a big problem: somewhere along the line, it's clear that some mathematicians got involved. The issue is that they've inserted all sorts of mathematical technicalities into otherwise relatively simple (and useful) mathematical techniques that make the article essentially unreadable and useless. Take the Wikipedia entry on the Laplace transform. The Laplace transform is super useful in solving differential equations, basically because it turns derivatives into multiplication, thereby turning the differential equation into an algebraic equation. Very handy. But good luck getting that out of the Wikipedia article! Instead of starting with a simple application to show how people actually might use the Laplace transform in practice, the Wikipedia article begins with a lengthy and overly mathematical formal definition, with statements like:

One can define the Laplace transform of a finite Borel measure μ by the Lebesgue integral

While these conditions may be interesting to mathematicians, I don't think they are of any interest to the vast majority of people who use the Laplace transform. Then it gets into a discussion of the region of convergence, which again begins with:

If f is a locally integrable function (or more generally a Borel measure locally of bounded variation), then the Laplace transform F(s) of f converges provided that the limit...

Perhaps you care about the Borel measure locally of bounded variation, but I'm guessing that most people who are interested in the Laplace transform haven't taken courses in point-set topology.

Overall, look at the organization of the page. It is just so... mathy! It starts with a formal definition (devoid of any real practical motivation), proceeds to some details about convergence, then a bunch of properties and theorems, then tables of transforms, and then, THEN, finally, some examples. The only part of this that has a bit of motivation for a person who doesn't already know about why the Laplace transform is useful is at the beginning of the "Properties and theorems" section:

The Laplace transform has a number of properties that make it useful for analyzing linear dynamical systems. The most significant advantage is that differentiation and integration become multiplication and division, respectively, by s (similarly to logarithms changing multiplication of numbers to addition of their logarithms). Because of this property, the Laplace variable s is also known as operator variable in the L domain: either derivative operator or (for s^-1) integration operator. The transform turns integral equations and differential equations to polynomial equations, which are much easier to solve. Once solved, use of the inverse Laplace transform reverts to the time domain.

Why is this not at the very beginning? The answer is that it was! Look at this version from 2005. Much better! Still not perfect, since it doesn't have an example, but at least it more clearly gets the main point across about why you would use the Laplace transform. To me, it's clear that at some point, mathematicians got involved and wanted to make the page "right", the same way that mathematicians make the delta function "right" with all kinds of stuff about distributions, in the process completely obscuring the basic point of the delta function for people who really want to use it practically (and many other such mathy topics have been similarly "rightified"). Just to be clear, I'm saying this as a person who has a Ph.D. in math and who loves math, and I think it's great that people far smarter than I have spent time to make sure that one can rigorously define these things. And it IS important, even in some practical contexts. But it's not good for exposition to a more general audience on a Wikipedia page.
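As a rough sketch of the kind of opening example I have in mind (the particular equation and initial condition here are made up for illustration, and are not taken from any version of the Wikipedia page), here is the "derivatives become multiplication" idea applied to a simple first-order equation:

```latex
% Illustrative problem (an assumption, chosen only for this sketch):
%   y'(t) + 2 y(t) = 0,   y(0) = 3
\begin{align*}
\mathcal{L}\{y'(t)\} &= sY(s) - y(0)
  && \text{(the derivative becomes multiplication by } s\text{)} \\
sY(s) - 3 + 2Y(s) &= 0
  && \text{(transform both sides of the equation)} \\
Y(s) &= \frac{3}{s + 2}
  && \text{(solve the resulting algebraic equation)} \\
y(t) &= 3e^{-2t}
  && \text{(invert, using } \mathcal{L}\{e^{-at}\} = \tfrac{1}{s + a}\text{)}
\end{align*}
```

Plugging back in, y'(t) = -6e^{-2t}, so y' + 2y = 0 and y(0) = 3, as required. No Borel measures needed.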

Oh, and here's the original Wikipedia page for the Laplace transform. All else aside, we've come a long way, baby!

Wednesday, December 18, 2013

Some mathematical principles

I was just thinking about the old days when I used to do math (although not particularly well), and I remember thinking that there are some principles in math. These are general "trueisms" in math that, unlike theorems, are neither provable nor even always true. Actually, I guess then they're "trueishisms". Anyway, here are a couple I know of:

1. Conservation of proof energy. I think I heard of this from Robin Hartshorne when I was taking his graduate algebra course. The idea is that if you're trying to prove something hard, and there's a lemma that makes the result easy, then proving that lemma is going to be hard.
2. In real analysis, if something seems intuitively true, there's probably a counterexample. For example, the existence of an unmeasurable set. (2 and 3 courtesy of Fang-Hua Lin during his complex variables course, if I remember right.)
3. In complex analysis, if something seems too amazing to be true, it probably is true. For example, everything about holomorphic functions.
4. In numerical analysis, if you are estimating complexity or error bounds, log(n) for n large is 7 (courtesy of Mike Shelley, I think).
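To put a number on that last one, here is a minimal sketch, assuming "log" means the natural logarithm (the principle doesn't say which base) and using a made-up helper name, natural_logs. Since e^7 is roughly 1100, ln(n) is about 7 when n is around a thousand, and it grows painfully slowly after that:

```python
import math


def natural_logs(sizes):
    """Return (n, ln n) pairs for each problem size n (hypothetical helper)."""
    return [(n, math.log(n)) for n in sizes]


# Problem sizes spanning six orders of magnitude, chosen only for illustration.
for n, ln_n in natural_logs([10**3, 10**6, 10**9]):
    print(f"n = {n:>13,}   ln(n) = {ln_n:5.2f}")
```

Even at a billion, ln(n) has not quite tripled, which is why treating it as a smallish constant is a "trueishism" rather than a theorem.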

Any others that people know about?

Saturday, December 14, 2013

Lab safety quizzes

So another year has passed, and I again had to take the online lab safety training, along with the requisite quiz at the end to “test my knowledge of lab safety”. Dunno about you, but I find these quizzes to be completely meaningless: either the questions are ridiculously simple, or they are ridiculously hard, testing some arcane (and ultimately useless) detail of lab safety. Here is an example of the former:

Now you could argue that by having to answer this question, it forces you to read the other two answers and thus learn their content, i.e., “oh, my two legitimate options are autoclaving and disinfecting followed by pouring down the drain.” But then take a look at this example:

Answers 1, 2, 3 are all obviously wrong (although perhaps it’s teaching you something about it), but then the one that’s actually correct (80-120 feet per minute face velocity) is the one thing I didn’t know at all! It’s like the inverse of the above.

Then you get the ones that are impossible, with arcane answers. I actually encountered a lot more of them when I was at MIT, like the one that asked something like:

EPA regulations dictate that under 40 gallons of oil may be stored in a secondary area for:
1. 3 days
2. 1 week
3. 2 weeks
4. 1 month

I think the answer was 3 or something like that. Talk about weird! Here are some similar ones I’ve seen at Penn:

I had absolutely no idea about the lecture bottle explosion question, and for the other two I just made educated guesses.

Anyway, I suppose you can say that someone can learn something (albeit very little) from these quizzes, but I certainly don’t think they serve any evaluative purpose. If only they could just teach some basic common sense...

Getting honest feedback

On this blog, I've repeatedly argued that peer review is a net waste of time, basically because it doesn't enforce much quality control, it results in long publication times, and reviewers have seemingly gone mad these days with additional experiments. I say "net waste of time", though, because there are some undeniable benefits. Almost any paper will usually benefit from a well-meaning expert looking through the paper, remarking on which explanations are unclear, which claims are oversold (or undersold), and what further data would be nice to include if you have it.

So if we were to eliminate peer review, where would we get this sort of feedback? Well, ideally, through peer review–that is, actually sending the paper (informally) to your peers and asking for their comments. The problem, of course, is that this is one sort of response:

Hey Arjun,
Nice paper! Very exciting result! My suggestion would be to extend out the discussion a bit and cite these few papers (some papers). Good luck!
Science Friend

The problem, as I've documented before, is that nobody has time to read. But I'm lucky to have some good friends who will actually take the time to read a paper and give detailed feedback. For instance, Hyun Youk (postdoc at UCSF) read over a paper Gautham's about to submit and gave us quite detailed and extremely helpful feedback that really strengthened and clarified the paper, like the best peer review ever. I have no idea how to systematize this (I guess our current peer review system is basically that, but anonymous), but it's got me dreaming of a fantasy world where our friends read our papers and love them and make them better and then we post on arXiv and all get HHMI funding. Sigh. Now to submit this paper and get ready for a war of attrition with the official "peer viewers"...

Wednesday, December 4, 2013

[Wormpaper outtakes and bloopers] elt-2 RNA level dynamics after heat shock in wild-type and HS::end-1 strains

- Gautham

Second in a series of outtakes and bloopers related to our paper on the relationship of gene expression dynamics and cell division times in the early C. elegans embryo.

elt-2 RNA level dynamics after heat shock in wild-type and HS::end-1 strains

This one is a blooper.

In our paper we perturbed cell divisions using mutants and asked if gene expression would track with those cell divisions. Conversely, it would have been great to modify the levels of transcriptional activators and see if gene expression could start before the cell divisions that normally precede them.

Strains with end-1 under the control of a heat shock promoter are available. end-1 is a well-known activator of elt-2. So we went ahead.

Methods: We obtained strain RJ663bc3 from Joel Rothman's group with the kind help of Yewubdar Argaw. We isolated embryos from a synchronized batch culture and aliquoted them at 25C. We heat shocked each aliquot at a different time, for 5 min. at 34C, and returned them to 25C. These brief but accurate heat-shocks were accomplished by centrifuging the embryos and resuspending them in M9 at the appropriate temperature. All samples were fixed simultaneously. The different aliquots spent between 0 and 40 min. at 25C after heat shock and before fixation.

Results: We did RNA-FISH on all the samples for end-1, end-3, and elt-2. We analyzed the data for each aliquot and obtained the following very interesting-looking result for elt-2 expression after end-1 over-expression by heat-shock:

Each panel shows embryos that were heat shocked within a particular window of their development, as indicated by the grey shaded region. The red dots are the elt-2 expression levels for these heat-shocked embryos after they've been returned to 25C. The blue dots are the expression trajectory in wild-type without heat-shock for reference, and the dashed vertical lines are E lineage divisions. It was super interesting to see elt-2 starting before the 2E divisions, and to see that expression didn't appear to respond when the heat shock came too early.

Blooper: This result is not in the paper because we see qualitatively similar behavior if we heat-shock wild-type worms, with no end-1 overexpressing transgene. Todd Lamitina alerted me to the critical need to do that control experiment, which in hindsight I should have done first.

Perspective: What was much worse is that I had reason to know that the experiment would be a failure if I had paused to think about all I'd done. Well before doing this experiment I had looked at wild-type embryos that had been left at 30C (not a viable growth temperature for C. elegans) for an hour and found precocious elt-2 expression in these embryos. I disregarded that result as due to any number of things that could go wrong when you try to grow worms under lethal conditions. Without this confounding problem, the overexpression experiment would have been a valuable addition to the paper no matter what the outcome.

Why we were even looking at gene expression trajectories of wild-type worms under non-viable temperatures is a separate story, and another example of "Perhaps there will be something interesting."

[Wormpaper outtakes and bloopers] Expression dynamics in C. elegans mes-2 mutant embryos

- Gautham

Very recently I got into a discussion in which I was being super negative about our tendency as scientists to seek to do "one more experiment," and the feeling that it is possible to increase the impact of work by doing more of it. Frankly, if you look at "top tier" journals, you see papers that appear to be a collection of somewhat unrelated results bundled up into a massive effort. So it definitely feels like a necessary evil to survive in academia, and even in our lab you'll hear folks complaining about how "thin" such-and-such paper is. The painful part is that the drive to avoid a "thin" paper often ends up in inconclusive or negative results which are never published.

This morning I woke up thinking that instead of being so damn negative (which is depressing for morale of all involved), we could do something positive and actually put some of that stuff up on the web, something Arjun's been encouraging since he set up the blog but we haven't really followed up on in the group. That way the work wasn't for nothing, and even though it's not in a journal, maybe Google will find it when someone makes a search.

This is the first of a few short posts on experiments that we did in relation to our paper on the relationship between cell division times and gene expression onsets in early development of C. elegans embryos. The core of that paper was completed rather quickly, but we spent quite some time trying to add stuff to it. Here is some of the stuff that didn't make the cut.

Expression dynamics in C. elegans mes-2 mutant embryos

In the midst of working on the project that resulted in the paper I went to a talk by Prof. Susan Mango describing their work on mes-2 mutants (Yuzyuk et al. Dev Cell 2009). MES-2 is a component of Polycomb and its mutation was reported to change the window of developmental plasticity in the embryo. I thought it would be interesting to measure the dynamics of the genes to be featured in our wormpaper in this mutant background, since the paper was all about timing of expression of lineage-specification genes. Perhaps there would be something interesting.

Methods: mes-2(bn11) was the mutant studied by Yuzyuk et al. We got it in strain SS186 from the Caenorhabditis Genetics Center. mes-2 mutants have a very peculiar phenotype: Progeny of mes-2 homozygous mothers are sterile. We switched the balancer with a GFP balancer by mating with strain MT20110, constructed by Erik Andersen, a friend and collaborator of Arjun's from his postdoc time. I wanted to avoid using a fluorescent worm-sorter, so I ended up manually removing all GFP(+) worms from a small synchronized plate. The remaining worms are either fertile mes-2 first-generation homozygotes or their sterile progeny. Very carefully, we ran those ~1000 worms through a micro-scale version of our worm embryo preparation (which usually works with >10x the amount of worms). We then conducted RNA-FISH as usual.

Results: In the developmental window we were interested in there were no changes at all in the RNA level dynamics of genes we tested compared to wild type (N2 strain). Below is a figure for our favorite genes: end-1, end-3, and elt-2.

y-axis: Number of RNA counted by RNA-FISH. x-axis: number of nuclei (cells) in the embryo.

Similarly, in our hands, the expression dynamics of elt-7, hlh-1, and elt-1 were nearly identical to N2.

Discussion: Seeing no effect was disappointing, but it doesn't necessarily contradict the Yuzyuk report, since most of its statements on expression level effects deal with 8E or later stages in development. That is right about where our window of interest ended.

Perspective: I did these experiments on the common rationale that "perhaps there would be something interesting." That is a good reason to do an experiment. In fact, that is how all experiments get started.

However, it was a bad reason to hold up developing the manuscript for the actual paper, because this experiment, no matter what the outcome would have been, does not have all that much to do with the question of how cell divisions and expression timing are coordinated. So that should have continued as a parallel rather than an in-series effort. What is very telling is that this result is not in the paper because it was a negative result, but we would have probably found a way to put it in if it was positive. Very few experiments of that sort are related in an honest way to the paper that they are being bundled with. It's just fluff.

But we waited because of that feeling that bundling cool stuff together makes for a more compelling and publishable paper. And because, currently, there is no home for "thin" results, like this little blurb on mes-2.

Code reuse and plagiarism

Gautham's been doing a bunch of refactoring of our existing codebase for spot counting, in particular using a lot of object-oriented design strategies. One of the primary goals is code reuse, which is commonly accepted as A Good Thing in programming. On the other hand, a student in lab has been writing a grant proposal on a topic that is very similar to another grant proposal we have submitted (and yes, we checked with the funding agency, it's okay in this context). But of course, my student felt compelled to have completely new language so that it wasn't "plagiarism", which is A Bad Thing. Which got me wondering: why are we so concerned about plagiarism in this context? Why is reuse of language in one context a worthy goal in and of itself, and complete blasphemy in another? Here's another example: I was at a thesis defense in which the candidate was strong second author on a paper and had included some of the figure legends from the published paper in the figures in her thesis. One of the committee members complained about this, saying that it was important to "write this in one's own words". I agree that it may be a good exercise, but I'm not convinced that using the figure legends is per se A Bad Thing. There just aren't all that many clear and concise ways of writing the same thing.

Maybe it's time to rethink the plagiarism taboo. Just like you can use someone else's computer code (with appropriate attribution), why not be able to use someone else's language? If someone wrote something really well, what's the point in rewriting it–probably less well–just for the sake of rewriting it? Would anyone rewrite a super efficient algorithm just for the sake of saying "I wrote it myself"? All that you need is a good mechanism for attribution, like quotes and links and stuff, you know, like all that stuff we already use in writing. In fact, I would argue that attribution is far more transparent in writing than in computer code, because typically only the programmer sees the code, not the end user. If I run a program, I typically am not looking at the source code, so I don't really know who did what, even if the README file gives credit to other code sources.

One might object that copy-and-paste writing will become a mish-mash of ill-fitting pieces. First off, that is just bad writing, in the same way that code that jams together different pieces can be a mess, and will typically require some extra code to make things flow together well. But in a world where writing demands are growing daily (certainly the case for me), maybe it's time to consider text-reuse as a good practice, or at least not necessarily a bad one.