Friday, February 8, 2013

Single molecule RNA FISH FAQ

UPDATE: I've expanded this post to an entire website.  Check it out, including its extensive FAQ section!

So we often get questions about how we know single molecule RNA FISH is working the way we think it is.  A LOT of questions.  Seriously, a lot.  At this point, we've got a fairly extensive list of canned answers, and so I thought it might be useful to post them all in one place for people.

Q: How do you know that the spots you are detecting are single RNA molecules?  Couldn't they be conglomerates?

A: This is a good question, and one that has a variety of answers.  Many of the control experiments that Sanjay did are in his excellent Vargas et al. PNAS 2005 paper.  One (beautiful, in my mind) experiment that Sanjay did was the following.  He in vitro synthesized a bunch of target RNA and put it in two different tubes.  In these tubes, he labeled the RNA with probes, with the RNA in each tube labeled with a different dye (say, red or green).  Then he combined the two tubes, so he had one tube with RNA that was either labeled with red probe or green probe, but not both.  He then injected this mixture into the cell and observed.  If the RNA were forming conglomerates, then you would expect yellow blobs containing both red and green RNA.  If they were single molecules, though, you would expect the spots to be either red or green but never both.  The latter is what he observed.  You might question whether this holds for endogenous RNA, but he also expressed that RNA and compared spot intensities, and they were the same.  This means that the endogenous RNA was also single particles.  Nice!  There are definitely caveats to this, and technically it applies only to this RNA, but whatever, I think this is pretty solid.

There are other things you can do.  One is to measure the fluorescence intensity of the spots and show that you get a unimodal distribution of intensities.  That's pretty weak evidence in my mind, because if you had some spots with two RNA and some with one RNA, these peaks would overlap so much that it would probably look like a unimodal peak anyway.  But what do I know.
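
To make the overlap argument concrete, here is a rough sketch (not anything from the papers above).  It assumes spot intensities are Gaussian, that a two-RNA spot is the sum of two independent one-RNA spots, and that 20% of spots are doublets; the function names and all of those numbers are made up for illustration.  It just counts the peaks in the resulting mixture for a given spread (coefficient of variation) of single-spot intensity.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def count_modes(cv, doublet_fraction=0.2, grid_points=401):
    """Count local maxima of a singlet/doublet intensity mixture.

    Assumptions (hypothetical, for illustration only):
    - a single RNA has mean intensity 1.0 and standard deviation `cv`
    - a doublet is the sum of two independent singles (mean 2.0, sd sqrt(2)*cv)
    """
    single_sd = cv
    double_sd = math.sqrt(2) * cv
    # Evaluate the mixture density on an evenly spaced grid from 0 to 4.
    xs = [4.0 * i / (grid_points - 1) for i in range(grid_points)]
    density = [
        (1 - doublet_fraction) * normal_pdf(x, 1.0, single_sd)
        + doublet_fraction * normal_pdf(x, 2.0, double_sd)
        for x in xs
    ]
    # A mode is a grid point strictly higher than both of its neighbours.
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


if __name__ == "__main__":
    for cv in (0.2, 0.4):
        print(f"CV = {cv}: {count_modes(cv)} peak(s)")
```

Under these assumptions, the tight case (CV of 0.2) shows two peaks, while the broader case (CV of 0.4) shows only one even though a fifth of the spots are doublets.  So a unimodal histogram by itself does not rule out doublets unless you also know the spot intensities are tightly distributed.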

To me, some of the strongest evidence comes from new results from Eric Lubeck and Long Cai (Lubeck and Cai, Nat Meth 2012).  They use super-resolution microscopy to actually read out a barcode of different colors along a single RNA molecule.  Think about how cool that is for a minute!  Anyway, it's very hard to imagine that conglomerates of RNA would show anything like that sort of thing.  I think Sanjay has some other similar experiments that corroborate this.

Q: How do you know you're getting all the RNA in the cell?

A: Honest answer: no idea.  What we have done to get at this is compare to qRT-PCR data, for whatever that's worth.  I think Singer first did this in Femino et al. Science 1998, and Vargas et al. PNAS 2005 has a nice demonstration as well.  In those cases, you can try to use absolute standard curves to get an actual average number of RNA molecules per cell via RT-PCR and compare it to what you get by molecule counting via RNA FISH.  In Vargas et al. PNAS 2005, we got a pretty close correspondence, with the numbers coming within 30% of each other.  But given all the vagaries associated with RT-PCR (RT efficiency, PCR efficiency, measurement error, etc.), I'm sort of amazed this number came out so close.  I think others have shown the same thing with RT-PCR, and so I guess that's pretty good evidence.  Many have shown (e.g., Raj et al. Nat Meth 2008) that fold changes in RNA counts are similar when comparing RNA FISH to RT-PCR, but I'm not sure what that really tells you about detection efficiency except that it's the same (maybe good, maybe bad) in both conditions.

Some will tell you that you can detect the same transcript with two different probe sets and look for colocalization between the colors.  The idea is that if you detect with both colors, that means that your efficiency is high.  I don't think that actually makes sense: if you have an RNA that is inaccessible for whatever reason, it will be missed in both colors, so this control tells you nothing about it, and if you have an RNA that is accessible, then a single color will probably detect it.  This two color colocalization approach is good for specificity, though...

Q: How do you know your probes are detecting the right RNA?

A: This is where the two color test comes in handy.  What you can do is label every other oligo with a different fluorophore (i.e., R,G,R,G,R,G,R,G...).  If the signals colocalize, that is pretty good evidence that you're detecting the right RNA, since it is very unlikely that a whole bunch of different oligos are all binding to the same incorrect target.  Usually, you don't need to do this, because if you get good signal in a single color, you are almost certainly detecting the right thing.  However, if you are seeing bright transcription sites, they could potentially be off targets because even a single oligo can light those up.  If you are doing analysis of those sites, you will probably want to check things out this way.  Also, lincRNA are very prone to these sorts of issues and you should really check those out with this "odds and evens" approach (we'll have a paper on this soon).
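
For what it's worth, the bookkeeping for the "odds and evens" check can be sketched in a few lines.  This is only an illustration: the function names, the distance cutoff, and the spot coordinates are hypothetical, and it assumes you already have spot positions for each color from whatever spot-finding software you use.

```python
import math


def split_odds_and_evens(oligos):
    """Assign alternating oligos (in order along the transcript) to two dye pools."""
    return oligos[0::2], oligos[1::2]


def colocalized_fraction(spots_a, spots_b, max_distance):
    """Fraction of channel-A spots that have a channel-B spot within max_distance.

    Spots are coordinate tuples in the same units as max_distance.
    The cutoff is a hypothetical parameter; pick it from your own optics.
    """
    if not spots_a:
        return 0.0
    hits = sum(
        1
        for a in spots_a
        if any(math.dist(a, b) <= max_distance for b in spots_b)
    )
    return hits / len(spots_a)


if __name__ == "__main__":
    # Hypothetical oligo names, ordered 5' to 3' along the target.
    red_pool, green_pool = split_odds_and_evens(["o1", "o2", "o3", "o4", "o5"])
    print(red_pool, green_pool)

    # Made-up spot coordinates (pixels) for the two channels.
    red_spots = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]
    green_spots = [(10.5, 10.2), (49.0, 51.0), (200.0, 200.0)]
    print(colocalized_fraction(red_spots, green_spots, max_distance=2.0))
```

A high fraction in both directions (red with green, and green with red) is what you would hope to see for a specific probe set; spots that show up in only one color are the ones to be suspicious of.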

Q: What is the hybridization efficiency of each oligo?

A: Lubeck and Cai estimated a hybridization efficiency of around 60-70%, and we have seen similar numbers.  Hard to know for sure why it's not 100%, but whatever, if you get enough oligos, you'll be fine.
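
A back-of-the-envelope way to see why more oligos help: if you assume each oligo binds independently with the same probability (a simplification, and not something either paper claims), the number of bound oligos per RNA is binomial.  The sketch below uses 0.65 as the per-oligo efficiency (the middle of the range above); the probe counts and the "need at least 10 bound oligos to see a spot" threshold are hypothetical numbers chosen only for illustration.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n oligos are bound), assuming independent binding
    with per-oligo probability p (a simplifying assumption)."""
    start = max(k, 0)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(start, n + 1))


if __name__ == "__main__":
    efficiency = 0.65   # middle of the 60-70% range quoted above
    needed = 10         # hypothetical detection threshold
    for n_oligos in (12, 24, 48):  # hypothetical probe set sizes
        p_detect = prob_at_least(needed, n_oligos, efficiency)
        print(f"{n_oligos} oligos: P(>= {needed} bound) = {p_detect:.4f}")
```

Under these assumptions, a small probe set often falls short of the threshold, while a large one essentially never does, even though each individual oligo only binds about two thirds of the time.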

Q: How do you know ribosomes are not preventing RNA detection?

A: In the Raj et al. Nat Meth 2008 paper, we simultaneously targeted both the open reading frame (ORF) and the 3' untranslated region (UTR) with differently colored probes and saw good colocalization.  Ribosomes should bind to the ORF but not the 3' UTR, so if the ribosomes were causing a problem, we would have noticed many more spots with the 3' UTR probes.

Q: How do you know secondary structure is not a problem?

A: In some of our early experiments (Raj et al. PLoS Bio 2006), we targeted oligos to the PP7 RNA hairpin, which is a very strong secondary structure, and saw great signal.  Same for targeting MS2 RNA hairpins.  So I'm not so worried about it.

Well, hope this helps someone somewhere.  If you're a Ph.D. student doing RNA FISH, you should definitely memorize the answers to these questions.  They could really help you out in your quals!

Tuesday, February 5, 2013

"Tidy data"

http://vita.had.co.nz/papers/tidy-data.pdf

The article linked above talks about a typical but undiagnosed source of unnecessary effort in data analysis, untidy data, explains what 'tidy data' looks like, and illustrates some tools that help you make the change.

Keeping data tidy saves a lot of effort. "Tidy data" is not a table format that is visually pleasing for a presentation. It is the format you'd most like data to be in for manipulations. In fact, storing data in formats that make for visually pleasing tables usually makes them especially difficult for other folks to use within programming-style analysis tools like R and Matlab. I was reminded of this when a coworker asked for help turning his manual Excel workflow into an automated Matlab workflow.

After trying to get all kinds of different types of data incorporated into an analysis related to my current project, often from the Supplementary Info in scientific papers, I've found that the less creative the authors are with their data presentation, the easier the job is.

Excel unintentionally encourages the basic problem. Since you constantly see the data, and there are all kinds of features to make borders, change fonts, join cells and pretty things up, it is hard to resist the temptation to make it into a pretty table. So your workflow looks like this:

data  -->   presentable table   (usually stored in an Excel file and given as Supplementary Info)
presentable table  -->   Analysis and Graphics

That last step is hard because most presentable data is not readily amenable to downstream analysis. If instead you program your data analysis (or make use of Excel's more advanced features like pivot tables), your workflow can look like this:

data  -->   presentable table
data  -->   Analysis and Graphics

As it turns out, liberating yourself from the need to have your data look presentable on its own lets you structure it in a way that makes for rapid and painless plotting and analysis. Optimize the data format for manipulability, and save your time and others'.
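
As a small illustration of the difference (a sketch in plain Python, not taken from the linked article), here is a "presentable" table with one column per condition being reshaped into tidy form, with one row per observation. The function name, column names, and numbers are all made up for the example.

```python
def to_tidy(wide_rows, id_column, variable_name, value_name):
    """Reshape 'one column per condition' rows into one row per observation."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


def mean_by(tidy_rows, group_column, value_column):
    """Average value_column within each level of group_column."""
    groups = {}
    for row in tidy_rows:
        groups.setdefault(row[group_column], []).append(row[value_column])
    return {key: sum(values) / len(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical "presentable" table: one row per gene, one column per condition.
    wide = [
        {"gene": "geneA", "control": 120, "treated": 95},
        {"gene": "geneB", "control": 14, "treated": 40},
    ]
    tidy = to_tidy(wide, id_column="gene", variable_name="condition", value_name="count")
    for row in tidy:
        print(row)
    print(mean_by(tidy, "condition", "count"))
```

Once the data are in the long form, grouping by gene or by condition is the same one-line operation, and adding a third condition means adding rows rather than restructuring the table.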

- Gautham