Sunday, February 19, 2017

Results from the Guess the Impact Factor Challenge


By Uschi Symmons and Arjun Raj

tl;dr: We wondered if people could guess the impact factor of the journal a paper was published in from its title. The short answer is not really. The longer answer is sometimes yes. The results suggest that talking about any sort of weird organism makes people think your work is boring, unless you’re talking about CRISPR. This raises the question of whether the people who took this quiz are cynical or just shallow. Much future research will be needed to make this determination.


Introduction:
[Arjun] This whole thing came out of a Tweet I saw:


It showed the title: “Superresolution imaging of nanoscale chromosome contacts”, and the beginning of the link: nature.com. Looking at the title, I thought, well, this sounds like it could plausibly be a paper in Nature, that most impacty of high impact journals (the article is actually in Scientific Reports, a journal that is part of the Nature Publishing Group but is generally considered to be low impact). This got Uschi and me thinking: could you tell what journal a paper went into by its title alone? Would you be fooled?

[Switching to Uschi and Arjun] By the way, although this whole thing is sort of a joke, we think it does hold some lessons for our glorious preprint based future, in which the main thing you have to go on is the title and the authors. Without the filter/recommendation role that current journals provide, will visibility in such a world be dominated by who the authors are and increasingly bombastic and hype-filled titles? (Not that that’s not the case already, but…)

To see if people could guess the impact factor of the journal a paper was published in based solely on the title, we made up a little online questionnaire. More than 300 people filled out the questionnaire—and here are the results.

Methodology:
Our methodology was cooked up in an hour or two of discussion on Slack, and has so many flaws it’s hard to enumerate them all. But we’ll try and hit the highlights in the discussion. Anyway, here’s what we did: we chose journals with a range of impact factors, three each in the high, medium, and low categories (>20, 8-20, <8, respectively). We tried to pick journals that would have papers with a flavor that most of our online audience would find familiar. We then chose two papers from each journal, picked from a random issue around December 2014/January 2015. The idea was to pick papers that have maybe receded from memory (and also have accumulated some citation statistics, reported as of Feb. 13, 2017), but not so long ago that the titles would be misleading or seem anachronistic. We picked the paper titles pretty much at random: we picked an issue or did a search by date and basically just took the first paper from the list that was in this area of biomedical science. The idea here was to avoid bias, so there was no attempt to pick “tricky” titles. There was one situation where we looked at an issue of Molecular Systems Biology and the first couple of titles had colons in them, which we felt were perhaps a giveaway that it was not high profile, so we picked another issue. Papers and journals are given in the results below.

The questionnaire itself presented the titles in random order and asked for each whether it was high, medium, or low impact, based on the cutoffs of 0-8, 8-20, 20+. Answering each question was optional, and we asked people to not answer for any papers that they already knew. At least a few people followed that instruction. We posted the questionnaire on Twitter (Twitter Inc.) and let Google (Alphabet) do its collection magic.

Google response analysis here, code and data here.

Results:
In total, we got 338 responses, mostly within the first day or two of posting. First question: how good were people at guessing the impact factor of the journal? Take a look:



The main conclusion is that people are pretty bad at this game. The average score was around 42%, which was not much above random chance (33%). Also, the best anyone got was 78%. Despite this, it looks like the answers were spread pretty evenly between the three categories, which matches the actual distribution, so there wasn’t a bias towards a particular answer.
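
To put the 33% baseline and the 78% top score in context, here is a minimal sketch of what pure guessing would look like. It assumes a respondent answers all 18 titles and picks one of the three tiers uniformly at random, which is an idealization since answering was optional. The function and variable names are ours, for illustration only.

```python
from math import comb, sqrt

N_TITLES = 18        # titles in the questionnaire
P_CORRECT = 1 / 3    # chance of a correct guess with three equally likely tiers


def binomial_tail(n, p, k):
    """Probability of getting at least k of n independent guesses right."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_correct = N_TITLES * P_CORRECT
std_correct = sqrt(N_TITLES * P_CORRECT * (1 - P_CORRECT))
# 14 of 18 correct is about 78%, the best score reported above
p_top_score = binomial_tail(N_TITLES, P_CORRECT, 14)

print(f"expected correct by chance: {expected_correct:.1f} of {N_TITLES}")
print(f"standard deviation: {std_correct:.1f}")
print(f"P(at least 14 correct by chance): {p_top_score:.6f}")
```

Under these assumptions a pure guesser lands around 6 of 18, give or take 2, so an average of 42% (about 7.6 of 18) is above chance but not by much.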

Now the question you’ve probably been itching for: how well were people able to guess the journal tier for specific titles? The answer is that they were good for some and not so good for others. To quantify how well people did, we calculated a “Perception score”, which is the average score given to a particular title, with low = 1, medium = 2, high = 3. Here is a table with the results:


Title | Journal | Impact factor | Perception score
Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing | Nature Biotechnology | 43.113 | 2.34
The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease | Nature Biotechnology | 43.113 | 1.88
Dietary modulation of the microbiome affects autoinflammatory disease | Nature | 38.138 | 2.37
Cell differentiation and germ–soma separation in Ediacaran animal embryo-like fossils | Nature | 38.138 | 1.77
The human splicing code reveals new insights into the genetic determinants of disease | Science | 34.661 | 2.55
Opposite effects of anthelmintic treatment on microbial infection at individual versus population scales | Science | 34.661 | 1.44
Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis | Genome Research | 11.351 | 2.11
Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia | Genome Research | 11.351 | 1.81
A high‐throughput ChIP‐Seq for large‐scale chromatin studies | Molecular Systems Biology | 10.872 | 2.22
Genome‐wide study of mRNA degradation and transcript elongation in Escherichia coli | Molecular Systems Biology | 10.872 | 2.02
Browning of human adipocytes requires KLF11 and reprogramming of PPARγ superenhancers | Genes and Development | 10.042 | 2.15
Initiation and maintenance of pluripotency gene expression in the absence of cohesin | Genes and Development | 10.042 | 2.09
Non-targeted metabolomics and lipidomics LC–MS data from maternal plasma of 180 healthy pregnant women | GigaScience | 7.463 | 1.55
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae | GigaScience | 7.463 | 1.25
Asymmetric parental genome engineering by Cas9 during mouse meiotic exit | Scientific Reports | 5.228 | 2.43
Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegans | Scientific Reports | 5.228 | 2.25
A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modules | BMC Genomics | 3.867 | 2.16
Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light–dark cycle | BMC Genomics | 3.867 | 1.25
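
For anyone who wants to recompute a perception score, the calculation described above can be sketched roughly as follows. The answers in the example are made up for illustration (they are not the real survey responses), and the way skipped questions are represented (`None`) is our assumption, not necessarily how the actual analysis code handles them.

```python
from statistics import mean

# Numeric coding described in the text: low = 1, medium = 2, high = 3
TIER_SCORE = {"low": 1, "medium": 2, "high": 3}


def perception_score(answers):
    """Average the coded answers for one title, skipping unanswered entries (None)."""
    coded = [TIER_SCORE[a] for a in answers if a is not None]
    return mean(coded) if coded else None


# Hypothetical answers for a single title (not the real survey data)
example_answers = ["high", "medium", "medium", None, "low", "high"]
print(round(perception_score(example_answers), 2))  # 2.2
```

A score near 1 means almost everyone guessed “low”, a score near 3 means almost everyone guessed “high”, and a score near 2 can mean either a consensus on “medium” or an even split.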


In graphical form:

One thing really leaps out, which is the “bowtie” shape of this plot: while people, averaged together, tend to get medium-impact papers right, there is high variability in aggregate perception for the low and high impact papers. For the middle tier, one possibility is that there is a bias towards the middle in general (like an “uh, dunno, I guess I’ll just put it in the middle” effect), but we didn’t see much evidence for an excess of “middle” ratings, so maybe people are just better at guessing these ones. Definitely not the case for the high and low end, though. Nature and Science each had one title with high perceived impact and one with low. Also, the two Scientific Reports papers had very high perceived impact, presumably because they have CRISPR or Cas9 in the title.

So what, if anything, makes a paper seem high or low impact? Here’s the same table sorted by perception score. Notice what all the low ones have in common?


Title | Journal | Impact factor | Perception score
The human splicing code reveals new insights into the genetic determinants of disease | Science | 34.661 | 2.55
Asymmetric parental genome engineering by Cas9 during mouse meiotic exit | Scientific Reports | 5.228 | 2.43
Dietary modulation of the microbiome affects autoinflammatory disease | Nature | 38.138 | 2.37
Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing | Nature Biotechnology | 43.113 | 2.34
Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegans | Scientific Reports | 5.228 | 2.25
A high‐throughput ChIP‐Seq for large‐scale chromatin studies | Molecular Systems Biology | 10.872 | 2.22
A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modules | BMC Genomics | 3.867 | 2.16
Browning of human adipocytes requires KLF11 and reprogramming of PPARγ superenhancers | Genes and Development | 10.042 | 2.15
Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis | Genome Research | 11.351 | 2.11
Initiation and maintenance of pluripotency gene expression in the absence of cohesin | Genes and Development | 10.042 | 2.09
Genome‐wide study of mRNA degradation and transcript elongation in Escherichia coli | Molecular Systems Biology | 10.872 | 2.02
The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease | Nature Biotechnology | 43.113 | 1.88
Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia | Genome Research | 11.351 | 1.81
Cell differentiation and germ–soma separation in Ediacaran animal embryo-like fossils | Nature | 38.138 | 1.77
Non-targeted metabolomics and lipidomics LC–MS data from maternal plasma of 180 healthy pregnant women | GigaScience | 7.463 | 1.55
Opposite effects of anthelmintic treatment on microbial infection at individual versus population scales | Science | 34.661 | 1.44
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae | GigaScience | 7.463 | 1.25
Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light–dark cycle | BMC Genomics | 3.867 | 1.25

One thing is that the titles at the bottom seem to be longer, and that is borne out quantitatively, although the correlation is perhaps not spectacular:
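
A minimal sketch of this check, using the titles and perception scores from the table above. It measures length as the number of characters in the title and uses a plain Pearson correlation; both choices are our assumptions here, and the original analysis may have measured things differently.

```python
from math import sqrt

# (title, perception score) pairs copied from the results table
TITLES = [
    ("The human splicing code reveals new insights into the genetic determinants of disease", 2.55),
    ("Asymmetric parental genome engineering by Cas9 during mouse meiotic exit", 2.43),
    ("Dietary modulation of the microbiome affects autoinflammatory disease", 2.37),
    ("Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing", 2.34),
    ("Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegans", 2.25),
    ("A high-throughput ChIP-Seq for large-scale chromatin studies", 2.22),
    ("A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modules", 2.16),
    ("Browning of human adipocytes requires KLF11 and reprogramming of PPARγ superenhancers", 2.15),
    ("Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis", 2.11),
    ("Initiation and maintenance of pluripotency gene expression in the absence of cohesin", 2.09),
    ("Genome-wide study of mRNA degradation and transcript elongation in Escherichia coli", 2.02),
    ("The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease", 1.88),
    ("Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia", 1.81),
    ("Cell differentiation and germ-soma separation in Ediacaran animal embryo-like fossils", 1.77),
    ("Non-targeted metabolomics and lipidomics LC-MS data from maternal plasma of 180 healthy pregnant women", 1.55),
    ("Opposite effects of anthelmintic treatment on microbial infection at individual versus population scales", 1.44),
    ("Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae", 1.25),
    ("Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light-dark cycle", 1.25),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


lengths = [len(title) for title, _ in TITLES]  # title length in characters
scores = [score for _, score in TITLES]
r = pearson(lengths, scores)
print(f"Pearson r between title length and perception score: {r:.2f}")
```

A negative r here means longer titles tended to be perceived as lower impact. With only 18 titles, the estimate is noisy.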




Any other features of the title? We looked at specificity (which was the sum of the times a species, gene name or tissue was mentioned), declarativeness (“RNA transcription requires RNA polymerase” vs. “On the nature of transcription”), and mention of a “weird organism”, which we basically defined as anything not human or mouse. Check it out:



Hard to say much about declarativeness (declariciousness?), not much data there. Specificity is similarly undersampled, but perhaps there is some tendency for medium impact titles to have more specific information than others? Weird organism, however, really showed an effect. Basically, if you want people to think you wrote a low impact paper, put axolotl or something in the title. Notably, for each of the high impact journals, we had 1 each perceived as high and low impact, and this “weird organism” metric explained that difference completely. The exception to this is, of course, CRISPR: indeed, the second-highest perceived low impact paper was CRISPR in C. elegans. Note that we also included E. coli as “weird”, although we probably should not have.

We then wondered: does this perception even matter? Does it have any bearing on citations? So many confounders here, but take a look:


First off, where you publish is clearly strongly associated with citations, regardless of how your title is perceived. Beyond that, it was murky. Of the high impact titles, the ones with a high perception score definitely were cited more, but the n is small there, and the effect is not there for medium and low impact titles. So who knows.

Discussion:
Our conclusion seems to be that mid-tier journals publish things that sound like they should be in mid-tier journals, perhaps with titles with more specificity. Flashy and non-flashy papers (as judged by actual impact factor) both seem to be playing the same hype game, and some of them screw up by talking about a weird organism.

Anyway, before reading too much into any of this, like we said in the methods section, there are lots of problems with this whole thing. First off, we are vastly underpowered: the total of 18 titles is nowhere near enough to get any real picture of anything but the grossest of trends. It would have been better to have a large number of titles and have the questionnaire randomly select 18 of them, but if we didn’t get enough responses, then we would not have had very good sampling for any particular title. Also, it would have been interesting to have more titles per journal, but we instead opted for more journals just to give a bit more breadth in that respect. Oh well. Some folks also mentioned that 8 is a pretty aggressive cutoff for “low impact”, and that’s probably true. Perception of a journal’s importance and quality is not completely tied to its numerical impact factor, but we think the particular journals we chose would be pretty commonly associated with the tiers of high, medium and low. With all these caveats, should we have given our blog post the more accurate and specific title “Results from the Guess the Impact Factor Challenge in the genomicsy/methodsy subcategory of molecular biology from late 2014/early 2015”? Nah, too boring, who would read that? ;)

We think one very important thing to keep in mind is that what we measured is perceived impact factor. This is most certainly not the same thing as perceived importance. Indeed, we’re guessing that many of you played this game with your cynic hat on, rolling your eyes at obviously “high impact” papers that are probably overhyped, while in the back of your mind remembering key papers in low impact journals. That said, we think there’s probably at least some correspondence between a seemingly high profile title and whether people will click on it—let’s face it, we’re all a bit shallow sometimes. Both of these factors are probably at play in most of us, making it hard to decipher exactly how people made the judgements they did.

Question is what, if anything, should we do in light of this? A desire to “do” something implies that there is some form of systematic injustice that we could either try to fix or, conversely, try to profit from. To the former, one could argue that the current journal system (which we are most definitely not fans of, to be clear) may provide some role here in “mixing things up”. Since papers in medium and high impact journals get more visibility than those in low impact journals, our results suggest that high impact journals can give exposure to poorly (or should we say specifically or informatively?) titled papers, potentially giving them a citation boost and providing some opportunity for exposure that may not otherwise exist, however flawed the system may be. We think it’s possible that the move to preprints may eliminate that “mixing-things-up” factor and thus increase the incentive to pick the flashiest (and potentially least informative) title possible. After all, let’s say we lived in a fully preprint-based publishing world. Then how would you know what to look at? One obviously dominant factor is who the authors are, but let’s set that aside for now. Beyond that, one other possibility is to try and increase whatever we are measuring with perception score. So perhaps everyone will be writing like that one guy in our field with the crazy bombastic titles (you know who I mean) and nobody will be writing about how “Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria” any more. Hmm. Perhaps science Twitter will basically accomplish the same thing once it recovers from this whole Trump thing, who knows.

Perhaps one other lesson from all of this is that science is full of bright and talented people doing pretty amazing work, and not everybody will get the recognition they feel they deserve, though our results suggest that it is possible to manipulate at least the initial perception of our work somewhat. A different question is whether we should care about such manipulations. It is simplistic to say that we should all just do the work we love and not worry about getting recognition and other outward trappings of success. At the same time, it is overly cynical to say that it’s all just a rat race and that nobody cares about the joy of scientific discovery anymore. Maybe happiness is realizing that we are most accurately characterized by living somewhere in the middle… :)

Friday, February 17, 2017

Introducing Slideboards, a tool for scientific communication

Given the information overload we all deal with, I think it’s pretty safe to say that scientific communication is more important than ever these days. The problem is that we’re still mostly using the same format we’ve been using for ages, namely the paper. And the bottom line is that people just don’t read them. The problem, deep down, is that papers serve two not entirely overlapping purposes: one is to tell people what you learned, and the other is to document precisely how you learned it. This is particularly problematic when trying to understand work outside your particular subdomain—all the details make it hard to focus on the bigger picture.

How do we normally solve the problem of giving a big-picture version of what your paper is about? Personally, I feel like the 5-10 minute short talk like you hear at a conference, when done well, accomplishes this nicely. So our first foray into communicating our science more efficiently was to make slidecasts, which are short videos consisting of slides and a voiceover narration—basically, an online version of the short conference talk. I think these are generally pretty effective, and I’ve gotten generally positive feedback on them, usually along the lines of “We should make these, too” (more on that later). One person I sent a slidecast to, though, had an interesting response. He said that he liked it, but that it was “Too slow, I want to get through the slides faster” and that “I want to know the answer to particular details, but I can’t get them.” Hmm. How do you make something simultaneously faster and include more information? So after a fair amount of thinking, we took a cue from the web. If you need to renew your driver’s license, do you download the entire operational manual of the DMV? No, you go to the website and get the overview. And if you have some special case scenario, like your boat-car needs a special game-and-fisheries license or something? Just look at the FAQ. Which got me thinking: maybe this is the solution the “faster, but more content” crowd is looking for. Have a slidecast that one can flip through quickly, then a FAQ on the side that answers those “supplementary figure” questions that people often have during a short talk.

So we made exactly this! (And by we, I mean my awesome technician Rohit, who coded the whole thing from scratch.) We call them Slideboards, and you can check out our first fully-featured “Slideboard” here. I think it pretty much realizes our initial concept. Feel free to post a question and I will try and answer it!

Of course, it’s nice for us to make slidecasts and now Slideboards, but this always raises the question: how do we get others to make them, too? This brings me back to the feedback we got on our slidecasts, which was “We should make these”, after which approximately zero people ever actually made one. Why not? Well, after having made a few of these myself, the answer is that it’s a lot of work—you really have to have a fully written out script, and it usually requires at least a couple takes, which all adds up to the better part of a day. (Of course, the fact that the work itself probably took two to four years never seems to enter into this calculus, but whatever.) Which is why we really wanted to make an authoring tool that would make the task of creating a Slideboard as simple as possible. Problem is, it’s hard. The reason why is simple, which is that making content just plain takes time, as anyone who’s made endless graphical abstracts and bullet points and such can relate to. So we thought to ourselves, what is the content that pretty much everyone already has on their work? We thought two things: a slide deck for a talk on the work, and the PDF of the preprint or other written version that has various figures and supplementary figures. Our authoring tool leverages these to allow you to make a Slideboard quickly and easily. Basically, upload the slides to make the slideshow part and type in captions for the slides to provide some narration, then make questions and answers through a quick interface that allows you to drag and select images from the PDF to quickly insert into your answers. Here’s a very short video showing how to do it:



And that’s it! If you have some slides and… Also, the viewer interface allows your audience to ask you questions, which you can then choose to answer if it seems appropriate (not that there are any dumb questions or anything, but… ;) ). We’ve tried to make the whole process as painless as possible, and hope to see your work soon!

Still, in a world with a steady stream of new ways to reformat and share your scientific work, why use this one? We believe that our approach provides a simple, rapidly digestible format that simultaneously provides a lot of information. Meanwhile, we’ve provided an authoring tool that makes it as easy as possible to develop Slideboards of your own.

And what can you do with Slideboards? Our primary goal so far has been to make a format for sharing scientific papers, and you can easily share links to either the entire Slideboard or a specific slide or question; you just edit the URL like this:

https://slideboard.herokuapp.com/sparks/14?slide_no=3&question_no=4
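
If you would rather build such links in a script than edit them by hand, here is a minimal sketch. The base URL and the `slide_no` and `question_no` parameter names come from the example above; the helper function itself is hypothetical, and whether the site accepts a link with only one of the two parameters is an assumption on our part.

```python
from urllib.parse import urlencode

# Base URL of the example Slideboard quoted above
BASE_URL = "https://slideboard.herokuapp.com/sparks/14"


def slideboard_link(base_url, slide_no=None, question_no=None):
    """Build a link to a whole Slideboard, or to a specific slide and/or question."""
    params = {}
    if slide_no is not None:
        params["slide_no"] = slide_no
    if question_no is not None:
        params["question_no"] = question_no
    # With no parameters, link to the Slideboard as a whole
    return f"{base_url}?{urlencode(params)}" if params else base_url


print(slideboard_link(BASE_URL, slide_no=3, question_no=4))
# https://slideboard.herokuapp.com/sparks/14?slide_no=3&question_no=4
```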

(More convenient URL generating buttons coming soon!) We think there are plenty more possibilities, however, including outreach to young students just getting interested in science, and probably many others we haven’t thought of. Anyway, give it a try, and just let us know if you have any questions, happy to help!

Sunday, February 5, 2017

A bigly new method: the most tremendous FISH ever invented

Post by Ian Mellis.

Here we present a novel method for the visualization and quantification of previously unobservable...what am I saying? This isn't how we write papers anymore! Now that our elected officials can so unceremoniously dispense with objective fact and insist on a personally profitable alternative reality (in a permanent tantrum televised 24 hours a day), I think it's about time that we update our scientific discourse to match the political.

You can access our BIGLy-FISH paper here FREE OF CHARGE. Patriotic!