Sunday, February 19, 2017

Results from the Guess the Impact Factor Challenge

By Uschi Symmons and Arjun Raj

tl;dr: We wondered if people could guess the impact factor of the journal a paper was published in by its title. The short answer is not really. The longer answer is sometimes yes. The results suggest that talking about any sort of weird organism makes people think your work is boring, unless you’re talking about CRISPR. This raises the question of whether the people who took this quiz are cynical or just shallow. Much future research will be needed to make this determination.


Introduction:
[Arjun] This whole thing came out of a Tweet I saw:


It showed the title: “Superresolution imaging of nanoscale chromosome contacts”, and the beginning of the link: nature.com. Looking at the title, I thought, well, this sounds like it could plausibly be a paper in Nature, that most impacty of high impact journals (the article is actually in Scientific Reports, which is part of the Nature Publishing Group but is generally considered to be low impact). This got Uschi and me thinking: could you tell what journal a paper went into by its title alone? Would you be fooled?

[Switching to Uschi and Arjun] By the way, although this whole thing is sort of a joke, we think it does hold some lessons for our glorious preprint based future, in which the main thing you have to go on is the title and the authors. Without the filter/recommendation role that current journals provide, will visibility in such a world be dominated by who the authors are and increasingly bombastic and hype-filled titles? (Not that that’s not the case already, but…)

To see if people could guess the impact factor of the journal a paper was published in based solely on the title, we made up a little online questionnaire. More than 300 people filled out the questionnaire—and here are the results.

Methodology:
Our methodology was cooked up in an hour or two of discussion over Slack, and has so many flaws it’s hard to enumerate them all. But we’ll try to hit the highlights in the discussion. Anyway, here’s what we did: we chose journals with a range of impact factors, three each in the high, medium, and low categories (>20, 8-20, <8, respectively). We tried to pick journals that would have papers with a flavor that most of our online audience would find familiar. We then chose two papers from each journal, picked from a random issue around December 2014/January 2015. The idea was to pick papers that have maybe receded from memory (and have also accumulated some citation statistics, reported as of Feb. 13, 2017), but not from so long ago that the titles would be misleading or seem anachronistic. We picked the paper titles pretty much at random: we took an issue (or did a search by date) and basically chose the first paper on the list that fell within this area of biomedical science. The idea here was to avoid bias, so there was no attempt to pick “tricky” titles. There was one situation where we looked at an issue of Molecular Systems Biology and the first couple of titles had colons in them, which we felt were perhaps a giveaway that it was not high profile, so we picked another issue. Papers and journals are given in the results below.

The questionnaire itself presented the titles in random order and asked, for each, whether it came from a high, medium, or low impact journal, based on the cutoffs of 0-8, 8-20, 20+. Answering each question was optional, and we asked people not to answer for any papers they already knew. At least a few people followed that instruction. We posted the questionnaire on Twitter (Twitter Inc.) and let Google (Alphabet) do its collection magic.

Google response analysis here, code and data here.

Results:
In total, we got 338 responses, mostly within the first day or two of posting. First question: how good were people at guessing the impact factor of the journal? Take a look:



The main conclusion is that people are pretty bad at this game. The average score was around 42%, which was not much above random chance (33%). Also, the best anyone got was 78%. Despite this, it looks like the answers were spread pretty evenly between the three categories, which matches the actual distribution, so there wasn’t a bias towards a particular answer.
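For the record, a respondent’s score here is the fraction of answered questions whose guess matched the journal’s actual tier (that’s our reading of the scoring; see the linked code for the real calculation). A minimal sketch of that calculation, on toy numbers rather than the actual response sheet:

    import numpy as np
    import pandas as pd

    # Toy stand-in for the response sheet (the real column names and data live
    # in the linked spreadsheet). Rows = respondents, columns = titles,
    # values = guessed tier (1 = low, 2 = medium, 3 = high); NaN = left blank.
    guesses = pd.DataFrame([
        [3, 2, 1, 2, np.nan],
        [1, 2, 2, 3, 1],
        [3, 3, 1, 1, 2],
    ], columns=["title_%d" % i for i in range(5)])

    # Actual tier of each title's journal (again, toy values)
    truth = pd.Series([3, 2, 1, 3, 2], index=guesses.columns)

    # Fraction of answered questions each respondent got right
    correct = guesses.eq(truth, axis=1) & guesses.notna()
    score = correct.sum(axis=1) / guesses.notna().sum(axis=1)
    print(score.mean())  # compare against the 1/3 chance baseline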

Now the question you’ve probably been itching for: how well were people able to guess the journal tier for specific titles? The answer is that they were good for some and not so good for others. To quantify how well people did, we calculated a “Perception score”, which is the average score given to a particular title, with low = 1, medium = 2, high = 3 (a quick code sketch of this calculation appears below the graph). Here is a table with the results:


Title | Journal | Impact factor | Perception score
Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing | Nature Biotechnology | 43.113 | 2.34
The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease | Nature Biotechnology | 43.113 | 1.88
Dietary modulation of the microbiome affects autoinflammatory disease | Nature | 38.138 | 2.37
Cell differentiation and germ–soma separation in Ediacaran animal embryo-like fossils | Nature | 38.138 | 1.77
The human splicing code reveals new insights into the genetic determinants of disease | Science | 34.661 | 2.55
Opposite effects of anthelmintic treatment on microbial infection at individual versus population scales | Science | 34.661 | 1.44
Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis | Genome Research | 11.351 | 2.11
Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia | Genome Research | 11.351 | 1.81
A high‐throughput ChIP‐Seq for large‐scale chromatin studies | Molecular Systems Biology | 10.872 | 2.22
Genome‐wide study of mRNA degradation and transcript elongation in Escherichia coli | Molecular Systems Biology | 10.872 | 2.02
Browning of human adipocytes requires KLF11 and reprogramming of PPARγ superenhancers | Genes and Development | 10.042 | 2.15
Initiation and maintenance of pluripotency gene expression in the absence of cohesin | Genes and Development | 10.042 | 2.09
Non-targeted metabolomics and lipidomics LC–MS data from maternal plasma of 180 healthy pregnant women | GigaScience | 7.463 | 1.55
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae | GigaScience | 7.463 | 1.25
Asymmetric parental genome engineering by Cas9 during mouse meiotic exit | Scientific Reports | 5.228 | 2.43
Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegans | Scientific Reports | 5.228 | 2.25
A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modules | BMC Genomics | 3.867 | 2.16
Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light–dark cycle | BMC Genomics | 3.867 | 1.25


In graphical form:

One thing really leaps out, which is the “bowtie” shape of this plot: while people, averaged together, tend to get medium-impact papers right, there is high variability in aggregate perception for the low and high impact papers. For the middle tier, one possibility is that there is a bias towards the middle in general (like an “uh, dunno, I guess I’ll just put it in the middle” effect), but we didn’t see much evidence for an excess of “middle” ratings, so maybe people are just better at guessing these ones. That is definitely not the case for the high and low end, though: for both Nature and Science, one of the two titles was perceived as high impact and the other as low. Also, the two Scientific Reports papers had very high perceived impact, presumably because both have Cas9/CRISPR in the title.
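As an aside, the perception score itself is nothing fancy, just the mean of the numeric ratings for each title. A minimal sketch, assuming a long-format table of responses (the column names here are our own, not the actual export):

    import pandas as pd

    # Hypothetical long-format responses: one row per (respondent, title) answer.
    responses = pd.DataFrame({
        "title": ["A", "A", "A", "B", "B"],
        "guess": ["high", "low", "medium", "medium", "medium"],
    })

    rating = {"low": 1, "medium": 2, "high": 3}
    perception = (responses.assign(score=responses["guess"].map(rating))
                           .groupby("title")["score"].mean())
    print(perception)  # per-title perception score, from 1 (low) to 3 (high)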

So what, if anything, makes a paper seem high or low impact? Here’s the same table sorted by perception score; notice what all the low-scoring titles have in common?


Title | Journal | Impact factor | Perception score
The human splicing code reveals new insights into the genetic determinants of disease | Science | 34.661 | 2.55
Asymmetric parental genome engineering by Cas9 during mouse meiotic exit | Scientific Reports | 5.228 | 2.43
Dietary modulation of the microbiome affects autoinflammatory disease | Nature | 38.138 | 2.37
Single-base resolution analysis of active DNA demethylation using methylase-assisted bisulfite sequencing | Nature Biotechnology | 43.113 | 2.34
Dual sgRNA-directed gene knockout using CRISPR/Cas9 technology in Caenorhabditis elegans | Scientific Reports | 5.228 | 2.25
A high‐throughput ChIP‐Seq for large‐scale chromatin studies | Molecular Systems Biology | 10.872 | 2.22
A hyper-dynamic nature of bivalent promoter states underlies coordinated developmental gene expression modules | BMC Genomics | 3.867 | 2.16
Browning of human adipocytes requires KLF11 and reprogramming of PPARγ superenhancers | Genes and Development | 10.042 | 2.15
Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis | Genome Research | 11.351 | 2.11
Initiation and maintenance of pluripotency gene expression in the absence of cohesin | Genes and Development | 10.042 | 2.09
Genome‐wide study of mRNA degradation and transcript elongation in Escherichia coli | Molecular Systems Biology | 10.872 | 2.02
The draft genome sequence of the ferret (Mustela putorius furo) facilitates study of human respiratory disease | Nature Biotechnology | 43.113 | 1.88
Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia | Genome Research | 11.351 | 1.81
Cell differentiation and germ–soma separation in Ediacaran animal embryo-like fossils | Nature | 38.138 | 1.77
Non-targeted metabolomics and lipidomics LC–MS data from maternal plasma of 180 healthy pregnant women | GigaScience | 7.463 | 1.55
Opposite effects of anthelmintic treatment on microbial infection at individual versus population scales | Science | 34.661 | 1.44
Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae | GigaScience | 7.463 | 1.25
Transcriptomic and proteomic dynamics in the metabolism of a diazotrophic cyanobacterium, Cyanothece sp. PCC 7822 during a diurnal light–dark cycle | BMC Genomics | 3.867 | 1.25

One thing that stands out is that the titles at the bottom tend to be longer, and that is borne out quantitatively (see the plot below), although the correlation is perhaps not spectacular.
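If you want to check that relationship yourself, here is a minimal sketch using a handful of (title, perception score) pairs from the table above; we use character counts and a Pearson correlation here, which may differ from the exact metric in the linked code:

    import numpy as np

    # A few titles and perception scores copied from the table above
    # (the real plot uses all 18 titles).
    titles = [
        "The human splicing code reveals new insights into the genetic determinants of disease",
        "Dietary modulation of the microbiome affects autoinflammatory disease",
        "Opposite effects of anthelmintic treatment on microbial infection at individual versus population scales",
        "Reconstructing a comprehensive transcriptome assembly of a white-pupal translocated strain of the pest fruit fly Bactrocera cucurbitae",
    ]
    perception = [2.55, 2.37, 1.44, 1.25]

    lengths = [len(t) for t in titles]
    r = np.corrcoef(lengths, perception)[0, 1]
    print(round(r, 2))  # a negative r means longer titles tend to get lower perception scores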




Any other features of the title? We looked at specificity (the number of times a species, gene name, or tissue was mentioned), declarativeness (“RNA transcription requires RNA polymerase” vs. “On the nature of transcription”), and mention of a “weird organism”, which we basically defined as anything that is not human or mouse. Check it out:



Hard to say much about declarativeness (declariciousness?), not much data there. Specificity is similarly undersampled, but perhaps there is some tendency for medium impact titles to contain more specific information than the others? Weird organism, however, really showed an effect. Basically, if you want people to think you wrote a low impact paper, put axolotl or something in the title. Notably, for each of the high impact journals, one of the two titles was perceived as high impact and one as low, and this “weird organism” metric explained that split completely. The exception, of course, is CRISPR: despite the weird organism, the CRISPR-in-C. elegans paper was among the highest perceived of the low impact set. Note that we also included E. coli as “weird”, although we probably should not have.
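In code terms, the “weird organism” comparison is just a group mean. Here is a sketch on a few hand-labeled rows from the table above (the short names and labels are ours, applied with the not-human-or-mouse rule):

    import pandas as pd

    # A few rows from the table above, hand-labeled with the
    # "weird organism = anything not human or mouse" rule.
    df = pd.DataFrame({
        "short_name":     ["splicing code", "Cas9 mouse", "ferret genome",
                           "CRISPR C. elegans", "Bactrocera", "Cyanothece"],
        "perception":     [2.55, 2.43, 1.88, 2.25, 1.25, 1.25],
        "weird_organism": [False, False, True, True, True, True],
    })

    # Mean perception score with and without a weird organism in the title
    print(df.groupby("weird_organism")["perception"].agg(["mean", "count"]))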

We then wondered: does this perception even matter? Does it have any bearing on citations? So many confounders here, but take a look:


First off, where you publish is clearly strongly associated with citations, regardless of how your title is perceived. Beyond that, it was murky. Of the high impact titles, the ones with a high perception score were definitely cited more, but the n is small there, and the effect is not there for the medium and low impact titles. So who knows.
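If you want to redo this comparison with the linked data, it boils down to a groupby on actual tier and perceived tier; a sketch, with placeholder citation counts rather than the real numbers:

    import pandas as pd

    # Placeholder frame: swap in the real per-paper citation counts from the
    # linked data before drawing any conclusions.
    df = pd.DataFrame({
        "actual_tier":    ["high", "high", "medium", "medium", "low", "low"],
        "perceived_tier": ["high", "low",  "high",   "low",    "high", "low"],
        "citations":      [0, 0, 0, 0, 0, 0],  # placeholders, not real counts
    })

    print(df.groupby(["actual_tier", "perceived_tier"])["citations"].median())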

Discussion:
Our conclusion seems to be that mid-tier journals publish things that sound like they should be in mid-tier journals, perhaps because their titles are more specific. Flashy and non-flashy papers (as judged by actual impact factor) both seem to be playing the same hype game, and some of them screw up by talking about a weird organism.

Anyway, before reading too much into any of this, like we said in the methods section, there are lots of problems with this whole thing. First off, we are vastly underpowered: the total of 18 titles is nowhere near enough to get any real picture of anything but the grossest of trends. It would have been better to have a large number of titles and have the questionnaire randomly select 18 of them, but if we didn’t get enough responses, then we would not have had very good sampling for any particular title. Also, it would have been interesting to have more titles per journal, but we instead opted for more journals just to give a bit more breadth in that respect. Oh well. Some folks also mentioned that 8 is a pretty aggressive cutoff for “low impact”, and that’s probably true. Perception of a journal’s importance and quality is not completely tied to its numerical impact factor, but we think the particular journals we chose would be pretty commonly associated with the tiers of high, medium and low. With all these caveats, should we have given our blog post the more accurate and specific title “Results from the Guess the Impact Factor Challenge in the genomicsy/methodsy subcategory of molecular biology from late 2014/early 2015”? Nah, too boring, who would read that? ;)

We think one very important thing to keep in mind is that what we measured is perceived impact factor. This is most certainly not the same thing as perceived importance. Indeed, we’re guessing that many of you played this game with your cynic hat on, rolling your eyes at obviously “high impact” papers that are probably overhyped, while in the back of your mind remembering key papers in low impact journals. That said, we think there’s probably at least some correspondence between a seemingly high profile title and whether people will click on it—let’s face it, we’re all a bit shallow sometimes. Both of these factors are probably at play in most of us, making it hard to decipher exactly how people made the judgements they did.

The question is: what, if anything, should we do in light of this? A desire to “do” something implies that there is some form of systematic injustice that we could either try to fix or, conversely, try to profit from. As to the former, one could argue that the current journal system (which we are most definitely not fans of, to be clear) may provide some role here in “mixing things up”. Since papers in medium and high impact journals get more visibility than those in low impact journals, our results suggest that high impact journals can give exposure to poorly (or should we say specifically and informatively?) titled papers, potentially giving them a citation boost and providing some opportunity for exposure that may not otherwise exist, however flawed the system may be. We think it’s possible that the move to preprints may eliminate that “mixing-things-up” factor and thus increase the incentive to pick the flashiest (and potentially least informative) title possible. After all, let’s say we lived in a fully preprint-based publishing world. Then how would you know what to look at? One obviously dominant factor is who the authors are, but let’s set that aside for now. Beyond that, one other possibility is to try to increase whatever we are measuring with the perception score. So perhaps everyone will be writing like that one guy in our field with the crazy bombastic titles (you know who I mean) and nobody will be writing about how “Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria” any more. Hmm. Perhaps science Twitter will basically accomplish the same thing once it recovers from this whole Trump thing, who knows.

Perhaps one other lesson from all of this is that science is full of bright and talented people doing pretty amazing work, and not everybody will get the recognition they feel they deserve, though our results suggest that it is possible to manipulate at least the initial perception of our work somewhat. A different question is whether we should care about such manipulations. It is simplistic to say that we should all just do the work we love and not worry about getting recognition and other outward trappings of success. At the same time, it is overly cynical to say that it’s all just a rat race and that nobody cares about the joy of scientific discovery anymore. Maybe happiness is realizing that we are most accurately characterized by living somewhere in the middle… :)

9 comments:

  1. Very interesting! (Despite the methodology shortcomings). Would love to know how these metrics correlate with "author perceived impact factor" - that is, how good the authors thought the paper was at time of submission. The thought being that we sometimes submit to "reach" journals and sometimes we just want something published already and may send it somewhere "less" than what we think it could do. All of these factors are difficult to deconvolute.

    Replies
    1. That would be cool! Perception of one's own papers is definitely an interesting (and probably highly variable) concept… :)

    2. An inquiry on "author perceived impact" has recently been published by Radicchi et al. as part of the preprint "Quantifying perceived impact of scientific publications" https://arxiv.org/pdf/1612.03962.pdf

  2. Dr. Arjun, in the discussion section, you and Uschi considered the likely outcomes of a preprint-based future. It is possible to quantify any coverage bias for preprints from certain authors/groups by studying altmetric scores for preprints and their corresponding peer-reviewed journal publications. The existence of such bias could be examined by checking whether those authors/groups adopted preprints well before they became cool and hence acquired recognition and benefitted from their early outreach.

    We are entering a phase where institutions and funding agencies have recognized the importance of promoting preprints and are exploring preprint servers of their own. When the NIH recently proposed to have a central preprint server, many scientists debated the move on Twitter, arguing that one could instead improve the existing bioRxiv, which is working nicely. People have already started comparing the pros and cons of posting manuscripts on the bioRxiv and PeerJ servers. If multiple preprint sources crop up and vie for coverage, the issues plaguing the current system based on journal impact factor may shift to preprints, pushing us back to square one.

    But fortunately, we're not there yet, as the preprint-based system is still in its infancy, giving us time to prime it with the right tools to face the challenges the scientific community will put it through. One way of getting there will be to track which journals the preprints finally end up in, and in what form. While bioRxiv has added a feature to display an article's publication status, peer-reviewed journals have yet to reference the corresponding preprint (DOI, date, server) in the published version of the paper.

    In the near future, by comparing the coverage a given article received in its two states, preprint and peer-reviewed, we may be able to find out how much a preprint's coverage and perceived potential impact influenced its acceptance in a journal of a certain impact factor. The onus will be on the hosts of preprint servers to provide more visibility to preprint articles that adhere to emerging reproducibility principles on sharing data and code. This can be accomplished in part by displaying standardised markers on a given article's landing page under various criteria.

    Finally, how we utilize the above features will determine whether we accomplish the desired coverage for a deserving article irrespective of its lab of origin.

    Replies
    1. Correction/Update:

      From Jordan Anaya's blog post [1], I learned that it was ASAPbio [2], not the NIH, that has proposed a Central Service that indexes preprint servers. Like many on Twitter, I too got the wrong impression that it was proposed as a replacement for bioRxiv. But I think the argument made in the above context will still be valid as the impact of multiple preprint servers will be known only in the near future.

      [1] https://medium.com/@OmnesRes/my-concerns-regarding-the-asapbio-central-service-and-center-for-open-science-5c2f0d2dfca#.gvavbs9ll

      [2] http://asapbio.org/benefits-of-a-cs

  3. Fun stuff. I wonder what the results would look like if you excluded the Scientific Reports papers... Scientific Reports papers were likely submitted to a high impact journal, rejected and then published without changes in Scientific Reports. Their titles are likely written with high impact journals in mind.

    Replies
    1. Very interesting point! Although a paper that could plausibly be submitted to a super high impact journal should more likely end up in a mid-tier journal, no? Not sure that Scientific Reports is necessarily that different than the other "low-impact" journals either, but it certainly is true that the ability to transfer there probably does drive at least some titles down there. Specifically, I wonder if the two CRISPR papers in Scientific Reports made it there from Nature Methods or Nature Communications…

  4. Two more points:
    1/ The experience of the person who answers may be important: I'm a PhD student who answered, and I expect my guesses were probably not as good as those of a 50-year-old researcher who has seen it all. Similarly, if half your answers came from undergrads who didn't understand half the words in the title but happened to find your test online, it could have skewed the results.

    2/ These guesses are also probably very subfield-dependent. I hope that in my own subfield, where I know more precisely what is already known and what the current hot issues are, I would make better guesses. And this may determine which papers I'm more likely to read (and cite). Several of the titles you proposed were simply outside my area of expertise, making my answers quite random (but does that matter, as I'm not likely to ever read or cite them anyway?).

    Replies
    1. Very good points! It's very possible that experience plays a big role in perception, and we probably should have asked about that in the survey. I also wonder if editors did better.

      As for the subfield, that's of course a good point. One point of "high impact" journals is to promote work across fields, so perhaps it's still a worthy experiment in that regard.
