TL;DR: Projects are not entirely good or bad on their own. They have to match the person doing them: you! Be honest with yourself about what your strengths and passions are. Choose a project that is fundamentally aligned with those strengths. Do NOT choose a project that relies heavily on things you are not intrinsically motivated to do. You may be tempted to pick a project to shore up your weaknesses, but don’t. Any project will have aspects that require you to work on your weaknesses, but a project that is fundamentally aligned with your weaknesses is going to be an exercise in misery.
One of the most common questions I get from new students is how to choose a scientific project. It’s clearly a super important part of the scientific process, but one that has had a somewhat magical quality to it, as though there is some wand that one waves over a set of Eppendorf tubes to turn them into a preprint that everyone wants to read. Of course, many scientists have some introspection and insight into their thought processes, and while that has largely been passed on by word of mouth, there have been some wonderful recent efforts to describe project ideation (creativity), selection, and execution (see work from Itai Yanai/Martin Lercher, Uri Alon, Michael Fischbach, and probably several others I’m missing, with apologies).
But I feel like a lot of this discussion has missed one critical feature: you. As in you, the one actually doing the project. Everyone has different strengths and weaknesses as a scientist, or more relevantly, passions and aversions. In my experience, which I’m sure many have shared, it’s the match between the project and the scientist that matters far more than the project on its own.
Why does it matter so much? Here’s my theory. Academic research is a highly unstructured work environment. It is hard to quantify, on a daily, weekly, or even monthly basis, exactly what constitutes “progress”. As such, it relies very strongly on intrinsic motivation. As in, you really have to want to do something in order to put in the sustained effort required to actually do it, because nobody on the outside can easily measure your progress and use that to push you to do things you don’t want to do. It is possible to force yourself to do things you don’t want to do in the short term, but if you are not fundamentally excited to do something, it is very hard to keep yourself motivated to do it in the medium-to-long term.
What does this mean in practice? I think it’s easier to see how it plays out by looking at common failure modes in person-project matching. One common pattern I’ve seen is that people feel like they need to build experimental skills even though they are fundamentally more interested in computational work, so they choose a project that has a significant experimental component. Then what happens is some version of the following: “I could do this experiment today, but it’s Thursday at 4pm. I’ll do it tomorrow. Oh wait, it’s Friday, now I should probably wait until Monday” and next thing you know a month goes by and the experiment still hasn’t gotten done. But if you take the same person and give them a dataset, they’re like “I just need this analysis to finish running by 4pm, then I can run the next step, oh wouldn’t it be cool if XYZ were true, hold on let me try this…”. The same sort of thing happens the other way around, when someone who loves experiments takes on a heavily computational project. It’s hard to ascribe these delays or accelerations to any one particular decision, but in aggregate, they have an enormous compounding effect.
By the way, this doesn’t mean that you shouldn’t try things, especially early on. I worked in a lab for a summer after my first year of math grad school basically as an exercise in getting some exposure to experimental work, even though I thought I’d never EVER do it for my actual thesis work. Turns out I had a true passion for experiments. Been trying to lean into that ever since! But you have to continuously evaluate and be brutally honest with yourself about whether you’re doing what you’re doing because you really like it or because you think you should like it. I’ve found graduate students often get caught in the trap of working on what they think they should like instead of what they actually like.
This same reasoning affects choice of advisor, both graduate and postdoctoral, especially the latter. Pick an advisor who can help you build on your strengths, and not someone who specializes in your weaknesses. This is not to say that you can’t have complementary skills—especially for postdocs, it is often very fruitful to combine your skills from your PhD with a set of techniques in the postdoc lab. But if you join a lab where the advisor is a skilled computationalist but you want to do some cutting edge experiments, it must be done with a lot of care. You want to be sure the rest of the environment is strong, because it will be difficult for your advisor to guide you to innovate at the edge of the field given their own strengths and weaknesses. Not to say it can’t be done, but just that it should be done very carefully.
Anyway, all that to say, when choosing a project, make sure it matches your intrinsic strengths and motivations. Research is already hard enough, so work on things you like to do!
Wednesday, June 5, 2024
Monday, February 5, 2024
Pre-registration in molecular biology
A few years back, perhaps in pre-pandy times, I was on a faculty development panel in which I was one of two presenters. I was of course there to present on how to use Twitter to build your brand (sigh, I’m lame), and a more senior faculty member (I think a neuroscientist) was there to talk about pre-registration in lab work. He was very kind and wise-seeming, and explained how his lab had been pre-registering their work for a while, and how it had transformed what they did.
What is pre-registration? It’s probably most familiar to you in the form of clinical studies, where there was a notorious selection bias in which results would be reported. Like, does drinking coffee cause flatulence? One would have to do a randomized controlled trial to check. But if people did, say, 100 clinical trials and only reported the ones where there was a “positive” result, then even if coffee had no effect at all, you would expect to see about 5 clinical trials with p < 0.05 showing that coffee causes flatulence, and none of the contradictory results. So now you have to pre-register a trial, meaning that you have to say up front, I am going to do this trial with this power and whatnot, and then you are obligated to report the outcome, no matter what the outcome is. A great idea!
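To make the arithmetic in the coffee example concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the function names, the 30 people per arm, the normally distributed outcome, and the simple z-test are all made up, and no real trial is being modeled. Each simulated trial compares a "coffee" arm and a "control" arm drawn from the same distribution, so every "positive" result is a false positive.

```python
import random
from statistics import NormalDist, mean


def null_trial_p_value(rng, n_per_arm=30):
    """Simulate one two-arm trial in which coffee has NO real effect.

    Returns the two-sided p-value of a z-test on the difference in means.
    The outcome scale (mean 0, standard deviation 1) is an arbitrary assumption.
    """
    coffee = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    # Standard error of the difference of two means, each arm having variance 1.
    standard_error = (2.0 / n_per_arm) ** 0.5
    z = (mean(coffee) - mean(control)) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_positive_trials(n_trials=100, alpha=0.05, seed=0):
    """Count how many of n_trials null trials come out 'positive' (p < alpha)."""
    rng = random.Random(seed)
    return sum(null_trial_p_value(rng) < alpha for _ in range(n_trials))


def average_positive_trials(n_repeats=100, n_trials=100, alpha=0.05):
    """Average the 'positive' count over many repeats of the 100-trial scenario."""
    counts = [count_positive_trials(n_trials, alpha, seed) for seed in range(n_repeats)]
    return mean(counts)


if __name__ == "__main__":
    print("positives in one batch of 100 null trials:", count_positive_trials())
    print("average over repeated batches:", average_positive_trials())
```

Any single batch of 100 trials will bounce around, but the average should sit near 100 × 0.05 = 5. If only those trials get written up, the literature shows a handful of significant results and nothing else.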
But here was someone advocating for pre-registration much closer to home, in our day-to-day lab work. I remember being vehemently and vocally opposed. Sure, clinical trials are one thing, with a clearly stated hypothesis and major resources devoted to a single experiment. But in my line of work, where we are constantly trying new experiments and checking out new avenues of work, where there are tons of false leads and new directions? How could that possibly work without gumming up the works in needless bureaucracy? I said as much, to which the senior faculty member just patiently and calmly responded “Sure, I hear you, just think about it”.
Ever since, I keep coming back to that moment, and it has come to have a major effect on how I approach our science—and especially our reporting of it. The key take home point is: if you did an experiment to answer a question, and you don’t have any reason to exclude it based on the experiment itself, then you have to report the results. Repeat: unless there is an independent basis for the exclusion of a result, you have to report the results. Or, to put it another way: if you would have included the data if the result had come out the other way, you have to report it.
Selective reporting of data is a strange issue in molecular biology in that almost everyone agrees that it is wrong and yet the overall culture of the field leans towards selective reporting in so many ways. Here is an example from our own work. In a recent paper, we were trying to confirm the knockdown of a particular protein. We were able to show a convincing knockdown by RNA FISH, but also wanted to show that the protein levels went down. We did a bunch of westerns, but the results came out ambiguously: sometimes we saw an effect and sometimes not (there are reasons that could be the case, but we didn’t confirm them because the experiments needed to do so would have been very difficult). The standard thing to do here would be to not report the western results. But there was no reason to exclude the experiment other than being annoyed with the results. So, we reported it.
But again, the cultural standard in molecular biology is often not to report such ambiguous results. I saw this mindset a lot early in my career, back when RNA FISH was considered cool and people wanted our help to add some RNA FISH to their paper to spice it up. There were several times when people came to us with data in support of a, shall we say… “fanciful” hypothesis, and then we would do the RNA FISH, which would basically show the hypothesis was wrong. At which point, the would-be collaborator would beg out, saying that given the “ambiguous” nature of the RNA FISH results, “perhaps we should save the data for the next paper” (which of course never materialized). After enough of these moments, I started asking potential collaborators what stage of their paper they were at, and if they were close to the end, whether they really wanted us to do this experiment. At least one time, when faced with this choice, the person said, uhhh, let’s not!
There have also been many times when we’ve tried following up on work where we are pretty sure there has been a lot of selective reporting of positive results. Let’s just say that that is an unpleasant realization to make.
I want to emphasize that I don’t think that people are being malicious or fraudulent in their work. I think the vast majority of scientists are honest people and are not trying to do something wrong. I just think that science would benefit from more transparent reporting of results, because it is sometimes the data that doesn’t fit the narrative that leads to something new in the future. I also don’t necessarily think we need to formally pre-register our work, although it might be an interesting experiment to try. We should just try to shift our culture a bit towards transparent reporting.
One potential challenge in doing science this way is that our stories are a lot less likely to be “perfect”. There will almost always be some bits of conflicting evidence, and given our adversarial peer review system, there is seemingly a lot of pressure to keep these conflicting results out. Or is there? We have been doing this for quite a while, and I would say that our experience has been largely fine in the sense that reviewers don’t mind as long as you are transparent about it. I say “largely” because there have definitely been cases in which reviewers pointed out some issue that we were transparent about and rejected our paper because of it. So at least in my experience, I would say that adopting this more transparent reporting of results is not entirely without consequence. All I can say is that if we do decide to make this cultural shift, we also have to be more tolerant of imperfections in the “story” when we put our reviewer hats on.
By the way, I think a lot of people tend to think of selective reporting as a problem of experimental science. Not at all the case! The same goes for any analysis of, e.g., some large-scale dataset: if you checked for some signal in the data, you have to report the result, regardless of whether the result came out the way you wanted. If anything, it’s even more of an issue in computational work in some ways, because many hypotheses can be tested with the same data in (relatively) rapid fashion.
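As a rough illustration of why rapid-fire hypothesis testing on one dataset makes selective reporting so tempting, the sketch below computes the chance of getting at least one "significant" result when none of the tested signals is real. The function name is made up for this example, and the formula assumes the tests are independent, which tests run on the same dataset usually are not, so treat the numbers as a ballpark rather than a precise rate.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests gives p < alpha by chance alone.

    Assumes every tested signal is truly absent and that the tests are
    independent (a simplifying assumption for illustration).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n_tests in (1, 5, 20, 100):
        print(n_tests, "tests:", round(chance_of_false_positive(n_tests), 3))
```

Under these assumptions, a single test carries a 5% chance of a spurious hit, while twenty quick checks of the same dataset already make a spurious hit more likely than not. Reporting every check you ran is what lets a reader see that.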
There is also a bit of a gray area in terms of what to do about false leads. Sometimes, you have an idea that goes in a new direction that has nothing to do with the story of the paper. I don’t know what to do in this case. Certainly, science would be in some ways better for having these results out there, since there was probably (hopefully?) some basis for the experiment or analysis in the first place. But it may just serve to distract from the main thread of the paper, making it harder to follow. I don’t know how best to balance these competing and important principles, but I think it’s an important discussion for us to have.
I’m very curious how people will respond to this discussion. Ultimately, there is no form or checklist that can solve the issues we have in science. Pre-registration sounds like a bureaucratic solution, but in the end, it’s just a call for careful, honest thought about the work we do. I’m sure some people reading this will have a strongly negative reaction, much like I did at first. All I’m saying is “Sure, I hear you, just think about it.” 🙂