Well, actually, before starting to think about enforcement, I think it’s worth making sure that whatever scheme you put in place has actual, real benefits to people in the lab. I’ve come to realize that process, while it can enable science, is not science in and of itself, and it’s not always worth the effort. It’s a fine line, and perhaps somewhat a matter of personal taste; I think some folks are just fussier about stuff than others.
So what are the benefits? For our lab, I feel like there are three main benefits to building process infrastructure:
- Error reduction: To me, the most useful benefit to having a standardized and robust data pipeline is that it can greatly reduce errors. The consequences of mixing up your datasets or applying the wrong algorithm can be absolutely devastating in a number of ways.
- Reproducibility/documentation: For data, I feel, as do many others, that it is imperative to be able to reliably (and understandably) reproduce the graphs and figures in your paper from your raw data. Frankly, in this day and age, there’s no excuse not to be able to do this. Documentation is just as important for other things we do in lab, whether it’s how we designed a particular probe or what the part number is for some kit we ordered 3 years ago that is about to run out.
- Saving people time and facilitating their work: Good infrastructure can save time in a number of ways. Firstly, it hopefully leads to less wheel-reinvention, which I’ve seen all the time in other labs. Another way it saves time is by (hopefully) leaving a data trail; i.e., “That data point looks funny, can you show me the image it came from?” Good infrastructure makes it easy to answer that question, and makes it much easier to explore your data in general. If getting answers is easier, you will ask more questions, which is always a good thing.
But point 3, saving time and facilitating work, is something everyone can get behind without any prodding. And then there's never any issue of compliance. For instance, our software provides all the backend to make sure that our data is fully traceable from a funny outlier data point to the raw images of a particular cell. But it also provides all the tools to analyze data and use the latest tricks and tools for image analysis that we have developed in the lab. For this reason, it's essentially inconceivable that anyone would spend time writing their own software or doing anything else; the benefits are big and, importantly, immediately realizable.
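As a toy illustration of the kind of data trail described above (the names and structure here are hypothetical, not our lab's actual software), the core idea is just that every analysis result carries a pointer back to the raw image it came from:

```python
# Hypothetical sketch: each data point keeps a link to its source image,
# so a funny-looking outlier can be traced back instantly.
from dataclasses import dataclass

@dataclass
class DataPoint:
    value: float
    cell_id: str
    raw_image_path: str  # pointer back to the raw image


def trace_outliers(points, threshold):
    """Return the raw-image paths behind any suspicious values."""
    return [p.raw_image_path for p in points if p.value > threshold]


points = [
    DataPoint(1.2, "cell_001", "raw/exp42/cell_001.tif"),
    DataPoint(9.7, "cell_002", "raw/exp42/cell_002.tif"),  # looks funny
]
print(trace_outliers(points, threshold=5.0))
# -> ['raw/exp42/cell_002.tif']
```

The point isn't the code itself, which is trivial; it's that if the infrastructure records this link for you automatically, answering "can you show me the image it came from?" becomes a one-liner instead of an afternoon of detective work.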
So what I’m thinking is that we somehow have to structure all the boring lab documentation tasks so that there is some immediate gratification for doing so. What can that be? I’m not sure. But here’s an example from the lab. We’re working on having our probe database automatically generate identifiers and little labels that we can print out and stick on the tube. Not a huge deal, but it’s sort of fun and certainly convenient. And it’s something you can enjoy right away and only get if you use the probe database. So I’m hoping that will drive the use of the database. A more ambitious plan is to develop similar databases for experiments and their resulting datasets that would enable automatic data loading. This would not only be important for reproducibility, but would also be enormously convenient, so I’m hoping people in the lab will be excited to give it a whirl.
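A minimal sketch of the identifier-and-label idea (all names and formats here are made up for illustration; the real database is more involved):

```python
# Hypothetical sketch: the probe database assigns sequential IDs and
# renders a short label string suitable for a small tube sticker.
class ProbeDB:
    def __init__(self):
        self._probes = []

    def register(self, name, target):
        """Add a probe and return its auto-generated identifier."""
        probe_id = f"PRB-{len(self._probes) + 1:04d}"
        self._probes.append({"id": probe_id, "name": name, "target": target})
        return probe_id

    def label(self, probe_id):
        """Render a compact label for printing on a tube sticker."""
        probe = next(p for p in self._probes if p["id"] == probe_id)
        # Tube stickers are tiny, so truncate long probe names.
        return f"{probe['id']} | {probe['name'][:12]}"


db = ProbeDB()
pid = db.register("GAPDH-probe", target="GAPDH mRNA")
print(db.label(pid))  # -> PRB-0001 | GAPDH-probe
```

The immediate-gratification hook is that the only way to get the printable label is to register the probe, so good record-keeping happens as a side effect of doing the convenient thing.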
Anybody else have any thoughts about how to encourage people to participate in lab best practices?