Tuesday, April 1, 2014

Hypotheses and breadth vs. depth first searching

Given the avalanche of data out there, there is a notion that one can do "hypothesis-free" research, in which scientific findings arise out of sifting through large amounts of data for little nuggets. Indeed, on the face of it, this seems like a very efficient way to do science, because you don't have to collect new data.

Then I was reminded of an optimization problem I had to do once. It involved fitting parameters by maximum likelihood: basically, you have a function of a few variables (say, the negative log-likelihood) and you try to find its minimum. Now, superficially, you might expect that the best way to solve this problem, especially if you have multiple data sets, would be to precompute the function over a big range of parameter values, saving you the time of recomputing it over and over again. However, even for just a few parameters, it turns out that the optimization approach is more efficient: you would have to precompute an enormous number of function values, whereas solving the optimization problem requires comparatively little work because it converges quickly to the right answer.
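To make the comparison concrete, here is a minimal sketch in Python. It is not the original problem: the normal model, the toy data, the grid ranges, and the simple compass-style local search (`pattern_search`) are all assumptions chosen for illustration. It counts how many times the negative log-likelihood gets evaluated when you precompute it on a grid versus when you just optimize.

```python
import itertools
import math


def make_counted_nll(data):
    """Negative log-likelihood of a normal model (constants dropped), plus a call counter."""
    calls = {"n": 0}

    def nll(params):
        mu, log_sigma = params
        calls["n"] += 1
        sigma = math.exp(log_sigma)
        return sum(0.5 * ((x - mu) / sigma) ** 2 + log_sigma for x in data)

    return nll, calls


def grid_search(f, axes):
    """Precomputation: evaluate f at every grid point and keep the best one."""
    return min(itertools.product(*axes), key=f)


def pattern_search(f, start, step=1.0, tol=1e-3):
    """Compass search: try +/- step along each axis, halve the step when stuck."""
    best = list(start)
    best_val = f(best)
    while step > tol:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = f(trial)
                if val < best_val:
                    best, best_val = trial, val
                    improved = True
        if not improved:
            step /= 2
    return best


# Toy data (made up for this illustration).
data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]

# Precompute on a grid: mu in [0, 10], log_sigma in [-3, 3], spacing 0.02.
mu_axis = [i * 0.02 for i in range(501)]
log_sigma_axis = [-3 + i * 0.02 for i in range(301)]
nll, calls = make_counted_nll(data)
grid_best = grid_search(nll, [mu_axis, log_sigma_axis])
grid_calls = calls["n"]

# Optimize directly from a deliberately poor starting guess.
nll, calls = make_counted_nll(data)
opt_best = pattern_search(nll, start=[0.0, 0.0])
opt_calls = calls["n"]

print("grid:     ", grid_best, "after", grid_calls, "evaluations")
print("optimizer:", opt_best, "after", opt_calls, "evaluations")
```

Both approaches land on essentially the same estimate, but the grid needs one evaluation per grid point, and that count multiplies with every parameter you add. The local search only evaluates points along its path to the minimum.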

In some ways, having a scientific question and then trying to answer it is like running an optimization through the space of experiments (depth first search), whereas the hypothesis-free approach is like precomputation (breadth first search). The problem with precomputation is that it is inefficient: you precompute many things you don't need (analogy: you make measurements you don't need), and you are probably still missing the refined data you do need to converge to the exact answer (analogy: you are missing measurements you do need because nobody thought they were specifically necessary). Indeed, I've often found that there is some data set out there that sort of gives us what we need, but when it comes down to it, we have to collect the data ourselves because we have something very specific in mind, and it's critical to get exactly that.

Then again, local optimization means that you don't learn what the entire function looks like. Continuing the analogy, that's like not getting the big picture. And I'm certainly not saying hypothesis-driven research is the best way or even a better way, mostly for the simple fact that most hypotheses in biology are wrong. Honestly, I'm not even 100% sure what hypothesis-driven research really means. But when it comes to the most efficient way to learn something in science, I'm not sure the answer is as clear-cut as it may seem...
