Friday, June 7, 2013

The death of annotation

So by now even my mom knows that we live in the world of "Big Data".  At first glance, it seems that the only way to make sense of it all is to tag it, folderize it, database it, or otherwise organize it somehow.  The big problem: we're making data too rapidly to tag, and nobody has the energy to properly tag things.  Nor do we have a systematic way to tag.  That's how you end up with the gigantic mess that is GO (Gene Ontology).

I think the solution is to just not worry about it.  I think the solution is search, à la Google et al.  Just sit back and let the computers do all the organizing for us.  It's probably the only way to make sure everything is tagged uniformly and appropriately, but even more importantly, computers won't get bored of doing it.  And so it might actually work.  A great example is Evernote, which now has automatic tagging.  Now my notes are actually organized!  Before, the note I was searching for would never be in the category I was looking in, because I had somehow forgotten to tag that particular note.

Actually, the best thing about search is that it's sort of like the ultimate in tagging: it's like having a huge number of tags, all customized and weighted for every document.  A prime example is Google Docs, where you can just store all your stuff in one huge basket and then search for it at will.  This makes it so much easier to actually get to what you want, and get there fast!  It works because Google search is so good.  I really feel like Google's search prowess is a huge competitive advantage in virtually everything they do.
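To make the "weighted tags" idea a bit more concrete, here is a minimal sketch in Python using TF-IDF, a textbook way of weighting words per document. This is only an illustration of the idea, not how Google or Evernote actually work; the function names (`tokenize`, `tfidf_tags`, `search`) and the sample notes are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_tags(docs):
    """For each document, return a dict of word -> TF-IDF weight.

    Every word acts like an automatic "tag"; words that are frequent in
    one document but rare across the collection get the highest weight.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    weights = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        total = len(words)
        weights[name] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return weights


def search(query, weights):
    """Rank documents by the summed weight of the query words they contain."""
    terms = tokenize(query)
    scores = {
        name: sum(tags.get(term, 0.0) for term in terms)
        for name, tags in weights.items()
    }
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: scores[name], reverse=True)


# Hypothetical notes dumped into one big basket, with no manual tags.
notes = {
    "lab": "rna sequencing protocol notes",
    "shopping": "grocery list milk eggs",
    "budget": "sequencing budget notes",
}
weights = tfidf_tags(notes)
results = search("rna sequencing", weights)  # "lab" ranks ahead of "budget"
```

Nobody tagged anything here: the weights fall out of the text itself, which is the whole appeal.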

The same sort of thinking also applies to research areas like bioinformatics.  I was lamenting to a couple of people recently about how everyone stores their data for high-throughput stuff in a different way, and it makes it so hard to compare datasets and to build a good story, etc.  It's a common complaint, and most would agree that it's a woeful state of affairs.  Two of us (including me) tried to think up some answers on how to organize things.  The other person's response: "Well, another thing we could do is not care".  I thought about it some more, and I'm actually convinced that's the right answer!  Over time, computers will be able to help us analyze and compare these datasets without our having to write those endless little Perl scripts to convert this type of format to that type of format and the like.  And they will do it so much faster and better than we can.  As humans, we are good at making messes, and it's a losing battle to try and have lots of people clean up after themselves.  The right move is to let the computers clean up the mess for us.  Like a Roomba.  Which I really want to get one day.  Or this robot that folds towels:
