Friday, August 1, 2014

How to write fast

Being a scientist means being able to write effectively about your science: papers, grants, e-mails, reviews, blogs, twitters, facebooks, whatever. And being an efficient scientist means being able to write about your science both effectively and fast. Striking this balance is a struggle for most people, and solutions are likely highly personal, but here are a few things I’ve found have worked for me (more interesting/less generic ones towards the end):
  1. Deadlines are your friend. Wait until the last minute and write in a big spurt. I personally feel that the last 10% takes way more than 10% of the time, but actually makes much less than 10% difference in the final outcome (grant, paper, etc.). Being up against a deadline is unpleasant, but cuts down on this relatively low-payoff time.
  2. If you do have to write early for whatever reason, set an artificial early deadline and try to finish it by then as though it were a hard deadline. This has another bonus…
  3. … which is to put the piece of writing away for a week and not think about it, then come back to it. This distance gives you a sufficiently long break that editing your own writing will be much more effective and efficient than if you just edit it continuously.
  4. Don’t be afraid of the blank page. For me, the blank page is a period of reflection and thought. Often, I will look at a blank page for a week, during which time I’ve really thought about what I wanted to say, at which point it all comes out very quickly and relatively coherently. Whenever I force myself to write before I'm ready, I just end up rewriting it anyway.
  5. If you’re having a hard time explaining something in writing, just try to explain it to someone verbally. For me, this really helps clarify exactly what I’m trying to say. Then just write that down and see what happens. It’s much faster than struggling endlessly with that one troublesome sentence.
  6. Don’t worry about word limits while you’re writing. I’ve found that writing with the word limit in mind makes my writing very confusing and overly compressed, because I try to squeeze too many thoughts into as few words as possible. I find it’s more efficient to just write what I want to say as clearly as possible and then come back and cut as necessary. When you do cut, be brutal about trimming and don’t look back.
  7. Watch out for “track changes wars”. If you’re writing with other people (and who isn’t these days?), there is a natural tendency to push back against other people’s edits, which can lead to a lot of back and forth about minor points. One way to handle this is to just accept all the changes in the document and read it clean. If whatever was changed is a real problem, it will still stand out.
  8. Learn the “templates” for scientific writing. Most scientific writing has a particular form to it, and once you learn that, it makes for easy formulas for getting ideas out of your mind and onto the page. These templates vary from format to format. For instance, in a paper, often the results section will go something like “Our findings suggested that X. For X to be true, we reasoned that Y could be either A or B. In order to test for Y, we performed qPCR on…” Rinse and repeat. If you find it sounding repetitive, just use your thesaurus, and learn the 3-4 common variants for the given sentiment (e.g., “we reasoned”, “we hypothesized”, “we considered whether”) and cycle through them. It’s all rather prosaic, but it will get words on the page. You can channel your inner Shakespeare in revision. Same thing for grants.
  9. Regarding templates for grants, I have found it much easier to work from someone else’s grant. Many grant formats give only a vague outline of the overall structure, so ask a friend for theirs and try to stick with it. It will save you hours of wondering whether this or that structure or style can be funded. Which reminds me: be sure to ask people who, you know, actually got the grant… :)
  10. Some people really like writing out an outline of the whole thing first. I’ve never really been able to get into that myself. But a few times lately when I’ve really been up against a deadline, I tried what I can perhaps best call a “short form temporary outline”. The idea is that I have to write a paragraph, and it has to say 4 things. Write out a very quick outline just below the cursor with bullet points of these 4 things in a reasonable order. This should just take a couple minutes. Then, well, just start writing them out. If a thought comes to you while writing, just add it to the outline so you remember. It’s sort of like a to-do list for the paragraph. I’ve found this made writing faster because I didn’t feel like I had to try to remember a lot of stuff in my head, thus freeing my mind to just write. Next paragraph, next outline.
Oh, and avoid passive voice. The question of how to reduce the crushing writing load we all face to begin with is perhaps a topic for another blog post... :)

Wednesday, July 23, 2014

The hazards of commenting code

- Gautham

It is commonly thought that good code should be thoroughly commented. In fact, this is the opposite of good practice. A good coding strategy is one that does not let the programmer lean on comments as a crutch: programs should be legible on their own.

Here are the most common scenarios (a short sketch illustrating them follows this list):


  • Bad. The comment is understandable and it precedes an un-understandable piece of code. When the maintainer of the code goes through this, they still have to do a lot of work to figure out how to change the code, or to figure out where the bug might be.
  • Better. The comment is understandable, and the line of code is also understandable. Now you are making the reader read the same thing twice, and you’re diluting the code into a sea of words.
  • Best. There is no comment. Only an understandable piece of code due to good naming, good abstractions, and a solid design. Good job!
  • Terrible. The comment is understandable. The code it describes does not do what the comment says. The bug hides in here. The maintainer has to read every piece of your un-understandable code because they have realized they can't trust your comments, which they shouldn't anyway. And so all your commenting effort was for nothing. This scenario is surprisingly common. 
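To make the four scenarios concrete, here is a minimal sketch in Python; the snippet and all its names are invented for illustration and are not taken from any real codebase:

    # Toy values so the snippet runs on its own.
    value, mean, std_dev = 7.0, 5.0, 2.0
    x, m, s = value, mean, std_dev

    # Bad: the comment is clear, but the code it describes is not.
    # normalize the measurement
    v = (x - m) / s if s else x - m

    # Better: the comment and the code are both clear, but the reader now reads the same thing twice.
    # compute the z-score of the measurement
    z_score = (value - mean) / std_dev

    # Best: no comment; good naming carries the meaning on its own.
    def z_score_of(measurement, mean, std_dev):
        return (measurement - mean) / std_dev

    # Terrible: the comment and the code disagree, so the maintainer can trust neither.
    # returns the z-score of the measurement
    def normalized(measurement, mean, std_dev):
        return measurement / (mean + std_dev)  # not a z-score at all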

When are comments acceptable?
  • Documentation. If you have a mature set of tools, you may have developed them to the point that the user can just read the manual rather than the code. This is intended for users, not maintainers, and usually takes the form of a large comment that automated documentation generation tools can interpret.
  • Surprising/odd behavior of libraries you are using. Matlab does some weird things, and sometimes I like to tell the maintainer that a line of code looks the way it does for a reason, especially when it is more complex than a naive implementation would seem to require because of subtleties of the language or of the libraries/packages being used (a small illustration follows this list). The counter-argument is that, rather than putting in a comment, you could write a set of unit tests that explore all the edge-case behavior and encapsulate the byzantine code in functions whose names describe the requirements the code is trying to meet.
  • When your program is best explained with pictures. Programs are strings, but sometimes they represent or manipulate essentially graphical entities. For example, a program that implements a balanced binary search tree involves tree rotations, which are very difficult to describe in prose and therefore similarly difficult to describe in code. Some ASCII art can be a real lifesaver in this kind of situation, because code is a poor representation of diagrams. So think of it this way: don't let yourself write text in comments, but it's okay to draw figures in them.
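As a toy example of the "surprising library behavior" case (in Python rather than Matlab, with an invented function name): Python 3's built-in round() rounds halves to the nearest even number, so round(2.5) is 2 rather than 3. If the requirement is the schoolbook "round half up", the implementation has to look slightly odd, and a comment (or a well-named function plus a test) is what tells the maintainer it is deliberate:

    import math

    def round_half_up(x):
        # Deliberately NOT using the built-in round(): round(2.5) == 2 because
        # Python rounds halves to the nearest even number (banker's rounding).
        return math.floor(x + 0.5)

    assert round_half_up(2.5) == 3
    assert round(2.5) == 2  # the surprising built-in behavior being worked around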

For more on these ideas, please just get Robert Martin's book on Clean Code. 

Thursday, July 10, 2014

Undergrad class FAQ

Every semester, I get some questions from my undergrads, and in the interest of efficiency, I thought I'd post the answers here as an FAQ.  Feel free to modify for your own use as you see fit.

Q: When is the midterm?
A: Feb. 14th.

Q: I have a [job interview/wedding/film festival (seriously, I got this)] during the quiz, but I'll have some time Wednesday at 2pm. Can I make it up then?
A: No.

Q: Is this going to be on the exam?
A: Most of the material I cover in class and assign homework on will be fair game for the exam, with emphasis on homework problems. I will probably not ask you to produce derivations.

Q: Is this class graded on a curve?
A: Yes, I will take into account the class average and overall performance when assigning grades.

Q: What is my grade?
A: It's hard to say, given that I don't yet have the complete picture of the class performance.

Q: Is this going to be on the exam?
A: Material from class is fair game for the exam except when explicitly noted.

Q: Is this class graded on a curve?
A: Yes, I will grade on a curve.

Q: What is my grade?
A: I don't know yet.

Q: Is this going to be on the exam?
A: Yes.

Q: Is this class graded on a curve?
A: Yes.

Q: What is my grade?
A: B.

Q: Is this going to be on the exam?
A: Yes.

Q: Is this class graded on a curve?
A: Yes.

Q: What is my grade?
A: B.

Q: Is this going to be on the exam?
A: Yes.

Q: Is this class graded on a curve?
A: Yes.

Q: What is my grade?
A: C.

Saturday, July 5, 2014

The Fermi Paradox

I think almost every scientist has thought at one point or another about the possibility of extra-terrestrial life. What I didn't appreciate was just how much thought some folks have put into the matter! I found that this little article summarized the various possibilities amazingly well. Reading it really gave me the willies, alternately filling me with joyous wonder, existential angst, and primal dread.

One cool concept is that of the "Great Filter" that weeds out civilizations (explaining why we don't see them). Is this filter ahead or behind us? Hopefully behind, right? Better hope there's NOT life on Mars:
This is why Oxford University philosopher Nick Bostrom says that “no news is good news.” The discovery of even simple life on Mars would be devastating, because it would cut out a number of potential Great Filters behind us. And if we were to find fossilized complex life on Mars, Bostrom says “it would be by far the worst news ever printed on a newspaper cover,” because it would mean The Great Filter is almost definitely ahead of us—ultimately dooming the species. Bostrom believes that when it comes to The Fermi Paradox, “the silence of the night sky is golden.”

Friday, July 4, 2014

Smile, you've got a genetic disorder!

Check out this paper in eLife, in which the authors use machine learning applied to facial images to determine whether people have genetic disorders. So cool! From what I can gather, they use a training set of just under 3000 images of faces (1300 or so of them of people with a genetic disorder) and then use facial recognition software to quantify those images. Using that quantification, they can cluster different disorders based on these facial features; check out this cool animation showing the morphing of an average normal face into the average face for various disorders. Although they started with a training set of 8 syndromes, the resulting feature space (the “Clinical Face Phenotype Space”) was sufficiently rich to distinguish 90 different syndromes with reasonable accuracy.
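To make the general idea concrete, here is a toy sketch in Python. This is emphatically not the authors' pipeline: the data are made up, and it assumes you already have a numeric feature vector for each face plus a syndrome label for each training face. With that in hand, classifying in a face phenotype space can be as simple as nearest neighbors:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    # Hypothetical data: 300 training faces, 64 facial features each, 8 syndrome labels.
    train_features = rng.normal(size=(300, 64))
    train_labels = rng.integers(0, 8, size=300)

    # Classify a new face by its nearest neighbors in the feature space.
    classifier = KNeighborsClassifier(n_neighbors=5)
    classifier.fit(train_features, train_labels)

    new_face = rng.normal(size=(1, 64))
    print(classifier.predict(new_face))        # best-guess syndrome label
    print(classifier.predict_proba(new_face))  # class probabilities a clinician could use to rank candidates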

“Reasonable accuracy” being a key point. The authors are quick to point out that their accuracy (which varies, but can be around 95%) is not sufficient for diagnostic purposes, where you really want to be right 100% of the time (or as close to it as possible). Rather, it can assist clinicians by giving them some idea of what the potential disorders might be. The advantage is that pictures are so easy to take and share. With cell phone cameras having penetrated virtually every market in the world and the classification being purely computational, a pretty big fraction of the world's population could easily participate. I think this is one of the highlights of the work, because the authors note that previous approaches relied on 3D scans of people, which are obviously considerably harder to get your hands on.

This approach will have to compete with sequencing, which is both definitive for genetic disorders and getting cheaper and cheaper (woe to the imagers among us!). It doesn’t feel like a stretch to imagine sequencing a person for, say, $10 or $1 in the not so distant future, at which point sequencing’s advantages would be hard to beat.

That said, I feel like the approach in this paper has a lot of implications, even in a future where sequencing is much cheaper and more accessible. Firstly, there are disorders that are genetic but have no simple or readily discernible sequence signature, in which case sequencing may not reveal the answer (although this may change as the number of available genome sequences increases).

Secondly, and perhaps more importantly, images are ubiquitous in ways that sequences are not. If you want someone’s sequence, you still have to get a physical sample. Not so for images, which are just a click away on Facebook. Will employers and insurers be able to discriminate based on a picture? Matchmakers? Can Facebook run the world’s largest genetic analyses? Will Facebook suggest friends with shared disorders? Can a family picture turn into a genetic pedigree? The authors even tried to diagnose Abraham Lincoln with Marfan syndrome from an old photograph and got a partial match. I’m sure a lot will depend on the ultimate limitations of image-based phenotyping, but still, this paper definitely got my mind whirring.

Wednesday, July 2, 2014

I think the Common Core is actually pretty good

Just read this article on NYTimes.com about how a lot of parents and educators are pushing back against the "Common Core", which emphasizes problem solving skills and conceptual understanding over rote application of algorithms, i.e., plug and chug. I can't say that I'm super familiar with the details of the Common Core, but now that I have taught the undergrads at Penn who are the products of the traditional algorithmic approach, it is clear to me that the old way of teaching math was just not cutting it. Endless repetition of adding and subtracting progressively longer numbers is not a way to train people to think about math, as though the ability to keep numbers ordered were a proxy for conceptual understanding. Many of the critiques of the Common Core actually include examples that, to me, highlight just how much better it would be at teaching useful math.

Take as an example adding two-digit numbers. I don't know anybody who does math as a part of their job who does the old "carry the 1" routine from elementary school. To me, a far better way to add numbers (and the basis for the algorithm anyway) is to realize that 62+26 is (60+20) + (2+6). This is exactly the object of ridicule #7 in the previous link. I've been teaching my son how to add and subtract this way, and now he can pretty easily add and subtract numbers like these in his head (and no, he is not a math genius). From there, it was pretty easy to extend to other numbers and different situations as well, like, say, 640+42 and such. I see absolutely no point in him even bothering to learn the old algorithms at this point. I think that those of us who developed mathematical skills in the old curriculum probably succeeded more despite the system than because of it.
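For the curious, here is the same decomposition spelled out as a throwaway bit of Python (the function name is made up, and it only handles non-negative integers, but it is exactly the (60+20) + (2+6) idea):

    def add_by_place_value(a, b):
        # Split off the ones digit of each number, add the pieces separately,
        # then combine: 62 + 26 -> (60 + 20) + (2 + 6) = 80 + 8 = 88.
        rest = (a // 10) * 10 + (b // 10) * 10
        ones = a % 10 + b % 10
        return rest + ones

    assert add_by_place_value(62, 26) == 88
    assert add_by_place_value(640, 42) == 682  # 680 + 2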

The results of decades of algorithmic learning are students who have little conceptual understanding and, even worse, are frankly scared to think. I can't tell you how many students come to my office hours essentially wanting me to spoon-feed them how to solve a particular problem so that they can reproduce it correctly on a test. The result is people whose basic understanding is so weak and algorithmic that they are unable to deal with new situations. Consider this quote from the NYTimes article, from a child complaining about the Common Core:
“Sometimes I had to draw 42 or 32 little dots, sometimes more,” she said, adding that being asked to provide multiple solutions to a problem could be confusing. “I wanted to know which way was right and which way was wrong.”
Even at this young age, there is already a "right way" and a "wrong way". Very dangerous!

I'm sure this Common Core thing has many faults. Beyond the obvious "Well, if it was good enough for me, grumble grumble harrumph!" reactions, I think there are probably some legitimate issues, and some of it probably stems from the fact that teaching math conceptually is a difficult thing to systematize and formalize. But from what I've seen, I think the Common Core is at least a big step in the right direction.

Saturday, June 28, 2014

Why bother studying molecular biology if the singularity is coming?

Perhaps I’m just being hopelessly optimistic, but I believe Ray Kurzweil’s singularity is going to happen, and while it may not happen on his particular timetable, I would not be surprised to see it in my lifetime. For those of you who haven’t heard of it, the singularity is the point at which the power of artificial intelligence surpasses our own, after which it becomes impossible to predict the pace of technological change. Sounds crazy, right? Well, I thought it was crazy to have a computer play Jeopardy, but not only did it play, it crushed all human challengers. I think it’s a matter of when, not if, but reasonable people could disagree… :)

Anyway, that got me thinking: if artificial intelligence is the next version/successor of our species, and it’s coming within, say, 50 years, then what’s the point of studying molecular biology? If we consider a full understanding of the molecular basis of development to be a 50-100 year challenge, then what’s the point? Or cancer? Or any disease? What’s the point of studying an obsolete organism?

In fact, it’s unclear what the point is in studying anything other than how to bring about the super-intelligent machines. Because once we have them, then we can just sit back and have them figure everything else out. That spells doom for most biomedical research. You could make an argument for neuroscience, which may help hasten the onset of the machines, but otherwise, well, the writing’s on the wall. Or we can just do it for fun, which is the only reason we do anything anyway, I suppose…