Wednesday, July 23, 2014

The hazards of commenting code

- Gautham

It is commonly thought that good code should be thoroughly commented. In fact, this is the opposite of good practice. A good coding strategy is one that does not let the programmer use comments as a crutch: programs should be legible on their own.

Here are the most common scenarios (a code sketch illustrating them follows the list):


  • Bad. The comment is understandable and it precedes an un-understandable piece of code. When the maintainer of the code goes through this, they still have to do a lot of work to figure out how to change the code, or to figure out where the bug might be.
  • Better. The comment is understandable, and the line of code is also understandable. Now you are making the reader read the same thing twice, and diluting the code into a sea of words.
  • Best. There is no comment. Only an understandable piece of code due to good naming, good abstractions, and a solid design. Good job!
  • Terrible. The comment is understandable, but the code it describes does not do what the comment says. The bug hides in here. The maintainer has to read every piece of your un-understandable code, because they have realized they can't trust your comments (which they shouldn't anyway), and so all your commenting effort was for nothing. This scenario is surprisingly common. 
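To make these scenarios concrete, here is a small hypothetical sketch in Python (my own toy example, not from any particular codebase; the variable names and data are made up):

```python
replicate_expression_levels = [4.2, 3.9, 5.1]  # hypothetical data
x = replicate_expression_levels                # the "bad" code below uses this cryptic alias

# Bad: the comment is understandable; the code is not.
# Compute the average expression level across replicates.
a = sum(x) / len(x)

# Better: understandable comment, understandable code -- but the reader
# now reads the same thing twice.
# Compute the mean of the replicate expression levels.
mean_level = sum(replicate_expression_levels) / len(replicate_expression_levels)

# Best: no comment at all; naming and abstraction do the work.
def mean_expression_level(levels):
    return sum(levels) / len(levels)

# Terrible: the comment says "average," but the code computes a sum.
# The bug hides here, and the maintainer learns never to trust your comments.
# Compute the average expression level across replicates.
a = sum(x)
```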

When are comments acceptable?
  • Documentation. If you have a mature set of tools, you might have polished them to the point that the user can just read the manual rather than the code. This is intended for users, not maintainers, and usually takes the form of a large comment that automated documentation-generation tools can interpret.
  • Surprising/odd behavior of libraries you are using. Matlab does some weird things, and sometimes I like to notify the maintainer that a line of code looks the way it does for a reason, especially if it is more complex than a naive implementation would appear to require because of subtleties of the language or the libraries/packages being used. The counter-argument is that rather than putting in a comment, you could write a set of unit tests that pin down all the edge-case behavior, and encapsulate the byzantine code in functions whose names describe the requirements the code is trying to meet.
  • When your program is best explained with pictures. Programs are strings, but sometimes they represent or manipulate essentially graphical entities. For example, a program that implements a balanced binary search tree involves tree rotations, manipulations that are very difficult to describe in prose and similarly difficult to convey in code. Some ASCII art can be a real life saver in this kind of situation, because code is a poor representation of diagrams. So think of it this way: don't let yourself write text in comments, but it's okay to draw figures in them (see the sketch just after this list).
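For instance, here is the kind of figure-in-a-comment I mean, on a hypothetical right-rotation helper for a binary search tree (a minimal sketch; the Node class and function are my own illustration, not from any particular library):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_right(y):
    r"""Right-rotate the subtree rooted at y and return the new root.

          y               x
         / \             / \
        x   C   ==>     A   y
       / \                 / \
      A   B               B   C
    """
    x = y.left
    y.left = x.right   # subtree B moves from x's right to y's left
    x.right = y
    return x
```

Try describing that rotation in prose without the diagram and you'll see the point.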

For more on these ideas, please just get Robert Martin's book Clean Code. 

Thursday, July 10, 2014

Undergrad class FAQ

Every semester, I get some questions from my undergrads, and in the interest of efficiency, I thought I'd post the answers here as an FAQ.  Feel free to modify for your own use as you see fit.

Q: When is the midterm?
A: Feb. 14th.

Q: I have a [job interview/wedding/film festival (seriously, I got this)] during the quiz, but I'll have some time Wednesday at 2pm. Can I make it up then?
A: No.

Q: Is this going to be on the exam?
A: Most of the material I cover in class and assign homework on will be fair game for the exam, with emphasis on homework problems. I will probably not ask you to produce derivations.

Q: Is this class graded on a curve?
A: Yes, I will take into account the class average and overall performance when assigning grades.

Q: What is my grade?
A: It's hard to say, given that I don't yet have the complete picture of the class performance.

Q: Is this going to be on the exam?
A: Material from class is fair game for the exam except when explicitly noted.

Q: Is this class graded on a curve?
A: Yes, I will grade on a curve.

Q: What is my grade?
A: I don't know yet.

Q: Is this going to be on the exam?
A: Yes.

Q: Is this class graded on a curve?
A: Yes.

Q: What is my grade?
A: B.

Q: Is this going to be on the exam?
A: Yes.

Q: Is this class graded on a curve?
A: Yes.

Q: What is my grade?
A: B.

Q: Is this going to be on the exam?
A: Yes.

Q: Is this class graded on a curve?
A: Yes.

Q: What is my grade?
A: C.

Saturday, July 5, 2014

The Fermi Paradox

I think almost every scientist has thought at one point or another about the possibility of extra-terrestrial life. What I didn't appreciate was just how much thought some folks have put into the matter! I found that this little article summarized the various possibilities amazingly well. Reading it really gave me the willies, alternately filling me with joyous wonder, existential angst, and primal dread.

One cool concept is that of the "Great Filter" that weeds out civilizations (explaining why we don't see them). Is this filter ahead or behind us? Hopefully behind, right? Better hope there's NOT life on Mars:
This is why Oxford University philosopher Nick Bostrom says that “no news is good news.” The discovery of even simple life on Mars would be devastating, because it would cut out a number of potential Great Filters behind us. And if we were to find fossilized complex life on Mars, Bostrom says “it would be by far the worst news ever printed on a newspaper cover,” because it would mean The Great Filter is almost definitely ahead of us—ultimately dooming the species. Bostrom believes that when it comes to The Fermi Paradox, “the silence of the night sky is golden.”

Friday, July 4, 2014

Smile, you've got a genetic disorder!

Check out this paper in eLife, in which the authors use machine learning applied to facial images to determine whether people have genetic disorders. So cool! From what I can gather, they use a training set of just under 3000 images of faces (1300 or so of them from people with a genetic disorder) and then use facial recognition software to quantify those images. Using that quantification, they can cluster different disorders based on these facial features; check out this cool animation showing an average normal face morphing into the average face of various disorders. Although they started with a training set of 8 syndromes, the resulting feature space (the “Clinical Face Phenotype Space”) was sufficiently rich to distinguish 90 different syndromes with reasonable accuracy.
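As a rough cartoon of what this kind of pipeline looks like (a minimal sketch of my own using scikit-learn, not the authors' actual method; the feature vectors and labels here are random stand-ins for the real landmark and appearance features):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-ins: in the paper, each face image is reduced to a feature vector
# via facial landmark detection; here we just fake ~3000 such vectors.
n_faces, n_features = 3000, 50
face_features = rng.normal(size=(n_faces, n_features))
syndrome_labels = rng.integers(0, 8, size=n_faces)  # 8 syndromes in the training set

# Classify a new face by its nearest neighbors in the feature space,
# in the spirit of clustering by facial similarity.
classifier = KNeighborsClassifier(n_neighbors=5).fit(face_features, syndrome_labels)

new_face = rng.normal(size=(1, n_features))
print(classifier.predict(new_face))  # predicted syndrome label
```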

“Reasonable accuracy” being a key point. The authors are quick to point out that their accuracy (it varies, but can be around 95%) is not sufficient for diagnostic purposes, where you really want 100% (or as close to it as possible). Rather, the method can assist clinicians by giving them some idea of what the potential disorders might be. The advantage is that pictures are so easy to take and share: with modern cell phone cameras having penetrated virtually every market in the world, and the classification being purely computational, a pretty big fraction of the world's population could easily participate. I think this is one of the highlights of the work, because the authors note that previous approaches relied on 3D scans of people, which are obviously considerably harder to get your hands on.

This approach will have to compete with sequencing, which is both definitive for genetic disorders and getting cheaper and cheaper (woe to the imagers among us!). It doesn’t feel like a stretch to imagine sequencing a person for, say, $10 or $1 in the not-so-distant future, at which point sequencing’s advantages would be hard to beat.

That said, I feel like the approach in this paper has a lot of implications, even in a future where sequencing is much cheaper and more accessible. Firstly, there are diseases that are genetic but have no simple or readily discernible genetic basis, in which case sequencing may not reveal the answer (although as the number of available genome sequences increases, this may change).

Secondly, and perhaps more importantly, images are ubiquitous in ways that sequences are not. If you want someone’s sequence, you still have to get a physical sample. Not so for images, which are just a click away on Facebook. Will employers and insurers be able to discriminate based on a picture? Matchmakers? Can Facebook run the world’s largest genetic analyses? Will Facebook suggest friends with shared disorders? Can a family picture turn into a genetic pedigree? The authors even tried to diagnose Abraham Lincoln with Marfan syndrome from an old picture, and got a partial match. I’m sure a lot will depend on the ultimate limitations of image-based phenotyping, but still, this paper definitely got my mind whirring.

Wednesday, July 2, 2014

I think the Common Core is actually pretty good

Just read this article on NYTimes.com about how a lot of parents and educators are pushing back against the "Common Core", which emphasizes problem-solving skills and conceptual understanding over the rote application of algorithms, i.e., plug and chug. I can't say that I'm super familiar with the details of the Common Core, but now that I have taught undergrads at Penn who are the products of the traditional algorithmic approach, it is clear to me that the old way of teaching math was just not cutting it. Endless repetition of adding and subtracting progressively longer numbers is no way to train people to think about math, as though the ability to keep digits lined up were a proxy for conceptual understanding. Many of the critiques of the Common Core actually show examples that highlight just how much better it would be at teaching useful math.

Take as an example adding two-digit numbers. I don't know anybody who does math as part of their job who uses the old "carry the 1" routine from elementary school. To me, a far better way to add numbers (and one that underlies the algorithm anyway) is to realize that 62+26 is (60+20) + (2+6). This is exactly the object of ridicule #7 in the previous link. I've been teaching my son to add and subtract this way, and now he can pretty easily add and subtract numbers like these in his head (and no, he is not a math genius). From there, it was pretty easy to extend to other numbers and situations as well, like, say, 640+42 and such. I see absolutely no point in him even bothering to learn the old algorithms at this point. I think that those of us who developed mathematical skills under the old curriculum probably succeeded more despite the system than because of it.
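To spell out the idea (a toy sketch of my own, not anything from the actual Common Core materials), the decomposition strategy is simple enough to write down as a two-line algorithm:

```python
def add_by_place_value(a, b):
    """Add the 'decomposition' way: tens with tens, ones with ones."""
    tens = (a // 10) * 10 + (b // 10) * 10
    ones = (a % 10) + (b % 10)
    return tens + ones

print(add_by_place_value(62, 26))   # (60+20) + (2+6) = 80 + 8 = 88
print(add_by_place_value(640, 42))  # (640+40) + (0+2) = 680 + 2 = 682
```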

Decades of algorithmic learning have produced students who have little conceptual understanding and, even worse, are frankly scared to think. I can't tell you how many students come to my office hours essentially wanting me to spoon-feed them how to solve a particular problem so that they can reproduce it correctly on a test. The result is people whose basic understanding is so weak and algorithmic that they are unable to deal with new situations. Consider this quote from the NYTimes article, from a child complaining about the Common Core:
“Sometimes I had to draw 42 or 32 little dots, sometimes more,” she said, adding that being asked to provide multiple solutions to a problem could be confusing. “I wanted to know which way was right and which way was wrong.”
Even at this young age, already there is a "right way" and a "wrong way". Very dangerous!

I'm sure this Common Core thing has many faults. Beyond the obvious "Well, if it was good enough for me, grumble grumble harrumph!" reactions, I think there are probably some legitimate issues, and some of it probably stems from the fact that teaching math conceptually is a difficult thing to systematize and formalize. But from what I've seen, I think the Common Core is at least a big step in the right direction.