For instance, take this more conventionally over-plotted graph of city vs. highway miles per gallon, with different classes of cars labeled by color:
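library(ggplot2)  # provides qplot(), ggsave(), and the mpg data set
# City vs. highway mpg, with one color per car class
q2 <- qplot(cty, hwy, data = mpg, color = class) + theme_bw()
ggsave("color.pdf", q2, width = 8, height = 6)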
This graph has a number of problems, but the most pertinent is that it uses a separate color for every category of car, so it takes a lot of effort to parse. The small multiple solution is to make a bunch of small graphs, one for each category, so that you can see the differences between them. By the power of ggplot, behold!
# One panel per class, laid out in a single row
q <- qplot(cty, hwy, data = mpg, facets = . ~ class) + theme_bw()
ggsave("horizontal_multiples.pdf", q, width = 8, height = 2)
# The same panels stacked in a single column
q <- qplot(cty, hwy, data = mpg, facets = class ~ .) + theme_bw()
ggsave("vertical_multiples.pdf", q, width = 2, height = 8)
Notice how much easier it is to see the differences between categories of car in these small multiples than in the more conventional over-plotted version, especially in the horizontal layout.
Most small multiple plots look like these, and they're typically a huge improvement over heavily over-plotted graphs, but I think there’s room for improvement, especially in the labeling. The biggest problem with small multiple labeling is that most of the axis labels are very far away from the graphs themselves. This is a seemingly logical way to set things up, because the labels apply to all the multiples, but it forces the reader through a lot of mental gymnastics to figure out what the axes are for any one particular multiple.
Thus, my suggestion is actually based on the philosophy of the small multiple itself: explain a graph once, then rely on that knowledge to help the reader parse the rest of the graphs. Check out these before-and-after comparisons:
The horizontal small multiples also improve, in my opinion:
To me, labeling one of the small multiples directly makes it a lot easier to figure out what is in each graph, and thus makes the entire graphic easier to understand quickly. It also adheres to the principle that important information for interpretation should be close to the data. The more people’s eyes wander, the more opportunities they have to get confused. There is of course the issue that by labeling one multiple, you are calling attention to that one in particular, but I think the tradeoff is acceptable. Another issue is a loss of precision in the other multiples. One could include tick marks as more visible markers, but again, I think the tradeoff is acceptable.
Oh, and how did I perform this magical feat of alternative labeling of small multiples (as well as general cleanup of ggplot's nice-but-not-great output)? Well, I used this amazing software package called “Illustrator” that works with R or basically any software that spits out a PDF ;). I’m of the strong opinion that being able to drag around lines and manipulate graphical elements directly is far more efficient than trying to figure out how to do this stuff programmatically most of the time. But that’s a whole other blog post…
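If you would rather stay inside R, something along these lines gets part of the way there. To be clear, this is only a rough sketch and not how the figures above were made: it assumes that "labeling one multiple" can be approximated by dropping the shared axis titles and drawing a short note inside a single panel. The data frame `axis_note`, its label text, its position, and the choice of the "2seater" panel are all hypothetical and picked purely for illustration.

```r
library(ggplot2)

# Hypothetical one-row label data. Because it carries the facet variable
# (class), the text is drawn only in the "2seater" panel.
axis_note <- data.frame(
  class = "2seater",
  cty   = 35,   # right edge of the city-mpg range (assumed position)
  hwy   = 12,   # bottom edge of the highway-mpg range (assumed position)
  label = "x: city mpg\ny: highway mpg"
)

p <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  # Draw the note in one panel only, anchored at its bottom-right corner
  geom_text(data = axis_note, aes(label = label),
            hjust = 1, vjust = 0, size = 3) +
  facet_grid(. ~ class) +
  theme_bw() +
  # Remove the shared axis titles that sit far away from most panels
  theme(axis.title = element_blank())

ggsave("labeled_multiples.pdf", p, width = 8, height = 2)
```

The note will probably still need nudging by hand to avoid the points, which is rather the argument for Illustrator in the first place.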