Friday, December 13, 2013

Something @EdwardTufte would hate.

Via Andrew Gelman:
A straight line of y = 1-x.
On the vertical axis we have the probability of being Type 2 Diabetic (T2D). On the horizontal axis we have the probability of being normal. There’s a clear, important trend evident, right? No! The probability of being normal is trivially one minus the probability of being T2D! The graph could not possibly be anything other than a straight line of slope -1. (For the students out there: the complete lack of scatter in the graph is a strong hint of something wrong.) What about the colors? They assign the data points for people with a > 50% probability of being T2D to be red, and the opposite to be green. The graph is simply plotting a tautology, that the probability of x is one minus the probability of not-x, together with a color scheme for labeling x. Paraphrasing Tufte, it has an information-to-ink ratio of approximately zero.
Not quite zero: what we seem to have here is a highly inefficient two-dimensional multicolor display of a one-dimensional set of 49 numbers, using dots that are so blurry that we can’t actually get much of a sense of their distribution. All joking aside, I’m guessing this graph would be much better if the x-axis were used for some relevant continuous variable (for example, people’s ages) and the colors used for some discrete variable (for example, some other indicator of health status). 
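
To make the tautology concrete, here is a minimal sketch in Python. The 49 probabilities are random placeholders, not the values behind the original figure, and the names `tautology_points` and `color` are made up for illustration. It builds the same kind of plot data and checks that every point lies on y = 1 - x, so the second axis carries nothing the first does not already carry.

```python
import random

def tautology_points(p_t2d):
    """Pair each P(T2D) with its complement: x = P(normal) = 1 - P(T2D), y = P(T2D)."""
    return [(1.0 - p, p) for p in p_t2d]

def color(p):
    """Color rule described in the post: red if P(T2D) > 50%, green otherwise."""
    return "red" if p > 0.5 else "green"

random.seed(0)
# Placeholder data: 49 made-up probabilities standing in for the real ones.
p_t2d = [random.random() for _ in range(49)]

points = tautology_points(p_t2d)

# Distance of each point from the line y = 1 - x, i.e. |x + y - 1|.
max_scatter = max(abs(x + y - 1.0) for x, y in points)

# The color depends only on y, so it adds no second variable either.
colors = [color(y) for _, y in points]

print(f"points: {len(points)}, largest deviation from y = 1 - x: {max_scatter:.2e}")
print(f"red: {colors.count('red')}, green: {colors.count('green')}")
```

Whatever numbers go in, the deviation from the line is zero up to floating-point rounding, which is the "complete lack of scatter" mentioned above. The plot is a one-dimensional list of numbers drawn in two dimensions.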
