“Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.’”
I was intrigued by this gem of statistical poetry, which Lifehacker writer Eric Ravenscraft quotes in his great new article, “Four Common Statistical Misconceptions You Should Avoid.” I had not heard the quote before. (Eric attributes it to a well-known xkcd strip on correlation, but you won't find it in the panels themselves: hover your mouse over the image, and the wisdom will reveal itself.)
Eric’s article made me think about other common fallacies to add to his list, including one that I call “misdirected statistical rigor.” It strikes when misunderstood, dirty, and/or poorly collected data is put through the paces of statistical analysis, and the results are then either misinterpreted or not interpreted at all. A couple of jokes to illustrate:
- Statistical analysis of Internet adoption in rural areas has shown a very high level of Internet usage. The data was collected via a web survey (aka self-selection bias).
- From an unknown (at least to me) author: “According to the Institute of Incomplete Research, 9 out of 10…” Replace “9 out of 10” with “the F-value is decent” or “kappa is looking good today,” and you’ll realize that the Institute of Incomplete Research has a huge alumni base.
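To make the web-survey joke concrete, here is a minimal Python sketch of self-selection bias. It rests on a simplifying, hypothetical assumption: only Internet users can answer a web survey, so non-users never show up among the respondents. The function name `simulate_web_survey` and its parameters are made up for illustration. However rigorous the downstream analysis, the survey's estimate comes out at 100% no matter what the true usage rate is.

```python
import random


def simulate_web_survey(population_size=100_000, true_usage_rate=0.4, seed=0):
    """Compare the true Internet-usage rate with a web-only survey's estimate.

    Hypothetical assumption: only Internet users can reach the web survey,
    so non-users are never among the respondents (self-selection bias).
    """
    rng = random.Random(seed)
    # Each resident either uses the Internet (True) or does not (False).
    population = [rng.random() < true_usage_rate for _ in range(population_size)]
    true_rate = sum(population) / population_size

    # Self-selection: only Internet users can answer a web survey.
    respondents = [uses for uses in population if uses]
    if not respondents:
        # Nobody could respond, so the survey produces no estimate at all.
        return true_rate, float("nan")
    survey_rate = sum(respondents) / len(respondents)
    return true_rate, survey_rate


true_rate, survey_rate = simulate_web_survey()
# The survey estimate is always 100%, whatever the true rate happens to be.
print(f"true usage: {true_rate:.1%}, web-survey estimate: {survey_rate:.1%}")
```

Adding more respondents, or a fancier test on top, would not help here: the bias sits in how the data was collected, not in how it was analyzed.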
What are your favorite statistical (or analytical) misconceptions? Or anti-patterns, if you are into that kind of thing?