Tyler Vigen has done great work popularizing Spurious Correlations. He has found an effective way to convey an important message, namely that correlation does not equal causation. Lots of things are correlated, but that doesn’t mean that they have anything to do with each other.
Data Dredging
To create his graphs Vigen indulges in:
Data Dredging… a technique used to find something that correlates with one variable by comparing it to hundreds of other variables.
Vigen, 2015, page xii
To illustrate, he provides numerous silly connections. For example, he shows the correlation between Margarine Consumption and the Divorce Rate in Maine. Of course one can always come up with a story to connect them. Still, the stories will be a stretch. His book contains lots of similar illustrations; for instance, Natalie Portman Movies and Christmas Tree Sales have a pretty high correlation. His website, which inspired the book, has many of the same pictures: click here.
A Common Cause Behind The Correlations?
Sometimes the underlying cause might be obvious. As populations grow, the numbers of knitting shops and hospitals in town both increase. Still, can you think of a good reason why hospitals and knitting shops would be causally connected to each other? (To confirm, you might want to check knitting needle injury levels.) Often there isn’t even an obvious underlying connection. If you compare enough things together then, by complete chance, some things are going to increase or decrease at the same time.
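To see how easily chance alone produces a strong-looking match, here is a rough sketch in Python. It is not how Vigen builds his graphs; it is only an illustration under stated assumptions. Every series is pure random noise with ten data points (think ten years of annual figures), and the function names pearson, random_series and best_spurious_match are made up for this example. The sketch compares one target series against more and more unrelated candidates and reports the strongest correlation it can dredge up.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def random_series(rng, length):
    """Pure noise: by construction it is unrelated to any other series."""
    return [rng.gauss(0, 1) for _ in range(length)]


def best_spurious_match(n_candidates, length=10, seed=0):
    """Dredge n_candidates noise series for the one that best tracks the target.

    Assumption for illustration only: the target and every candidate are
    independent random noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = random_series(rng, length)
    best = 0.0
    for _ in range(n_candidates):
        r = pearson(target, random_series(rng, length))
        if abs(r) > abs(best):
            best = r
    return best


# The more variables we compare, the stronger the best chance match looks.
for n in (1, 10, 100, 1000):
    print(n, "candidates, strongest |r| =", round(abs(best_spurious_match(n)), 2))
```

Because the same seed is reused, each larger search includes all the candidates of the smaller ones, so the strongest correlation can only stay the same or grow as more variables are compared. None of these series has anything to do with the target, yet a large enough search will reliably turn up one that appears to track it closely.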
Spurious Correlations: Actors And Sales Data
There are many Hollywood actors and loads of sales data. This means some actor’s career is likely to correlate with some sales data if you look at enough actors and enough sales data. This is one reason why theory shouldn’t be seen as a dirty word. Even when we are trying to teach practical subjects, theory matters. Don’t believe correlations have meaning unless you have a theory to explain how Natalie Portman impacts tree sales, or vice versa.
Spurious Correlation is an especially big problem in a world of Big Data. Big Data encourages data dredging. Sometimes you can find something meaningful that you would never have thought of. Yet many of the correlations you uncover will be nonsense. If you don’t act on Spurious Correlations, they are just a bit of fun. Unfortunately, sometimes the nonsense can sound plausible, and bad ideas can often spread this way. We should always remember that some connections in the data just don’t mean anything at all.
For an example of nonsense academic data mining see here.
Read: Tyler Vigen (2015), Spurious Correlations: Correlation Does Not Equal Causation, Hachette Books, New York, NY. See his website here.