Better Data Visualization And Population Is Needed
This was a topical post on data visualization and population. I have refreshed it a little. The specific maps have changed, and as of late 2020 they are better. E.g., they often show cases per 100k of the population rather than simple totals. That said, the point is still relevant. Maps can mislead when the areas shown have different population sizes and you don’t account for this. How data visualization handles population is an important thing to think about.
Spring 2020: we are under a stay-at-home order because of the coronavirus. The medicine side is not something I have knowledge of. That said, I think the communications could definitely be improved. The unprecedented actions being taken to stem the virus are not being well supported by effective communication. To be fair, explaining what is happening is a very tough challenge. Yet many governments have clearly been caught flat-footed. What is more, news organizations aren’t obviously doing any better. As I watch the Canadian, US, and UK news I’m not sure any are covering themselves with glory.
Data Visualization And Population
Take for example a chart that we have seen a lot of on US cable news: total coronavirus cases in the US by state. Despite this being shown on many major news outlets I have been left wondering what its purpose is.
Scott Berinato has a book on how to present information visually: Good Charts. This contains useful information and checklists on making effective, i.e., persuasive, data visualizations. Relevant to the topic at hand, Berinato (2016) talks about why some geographic mappings may be deceptive.
“The size of geographic space usually over- or under-represents the variable encoded within it. This is especially true with maps that represent populations.” Scott Berinato
What about the map of total coronavirus cases in the US by state? This map seems like a reasonable representation of the data that we have. (For the sake of this discussion, given my lack of specific knowledge, I will assume that the data has some value. This means that I will deliberately ignore important differences in testing between states, which almost certainly distort the data.) The map being shown on cable news color codes by number of cases. Those states with a higher number of cases are indicated by a more dramatic shade.
This Could Be Useful I Guess
Firstly, let me note that this map could be useful in certain circumstances. Assume that you were the US federal government and your aim was to help recovery by directing limited federal aid to the states most in need. (I’m hoping that is their aim.) If you were sending aid you might want to know which states have the highest number of cases so that you can send them the needed supplies. (Technically I’m guessing you need to know what the numbers of cases are going to climb to by the time the aid arrives, but current cases might be a reasonable place to start.) To this end the map might be useful. It tends to show lots of cases in states with large populations, and populous states will generally need more equipment given they have more people.
But It Isn’t Useful To Cable News Viewers
That said, do any of the cable news viewers find information on total cases per state particularly useful? I would argue less than you might expect. It is certainly likely to worry people in the big states given the dramatic shades representing their states. This might help drive viewership in big states. Still, I hope that the aim is not to worry but instead to inform. If this is the case then one might expect viewers would be better served if they were told how badly their states are hit. To represent this, cases per head of the population might be a much better measure.
The problem with the map based upon raw numbers per state is that it is conflating two different things: 1) how badly the state has been hit and 2) how big the state is. It is hardly surprising that California has a lot more cases than North Dakota. Wouldn’t we all have predicted that absent any testing?
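To make the conflation concrete, here is a minimal sketch using made-up numbers. The state sizes and rates are hypothetical and chosen only for illustration, and the function name total_cases is my own. It treats a raw total as the product of population and the rate at which people are infected.

```python
# Hypothetical figures for illustration only; these are not real state data.

def total_cases(population: int, cases_per_100k: float) -> float:
    """Raw total = size of the state x how badly it has been hit."""
    return population * cases_per_100k / 100_000

# Two made-up states hit equally hard (same rate), differing only in size.
big_same_rate = total_cases(population=40_000_000, cases_per_100k=50)
small_same_rate = total_cases(population=800_000, cases_per_100k=50)

# A made-up small state hit four times as hard as the big one.
small_hit_harder = total_cases(population=800_000, cases_per_100k=200)

print(big_same_rate, small_same_rate, small_hit_harder)
```

In this sketch the big state shows fifty times the raw cases of an equally hard hit small state, and it still shows far more cases than a small state hit four times as hard. A map shaded by raw totals would give the big state the most dramatic color in both comparisons.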
Genuine Differences Exist
It is important to note that some of the differences between states will reflect genuine differences between how hard each state has been hit. It is intuitive that states with big cities which visitors travel to from all over the world, and in which people live in close proximity, might face special challenges, e.g., New York City. That said, we don’t want to give the impression that coronavirus is a problem just for the big cities or populous states. The chart that we see on the news has the great danger of confusing genuinely harder hit places with places that have more people. This feeds a problem in US public policy.
The Politics Of Data Visualization And Population
Unfortunately, there appears to be something of a divide in US public policy. Blue (Democratic leaning) states are generally taking more action against the virus than red (Republican leaning) states. This is a source of frustration to many commentators that I see on some news channels such as CNN and MSNBC. Still, I would argue that these channels are at least somewhat fueling the problem.
The challenge is that blue leaning states tend to have more people. This is why Donald Trump’s election victory of 2016 looks more dramatic when shown on a map. I’m not making a (valid) point about the electoral college.
Instead, I am noting that low density population areas tend to vote Republican. This means that more space is colored red on the map of the US than votes alone might imply. What does the map of coronavirus cases based upon total infections in each state do? The map distorts what I’m assuming most viewers are interested in, which is the rate of infection in their state. Instead it gives us raw numbers. The visualization being used on the news makes the virus look more like a problem focused on blue states than it truly is.
California Has More People Than Louisiana
For example, the cable news map, which is similar to one used by the CDC, makes California look like it has a similar problem to Louisiana. (The last raw data I saw said reported cases of COVID-19 are similar: California, 15,865, Louisiana, 16,284. This was early April 2020.)
The problem is that Louisiana is a much less populous state than California, with just a little more than one person for every ten in California. Louisiana, currently, has a much bigger problem per head than California. The west coast state has many more people. This means your chance of being sick with coronavirus, all else equal, is much lower in California than Louisiana. Furthermore, California has many more medical personnel and much more equipment given its greater population. The medical infrastructure of Louisiana is likely under much greater stress. The map makes it seem like California and Louisiana are in the same situation. They really aren’t (currently at least, again early April 2020).
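A rough sketch of the per head comparison follows. The case counts are the early April 2020 figures quoted above. The population figures are my assumption (approximate 2019 estimates, rounded) and are not from the data I saw, so treat the results as ballpark only. The function name cases_per_100k is just a label for this example.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Reported cases per 100,000 residents."""
    return cases / population * 100_000

# Reported cases quoted in this post (early April 2020).
reported_cases = {"California": 15_865, "Louisiana": 16_284}

# ASSUMPTION: approximate, rounded 2019 population estimates.
population = {"California": 39_500_000, "Louisiana": 4_650_000}

rates = {
    state: cases_per_100k(reported_cases[state], population[state])
    for state in reported_cases
}

for state, rate in rates.items():
    print(f"{state}: about {rate:.0f} reported cases per 100k")

# How many times higher Louisiana's rate is than California's.
ratio = rates["Louisiana"] / rates["California"]
```

Under these assumed populations, California comes out at roughly 40 reported cases per 100k and Louisiana at roughly 350, so nearly nine times the rate despite almost identical raw totals. This is the number a map shaded per head would show.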
Don’t Make Less Populated States Feel Too Comfortable
The chart makes it look like less populated states have less to worry about relative to more populated states. We also know that these less populated states tend to be politically redder. Some news organizations are disappointed with red states for not taking sufficient action. I would note that the visualizations they are using to illustrate the problem of coronavirus are the very ones I would use if I wanted to discourage red states from taking action. To be clear, I’m not saying the US partisan divide will go away if CNN uses a better map. Still, I don’t think the cable news networks should be fueling the problem and then complaining about it.
Read: Scott Berinato (2016) Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations, Harvard Business Review Press, Boston, Massachusetts