There was a bit of a dataviz hubbub yesterday and today owing to this tweet from Georgia Person:
In just 15 days the total number of #COVID19 cases in Georgia is up 49%, but you wouldn’t know it from looking at the state’s data visualization map of cases. The first map is July 2. The second is today. Do you see a 50% case increase? Can you spot how they’re hiding it? 1/ pic.twitter.com/wAgFRmtrPk— Georgia Person (@andishehnouraee) July 17, 2020
In case you didn’t catch it, Person’s critique of these Georgia Department of Public Health (GDPH) maps is that the bins in the maps differ: for example, the top bin starts at 2,961 cases in the map on the left and 3,769 cases on the right. In his view, this approach therefore misrepresents how COVID cases have changed over time.
Person’s Twitter thread and the ensuing discussion went along two main lines: First, the data visualization is incorrect and misleading because the bins change, which masks the increase in cases across the state. Second, the person(s) who created the visualization did so to purposely mislead.
I disagree with both of these arguments and I’m going to show you why. I’m also going to argue that Person commits the same sin of which he is accusing GDPH.
The Visualization Is Incorrect
It’s important to inspect the entire data visualization when forming a critique. Person’s original critique is that showing the two maps side-by-side gives a distorted view of the change in cases across the state over time.
But the GDPH website does not show maps side by side and does not intend for people to make that comparison! Only a single map is shown on the webpage that reflects data for the current day. A separate line chart (next to or below the map, depending on your screen size) shows changes over time.
Unfortunately, in jumping on the “this data visualization is garbage!” bandwagon, many people did not visit the original website (Person didn’t link to it until the 6th tweet in his thread) and erroneously assumed the pair of maps he showed were on the GDPH webpage. If we’re going to critique someone’s work—and there is a person who made these graphics—we should at least have enough respect to view the entire work.
Later in his thread, Person highlights the second sentence in the text above the map to further demonstrate the problem with the dashboard.
Unfortunately, in so doing, Person conducts his own form of cherry picking. Let’s look more carefully at the text:
The first paragraph describes what you are going to see in the map: the number of cases by county. There is no claim that you will be able to make comparisons over time by using the map. The second paragraph then points to “the charts below” that present the number of cases “over time.” The website is clear and accurately describes what the user is being shown.
Intent: Malice or Incompetence
An implicit part of the discussion around these visualizations is the assumption that the creator was intentionally misleading people. Although the tone of several of the Twitter discussions clearly assumed malice, others were more forgiving:
Taking things out of context does tend to make them appear misleading. I feel for the public servant who I imagine made this, who was probably doing their best, probably pushing back against wild urgent mandates from higher ups, probably making 30k less than industry standard… https://t.co/VZAL4Oo0tz— Melanie (@MMazanec22) July 18, 2020
Generally, when I come across what I believe to be a poor or misleading data visualization, I tend to assume it’s a mistake or due to lack of knowledge. There are, of course, many examples of visuals that are clearly trying to mislead or misinform (see, for example, here, here, here, and here), and GDPH is no exception (see here). While many people are screaming for accurate and responsible data to be used in the current COVID pandemic—and there are enough cases of the public sector violating basic responsible data use to warrant that argument—critics need to be more careful and responsible in their comments as well.
Note: I fully agree that Georgia’s leadership has deliberately mislead the public in many cases and ignored public health advice, so being skeptical of their charts is fair. There are just so many limitations and lags in the data that visualizing it accurately is TOUGH.— Amanda Makulec MPH (@abmakulec) July 18, 2020
In this case, I believe that the issue of the changing legend is likely due to how the data visualization tool (whatever it is) automatically sets the map bins based on the data. Create a map in any standard data visualization tool (e.g., Tableau, R, Datawrapper) and the tool is going to automatically set the bin widths, minimums, and maximums. Thus, if I have data that range from 1 to 4,661 on one day and then have data that range from 1 to 5,165 the next day, the tool redistributes those data. Because the maximum value in the GDPH map changes, that says to me that the creator simply refreshed the data and published it. (Mark Jackson made a similar comment in this tweet.)
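To make the mechanism concrete, here is a minimal Python sketch of one common automatic scheme, equal-interval breaks. I don't know which tool or classification method GDPH actually uses, so treat the function name, the five-bin choice, and the county counts as assumptions for illustration; only the two ranges (1 to 4,661 and 1 to 5,165) come from the example above.

```python
def equal_interval_breaks(values, n_bins=5):
    """Return n_bins + 1 edges that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [round(lo + i * width) for i in range(n_bins + 1)]


# Hypothetical county counts; only the minimum and maximum match the text.
day_one = [1, 40, 310, 1200, 4661]
day_two = [1, 45, 350, 1400, 5165]

# The breaks are recomputed from scratch each time the data are refreshed.
breaks_one = equal_interval_breaks(day_one)
breaks_two = equal_interval_breaks(day_two)

print("Day one legend edges:", breaks_one)
print("Day two legend edges:", breaks_two)
```

Nothing in the code "decides" to change the legend. The edges are a function of the day's minimum and maximum, so when the maximum grows, every break above the minimum shifts upward, including the lower edge of the top bin.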
How do you solve this underlying problem? One solution might be to keep the same bin widths for every map (say, every 100,000 cases), use a blue color ramp for each, and then add red to the top bin [Update, 7/25/2020: this tweet/discussion highlights how this approach might work]. TJ Jankun-Kelly put this well in his tweet, which focuses on exactly this problem: how do you maintain the same bins when the maximum is going to keep changing?
If you use an scale that doesn’t have a max (or max that is effectively far enough away), this trap is inevitable. You’ll eventually exceed your old max. You can’t really add more colors (limited discriminability), so you have to change scales. 2/— T.J. Jankun-Kelly (@dr_tj) July 18, 2020
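As a rough sketch of the fixed-bin alternative, the Python below assigns counts to bins whose edges are chosen once and never recomputed, with an open-ended top bin. The edge values, function names, and label format are hypothetical and are not taken from the GDPH dashboard.

```python
from bisect import bisect_right

# Hypothetical lower edges, fixed once and reused for every day's map.
FIXED_EDGES = [1, 1000, 2000, 3000, 4000]


def bin_index(value, edges=FIXED_EDGES):
    """Return the index of the bin holding value; the last bin is open-ended.

    Values below the first edge (for example, 0 cases) return -1 so they
    can be drawn as a separate "none" category.
    """
    return bisect_right(edges, value) - 1


def bin_label(index, edges=FIXED_EDGES):
    """Build a legend label for a bin index returned by bin_index."""
    if index < 0:
        return "0 / None"
    if index == len(edges) - 1:
        return f"{edges[index]:,}+"
    return f"{edges[index]:,} to {edges[index + 1] - 1:,}"


for count in (0, 1000, 4661, 5165):
    print(count, "->", bin_label(bin_index(count)))
```

With fixed edges, 4,661 and 5,165 land in the same top bin, so the legend stays comparable from day to day. The cost is the one Jankun-Kelly describes: as counts keep climbing, more and more counties pile into that open-ended top bin until the scale has to change anyway.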
One could argue that the changing maximum value is a failure of data visualization design. Let’s go back to the original map, as shown on the website (screenshot taken this morning), and see. What are your critiques of this map? The map as actually shown on the website and not Person’s imaginary visualization?
In general, I think it’s a good map—the blue colors work well with the dark blue background; the 0/None category is white and differentiated from the blue color ramp; and the red color for the top bin helps highlight the largest outbreak of cases today* (though others have argued the legend is confusing and unclear; see for example, this tweet and Steve Wexler’s new blog post). I’m not that familiar with the state of Georgia’s geography, so I might want to see some labels for major cities, and maybe there could be some more differentiation in color for the tabs at the top, but overall, I think this is a pretty good one.
Now let’s look at the line chart below (screenshot taken today). The tabs at the top (Cases, Cumulative Cases, Deaths) are better here and are easier to differentiate. The annotation along the vertical axes is pretty good, though I might prefer to make the text horizontal and located at the top of the line. I don’t love the DDMMMYY format in the x-axis labels, but that’s a ticky-tacky critique. Although they are clear about the 14-day window and the lag in the data (especially in the note below), I might leave those points off altogether because I’m not entirely sure a casual reader won’t see these as cases going down.
There’s also the big dashboard at the top of the GDPH webpage (shown below) using Big A** Numbers. This gives a nice summary of the data with simple tables that highlight the major metrics.
I’ve only briefly critiqued the GDPH website here, but what I have not done is taken their visuals out of context, rearranged them, and accused them of misleading users. Data visualization critique is warranted, but it must at least be done fairly.
So we are entering a new #dataviz era where you cherry pick a small part of a visualization, declare that it should answer your random question, create a silly comparison and declare the whole thing malicious or incompetent.
Let me out, please.— Jorge Camoes (@wisevis) July 19, 2020
The Georgia Department of Public Health has produced misleading charts before, so they have already invited people to criticize their work. But Person’s tweet has more than 40,000 likes at the time of this writing and that has likely lowered faith in the GDPH. At a time when listening to public health experts is more important than ever, Person’s irresponsible tweet is an even bigger problem than some minor data visualization design issues.
The explosion of criticism around the GDPH website was, I think, a bad day for the data visualization community. Far too many people ripped apart that website without having looked at all of it. Person’s tweet was misleading, incorrect, and irresponsible. Fortunately, today (Sunday), there are some tweets defending the GDPH project, but I think we need to, and can, do better in the future.
There’s a large, growing and frustrating trend in people seeking to attribute anything they disagree with on Covid to malicious intent.
True of reactions to charts, numbers, actions, words.
Lots of people would apparently rather be outraged than informed.— John Burn-Murdoch (@jburnmurdoch) July 19, 2020
The conversation about this issue continues on Twitter, so I thought it worth adding a few updates:
–Great thread from Liz Roten on the breaks in the maps and why using a continuous color palette is better than the blue ramp + red top bin.
–Great post from Steve Wexler on the map color palette and, more importantly, what he took away from the entire discussion.
–*Also, it was pointed out to me that I described the map as showing daily amounts when it instead shows cumulative totals.