Map of Georgia cases by county from the Georgia Department of Public Health

There was a bit of a dataviz hubbub yesterday and today owing to this tweet from Georgia Person:

In case you didn’t catch it, Person’s critique of these Georgia Department of Public Health (GDPH) maps is that the bins in the maps differ: for example, the top bin starts at 2,961 cases in the map on the left and 3,769 cases on the right. In his view, this approach therefore misrepresents how COVID cases have changed over time.

Person’s Twitter thread and the ensuing discussion went along two main lines: First, the data visualization is incorrect and misleading because the bins change, which masks the increase in cases across the state. Second, the person(s) who created the visualization did so to purposely mislead.

I disagree with both of these arguments and I’m going to show you why. I’m also going to argue that Person commits the same sin of which he is accusing GDPH.

The Visualization Is Incorrect

It’s important to inspect the entire data visualization when forming a critique. Person’s original critique is that showing the two maps side-by-side gives a distorted view of the change in cases across the state over time.

But the GDPH website does not show maps side by side and does not intend for people to make that comparison! Only a single map is shown on the webpage that reflects data for the current day. A separate line chart (next to or below the map, depending on your screen size) shows changes over time.

Map of Georgia cases by county from the Georgia Department of Public Health

Unfortunately, in jumping on the “this data visualization is garbage!” bandwagon, many people did not visit the original website (Person didn’t link to it until the 6th tweet in his thread) and erroneously assumed the pair of maps he showed were on the GDPH webpage. If we’re going to critique someone’s work—and there is a person who made these graphics—we should at least have enough respect to view the entire work.

Later in his thread, Person highlights the second sentence in the text above the map to further demonstrate the problem with the dashboard.

Unfortunately, in so doing, Person conducts his own form of cherry picking. Let’s look more carefully at the text:

Georgia cases by county from the Georgia Department of Public Health

The first paragraph describes what you are going to see in the map: the number of cases by county. There is no claim that you will be able to make comparisons over time by using the map. The second paragraph then points to “the charts below” that present the number of cases “over time.” The website is clear and accurately describes what the user is being shown.

Intent: Malice or Incompetence

An implicit part of the discussion around these visualizations is that the creator was intentionally misleading people. Although the tone of several of the Twitter discussions clearly assumed malice, others were more forgiving:

Generally, when I come across what I believe to be a poor or misleading data visualization, I tend to assume it’s a mistake or due to lack of knowledge. There are, of course, many examples of visuals that are clearly trying to mislead or misinform (see, for example, here, here, here, and here), and GDPH is no exception (see here). While many people are screaming for accurate and responsible data to be used in the current COVID pandemic—and there are enough cases of the public sector violating basic responsible data use to warrant that argument—critics need to be more careful and responsible in their comments as well.

In this case, I believe that the issue of the changing legend is likely due to how the data visualization tool (whatever it is) automatically sets the map bins based on the data. Create a map in any standard data visualization tool (e.g., Tableau, R, Datawrapper) and the tool is going to automatically set the bin widths, minimums, and maximums. Thus, if I have data that range from 1 to 4,661 on one day and then have data that range from 1 to 5,165 the next day, the tool redistributes those data. Because the maximum value in the GDPH map changes, that says to me that the creator simply refreshed the data and published it. (Mark Jackson made a similar comment in this tweet.)
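To make that redistribution concrete, here is a minimal sketch in Python. I don't know which tool GDPH uses or how it computes its bins, so the equal-interval rule, the five-bin count, and the county values are all assumptions for illustration; only the two ranges (1 to 4,661 and 1 to 5,165) come from the example above.

```python
def equal_interval_breaks(values, n_bins=5):
    """Return the upper edge of each bin when the data range is split evenly.

    Assumption: the tool uses equal-width bins derived from the current
    minimum and maximum. Real tools may use quantiles or other rules.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [round(lo + width * (i + 1)) for i in range(n_bins)]


# Hypothetical county counts; only the minimum and maximum match the text.
day_one = [1, 250, 900, 2100, 4661]
day_two = [1, 260, 950, 2300, 5165]

breaks_one = equal_interval_breaks(day_one)
breaks_two = equal_interval_breaks(day_two)

print(breaks_one)  # upper edges computed from the first day's range
print(breaks_two)  # upper edges computed from the second day's range
```

Under these assumptions, every break point moves once the maximum moves, even though nobody edited the legend by hand. A county with 3,900 cases would land in the top bin on the first day and in the second-highest bin on the next.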

How do you solve this underlying problem? One solution might be to keep the same bin widths for every map (say, every 100,000 cases), use a blue color ramp for each, and then add red to the top bin [Update, 7/25/2020: this tweet/discussion highlights how this approach might work]. TJ Jankun-Kelly put this well in his tweet, which focuses on exactly this problem: how do you maintain the same bins when the maximum is going to keep changing?
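A fixed-bin approach could be sketched roughly as follows. The edge values and color names here are hypothetical placeholders I chose for illustration, not anything taken from the GDPH site; the point is only that the edges are set once and the top bin is left open-ended.

```python
from bisect import bisect_right

# Hypothetical lower edges, chosen once and reused for every day's map.
FIXED_EDGES = [1, 1000, 2000, 3000, 4000]

# One color per bin: white for zero, a blue ramp, and red for the open top bin.
# The names are placeholder labels, not real color codes.
COLORS = ["white", "blue-1", "blue-2", "blue-3", "blue-4", "red"]


def color_for(cases):
    """Map a case count to a color using the same edges every day."""
    return COLORS[bisect_right(FIXED_EDGES, cases)]


for cases in (0, 3900, 4661, 5165):
    print(cases, color_for(cases))
```

Because the top bin has no upper limit, a rising maximum never changes the legend. The trade-off is the one raised in the tweet: as counts keep growing, more and more counties pile into that top bin unless the edges are eventually revisited.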

One could argue that the changing maximum value is a failure of data visualization design. Let’s go back to the original map, as shown on the website (screenshot taken this morning), and see. What are your critiques of this map? The map as actually shown on the website and not Person’s imaginary visualization?

In general, I think it’s a good map—the blue colors work well with the dark blue background; the 0/None category is white and differentiated from the blue color ramp; and the red color for the top bin helps highlight the largest outbreak of cases today* (though others have argued the legend is confusing and unclear; see for example, this tweet and Steve Wexler’s new blog post). I’m not that familiar with the state of Georgia’s geography, so I might want to see some labels for major cities, and maybe there could be some more differentiation in color for the tabs at the top, but overall, I think this is a pretty good one.

Map of Georgia cases by county from the Georgia Department of Public Health

Now let’s look at the line chart below (screenshot taken today). The tabs at the top (Cases, Cumulative Cases, Deaths) are better here and are easier to differentiate. The annotation along the vertical axes is pretty good, though I might prefer to make the text horizontal and located at the top of the line. I don’t love the DDMMMYY format in the x-axis labels, but that’s a ticky-tacky critique. Although they are clear about the 14-day window and the lag in the data (especially in the note below), I might leave those points off altogether because I’m not entirely sure a casual reader won’t see these as cases going down.

Line chart of Georgia cases by county from the Georgia Department of Public Health

There’s also the big dashboard at the top of the GDPH webpage (shown below) using Big A** Numbers. This gives a nice summary of the data with simple tables that highlight the major metrics.

Dashboard of Georgia cases by county from the Georgia Department of Public Health

I’ve only briefly critiqued the GDPH website here, but what I have not done is taken their visuals out of context, rearranged them, and accused them of misleading users. Data visualization critique is warranted, but it must at least be done fairly.

Going Forward

The Georgia Department of Public Health has produced misleading charts before, so they have already invited people to criticize their work. But Person’s tweet has more than 40,000 likes at the time of this writing and that has likely lowered faith in the GDPH. At a time when listening to public health experts is more important than ever, Person’s irresponsible tweet is an even bigger problem than some minor data visualization design issues.

The explosion of criticism around the GDPH website was, I think, a bad day for the data visualization community. Far too many people ripped apart that website and clearly had not looked at the full site. Person’s tweet was misleading, incorrect, and irresponsible. Fortunately, today (Sunday), there are some tweets defending the GDPH project, but I think we need to, and can, do better in the future.

UPDATES (7/19/2020):

The conversation about this issue continues on Twitter, so I thought it worth adding a few updates:

Great thread from Liz Roten on the breaks in the maps and why using a continuous color palette is better than the blue ramp + red top bin.

Great post from Steve Wexler on the map color palette and, more importantly, what he took away from the entire discussion.

*Also, it was pointed out to me that I described the map as representing daily amounts when it instead shows cumulative totals.