I recently wrote about the binning and legend choices NPR used in a hex grid map showing state-by-state variation in COVID infections. Today, I’m turning my attention to a sequential color palette in a set of maps published by the Capital Area Food Bank and how it has led me to rethink my own instincts.
In Kenneth Field’s definition, sequential schemes primarily vary lightness to represent ordered data: “darker colours denote higher data values and vice versa.” A moderate hue shift can also be applied to make different classes distinct, though careful control over saturation is required so that no class appears more vivid than the others.
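To make the idea of a lightness ramp concrete, here is a minimal sketch in Python. It holds hue and saturation fixed and steps lightness from light to dark across the classes. The function name, the default hue, and the lightness endpoints are my own illustrative choices, and the HLS lightness in the standard `colorsys` module is only a rough stand-in for perceived lightness, so treat this as a sketch rather than a production-quality palette.

```python
import colorsys


def sequential_palette(n_classes, hue=0.6, saturation=0.5,
                       lightest=0.9, darkest=0.25):
    """Return n_classes hex colors ordered from lightest to darkest.

    Hue and saturation stay constant so that no class looks more vivid
    than the others; only lightness changes. All defaults are illustrative
    assumptions, not values taken from any published palette.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    colors = []
    for i in range(n_classes):
        # Interpolate lightness linearly: class 0 is lightest (lowest values).
        lightness = lightest + (darkest - lightest) * i / (n_classes - 1)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Five classes, e.g. five income groups from lowest to highest.
palette = sequential_palette(5)
```

In practice, a tested palette from a tool such as ColorBrewer is likely a safer choice than a hand-rolled ramp, because those palettes are designed around how people actually perceive lightness differences.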
A full description of different data types is beyond the scope of this post, but for purposes of the following discussion, it is worth defining ‘ordered data.’ Ordered categorical data (or ordinal data) have values that represent rank or order, but they do not let us measure the size of the difference between those values. For example, you might report your health as poor, good, or excellent, but it’s not clear that the gap between poor and good is the same as the gap between good and excellent. Continuous variables can also be represented as ordered categorical data, such as income grouped into $10,000 bands. If we plotted income groups on a choropleth map and followed Field’s advice, higher incomes would be encoded with darker colors and lower incomes with lighter colors.
This sequential color scheme approach implies a value judgement on the values being mapped—more income/darker colors are better than less income/lighter colors. Similarly, excellent health/darker colors are better than poor health/lighter colors. Notice that Field uses the word ‘higher’ in his definition as opposed to ‘greater,’ which may imply some sort of value judgement.
What about levels of educational attainment? Let’s say you had a geospatial data set of the share of people across five levels of educational attainment: less than 9th grade; high school graduate; some college; college graduate; and graduate degree or higher. Instinctively, I think I would use a sequential color scheme.
But now I’m not so sure.
The Capital Area Food Bank (CAFB), a nonprofit organization that provides over 30 million meals every year to needy families in the Washington, DC region, recently published their 2020 Hunger Report. In it, four choropleth maps show (1) median household income (five categories in dollars); (2) the unemployment rate (five categories of percentages); (3) educational attainment (five categories as defined above); and (4) life expectancy at birth (five categories in years).
Except for the map that shows categories of educational attainment, each map uses a sequential color palette. The map that shows educational attainment uses a qualitative color palette with each category receiving a different color. Within each category, the map also uses color transparency: “The map also uses transparency to indicate how predominant one education level is relative to the others. For example, a strong orange color indicates that high school graduates greatly outnumber other educational levels, while a weak orange indicates a slimmer margin.”
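As a rough illustration of how this kind of encoding could work, here is a minimal Python sketch. It picks the most common education level in an area, assigns that level's qualitative color, and sets the opacity from the gap between the largest and second-largest shares. Everything specific here is an assumption on my part: the RGB colors, the `predominance_fill` function, the `min_alpha` and `full_margin` parameters, and the definition of predominance as the gap between the top two shares. The report does not describe how CAFB actually computed transparency.

```python
# Hypothetical qualitative colors (RGB, 0-255); not CAFB's actual palette.
CATEGORY_COLORS = {
    "less than 9th grade": (117, 112, 179),
    "high school graduate": (230, 125, 30),
    "some college": (27, 158, 119),
    "college graduate": (231, 41, 138),
    "graduate degree or higher": (102, 166, 30),
}


def predominance_fill(shares, min_alpha=0.2, full_margin=0.3):
    """Return (category, (r, g, b, alpha)) for one map area.

    shares maps each category name to its share of the population (0-1).
    Assumption: predominance is the gap between the two largest shares,
    and a gap of full_margin or more gets a fully opaque fill.
    """
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_share), (_, second_share) = ranked[0], ranked[1]
    margin = top_share - second_share
    # Scale opacity between min_alpha (near tie) and 1.0 (clear lead).
    alpha = min_alpha + (1 - min_alpha) * min(margin / full_margin, 1.0)
    return top, (*CATEGORY_COLORS[top], round(alpha, 2))


# Made-up shares for two areas where high school graduates lead.
clear_lead = {
    "less than 9th grade": 0.10, "high school graduate": 0.55,
    "some college": 0.15, "college graduate": 0.12,
    "graduate degree or higher": 0.08,
}
slim_lead = {
    "less than 9th grade": 0.14, "high school graduate": 0.26,
    "some college": 0.24, "college graduate": 0.20,
    "graduate degree or higher": 0.16,
}
strong = predominance_fill(clear_lead)
weak = predominance_fill(slim_lead)
```

Both areas get the same orange hue, but the first is drawn at a higher opacity than the second, which matches the strong-orange versus weak-orange distinction in the quoted description.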
My question is: Is the qualitative color palette the correct choice? Should it be a sequential color palette instead?
Let’s step back to Field’s definition: “darker colours denote higher data values and vice versa.” It therefore makes sense that the map showing income groups would use a sequential color palette—higher incomes/values are assigned darker colors and vice versa.
But shouldn’t higher levels of educational attainment also be encoded with darker colors? The values are higher—that is, more years of education—so isn’t a sequential color palette more appropriate?
My instinct here says yes, but upon further reflection, I think CAFB has it right: It’s not true that more education equates with good/higher/darker color and less education equates with bad/less/lighter color. Would a sequential color palette reinforce stereotypes about people with lower levels of educational attainment? Instead, should we recognize, and thus visualize, that “different education levels are qualitatively different”? In other words, although education and income are correlated, what we are showing here is not that relationship but education itself, and thus we should focus on that specific metric.
If we were looking at the average number of years of educational attainment, a sequential palette would probably make more sense. Sixteen years of education are objectively more than twelve years of education; there’s no value judgement in that statement, just rank ordering of the data. But saying that a postgraduate degree is higher than a high school degree might imply that one is better than the other.
We could take this a step further and say that higher levels of income are not inherently better than lower levels of income (see, for example, the Beatles), so should we avoid sequential color palettes here as well? In this case, I don’t think it’s a value judgement to say that more income is encoded with darker colors. Instead, that decision reflects the fact that incomes are numbers and higher numbers, as is standard, are encoded with darker colors. This is similar to plotting years of education: the point is not that more years of education are inherently better, but that more years of education are just that, more.
The issue of how we represent, visualize, and label different groups in our data visualizations is something we should spend more time thinking about. How are we ordering the bars in our bar charts? What words are we using to label and categorize groups, be it by race, ethnicity, gender, or other characteristics? What colors are we assigning to the different groups? I’ve started thinking harder about these questions (more content coming soon) and I hope you will join me and others in having what may be some hard conversations.
Thanks to Kenneth Field for comments on an early draft of this post and ongoing discussions about this and other mapping issues.