Representing Different Groups in Maps

I recently wrote about the binning and legend choices NPR used in a hex grid map showing state-by-state variation in COVID infections. Today, I’m turning my attention to the color palettes in a set of maps published by the Capital Area Food Bank, and to how one of them has led me to rethink my own instincts about sequential palettes.

In the previous post, I quoted a definition of the diverging color palette from Kenneth Field’s book, Cartography. I’ll begin this post by quoting his definition of a sequential color palette:

Sequential schemes primarily vary lightness to represent ordered data. Darker colours denote higher data values and vice versa. A moderate hue shift can also be applied to make different classes distinct, though careful control over saturation is required so that no class appears more vivid than the others.
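To make Field’s definition concrete, here is a minimal sketch of a single-hue sequential palette in which only lightness changes from class to class. The function `sequential_palette` and its default lightness and saturation values are hypothetical choices for illustration. It uses the HLS color model from Python’s standard `colorsys` module for simplicity; HLS lightness is not perceptually uniform, so a palette intended for publication would typically be tuned in a perceptual space such as CIELAB.

```python
import colorsys


def sequential_palette(hue: float, n_classes: int,
                       light_max: float = 0.92, light_min: float = 0.30,
                       saturation: float = 0.65) -> list[str]:
    """Return n_classes hex colors of one hue, ordered from light to dark.

    Only HLS lightness changes between classes; hue and saturation are held
    fixed so that no class looks more vivid than the others.
    (Default values are hypothetical, chosen for illustration.)
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    colors = []
    step = (light_max - light_min) / (n_classes - 1)
    for i in range(n_classes):
        lightness = light_max - i * step  # class 0 is lightest, last is darkest
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


# Five classes of a blue hue (hue is a fraction of the color wheel; 0.6 is bluish).
for color in sequential_palette(hue=0.6, n_classes=5):
    print(color)
```

Holding hue and saturation constant is the simplest way to honor the "no class appears more vivid" caveat; a moderate hue shift, as Field allows, would add a second varying parameter.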

A full description of different data types is beyond the scope of this post, but for the purposes of the following discussion, it is worth defining ‘ordered data.’ Ordered categorical data (or ordinal data) consists of values that represent rank or order, but it does not allow us to measure the relative degree of difference between them. For example, you might report your health as poor, good, or excellent, but it’s not clear that the differences between those categories are the same. Continuous variables can also be represented as ordered categorical data, such as income grouped into $10,000 ranges. If we plotted income groups on a choropleth map and followed Field’s advice, higher incomes would be encoded with darker colors and lower incomes with lighter colors.
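To make the income example concrete, here is a minimal sketch of binning dollar incomes into ordered $10,000 classes. The function name `income_class`, the choice of five classes, and the open-ended top class are all assumptions for illustration. The class index preserves order, but because the top class absorbs everything at or above $40,000, equal steps in the index do not always correspond to equal dollar differences, which is exactly the ordinal property described above.

```python
def income_class(income: float, width: float = 10_000, n_classes: int = 5) -> int:
    """Map a dollar income to an ordered class index (hypothetical helper).

    Class 0 covers [0, width), class 1 covers [width, 2 * width), and so on;
    incomes at or above (n_classes - 1) * width fall into the top class.
    """
    if income < 0:
        raise ValueError("income must be non-negative")
    return min(int(income // width), n_classes - 1)


# Under a sequential scheme, a larger class index would get a darker shade.
incomes = [4_500, 23_000, 61_250, 38_000]
print([income_class(x) for x in incomes])  # -> [0, 2, 4, 3]
```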

This sequential color scheme approach implies a value judgement about the data being mapped: more income/darker colors are better than less income/lighter colors. Similarly, excellent health/darker colors are better than poor health/lighter colors. Notice that Field uses the word ‘higher’ in his definition rather than ‘greater,’ a word that may itself imply some sort of value judgement.

What about levels of educational attainment? Let’s say you had a geospatial data set of the share of people across five levels of educational attainment: less than 9th grade; high school graduate; some college; college graduate; and graduate degree or higher. Instinctively, I think I would use a sequential color scheme.

But now I’m not so sure.

The Capital Area Food Bank (CAFB), a nonprofit organization that provides over 30 million meals every year to needy families in the Washington, DC region, recently published its 2020 Hunger Report. In it, four choropleth maps show (1) median household income (five categories in dollars); (2) the unemployment rate (five categories of percentages); (3) educational attainment (five categories as defined above); and (4) life expectancy at birth (five categories in years).

Except for the map of educational attainment, each map uses a sequential color palette. The educational attainment map instead uses a qualitative color palette, assigning each category a different color. Within each category, it also varies transparency. As the report explains: “The map also uses transparency to indicate how predominant one education level is relative to the others. For example, a strong orange color indicates that high school graduates greatly outnumber other educational levels, while a weak orange indicates a slimmer margin.”
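The report does not spell out how its transparency is computed, so the following is only a sketch of one plausible rule, not CAFB’s method: color each region by its largest education category and set opacity from the margin between the largest and second-largest shares. The colors, the function name `predominant_fill`, the 0.25 opacity floor, and the sample shares are all hypothetical; only the orange for high school graduates echoes the report’s description.

```python
# Hypothetical qualitative palette: one distinct hue per category, no implied order.
# Only the orange for high school graduates echoes the report's description.
CATEGORY_COLORS = {
    "Less than 9th grade": "#1b9e77",
    "High school graduate": "#d95f02",
    "Some college": "#7570b3",
    "College graduate": "#e7298a",
    "Graduate degree or higher": "#66a61e",
}


def predominant_fill(shares: dict[str, float]) -> tuple[str, str, float]:
    """Return (category, hex color, alpha) for one region's education shares.

    Hue comes from the largest category; opacity rises with the margin between
    the largest and second-largest shares (an assumed rule, not CAFB's).
    """
    if len(shares) < 2:
        raise ValueError("need shares for at least two categories")
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    top_category, top_share = ranked[0]
    runner_up_share = ranked[1][1]
    margin = top_share - runner_up_share  # 0 for a tie, 1 if one group is everyone
    alpha = 0.25 + 0.75 * margin          # floor of 0.25 keeps close calls visible
    return top_category, CATEGORY_COLORS[top_category], round(alpha, 2)


# Hypothetical shares for a single region.
region = {
    "Less than 9th grade": 0.05,
    "High school graduate": 0.45,
    "Some college": 0.20,
    "College graduate": 0.20,
    "Graduate degree or higher": 0.10,
}
print(predominant_fill(region))
```

In this scheme hue carries the category and opacity carries the margin, so the two encodings stay separate; in a single-hue sequential palette, lightness would have to do both jobs at once.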

Choropleth map of educational attainment from the Capital Area Food Bank.

My question is: Is the qualitative color palette the correct choice? Should it be a sequential color palette instead?

Let’s step back to Field’s definition: “darker colours denote higher data values and vice versa.” It therefore makes sense that the map showing income groups would use a sequential color palette—higher incomes/values are assigned darker colors and vice versa.

But shouldn’t higher levels of educational attainment also be encoded with darker colors? The values are higher—that is, more years of education—so isn’t a sequential color palette more appropriate?

My instinct here says yes, but upon further reflection, I think CAFB has it right: it’s not true that more education equates with good/higher/darker colors and less education with bad/lower/lighter colors. Would a sequential color palette reinforce stereotypes about people with lower levels of educational attainment? Instead, should we recognize, and thus visualize, that “different education levels are qualitatively different”? In other words, although education and income are correlated, this map is not showing that relationship. It is showing educational attainment alone, and so our color choices should focus on that specific metric.

It is probably the case that if we were looking at the average number of years of educational attainment, a sequential palette would make more sense. Sixteen years of education are objectively more than twelve years of education; there’s no value judgement in that statement, just a rank ordering of the data. But saying that a graduate degree is higher than a high school diploma might imply that one is better than the other.
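Average years of education can be derived from the same category shares if each category is assigned a representative number of years. The sketch below assumes hypothetical values (8, 12, 14, 16, and 18 years); these are illustrative choices, not figures from the Hunger Report, and a real analysis would need to justify them. The result is a single number per region, which is the kind of value a sequential palette handles without implying that one group is better than another.

```python
# Hypothetical years assigned to each attainment category (an assumption for
# illustration, not values from the Hunger Report).
YEARS = {
    "Less than 9th grade": 8,
    "High school graduate": 12,
    "Some college": 14,
    "College graduate": 16,
    "Graduate degree or higher": 18,
}


def average_years(shares: dict[str, float]) -> float:
    """Share-weighted mean years of schooling for one region.

    Shares are normalized by their total, so they need not sum exactly to 1.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum(YEARS[category] * share for category, share in shares.items()) / total


# Hypothetical shares for a single region.
region = {
    "Less than 9th grade": 0.1,
    "High school graduate": 0.4,
    "Some college": 0.2,
    "College graduate": 0.2,
    "Graduate degree or higher": 0.1,
}
print(round(average_years(region), 2))
```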

We could take this a step further and say that higher levels of income are not inherently better than lower levels of income (see, for example, the Beatles), so should we avoid sequential color palettes here as well? In this case, I don’t think encoding more income with darker colors is a value judgement. Instead, that decision reflects the fact that incomes are numbers, and higher numbers are, by convention, encoded with darker colors. The same is true when plotting years of education: it’s not that more years of education are inherently better, but that more years of education are just that, more.

The issue of how we represent, visualize, and label different groups in our data visualizations is something we should spend more time thinking about. How are we ordering the bars in our bar charts? What words are we using to label and categorize groups, whether by race, ethnicity, gender, or other characteristics? What colors are we assigning to the different groups? I’ve started thinking harder about these questions (more content coming soon), and I hope you will join me and others in having what may be some hard conversations.

Thanks to Kenneth Field for comments on an early draft of this post and ongoing discussions about this and other mapping issues.

August 13, 2020

4 Comments


Jon Peltier

August 13, 2020 at 1:01 pm

Median household income and unemployment rate each provide a single value for each geographical region, and a single shade can be used to represent that value.
Educational attainment is really a histogram, five bars of varying heights, for each region. The Hunger Report is showing the predominant educational level for each region, but is also using darker and lighter shades depending on the degree of predominance of the level. You can’t use a sequential color scheme that also uses shading in this way, because you wouldn’t know whether the lighter shade is due to being in a different level, or if the given level is less predominant.
If they were plotting merely the predominant level of educational attainment, without shading to show the relative predominance within each region, or if you tried to do some kind of weighted average of the level in each region, you could get away with using a sequential color scale.
I’m not really sure what the shading gives the viewer: it doesn’t tell you if the predominance is lessened because of more people in the category with more or with less education.

Jon Peltier

August 13, 2020 at 1:08 pm

I also think you shouldn’t think that sequential scales imply larger is better. Things like unemployment rate and income that may fall along a continuous numerical scale are better plotted on a lighter-to-darker color scale, whether a continuous scale or broken into bins. The sequential scales merely help easily show where values are higher and lower, not necessarily what is better or worse.
  
  Jon Schwabish
  
  August 13, 2020 at 1:37 pm
  
  Well, I think we need to keep in mind that our readers may consider sequential palettes as implying a spectrum of better to worse, whether we imply it or not. This may not be as big of a deal in this particular example of education, but I can easily imagine it applying to race, ethnicity, or other demographic characteristics.

Jon Peltier

August 13, 2020 at 4:03 pm

Demographic quantities are different, not being simple continua of easily measured numerical values. For these, categorical scales are needed. And there are still difficulties similar to the educational levels, where you might want to show regions with heterogeneous populations or individuals with multiple ethnic associations.
