10 Ways to Mislead with Data Visualization

We all love the overtly wrong data visualization. The graph that is so clearly and obviously incorrect that it makes us laugh—and cry. Sometimes it doesn’t matter all that much, like this graph from a Milwaukee news channel. We all know 101.6 inches is waaay more than 7.1 inches and a much bigger difference than is pictured here, so the graph’s incorrect scale is not a huge deal. But it’s still terribly wrong.

Screenshot of a news station that says Seasonal Snow at the top. Two snowmen are depicted, a short one for Milwaukee with a label of 7.1 inches and a taller one for Buffalo with a label of 101.6 inches.

There are cases where misleading graphs and charts can have an impact and can affect policy or decision-making. Generally, these types of graphs and charts make my ears perk up and lead me to be more suspicious—not just of the graph itself, but of the person or organization who made it. Was it a simple error? Lack of knowledge of how graphs are used or read? Or is it being done intentionally to mislead?

To illustrate how people can lie through charts, I’ve listed my 10 most misleading data visualization strategies, ranked by my sense of how egregious the strategy is (remember Letterman?). These graphs and visualization strategies aren’t inherently “wrong” in the true sense of the word, but used incorrectly can mislead, misinform, and lie. And if you want a more in-depth treatment of how people can lie with graphs, I highly, highly recommend Alberto Cairo’s book, How Charts Lie.

10. Pie charts that don’t sum to 100%

Pie charts should, by definition, sum to 100%. Whether there is a gap because of rounding or missing data, or the graph sums to more than 100%, I’m going to immediately question the accuracy of the data underlying any pie chart that doesn’t sum to 100%.

Oh yes…. #WTF pic.twitter.com/aAHpXTl1rv
— Mark Bradbourne (he/him) (@MarkBradbourne) March 26, 2020

Pie chart with a small wedge missing at the top — *Source: Social Security Advisory Board*

9. Charts that use 3D styling

3D charts are less common than in years past, but I still cringe a bit whenever I come across one. First off, they just look old. They look dated, like the data visualization version of clip art. But besides aesthetics, I don’t like 3D charts because I can’t tell whether the 3D effect is being used to make the graph “pop” or to hide something.

I promise I wasn’t trying to mislead readers with this graph from an article I published in 2007, but wow, I definitely would not do this now. Because of the 3D styling, we can’t tell what happened to the trend in the last category ($100k-$150k) prior to 1996.

3D chart with 10 sections showing the change in rounded earnings from 1975 to 2006

8. Overlaid regression lines

Regression lines are not common across data visualization generally, but make frequent appearances in the research literature. One has to be very careful about what regression lines are showing in a graph. Is the author using the lines to suggest causality? What other variables are included in the equation to determine the regression line?

Here are two examples of misleadingly overlaid regression lines from Kevin Hassett, an economist who has worked at the American Enterprise Institute and the Council of Economic Advisers (CEA) under President Trump.

The first attempts to fit a line to a scatterplot of about 30 countries, with tax revenues as a percent of GDP on the vertical axis and government corporate tax rates on the horizontal axis. In what world does that line best fit the data? I couldn’t find the exact data, so I tried to match it up to the original graphic, and no matter how I slice it (my attempts in red), with a straight line or a second-order polynomial, I don’t get close to that curve. (Brad DeLong wrote a great post about this graph in 2017.)

Scatterplot with the title Corporate Taxes and Revenue, 2004. The vertical axis is tax revenues as a percentage of GDP and the horizontal axis is government corporate tax rates. There are about 25 countries in the plot with a fitted line that looks like an inverted U. — *Source: Wall Street Journal*, *(my lines in red)*

The second example is from Hassett’s tenure at the CEA where this graph—published in May 2020—shows their projections for COVID-19 deaths through the end of the year. Their “curve-fitting” cubic model exercise showed a precipitous decline in COVID-19 deaths, which was misleading at the time and, needless to say, terribly wrong in hindsight.

To better visualize observed data, we also continually update a curve-fitting exercise to summarize COVID-19's observed trajectory. Particularly with irregular data, curve fitting can improve data visualization. As shown, IHME's mortality curves have matched the data fairly well. pic.twitter.com/NtJcOdA98R
— CEA45 Archived (@WhiteHouseCEA45) May 5, 2020

7. Inverted vertical axis

Inverted vertical axes have been used well and have been used poorly, most famously in the Iraq’s Bloody Toll (2011) and Gun Deaths in Florida (2014), respectively. In these cases, the vertical axis puts the lower number at the top (in these two, at zero), and proceeds to increase as you go down the visual. It’s definitely not standard, but it can work—as in the Iraq’s Bloody Toll graph. But it can also be deceiving, as you can see in the Gun Deaths in Florida graph. There are subtle differences here that drive the problem with the latter graph—the axis labels and axis line at the bottom of the graph; an area chart rather than a bar chart; and the dark, thick black line at the bottom of the area chart. All of these markers make us see the white area as the data instead of the red area. Yes, it can work, but be careful.

Infographic with the title Iraq's bloody toll. The vertical axis starts at 0 and then gets larger as you move down the image. The horizontal axis shows dates from 2003 to 2011. The main image is a red bar chart showing civilian deaths in Iraq.

Area chart with the title Gun deaths in Florida. The vertical axis starts at 1,000 at the bottom and goes up to 0 at the top. The main data area is a red area chart. — *Source: Reuters*

Here’s another example. And another. Both seemingly intentionally trying to mislead.

6. Misleading Comparisons

Whenever a graph compares values over time, I want to know why the author chose the end points. Was it the only data available? Did the data collection process change, making a comparison between year A and year B incorrect? Or maybe they are comparing peak-to-peak or trough-to-trough, say in comparing economic conditions over time.

Climate change denier and insurrection supporter Steve Milloy published this tweet in September 2022, which claims that there has been no overall increase in global temperatures because the temperature in August 2022 is the same as it was in August 1998. Ignoring for a second that he cherry-picked a single month out of more than 520 estimates, more absurdly he drew the comparison from one high mark to another high mark. That’s like being in a recession and saying the economy is doing great because the economy is doing better than it was during the Great Depression.

August NASA satellite temps in.

No warming in almost 8 years — despite lots more CO2.

August 2022 same temp as August 1998.

Climate is a hoax. https://t.co/y3I5bHOL1T pic.twitter.com/hMQBXRDvG8
— Steve Milloy (@JunkScience) September 2, 2022
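One way to test an end-point comparison like this is to set it beside a trend fitted to the whole series. The sketch below uses made-up numbers (not the NASA satellite data from the tweet), and the helper names `endpoint_change` and `ols_slope` are hypothetical. It only illustrates how a peak-to-peak comparison can show "no change" while the series as a whole is rising.

```python
def endpoint_change(values):
    """Difference between the last and first observation only."""
    return values[-1] - values[0]


def ols_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical series: a steady rise of 0.02 per period...
series = [0.02 * i for i in range(25)]
# ...with an unusually high first reading and an equally high last reading,
# so that comparing the two end points alone suggests nothing happened.
series[0] = 0.6
series[-1] = 0.6

print("end-point change:", endpoint_change(series))
print("fitted slope per period:", round(ols_slope(series), 4))
```

The end-point change is zero by construction, while the fitted slope is positive because it uses every observation rather than two hand-picked ones.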

5. Percentages, not levels

With data visualization, sometimes we want to show levels, other times we want to show changes, and still other times we want to show shares. It all depends on the message and the data. In cases where I see percentage changes over time, my ears perk up. What is the denominator? Even in per capita numbers, is the author dividing by all people or some subset of the population? Is it a percentage change or a percentage point change?
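To make those questions concrete, here is a minimal sketch of the arithmetic. The function names and all of the numbers are hypothetical and chosen only for illustration. They show how the same change can be reported as a level, a percentage point change, or a percent change, and how different those three can sound.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is zero")
    return (new - old) / old * 100


def percentage_point_change(old_share, new_share):
    """Absolute difference between two shares that are already in percent."""
    return new_share - old_share


# Hypothetical share that rises from 2% to 3% of a population.
print(percentage_point_change(2.0, 3.0))  # 1.0 percentage point
print(percent_change(2.0, 3.0))           # 50.0 percent

# Hypothetical small counts: 4 cases grow to 10 cases.
print(10 - 4)                 # level change: 6 cases
print(percent_change(4, 10))  # 150.0 percent
```

A small denominator turns a handful of additional cases into a triple-digit percent change, which is why the level belongs next to the percentage.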

In August 2022, Jonas Vesterberg from The Florida Standard tried to stoke the fears of transphobic parents by claiming that between 2007 and 2021, the share of kids who “received irreversible surgical procedures [to treat gender dysphoria] in the Medicaid population” increased by 1,100 percent. How many kids did that affect? Well, the number of kids who received such procedures over that time period increased from 3 to 12.

Line chart with the title Testosterone, Estrogen, Puberty Blockers and Behavioral Health (Children). There are four lines shown for four years: 2017, 2018, 2019, and 2020. The vertical axis spans from 0% to 300%. — *Source: The Florida Standard*

4. Maps

Obviously, I don’t mean all maps, but data-driven maps do have inherent trade-offs that can often lead to misleading data visualizations. On the one hand, maps allow people to see themselves in the data: they can find their town, their state, their country. On the other hand, maps often lead readers to equate the size of a geography with the importance of its data value, which is not always the case. Let’s compare Russia and Nigeria, for example. Russia is 6.6 million square miles, nearly 20 times as large as Nigeria. But Russia is home to 145 million people, while Nigeria is home to more than 220 million people, making it about 50% more populous.
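The Russia and Nigeria comparison can be worked through as a quick calculation. The Russian figures and the Nigerian population come from the paragraph above. Nigeria's area of roughly 357,000 square miles is my own assumption, since the text gives only the ratio.

```python
# Area and population figures; Nigeria's area is an assumed approximation.
countries = {
    "Russia": {"area_sq_mi": 6_600_000, "population": 145_000_000},
    "Nigeria": {"area_sq_mi": 357_000, "population": 220_000_000},
}

russia = countries["Russia"]
nigeria = countries["Nigeria"]

# How much more ink Russia gets on a map versus how many more people Nigeria has.
area_ratio = russia["area_sq_mi"] / nigeria["area_sq_mi"]
population_ratio = nigeria["population"] / russia["population"]

print(f"Russia is about {area_ratio:.1f} times the area of Nigeria")
print(f"Nigeria has about {population_ratio:.2f} times the population of Russia")
```

On a choropleth map the first ratio drives what the eye sees, while the second is usually the one that matters for the data.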

Maybe the best example of this size versus importance tension is the US presidential election map. Although the standard county-level choropleth map colors counties based on which candidate carried the most votes, it’s not the size of the county that matters, but the number of people living there. Thus, Donald Trump’s favorite map from 2016 doesn’t really show a “massive landslide victory,” but instead shows that he tended to carry larger, more rural, less populous counties. (Also, he lost the popular vote by almost 3 million votes in 2016 and 7 million votes in 2020.)

Man holding a framed picture of the 2016 county-level US presidential election map.

There are other aspects of data-driven maps that make my ears perk up, in particular unusual bin (legend) sizes or categories and diverging color palettes for sequential data. Map projections don’t make me think the creator is trying to lie to me, but I don’t like the Mercator projection all the same.

3. Bar charts that don’t start at zero

The jury is still out on this one, but I think most practitioners (and maybe most researchers) would agree that bar charts should start at zero.

TV image of a bar chart with the title, "If Bush Tax Cuts Expire." The graph has two bars, one labeled 35% now and the other labeled 39.6% on January 1, 2013. The vertical axis extends from 34% to 42%. — *Source: Fox News*

I’ve written about this conundrum in the past, but here is my mathematical intuition as to why bar charts should start at zero. In the graph on the left, the axis starts at zero. Imagine the increments on the vertical axis are each 1 inch. Take the ratio of the heights of the two bars (200 and 150) and you get about 1.3. The same data are shown on the right, but the axis now starts at 100. Repeat the same calculation with the drawn heights (100 and 50), and you get a ratio of 2.0. By starting the axis at something other than zero, we overemphasize the difference in the two values.

Illustration of two vertical bar charts. In both, the left bar is labeled 150 and the right bar is labeled 200. The vertical axis for the graph on the left starts at 0 and the vertical axis for the graph on the right starts at 100.
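The same intuition can be written as a short calculation. This is a sketch only, and the function name `drawn_height_ratio` is hypothetical. It uses the 150 and 200 values from the illustration, plus the 35% and 39.6% bars and the 34% axis start from the Fox News graph described earlier.

```python
def drawn_height_ratio(smaller, larger, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if smaller <= baseline:
        raise ValueError("the smaller bar would have no visible height")
    return (larger - baseline) / (smaller - baseline)


# Illustration: bars of 150 and 200.
print(f"{drawn_height_ratio(150, 200):.2f}")                # 1.33 with a zero baseline
print(f"{drawn_height_ratio(150, 200, baseline=100):.2f}")  # 2.00 with the axis at 100

# Tax rate graph: 35% versus 39.6%, axis starting at 34%.
print(f"{drawn_height_ratio(35, 39.6):.2f}")               # 1.13, the true ratio
print(f"{drawn_height_ratio(35, 39.6, baseline=34):.2f}")  # 5.60, the drawn ratio
```

The data never change; only the baseline does, and with it the visual size of the gap.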

2. Broken bar charts

In my opinion, broken bar charts are one of the cardinal sins of data visualization. With this technique, you add a symbol to the bar to denote that it is “broken,” meaning it actually extends further than the graph shows. The creator has made an arbitrary decision as to where to cut the bars and how far out to extend the horizontal axis. Every time I see it, I think, “Nope, this chart is definitely misleading.”

A horizontal bar chart that includes the United States and several other countries, as well as the state of Iowa. The bar for the US is labeled 664 and the one for Iowa is labeled 582. They both have black diagonal lines through them. The horizontal axis extends from 0 to 200. — Source: https://policyviz.com/2022/10/24/break-the-frame-an-approach-to-visualizing-outliers/

1. Dual axis charts

The number one graph type that makes me wonder whether the creator is trying to pull one over on me is the dual axis chart. Are the axes scaled honestly? Is the author trying to hide something by pulling my eye away from the interesting trends? Tyler Vigen at Spurious Correlations has a large collection of these charts (and I wish he’d make more) that show how, by simply adjusting the scale of the two vertical axes, you can make two independent series look correlated.

Dual axis line chart from Tyler Vigen with the title, "People who drowned after falling out of a fishing boat correlates with Marriage rate in Kentucky." The left vertical axis shows fishing boat deaths and extends from 0 to 20. The right vertical axis shows Kentucky marriages and extends from 7 per 1,000 to 11 per 1,000. — *Source: Tyler Vigen*
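Here is a rough sketch of why the axis choice matters so much. The two series are invented for illustration and are not Vigen's data. Only the axis ranges (0 to 20 on the left, 7 to 11 on the right) are borrowed from the chart above, and `axis_position` is a hypothetical helper that converts a value into its height on an axis, from 0 at the bottom to 1 at the top.

```python
def axis_position(values, axis_min, axis_max):
    """Fractional height of each value on an axis (0 = bottom, 1 = top)."""
    span = axis_max - axis_min
    return [(v - axis_min) / span for v in values]


# Two made-up series: one falls by 75%, the other by roughly 27%.
left_series = [20, 15, 10, 5]
right_series = [11, 10, 9, 8]

left_drawn = axis_position(left_series, 0, 20)
right_truncated = axis_position(right_series, 7, 11)
right_zero_based = axis_position(right_series, 0, 20)

print(left_drawn)        # where the left-axis line is drawn
print(right_truncated)   # right-axis line on a 7-to-11 axis
print(right_zero_based)  # the same series on a 0-to-20 axis
```

On the truncated right axis the two lines land on exactly the same points, so they appear to move in lockstep. Put the second series on a zero-based axis and it looks nearly flat.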

More generally, I think most people use the dual axis chart as a solution for getting as much data on a graph as possible. But they are often difficult to read, gridlines and tick marks don’t match up, and the focal point(s) can be misleading. There are a variety of alternatives to this kind of chart (see here and here), which offer the same data but in clearer (and less deceptive) ways.

Dual axis line chart with the title, "Volatility in global oil prices effect on US wheat and poultry products." The left vertical axis extends from $0 billion to $1.4 billion. The right vertical axis extends from $0 to $120. — Source: https://www.ers.usda.gov/data-products/chart-gallery/?chartId=53767

Of course, these aren’t the only misleading data visualization practices. A smattering of others that didn’t quite make it into my top 10, mostly because they are not as common, include:

No source notes or links to data. Data sources should be listed and obvious.
Icons sized by area, not height or width. Be careful here because it’s just going to offend someone.
Cherry-picking the data. Not necessarily a data visualization bias on its own, which makes it harder to spot. Sometimes, especially when showing changes over time, you can get a sense that the author is trying to hide something.
Error bars that aren’t error bars. Error bars that are supposed to show confidence intervals or ranges should do just that.
Objectively and obviously too many labels. There’s no magic rule about character count or word count on a graph, but sometimes it’s just so obvious.
Log scales, when used incorrectly. It’s likely difficult for many readers to understand what a log scale is at all, but when used incorrectly, it’s often misleading.
Weird y-axis ranges, y-axis increments, or no y-axis at all. If the vertical axis doesn’t start at zero, I immediately wonder why. And uncommon increments along the axes—like every three or six values—also make me wonder.

So, here’s the quick summary:

Pie charts that don’t sum to 100%
Charts that use 3D styling
Overlaid regression lines
Inverted vertical axis
Misleading Comparisons
Percentages, not levels
Maps
Bar charts that don’t start at zero
Broken bar charts
Dual axis charts

I am not suggesting any of these are data visualization “rules” because 3D can be used well, icons can add visual interest, and maps can be amazingly robust. But when I see these approaches or graph types used, I pause for an extra second. To paraphrase Alberto Cairo, “We can become happier, wiser, and more hopeful by meticulously assessing good numbers and charts.”


February 7, 2023

3 Comments

Stefan Fichtel

February 7, 2023 at 11:18 am

… and many more in »Praxisbuch INFOGRAFIK« (only in German, sorry) https://www.mitp.de/IT-WEB/KI-Data-Science/Praxisbuch-Infografik.html?isextsearch=alist&

Tim

February 10, 2023 at 6:10 pm

Kind of odd how transparency, honesty, and correctness are important for you in data visualization, but your article is sprinkled with ideology that is less-than-completely-honest.
  
  Jon Schwabish
  
  February 13, 2023 at 9:09 am
  
  Tim,
  Thanks for reading. You’ll have to do more to refute the arguments here than to simply say “your article is sprinkled with ideology.” If you want to refute the arguments about how these visualizations are misleading or how the post is dishonest, please do so.
  -Jon
