You’ve seen plenty of them, I’m sure. A line chart with two or more lines and two vertical axes. You read the title, the legend, the axis labels, and start contemplating the patterns. Suddenly, you realize that the units on the axis don’t match what the line is supposed to be showing. What’s going on? you wonder. Then, you notice another vertical axis on the other side of the graph. Your brow furrows and you start trying to piece the whole thing together.
Like others, I’m going to suggest you avoid dual axis charts. They are confusing, hard to read, and can be easily manipulated to suggest correlations when none exist. I’m going to make this argument by backing my way into a famous alternative to a dual line chart.
To start, consider this dual-axis line chart that shows the number of auto fatalities (per 100,000 people) on the left axis and the number of miles driven per capita on the right axis in the United States from 1950 to 2011.
It’s not immediately obvious that the fatalities data are shown in the blue line and associated with the left axis, and that miles driven per capita is the green line plotted along the right axis. The purpose of a graph like this is to show the decline in one series (i.e., auto fatalities) alongside a concurrent increase in the other (i.e., miles driven).
But there are three problems with plotting the data like this.
First, these graphs are often hard to read. Did you intuitively know which line corresponded to which axis? I didn’t. Even if the labels and axes were colored to match the lines (which many dual-axis charts don’t do), it’s hard to discern patterns in the data. Overall, they’re extra work for the reader, especially when the labeling is not obvious.
Second, the gridlines may not match up. Notice how the horizontal gridlines in this version of the graph are associated with the left axis, which leaves the numbers on the right axis floating in space. At the crossing point in 1989, it’s hard to see if the number of miles driven is closer to 8,000 or 9,000 because the gridlines are not lined up.
Third, and most importantly, the point where the lines cross becomes a focal point, even though it may have no real meaning. In the first version, our eye is drawn to the middle of the chart where the two lines intersect, because that’s where the most interesting thing is happening. But there’s nothing special about 1985 where the lines cross—it’s just a simple coincidence of the vertical axis scales. The intended takeaway of the chart is how the two series move in opposite directions, but that’s not what draws the eye.
Vertical Axis Ranges
The vertical axis in a line chart does not need to start at zero nor is there a distinct rule for the range of the vertical axis in our line charts. And by that logic, we could arbitrarily change the dimensions of each axis to make the lines cross wherever we like.
Both of these next two graphs set the vertical axes in reasonable ways, and by manipulating those ranges, I can make the two series look like they are closely matched for a few years at the beginning of the period and then diverge. Or, I can change the axes to make it look like the two series converge over the period. By arbitrarily choosing the axis ranges, we can make different data series look as correlated as we like.
And this is the core problem with dual-axis line charts: the chart creator can deliberately mislead readers about the relationship between the series. If you haven’t seen it, Tyler Vigen’s Spurious Correlations site shows these sorts of false relationships in pretty funny ways.
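To make the point about axis ranges concrete, here is a minimal Python sketch. It assumes each line is drawn at a height equal to its value's position within its own axis range, and it reports the first year in which the two drawn lines have swapped order. The function names and the numbers are hypothetical placeholders I made up for illustration; they are not the real fatality or mileage figures.

```python
def axis_position(value, axis_min, axis_max):
    """Fraction of the plot height (0 = bottom, 1 = top) where a value is drawn."""
    return (value - axis_min) / (axis_max - axis_min)


def first_crossing(years, left, right, left_range, right_range):
    """Return the first year in which the drawn lines have swapped order, or None."""
    previous_gap = None
    for year, left_value, right_value in zip(years, left, right):
        gap = (axis_position(left_value, *left_range)
               - axis_position(right_value, *right_range))
        # A sign change in the gap means the lines crossed since the previous year.
        if previous_gap is not None and gap * previous_gap < 0:
            return year
        previous_gap = gap
    return None


# Hypothetical placeholder values, NOT the real data behind the charts in this post.
years = [1950, 1970, 1990, 2010]
falling = [24.0, 20.0, 16.0, 12.0]           # series plotted against the left axis
rising = [3000.0, 5000.0, 7000.0, 9000.0]    # series plotted against the right axis

# Same data, three different (but all plausible-looking) pairs of axis ranges.
print(first_crossing(years, falling, rising, (10, 25), (2000, 10000)))  # 1990
print(first_crossing(years, falling, rising, (10, 50), (2000, 10000)))  # 1970
print(first_crossing(years, falling, rising, (0, 25), (0, 20000)))      # None
```

Nothing about the data changes between the three calls. Only the axis ranges do, and the "crossing" moves from 1990 to 1970 and then disappears entirely.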
Solutions to the Dual Axis Chart Problem
There are a few solutions to the dual-axis chart challenge.
First, try setting the separate line charts side by side. Not everything needs to be packed in a single graph. We can break things up and use a small multiples approach. Although side-by-side graphs should ideally have the same vertical axis range to facilitate easier comparisons, we’ve already determined that these data series are not on comparable ranges, so splitting them up and using different axis ranges can work.
If it’s important to annotate a specific point on the horizontal axis, you could also stack the two graphs vertically and draw a line across both. This will change the orientation of the final graphic, but may offer an easier way to label a specific value or year.
Second, we might calculate an index or the percent change from some value or year and plot the data to a single vertical axis. With this approach, the reader can see the change over time for both series and compare them along the same metric. The obvious trade-off is that we lose the level presentation of the data and instead present the change.
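As a rough sketch of the index approach, the Python below rescales a series so that a chosen base year equals 100, and also computes the percent change from that base year. The helper names and the input numbers are hypothetical placeholders for illustration, not the actual fatalities or miles-driven data.

```python
def to_index(values, base_position=0, base_value=100.0):
    """Rescale a series so the observation at base_position equals base_value."""
    base = values[base_position]
    if base == 0:
        raise ValueError("cannot index to a base observation of zero")
    return [base_value * v / base for v in values]


def percent_change(values, base_position=0):
    """Percent change of each observation relative to the base observation."""
    base = values[base_position]
    if base == 0:
        raise ValueError("cannot compute percent change from a base of zero")
    return [100.0 * (v - base) / base for v in values]


# Hypothetical placeholder values, NOT the real data.
fatalities_placeholder = [20.0, 15.0, 10.0]
miles_placeholder = [4000.0, 6000.0, 8000.0]

print(to_index(fatalities_placeholder))      # [100.0, 75.0, 50.0]
print(to_index(miles_placeholder))           # [100.0, 150.0, 200.0]
print(percent_change(miles_placeholder))     # [0.0, 50.0, 100.0]
```

Once both series are expressed this way, they share a unit and can sit on a single vertical axis. The original levels are gone from the chart, which is the trade-off described above.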
Third, try a different chart type. If showing the changes in the associations between the two series is important, try a connected scatterplot—a graph that is like a scatterplot with a horizontal and vertical axis, but each point represents a different unit of time, such as a quarter or a year.
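If you want to try one yourself, here is a minimal sketch in Python. It assumes matplotlib is installed for the plotting step, and the function names and sample values are hypothetical placeholders of my own, not the real data. The key idea is simply that one series goes on each axis and the points are joined in time order.

```python
def connected_scatter_points(years, x_values, y_values):
    """Return (year, x, y) tuples sorted by year, the order in which points are joined."""
    if not (len(years) == len(x_values) == len(y_values)):
        raise ValueError("all three inputs must have the same length")
    return sorted(zip(years, x_values, y_values))


def plot_connected_scatter(points, x_label, y_label, label_every=1):
    """Draw the points, join them in time order, and label every nth point with its year."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependencies

    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")  # plot() joins points in list order, i.e. by year
    for i, (year, x, y) in enumerate(points):
        if i % label_every == 0:
            ax.annotate(str(year), (x, y), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig, ax


# Hypothetical placeholder values, deliberately out of order to show the sorting step.
points = connected_scatter_points(
    [1990, 1950, 1970],
    [7000.0, 3000.0, 5000.0],   # horizontal axis, e.g. miles driven per capita
    [16.0, 24.0, 20.0],         # vertical axis, e.g. fatalities per 100,000 people
)
# To draw it: fig, ax = plot_connected_scatter(points, "Miles driven per capita",
#                                              "Fatalities per 100,000 people")
```

Labeling the years matters more here than in a regular line chart, because time no longer runs along an axis and the reader needs the labels to follow the path.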
The data I’ve used thus far were originally shown in perhaps one of the most famous connected scatterplots (aside from the Beveridge Curve, which only economists know). Created by Hannah Fairfield at the New York Times in 2012, this Driving Safety, in Fits and Starts connected scatterplot shows how the two series we’ve been looking at so far moved over the 62-year period.
In this excellent presentation of the data, Hannah wrapped the graph around explanatory text positioned at the bottom-left of the page and labeled specific areas to denote important periods like the energy crisis in the early 1970s and air bags in the 1990s. (It’s also worth checking out Hannah’s 2010 connected scatterplot, Driving Shifts Into Reverse, that showed the relationship between the price of a gallon of gasoline and miles driven per capita.)
There is a caveat here. I find that about 8 out of 10 times I end up in one of two places with my connected scatterplots: either they are straight lines (e.g., program participation and program spending) or they are some kind of cluttered mess. Look what happens to the fatalities-miles connected scatterplot when we extend it through 2021: the 2011-2019 period is all over the place!
Zooming in on the most recent 20 years gives us a bit more insight, but it’s still not as straightforward as the rest of the time series. Here, we can see a decline in both metrics between 2000 and 2010. Then, there’s some squiggliness (is that a word?) between 2010 and 2016. We then see fatalities fall and miles driven increase just before the pandemic. But then, between 2019 and 2020, fatalities rise slightly and driving declines by about 1,500 miles per capita. Then, between 2020 and 2021, we see a recovery of driving and a considerable increase in auto fatalities. Last year, Americans’ driving behavior was about what it was in 2008.
There are always exceptions, right? For dual axis charts, I think there are two exceptions to consider.
First, when we are showing a translation of a single measure, for example Fahrenheit and Celsius temperatures. In these cases, we are not trying to track two different variables but showing how one maps directly onto another.
Second, a Pareto chart, in which (typically) vertical bars show individual values tied to one vertical axis and a line shows the cumulative sum of those values (either levels or percentages) on the other axis.
In both of these cases, I don’t think the usual pitfalls apply.
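A quick sketch of the Pareto calculation shows why the second axis is harmless there: the line is derived entirely from the bars. The Python below assumes the usual construction, with categories sorted from largest to smallest and a running total expressed as a percentage. The function name and the category counts are hypothetical placeholders.

```python
def pareto_table(counts):
    """Sort categories from largest to smallest and attach cumulative percentages."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    rows = []
    running = 0
    for category, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        # (category, bar height, cumulative percentage for the line)
        rows.append((category, count, 100.0 * running / total))
    return rows


# Hypothetical placeholder counts, for illustration only.
for row in pareto_table({"A": 10, "B": 50, "C": 40}):
    print(row)
# ('B', 50, 50.0)
# ('C', 40, 90.0)
# ('A', 10, 100.0)
```

The bars would be read against one axis (the counts) and the line against the other (the cumulative percentage), but both describe the same measure, so there is no second variable whose scale could be stretched to suggest a relationship.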
What does the research say?
There is not a ton of research on how readers perceive and process dual axis charts.
In their 2011 study, A Study on Dual-Scale Data Charts, Petra Isenberg, Anastasia Bezerianos, Pierre Dragicevic and Jean-Daniel Fekete report that their study participants found the dual axis chart (or what the authors called the “superimposed chart”) “very confusing and demanding too much concentration or reflection.” Relative to other chart types, participants in the study “performed poorly both in terms of accuracy and time” when viewing dual axis charts.
There is some slightly more recent research on the effectiveness of connected scatterplots. In their 2014 paper, The Connected Scatterplot for Presenting Paired Time Series, Steve Haroz, Robert Kosara, and Steven Franconeri test how 14 study participants read and process different connected scatterplots. They conclude that the “low-complexity” versions of the connected scatterplot (like the one presented in this post) “can be understood with little explanation” and are better at engaging readers than more traditional graph types, like the line chart.
I hope I’ve demonstrated how dual axis charts can be troublesome. They can confuse your reader and they can be used to distort the presentation of the data. Try some of the alternatives listed above to make your data clearer and easier to read. In some cases, some of these alternatives—like the connected scatterplot—can be used to help your reader see how the patterns in your data are related to one another.
This post draws on Chapter 5 from my book, Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks and was first published in my bi-weekly newsletter. You can sign up for the (free!) newsletter on my Twitter profile page. Thanks to Amy Cesal for suggesting I write, and reviewing, this post.