I was recently asked to create a series of Excel graph options for what I’d consider a relatively small dataset (but sometimes the small datasets are the hardest to visualize!). The data consisted of a measure of life expectancy and household net income for 10 countries. I sketched and played around with the data, and one approach I really liked was a slope chart. But there was a hiccup: As far as I know, there is no way to directly create a slope chart in Excel when the metrics of the two variables are completely different.
Before I dive into this specific challenge, let’s briefly review the slope chart. Slope charts enable the reader to quickly see the relationships between two categories. Take the famous example from Edward Tufte’s book, The Visual Display of Quantitative Information. In it, Tufte shows government revenues as a share of GDP for 15 countries in two years, 1970 and 1979. Notice how you can easily see how GDP changes for each of these countries quickly and easily.
More recently, these small multiple slope charts from Bloomberg Visual Data that show the racial composition in 7 of largest U.S. cities relative to the racial composition in the police departments of those cities.
Yes, you could plot these data in a simple paired column chart—here’s a quick example—but the slope chart is more compact, easier to label, and, most importantly, I think it is easier to see the relationships between the two variables across the different cities.
In Excel, slope charts are exceedingly easy to make. In a nutshell, you create a line chart; switch the plot (if necessary); extend the horizontal axis; delete the gridlines and other axes; and style. If you want more details on how to do this, Stephanie Evergreen posted a tutorial in Excel 2013 a few weeks ago.
But the challenge I faced was a little different. The data included life expectancy—measured in years—and household net income—measured in dollars—for 10 countries plus the average for all OECD countries, using data from the OECD’s Better Life Index.
Quick review of the data: Life expectancy across these 10 countries range from 78.7 years in the U.S. to 82.8 years in Switzerland, and, on average, life expectancy across the entire OECD is 79.6 years. Household net income ranges from $19,510 in Korea to $41,355 in the U.S. If you haven’t noticed, the story here is the–let’s call it unique–relationship between income and life expectancy in the U.S.
My first slope chart with these two data series looked like this. Not very useful, but also not surprising. The two data series are both tagged to the y-axis and thus life expectancy is bunched at the bottom near zero.
So how, in Excel, do you create a slope chart where you can actually see the slope between life expectancy and income? I thought the easy answer was to use a scatterplot. Create two dummy series for the x-axis values—0 for life expectancy and 1 for household income—and plot as two different series. Again, you get a similar problem as in line plot above.
At first I thought there was an easy solution: Grab the Household Net Income series, move it to the secondary axis, and connect the dots by changing the graph type to a Straight Marked Scatter.
Here’s the rub: The Straight Marked Scatter connects the points within each series, not across. (And you can’t switch the data around to have a pair of points for each country, because you’re then back in the original problem with only one vertical axis.)
Okay, so how do we solve this conundrum? I think the answer lies in basic statistics. If you standardize each series by calculating a Z-score, you maintain the distribution and can then easily plot it in Excel using the line chart approach.
The Z-score calculation is quite simple: subtract the average from each observation and divide by the standard deviation of the series. The resulting average and standard deviation are then standardized to 0 and 1, respectively, and the two series are now on a consistent metric.
There is a perfect correlation between the raw data and the standardized data, so I haven’t really changed anything, except the units. All I’ve done is to create a linear transformation of the data to put the series on the same basis for purposes of building the chart.
We can now follow the basic chart creation process:
First, select the two data series and insert a line chart.
Now, Switch Plot (Mac) or Switch Row/Column (PC) so you have 11 lines instead of 2.
This now requires a bit of styling:
- Delete the horizontal gridlines, x-axis, and y-axis;
- Add major vertical gridlines;
- Position the x-axis labels on the tick marks instead of in between (called “Vertical axis crosses between categories” on the Mac);
- Move the x-axis labels to the bottom of the chart (to do so, format the x-axis and move the axis labels to the “Low” position);
- Thin the lines to 1.5 pt (I wrote a quick Macro to do this more quickly, but that’s a topic for another time); and finally,
- Change the colors of the lines. Here, I kept color for the five series that have an upward-sloping line and put the others in grey.
The final challenge (or trick, depending on your mood) is to add the data labels. This is kind of a pain in versions of Excel prior to Excel 2013. In Excel 2010 or 2011, you can add the labels manually by selecting each line, right-clicking, and choosing the Add Data Labels option. Now comes the tedious part: Select each data label individually and navigate through the formatting menu change its position. You then need to manually enter the actual data values. There are ways to speed this up like using specific cell references or writing a macro in VBA, but the manual approach doesn’t take all that long, especially if you’re only going to label 5 countries.
There you go. You now have a slope chart for two different metrics, somewhat easily created in Excel by manipulating the data in a specific way.
(Oh, and after I was done with all this work, the recesses of my brain kicked in and I remembered that National Geographic had made this very chart back in 2009. I’m guessing they made it in something like Adobe Illustrator or InDesign rather than Excel.)
Next Week: How to use Conditional Formatting to more easily see patterns in your data.