We’ve all seen it: the bar (or column) chart with 20 or 30 or 40 different groups, each with two, three, or four categories. Is that a good chart type? Is it useful? I’ve recently started to think that these types of charts are just data dumps and are not effective ways to communicate data.
As an example, I came across this paired bar chart from Max Roser, which shows different Gini coefficients for about 20 countries. (It’s worth checking out Max’s collection of charts on living standards around the world, Our World in Data.) In the interactive version, the user can toggle between the two Ginis (defined in the legend at the top of the graph shown below), plus the reduction (or change) in the Gini.
When I looked at this chart–and there are lots of these, so I’m not trying to bash this particular graph–I had a difficult time pulling out any story. The data are sorted on the red bar, but I found myself first looking at the blue bar–maybe because it’s longer and therefore takes up a larger share of the screen? I think I was then mentally computing the difference between the two, to get a sense of the impact of taxes and transfers on inequality.
So what’s the solution? I’ve been toying around with this chart to see what else I might do, and have a couple of ideas.
The obvious first approach is to visualize the gap or difference between the two series. You cut the number of bars in half here, so I think that’s helpful. One problem, however, is that many people want to show the level of both variables, which you don’t get from the graph of the differences.
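To make the "graph of the differences" idea concrete, here is a minimal Python sketch of the calculation behind it. The country names and Gini values are placeholders invented for illustration, not the figures from the chart, and the `gini_gaps` helper is a hypothetical name.

```python
# Placeholder values for illustration only; NOT the figures from the chart.
ginis = {
    "Country A": {"before": 0.50, "after": 0.30},
    "Country B": {"before": 0.45, "after": 0.35},
    "Country C": {"before": 0.40, "after": 0.32},
}


def gini_gaps(data):
    """Return {country: before - after}, rounded to two decimals."""
    return {
        name: round(values["before"] - values["after"], 2)
        for name, values in data.items()
    }


gaps = gini_gaps(ginis)

# One number per country instead of two bars per country.
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}")
```

Each country now carries a single value, which is why the bar count halves. It is also why the levels disappear: nothing in `gaps` tells you whether a country started high or low.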
The second approach is to redesign this as a dot plot. Here, I think putting the values on the same horizontal line provides more balance and the grey line helps visualize the gap.
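As a rough illustration of the layout (not how the chart here was made), the sketch below draws a text-only dot plot in Python: both values for a country sit on one horizontal line, with a connector standing in for the grey line. The function name, the markers, and the data are all assumptions made up for the example.

```python
def dot_plot_row(label, before, after, width=40, lo=0.0, hi=1.0):
    """Render one row: 'x' = before, 'o' = after, '-' = the gap between them."""

    def col(value):
        # Map a value in [lo, hi] to a column index in [0, width - 1].
        return round((value - lo) / (hi - lo) * (width - 1))

    cells = [" "] * width
    left, right = sorted((col(before), col(after)))
    for i in range(left + 1, right):
        cells[i] = "-"  # connector between the two markers
    cells[col(after)] = "o"
    cells[col(before)] = "x"  # drawn last, so it wins if both share a column
    return f"{label:<10}|{''.join(cells)}|"


# Placeholder values for illustration only; NOT the figures from the chart.
rows = [
    ("Country A", 0.50, 0.30),
    ("Country B", 0.45, 0.35),
    ("Country C", 0.40, 0.32),
]

for name, before, after in rows:
    print(dot_plot_row(name, before, after))
```

Because both markers share a row, the eye reads the length of the connector as the gap while the positions still show the levels, which is what the bar chart of differences gives up.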
One problem with the above dot plot is that I don’t include the data values, which, again, a lot of people like to include. In this second version, therefore, I’ve added the data values to the right of each data marker. Maybe a little unbalanced here–look at Taiwan and Korea–but not too bad, I don’t think.
Finally, instead of worrying about getting the labels and the dots to line up, I just used labels instead of the dots. I’m not entirely sold on this one either, but I still think it’s easier to pull out the patterns and differences than in the original.
Other possibilities–which will depend on the nature of the data–include a scatterplot, slope chart, or stacked bar chart, among many others.
So where does this leave me? I think the paired bar chart with 40+ bars simply has too much information on it. A paired bar chart with 5 or 7 groups–even with 2 bars each–may be easier to see (and yes, 5 and 7 were chosen purposefully, thank you Professor Miller). It’s clearly an empirical question–put some charts in front of some people and ask them what they remember–an exercise I’m sure people have done.
What do you think? What are other alternatives to the basic paired bar chart with all those categories?
As GINI summarises inequality, I’ve made the guess that the intent of the chart could be to show the equalising effect of taxes and the like.
With that in mind, when I re-made it, I tried it as a slope chart, coloured by end inequality (error in the label colouring, but please imagine I’d spent the time fixing that 🙂 ). I know grid lines are often seen as chart junk, but I prefer them to labelling every point, especially on a fuzzy measure like GINI. And I’ve gone for the potentially upsetting decision to reverse the y-axis. Typically when I parse a chart, up = good, hence my decision. Happy to be called out for that though.
For an interactive, I’d probably go for selecting certain “stories” in the data (similarity between USA, UK, Israel; differences in the Nordics; SE Asia; Poland vs. USA on tax effects etc.). As a static, I’d be tempted to remove many of the countries for readability.
It would also be interesting (to me at least) to overlay an extra measure like GDP to see if there is any correlation with absolute and relative inequality.
Thanks for your comment, Paulie.
My love for slope charts knows almost no bounds. Well, actually, it knows two bounds. One, I sometimes run into the problem where there are a couple of high values in the data and then everything else looks like a straight line across. That’s not really a problem here. Two, sometimes there is just too much data and you end up with a spaghetti chart. I think that’s more of my concern here.
Also, I agree with your other points about highlighting a specific story or pattern, but my objective in this post was to simply recreate this chart without adding my own views/stories on top of it.
Thanks again,
Jon
Excellent post. So good, in fact, it inspired me to write the post about killing side-by-sides which I’ve been meaning to post for ages: http://gravyanecdote.com/uncategorized/killing-the-paired-bar-chart/
Great post Jon!
I would only delete the value axis (it’s redundant), add a category axis to facilitate the comparisons between the Gini of Income and the difference, and perhaps emphasize the differences (use a color accent for the differences instead of value positions).
More interesting is the question of sorting: should we sort by income before taxes, after taxes or by the difference. Each sort order imposes a slightly different message, see here:
1. sort by Gini of Income after Taxes and Transfers: http://snag.gy/jclj0.jpg
2. sort by Gini of Income before Taxes and Transfers: http://snag.gy/heLFW.jpg
3. sort by difference: http://snag.gy/2jSbY.jpg
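The three sort orders above can be sketched in a few lines of Python. This is only an illustration with made-up placeholder values (not the chart's data), and `sort_rows` is a hypothetical helper, but it shows how the same rows reorder depending on the key.

```python
# (country, Gini before taxes and transfers, Gini after)
# Placeholder values for illustration only; NOT the figures from the chart.
rows = [
    ("Country A", 0.50, 0.30),
    ("Country B", 0.45, 0.35),
    ("Country C", 0.48, 0.25),
]

SORT_KEYS = {
    "after": lambda r: r[2],
    "before": lambda r: r[1],
    "difference": lambda r: r[1] - r[2],
}


def sort_rows(data, by):
    """Return the rows sorted from largest to smallest on the chosen key."""
    return sorted(data, key=SORT_KEYS[by], reverse=True)


for by in SORT_KEYS:
    names = [name for name, _, _ in sort_rows(rows, by)]
    print(f"sorted by {by}: {names}")
```

With these placeholder numbers each key puts a different country on top, so the choice of sort really does change which comparison the reader sees first.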
Andrej,
I agree that the sorting is key and I obviously chose to follow the original. One interesting question, I think, is whether sorting by a variable that is not shown–in this case, the difference–is more confusing than it’s worth. Other than a title, there’s no obvious thing telling me the 3rd version is sorted by the difference, and I wonder whether that will confuse readers.
Thanks,
Jon
I have used trend lines to solve this issue before. You can also fill in below the line to make it look like its own bar. I’m not sure if this is a better solution, but I thought it was worth mentioning.