Communicating uncertainty is a difficult task. Many readers don’t fully appreciate or understand uncertainty: what it means and where it comes from. A similar case can be made for visualizing distributions. I’ve heard many complaints about the good ol’ box-and-whisker plot: that it’s not a good graph type, that people don’t understand it. Personally, I think the challenge lies less with the graph itself than with the fact that many people simply don’t understand distributions and percentiles. This graph, for example, runs in today’s (and every day’s) weather section of the Washington Post. It’s essentially a box-and-whisker plot (box-and-box, actually), and it’s a daily staple of the paper. If it’s not a problem in the weather section, why is it problematic elsewhere? Likely, I would argue, because a phrase like “this is the 75th percentile” is foreign to many people.
Fortunately, there appears to be a growing number of ways to communicate uncertainty and distributions, including ridgeline and beeswarm plots. Another option is the gradient plot (I’ve also seen it called a stripe plot). In their paper on visualizing uncertainty with error bars, Michael Correll and Michael Gleicher “suggest the use of gradient plots (which use transparency to encode uncertainty) and violin plots (which use width) as better alternatives for inferential tasks than bar charts with error bars.”
A Gradient Plot in Excel
As an Excel user, I was curious about how to create the gradient plot in Excel. Turns out, it was a lot easier than I thought.
I’ll walk you through this with an example. Imagine six different point estimates, each with a different confidence interval. As you can see here, the plot adds an increasing level of transparency as we move further away from the central point estimate.
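To make the fade concrete: if the gradient has a dark stop in the middle and a light stop at each end, and colors blend linearly between stops (an assumption about how the renderer interpolates), then opacity falls off linearly with distance from the center. The minimal Python sketch below models that. The function name is hypothetical, and the end and center opacities (0.1 and 1.0) are arbitrary illustrative choices, not values taken from the chart.

```python
def opacity_at(position: float, edge_opacity: float = 0.1, center_opacity: float = 1.0) -> float:
    """Opacity at a relative position along one bar (0 = bottom of the
    interval, 1 = top) for a three-stop gradient: light, dark, light.

    Assumes colors blend linearly between gradient stops.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    # Distance from the center stop: 0 at the middle, 1 at either end.
    distance = abs(position - 0.5) / 0.5
    # Blend linearly from the center opacity toward the edge opacity.
    return center_opacity + (edge_opacity - center_opacity) * distance


for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{pos:.2f} -> {opacity_at(pos):.2f}")
```

The profile is symmetric, so a point a quarter of the way up the bar is exactly as faint as a point three quarters of the way up.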
There are three parts to this chart:
- A stacked column chart: the bottom. The core of the graph is a stacked column chart. The bottom series runs from zero to the bottom of each confidence interval, and its fill is set to No Fill. For the first two columns, the values of this series are 10 and 5.
- A stacked column chart: the top. The top series spans the entire confidence interval; for the first column, its value is 20, so it runs from 10 to 30. To add the transparency, I use the gradient fill options: set the gradient fill Type to Linear, then set up the gradient stops so that there is a dark color in the center and a lighter color at each end. Placing the darkest color in the center creates the increasing-transparency effect for all of the columns.
- The point estimate. These are the dark blue horizontal lines in the middle. I plotted them as a scatterplot with the x-values equal to the positions on the horizontal axis (1, 2, 3, …) and the y-values equal to the point estimates (20, 20, 30, …). The lines themselves are horizontal error bars (with a length of 0.2) added to each point, with the marker hidden.
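The worksheet values for the three parts above follow directly from each interval’s bounds and point estimate. Here is a minimal Python sketch of that bookkeeping; the `Estimate` class and `gradient_plot_series` function are hypothetical helpers, and only the first column’s numbers come from the example above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Estimate:
    point: float  # point estimate, drawn as the dark horizontal line
    lower: float  # bottom of the confidence interval
    upper: float  # top of the confidence interval


def gradient_plot_series(
    estimates: Sequence[Estimate],
) -> Tuple[List[float], List[float], List[Tuple[int, float]]]:
    """Compute the worksheet values for the three parts of the chart."""
    bottom: List[float] = []  # series 1: zero to the lower bound (No Fill)
    top: List[float] = []     # series 2: interval width (gradient fill)
    points: List[Tuple[int, float]] = []  # scatter (x, y) for the estimates
    for x, est in enumerate(estimates, start=1):
        if not est.lower <= est.point <= est.upper:
            raise ValueError(f"column {x}: point estimate is outside its interval")
        bottom.append(est.lower)
        top.append(est.upper - est.lower)
        points.append((x, est.point))
    return bottom, top, points


# Column 1 follows the example above; the other intervals are illustrative.
bottom, top, points = gradient_plot_series(
    [Estimate(20, 10, 30), Estimate(20, 5, 35), Estimate(30, 22, 38)]
)
print(bottom)  # → [10, 5, 22]
print(top)     # → [20, 30, 16]
print(points)  # → [(1, 20), (2, 20), (3, 30)]
```

Because the dark stop sits at the middle of the top series, the darkest band lands on the point estimate only when the interval is symmetric around it, as it is for every column in this sketch.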
An Application of the Gradient Chart
How about applying this to actual data? In ProPublica’s Surgeon Scorecard, Sisi Wei, Olga Pierce, and Marshall Allen created these great bullet charts (side note: I’m on the lookout for really good bullet charts, so please let me know if you’ve seen some). While the standard bullet chart usually has a target or observed value as the small dark line or point, the ProPublica graphs added a gradient plot on top of the central point estimate.
Can we remake this in Excel? Yes. Well, kind of. Mostly.
In Excel, I start with a stacked bar chart with the three ranges placed on the standard, primary axes. The gradient plot is thinner than the ranges, so I add it to the chart as a separate series on the secondary axes. Because the secondary axes are now in use, I can’t also add a scatterplot, so I need a different approach to create the gradient plot. Here, the gradient plot is its own stacked bar chart with four segments:
- The bottom segment runs from zero to the bottom of the confidence interval, with its fill set to No Fill.
- The bottom half of the confidence interval is set to the full confidence interval divided by two, minus 0.2 (you’ll see why in a second). I use the same gradient fill approach as before, but with only two colors, running from lightest to darkest.
- The point estimate is its own segment in the stack, with a value of 0.2 (chosen because it looked good; other numbers will also work, but then the confidence interval segments need to change as well).
- The top half of the confidence interval uses the same strategy as the bottom half: it’s equal to the entire interval divided by two, minus 0.2, with the two colors reversed so the darkest one sits next to the point estimate.
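The four-segment arithmetic can be sketched in Python as well. This version is a generalization, assuming you want the 0.2-wide marker centered on the point estimate and the stack to end exactly at the upper bound: each gradient half is the distance from the estimate to that end of the interval, minus half the marker. When the estimate sits at the midpoint, both halves reduce to the interval divided by two, minus 0.1; subtracting the full 0.2 from each half instead makes the stack end 0.2 short of the upper bound. The function name and parameters are hypothetical.

```python
def four_segment_stack(lower: float, point: float, upper: float, marker: float = 0.2) -> list:
    """Split one confidence interval into the four stacked-bar segments.

    Returns [blank, lower_gradient, marker, upper_gradient] in stacking order
    from the axis outward, with the marker centered on the point estimate.
    """
    lower_gradient = point - lower - marker / 2  # fades in toward the estimate
    upper_gradient = upper - point - marker / 2  # fades out away from it
    if lower_gradient < 0 or upper_gradient < 0:
        raise ValueError("marker is too wide for the space around the estimate")
    # The blank segment runs from zero to the lower bound and gets No Fill.
    return [lower, lower_gradient, marker, upper_gradient]


segments = four_segment_stack(lower=10, point=20, upper=30)
print(segments)
print(sum(segments))  # equals the upper bound, up to floating-point rounding
```

Measuring each half from the estimate keeps the marker on the estimate even for an asymmetric interval; with equal halves, as in the chart above, it sits at the interval’s midpoint.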
So there you go: the basic graph is done.
But what about adding the “Low”, “Middle”, and “High” labels? Usually, I would do this by adding a scatterplot and labeling the points, but I can’t do that in this chart. One way, probably the easiest, is to drop the chart into PowerPoint, Illustrator, or something similar and add the labels manually with text boxes. You could try adding text boxes in Excel, but I find they never line up the way I want (which is why PowerPoint would work better). Another way, certainly not as easy, is to create a second graph with just the labels and layer the two graphs together.
In this approach, I recreate the stacked bar chart of the ranges, add the labels, set all the fill colors (including the plot and chart areas) to No Fill, and then lay the new chart over the original.
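For the label chart, each range becomes one stacked segment whose value is the distance between consecutive range boundaries, and a data label set to the Center position would sit at roughly the segment’s midpoint. A minimal Python sketch, assuming the first boundary is the axis minimum (zero) so no blank segment is needed; the `range_segments` function and the boundary values are hypothetical, not ProPublica’s figures.

```python
from typing import List, Sequence, Tuple


def range_segments(
    boundaries: Sequence[float],
    names: Sequence[str] = ("Low", "Middle", "High"),
) -> List[Tuple[str, float, float]]:
    """Turn ascending range boundaries into (name, segment value, label center).

    Assumes boundaries[0] is the axis minimum, so the first range starts
    the stack and no invisible spacer segment is needed.
    """
    if len(boundaries) != len(names) + 1:
        raise ValueError("need exactly one more boundary than range names")
    if any(b2 < b1 for b1, b2 in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be ascending")
    rows = []
    for name, start, end in zip(names, boundaries, boundaries[1:]):
        # Segment value = width of the range; label center = its midpoint.
        rows.append((name, end - start, (start + end) / 2))
    return rows


# Hypothetical boundaries for a single bar.
for name, value, center in range_segments([0, 4, 7, 12]):
    print(f"{name}: segment {value}, label near {center}")
```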
Notice that aligning the two charts using Excel’s Alignment menu doesn’t quite work.
Instead, I manually moved the new chart around until it lined up where I wanted it.
The final product looks pretty good, and you’d never know I used two different graphs. Admittedly, though, it’s not a perfect solution.
There are a lot of challenges with visualizing uncertainty and distributions, perhaps the biggest being a general lack of statistical literacy. With sufficient annotation, however, maybe we can teach our readers how to read these kinds of charts? My goal here is not to address those challenges but to show how to create a gradient plot in Excel. There are tons of resources for better understanding these issues, which I won’t list here, but I do highly recommend Sisi Wei’s slides from the 2016 SND conference.