
I would like help visualising these data. What I have tried to show, by segment, is which categories over- or under-index against the average (total). I think the challenge is finding a way to visualise data with many series and categories/segments while highlighting the over- and under-indexing categories. In the end I settled on a table…
Is there a better, less clunky way? I wanted to avoid bar charts at all costs, as I feel the number of bars detracts from a clean visual.
Let me know if this is unclear! I look forward to the comments!
Here is a bar chart for the total and a dot plot for the individual index values, shown as the percent difference from the total.
Tableau workbook at https://public.tableau.com/views/Alttoheatmap/BarChartandDotPlot?:embed=y
I think you just need to remove some of the unimportant colors… I did this in Tableau, with a little data wrangling.
How about this?
(created in Zebra BI for Excel)
A couple of additional thoughts here:
– put the longer dimension to rows
– use the same chart type for the same units
– scale all charts
– visualize variances (in percentage points) in two colors (red/green, or blue/orange or black/blue to make it safe for people with color vision deficiency), and put a + or – sign in front of variance labels
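The last point can be sketched in a few lines of Python. This is only an illustration, not how Zebra BI does it: the function names and the blue/orange pairing are assumptions chosen for the example, and the inputs are made-up percentages.

```python
def variance_label(segment_pct, total_pct):
    """Signed difference from the total, in percentage points."""
    diff = segment_pct - total_pct
    # The "+" format flag forces an explicit sign on positive values.
    return f"{diff:+.1f} pp"


def variance_color(diff):
    """Two-color coding (assumed blue/orange, a colorblind-safe pair)."""
    return "blue" if diff >= 0 else "orange"


# Made-up values: a segment at 48.0% and another at 42.5%, total at 45.0%.
above = variance_label(48.0, 45.0)
below = variance_label(42.5, 45.0)
```

The sign carries the direction even when the chart is printed in greyscale, so color becomes a reinforcement rather than the only cue.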
Hope this helps…
Nice job, Andrej. I like Joe Mako’s too, but this one makes it easier for me to compare across both locations (down a column) and population groups (across a row). Yours is the only view that let me see that the “total” is not a true average — it must be weighted somehow.
You’ve lost one thing I like about the original and Joe’s, which is that only substantial differences are called out with color. You’re already showing plus/minus by position from the center line, so I like reserving color for the substantial differences.
A couple of technical issues: it seems your differences are backwards and your population groups are in a different order.
Xan, thanks for pointing out the “technical” issues. I was too fast… Here’s the corrected version.
In the first version I sorted the population groups by average variance. Now, after doing my homework and learning about these population segments, I’ve put them back to the original order.
Regarding highlighting “substantial” differences I have a different opinion. Who decides that 20% (or any other number for that matter) is the right threshold to call it a “substantial” difference? And if only 20% and more is substantial, how about 19.9%? How about 50% or 100%? Reducing continuous values to 2 or 3 classes leads to oversimplification. It obscures the comparisons and encourages misinterpretations. If necessary, only discrete categories should be coded with color hue. Quantities are best coded in length only and that’s where almost all “heat maps” (as well as many map-based visualizations) fail.
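To make the threshold problem concrete, here is a minimal sketch. It assumes index values where 100 equals the total and the 80/120 cut-offs used in this discussion; the function name and sample values are made up.

```python
def classify(index, low=80, high=120):
    """Reduce a continuous index (100 = total) to three color classes."""
    if index < low:
        return "under"
    if index > high:
        return "over"
    return "typical"


# Two nearly identical values land in different classes...
just_below = classify(119.9)
just_above = classify(120.1)
# ...while two very different values land in the same class.
far_above = classify(200)
```

Here 119.9 and 120.1 get different colors, while 120.1 and 200 get the same one.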
I personally believe that a visualization should first truly and accurately present the situation, only then can we extract the message(s). The end user will decide for himself, what is substantial – if he is given the chance to fully assess and understand the situation.
However, the color can still be reduced as you suggested (e.g. by using grey instead of green, or by replacing red/green with a scheme that is safe for color vision deficiency).
Andrej,
Great to hear your thinking on substantial differences. I often like simplifications if I trust the author — it means the author has done the work in figuring out what’s relevant to the message.
In other words, if the author of the visualization has the domain expertise to distinguish typical variation from substantial variation, it helps the reader to get the benefit of that expertise. (I’m trying to avoid loaded stat words like “normal” and “significant” since I’m speaking loosely.) I’m not sure if 20% is arbitrary or not in this case, though I’m assuming it is at least somewhat meaningful.
This idea of using some rule to visualize special values differently appears in many statistical graphics, such as box plots, control charts and funnel plots. In those cases, it’s necessary to have hard cut-offs (IQR, spec limits, and standard errors) even though they may suffer from the “19.9” issue you point out.
I think you’re coming from a different assumption where the composer has no special knowledge, in which case I agree it’s better to let the data speak for itself. And I’ll agree 20% looks suspiciously arbitrary.
Hi Xan, very interesting discussion!
I do agree with you on simplification and especially that the domain expert has to figure out the message.
I just wanted to distinguish between simplification (make things very clean and clear) and oversimplification (focusing on very trivial messages, hiding comparisons, etc). Many times designers enforce their own subjective messages thus making visualizations very poor.
If a specific benchmark exists, then making a comparison to this benchmark can be extremely valuable. I just don’t see it in this particular case. In the first heat map, 9 table cells are colored red. Visually, this offers the observer 9 red areas (rectangles) of exactly the same size, even though the variances in numbers are quite different. All other values under this threshold are grey areas of exactly the same size, even though the figures vary from 0 to 20%. The lie factor is considerable.
That’s why it seems to me that making it easy to compare the variances is the key element to bring the message out here. Especially if enough space is available (my proposal above has 5 charts but actually takes less space than other proposals). However, choosing the “optimal” option also depends on the intended audience…
Cheers!
PS Regarding box plots I have a similar opinion. Tufte has redesigned them, and people have invented violin plots, vase plots, etc. To me, Anscombe’s Quartet explains it quite nicely; I’m sure you’re familiar with this picture: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
The scatterplot does not lie.
Hi,
I went with a bar chart since I still think it conveys the data effectively. I did, though, restructure the data a bit first, so that I could structure my viz more easily. I created a three-column table with category, group and score (vs. the original data, where each group had its own measure).
I also decided to forgo the arbitrary 80% and 120% thresholds and instead just measured the deviation from the Total and then encoded that with color. This still shows you which sub-groups deviated from the Total in either direction, but also shows you where the largest deviations occurred.
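For anyone wanting to do the same reshaping outside Tableau, here is a rough sketch in plain Python. The category names, group names and scores are invented for illustration, and `to_long` is a hypothetical helper, not part of any library.

```python
# Hypothetical wide table: one row per category, one column per group.
wide = [
    {"category": "Location A", "Total": 50, "Group 1": 62, "Group 2": 41},
    {"category": "Location B", "Total": 30, "Group 1": 28, "Group 2": 37},
]


def to_long(rows, key="category"):
    """Unpivot wide rows into category / group / score records."""
    long_rows = []
    for row in rows:
        for group, score in row.items():
            if group != key:  # every column except the key is a group
                long_rows.append(
                    {"category": row[key], "group": group, "score": score}
                )
    return long_rows


long_table = to_long(wide)
```

With one record per category and group, a single score field can drive every mark in the chart.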
Since I like to include as much relevant info as possible in the chart (while still keeping it readable), I chose to show both the absolute score for each sub-group and category and then the deviation from the Total in parentheses (and in a slightly lighter font), thereby reinforcing the color encoding.
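The labelling can be sketched like this. It assumes the deviation is a simple difference (score minus Total); the scores and the helper name are made up for the example.

```python
def label_with_deviation(score, total):
    """Absolute score followed by the signed deviation from the Total."""
    deviation = score - total
    # "+g" forces a sign and drops trailing zeros, e.g. +12 or -9.
    return f"{score:g} ({deviation:+g})"


# Made-up scores for one category.
scores = {"Total": 50, "Group 1": 62, "Group 2": 41}
labels = {
    group: label_with_deviation(score, scores["Total"])
    for group, score in scores.items()
    if group != "Total"
}
```

The same `deviation` value could feed the color scale, so the label and the color always agree.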
Hi Michael, what do you think about this alternative bar chart solution?
I think it’s probably better to apply color to the variances only, not the whole bar. In this way, the user’s attention is guided to the variance. The attention is proportional to the difference and the reduction of color scale to two colors makes the categories easier to compare.
Hi Andrej,
That is an interesting suggestion. I do like the idea of accentuating the delta more so than having the entire bar color-coded. I played around with a few options in Tableau, and I think I like #4 the best (shown below) as it limits the color to a smaller space (just the circles) which has the effect of highlighting the deltas quickly while still showing the actual scores per group. What do you think?
https://public.tableau.com/views/HelpingaHeatmap/ViewOptions?:embed=y&:display_count=yes&:showTabs=y