I took a quick trip to San Francisco early this week and stopped by to see my friend Cole Nussbaumer (Storytellingwithdata.com). As it does every once in a while, the topic of whether to crop your graph’s baseline to something other than zero popped up last week. Instead of writing a blog post, I thought Cole and I could sit down and chat about it. There are still lots of questions here, so please submit your comments and questions below, or hit us up on Twitter (me and Cole).
From Drew Skau’s post:
I don’t believe this was discussed on the podcast (apologies if I missed it), but what about line charts that are showing metrics that don’t represent a quantity? In those examples, 0 may not represent anything and doesn’t make sense as a baseline. Take a person’s temperature for example. Here is a nice example illustrating the difference between a zero and non-zero axis tracking someone’s temperature in Celsius:
http://data-informed.com/wp-content/uploads/2015/01/Body-temperature.jpg
I don’t think the zero based version provides any necessary context to use the second chart, which would be much more useful in tracking someone’s symptoms. The zero doesn’t really mean much here, unless comparing the temperature to the freezing point of water is somehow important. But you wouldn’t interpret 20 degrees Celsius as being half as warm as 40 degrees Celsius. 0 degrees Celsius isn’t the baseline value for the temperature scale, so why include it?
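To make the "half as warm" point concrete, here is a minimal Python sketch. It assumes we compare the naive ratio of two Celsius readings with the ratio of the same readings on the Kelvin scale, where zero is a true physical baseline. The function and variable names are illustrative only and do not come from the podcast or the linked chart.

```python
# Celsius is an interval scale: its zero is the freezing point of water,
# not "no temperature", so ratios of Celsius values are not meaningful.

def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a scale whose zero is absolute zero."""
    return celsius + 273.15

cool, warm = 20.0, 40.0  # the two readings mentioned in the comment

naive_ratio = cool / warm  # treats 0 degrees Celsius as a true zero
kelvin_ratio = celsius_to_kelvin(cool) / celsius_to_kelvin(warm)

print(f"Naive Celsius ratio: {naive_ratio:.3f}")
print(f"Ratio on the Kelvin scale: {kelvin_ratio:.3f}")
```

On the Kelvin scale the ratio is roughly 0.94 rather than 0.5, which is one way to see that a zero on the Celsius axis is not a natural baseline for this kind of chart.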
Well, it has to do with the context and which norm is used. (I agree with Speros on this; heart rate and blood pressure are also examples.) Who is your audience, and what are you communicating? It has to do with the reference values that are customary in a given subject. So crop it if the norm and your references are different.
As Sonja says, I think where Cole and I ended up is that context matters. Just as in the labor force participation rate example, a 0% LFPR is pretty meaningless, so why worry about showing it? However, I do worry about stretching the y-axis to try to make something look like a much bigger change than it really is. So in Speros’ example, is a change from 36.5 to 37 meaningful? If so, then zooming in this way may make sense; if not, then the author is stretching the axis to make it look like it’s meaningful. In some ways, this is similar to a post by Robert Kosara (https://eagereyes.org/blog/2012/bikini-chart) on how our perceptions might change based on how the data are plotted.
Hi Jon! Great point for discussion. Normally, I would adhere to the “always start at zero” rule. But, I recently needed to share some data that showed differences of about 1-4 percentage points on a 100 point scale. I really needed people to see that a 2% change for one program is an important difference from a 4% change in another. I started the y-axis at 70% in order to magnify the differences on a simple column chart. BUT, I put a big red circle around that 70% and added a text box with red text that said “NOTE: axis intentionally manipulated to magnify differences.” AND, this information was for internal use only and not for publication or sharing outside the organization. So, I felt that I was being transparent in communicating the data this way.
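As a rough illustration of how much a truncated axis magnifies differences on a column chart, here is a small Python sketch. The two values (72% and 74%) are hypothetical, chosen only to fit the "a few points apart on a 100 point scale" description, and the helper name is made up for this example.

```python
def drawn_height_ratio(smaller: float, larger: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn column heights when the y-axis starts at `baseline`."""
    if smaller <= baseline or larger <= baseline:
        raise ValueError("both values must sit above the axis baseline")
    return (larger - baseline) / (smaller - baseline)

low, high = 72.0, 74.0  # hypothetical program results, in percent

zero_based = drawn_height_ratio(low, high)                 # axis starts at 0%
truncated = drawn_height_ratio(low, high, baseline=70.0)   # axis starts at 70%

print(f"Zero-based axis: taller column is {zero_based:.2f}x the shorter one")
print(f"Axis from 70%: taller column is {truncated:.2f}x the shorter one")
print(f"Visual exaggeration: {truncated / zero_based:.2f}x")
```

With these made-up numbers, the taller column looks twice as tall on the truncated axis even though the underlying values differ by under 3 percent in relative terms. That gap is what the annotation on the chart has to compensate for.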
Thanks for the comment, Sheila. The thing that gives me pause here is that some might take your experience as an excuse to use non-zero baselines as long as they put some annotation around it. But would we forgive the classic Fox News examples of non-zero baselines (e.g., http://flowingdata.com/2012/08/06/fox-news-continues-charting-excellence/) if they added a disclaimer? Probably not, because they are still using the non-zero baseline to create a distorted image of the data. While part of this discussion is about context, another part is about trying to be as objective and honest with the data as we can be.
I agree Jon. When it’s used to intentionally distort readers’ perception, it’s dishonest. I’m sure I can also think creatively and find better ways than using a bar or column chart to show visually how small changes are indeed significant in certain scenarios.
Thanks for this post, Jon and Cole! I recently came across 2 line charts that for me re-ignited the debate on when a non-zero baseline is appropriate.
The first one comes from a (image below) comparing business birth and death rates. Though I have some questions about other aspects of this viz (what constitutes a firm?, why use % instead of number?, are labels clear?, etc.), which I would also like to hear your comments on, I was speculating as to how much changing the baseline to zero would alter the viewer’s perception of this data. It doesn’t look to me as though it would change it much, so in this case is the non-zero baseline justifiable?
The second is a Also a line graph dealing with percentages, this one starts the baseline considerably high compared to the previous example. I suppose the altered baseline does serve the purpose of allowing the viewer to see the critical “moment” the article is referring to–the point in 2015 when the top 1%’s share of global wealth will surpass 50%. What do you guys think? Is this a justifiable purpose for the non-zero baseline in this scenario?
And the second graph…