Dr. Jessica Witt is a Professor in Psychology at Colorado State University. Dr. Witt received her Ph.D from the University of Virginia in 2007, and has been at CSU since 2012. Dr. Witt recently won the American Psychological Association’s Distinguished Scientific Award for Early Career Contribution to Psychology (2017), the Janet Taylor Spence Award for transformative early career contributions from APS (2015) and the Steve Yantis Early Career Award from the Psychonomic Society (2014). Her CV can be found here. Dr. Witt’s research has been continuously funded by NSF since 2010.
In this week’s episode of the podcast, Professor Witt and I chat about her new paper, Graph Construction: An Empirical Investigation on Setting the Range of the Y-Axis, in which she asks a hotly-contested question in the data visualization field: Does the vertical axis need to start at zero?
Enjoy the show!
Welcome back to the PolicyViz podcast. I am your host, Jon Schwabish. On this week’s episode, we are going to talk about the rules of data visualization. Now, I think it’s the case that in the DataViz community, we have sort of settled on one rule that remains that we all sort of follow, and that is that bar charts should start at zero; that is, whether your bars are going up or to the side, the axis they are measured against should start at zero. I think we’ve all sort of talked about when maybe pie charts are appropriate, when they’re not, and there are sort of basic, obvious rules, like that pie charts should sum to 100%, but those are sort of basic things, right. But the one thing that we are still discussing and still trying to find some evidence for is whether it matters if our vertical axis in bar charts starts at zero or not. Fortunately, there is now some research that suggests that maybe that rule that we need to start our bar charts at zero isn’t necessarily true. So on this week’s episode of the show, I’m very excited to have Jessica Witt, who is a professor at Colorado State University, to talk about a new paper she has published on the range of the y-axis when it comes to bar charts. So this is one of these last bastions of basic rules of data visualization that really needs some attention in the academic community – and I’m really grateful that Professor Witt went in and explored this issue.
There’s of course a lot going on in the academic field, so I don’t want to push that to the side. There’s been interesting papers on basic data visualization questions over the last couple of years. The one that comes to mind most immediately is that work by Drew Skau and Robert Kosara on how do we actually perceive values in a pie chart, and Robert and Drew were on the show when those papers came out, so I would encourage you to go take a listen to that episode of the show. But these basic questions are ones that we sort of take for granted and I’m really grateful that we have some researchers out there thinking about even these what we might take for granted as simple questions but now actually getting in and discussing it and researching it and testing it. So on this week’s episode of the show, my interview with Jessica Witt from Colorado State University, I hope you’ll enjoy it and here is my discussion.
Jon Schwabish: Hi Jessica. How are you?
Jessica Witt: Good. How are you?
JS: I’m doing really well, kicking off the year, getting into the swing of things. Thanks so much for coming on the show and for responding to my email so quickly. I’m excited to talk to you about this paper. So we’re going to talk about this idea of how you should scale your vertical axes or your y-axes in this really interesting paper you wrote. But I want to give you a chance to really talk about your background because you’re in the department of psychology from Colorado State, so it strikes me as an interesting place to come at this particular question and I thought it would help listeners get a sense of where you’re coming from to talk a little bit about your background on how you got interested in this particular topic.
JW: Sure, yeah, I’m a professor in the Department of Psychology at Colorado State University, and my background is as a vision scientist, and so I study how the visual system works and how do we process the information that comes through the eye and how does that result in the perception we have of our surrounding environment. For the past 20 years, my research has been on basic research questions in vision. Specifically, I’ve looked at things like how your ability to act changes how you see spatial layout. So for example, hills look steeper when you’re tired, or distances look farther when you’re low on energy, and these are fun and interesting questions to look at; but they’re really kind of basic research, basic science, really getting at the core of how does the visual system work. And after I got my last promotion and I got tenure, I decided to step back a little bit and think about ways that my research could have more immediate impact and what are the topics that really interested me, and I spent a lot of time thinking about kind of what my life goals are and what issues in our world most attracted me. And then I spent even longer trying to figure out how my specific but narrow skill set could actually have an impact in those areas, and eventually this all brought me to the field of InfoViz or information visualization. There’s a lot of big data out there, a lot of really important data – and it doesn’t matter how good the data are or how good the analyses are; if we’re not presenting it in a way that can clearly communicate something, then our message is going to be lost. So that is how I came to the world of InfoViz and I just looked at the field in general and there is some really good empirical work out there, but it’s clear to me that the people doing that work don’t have the same kind of rigorous training that we have in the vision science community and that that could be a way I could have some impact. And so bringing our more sophisticated methodologies to the questions in InfoViz is something that really interested me.
JS: Right. Okay, so in that vein of turning over a new leaf or finding a new path, you’ve started with this paper on an empirical investigation of how we should set the range of the y-axis which is maybe one of the last big DataViz, InfoViz debates out there – we used to have a lot of debates about pie charts, and I think the community is sort of over that, but we’re still debating about whether our bar charts and line charts should start at zero or not, and I think you have one of the few papers that actually looks at that. So can you talk about the research paper here and what you found?
JW: There’s another group that has done a little bit of empirical work on this and what they did was they just tested whether having the baseline start at zero was better than making your axes really misleading where the data were really compressed. And so, what we did that was different from some of this past work is that we systematically manipulated the range of the y-axis. So of course we included a condition where the y-axis started at zero. And then we included a condition, which we call our minimum axis, where you show the range of the y-axis as small as it can be where you can still see all the data. And this actually aligns with the default setting in many programs – we made our graphs in R and this is the default setting in R, I’ve also worked with SPSS and this is the default setting in SPSS, and a lot of programs have this kind of minimum axes range as their default. And so we wanted to include that just because many people make their graphs and they don’t even think about changing the y-axis range, they just leave it to the default, and so we thought that was important to compare.
And so we have kind of the current recommendation which is the full range, we have the default of many of these programs which is the minimum range, and then we added a new condition which was our hypothesis of what would be best. And the big idea that we had was that if the effect is big, it should look big; and if the effect is small, it should look small, so that the real size of the effect should align with the impression the data gives. And in psychology and many social sciences, we measure how big an effect is in terms of standardized units which means we take the size of the effect and we divide it by the standard deviation. So when I say standardized units, that’s what I mean – I mean, we divide it by the standard deviation. So how big is this effect given how noisy the measurement is, and an effect of 0.8 standardized units is considered big in psychology. And so we created these graphs where the y-axis range was about 1.5 standard deviations, so that a 0.8 effect looks big. In psychology, an effect of 0.3 standard deviations is considered small – it doesn’t matter, it’s not meaningful, it’s just a small effect. So when you use these standardized axes and you have a small effect, it looks small. I like to think that one of the goals of the visual system is saving you from having to think. So if you can get a visual impression of big or medium or small, you don’t have to think, you don’t have to go read the y-axis range, you don’t have to consider what all the different numbers mean, you could just look and see it. And the visual system works so fast and so automatically, and so what we can do in these visualizations is harness that power of the visual system and get an immediate understanding of the magnitude of the effect just with good graph design. So that’s what motivated that condition.
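For listeners who want to try the three axis conditions on their own data, here is a minimal Python sketch. It is an illustration, not the code used in the paper: the function names, the use of a pooled standard deviation, the 0 to 100 scale for the full range, and the choice to let the minimum range run exactly from the lower mean to the higher mean are all assumptions. The 1.5 standard deviation width and the centering on the grand mean follow what Professor Witt describes in the interview.

```python
from statistics import mean, stdev


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent groups (an assumed choice of SD)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5


def standardized_effect(group_a, group_b):
    """Difference between the group means, in standard deviation units."""
    return (mean(group_b) - mean(group_a)) / pooled_sd(group_a, group_b)


def axis_ranges(group_a, group_b, n_sd=1.5, scale=(0, 100)):
    """Return (low, high) y-axis limits for the three conditions discussed."""
    means = (mean(group_a), mean(group_b))
    grand_mean = mean(list(group_a) + list(group_b))
    half_width = n_sd * pooled_sd(group_a, group_b) / 2
    return {
        "full": scale,                          # baseline at zero
        "minimum": (min(means), max(means)),    # just enough to show the plotted means
        "standardized": (grand_mean - half_width, grand_mean + half_width),
    }


# Hypothetical test scores for two study styles (made-up numbers).
massed = [40, 45, 50, 55, 60]
spaced = [44, 49, 54, 59, 64]
print(standardized_effect(massed, spaced))
print(axis_ranges(massed, spaced))
```

With the standardized limits, the gap between the two means takes up a share of the plot height equal to the standardized effect divided by 1.5, so a 0.8 effect fills roughly half the plot while a 0.3 effect fills about a fifth.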
JS: Just so people have a sense of how you actually do study these three conditions, can you talk a little bit about practically how you actually sat down with your sample size and what you ask people to do?
JW: So first we had to generate our stimuli, and so we generated several datasets. And for each dataset, we plotted it three different times, one for each of our different types of y-axis ranges, the full range, the minimum range, and the standardized range. And so at the end, we had a whole bunch of graphs that people could look at, and we recruited participants – our participants are students here at Colorado State University who are taking the introduction to psychology course, and we give them course credit for volunteering their time. We sit them in front of a computer, one person at a time, and we show them all these different graphs. For each graph, we ask them to make a decision – what is the size of the effect that they’re seeing? Is it small, medium, big, or no effect at all? And so they do this task hundreds of times and we record all their responses, and that’s how we collect our data.
Then the next step is how do we analyze our data, and a typical way to analyze data in the InfoViz world is to calculate the mean number or the proportion of trials for which people were correct. So if there was a medium effect on the display and the participant called it medium, they would get credit for a correct response. But if it was a medium effect on display and they called it small or big, then that would be an incorrect response. And so a typical analysis would be to calculate the proportion of trials for which there’s a correct response, and then you can analyze a proportion of correct responses across the three different graph types. What we did, that is taking kind of sophisticated methodological skills from the vision world and applying it to InfoViz, is that we broke out participants’ performance into two different metrics. One metric is called sensitivity or is also sometimes called discriminability, and that’s how good are they at differentiating small from medium from big effects. And then the other measure is bias, so when they make errors, what kind of errors are they making – are they making errors because they’re calling everything too small, or are they making errors because they’re calling these things too big?
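To make the difference between proportion correct, sensitivity, and bias concrete, here is a small Python sketch. It does not reproduce the paper’s analysis, which is not spelled out in the interview. As a stand-in, it assumes responses are coded 0 to 3, treats the slope of judged size on shown size as sensitivity, and treats the average signed error as bias. All names and the two example participants are hypothetical.

```python
from statistics import mean

# Assumed numeric coding of the four response options.
LEVELS = {"none": 0, "small": 1, "medium": 2, "big": 3}


def proportion_correct(trials):
    """Share of trials where the judged size matches the shown size."""
    return mean(1 if shown == said else 0 for shown, said in trials)


def sensitivity(trials):
    """Least-squares slope of judged size on shown size (1.0 = tracks perfectly)."""
    shown = [LEVELS[s] for s, _ in trials]
    said = [LEVELS[r] for _, r in trials]
    mean_shown, mean_said = mean(shown), mean(said)
    sxx = sum((x - mean_shown) ** 2 for x in shown)
    if sxx == 0:
        raise ValueError("need at least two different shown effect sizes")
    sxy = sum((x - mean_shown) * (y - mean_said) for x, y in zip(shown, said))
    return sxy / sxx


def bias(trials):
    """Average signed error: negative = underestimates, positive = overestimates."""
    return mean(LEVELS[said] - LEVELS[shown] for shown, said in trials)


# Each trial is (effect shown, effect reported).
calls_everything_small = [("small", "small"), ("medium", "small"), ("big", "small")]
shifted_up_one_step = [("none", "small"), ("small", "medium"), ("medium", "big")]

for name, trials in [("everything small", calls_everything_small),
                     ("shifted up", shifted_up_one_step)]:
    print(name, proportion_correct(trials), sensitivity(trials), bias(trials))
```

The second made-up participant never gives a correct answer, yet tracks the shown effect perfectly and is simply biased upward, which is the kind of pattern a single proportion-correct score would hide.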
And so that’s how we analyzed our data and here’s what we found – so with the full axes range which is including the baseline of zero, and the minimum axes range which is just showing the minimum you need to see all the data, sensitivity was pretty low for both of these conditions. And then in contrast for the standardized axis, sensitivity was about twice as big, it still wasn’t perfect, it still was maybe 50% of what would be ideal performance, but 50% is a lot better than 15 to 20% which is what we were finding with those other conditions. So we find that people are better able to differentiate the magnitude of the depicted effect when the axes are standardized.
The second finding relates to bias. So when we have your baseline at zero, people generally underestimated the magnitude of the effect, they were referring to the graphs as primarily showing small effects, even when the effects were actually medium or even big. And so there was this huge bias to underestimate the magnitude of the effect with the full axis. And then in contrast with the minimal axes, now everything looks big. Even small, tiny effects were judged as medium or big effects. So not only did the full and minimum axes ranges lead to poorer sensitivity, but they also led to these huge biases – for the full graph it was about a 25% bias to underestimate the effect size, and with the minimal axes it was just over 20% bias to overestimate effect size.
So in the world of psychology or social sciences where you measure effect size in terms of standard deviations, the recommendation is simple. The recommendation is to have your y-axis range be about 1.5 standard deviations or more in case you have a bigger effect you need to show. Now that advice doesn’t transfer nicely to areas where they don’t measure things in terms of standardized units. And so in those cases, the kind of global recommendation is if this is a big effect, it should look big in the graph; and if it’s a small effect, it should look small in the graph. Small effects can be important so you can still argue why a small effect is relevant or is important, but the graph should show it as a small effect. And so the challenge for graph designers, the challenge for people is to figure out how to quantify what is a big effect, what is a small effect – is a 2% increase in home prices a small effect or [inaudible 00:15:54] effect? So the challenge is figuring out what should be considered small and what should be considered medium and what should be considered big. And then once you have that figured out, then you make your y-axis range about twice as big as your “big effect”.
JS: But this question of what’s big and what’s small and what’s medium is kind of the core question in some senses, right?
JW: Yeah. And so it’s kind of putting the effort onto the graph designer as opposed to onto the reader. So of course, you’re labeling the y-axis with numbers, so anyone can go and look at the numbers and figure out the value, the numeric value, and make their own determination about whether that’s big or small. But what my advice is doing is forcing the graph designer to put a lot of effort into doing that ahead of time, and you don’t have to worry – I mean, the numbers will still be there, so people can still get it right. But for example, we know with global temperatures that 2 degrees Celsius is a really important number and actually now it’s been changed to 1.5 degrees, so there you have your number for what’s big. And so you can use these scientifically driven ideas of what’s big or what’s relevant and set your y-axis accordingly.
The other thing you could do is look at historical trends. So if you’re looking at home prices in an area, you could look at historic trends, and look at, okay, what’s the biggest kind of increase or decrease we’ve gotten and we’ll consider that big; or you could look at national trends if you want to compare how a given location compares to national. So it depends on the message you want to communicate, but what my recommendation is doing is forcing the graph designer to put some real thought into what should we consider small or big.
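As a rough illustration of that advice for data that are not measured in standard deviations, the Python sketch below takes the largest period-to-period change in a historical series as the “big effect” and makes the y-axis range about twice that size. The function names, the centering on the mean of the plotted values, the widening step that keeps every plotted value in view, and the price index numbers are all assumptions made for the example.

```python
from statistics import mean


def biggest_historical_change(history):
    """Largest absolute change between consecutive periods in a series."""
    return max(abs(later - earlier) for earlier, later in zip(history, history[1:]))


def y_limits_from_big_effect(plotted_values, big_effect, multiplier=2.0):
    """Y-axis limits whose range is about `multiplier` times the 'big effect'."""
    center = mean(plotted_values)
    half_range = multiplier * big_effect / 2
    low, high = center - half_range, center + half_range
    # Widen the limits if any plotted value would otherwise fall outside them.
    return min(low, min(plotted_values)), max(high, max(plotted_values))


# Hypothetical home price index for one area (made-up numbers).
history = [100, 104, 99, 109, 107]
big = biggest_historical_change(history)
print(big)
print(y_limits_from_big_effect([107, 109], big))
```

Swapping in a national series, or a threshold that comes from the science, such as the 1.5 degree figure for global temperature, only changes where the `big_effect` number comes from.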
JS: Yeah, and I think a lot of people feel like especially with the minimal, what you’re calling the minimal approach which is to have that y-axis start somewhere where all the data are visible, that is the case I think where a lot of people get uneasy because we’ve seen so many examples of using that approach to make changes look big or small, and one of the more famous ones I think is from Fox News from 2012-2013 where the top marginal tax rate was going to change from 35% to 39.6%, and they started the graph at 34%. So it makes this change look really big, but what’s interesting about that graph, about that bar chart, is that 4.6 percentage point change, in some way it’s large because it’s a meaningful change, if you have to pay another almost 5% in taxes that’s a lot of money, but is it a large change with respect to the rest of the tax system, and that’s I think where the crux of this challenge comes in.
JW: That’s right, and it also has this potential to have a skew, right, the graph designer gets to have a skew – in fact, that’s one of the things I worry about with this research is I’m giving people recipes for how to create misleading graphs.
JW: [inaudible 00:19:21] if you want people to overestimate, use this; if you want people to underestimate, use that.
JS: So now the other interesting thing about your paper is that you’ve done this for both bar charts and for line charts, and one thread of this debate or discussion in the DataViz world is that we’re okay starting the graph at some place other than zero for line charts but not so much for bar charts. So did you find differences between those two types of graphs?
JW: We did not find any differences in the biases or sensitivity that you get from bar charts versus line graphs. And to me that suggests what some other people have said too, which is that you can use your y-axis range to create misleading graphs with line graphs just as you can do it with bar charts. So your y-axis range can be used to create a misleading impression no matter what graph type you use. The concern with the bar graph and why this is such an issue with the bar graphs is because the bar graph has two indicators of a value, it’s the top of the bar, the location of the top of the bar, but it’s also the length of the bar. And this is where people get uneasy because the length of the bar really is a salient visual feature of the graph. And if people are using the length of the bars as opposed to the location of the top of the bar, then you’re going to get misleading impressions when your y-axis doesn’t start at zero, like that’s the crux of the matter.
JS: Right, that’s the crux, yeah.
JW: And some people have talked about whether even that space below the line also creates that impression, and it does, it does. But because the bar graph is an object, it has a potential to do so even more. Now we don’t have evidence for that, like our data didn’t bear that out and our data did not show that that was the case. But again our sample size was exclusively drawn from the college student population, and so it’s not clear whether other groups would be more likely to use the length of the bar as opposed to the top of the bar and therefore be more influenced by changing the y-axis range.
JS: Right. Yeah, I mean I can imagine a lot of ways to extend this line of research to people who are more or less familiar with different graph types, who are more or less familiar with different types of content. I wanted to ask you to talk a little bit more about in the experiment itself what were you asking people to look at in the data just to give folks an idea of what you are asking people to think about or look at when they were judging things to be a small change, a medium change, or a big change.
JW: Yeah, and you also want me to describe the graphs or do you think that that’s [inaudible 00:22:24]?
JS: Yeah, I mean, I’m looking at them right now, yeah, I think it would be helpful, and I’ll put some images in the show notes for people. But yeah, I think it would be helpful. I mean, they’re pretty simple graphs, so they’re pretty easy to explain, even though this is audio only. But yeah, I think that would be helpful for people to get a sense of what your survey participants had to look at.
JW: Yeah. Or do you want to describe… ?
JS: Yeah sure, okay, I’m looking at them right now. So I’m looking at six graphs, so you have two rows here, the top row is a row of three bar charts and the bottom row is a row of three line charts. So on the top row there’s just two bars in each, and so that first graph is what we’re calling the full which is going from 0 to 100. The second one is the standardized, so that’s the 1.5 range on the standard deviation, so that’s going from 40 to 54. And then the minimal starts at 44 because the first value here is at 45, so it goes from 44 to 51. So that’s the top row, and then the bottom row – oh and I should say, the two bars are study styles and I’ll let you talk more about the study style – and then the bottom row is a row of line charts and very simply it’s the test score on the y-axis and the number of hours spent studying on the horizontal axis, so four data points in a simple line, so we go from 0 to 100 on the first, so it’s a pretty flat line. The next graph goes from 42 to 54 and so it’s a slightly upward sloping line, and then the last graph, the minimal where we start just below the data points, we go from 46 to 50 and you have a pretty steep upward sloping line. So if you have this in your mind you have three bar charts and three line charts and that’s what survey participants were asked to look at. So maybe you can talk a little bit more about this concept of what it means for small, medium, and large, what I think is really the core of what you’ve been talking about so far.
JW: That’s right. The three bar graphs are all the same data, but if you were just to look at them with these different y-axis ranges, with the full range the difference between those two bars, one of which is in white and one of which is in black, looks really small. And when you look at the same difference but now with the minimal range, the white bar is right near the top of the graph, and the black bar is right at the bottom, so that difference looks large, and then the standardized is in between them. The data that we presented to our students related – well, it was all simulated data, it wasn’t real data, but the scenario was how different types of studying styles affect final test score. And one of the main findings in the memory literature is that you do better if you space out your studying, so you study a little bit every day, as opposed to waiting until the last minute and cramming, which is called the massed study style. And so we thought that if we use this scenario, maybe it would seep into these undergraduate students and help them learn the [inaudible 00:25:24] style.
JS: I see. So you had really like a dual purpose to this experiment.
JW: Yeah, I figure if we’re going to make something up anyway, we might as well have it aligned [inaudible 00:25:34] study better. But I doubt that the effect depended on the specifics.
JS: Right, yeah sure.
JW: But yeah, if we can communicate better study skills to our students while we’re also doing the study, then all the better.
JW: And so, we did not give them any instructions about what should be considered small or medium. And so we didn’t say, hey, an effect of two test points is small, an effect of 10 test points is big, we didn’t give them any of that. Now, of course, these are undergraduate students, and a difference of 10 points is a difference in a letter grade, that’s a huge difference. And so I’m sure that they did bring to bear on the experiment their own idea of what would be a big difference. If they knew that the spaced studying style could improve their test score by 10 points, that would be potentially motivating; whereas if it’s only going to improve performance by three points, then it’s not clear whether that effect is very big. And so, it’s likely that they brought to bear their own ideas of what would be a big difference in final test score.
JS: Right, which to me doesn’t seem that far afield from what people do in the wild or the real life, they bring their own perspective on what they’re looking at. So back to that Fox News graph, if I am in the top income bracket and I see that my tax rate’s going to go from 35 to 40%, that’s much different than if I’m at the other end of the distribution and it’s not going to affect me at all, and I say, well, they’re going to have to pay a lot more. So that seems like what people would experience in real life.
JS: So where do you see this research going, like if I could cut a check right now to fund your work for the next couple of years, like where would you go with this line of research?
JW: One of the problems I think is the most interesting in the field of InfoViz right now is how do we present uncertainty. And the reason I think this is so interesting is that the mind is really bad at understanding both uncertainty and proportions, and I’ve already talked about how great the visual system is, and my goal is to get all of the work out of the cognitive system and the reasoning system and the memory system and put all that work into the visual system. And so how can we get these really complex concepts and data presented in a way where the visual system can just look and make sense of it all – and so looking at how we can plot uncertainty is something that’s really interesting to me right now. So one example would be kind of a famous example is how do we show predictions of where a hurricane might go. It’s all uncertainty. But we have some knowledge and so how can we get the kind of signal and the noise, how can we get the knowledge out of these predictions without letting the uncertainty overwhelm it – and so, that’s one of the, here in Colorado, and obviously in Australia right now, forest fires are a huge concern and trying to predict where they’re going to move next and then how can we best set up protections based on where they’re going to go. So those are some other examples where looking at uncertainty is really relevant.
JS: Before we go, let me ask one last question on this y-axis paper. So you have this recommendation I’d say or tentative recommendation of using the scaling factor to set your y-axis range, and I’m curious what you would say in the case of an academic paper or a report of some sort where someone has multiple bar charts or multiple line charts, maybe it’s the same data being presented across multiple charts, say figure one is a bar chart for the US and Canada, and figure two is Germany and the UK – in cases like that, how do you think about setting these axes? Is it important for someone to have the same axes going from figure to figure or is it still each individual case is what they should be considering and thinking about the data shown within that specific view?
JW: Yeah, that’s a great question. This study was all about a single graph and how to read a single graph, not about what happens when you have multiple panels and multiple graphs. So if your graphs are going to end up on different pages, people’s memory systems are not good enough to be able to really remember the exact values from one to another in terms of a visual comparison. You can do it in your head of course, but in terms of making a visual comparison, you can’t really do that. So in that case, when you’ve got your graphs on multiple pages where you’re never going to be looking at the two graphs side by side, then I think it’s a good idea to have the impression given by each graph be accurate for that graph without really being concerned about the other graph. So in my paper I recommended that the data be centered on the grand mean, but that’s not quite so relevant. And so, I would say that if the data are close enough where you can get in the ballpark of 1.5 standard deviations for all the graphs, but still have the same y-axis range for all of them, and that means the data are a little bit lower for some and higher for others, that’s fine. I think that our visual [inaudible 00:31:32] are not so sensitive that they’re going to be able to tell the difference between 1.4 standard deviations and 1.6 standard deviations. So there is some leeway in there for sure. If that leeway is enough and you can make your y-axis range the same for all the graphs, do it. Better to have them be the same for all the graphs than to make sure they’re precisely 1.5 standard deviations for each.
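One way to check whether that leeway is enough for a set of figures is sketched below in Python. It assumes each panel is summarized by its grand mean and standard deviation, builds one shared range from the average of those, and reports how many standard deviations that range spans in each panel. The function name, the averaging, and the default tolerance of 0.1 (taken from the 1.4 versus 1.6 remark) are illustrative choices, not something specified in the paper.

```python
from statistics import mean


def shared_y_axis(panels, n_sd=1.5, tolerance=0.1):
    """panels maps a panel name to (grand_mean, standard_deviation).

    Returns the shared (low, high) limits, the range expressed in each
    panel's own standard deviations, and whether every panel stays
    within `tolerance` of the target `n_sd`.
    """
    centers = [grand_mean for grand_mean, _ in panels.values()]
    sds = [sd for _, sd in panels.values()]
    width = n_sd * mean(sds)
    center = mean(centers)
    limits = (center - width / 2, center + width / 2)
    multiples = {name: width / sd for name, (_, sd) in panels.items()}
    close_enough = all(abs(m - n_sd) <= tolerance for m in multiples.values())
    return limits, multiples, close_enough


# Hypothetical summaries for two figures (made-up numbers).
panels = {"US and Canada": (50.0, 8.0), "Germany and UK": (52.0, 8.4)}
limits, multiples, close_enough = shared_y_axis(panels)
print(limits, multiples, close_enough)
```

If `close_enough` comes back false, the panels differ too much for one range to give an accurate impression of each, and per-figure ranges are the fallback described in the answer above.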
JS: To me that makes total sense and even though, I mean, you can think about having things on two separate pages, but then we start thinking about a digital presentation where it’s really just a scroll, so it’s above the page or below the page, how does that work. So I do like this idea of it at least being consistent. I’m going to have to let this paper continue to marinate in my head before I can come up with how I still feel about this because in my gut I’m still very much like, start your bar charts at zero, but I think Chad Skelton who’s probably listening to this is very excited that you have some research that shows, well, maybe we don’t need to start these graphs at zero. So it’s really interesting and thanks for putting this paper out there, it’s one of those papers that’s been needed for a while.
JW: Thank you. Yeah, and I would say don’t start your charts at zero in many cases, because we’re getting huge biases.
JS: Right. I’ll put a bunch of the images from the paper on the show notes so people can at least see these two big ones of what people in your survey saw and then what the main outcomes were, especially on the bias piece because I think that bias piece is really – it’s a pretty stark picture of the differences between these three different cases that you had people look at.
JW: I think so too. And then I don’t know if you saw, I have this other paper where I invented a new graph called a hat graph, and it’s a modification of the bar graph, and it basically takes away the bottoms of the bars. So you don’t have this dilemma of bar length versus bar top, and so you can imagine these two bars side by side and you just keep the top part of the lower bar and then the difference of the bigger bar, and that’s the whole graph – And I call it a hat graph because it kind of looks like the brim of a hat and the crown of a hat. And what that does is it takes the difference between the two bars, which is typically the thing we’re most interested in, and it isolates that difference as an object as opposed to having that difference be the space between two objects. And so that can also help increase sensitivity even more.
JS: Oh interesting. I haven’t seen that but I’ll take a look and I’ll put it on the show notes page for folks to take a look at as well.
JS: Jessica, thanks so much for coming on the show. This has been really interesting.
JW: Yeah, thank you.
Thanks everyone for tuning in to this week’s episode, I hope you enjoyed that, I hope we will see some more research on this and other questions of data visualization. For me, I’m still sticking with the zero baseline, I’m still sticking with it for now – at least the research is suggestive that we can make some changes but for now I’m sticking with the zero baseline and we’ll see what the future brings in that particular area of our chart creation process. So until next time, this has been the PolicyViz Podcast. Thanks so much for listening.