Christine Zhang just joined the Financial Times as a data journalist on the US elections team for 2020. Previously, she was a data journalist at The Baltimore Sun, where she used numbers, statistics and graphics to tell local news stories on a variety of topics, including police overtime, homicide patterns, population demographics, local and statewide politics — and even made a series of plots visualizing the impressive performance of Ravens quarterback Lamar Jackson. Prior to joining The Sun in 2018, she worked at Two Sigma in New York City, the Los Angeles Times in Los Angeles and the Brookings Institution in Washington, D.C. She has a B.A. from Smith College and an M.A. from Columbia University.
Here’s a link to the FT-Peterson poll: https://www.ft.com/us-economic-sentiment-poll
Financial Times: Trump vs Biden: who is leading the 2020 US election polls?
Support the Show
This show is completely listener-supported. There are no ads on the show notes page or in the audio. If you would like to financially support the show, please check out my Patreon page, where just for a few bucks a month, you can get a sneak peek at guests, grab stickers, or even a podcast mug. Your support helps me cover audio editing services, transcription services, and more. You can also support the show by sharing it with others and reviewing it on iTunes or your favorite podcast provider.
Welcome back to the PolicyViz podcast. I’m your host Jon Schwabish. On this week’s episode, I’m very excited to chat with Christine Zhang. Christine was a reporter at the Baltimore Sun until very recently when she started a new exciting endeavor which we talk about in this week’s conversation. I was excited to talk to Christine because I’ve talked to a lot of people in the media sector who work at national or international news organizations, and so I was excited to talk to Christine and get that local perspective of what it means to work in a local newsroom. So I hope you’ll enjoy this week’s episode. Before we get to that, very quickly, if you’d like to support the show, please share it with your friends, your family, your neighbors. Please consider writing a review of the show on any of the major podcast providers that you might listen to, Spotify, Stitcher, iTunes, Google Play, and so on and so forth. And if you’d like to support the show finally, I’d also really, really appreciate it, just a couple of bucks per month helps me transcribe the show, helps me pay for sound editing, helps me pay for web services all the things that I need to bring this show to you every other week. So let’s get on to the show. This week’s conversation with Christine Zhang. I hope you’ll enjoy it and I hope you’ll learn something.
Jon Schwabish: Hi Christine, how are you?
Christine Zhang: Hi, I am doing well. It’s a rainy day here in Baltimore, and I am on my way to move into New Jersey, so it’s definitely an interesting time of transition. Moving is stressful enough but…
JS: Enough, right, you are moving at the most inopportune time.
CZ: Yeah. I don’t recommend it to anyone who’s thinking about it.
JS: Right. So we’re going to talk a little bit about why you’re moving in a little bit. So we’ll let that hang in the air for people, so they can…
CZ: Yeah, some suspense.
JS: Some suspense, yeah, for the show. So I’m excited to chat with you about your past work and your current work and your future work which is really exciting. So maybe we’ll just start with – you can just introduce yourself for folks and a little bit about your background, and then we can just chat.
CZ: Yeah, sure. So I am a data journalist, that’s my official job title, and currently I’m working at the Financial Times where I just recently started this spring, mainly to cover the US election this year. And before that I was in similar roles at the Baltimore Sun, hence why I’m in Baltimore currently as well as at the LA Times. But I actually have a pretty varied background, my interest in data journalism had only started just a few years ago before I became a full time data journalist. I was a research analyst at a think tank, the Brookings Institution in Washington DC. And so I’m actually doubly excited to be on this episode, because I feel like, I mean, I don’t know if this, but a lot of the data stuff that I was exposed to in DC including the PolicyViz podcast and blog were what really inspired me to go into data journalism in the first place and explore different ways to communicate and visualize data in the news. I’m not just saying that like – if you want to check my Twitter timeline, it’s true. So it’s really cool to be here.
JS: That’s great. I’m glad we’re able to talk. It’s interesting because it’s actually been a few folks from Brookings who have gone on to data journalism like Chris Ingraham at the Post.
JS: [inaudible 00:03:40] for a long time.
CZ: I think he and I didn’t overlap; I think when I started there, he had just left for the Post; but he’s been one of my data heroes/inspirations as well, and I’ve met him a couple of times and he’s just really, really cool.
JS: Oh he’s great. And here’s a little background treat on how I know Chris. So Chris and I knew each other, I was at CDO at the time, he was at Brookings. And somehow we met up and started talking about our love of data and DataViz, and we decided to try to get together and learn D3 together. It was Chris, me, and one other person at Brookings, and after a few weeks, I was just like, okay, this is just – I’m not going to be a D3 programmer, I can see it, it’s just not going to happen.
CZ: But that’s really cool though.
JS: Yeah, that’s my link to Chris. From this background, you don’t necessarily have a perspective so much in looking at a local newspaper like the Baltimore Sun to a national newspaper like the Post or the Times, but I’m wondering if you can give us maybe just a bit of your experience of working as a data journalist at a local newspaper like the Baltimore Sun, and I’m sure there are listeners out there who will be able to relate or have some perspective on that more than I will, of course, but just what does the day to day look like at a local newspaper when you’re doing data journalism, and maybe not, as I think a lot of people at the big newspapers or the national newspapers are working on, what might be like bigger projects and national projects?
CZ: Yeah, for sure. Well, I mean, first of all, I think, right now, is a really interesting time to think about local versus national/international news. So much of the news coverage has been dominated by COVID-19, and rightly so, since it’s such a big issue of our time, but I think for me it really highlights the comparative advantages of the different types of news organizations. Right now, at the Financial Times, there’s an amazing data team that I’m a part of and many of my colleagues in London have been tracking COVID-19 from an international perspective, looking at different countries, and how their trajectories have evolved. And to get that perspective, I think places like the FT are really great, and even to get like a national perspective as well for the US. But I think as a person living in Baltimore, for instance, like I am now, I obviously still follow the Sun to get updates on my own community and what’s happening at a neighborhood level here, and that’s not something necessarily that the Financial Times would cover because it’s geared towards a more global audience. So I think it’s really interesting because I think it really, for people in general, highlights the different ways that both can be very useful to people. And for me, working at the Sun was really an incredible experience I have a lot of personal attachment to the Baltimore area, it was actually the first place that I lived when I moved to the United States from China. So it’s been a lot of years and I’ve lived in many different places since then, but I took the job at the Baltimore Sun in 2018 because I wanted to follow my dream of becoming a data journalist ever since, as I mentioned, I lived was in DC, but also because I kind of wanted to rediscover this place from my childhood and really understand it and how it’s evolved in the intervening years. And my first story for the Sun actually represents both of these things, so my first story for the Sun was called the gender gap is real – for crabs. It’s kind of a strange title, but basically, I was updating a project that my predecessors on the data team at the Sun had published that tracked the price of steamed crabs in more than 30 local crab houses in the area.
So for those of you who don’t know, Maryland is famous for its blue crab and eating it and how to eat it is like a whole thing. I’m not going to get into that now, but the point is this is like actually a pretty useful “public service” dataset. It’s living in the Baltimore area and you want to know where you can find the cheapest steamed crab or even the most expensive one or the closest one to you, you could go to this website and find out. And we had used a dozen male crabs as the comparison point, so we would call all these places and ask them how much they charged for a dozen males. But when I was calling these places, I remembered that while I was growing up, my parents and I, and a lot of their Chinese-American friends, had actually preferred eating female crabs to male crabs. So I started asking all of these places that I was calling anyway for this project how much crab houses were charging for female crabs. And it turns out that the charge for female crabs was much less than the price for males. So yeah, the article is like a first person essay that talks about the reasons for this price disparity or this “gender gap” which, I think if you – I did calculate it, on average it kind of matches the average pay gap between the males and females in the US. So it’s like a bit tongue-in-cheek. I’m not like – it’s not that profound, I’m not going to win a Pulitzer for it. I didn’t, spoiler alert. But I do think it’s interesting, because it’s not just a matter of size or biology as you might think, it’s actually a question of culture. So I mean, that was just really great to work on. It includes a fun lollipop data visualization that visualizes the gender gap at each house, so that was really great.
And so one of the really great parts about working in a local newsroom is that there are so many opportunities to work on lots of different kinds of stories like the crab prices story, which was just kind of, I don’t want to say random, but definitely an atypical sort of story, and a very personal one to me but it also had data. So it was like a really quirky thing. And I think for better or for worse, the data team at the Sun was pretty small. When I started there, there were three people including me. By the time I left, it was just me on a full time basis, and the entire newsroom is only about 80 or 90 reporters, editors, and photographers. So you can imagine what the challenges are, and I think anyone who’s worked in a local newsroom can imagine those challenges. But on the other hand there wasn’t a whole lot of room for territoriality among beats or anything. It wasn’t like, I don’t know, there was like a crab correspondent who was like, no [inaudible 00:10:53]
JS: It won’t totally surprise me if there was a crab correspondent.
CZ: Maybe someday, yeah. So I think that was nice. I got to work with lots of different people, collaboration was really key, there were – for me anyway, I didn’t feel like there were too many silos. I think a good example of that is one of the projects that I did which involved some collaboration with the sports department. So Lamar Jackson, the quarterback of the Baltimore Ravens, was going to be named the MVP. And basically, everyone knew that that was going to happen or at least we definitely thought there was a high probability, and so the Sun decided to create a special section dedicated to him; and they asked me to work with Tracy Gossen of Baltimore Sun, designer, to create a two-page graphical spread visualizing some of Lamar Jackson’s main achievements; and I’m not the biggest sportsperson, and I only recently learned – this is embarrassing – I probably, I could tell you the basic rules of football, but that’s basically it, that’s it. So I was like, oh my god, I don’t even know where to start, like, I don’t know anything. But I actually think it’s a great example of why you need to have somebody with subject matter expertise in addition to somebody with data expertise or visualization expertise, because if you don’t, then you could end up creating some random chart that makes no sense whatsoever or that doesn’t actually take into account the different nuances of football and what data points really matter. So I’m really indebted to a lot of reporters who are very patient with me in describing the rules of the game.
JS: So when you created that piece, because I think when I first saw that piece, I think someone had taken a picture of the paper version, and that’s where I saw it first, and then I saw it online – so with those types of pieces, are you designing print first, online first, or just both simultaneously?
CZ: Yeah, that’s a really good question – this is interesting because I think as much as a lot of people like to say things like, oh print is dead, or, it’s all digital first all the time, I think in certain instances, like special sections of newspapers or magazines, the print version can be as compelling or even more compelling than the digital version. And in this case, because it was designed to be almost like a collectible section, not just my part in it, obviously, it was an entire special [inaudible 00:14:05] section, it was kind of designed to be a standalone collectible portion of that newspaper. We tried to design it so that it would fit for the confines of print before translating it to the website.
JS: So you spent some time in Baltimore working on local news, although I think Lamar Jackson was bigger than just local. Now you’re moving to New Jersey and you’re working at a different kind of place. Do you want to talk about your move and the work you’ll be doing or already are doing, I guess, over there?
CZ: Yeah, sure. So I’ve been working remotely for the past couple of months. Well, I guess, not just me, everybody is now. Sometimes I forget, because it was only supposed to be me, but now everyone is. I got a new job at the Financial Times, and this job is to be the US election’s data journalist, basically focusing on the 2020 Presidential Election in the US which is huge, I mean, it’s definitely a very big election year. And I think it’s almost coming full circle because 2016 was my first “real job” in journalism, and I was at the LA Times on their data team, and that was also, as you know, a very different and unprecedented election year. So yeah, it’s been pretty awesome so far.
JS: Yeah. And so, it’s weird right now, because the election seems to – well, not seems to, has taken a backseat in a lot of ways to where we all are. So do you want to talk a little bit about the other work that you’ve been doing, and maybe how you’re laying the groundwork for the political coverage you’ll do over the next however many months, six-nine months, wherever we are?
CZ: Yeah, sure. It is weird. First of all, I want to take a step back and just think about, after 2016, when I would give talks about data journalism and what it means and what do data journalists do, my most prominent example was like polls and election forecasting. Since post 2016, that really started to become, for better or for worse, what people associated with data journalism the most. So all of my talks were all about, well, data journalism encompasses political polls and, in some cases, election forecasting, but there’s a wide range of topics that data journalism could possibly cover, including football or crabs or crime rates. All of these things could have data points, maybe you don’t notice it as much but it definitely does. Now though, I really think that in some ways, COVID-19 has been what people think of when they think of data journalism. I don’t know. I’m positing this as a theory and not as a definitive thing. I mean, I don’t know. But I kind of feel like, whereas like polls were what people associated with data journalism the most in the past, now it’s like coronavirus trajectory charts. And I think there’s actually a lot of parallels in that, like, with election data, there are so many nuances to the ways that things are presented, like, uncertainty and margin of error and what a forecast actually is and means, that can cause a lot of confusion. In the same way that data on coronavirus can also cause some confusion. There’s been, I think you did a podcast episode about DataViz in the time of COVID, and certainly, not all the data are necessarily reliable or maybe there are some delays that may make cases appear to be decreasing, when they’re actually not. So it’s a really nuanced thing as well. But yeah, so obviously, as I just laid out, the coronavirus pandemic is really taking over a lot of the traditional election coverage. And I think for me, I’ve kind of been moonlighting, I should say, a little bit as the Baltimore correspondent with regard to certain aspects of coronavirus.
One of my colleagues in London, Federica Cocco, she’s a statistics journalist who does a lot of videos that explain different things in series of charts, and she had an idea of looking at crime in the time of COVID-19, and I also had independently thought of that in my mind – for Baltimore, for instance, it has the highest murder rate among large US cities. I was wondering if the fact that people might be staying in more would have an effect on homicides in the city. And Federica had thought of a similar thing with regard to violent crimes in London. So she asked me to pair up with her to maybe talk about the US perspective, starting with Baltimore and expanding to different cities on how crime has evolved post lockdown, not post pandemic unfortunately, because that won’t come for some time. But it was an interesting exercise, because I think, for me, it’s like, I know Baltimore crime statistics, I know all the nuances of those statistics and what to look out for in terms of interpreting them. But I had to really think about ways to expand beyond just talking about Baltimore and figure out a theme to apply, not necessarily to the entire US, but to many other US cities. So I came across a news story from the Trace which is a nonprofit newsroom that covers gun crimes in the US, and it mentioned how shootings may be an exception to the decrease in violent crimes. So this is where I plug the programming language R, which I use for a lot of my data analyses, and where I give a special shout out to Daniel Nass who was the author of that Trace article, because I asked him to send me some of his R code that outlined how he was calculating crime statistics in different cities, and I modified it and applied it to a selection of cities for this crime video. And basically, the upshot is violent crimes have, in general, decreased post lockdown. I think that it might be pretty intuitive if there are fewer people outside, there might be fewer opportunities for robberies and things like that. But gun violence, shootings, assaults, homicides have in most cities either increased or not gone down at the same level as other violent crimes.
JS: For you personally, what is that shift like going from, you know, if you were doing the same story at the Sun, it would have been presumably just focused on Baltimore, maybe neighboring cities like Philly and DC, but now with the FT, you’re looking nationally and internationally, so for you, what is that like having to do that shift? And also, what does that mean for your ability to tell the stories with data, it seems like it’s a little bit – I don’t know, is it easier – I guess, I’ll just ask: is it easier or harder to tell those stories when you’re dealing with hundreds or dozens of cities as opposed to just one city in the neighborhoods in that city?
CZ: I would say that it is, in those cases, for me, slightly harder. I think one of the maybe advantages of working in local news is that you could, if you have a national dataset, you focus on the part of that national dataset that applies to the city that you live in or the coverage area of the news organization that you’re working for. But when the audience of your news organization is so much broader, then you have to broaden the perspective accordingly. So I think Baltimore is interesting in some cases on its own, like, from a national or even international perspective, but I think that particular video is strengthened by the overarching theme of violent crimes are down except for certain crimes in these places. So in the UK, that would be drug offenses. In many US cities, that would be gun crimes. And in Mexico, which was the other place that another data journalist, Jane Pong, looked at, it would be homicides. So I think it’s really interesting in the sense that it requires more creativity I think sometimes in terms of thinking of common threads or things in terms of weaving stuff together, but there’s also a lot more opportunity for collaboration across continents. And it’s something that, like, it’s not like I would necessarily think to look at what homicide trends in Mexico were like or crimes in London, the trends in crimes in London. So I think that’s interesting.
JS: Yeah, definitely. Go ahead.
CZ: I was also going to say, another aspect of working on the US election for the FT, even though, as you mentioned, a lot of things have slowed down in recent weeks, is that there is an aspect of writing for the US audience but also an aspect of explaining or making things interesting to an international audience. So things like Super Tuesday, which was my first article, I believe, for the FT was a Super Tuesday explainer. Now, I think like, certainly, from a US perspective, a lot of people could imagine why that matters. But I don’t think necessarily people living in other countries totally understand all of the intricacies of the US primary system. And honestly, a lot of people, I suppose, in the US, myself included, don’t understand all of the intricacies, I spent maybe, I don’t even know, hours, trying to find the answer to the question what happens to a candidate’s delegates if they drop out of the race which you think is like a simple question and actually at that point mattered because there were multiple contenders, but it’s actually a really complicated answer that gets into a lot of weird conventions rules that nobody has looked up, except for maybe a couple of scholars.
JS: And you.
CZ: And me. So I mean, I was really proud that we were able to answer that question. And I’m hoping to do a lot more of that as well, like, really try to ask and answer questions that may seem “simple” but actually are more nuanced. Another story that I worked on with my colleague Brooke Fox who’s in New York focused on swing states and what the polls in swing states are showing along with the demographic data. And one key question is what is a swing state, and what is a battleground state in the first place. And people might say, oh okay, like that, it just makes sense, because swing states are obviously states that, like, in 2016, Michigan and Wisconsin were states that really helped to decide that election. But the interesting thing is it’s not like back in 2016, people would have predicted that Wisconsin was going to be the pivotal state at this ceiling point before November. So I think it’s kind of like we’re just using the past to inform them what we’re talking about now, but that might not be indicative of what’s the most important thing to focus on in the future. But it’s also like, this is kind of the best that we have. So certainly, I think election reporting definitely it gets into a lot of deep philosophical questions.
JS: You mentioned earlier about polling data, and I know the FT has this new monthly poll that they’re running, and I wanted to ask you to maybe talk about that a little bit, and maybe also along with that, since you’re going to be neck deep in polling data over the next few months, how you think about communicating either visually or – well, primarily visually, I guess – communicating the uncertainty and the margin of error in those polls as they come up. So that’s kind of a two-part question, so really on the new FT poll, and then on how, I guess, you, as a data journalist, think about communicating the uncertainty around those sorts of estimates.
CZ: Yeah, that is definitely getting into the philosophical questions of data visualization and journalism. So the FT, this was before I arrived there actually, so the FT had partnered with a foundation called the Peterson Foundation, a nonprofit organization to conduct a monthly poll about US voters’ sentiment about the economy ahead of the election. And they’ve actually conducted it ever since October of 2019, so including May whose data I think we should be getting within the next week or so, we’ll have eight months of data from that poll. And it’s interesting because I feel like a lot of news organizations do polls and a lot of news organizations report on polls, especially during election time; and a lot of those polls with, like, I could say, with exception to questions about coronavirus, a lot of those polls boil down to, who are you going to vote for, Trump or Biden, and here’s the answer, and then we can discuss who has the lead. But this poll actually doesn’t ask those questions. It focuses on economic sentiment. So the main question that the poll focuses on is since Donald Trump has become President, would you say that you are financially – and there’s a set of answers. So it’s like somewhat better off, much better off, no change, somewhat worse off or much worse off. So basically, it asks whether voters think that they are financially better or worse off since Trump has become President. That’s not the only question, but it is the headline question of the poll. And it takes its cue from a question that Ronald Reagan had asked Jimmy Carter back in 1980, and basically, I think it was the week before election day. During a debate, he asked Americans to ask themselves, are you better off than you were four years ago. Jimmy Carter was the incumbent president. The economy was in a recession. Rhetorically, I would say, most voters answered no on election day because Reagan won in a landslide, of course. But the answer to that question is not so simple right now. I think that when the FT and the Peterson Foundation started this poll last year, the economy was in good shape and, I think, was for the incumbent President Donald Trump a way of signaling some positive sentiment for his reelection agenda. So at the time it was an interesting question because if most voters said that they were not financially better off, even whilst some of the economy was doing much better, that might signal something for the election. But now, it’s obviously the complete opposite. A lot of economic indicators including unemployment are at record numbers, there’s record high unemployment. So the question now is kind of like are people going to change their answers to that question over the coming months and will that signify something for the way that they’ll vote in November – essentially, like, will the economy matter and to what extent it will? And I think it’s interesting because I have seen just by looking at the past months of data for this poll that the answers are highly partisan. So only 11% of Democrats say that they’re better off since Trump became President, 63 or so percent of Republicans do, and those numbers haven’t changed that much since October. It’s kind of like basically two flatlines one way above the other, and it might show that despite these historic record setting economic indicators in a bad way, historically, negative economic indicators, partisanship is still extremely important in terms of the way that people view their own individual situation. And it might be like almost a referendum as to whether partisan identity is more important or at least more on people’s minds than numbers on certain statistical indicators.
JS: Yeah. Well, I think it’s also personally your individual experience. I think a lot of the commentary I’ve seen about COVID, for example, is a lot of people who are arguing that we don’t need masks and we don’t need socially distance and all those techniques and approaches that public health experts are talking about, a lot of people who are against those approaches, there hasn’t been a huge outbreak of COVID infections that we know of or deaths in their area. So as soon as you start to see your friends and your family and your colleagues and coworkers start to get sick, I think it says something different, same thing when you start to lose your job versus you sort of see 30 million people losing their jobs, maybe it’s just a different, you make a different connection to it.
CZ: Yeah, I think that’s true, for sure. It is interesting how people view their own personal thing versus a more abstract question. I will say, there’s only one other poll that I could find that tracks a similar question, and it’s the Economist/YouGov Poll, and it asks the question: is the country better off now than it was four years ago? And so to your point, 50% of the respondents, as of the beginning of May, have said that the country was better off four years ago and 30% say that the country is better off now. But again that is like so divided by party ID, like, I think, again, a lot of answers to these questions boils down to partisan [inaudible 00:36:14] so I do think that that’s something to keep in mind.
JS: Yeah. Before I let you go and get your truck packed, [inaudible 00:36:27] I wanted to come back to this concept of uncertainty. I mean, have you been working with the FT, the current FT poll?
CZ: The FT-Peterson, yeah, I have.
JS: So working with that and then looking ahead, like, how do you think about visually at least communicating the uncertainty and the margin of error, we’re all going to see that 3% polling margin of error number come up over and over and over again in a few months, so have you and maybe folks you’ve worked with talked about how the best way to communicate those errors and the uncertainty around them?
CZ: Yeah, we’ve thought about it a lot, every month actually Lauren Federer who is a DC correspondent for the FT, and I worked together on a story about the latest findings in the poll, and I’m also in the process of designing a site to house the now eight months’ worth of data and to highlight the key findings. I think deciding whether or not to visualize things like the margin of error is kind of a tricky thing, for one, it’s not entirely clear to me, at least, how you would do this. So say for the month to month answers to the better off question, right now, we’ve decided to visualize this on a line chart with two lines. So one line month to month shows the percentage of those who say that they are better off since Trump became President and another line shows the percentage of those who say that they’re worse off, and each of those lines will have a ribbon around it that represents the confidence interval. It’s supposed to be a visual display of the zone of uncertainty which for any given month is about plus or minus three percentage points. So if it’s like 30%, then we do like 33% would be the max and 27% would be the min – and then that would be the zone of uncertainty. And I think intuitively this makes a lot of sense, but it’s actually kind of misleading from a stats math perspective and going back to my college stats classes, because what we’re doing is drawing a confidence interval for the better off answer, and a confidence interval for the worst off answer separately, 95% confidence interval, with like, again, why is it 95%, like, why don’t we do 90%, like, I’m not even going to get into that, like, we’re just [inaudible 00:39:06] just to make things simple. But really the correct thing to do is actually to draw a confidence interval for the difference between the two and visualize what the difference is. So I think maybe it’s easier to think about in terms of candidates – like, if Biden is leading Trump by five percentage points, is that five statistically different from zero? That’s like the “correct” way to think about it, not like the individual percentage point that Trump has and the individual percentage support that Trump has versus the individual percentage support that Biden has. It’s the margin of error of the difference between the two is actually the correct way to think about these things. I think that there’s a Professor Charles Franklin that I found a paper by, he wrote something called the margin of error for differences in polls. That has been instrumental to my explaining this and thinking about this, but it’s all just to say those lines that you see with the ribbons around them are only visualizing the individual confidence intervals of each person or each answer choice. But what you want is if you truly want to find out if one is leading the other is to visualize the confidence intervals around the difference between the two. But that makes for a less compelling visual, like, I don’t know…
JS: [inaudible 00:40:54] super complicated for people to understand too.
CZ: Yeah. So it’s not like an easy thing. And also, if you have more than one answer choice, then you’re just like, it’s going to look just really, really confusing. So I think the best thing to do is just talk about the margin of error in the text or to indicate it in a footnote. In a lot of the stories that we have done over the past few months about the Peterson polls, we do try to indicate what the margin of error is not just for the poll itself which is plus or minus three percentage points, but also for the different subgroups in the poll. So most polls actually do have a margin of error of plus or minus three percentage points, and that’s not a coincidence. It’s because most nationwide polls sample about a 1000 people. The formula for margin of error is roughly equivalent to one divided by the square root of the sample size. So one divided by the square root of a 1000 is 3.167, no, 3.162, I just did this. So like basically plus or minus three. So you don’t even have to tell people how many people you’ve sampled, they say plus or minus three percentage points, you basically know that’s how much. So that’s fine if you’re looking at the overall top line questions, like, are you better off versus four years ago or whatever. But when you start looking at different subgroups like Democrats, do Democrats say that they’re better off than they were when Trump was elected? Now the poll, this particular poll only asked 300 or so Democrats, and so that sample size is now 300. So one divided by the square root of 300 just like back of the envelope calculation here is like 5.7 percentage points. So that’s different from three percentage points.
CZ: And I think it’s hard because when people see this is a poll with a margin of error of plus or minus three percentage points, they automatically add and subtract three percentage points from every percentage they will see an article even though in some cases it is like almost double that.
JS: Wow. Yeah.
CZ: So like it’s hard. It’s not… I don’t think it’s easy, but I do think that it’s worth the explanation in many cases.
JS: No, it’s a good important challenge, and one that there’s a lot of room to try to solve and to experiment with. You’re kind of in that unique position where you can try lots of different things and see what people respond to.
CZ: Yeah, that’s a good point. I was going to say, I don’t know how data visualization can solve this because I feel like my answer was like, write it in the text or put it as a footnote which is not really DataViz.
JS: Yeah, right.
CZ: But then if you do something like the needle, I don’t know, like [inaudible 00:44:18] completely freak out.
JS: Yeah. But to that point, when you try to do, I mean, the needle maybe kind of the extreme example – and for those who don’t know, we’re talking about, I’ll put a link on the show notes, it’s this needle gauge chart that the New York Times did in the 2016 election about probability of winning, but I think trying some of these other graph types out, you run into this problem that people don’t necessarily know how to read these other graph types. And so you end up in this bit of a Catch-22 of do I try a graph that maybe shows the distribution in a better way but fewer people really understand how to read it or do I try a bar chart, but then how do I show the uncertainty in a better way than just an error bar.
CZ: Yeah, it’s definitely a learning process. But I think that the learning is happening. I think that, for example, I think in 2016 when Clinton won the popular vote but Donald Trump won the electoral college, that was such an anomaly. I mean, I think the last time it happened was 2000, and that was so exceptional for people. I think that now there are a lot more election reporters, myself included, that take that as a possibility or take a look not just at the national polling average but also at the state level polling averages. So I think it is an incremental thing, but it’s happening.
JS: That’s great. Well, that’s great. Well, thanks for chatting with me. Thanks for coming on the show and good luck with the move. I look forward to seeing all the stuff you’re going to be working on and coming out with over the next few months.
CZ: Thank you so much for having me.
Thanks everyone for listening to this week’s episode of the show. I hope you enjoyed it, I hope you learned something, and I hope you’ll be able to take some of the lessons from this week’s episode and apply it to your own work. So again, I hope you’ll consider supporting the show either by sharing it, by writing reviews or even financially by going over to Patreon. So until next time, this has been the PolicyViz podcast, thanks so much for listening.