You have surely heard of Nadieh Bremer and Shirley Wu, the super-duo behind the project and new book, Data Sketches. I’m very excited to welcome both of them to the PolicyViz Podcast this week! We cover some new ground in this interview–we talk in depth about their data visualization creation process as well as how they clean and approach their data. We also talk about what they are looking forward to for the future of data visualization.
Shirley Wu is an award-winning creative focused on data-driven art and visualizations. She has worked with clients such as Google, The Guardian, Scientific American, SFMOMA, and NBC Universal to develop custom, highly interactive data visualizations. She combines her love of art, math, and code into colorful, compelling narratives that push the boundaries of the web.
Nadieh Bremer is a graduated Astronomer, turned data scientist, turned freelancing data visualization designer working for companies such as Google, UNESCO, Sony Music, and the New York Times. She focuses on uniquely crafted, elaborate data visualizations, for print and online, that are both effective and visually appealing.
Shirley Wu, People of the Pandemic
Bussed Out: How America moves its homeless (Guardian)
Susie Lu, Re-Viz-Iting the Receipt
Nadieh & Shirley on the Storytelling with Data podcast
Nadieh & Shirley on the DataViz Today podcast
Support the Show
This show is completely listener-supported. There are no ads on the show notes page or in the audio. If you would like to financially support the show, please check out my Patreon page, where just for a few bucks a month, you can get a sneak peek at guests, grab stickers, or even a podcast mug. Your support helps me cover audio editing services, transcription services, and more. You can also support the show by sharing it with others and reviewing it on iTunes or your favorite podcast provider.
Welcome back to the PolicyViz podcast. I am your host, Jon Schwabish. On this week’s episode of the show, I am very excited to welcome Shirley Wu and Nadieh Bremer to the program. Shirley and Nadieh, as you probably know, are authors of the new book, Data Sketches. It’s a fantastic book, looking at their process of their year-long project of creating interactive visualizations. It is one of the few data visualization books that have come out recently that is larger than my book, which I really appreciate. And so, Nadieh and I and Shirley talk about a variety of different things in this week’s episode. So I’ve noted that they’ve already done an interview with Cole Nussbaumer Knaflic on the Storytelling with Data podcast, they did an interview with Alli Torban on the DataViz Today podcast, both dealing with slightly different issues, different questions. And so, I wanted to make sure that our conversation would give you a little bit of a different flavor for their work and their thinking around data and data visualization. So we do talk in depth about their process of creating data visualizations, but we also talk about their process for cleaning and extracting data. That’s sort of step zero in the data visualization process, and both Nadieh and Shirley talk about the processes that they use in the book Data Sketches, but we talk about that in a little more depth in this week’s episode. We also talk about the tools that they use, and we talk about what they are sort of hoping for or wishing for in the future of data visualization.
So I think it’s a really great interview, a really great discussion, I hope you’ll enjoy it. And before we get there, let me just give you a couple of other things to check out on PolicyViz. So I am about to publish a few more Excel videos, if you’re interested in learning how to expand your use of Excel to create data visualizations. I’m also starting a new series on the new Clubhouse app, which is an audio-only app. I’m starting a new weekly series, it will take place on Thursdays at 12 o’clock Eastern Time. It’s called All Charts Considered, yes, it’s playing off of the NPR show, All Things Considered. We’re going to talk about all things going on with charts and data visualization. And so, check out that app – if you need an invite, just send me a note, send me a DM, or you can send anyone who’s on the Clubhouse app a little note to get your invitation. So I hope I’ll see you on the Clubhouse app.
So before we get to the show, let me tell you a little bit about my guests. Shirley Wu is an award-winning creative focused on data-driven art and visualizations. She’s worked with clients such as Google, The Guardian, Scientific American, and NBC Universal to develop custom, highly interactive data visualizations. And Nadieh Bremer is a graduated astronomer, that’s right. So if you’ve listened to the show, you know people come from all walks of life, so Nadieh is an astronomer, she’s turned data scientist, turned freelancing data visualization designer, and she’s worked for companies like Google, UNESCO, The New York Times and Sony Music. And both Shirley and Nadieh create amazing visualizations, amazing visualizations that are both print and online, as you’ll hear about in this week’s episode of the show. So I hope you will enjoy this week’s episode of the PolicyViz podcast, and here’s my chat with Nadieh and Shirley.
Jon Schwabish: Hey Shirley, Hey Nadieh, great to see you. How are you both doing? Welcome to the show.
Shirley Wu: Hello. Thank you so much for having us.
Nadieh Bremer: Yes, thank you.
JS: I am very excited to chat with you. Congrats on the new book. So first, it’s a beautiful book, it’s amazing. And also, it’s larger than my book, so I really appreciate the fact that it just like, sort of dwarfs my book on the bookshelf. So I appreciate that too, but congrats on that.
SW: Happy to help you out.
NB: Thank you.
JS: So we have a lot to talk about, and I’ll just preface this whole conversation for folks who are listening that you’ve done a couple of other podcast interviews with folks in the DataViz field, with Cole Knaflic over at Storytelling with Data, and Alli Torban over at DataViz Today, and I’ll put links to those two shows. We’ll try to avoid rehashing the same topics. So I have a couple of new questions for you. We’ll see what we can talk about and cover. I want to start with, I think, what’s sort of the central tenet or theme of the book, which is the process by which you both create your visualizations. And before we get to the actual, like, the visualization creation process, I want to start with the data part, sort of step zero of this whole process. And maybe I’ll throw this over to Nadieh to start, because I know in some of the previous other interviews I’ve listened to, you talked about collecting some of your own data and I know you come from this astronomy background, so you’ve worked with lots of data. So can you talk a little bit about how you think about cleaning data, whether you should be visualizing the data, whether it’s objective or not, like, all these huge questions about data, if you could answer those in 30 seconds and give everybody the answer, that would be great.
NB: Right, yes. I don’t have a magic answer to that. It’s always the, it depends, that would be my 30-second answer. And yeah, so for Data Sketches, where we basically, Shirley and I, went through 12 different topics, and we both created our visualizations based around the sort of singular topic word like books or nostalgia or fearless, and because we wanted to create things that were fun, we were often going into directions that were less sort of, not the government type data, but things like Hamilton or Dragon Ball Z, or the SFMOMA. And that meant that we had to collect our own data and do it manually, but also throughout that process, for example, there is a topic where we had the Olympics, and I had this dataset from the Guardian where they gathered all of the medal winners for every Olympic Games that has ever happened since the very first one in 1896. And while I was working through that, you feel that the Guardian is a very respectable source, but even there, it’s such a large dataset with thousands and thousands upon thousands of rows, that things can go wrong. So at some point, I noticed when I made my first visualization that some of the medals were missing, and then I felt like, oh wait, of course, I need to take a step back and actually check this dataset to see if things make sense. But I don’t want to manually check every single value; that would kind of defeat the purpose of relying on a dataset, but also it was a personal project. So there’s only so much time. So in these cases, and, in general, I like to find proxies, so I like to think about adding total values. Like, if I add up all the values from all of my separate observations, does the total make sense in a way, or if we are talking about percentages, should the total add up to 100%, and if yes, does it actually sort of get there?
And for the Olympic space, one of the things that I thought about was, well, I have all of these separate medals, and if I look only at the gold medalists, if I add them up per Olympic edition, do they add up to the total number of events that occurred during each Olympic edition, because that should be like a one-to-one thing. So on Wikipedia, I could find the number of events that occurred for each of these, and I compared that to the number of gold medals that I had, and then I found some really interesting reasons why either there was a difference in gold medals, for example, a wrestling match that lasted for more than nine hours after which they felt like you both get silver, although I feel that would have been like a gold effort, like, you’re both gold.
JS: Yeah, I feel that’s a gold effort, yeah.
NB: But another thing was that in that particular dataset, for a few of the editions, the horses were also included as having won gold, which was kind of interesting to see, like Princess, being a woman winning gold in the Olympics and Lady [inaudible 00:08:14] and these kinds of, that was kind of funny. But in the end, I felt like…
JS: Do you think they had a different podium, like, their podium was bigger so the horse could get up there?
NB: Yeah, actually… I don’t actually know. Yeah, that would have been fun though.
JS: Yeah, that would have been fun.
NB: I think it’s fixing those kinds of issues, and then really understanding that, okay, so now all of my gold medals do add up, I have so much more trust in the dataset. And so even if you have datasets from wherever, it’s always good to check it to see if it kind of makes sense, and don’t assume from the very start that your dataset is correct, because there’s always something weird and odd going on with datasets found online.
JS: Absolutely. Especially, yeah.
NB: Especially online.
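For readers who want to try the proxy checks Nadieh describes, here is a minimal Python sketch. It is not her code: the row layout, the field names `edition` and `medal`, the helper names, and the sample values are all assumptions made up for illustration, and it assumes one row per event per medal (a team event stored as one row per athlete would need de-duplicating first).

```python
from collections import Counter


def gold_count_mismatches(medal_rows, events_per_edition):
    """Return {edition: (golds_found, events_expected)} where the two differ."""
    gold_counts = Counter(
        row["edition"] for row in medal_rows if row["medal"] == "gold"
    )
    mismatches = {}
    for edition, expected in events_per_edition.items():
        found = gold_counts.get(edition, 0)
        if found != expected:
            mismatches[edition] = (found, expected)
    return mismatches


def shares_sum_to_100(shares, tolerance=0.5):
    """Check that percentage values add up to 100, allowing for rounding."""
    return abs(sum(shares) - 100.0) <= tolerance


# Made-up rows for illustration only; this is not real Olympic data.
medal_rows = [
    {"edition": 1896, "event": "A", "medal": "gold"},
    {"edition": 1896, "event": "B", "medal": "gold"},
    {"edition": 1900, "event": "A", "medal": "gold"},
    {"edition": 1900, "event": "A", "medal": "silver"},
]
# Hypothetical reference counts, standing in for a second source.
events_per_edition = {1896: 2, 1900: 2}

print(gold_count_mismatches(medal_rows, events_per_edition))
# → {1900: (1, 2)}
```

A mismatch is not automatically an error. As in the wrestling and horse examples above, it is a pointer to the rows worth reading by hand.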
JS: Yeah, especially online. So Shirley I want to turn the question a little bit, pivot it a little bit for you. So when you are working with clients, so I’ll sort of jam these together, but when you’re working with clients, or you’re just looking at stuff, visualizations that are out in the world, are you, for the client work, are you making sure that you’re going through the data, do you make them go through the data, like, is there a quality control that you do with them? And then when you’re looking at stuff out there, are you thinking about, does it seem like the data that they’re showing me, like, makes sense, like, does that enter your thought process, like, I’m looking at this great visualization, but did the person do these checks that Nadieh did, how do you think about that as you’re going through your day to day?
SW: Yeah, this is a really great question and actually, I don’t think it comes as naturally as it does for you and Nadieh, I think – and this is something that is so interesting because my background is computer science and business, so I didn’t come from a data background, I came into this from coding and, like, I didn’t even know what DataViz was. And so, for years, I would – and this is one of the big things I learned from Data Sketches, and from Nadieh, which is like, for years, I made DataViz, not knowing how important it is to validate and verify the data, because, for me, I’m like, oh this is self-expression. [inaudible 00:10:31]. Yeah, both of you are like, oh god, is [inaudible 00:10:39] coming from a code background, I’m like, I’m just putting pretty things onto the screen. And when I first started, I didn’t even care if other people could understand what I was trying to show as long as I had fun coding it. And it wasn’t until I started reading, and I don’t think I really fully grasped what it meant to make data visualizations until I started Data Sketches, and I started reading Nadieh’s data sections and reading through, yeah, one of the first things I read was about how she validated her data, and I’m like, whoa. I was like, whoa, this is important. And then, as you know, I started to realize how important it was, I became more and more aware of it. I still don’t think I’m very good at it, because I don’t think it comes to me naturally still. That’s why for me [inaudible 00:11:31] even across all the years, haven’t quite developed the intuition. That’s why for me, it’s so important to work with clients.
If the dataset itself, and the topic itself, is extremely serious, I want to work with clients that are domain experts, or like, I won’t touch a topic or a dataset that’s sensitive, unless I can guarantee that I’m working with the domain expert to make sure that I’m presenting it correctly, because I know that I still don’t have the best intuition for verifying data, and that’s why I think even when I see something, I’m only now just starting to develop the Spidey sense of like, wait, this dataset doesn’t – I think it’s only in the last few years that I’ve started to question DataViz and be like, this data source doesn’t make sense or it feels misleading. And so that’s also why in my personal projects, I just try my best to do pop culture things, where I can’t offend anyone if I absolutely don’t verify all the data correctly.
JS: Right. If I have the script from Lord of the Rings, if I said Gimli said something that Legolas was supposed to say, like, okay.
JS: But when you’re showing COVID data, that’s a serious thing, right?
SW: Oh yeah.
JS: So what are those, without obviously, violating anything that you can’t say, but what are those conversations like, and I think you can both speak to this, but what are those conversations like when you’re talking to clients about their data, is it you really asking them to dive in and demonstrate things to you where, when you work with the data and you see something weird, you go back to them and say, hey, can you explain why this thing is over here? So just give us a flavor of what that’s like and maybe Shirley you can start, and Nadieh, I’m sure you have similar conversations with the folks you work with.
SW: Yeah, I don’t think I have – I can’t remember something off the top of my head, like a specific incident or something, but I do try my best to go through the data myself and then try and ask them questions, but I do guiltily assume that the data, if I’m working with the client, I do assume that the data I have is a dataset I could trust.
JS: Right. There’s only so far you can go, right? I mean, at some point, you have to, like, the client’s given you the data, you sort of have to assume that they’ve collected it objectively. Right?
SW: Yeah, and I think for me, it’s very much about making sure that if there is any sort of inconsistencies or biases, and that it gets really laid out in the methodology, I think that’s where I go to the most. I think I don’t – I’m not saying that this is a good thing, I think it just tends to be because I think the data side is my weakest, I think I tend to be like, okay, so if we can’t fix this in any way, in terms of the data collection side, then I want to do my best to either point that out in the visualization itself or in the methodology.
JS: Cool. Nadieh, do you have any stories or experience or thoughts on how you sort of handle this data issue?
NB: Yeah. So I like working with like these big, diverse datasets. And so, I think for me, maybe more than half of my client work involves data that has things, sometimes it’s errors, sometimes it’s things that they might have thought were supposed to be interpreted in a certain way, but then it appears that it’s more subtle than that. And I am always very sort of open and blunt about that. It’s like, hey, I’m finding this in the data, I thought it should be this, but I’m seeing this, how should I interpret that. Or it’s always very much a, I don’t understand this, please explain kind of question. And I think it’s because I come from science, but I have these – I always write really long emails to my clients, especially the first one after I’m going through the data, like really long. I always also try to give lots of examples. It’s usually when one is wrong, I’ll try and find more of those specific cases where I’m seeing the same thing going wrong, or at least give very specific screenshots and examples, why I don’t understand it. And for now, every client has always responded in a very sort of normal human kind of way where they either explain it or they go dig deeper, or they ask somebody else who was even closer to the data; and I’ve had recently, I actually had a client who, for as long as he thought that a certain variable called I was the index that connected everything, and it wasn’t, and so, I was actually going through the data, and at some point, things started making – didn’t make sense anymore, because these were basically stations on the world, so they had fixed positions. But when I started digging deeper, it appeared that these positions could just suddenly swap October 23, it just swapped to the other side of the – just moved. And it appeared that the index wasn’t actually [inaudible 00:16:51] if they did a certain data update in their system, the indices all got reassigned, and there was something else that was actually fixed.
And he was like, oh, I never knew that. That was…
JS: You saved the day on that.
NB: Yeah. And some locations were in the dataset twice. And that also had a very quirky reason that he also didn’t realize, but like, this was a dataset of several, like, tens of millions of data points. It was so big that it’s hard to understand every part exactly. So usually the clients are actually kind of okay with me finding quirks so they can fix them.
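The station anecdote boils down to a check that can be automated: does each identifier always point at the same fixed position, and does each position carry only one identifier? Below is a minimal Python sketch of that idea. The client's actual data is not described in the interview, so the field names `id`, `date`, `lat`, and `lon`, the helper names, and the sample records (including the dates) are assumptions for illustration.

```python
from collections import defaultdict


def ids_with_multiple_positions(records):
    """Map each id to its positions, keeping only ids seen at more than one."""
    positions = defaultdict(set)
    for rec in records:
        positions[rec["id"]].add((rec["lat"], rec["lon"]))
    return {key: sorted(pos) for key, pos in positions.items() if len(pos) > 1}


def positions_with_multiple_ids(records):
    """Map each position to its ids, keeping only positions with several ids."""
    ids = defaultdict(set)
    for rec in records:
        ids[(rec["lat"], rec["lon"])].add(rec["id"])
    return {pos: sorted(found) for pos, found in ids.items() if len(found) > 1}


# Made-up records: two stations whose ids are swapped by a system update.
records = [
    {"id": 1, "date": "2000-01-01", "lat": 52.0, "lon": 4.0},
    {"id": 2, "date": "2000-01-01", "lat": -33.0, "lon": 151.0},
    {"id": 1, "date": "2000-01-02", "lat": -33.0, "lon": 151.0},
    {"id": 2, "date": "2000-01-02", "lat": 52.0, "lon": 4.0},
]

print(ids_with_multiple_positions(records))
print(positions_with_multiple_ids(records))
```

If either result is non-empty for things that are supposed to stand still, the identifier is probably not the stable key it appears to be, which is a good question to put in that long first email to the client.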
JS: Yeah. Shirley, you mentioned the methodology. So for all of your projects – well, I guess my question is, do you write up a methodology, document or paragraph or thing that’s either internal or external, or in the viz, or outside the viz, is that like a thing that you try to do for all of your projects?
SW: So not all of them. I think I do them, but I’m more likely to do them, the more serious a topic is, and the more I want people to know about all of the considerations that we put into it, and all of the places that maybe we didn’t have the data where we had to make different assumptions. And so, I remember when I made the pandemic game last year, the person I was working with and collaborating with, Steven, he wrote up this huge document methodology that kind of explained every single thing that went into it. I remember when we worked with the Guardian, Nadieh wrote up all of the methodology and assumptions. I think it’s like, I write them when I want to make sure that I communicate all of the shortcomings across, but I don’t do them for, let’s say, when I made Hamilton, [inaudible 00:18:56] this is how I, you know. And so I don’t do them all the time. The answer is it depends on…
JS: Yeah, it depends.
JS: Do you feel like, maybe I’ll shift to Nadieh, do you feel like having that methodology or sources section or whatever it may be, paragraph or document, do you feel like that helps users or readers have more trust in the work that you’re doing, because you’re so transparent about it?
NB: Oh yeah, at least, I think that way. If I would be the reader and I could read a methodology, I would definitely, I think that would definitely increase my trust level. If I can sort of follow along, sort of maybe not understand every step but it’s better if I added a step more, but you get a feeling you get, you kind of understand the logic that went into doing certain steps and then you understand how they came up with the sort of final numbers, yeah, definitely.
JS: Yeah. So we’ve talked about data. Let’s talk about the actual visualization part, because I think there’s probably a lot of listeners who are like, okay, let’s get to the actual creation stuff. Now, of course, people could just buy the book, and they could read all about it, but I think this process question is maybe the biggest question in DataViz, especially for people who are maybe just starting out and maybe they’re accustomed to making bar charts and line charts and pie charts, and they want to, I don’t know – I don’t know what the right word is, evolve, grow, maybe grow, they want to grow…
JS: Expand, they want to expand, and they want to get to that point where they can create some of the stuff that, you know, things like that you’ve created in Data Sketches. So like saying, tell us about your process is a super broad question. So I guess, I’ll try to narrow it in a little bit, and ask, when you are going through a dataset, and you’re visualizing, how do you move away or expand away from these sort of standard graph types that everybody knows how to read, but that don’t really grab your eye when you see the 900th bar chart on a Wednesday afternoon. And you’ve both been doing this for years, so I know, it’s like part of your DNA now, but all right, so I’ll make this a two-part question, and whoever wants to start, so do you start with sort of these standard sort of traditional Excel dropdown menu type graphs, like, bar charts and line charts, and then how do you go from there to the sorts of things that you showcase in Data Sketches?
NB: I think, at least I have like a process, I think, part of it. And so, when I’m trying to understand the data myself, which I generally do through R, and then I do make lots and lots of bar charts and line charts and scatterplots to sort of really kind of build up this sort of mental model of the stories that are in the data, and what would be interesting to show, once I have like, I feel like I know my direction, and the story that I want to tell, and thinking about how to tell that to an audience in one chart in an engaging way. For me, I think if you’re just starting out and wanting to make that step, I would definitely, at least that was my tactic, I see what other people have done, and then I go and find one that I think is just awesome, or actually many of them that are just awesome, so I use Pinterest boards for that. And I add it on there, and then I go, I look through my Pinterest board, and I have sort of my dataset and my goal in mind, and I try and sort of project that dataset into the way that that person made that specific chart, and I feel like, oh yeah, maybe if I use this variable, like the size of the circles in that visualization, and I use this variable on the lines to connect them in that, in the same visualization, I think that could work. And then I might actually, if I really think that there is something there, and I really think that visualization is awesome, I might try and recreate it in that sense, and that’s really how I started out when I had very little experience, to be able to sort of come up with my own things, just steal like an artist or I like to call it remixing, so you’re inspired by a data visualization that somebody else made, but you’re not copycatting it, like, one-to-one, but you’re kind of taking things from it that you think are the reason why you actually like it so much.
And sometimes, when you actually project your data in their kind of visualization, you might see that it doesn’t work for your particular dataset, that could happen and you need to do the process again and find it with another dataset. And then what also helps is really just this building up of the experience, of doing it more and more often, so you’re broadening your view of the kinds of ways that data could be visualized, and then it also helps to look at things like the DataViz catalog to see that there is more than just bar charts and line charts. And then, it really comes to a time so at first, I really had to do it that way and now, years later, I don’t really do it that way anymore. I kind of always start from the data and the goals again, but then I just start sketching. And with the backlog of experience that is now in my mind, I can kind of, I draw from all the things that I’ve seen and try and come up with that. So, I guess, I am still remixing and stealing like an artist, but it’s now a little bit more internal in my mind.
JS: Right. Shirley, do you have such a well-defined process like that?
SW: I actually do have a process of my own, and I actually think that over the years, Nadieh and I probably have converged in some ways, because we’ve just been working together for so long. But I’ll try and highlight the parts where we differ a little bit in our process. And so, for me, the process that I’ve developed is really because, I mentioned this before, and I guess I keep mentioning it because I’m just like surrounded by the two of you who, like, have the data side really, like data collection, data analysis, data cleaning, all down so well.
JS: [inaudible 00:24:59]
SW: Yeah, it’s just so intuitive for you. And for me, because I, again, didn’t have that data background, I really struggled for years to try and figure out what is the style of analysis that makes sense for me, and this is kind of like my ad hoc way that I think I want to eventually replace by just like going through the books and teaching myself. But the process that I’ve kind of come up with for myself over the years is, what I’ll do is, once I’ve finished collecting a dataset, which we’ve already talked about, all of the considerations that go into that, and then assuming that it’s been cleaned and verified, the first thing I do is I kind of look at the dataset, and I kind of look at what all the attributes or what all of the columns I have are, and I start listing, and I’ll do something, I don’t know if anybody else does this, but I’ll list the attribute, and I’ll list what type of data it is, and I’ll keep doing that. And so, I’ll be like, this attribute is quantitative or qualitative, or it’s temporal, and then once I have that, I’ll like, look at the columns I have available, and I’ll be like, oh, this makes me think of, like, this makes me think of this question, or, it makes me have this curiosity, or, now I have this hypothesis about data. The process I have now is I’ll go and I’ll put the dataset into [inaudible 00:26:36] I’ll get like an Observable notebook. And that way, this is like my version of Nadieh’s R, and then I’ll kind of like start plotting it, and that’s why I like to list the type of data first, because then it really helps me figure out what kind of charts, quick charts it would lend itself well to, and I’ll explore that way, and I’ll try to answer all of my questions and hypotheses, and some of the questions and hypotheses, I’ll be like, completely incorrect. But in that like exploration, I’ll find something interesting, I’ll jot that down as like another thing to explore.
And I’ll keep exploring, until I find the set of things I’m like, oh, this is the set of things that are really interesting, and I want to build the visualization around this to, like, communicate this finding, or, the set of findings.
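The attribute inventory Shirley describes, listing each column with its data type before plotting anything, can be roughed out in code as well. Shirley works in Observable notebooks; the sketch below uses Python instead to match the other examples on this page, and it is only a guess at the idea, not her method. The function names, the ISO date format, and the sample columns are assumptions.

```python
from datetime import datetime


def _is_number(text):
    """True if the string parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    """True if the string looks like YYYY-MM-DD (an assumed date format)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def classify_column(values):
    """Guess 'quantitative', 'temporal', or 'qualitative' for string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    if all(_is_number(v) for v in non_empty):
        return "quantitative"
    if all(_is_iso_date(v) for v in non_empty):
        return "temporal"
    return "qualitative"


# Made-up columns for illustration only.
columns = {
    "title": ["Book A", "Book B"],
    "pages": ["412", "474"],
    "published": ["2001-08-01", "2015-12-23"],
}
inventory = {name: classify_column(vals) for name, vals in columns.items()}
print(inventory)
# → {'title': 'qualitative', 'pages': 'quantitative', 'published': 'temporal'}
```

The guess is deliberately crude: a column of bare years such as 1896 would come out as quantitative, so the list is a starting point to correct by hand, which fits the spirit of writing the types down before choosing quick charts.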
And then from there, what I do is I completely forget all of the quick charts that I used, and I just go, like, okay, now that I have the central message, which is kind of like what Nadieh was saying about her goal, now that I have this message or goal, how do I want to communicate it, how do I most effectively communicate it, but also in a really interesting way, such that it grabs people’s attention, because, and, I guess, this depends on who my audience is. But for all of my personal projects, this is for a general audience, so I want something that’s like fun and delightful. And from there on, I completely try to forget any charts and any visualizations of anybody’s I’ve seen. So whereas like Nadieh goes and finds other charts that she gets inspiration from, I try to forget all of them, and then I’ll use a Pinterest board. But what I do is I try to find a Pinterest board for like mood and colors and shapes and general shapes that have nothing to do with DataViz, and I’ll see if it sparks my imagination. And then from there on, I’m like, oh, actually, this set of inky dots actually makes me think of this, and then actually that looks kind of like a network. And then that’s how I start thinking of the visualization itself. And then from there on, I’ll start sketching, this is something I used to really [inaudible 00:28:55] doing, but because of Nadieh’s insistence I started doing and now I’m appreciative of it. And I’ll start sketching out my ideas and jotting them down and then eventually I’ll convert that into the visualization itself, and that’s my process, and I think a lot of it is that I realize how important it is to understand the data, which is not something that I used to know. That’s why I developed that whole process at the beginning; and then also the second part about forgetting about all of the charts, I think it really is like a weird, it gets into my head when I feel like I copy someone. 
I’m like, I don’t know where this comes from, but I’m like, oh – and I feel like this is something I’m working on getting over but I’m like, if I copy anyone I’m a fake [inaudible 00:29:47]. And it really gets into my head. I am not trying to… Yeah, I don’t think that way of anyone else, but if I do it, I just beat myself up over it and that’s why I think I try to look for inspiration in tangential fields, and I would try to look for inspiration in nature or in art museums.
So yeah, and I think that’s probably why sometimes I’ll come up with things where people are like, oh, that’s [inaudible 00:30:17] expect for this dataset. And I’ll be like, yeah, me neither.
JS: Well, it’s interesting, and it’s a whole other discussion, I think, is, at what point do you need to cite someone else’s visualization. Like, we don’t cite William Playfair, Florence Nightingale, every time we make those charts. But at what point, and that’s a whole other discussion, I did want to ask, when you have that final visualization, and it’s up, and you’re looking at it, it doesn’t resemble really anything that anybody has ever created, it’s like this new visualization, which is like, Nadieh, I think I have, well, I know I have one of your visualizations at the end of my book, because it’s like, here’s this whole library of graph types, but like, that’s not finite, it’s an infinite space. But when you’re done, and you’re looking at this thing that you’ve created, that’s like, no one’s ever done it before, you’re like, holy shit, I just created like a whole new thing. Does that occur to you, or, you’re just like, no, I just made a visualization, and let’s see if people like it?
SW: Yeah, just made a visualization, hope people like it. If not, I like it.
NB: I think I generally have that except for that Lord of the Rings visualization that I created, which is heavily based on a [inaudible 00:31:35] diagram, a specific type of visualization, but then kind of mutated into a thing that it’s, like, it’s been evolved into something that’s no longer compatible with the first, so it’s become its own species kind of thing. And then, somebody said, I should make it into its own chart, and then we called it the loom with the springs. So then it became, so in itself, I was suddenly very aware that this became a new possible chart type, if people wanted to use it.
NB: So sometimes it kind of still comes into my mind where I think also one of the later visualizations that I did about Cardcaptor Sakura, where I think, oh, this is kind of, I haven’t seen this yet, but it feels like it could be, how do you say that, standardized in a way that could be used for other datasets as well. And that one actually pops into my mind, because I’ve had several emails from people saying, like, hey, this kind of visual, I kind of want to use it for different datasets. So then I feel like, oh, I guess, maybe it is a new kind of chart type, even though it’s a very niche thing, and it will only be useful for a handful of people.
SW: I do want to also say on the topic of citation, I do think of, for example, like what Nadieh says about remixing, I think, when she remixes, it’s truly a remix, where I don’t think you can see the hints of the original as much, or I think you can see bits and pieces of their original but it’s all kind of then wrapped up together into something that’s new and Nadieh’s. And I feel like in those sorts of cases, she’s like, kind of, it’s like creative commons, and then, like, she’s modified enough of the original, that’s her own, that I’m like, I don’t think you need to do any sort of, like, inspired by or anything. But I also come across examples of when it’s almost like a one to one, and maybe they change the dataset or something. In that case, I will feel really unhappy if there’s no inspired by this piece or something. And I think that’s really the big difference, if somebody uses the exact same code and very similar dataset, I believe that’s just plagiarism.
NB: That just should not happen.
JS: No. Let me take a twist on this. So you both share a lot of your code, and a lot of the code people just get by inspecting the code. So is there a point where you feel like you own that code, or because it’s often built on an open source platform that if someone grabbed your code, so it’s not that it’s going to be, they’re going to grab the code, but they’re going to change it, they’re going to basically do let’s just say what Nadieh did, right, like, they’re going to remix it, but the base is going to be off a code that you put out there. Do you feel like people need to cite the fact that they started with your code, or that’s just, you know, that’s open source and you’re freely providing that?
NB: If it’s a true remix, so even I would feel that it’s, I can even, I might be able to see that maybe it started out as this thing, but it became its own unique thing, then no, that’s totally fine if they use that sort of that starting code, but it is a gray area where it goes from playing copycat to the remix part. So then it depends on the specific key and how friendly the person making that new visual [inaudible 00:35:05] I guess.
SW: I’ve thought about this a lot, because I think – and the conclusion that I’ve come to is that while, yes, we’ve open sourced our code, I actually think that our work is less similar to open source libraries, where it’s kind of like, it’s this tool that people have built with the intention for other people to use it, and ours is more like an artwork where we have our own style, and we have our own intention with it. In general, if someone takes my code and gets inspiration from it, modifies it enough – and yeah, it’s a gray area, how much is enough, but like, let’s say, they write a bunch of their own code, and the output is their own artwork with their own style, I’m really happy when that happens. I’m like, oh, you were like you liked my work so much that you went and did something in that style, that’s super flattering. But when it’s like my exact code and [inaudible 00:36:08] a different dataset in, that offends me to like no end, because I’m like, you don’t know the number of hours I’ve thought through this design, and why I chose this design for this dataset. It’s actually offensive, [inaudible 00:36:28].
JS: Yeah. It’s really interesting, and I have these conversations with people at work all the time about, you know, to what extent should we release our data, right? Because if I go and grab six datasets, public datasets, and I merge them together, and I put all this work into cleaning it and making sure it’s this and that, even if they’re public datasets, I’ve put a lot of time and effort into that. And so, at what point do you release something like that, even though it’s based on, as you said Shirley, it’s an open source platform, but you’ve put all this time and effort into creating something off of that platform, same thing if you’re merging all these datasets, you’ve created something out of all this publicly available stuff. And so, at what point, is that sort of an ownership thing versus an open sourcing? It’s only sort of weird gray areas.
JS: So we’re getting towards the end of our time, and so I wanted to look forward a little bit, Data Sketches, Volume 2, no, just kidding. I think if you’re anything like me, you don’t want to think about a book project for a long time.
JS: And I know, based on our conversations we’ve had and the other things I’ve heard you talk about, that you both have a variety of interests. Nadieh is currently making a robot in front of us as we talk. But I wanted to ask, you could pick how you want to answer this question, I wanted to ask about the future of data visualization. So I wanted to ask, either what do you see as the future of DataViz, and future can be whatever you want it to be, a year, five years, 10 years, 100 years, although we may all be underwater in 100 years, so that might not be as interesting. So either what do you see for the future of data visualization, or what do you want to see, or what do you wish to see for the future of data visualization, be it tools or technologies, or, what have you? So maybe Nadieh you can go first, if you have.
JS: Right, yeah, that we narrow into a single tool that [inaudible 00:40:57]. Okay. Shirley, hopes, wishes, dreams for the future of DataViz?
SW: Yeah, I mean, Nadieh took the good one, which is the data literacy part.
JS: That was a good one, yeah, she did.
SW: That was [inaudible 00:41:12]
JS: She’s been thinking about this the whole time, this whole discussion. She’s like, I’m going to get it first, and I want to…
SW: I was like, and I was just thinking, I was like, oh, I got a good one, data literacy, and then Nadieh just goes, and I’m like, okay, so [inaudible 00:41:28]
NB: We also have data literacy.
SW: Yeah, we don’t have to compete with each other.
JS: No, that’s right.
JS: You’re already in partnership, and got this partnership.
SW: Yeah, this is not a competition, it’s not a scarcity mindset, no. Yeah, so that’s something that I was thinking a lot about. Earlier on in the podcast, we mentioned something about methodologies, and I just thought, like, we know to look at methodologies, because, like, as data professionals, that’s the first place we’ll look at to verify a chart and to understand where it’s coming from. That’s not common knowledge, and I feel like it’s, on top of being able to see a chart and recognize what it’s for. I think some things like with all of the misinformation that’s online, and all of the times that my mum has been like, look at this. And I’m like, no. I think it would be really great if there is more conversation about how to suss out charts or information that’s not legitimate versus those that are, and I think it’s like being able to not only kind of understand the chart and where it’s coming from, but also to know, to look at the methodology. So that’s one thing, I think from like a general public perspective.
Another thing that I’m excited about and have been quite excited about is like how data visualization is becoming more prevalent and kind of like the industry from like a corporation perspective, I think, even like, four or five years ago, when I first started freelancing, most companies were like, DataViz, what’s that. They kind of were just starting to understand the importance of data science and data analysis, and thinking about visualization as a separate practice for communication wasn’t even a thought on most companies’ minds. And I think that’s slowly, slowly starting to change, and I’m very excited for that to happen, where we get to a place where companies understand the value of maybe it’s like internally understanding their data and the importance of communicating it or even the importance of communicating it to their consumers. I’m enamored by what Susie Lu put out one time, and then she kind of just dropped it on us, and then disappeared. But that one receipt project she had, where she kind of took one of her receipts from a grocery store, and then she basically, I think, got a receipt printer, and then she visualized the items that she bought as very simple bar charts, but she put it into, I think it was like, how much I spent on produce, how much I spent on meat or something like that. But this idea of, just seeing data displayed in an easy and engaging way in items around us, that is something that I’m like really excited for. I don’t know, I think it would take a while but like five, 10, 20 years, and for that to be commonplace, I think that’s so cool because it just means that we’re hopefully much more informed.
And then, I guess the last thing is just like a personal thing of, I’m also enamored with big installation and art, and I guess this is the direction I want to go towards, which is kind of like art installations that have at its core is informed by data, and that data and story is told through this installation that also is interactive and brings the community together. That’s just my personal dream to work on, yeah.
JS: That’s really cool. So I’ll just say on the second one, I think we’re closer to that than 10 or 20 years. Because if I look at my Apple Watch, it’s got the little rings on it, that’s like, it’s a visualization on my watch. I feel like we’re closer to that, especially, as sort of everything moves to being on your phone, like, when you could just tap your phone to the credit card thing. I can imagine you don’t have to print the bar chart on a receipt, it could actually show up on your phone. So I think we’re close to that. I think that’s a really cool idea, and the DataViz art is when we’re back in museums [inaudible 00:46:12]
SW: Yeah, [inaudible 00:46:14] or even in like outside common spaces, because that’s also another conversation of putting things in museums is actually kind of inherently exclusionary. Yeah, that’s a whole separate conversation.
JS: That’s a whole separate conversation.
SW: [inaudible 00:46:29] that I will not get into.
JS: No, but it’s true. I mean, you look at how many people surround the Bean in Chicago on a nice summer day. You could have something that’s data driven that people are surrounding themselves with.
JS: A whole other conversation. We could talk forever. Thank you both so much for coming on the show. Congrats again on the book. It is lovely, and I have been inspired just by peeking through it. So I will link to these other interviews you’ve done so people can hear more about you guys ranting about color, which I think was on Cole’s interview.
JS: Yeah, more on process on Alli’s interview, so there’s a lot that people can learn from. So thanks to you both for coming on the show. It’s been really great chatting with you.
SW: Thank you so much, Jon.
NB: Thank you for having us. It’s a pleasure.
Thanks so much for tuning into this week’s episode of the podcast. I hope you enjoyed that. I hope you’ll check out more episodes of the podcast going back through the archives. I hope you’ll also join me on some of the new episodes of the All Charts Considered sessions on the Clubhouse app. Again, if you need an invitation, just send me a note over at policyviz.com. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.
A number of people help bring you the PolicyViz podcast. Music is provided by the NRIs, audio editing is provided by Ken Skaggs and each episode is transcribed by Jenny Transcription Services. If you’d like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify or wherever you get your podcasts. The PolicyViz podcast is ad free and supported by listeners. If you’d like to help support the show financially, please visit our Patreon page at patreon.com/policyviz.