Sarah Williams is an Associate Professor of Technology and Urban Planning at the Massachusetts Institute of Technology (MIT), where she is also Director of the Civic Data Design Lab and the Leventhal Center for Advanced Urbanism. Williams combines her training in computation and design to create communication strategies that expose urban policy issues to broad audiences and create civic change. She calls the process Data Action, which is also the name of her recent book published by MIT Press. Williams is co-founder and developer of Envelope.city, a web-based software product that visualizes and allows users to modify zoning in New York City.
Before coming to MIT, Williams was Co-Director of the Spatial Information Design Lab at Columbia University’s Graduate School of Architecture, Planning and Preservation (GSAPP). Her design work has been widely exhibited, including at the Guggenheim, the Museum of Modern Art (MoMA), the Venice Biennale, and the Cooper Hewitt Museum. Williams has won numerous awards, including being named one of the top 25 planners in technology and a Game Changer by Metropolis Magazine. Check out her latest exhibition, Visualizing NYC 2021, at the Center for Architecture in New York City, which opened November 19th.
Sarah and I talk about her amazing new book, Data Action: Using Data for Public Good. We talk about the data, processes, and data visualization work that went into the stories she tells in the book.
Support the Show
This show is completely listener-supported. There are no ads on the show notes page or in the audio. If you would like to financially support the show, please check out my Patreon page, where just for a few bucks a month, you can get a sneak peek at guests, grab stickers, or even a podcast mug. Your support helps me cover audio editing services, transcription services, and more. You can also support the show by sharing it with others and reviewing it on iTunes or your favorite podcast provider.
Welcome back to the PolicyViz podcast. I am your host, Jon Schwabish. Now, as you know, we spend a lot of time on the show talking about data visualization. But you can only visualize data when you have data. And maybe too many of us don’t think carefully enough about the data that we have and how it’s collected, and ultimately how it’s being used. And so on this week’s show of the podcast, I’m really excited to have Sarah Williams, who is the author of the new book Data Action. And it’s a really great book, I really, really enjoyed it. I think it’s one of those must read books, to help all of us think more carefully about where our data come from, how data are collected, and how we can pull together new data sources that maybe don’t exist. So, Sarah and I talked about her book, we talked about her work, and you hear lots of interesting stories that are presented in that work. And so I hope you’ll enjoy it. So, here is my chat with Sarah Williams.
Jon Schwabish: Hi Sarah, welcome to the podcast. So, great to chat with you.
Sarah Williams: Really great to be here. Thanks so much for inviting me.
JS: I’m really excited to chat with you. I have just finished your book, Data Action, and I don’t want to oversell it, but it is incredible. It is a great book, it takes folks through this entire process, and I really want to dig into it. So, before we start, maybe you could talk a little bit about yourself and your background, and then we can talk about your current work and the book itself.
WS: Great. So, right now, I am an Associate Professor of Technology and Urban Planning at MIT. And I also run the Civic Data Design Lab, which communicates the insights of data to broad publics to affect policy change. And one of the main ways I do this, as you probably know from my book, is through data visualizations. My background, I’m trained as a geographer, an architect, an urban planner, and a data scientist, and I also am a computer scientist, as well. And so the work of the lab really matches up those skills to translate data into a tool for action. And I really think, you know, my early training as a geographer, which really focused on, you know, using data to understand how people relate to place and how place affects society, has been a huge influence on my work. But I’ve always felt like when we’re using data as, let’s say, researchers, or when we find insights in data, those messages aren’t always communicated to the policy experts or the planners or the people who are making decisions. So, one of the things that I really try to focus on is how can we bring those insights of data to people who really need to make important decisions around cities, but also other kinds of policies, whether it be criminal justice, or things related to internet use, which are all part of my work?
JS: Right. First off, you have this amazing background. So, if we had time, we could go into how you’re able to manage all of those different skillsets, which is incredible. But you mentioned that you do a lot of your work through data visualization. And there’s a chapter in the book on data visualization. But what I love about this book is it takes a more thorough look at the entire sort of data infrastructure. So, you have this model or method of data action. And so maybe you can just walk us quickly through what that process looks like for people because the data visualization, although it’s really important, and I think you make a great case of how important it is, especially in our current era of content streaming by, it is sort of at the end of the road after you collect the data and analyze the data and work with the data. So, can you sort of lay out this model for us?
WS: Yeah, thanks so much for asking me. Well, I think one of the things that I think is really important in the model is that, you know, really to take action with data, we need multidisciplinary teams, which allow us to ask the right questions of the data. So bringing together a data scientist, a policy expert, the visualizer, and also people from the community which we hope to affect really helps to create, let’s say, multidimensional projects, but also answers to those projects. So, I really start by, you know, bringing those teams together to identify the right question, then I really think about how do we quantify it? Perhaps how do we create or capture new data that isn’t available or missing? And then how do we create insights or build insights off of that data set, visualize that information through data visualizations, or maps, and then really ask the people in the data whether what we are showing them rings true? So, I call this kind of a ground truthing moment, both ground truthing to see if the insights of what we found on the ground are true, but also whether the insights we’ve developed make sense to the communities which we work with. So, that’s really why it’s important to have these teams. A lot of that, let’s say, ground truthing process does come through data visualization, and then modifying our results based on those kinds of ground truth assessments, and then starting all over again to rebuild the model to get it to be a bit more accurate. But I think, like, really at the heart of it, it’s the teams that make the action really impactful, because each one of them has their community of interest in which they bring the insights of our analysis to action oriented results.
JS: So, I love this concept of the multidisciplinary approach. I think it’s something that we all need to embrace more in our work. You mentioned just now and it’s throughout the examples in your book, this idea of talking to communities, or people that are the focal point of the work or who the work will benefit. And I’m curious, coming from a more quantitative background, how do you get colleagues who are more quantitative to be in some sense more qualitative? How do you get them to go into the community? In the book, you have a number of examples, I think we should talk about a couple of them about getting students to go out there and interview people. And I’m curious, do you try to get, you know, folks training? Or is it really just let’s go out on the street and just talk to people?
WS: Well, I think, again, like, you know, usually there’s somebody on our team who really specializes in that work. And I ask them to train the students or the researchers on the project to do that, talking with communities. And so, you know, bringing their knowledge of how to talk to communities to the team helps everybody kind of learn a new skill. In my case, I know how to do that in my background. So, sometimes that’s me doing the training.
WS: Right. Sometimes it’s our policy experts. I’m just thinking about a project that we did in Nairobi, and I met with the students about wanting to talk to people in the community. And they actually were like, Sarah, you know what, I think we can do this on our own, because you being there observing kind of looks like an outsider. And I think you’ve given us enough to know what to do. And, like, I was super excited about that. So they went out in the community. This was a project where we were asking about internet use in some of the lowest income communities, wondering what access was like. We have information about access to the Internet and download speeds, and we wanted to see if it really rang true to those community members. And I was just really excited to see the students at University of Nairobi take that on after several conversations.
JS: Right. Yeah. That is incredible. So, why don’t we talk about one of those projects, because in a couple of the chapters, there’s a discussion of the Matatu infrastructure, I guess, in Kenya. You talked about the collaboration between the data folks, the government, the drivers and the public, and just how that, you could almost call it a community based project, really ended up in this successful final product. So, can you talk about that process a little bit? And then what that product was?
WS: Yeah, great. So, I think this is like one of my favorite projects in the book, probably why you see it in multiple chapters. And I think it really illustrates the data action methodology quite well. So, this project, you know, has a long life. I started working in Nairobi in 2006. And as with many rapidly developing cities, transportation is a huge issue because the infrastructure hasn’t caught up with the development of the city of suburbs (ph), experiences crazy traffic jams. And I’d been working on transportation models in the city, but didn’t have, you know, essential information about Matatus, which are the main form of transit in the city. They are small mini buses, either 14 or 30 seaters, that really are how people get around in Nairobi, and they’re owned by private operators. You know, we just didn’t have data about where they went. But they represented close to like 80% of the vehicles on the roadway. So, you can imagine, if you’re putting a transportation model together, you really need to have that data. So, we set out on a project to collect that dataset, and really from the beginning included people in the transport sector that were interested, the government, the Matatu drivers, the owners, in the process of how we were going to collect this data. And ultimately, what we created was an app that you have on your cell phone. Working with University of Nairobi students and my colleagues at the University of Nairobi computer science department, we created an app for the local context in which we collected data on all the routes and stops using the cell phone information. And I think what’s like super interesting about the project is everybody was involved to a greater or lesser extent. The government knew about it but, you know, was largely disinterested.
But then once we had the data, and translated it into a map, something that you might see in London or New York or Washington DC, they got very excited about the project, the government did. Well, actually, the project went quite viral once we put the maps out. And I think what’s interesting is the government instantly made it the official map of the city. And I think that they were able to do that because they trusted our process, they trusted the way that we went about the data collection, because we were very transparent, and included them in it all along. And so they felt really like they had ownership of that data phone (ph).
JS: Right. So, you’ve done a lot of work around the world. Have you found in projects in the United States the same necessary interaction between these various stakeholders? I mean, it seems like it would be universal. But that story in Kenya just seems like such a big success story that everyone would point to. I’m just curious, have you seen or have you been involved in other projects that have the same kind of success? And really focusing on the interaction of all these different groups?
WS: Oh, absolutely. I have a number of them that I write about in chapter four, as you alluded to. I think one project that I’ve worked on in particular was a project called the Million Dollar Blocks, where we took intake data from the prison system, and looked block by block at how much it costs to incarcerate people from those blocks. And we found over a million dollars is spent to incarcerate people from so many blocks in New York City. But the same level of investment is not given to schools, to job training programs, to the systematic things that might alleviate the reasons that these individuals might be involved in criminal activity to begin with, and so we were kind of looking at reinvesting the money into the community. But in that project, we had a deep partnership with policy experts, with the communities that we worked with, with architects and designers, and I think it’s really a great example of, you know, how these kinds of collaborative projects can really have an impact. And in this case, our maps and data visualizations were seen by a congressman who actually used them as evidence for the Criminal Justice Reinvestment Act, which allocates money for reentry programs, which are job training programs after people leave prison. So, trying to reinvest in the community itself. But I think what’s interesting about that project too is that it had a very long life. And, in fact, it’s come up a lot recently in the conversations around defunding the police, because, you know, part of the message was let’s reallocate spending towards, you know, the systematic effects in these communities, which is a lot of what that conversation around defunding the police is. So, I see the maps get used all the time still, even though that project is, you know, over 15 years old. [inaudible 15:35] talks about how, you know, having this collaboration gives a project life that can have an impact much longer than its initial scope.
JS: Yeah, yeah. I mean, it’s really interesting how that collaboration can give a project success and give it life. Another one of the things that you talk about throughout the book, I think that comes through, is data sources, data projects, data visualizations that reinforce power structures, or reinforce racism. And in the chapter on data visualization, you specifically spend some time talking about maps and how they’re inherently political, and, you know, what you include, and what you don’t include, illustrates these social contracts, they illustrate power structures. And I guess I’m curious on a couple of perspectives on this. One is, I guess, really, in your work, how do you approach these data visualizations when you are trying to solve these important questions? So, you mentioned the Matatu map in Nairobi. And I’m curious, you know, when you take a map like that, and you model it after the London Tube or the DC Metro, like, does the team say we should have a different look and feel for this? Because we don’t want it to, you know, look like, you know, Western, you know, Northern European structures and that sort of thing? And then, as you’re doing work, how do you think about creating visualizations that are receptive and conscious about these different power structures? So, I know that’s a big question, but it’s really just trying to get at something you spend a lot of time in the book talking about, and I think it’s such an important issue, especially today, and I’m just curious if you could sort of just expand on that a little bit, I guess.
WS: Yeah, no, it’s a fantastic question. I’m so glad you asked it too. I think, in particular, the Matatus project is a great example of that. You know, visualizing information isn’t always a great thing in some contexts, right? I think, in the case of Nairobi, you know, visualizing the routes and stops of these informal systems really had a benefit to the public. But you can imagine some other contexts where this might not be good, and I guess, you know, in some cities, you know, visualizing that data would cause a crackdown on these vehicles, which, you know, sometimes don’t have proper licenses and so forth. But, you know, a crackdown wouldn’t necessarily benefit the people who live there, because they depend on these transportation systems to get around, right?
WS: So, you might not want to expose them. And we have had, you know, since we’ve done the Matatus project, many cities come to us being interested in doing this work. And there are certain cities where exposing it would not be of benefit to those organizations, and we haven’t done the visualizations because of that. So, you know, you always have to think, I guess, at the heart of it, about what you’re doing: can it do harm to anyone, right? And in the case of Nairobi, I think, in fact, it provided a huge benefit in terms of, you know, providing an essential resource to the public, but also to the city, but also to, you know, transportation analysts who are trying to model and improve the transportation system. But in other contexts that might not be good. I would also say in the case of Nairobi, we worked a lot with the Matatu drivers, owners and the community to think about a strategy for visualizing the data. And one of the things that they wanted to use the visualization for was to increase funding resources to this, you know, semi-formal system. And having an association with, let’s say, a London or Paris, and really showing that it is a very similar system to those, really helped get that support from those NGOs and outside actors to help improve the system itself. So, in a way, it was strategic. But I guess, actually, the side benefit is I think it became kind of iconic in Nairobi. We had a sweatshirt designer contact us, and it kind of went viral and became something that people in Nairobi are really proud of as well. Just like we are proud of the New York City map and many people are proud of their own subway map.
JS: Yeah. So, I want to finish our chat by sort of going to the beginning, or close to the beginning of the book, which is on the data collection side, because I think a lot of people who work in data visualization are focused on, you know, what’s the end product? How do I visualize the data? But you spend time throughout the book in the various chapters talking about data collection. And there’s a really interesting story about collecting pollution data in Beijing during the Olympics. So, maybe you could, you know, tell folks about that project. But I think my real question of interest is, when you are leading or participating, I guess, in any sort of crowdsourced data collection effort, how does one, or how do you, Sarah, as the project lead, manage the accuracy of the data? So, it’s in some ways easy for us to go out with our cell phones and do, you know, geotracking, but how do you ensure that the data that are coming in are accurate, so that when you get to that last stage of the data visualization, we can have faith in it that, you know, it is accurate, even though it’s been collected by hundreds or maybe thousands of people?
WS: It’s a great question. You know, in many of the projects and even in the Digital Matatus Project, while that data was crowdsourced, we did have volunteers working on the data who understood how to collect it, and then we had a team that actually checked the data afterward to ensure its accuracy. Yeah, so in the case of the Beijing Olympics Project, we were really interested in trying to get data on air quality levels in Beijing. And what’s really surprising is that just, you know, weeks before the Olympics, there was no data released by the government on air quality levels. And, we know Beijing, we hear about the pollution. Obviously, there was quite a concern from the athlete community, but also just quite a bit of interest from the press in trying to identify what effect that pollution might have on the health of athletes. So, we teamed up with the Associated Press to collect data on air quality, and I developed a sensor, a mobile sensor that could be used by the reporters during the Olympics. One thing to note is that mobile sensors are not as accurate as, let’s say, kind of these, like, very big systems that we might see from the US EPA, but they do provide relative accuracy. And in the case of Beijing, where the air quality was so significantly bad, that relative error level was fine. And just to give you an example, like, the average air quality in your city, or London, ranges from 10 to 15 micrograms per cubic meter; 15 is considered, like, a bad day, or 20 is a bad day. In Beijing, we were getting recordings of 800 micrograms per cubic meter, and this is particulate matter I’m talking about right now.
WS: Some days it’s 200. So, you know, it was so extreme in how bad it was that having some error in the device was fine, because we were showing or exposing kind of an extreme condition.
JS: So, is the view then that some data in this case, because there is no data. And I mean, in particular, it’s, you know, it is health of people and athletes. In this case, is it, you know, some data, even if it’s not perfect, even if it’s not, you know, government regulated, some data is better than no data at all?
WS: Exactly. And in fact, you know, in this case, having just a dataset that says at what range we have information can have a huge impact on taking action, and also on just making people aware. And I think that in a lot of the community data projects that I talk about in the book, you know, having some information about the air quality, even though the sensors might not be high quality, has allowed people to come out and [inaudible 24:54] high level air quality sensors into the community. So, one example that we use in the book is a community that was convinced that fracking was causing poor air quality in their community due to exhaust and some other, let’s say, mechanisms in which fracking occurs. And they put in low quality mobile sensors. And they were able to indicate that in fact there were higher levels or, let’s say, poorer conditions, as opposed to other areas. And they were able to use that data to get the EPA to come back and put in even higher grade sensors, which did, in fact, prove the poor air quality in the community and had an effect on regulating some of the devices that are used in fracking. So, you can see here, you have this sensor that’s maybe not the highest quality, but it provides data, and ultimately was able to have a huge impact on policy and a huge impact in that community itself.
JS: Right. So, I want to just wrap up and I guess, just ask, like, where do you hope this book will be used? You know, how do you hope people will be able to use it in their own work and then apply it?
WS: Yeah, I mean, I think that’s a really good question. I mean, one of the reasons that I created this book is, you know, we’re all really excited about data and its potential use. But I really wanted to give people guidance on how to use it ethically and responsibly. And, you know, we’ve heard a lot of critique about the misuse of data from people like Virginia Eubanks in Automating Inequality, or Cathy O’Neil in Weapons of Math Destruction. But here, I wanted to, within that criticism, provide guidance on ways that you can use it for good. And so I hope people take this book and create their own projects that really start thinking about how we can use data for the public benefit. And at the end of the book, I create what I call the “Data Action” principles, which are seven principles that I think all data enthusiasts should be thinking about when they attempt to use data for good. And perhaps I can mention those now. You know, one is to say, do no harm. We must interrogate the reasons we want to use data, and determine the potential for our work to do more harm than good. That kind of gets back to what I was talking about in Nairobi, you know, does visualizing that data have the potential to do harm? So, we should always ask ourselves that. Two, we should build teams to create narratives around data for action; it’s essential for communicating results effectively. The third principle is change power dynamics by building data hubs, you know, change the power dynamics inherent in controlling and using data. And, you know, those examples that I talked about in terms of air quality really did change those power dynamics. Four, expose hidden systems. Coming up with unique ways to acquire, quantify and model data can expose messages previously hidden from the public eye. However, we must expose ideas ethically, going back to the first principle.
So, you know, when I talk about the Million Dollar Blocks Project, we really exposed, you know, the costs of incarcerating people. Five, ground truth. We must validate the work we do with data by literally observing the phenomena on the ground and asking those in the dataset how our results can be interpreted. Six, we should share data, and I talk about this a lot; by sharing the insights, we can really create change in policy. And then seven, create your own ethical standards. Remember that data are people, and we must do them no harm. And we must seek to develop our own standards of practice. I think this really gets at the fact that, you know, technology moves much more quickly than we can create standards of use. And it’s up to us as data scientists to create those ethical practices and to really develop them along with the technological development itself.
JS: Yeah. Well, that’s an awesome list right there. And I think I’ll just, you know, I’ll say again, I mean, I’m a big fan of the book, and just, you know, big fan of your work. And I think people could learn a lot by following those seven steps and checking out the book. So, thanks so much for coming on the show and chatting with me. It’s been great talking.
WS: Yeah, thanks so much for having me. And I’m so glad you like the book. And really, I just hope that the book inspires people to create their own data projects and use data for action. So, thanks so much for letting me share that with you today.
JS: That’s great. Okay. Thanks so much, Sarah, I appreciate it. And thanks, everyone, for tuning into this week’s episode of the show. I hope you enjoyed that. I hope you’ll check out Sarah’s book Data Action. It’s a great read and really will help you think about all the ways in which data can be collected and the issues that we should consider when we are visualizing our data. So, until next time, this has been the PolicyViz podcast. Thanks so much for listening.
A number of people help bring you the PolicyViz podcast. Music is provided by The NRIs, audio editing is provided by Ken Skaggs and each episode is transcribed by Jenny Transcription Services. If you’d like to help support the podcast, please share it and review it on iTunes, Stitcher, Spotify or wherever you get your podcasts. The PolicyViz podcast is ad free and supported by listeners. If you’d like to help support the show financially, please visit our Patreon page at patreon.com/policyviz.