Richard Brath is a longtime visualization designer, researcher and strategist. At Uncharted Software, Richard focuses on the creation of high-value visual analytic applications that solve real-world problems in capital markets, supply chain and health-care analytics. These solutions are in use by hundreds of thousands of users around the world every day.
Richard is also actively involved with the visualization research community, and has authored two books on data visualization: Graph Analysis and Visualization, together with David Jonker (Wiley 2015); and Visualizing with Text (AK Peters, 2020). Richard’s personal blog on visualization is at richardbrath.wordpress.com and he is on Twitter @rkbrath.
Visualizing with Text
Book companion site
The textual hierarchical table of contents to Cyclopaedia
Episode #205: Steve Franconeri and Jen Christiansen a VisComm Workshop
Episode #199: Miriah Meyer
Episode #198: Scott Berkun
New Ways to Support the Show!
With more than 200 guests and eight seasons of episodes, the PolicyViz Podcast is one of the longest-running data visualization podcasts around. You can support the show by downloading and listening, following the work of my guests, and sharing the show with your networks. I’m grateful to everyone who listens and supports the show, and now I’m offering new exciting ways for you to support the show financially. You can check out the special paid version of my newsletter, receive text messages with special data visualization tips, or go to the simplified Patreon platform. Whichever you choose, you’ll be sure to get great content to your inbox or phone every week!
Welcome back to the PolicyViz podcast. I am your host, Jon Schwabish. Welcome to Season 9 of the podcast. That’s right, nine years of doing the PolicyViz podcast, and I’ve got a great lineup of guests coming to you this season. I’ve got folks from all over the world, doing all sorts of great, really interesting work in the fields of data analysis, data visualization, and data communication. And to kick off this season of the show, I’m really excited to welcome Richard Brath to come talk to me about his book Visualizing with Text. Text is going to be one of the themes you’re going to see this season on the show – how do we visualize qualitative data? It’s such a big challenge, and there are more tools, more platforms, and more ways to analyze and visualize those data. So I talked to Richard this week about his book and about ways to visualize qualitative data. I hope you’ll check it out, I hope you’ll learn a little bit, and I hope you’ll go over to policyviz.com, where I’ve got an entire collection of qualitative data visualization examples in my ever-growing library. So here we go with Season 9, here’s my conversation with Richard Brath.
Jon Schwabish: Hey, Richard, good morning. How are you? Great to see you.
Richard Brath: Hi Jon. Great to see you too. Thanks.
JS: How’s your summer in glorious Toronto?
RB: Summer has been awesome so far. It’s been a great summer so far.
JS: Right. And getting some time on the lake there?
RB: We’re surrounded by lakes here of many different sizes, so Lake Ontario is the lake associated with Toronto that I can see out my window from where I’m sitting, and that I actually haven’t been to.
JS: Wow. Okay.
RB: But I have been to many of the other lakes.
JS: All right, okay. So as a native of Buffalo, I have been to Erie, and then swimming in both of those lakes, maybe not the best choice, I don’t know.
RB: Yeah, I’ve spent a lot of time in Huron already this summer, so if you know the Great Lakes.
JS: Yeah, okay, so you have a little Great Lakes quiz trivia going on for folks who are listening.
JS: It’s a good start. Awesome, well, thanks for coming on the show. So you are pretty much right now the master of the qualitative data fields, and there’s not a ton of thorough resources like your book on visualizing qualitative data. So I wanted to start there. What drew you to writing a book about visualizing qualitative data?
RB: So I was really interested as a long time practitioner in visualization and what you can and can’t do, and there’s certain things that you do, and you create bar charts or pie charts or whatever, and things just flow very nicely, and it communicates what you need to communicate. But then you run into problems where the kind of toolset that comes with visualization, doesn’t work, doesn’t fit as well, right? Like, what happens when you have more than 10 categories, right? You kind of run out of colors at a certain point. What about word clouds? Like, certainly, there’s got to be something better than word clouds. What about countries, like, you’re viewing data about countries, there’s 200 countries, or maybe 150 that you got data on. And there’s got to be something more than just maps. Right? Maps are obvious, because they work and you can fit everything in, but there’s got to be other techniques. And then, news, we’re looking at things with news and, like, news is all text, but it’s super important to a lot of people. And some people need to trade and make decisions off of news, so you want to be able to do something with that more than just the text. And it’s getting bigger and streaming more and whatever, so you know, there’s all of these problems, where text is an element of it, and you’re struggling with how to fit those into visualization. I just want to add one more quote on that, and I just got to go find it, because I’m going to garble it, if I don’t find it. Maybe I have it here. It’s a book that I just started reading. And there’s one more thing that I want to add on here, and that is a lot of visualization focuses on measured data metrics, and there’s been this obsession, obviously, in the last 20, 30, 40, 50 years about, you can’t manage what you can’t measure, and GE Six Sigma, and all of these things that are super focused on measurement, right? 
But I was reading this great book, maybe it’s from the 60s or 70s, by this guy named William Cameron, whom I hadn’t come across before, and right in there he’s got this quote that says “not everything that can be counted counts, and not everything that counts can be counted.” So, that’s a brilliant quote that kind of encapsulates the importance of qualitative information, and why that needs to be an important part of the discussion that’s brought to the table. I bet you, right now in Russia, there’s a lot of infographics of data that are being used to justify the war. And people are going to believe it, because it looks like data, and data doesn’t lie. But there’s more to it than that, and we know that data can be manipulated, but we also know that you can do all kinds of things with framing and create bias and so forth. And how are some of those things going to be dealt with – some of those things have to be dealt with from a qualitative perspective.
JS: Right. Did you start your career in the data field thinking about qualitative data, or, was this sort of your own personal evolution, sort of, not culminating, because that sounds like an endpoint, but growing up to writing this full sort of book on qualitative data?
RB: Right, so I grew up, if you will, in the field of architecture, designing buildings, and there you go through a design studio in your education. And in that process, you learn to question a lot of things, and you bring a lot of different data to bear on the different ways that you’re doing things, but you’re also bringing a lot of qualitative information. And the system of evaluation in architecture is often critique, so critique is asking a lot of probing questions – why did you do that? Why is this the way this is? Why could we not choose something else? And you do that to avoid homing in too quickly on a solution, because there might be better solutions, and you might just be on, like, some little local optimum, and there may be some better ways to do things. And so, that notion of always probing and questioning your tools, I think, becomes very important, or the “if all you have is a hammer, everything looks like a nail” type of problem occurs. And so, there is a need to look beyond that, and that came from that architectural background, and I recognized that was happening when I started getting into those previous discussions about text, and, like, the visualization tools aren’t fitting right, and so you’re just still hammering all these visualization nails in, and you’re going, like, I’m doing it, but it’s not right. And then the second part of it was that I would propose to clients, like, hey, you got the stuff with news, we should be doing more with news, but I didn’t have good answers for them, and they weren’t going to spend a bunch of money to do something for which there wasn’t a good answer. So you kind of had to go off and start saying, well, it’s on me, I’ve got to start looking for some better things, it’s not out there in the literature and the tools of the field.
JS: Yeah. So I want to come back to tools in a bit, because, I’m sure, there are lots of people wondering about a lot of things about the tools, so I do want to come back to that. But I want to ask: do you think that people’s primary challenge in visualizing qualitative data is winnowing the qualitative data down? So when I think about qualitative data in my world, I think about interviews and focus groups, and you have this long transcript that’s pages long. Is that the biggest challenge, or is it the actual act of taking what is already a curated passage of text and figuring out the right visual form?
RB: I think it’s both, it’s still a challenge to work with all of that unstructured text, you can get so much great information out of an interview. Right? And it’s all valuable, but how do you bring that together is one problem. And the second problem is, okay, now that you’ve got your nuggets, like, what do you do, is it a word cloud, or what else is it. And if it’s the “what else is it” you often find that if you start with a visualization, the text just doesn’t fit.
RB: It doesn’t fit in the bars, beside the bars, it’s like the visualizations just don’t have space for text. So it gets pushed out to the perimeter or somewhere else. So that’s why, today, a lot of the good examples in narrative data are like a news report where you have your block of text, your visual, your block of text, your visual, your block of text, because that’s the easiest way to work with it today. But it does mean then that if you take the visual, you miss all the context that’s in the text. If you take the text, you miss the context that’s there in the visual. So it has a good flow for the person who’s reading the story, but for the next person who’s taking pieces out of that, and quoting and using that somewhere else, you’re always going to be missing something.
JS: Right. I think that’s a really good point, because I’m in the midst of teaching a class at Georgetown University, and the recent assignment was to grab a graph and critique it, write up a page or two of critique. And I’ve seen a little bit of this in some of these critiques where they take a graph out of some Washington Post, New York Times article, whatever it is, and critique the graph, which, in some ways, is fair, but also it misses the lead-in, usually the lead-in into that graph. And yeah, I think we often miss that when we sort of take a graph off of some website and start talking about its merits, but miss the lead-in or the lead-out, I don’t know what the next part is, but yeah.
RB: Right. Yeah, completely agree.
JS: So okay, so you’ve mentioned the word cloud a few times, so let’s dive in, because, in my experience, every person I know who is primarily a qualitative researcher is familiar with the word cloud, they have probably made a word cloud, and they’re generally unsatisfied with the word cloud. So where do you come down on the word cloud?
RB: So I’m generally unsatisfied with the word cloud.
JS: Generally, unsatisfied, yeah.
RB: Word clouds, so word clouds have a place. And one of the things that they actually do very well is they have a visceral appeal, right?
RB: And in communications, a visceral element is really valuable sometimes. And Don Norman talks about this in The Psychology of Everyday Things, or is it – sorry, it’s The Design of Everyday Things, the title changed in his second edition or something like that to get better sales, and it worked, he got a lot more sales out of it. Anyways, that visceral thing that immediately grabs you and engages you and brings you in can be really useful, and word clouds do that. Like, you have the size and the colors and the angles and all of those things, and your mind will automatically read the text. So between the size and the colors, and the shapes, and those big words, you are automatically dragged into a word cloud. So there is something to be said for what word clouds can do. But then, on the other hand, after that, word clouds don’t actually do very much. That’s where you’ve hit your limit. So you got this visceral engagement, it’s all sizzle, and then, you go for the steak, and there’s no steak there – that would be the analogy. And I think that was one of the motivating factors for me, like, every time you’d have this conversation about qualitative data, the only thing that people could pull out of their toolbox was a word cloud, and it’s like, oh, there’s got to be something better, and you dig and you go, well, I don’t know, and you know what’s better. So one of the first visualizations, very early visualizations that I did when I started going down this path was the, oh yeah, there’s got to be something better, and so, instead of just counting the words in the book, which is what you do in a word cloud. Right? So I did one pass to count the words, and then, from those words, figured out which ones were people. So that’s a thing called entity extraction these days.
So you now have the people, then you say, okay, I’m now going to do a pass again, where I look for just those people, and I’m looking for adjectives on either side of their names, and I’m just going to get adjectives associated with those people. And so, I’m just doing word cloud – word counts, the same way that you do in a word cloud, but now you’ve got people and their adjectives associated with them. And so, now I’ve got something that’s a little bit more meaningful than just words that are isolated, and so, there’s a little bit more meaningfulness that you can work with in the data; and then, visually, you can do something with that to show the meaningfulness. And then, in the book, I have stem and leaf plots, where it’s, essentially, the noun, the person, the character, and the adjectives beside, and it’s the list of adjectives, and it’s just the weight on the adjective. So it’s not word clouds, word [inaudible 00:14:07] it’s just the weight. So it doesn’t have that same kind of gee whiz wow effect of a word cloud, but it has a content gee whiz wow effect, in that, you’ll see I ran it through, for example, Grimm’s Fairy Tales – these are fairy tales from like 1700s rural Germany. And you get, like, the king is old and great, the princess is young and beautiful, the witch is wicked and old. And so, it’s like, it’s right there in Grimm’s Fairy Tales, all of these biases are right there in the adjectives. They just bubble right up and pop right out. So they pop right out in terms of the data, you can represent that visually, it’s got that impact and wow effect, not viscerally, but at the second level in Norman’s book that I’m now forgetting, of like, you’ve now got some meaningful insight that you’re gaining out of that data.
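The two passes Richard describes (find the people, then count the adjectives next to their names) can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the book: the `CHARACTERS` and `ADJECTIVES` sets are hand-written stand-ins for real entity extraction and part-of-speech tagging, and the sample text, the function names and the two-word window are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins: a real pipeline would get these from entity
# extraction and part-of-speech tagging rather than hand-written sets.
CHARACTERS = {"king", "princess", "witch"}
ADJECTIVES = {"old", "great", "young", "beautiful", "wicked"}


def adjectives_by_character(text, window=2):
    """Count adjectives found within `window` words of each character name."""
    counts = defaultdict(Counter)
    # Work sentence by sentence so a window never reaches into the next sentence.
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", sentence.lower())
        for i, word in enumerate(words):
            if word not in CHARACTERS:
                continue
            nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            for neighbor in nearby:
                if neighbor in ADJECTIVES:
                    counts[word][neighbor] += 1
    return counts


def stem_and_leaf(counts):
    """One line per character: the name is the stem, weighted adjectives are the leaves."""
    lines = []
    for character in sorted(counts):
        leaves = ", ".join(
            f"{adjective} ({n})" for adjective, n in counts[character].most_common()
        )
        lines.append(f"{character:>10} | {leaves}")
    return "\n".join(lines)


# Made-up sample text for illustration, not a quotation from Grimm's Fairy Tales.
SAMPLE = (
    "The old king called for the young princess. "
    "The wicked witch cursed the beautiful princess. "
    "The great king banished the old witch."
)

if __name__ == "__main__":
    print(stem_and_leaf(adjectives_by_character(SAMPLE)))
```

The counting is the same as for a word cloud. The difference is that each count is attached to a character, which is what makes a stem-and-leaf layout possible.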
JS: Right. So I want to come to a few other visualization types in a moment, but let’s touch on tools real quickly, because I’m sure, people are like, this is a great idea, I’ve got some interviews I’ve done on such and such, and I want to know how people are describing these things. So there are two parts here, there’s the tools that you use to analyze the data, the text data, and then, there’s the tools that you use to do the visualization. So can we start with the tools or the programs that you use to do the actual data analysis piece?
RB: Right, so this work started in like 2013-14, and at that point, I just picked up Python, which I didn’t know deeply, but it seemed like a great tool, because there were various libraries available in Python for crunching text, and I just wanted, like, the simplest tools for crunching text. And in Python, there were tools like NLTK and spaCy that were just libraries that you could pull in, and they’re used in teaching. And so, there was lots of great explanatory reference material, so you could figure it out, and it wasn’t complicated: okay, I want just the nouns, and I could write a little bit of code to get just the nouns, right, just the adjectives, and so on. So it was a simple programming tool that I could use to extract the bits out of the text that I wanted. Most recently, I’ve been playing with what are called transformer models, so these are these neural networks that do, like, Google Translate, they do these amazing translations or amazing summarizations. I’m sure you’ve seen in the news things like GPT-3 and BERT and so forth. And I’m just a neophyte there, but they’re really pretty amazing. You can give them some complex text and say summarize it, and nine times out of 10, you’ll get like an amazing summary, that you’ll go like, that’s, like, you know, you did a really good job there. And then, one time out of 10, you’ll get something that’s like completely out of left field, and you have no idea what this model has done. So a human is still very much required in working with those, and you have to be very careful, because, well, what about those other nine times out of 10, where you assumed they did a good job, maybe in one or two of those, there’s like some hidden biases or some other things that you hadn’t considered, you’re just getting enamored with the tool. But I think, in the future, those tools will become more and more important to qualitative data analysis, they’ll become very big.
But in the near term, and if you want to, like, deliberately have control over every little thing that you’re doing, something like Python and NLTK or spaCy is going to do a lot for you.
JS: So do you find, for example, in the noun/adjective example, do you find that you will run the code, and then you will go through and check each one, because I can imagine, I can’t come up with one off the top of my head, but, like, a noun is actually, in this case, an adjective, it uses an [crosstalk 00:18:04].
RB: Yeah, these tools are not always perfect. And again, tools are getting better all the time, but that kind of validation you still want to do. So in the book, there’s an example with the noun and the adjective. When I run it, and I think, actually, that demo is on the supplementary site for the book where I’ve got the interactive demo, if you point at the adjective, it will show you a tooltip that says, here’s the adjective in context. And if it’s not there, it should be in the tooltip – when I originally wrote the code, just on my local laptop, that was my debugging tool to see what’s going on, am I getting the right thing out of here. So using tooltips as a debugging tool, so you crunch the words, but you also keep the whole context around for debugging purposes, and then, use it interactively in your visualization to see what you’ve got going on.
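Keeping the context around for each extracted pair, as described above, might look something like the following. It is a sketch under stated assumptions rather than the book's demo code: pairs are matched by simple co-occurrence within a sentence, and `pairs_with_context`, `tooltip`, the word sets and the sample text are all hypothetical names and data invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical word sets; a real pipeline would take these from a tagger.
NAMES = {"king", "witch"}
ADJECTIVES = {"old", "wicked"}


def pairs_with_context(text, names, adjectives):
    """Map each (name, adjective) pair to the sentences in which both appear."""
    contexts = defaultdict(list)
    # Split after sentence-ending punctuation, keeping the punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        for name in sorted(names & words):
            for adjective in sorted(adjectives & words):
                contexts[(name, adjective)].append(sentence)
    return contexts


def tooltip(contexts, name, adjective):
    """Text to show when pointing at an adjective: every sentence behind the count."""
    return "\n".join(contexts.get((name, adjective), []))


# Made-up sample text for illustration.
SAMPLE = "The old king slept. The king saw an old witch."

if __name__ == "__main__":
    contexts = pairs_with_context(SAMPLE, NAMES, ADJECTIVES)
    print(tooltip(contexts, "king", "old"))
```

In this toy sample the second sentence shows up under (king, old) even though "old" describes the witch there. That is the kind of mistake the in-context view is meant to surface.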
JS: Right, really smart. So you’re using the tooltip, even though no one else, we know, will click on tooltips.
JS: But for you, when you’re working with your data, then you don’t have to get the book over here, yeah, that’s smart. Okay, so that’s really helpful, I’ll put links to all these Python libraries, and to the supplementary site, of course, for the book in the show notes. Okay, so now we’ve got our data, so now what about actually visualizing the data – so you’ve mentioned stem and leaf plot, I like the – I can’t remember what you call them, maybe they’re called word lines where you have the words…
RB: Microtext lines.
JS: Microtext, yeah, you have them in the line themselves, which we’ll talk about in a little bit, because I’ve tried to do it in Excel, and it does not – it just doesn’t work, let’s just put it that way. So what’s your main toolkit for the visualization piece?
JS: But I suspect that, well, D3 probably can’t do the analysis piece, I would suspect somewhere buried in Python – and I’m not a Python person, but buried in Python, you could probably make some of these visualizations.
RB: Right, and I think that’s just the comfort thing.
JS: Right. Well, it also seems valuable to your point about adding tooltips or other pieces for your own visualization or analysis workflow that helps you in this process, as opposed to, I can imagine, like, in my head, it’s like, okay, I’ve made the visualization, here’s the book, I’m going to leaf through each page of Grimm’s Fairy Tales, and try to find each of these pieces, which…
JS: Yeah, this is going to be difficult. Okay, so the book has, I’m going to say countless, because I haven’t tried to count, countless different visualizations, obviously, using text. So I have two last questions for you, so most of the book uses Alice in Wonderland. And I’m curious, is it just, like, does it just lend itself well to the book, or was it just, like, your favorite book, what is it about Alice in Wonderland?
RB: So that’s interesting – Alice in Wonderland was not a conscious choice at the beginning of the book. At the beginning of the book, I was just writing, and I was looking for, oh, here’s a good example of this, here’s a good example of that, here’s a good example of that. And at a certain point, you’re going like I’m processing different texts all over the place, and maybe I should try to synchronize around one of them. And just before the book came out, Michael Friendly, and some of his collaborators published this review of visualizations on the Titanic, and it was really interesting, because there’s something like 20 different visualizations that they had found of the Titanic data. So the Titanic data has been around for like 100 years, and even right after it sank, there were, like, the first visualizations a couple of days later in a newspaper, and there have been countless visualizations since, and you’re looking through, you know, like, these are very different kinds of things, it’s one dataset, and very different kinds of ways of visualizing it. So even if you’re focused just on quantitative visualization, it’s a really good thing to go look at that paper to say, like, wow, like, what are people actually doing with this thing, how many different ways are they visualizing. And, of course, there’s many different ways of visualizing, because there’s many different stories in the data that you can extract. Right? So I said, well, you should be able to do something like that from a text perspective and say, well, not just that I’m showing an example here, an example here, an example here, but should be able to take a text and say, look at all the different ways to visualize it, and then, you get into, like, okay, so what text should it be, I don’t want to do the Bible because who knows what’s going to come out from that, and I don’t want to offend anyone.
JS: Right, yeah.
RB: I don’t want to use Harry Potter because there’s probably copyright police if you just use it ever so slightly the wrong way. And so, Alice was just a really good one, because everybody – not everybody, a lot of people know Alice. It’s been around for a long time, you’ve got some association with it, whether you’ve seen the movies, whether you actually read it or so forth. So that’s why Alice bubbles up. And then, as a follow-on that’s not in the book, I’ve done a paper and a couple of talks on how many different ways people have visualized Alice, and I think the count is something like 72 now that I’ve got. Like, it’s just – yeah, lots of people use it. And it’s great, because you can extract different kinds of things out of it, depending on what it is that you’re analyzing, and what tool that you’re using, and what you’re going after. And so, there’s these wonderful, different things – one of my favorites, you know, things that I would never have considered. And so, an artist did a visualization of Alice in Wonderland, what does it sound like in different languages, and they use that thing where they convert the text into the International Phonetic Alphabet, so you can take the text in every language, convert it into the International Phonetic Alphabet, and the phonetic alphabet gives you what it sounds like. And then, you can build up distributions of all the different sounds and you say, oh so Russian sounds like this, and Portuguese sounds like this. And they’re all based on Alice, and so Alice is just the point of normalization for doing the comparison. But you got this incredible analysis that comes out of it, like, wow, really cool things that people do.
RB: And the funny thing for me about that one is that wasn’t a linguistics person, that wasn’t a visualization person, that was a person from fine arts who did that, like, really amazing possibilities.
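The "distributions of all the different sounds" idea can be sketched as a relative-frequency count over phonetic transcriptions. The following is only an illustration of that counting step and makes several assumptions: the transcriptions are assumed to exist already (converting text to the International Phonetic Alphabet is a separate problem not shown here), each character is treated as one sound (which ignores diacritics and multi-character sounds), and the function names and toy strings are invented for the example and not taken from the artwork Richard mentions.

```python
from collections import Counter

# Characters that are not sounds in this simplified view: spaces, stress
# marks and syllable dots.
IGNORED = set(" ˈˌ.")


def sound_distribution(ipa_text):
    """Relative frequency of each symbol in an IPA string (one character = one sound)."""
    symbols = [ch for ch in ipa_text if ch not in IGNORED]
    total = len(symbols)
    if total == 0:
        return {}
    return {symbol: count / total for symbol, count in Counter(symbols).items()}


def biggest_differences(dist_a, dist_b, top=3):
    """Symbols whose share differs most between two distributions."""
    symbols = set(dist_a) | set(dist_b)
    diffs = {s: dist_a.get(s, 0.0) - dist_b.get(s, 0.0) for s in symbols}
    return sorted(diffs.items(), key=lambda item: (-abs(item[1]), item[0]))[:top]


if __name__ == "__main__":
    # Toy strings standing in for two transcriptions of the same passage.
    first = sound_distribution("ˈælɪs")
    second = sound_distribution("aˈlisa")
    print(biggest_differences(first, second))
```

Because every language's distribution comes from the same source text, differences between the distributions reflect the languages rather than the content, which is the normalization point made above.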
JS: Yeah. It is true when you – I think when you have that singular dataset, and you can say, hey, and it’s quantitative as well. Right?
JS: There’s not just a bar chart, there’s this chart and this chart and this chart, yeah, I think that’s a great theme. So my last question for you is: do you have a favorite of – let me put it this way – you can answer this question any way you like, but, I guess, is there a favorite graph in the book, or is there a favorite version of your collection of 72 Alice in Wonderland visualizations?
RB: There are many favorites in both, that’s always the challenge of asking which is your favorite. And so, I’ve made many different visualizations, I’m actually going to say one that’s not my own, but one that really, early in my research that – so in my research process, I said, there’s got to be better ways of visualizing text. And if there are better ways, well, we’ve had, like, the printing press for 500 years, and we’ve had medieval scribes and other people writing stuff down before that. So there’s going to be hints out there, you just got to find them in the historical record. And if I can find some of those hints, it’s going to help me understand lots of other kinds of visualizations that can be made. So from that, I would say, there was the very first encyclopedia, which was called Cyclopaedia in the UK, published in 1728; and it has, within its Table of Contents, something completely new. So they created a visual Table of Contents, let’s just pull this out. And so, I don’t know, is that coming through okay?
JS: Yeah. So yeah, so for the video folks who are watching this, you can see it; for the audio folks, it looks like a, well, I’ll let you try to describe it.
RB: Yeah, so I’ll describe it then, and say, basically, it’s a tree. So as a visualization, we understand what is a tree, it’s a hierarchy. And on the left side is some text that says something along the lines of all knowledge is either, and then, it splits in a branch, and then, the sentence carries on to the next branch, and you can read, it’s either physical or it’s metaphysical. And then, you can take each of those branches, and it just keeps branching out, out, out, out, out, and it’s a fully readable paragraph, or even, like, it’s a fully readable couple of paragraphs of text. So as text, you know, wow, like, it’s this fully readable thing, but as a visualization, it’s also structured as a visualization, as a hierarchy. And furthermore, within that visualization this is a printer just working with the letters and their little box in 1728 that they have, but they have italics, they have superscripts, they have small caps, they have spacing. So they’re using all of that to, like, distinguish the different branches and the different chapters in the Cyclopaedia. Each chapter is like in small caps, and the major areas are in italics. So you can kind of read it linearly if you want, you can just skim it for the chapter headings, you can walk through it in so many different ways, and it’s using visualization techniques, it’s using typographic techniques, and it’s just using reading techniques all combined into one. And that was, for me, incredibly mind-blowing, if you will, just to stumble across that and say, how, where, why did that come from. You can find earlier examples where they did parts of that and so forth, so it does have a natural evolution, and it kind of disappears because we’re not familiar with it today anymore. But it was this great insight into how visualization of structure and text can live together in one simple visualization. And so, for me, that is my favorite out of the whole book. You know, I should be picking one of my own and selling posters.
And so, I have lots of my own favorites out of everything, but I won’t go into those. And like you’ve said, it’s really interesting with different audiences. Some people love the microtext lines, some people love Grimm’s Fairy Tales, there’s one of motion words that people love, there’s another one of songs. And so, it’s fun that way too, when you’re able to show your own work and get engagement and get responses from audiences. But in this case here, I’ll say, my favorite is that one, because that was really the gem that really started opening things up a lot for me, rather than just thinking about things as like little labels in a word cloud.
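The structure described here, a tree whose branches can also be read as running text, can be sketched with a small nested data structure. This is an illustration only: the top split borrows Richard's paraphrase ("all knowledge is either ... physical ... or metaphysical"), while the sub-branches, the `TREE` layout and the function names are invented for the example and are not the wording or layout of the 1728 Cyclopaedia.

```python
# Each node is (text fragment, list of child nodes). The fragments are written
# so that any root-to-leaf path reads as one sentence.
# The sub-branch wording is hypothetical, not quoted from the Cyclopaedia.
TREE = ("All knowledge is either", [
    ("physical,", [
        ("studied by observation.", []),
        ("studied by experiment.", []),
    ]),
    ("or metaphysical,", [
        ("studied by reasoning.", []),
    ]),
])


def render(node, depth=0):
    """The hierarchy view: one indented line per branch."""
    text, children = node
    lines = ["    " * depth + text]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


def read_paths(node, prefix=""):
    """The reading view: every root-to-leaf path joined into a sentence."""
    text, children = node
    sentence = f"{prefix} {text}".strip()
    if not children:
        return [sentence]
    paths = []
    for child in children:
        paths.extend(read_paths(child, sentence))
    return paths


if __name__ == "__main__":
    print("\n".join(render(TREE)))
    print()
    print("\n".join(read_paths(TREE)))
```

The same data supports both uses mentioned above: skimming the indented outline for structure, or following a single path and reading it linearly.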
JS: Right. That you can combine words, phrases, sentences, paragraphs, and then, in a visual form, it’s not…
RB: And structure, and your visual attributes, like, italics and caps and whatever, and bring all of that together.
JS: That’s great.
RB: And do it in a simple way.
JS: Yeah. Well, Richard, it’s a great book. It’s my favorite book on qualitative DataViz, everyone should check it out. And thanks so much for coming on the show.
RB: I appreciate it. Thanks very much.
And thanks to everyone for tuning in to this week’s episode. I hope you enjoyed that. I hope you learned a lot. There are more guests coming up on the show in the coming weeks. Of course, I’ve also got a bunch of things in the works. I’m trying to get some more blog posts out there. I’ve got a whole lineup of video recordings to share with you on my YouTube channel, and, of course, I’m still trying to grow my Winno community. Winno is a text messaging app, where I share one, two, three or more texts per week about data and data visualization. You can sign up for a free tier where you get one or two texts a week, or you can sign up for the paid tier for only five bucks a month, it’s like a cup of coffee, but you get more information, more details, more coupons, more special things delivered right to your phone. Well, okay, that wraps it up for this week. So until next time, this has been the PolicyViz podcast. Thanks so much for listening.
A whole team helps bring you the PolicyViz podcast. Intro and outro music is provided by the NRIs, a band based here in Northern Virginia. Audio editing is provided by Ken Skaggs. Design and promotion is created with assistance from Sharon Sotsky Remirez. And each episode is transcribed by Jenny Transcription Services. If you’d like to help support the podcast, please share and review it on iTunes, Stitcher, Spotify, YouTube, or wherever you get your podcasts. The PolicyViz podcast is ad free and supported by listeners. But if you would like to help support the show financially, please visit our Winno app, PayPal page or Patreon page, all linked and available at policyviz.com.