This is sort of a special, non-podcast version of The PolicyViz Podcast. I asked Jorge Camões, author of the new book Data at Work, to join me on the podcast. After months of seclusion to finish his book, Jorge is finally shaving, but he’s still brushing up on his conversation skills. Instead, Jorge was happy to answer my questions via email.
Can you tell readers about yourself?
I spent all my professional life designing and developing information products for internal and external clients. Since I never wanted to be an IT person, becoming an advanced Excel user was unavoidable.
When it comes to data visualization, every report and presentation I made when I started working is a catalogue of the errors and misconceptions that Excel’s bad defaults invite users to make. Then one day I stumbled upon Tufte’s The Visual Display of Quantitative Information and I found the meaning of life…
I became a father of twins by then, and couldn’t afford more than a low-commitment project, and blogging was the obvious answer. I wanted to test if I could be more than an echo chamber, but over time I started feeling frustrated with the blogging format. It was like wanting to discover the world but having to stay at the same hotel every night.
I started writing the book about four years ago, and it took me one more year to translate it. Because it uses real-world data, I had to find new data and make new charts to make it more suitable for an American and an international audience.
I wrote this book for everyone who is not paid for their artistic talent or graphic design skills, and who uses a spreadsheet as the primary data visualization tool. That’s formulaic and basically means Excel users in a business context.
There are several reasons for that. Excel users are the silent majority of data visualization. They are not supposed to share their work publicly, Excel is not a sexy tool, they succumb too easily to ugly ineffective defaults, and usually you can’t trust peers and Excel trainers to help with sound data visualization principles and best practices.
If you turn to books, Excel manuals are too focused on the tool, while data visualization for graphic designers doesn’t make sense in a business context. Stephen Few is the reference in business visualization, but I’m not sure if he alone can provide the critical mass to overcome inertia. Things are changing, though, and many of the data visualization books being published now target this area, which is great. This at least will make people more aware of the need for more effective data visualization at work.
When it comes to aesthetics-driven data visualization, this book ends when the fun begins, but I secretly hope to see graphic designers reading it. Most of them are truly brilliant in balancing data and aesthetics, but I still see things like using pies to display growth, and I think the book could help with that.
As the owner of ExcelCharts.com, I was surprised to see no hands-on Excel tutorials in the book. Why did you not include specific tutorials?
It wouldn’t be fair for me to say “look at this cool chart I made in Excel; you’ll never guess how I made it.” On the other hand, I was more interested in discussing data visualization in a business context than discussing the tool. It’s true that readers will not find long step-by-step tutorials, but they should see the book companion site (dataatworkbook.com) as a natural extension of the book itself. Almost all charts I included in the book are available there for download. I will write tutorials on my blog, and I encourage readers to leave a comment on the site and I’ll answer as soon as possible. I think that’s a reasonable trade-off between focusing the book on data visualization and sharing the specifics of chart making.
Hope you’ll agree that, in spite of not including those hands-on tutorials, this still is a very practical book. For example, I used real-world data and not dummy data, because real-world data is messier and tends to rebel against our neat concepts, making things more interesting. I share a small project as an example of how things can get too interesting when your expectations about data quality and data availability are not met by the actual data.
Also, because the underlying data tables for each chart are available, I encourage the readers to play with them and suggest better alternatives to my charts. That can be a great learning opportunity for everyone.
I can’t thank Nikki McDonald, then my editor at Peachpit, enough for letting me write the book I wanted to write, and not something else. We did get suggestions to write a more Excel-centered book, with lots of hands-on tutorials. I’m sure that book is needed and someone should write it. I could consider writing it myself, now that I have a better understanding of what data visualization means to me.
By the way, I just downloaded Stephanie Evergreen’s new book. I haven’t read it yet, but my first impression is that her goal was exactly that: combining good data visualization practices with hands-on tutorials in Excel. So that’s a welcome first book in this area.
Aside from Excel, what are your favorite data visualization tools?
Let’s put it this way: complex things are easier for me to do in Excel than making simple things using a tool I don’t know. I can also use Excel as a prototyping tool, and even used it to make charts for a data visualization book. I don’t feel the need to learn a (more glamorous?) proper visualization tool and I don’t have a business case to do so. This means that, so far, this was a very monogamous relationship, but the plan is to use the book to close a cycle and start something new. I like to play with Tableau, and I’m curious about PowerBI. PowerBI is the natural choice if I want to stay under the Microsoft umbrella. When choosing a tool like PowerBI, Tableau or Qlik, you have to assume some design flexibility will be lost, while you gain processing power, interactivity or web access. I’d like to replicate some of the charts in the book both in Tableau and PowerBI. It should be an interesting challenge.
And I want to learn a programming language. Again, it would make sense to choose R (and Microsoft R Open), but I’d like to learn Python with the kids and use it to build things in Minecraft, before using it for work.
What is the biggest misconception about Excel when it comes to data visualization?
I think the most dangerous one is not specific to Excel, but Microsoft made it very obvious in Excel. I’m referring to the notion that, when you are learning a tool, you also acquire domain knowledge. Contrary to popular belief, making a chart and making a chart in Excel are two different things and you need them both. When choosing an Excel training provider make sure they don’t spend half of the time playing with the 3D options.
I have a misconception of my own: I tend to believe that some people using other tools simply outgrew Excel. It always surprises me when they say “hey, I didn’t know Excel could do that!” Excel can probably answer 99% of your business visualization needs. Whether it can do it cost-effectively, that’s a different question.
What are the top three things that people who are new to data visualization need to know?
Language, effectiveness, and truth are the three keywords I’d pick. Data visualization should be taken as a language, and, like any other language, we use it to communicate. All your design choices (your visual words) should be aligned with the task, with the message you want to send or the way you want to explore the data.
Since we have a language, we can establish a communication channel between the data and the brain. The brain is great at processing visual information, but its resources are finite, so it tends to allocate more resources to process the things you are paying attention to. Data visualization preprocesses the data, and that means that precious cognitive resources can be applied to higher-level tasks. Effectiveness is a way of minimizing brain processing costs, and should be your top priority, especially in business visualization. “How is this improving my communication?” should always be in your mind when creating a chart or choosing a design option.
Now that you have a communication channel and you know how to use it effectively it’s about time to take a look at what flows through those pipes: the data. You should torture your data until it confesses everything it has to confess. I’m sure you’re an expert in your field, so you’ll be able to separate the wheat from the chaff and select the relevant insights, while making sure your visualization remains a truthful translation of the data table. Everything falls apart if the data is wrong or can’t be trusted, so evaluating that should always be your starting point.
Let me add a short bonus keyword: statistics. Data visualization can be sexier than traditional statistical methods, but it’s not better, or worse. What’s the point of wasting space making a chart if there is no meaningful variation? Use the average instead. Or use quartiles to define cut-off points in a distribution. Use a chart to spot outliers. Visualization and statistical methods complement each other nicely, and more often than not they should be used together.
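As a rough illustration of how these statistical shortcuts can sit next to a chart, here is a minimal Python sketch. It is not from the book: the `summarize` helper, the `monthly_sales` figures, and the 1.5 × IQR cut-off for outliers are all assumptions made for the example.

```python
import statistics


def summarize(values):
    """Return the mean, the quartile cut-off points, and simple IQR-rule outliers."""
    mean = statistics.mean(values)
    # Three cut-off points that split the sorted data into four groups.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: the common 1.5 * IQR rule of thumb marks a value as an outlier.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"mean": mean, "quartiles": (q1, q2, q3), "outliers": outliers}


# Hypothetical data: eight ordinary months and one unusual one.
monthly_sales = [10, 12, 11, 13, 12, 11, 10, 12, 40]
summary = summarize(monthly_sales)
print(summary)
```

If the outlier list is empty and the quartiles sit close together, the average alone may be enough; if not, a chart is probably worth the space.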
Is there a “right” chart?
Sure there is a right chart! It’s the chart that you, in a given circumstance, think is the best one to wow your audience, discover or communicate knowledge, impress your boss, or advance your agenda. Unless, of course, you prefer to use bar charts for everything, draining all joy out of data visualization like a Dementor (read this and this if you didn’t catch the joke.) But let’s leave these extreme positions and find a more fertile ground.
I like rules. They are fun and have an accurate sense of self worthiness. Sic transit gloria mundi is their motto. Unlike those humorless and generally annoying chaps called dogmas, rules just do their work, always ready to be broken if something better comes up. There are many rules in data visualization: time should flow from left to right, if you are displaying proportions there must be a whole, having a dual-axis chart is bad, etc.
Rules will not tell you what the “right” chart is, but they can help you find a good starting point. Think of a time series. As a rule, you can start with a line chart, which will make you aware of a strong seasonality (monthly tourism data is a good example). This is the right chart to display seasonality, but maybe something interesting is happening in each time period (is tourism in August getting stronger?), so now you add a cycle plot. Or you decide to add a new series to see how they vary against each other, and you make a connected scatterplot (nights spent by residents versus nights spent by non-residents). This adds more interesting insights. And then you realize some of these right charts are not the right ones for your audience, and you have to “rephrase” your entire message so that it can be understood.
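The step from a line chart to a cycle plot is mostly a regrouping of the same data: instead of one long series, you get one short series per month, each running across the years. A minimal Python sketch of that regrouping follows. The `cycle_plot_series` helper and the tourism figures are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def cycle_plot_series(observations):
    """Group (year, month, value) records into one year-ordered series per month."""
    by_month = defaultdict(list)
    # Sorting first guarantees each monthly series runs in chronological order.
    for year, month, value in sorted(observations):
        by_month[month].append((year, value))
    return dict(sorted(by_month.items()))


# Hypothetical nights spent (in thousands) for two months over three years.
tourism = [
    (2013, 1, 200), (2013, 8, 900),
    (2014, 1, 210), (2014, 8, 950),
    (2015, 1, 205), (2015, 8, 1010),
]
series = cycle_plot_series(tourism)
print(series[8])  # the August series across years
```

Each per-month series can then be drawn as its own small line, side by side, which is what makes a question like "is August getting stronger?" easy to answer.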
Takeaways? A single dataset can tell many stories, and there are too many other moving parts. Don’t waste your time searching for the “right” chart, not even for “the best of all possible charts”. After exploring the data and making a few charts, all you need to do is to look at each one and ask questions like: Is this saying what I want to say? Is the message clear and at the right level of detail? Am I doing my best to help the audience read it?
What is the role of aesthetics in data visualization?
First, let me say I like to use “data visualization” as an umbrella term that includes every transformation of a data table into a visual display. In that sense, business visualization and data art are data visualization, with different goals and requiring different skill sets. There are common roots to all data visualization branches, like the role of perception and eye physiology. If you want to focus on the differences, then aesthetics will serve you well. It plays a central role in data art, it should play a supporting role in infographics, and should remain a background actor in business visualization (I’m oversimplifying here). Hence, when you make a glossy 3D pie chart you clearly don’t understand the role of aesthetics in business visualization: that’s data art. Really, really, ugly data art.
You can make boring charts in business visualization, but they can’t be ugly, unless you are doing something wrong that has nothing to do with aesthetics. That’s a bold statement, right? No, it isn’t. Remove noise/junk/clutter (don’t over-sanitize), manage stimuli intensity, make sure the message is clear and the end result will probably be aesthetically safe (or, using Tufte’s words about color, you’ll be able to “avoid catastrophe”). Think of aesthetics as the by-product of your functional design choices. From an artistic point of view, “safe aesthetics” is not exactly a compliment, but in business visualization it could be seen as a safety net that allows you to go exploring. A while ago, I created the “bamboo chart” to compare the difference between subgroups to the national average. Joe Mako made a better version using Tableau. This is my type of exploration: using the objects in my toolset and trying to come up with a better answer to a specific question.
Aesthetics can range between terrific and terrifying. If you have the artistic talent and know-how to translate the interplay of data and aesthetics, you can create terrific visualizations. If you use aesthetics with the sole purpose of getting eyeballs and you don’t care about the data, then your marketing infographic will be terrifying, lipstick-on-a-pig style.
Take the famous vaccine heatmaps published by the Wall Street Journal. The impact of vaccines is so obvious that maybe the author could find a more effective way of visually encoding the data (read Randy Olson’s great post for a discussion of the issues and possible alternatives.) I don’t mean it should be a line chart: I mean that there could be a better balance between aesthetics and effectiveness, because hiding the change in the data is almost impossible.
This example shows that aesthetics improve engagement, but it helps if the potential for engagement is already there. If the visualization is about “me” (the audience) or if it’s about something “I” care about, if the data itself contains remarkable features, then aesthetics can help emphasize them.
Aesthetics can’t be reduced to color, but I suspect that a more liberal use of color could be enough to improve engagement. I know this is a dangerous thing to say, and I don’t mean there should be a color rainbow in every chart, but you can’t expect your audience to react enthusiastically to your all-gray chart. Adding a patch of color when it makes sense, using a light-colored background or using a hue with varying levels of luminance instead of shades of gray could make your charts more appealing without making fundamental concessions.
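To make the "one hue, varying luminance" idea concrete, here is a small Python sketch that builds such a ramp as hex colors you could paste into a chart's fill settings. The `single_hue_ramp` helper, the hue of 210 degrees (a blue), and the lightness range are assumptions chosen for illustration. HLS lightness is only a rough stand-in for perceived luminance, so treat the result as a starting point.

```python
import colorsys


def single_hue_ramp(hue_degrees, steps, lightest=0.85, darkest=0.30, saturation=0.60):
    """Return `steps` hex colors of one hue, ordered from light to dark."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    colors = []
    for i in range(steps):
        # Move evenly from the lightest to the darkest lightness value.
        lightness = lightest + (darkest - lightest) * i / (steps - 1)
        r, g, b = colorsys.hls_to_rgb(hue_degrees / 360, lightness, saturation)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


# A five-step blue ramp that can replace five shades of gray.
blues = single_hue_ramp(210, 5)
print(blues)
```

Because only the lightness changes, the ordering of the categories stays as readable as it would be in gray, while the chart gains a little color.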