I usually hate these kinds of “Year in Review” wrap-up posts, but I figured I have enough good and bad stuff saved that it was worth sharing. No rankings here, just a stream-of-consciousness kind of thing. But let me be clear: I haven’t done a good job collecting data over the course of the year. In fact, I just spent some time mining my various lists, emails, and Tweets over the past couple of weeks; basically, these lists are hopelessly inaccurate. But I hope you find a few useful nuggets, resources, and a couple of laughs.
Before I get into the various lists, I want to thank you for reading my posts, attending my workshops, and listening to the PolicyViz Podcast. It was an exciting year (Graphic Continuum, lots of presentations, book writing, and more), and I’m thankful that you find the stuff I’m writing and speaking about useful and of interest.
I hope you and your families have a safe, happy, and healthy New Year. See you in 2016!
I’ll start with data/journalism, which tends to get a lot of credit and a lot of fire for making really awesome data visualizations and making really terrible mistakes with data and statistics. Continuing great work from years past, the New York Times (especially the Upshot), Pro Publica, the Washington Post, and the Guardian have done great work this year.
At the very end of 2014, I made a stink about journalists (and especially some of the newer data/journalism places) making some questionable decisions with data, data analysis, and statistics. But over the past year, many of those places have been doing some really great work. On the one hand, I think FiveThirtyEight has got it about right. They are using interesting data and covering interesting stories (though much of the Life stuff doesn’t really appeal to me). For my money, their Slack Chat conversations (one on Donald Trump here) are uniquely good because they highlight how smart, data-driven people think about important issues and debate them in the newsroom (and are then edited for us).
On the other hand, I’m not regularly reading Vox anymore. My critique is therefore not exactly fair because I basically stopped visiting the site a few months ago, but that’s because it felt more like Buzzfeed with click-through “explainers” and other lighter news rather than the more in-depth, data-based analysis I thought it was going to be. They did some good stuff this year, like this video on non-zero baselines and this tax bracket explainer, but it is not in my regular rotation.
Data and Data Visualization Tools, Now and Next Year
There are just far too many tools out there now; I can’t keep up! I’m still a big fan of HighCharts and I played a bit with Plot.ly this year (which recently made their code open source). I still need to explore other tools like Vega and Lyra; PowerBI is near the top of my list for 2016 and I’m curious to see how it will mesh with the rest of the Microsoft ecosystem (and, ultimately, how it will fare versus Tableau).
I also wonder where the traditional, for-profit statistical tools are going to end up. How are tools like SAS, SPSS, and Stata going to keep up with the Rs of the world? The R aficionados will argue that they can’t, but I don’t believe that’s the case, at least not in the short run. If an organization provides its staff with SAS or Stata, those costs remain hidden to the individual analyst and there is therefore little incentive for those individuals to learn new tools, unless those new tools do things the traditional programs simply can’t. And in my experience, even though the data visualization options are better in R, I don’t think we’re quite there yet.
Some Wildly Inaccurate Projections…
In 2016, I expect more tools to emerge, especially those that make it easier for people to work with “Big” Data. I’m not exactly sure what that’ll entail, really, but tools that make it easier to visualize large datasets will be increasingly important. I expect animated GIFs to be a really big thing in 2016; with tools like GifMaker and ScreenToGif, it’s now cheap and easy to create an animated GIF or record yourself using a tool and then post it to your social networks. Animating visualizations in this way also allows organizations without the ability to code to create some interactivity with their traditionally static charts.
Tile Grid maps (example and example; and a great post on maps from NPR here) also seem to be hitting their stride, right at the end of the year. These maps have pros (easy to make) and cons (geography not quite accurate). They’ll probably continue this little hot streak for a while and then cool off in 2016.
I also think “socialviz” will be bigger in 2016. I’m not exactly sure if this is the term others use, but I’m sticking with it–visualizations in which users input some data and the visualization updates to show the results. The Upshot did a couple in 2015 and this one on education and earnings was one of my favorite visualizations of the year.
Data Visualization Critique
I feel like the data visualization community has grown up a lot this year. There have been fewer potshots at bad visualizations and more constructive, thoughtful critiques. Personally, I tried to do less of that drive-by critique (though I wasn’t always successful). I hope my project site, HelpMeViz, can do even more to help people next year. Thoughtful posts from Martin Wattenberg and Fernanda Viegas and another from Ben Jones(1) highlight some of the growth in the field. I expect more thoughtful critiques in 2016 as more people write and argue about best practices. There will always be the debate about pie charts(2), zero-axis baselines(3), and memorability, but it seems the field has come a long way in the last year.
Big, Open, and Personal Data
These were clearly some of the major trends in 2015, and all three show no signs of slowing down. I’ve recently written a couple of posts about Big Data as it is used (or not) in social and public policy research (here and here). I expect there to be big changes in how such researchers use these new and nontraditional forms of data (NANSOD!). This may mean researchers need to learn how to use APIs or Amazon’s Mechanical Turk, or even for-hire sites like Upwork, and I’m optimistic that these data sources and tools can be useful for policy research.
On Open Data, there has been a ton of movement, especially at the state and local levels. The federal government has made big strides too, and though started in 2014, 18F made incredible gains this year and I think will be a real force in 2016 (though who knows what will happen after the elections in November).
Personal Data is also a growing trend with all kinds of tools, technologies, and wearables to help you capture your own data. As I’ll mention below, Dear Data was one of the biggest data visualization projects in 2015, and I wonder if part of the appeal of that project is that there is something uniquely natural about collecting your own data by hand and not letting a watch or app do it for you. I’m sure collecting more personal data and ways to go about doing so will expand in 2016, and with it will certainly come concerns about privacy and security. I’m also curious to see if others will fill the gap in big personal data visualization projects now left by the end of Dear Data and Nicholas Felton’s annual reports.
I had a really good run of books earlier in the year. Things were just clicking, the writing was great, the topics were interesting, and I was mixing good fiction and non-fiction. There were also some big clunkers, but I won’t list them here.
Here are some of my favorite reads from this year (though not necessarily published this year).
Game of Thrones (but, seriously, can he wrap up at least a couple of story lines?)
Michael Alley’s Craft of Scientific Presentations had some great sections
Made to Stick (yes, I’m behind)
Harry Potter (4, 5, 6, and 7–my daughter demanded I read them along with her, but no complaints from me, friends, those books are awesome)
Robert Putnam’s Our Kids: The American Dream in Crisis
The Idea Factory: Bell Labs and the Great Age of American Innovation
Storytelling with Data by Cole Nussbaumer Knaflic
How Not to Be Wrong: The Power of Mathematical Thinking
The Storytelling Animal: How Stories Make Us Human
Start with Why
Atul Gawande’s Checklist Manifesto.
2016 is shaping up to be the Year of Data Visualization Books, and I’m looking forward to all of them: Alberto Cairo (March), Jorge Camoes, Dear Data (September), Charles Duhigg (not dataviz, but I’m still excited; March), Stephanie Evergreen (May), Carmine Gallo (February), Andy Kirk (May), and Carmen Simon (June). Of course, I’m especially excited for my own book on presentation skills for researchers and analysts, which I’m writing with Columbia University Press and which is scheduled for an August 2016 release.
The Bests of 2015
Best static viz. Well, of course, Dear Data has to top the list, right? As I Tweeted early in the year….
— Jon Schwabish (@jschwabish) March 19, 2015
I also liked this timeline of the Presidential election from the NYTimes, and the list of projects at the Information is Beautiful Annual Awards. Although very simple, I thought this graphic from the Guardian on mass shootings in the U.S. was very effective because of the (much too) long, vertical scroll.
Best interactive viz. Many, many to list here; see Andy Kirk’s various monthly roundups. I loved the Wall Street Journal’s heatmap on infectious diseases because it shows how you can use a slightly different visual approach to show simple data. If given a file of infections by state and year, most people would probably make a line chart. But the heatmap allows you to see the same patterns with less clutter, and to pick out specific values. (It works as a static graphic, too.) Also on measles, this one explaining how vaccines work from the Guardian was very clever. And this NYTimes story on oil prices did a great job combining a vertical stepper with a connected scatterplot. The NYTimes 3D Yield Curve is one of the few visualizations that used 3D in an effective way. Bloomberg’s Global Warming stepper was also really good because it built up a scientific model with interactions in a clever way. Somewhat similarly, this Urban Institute stepper on wealth inequality was one of my favorites this year.
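For anyone curious what the heatmap approach involves on the data side, here is a minimal Python sketch of reshaping a "file of infections by state and year" into the states-by-years grid a heatmap is drawn from. The records, the numbers, and the `to_grid` helper are all made up for illustration; this is not the Wall Street Journal’s data or method.

```python
# Hypothetical long-format records: (state, year, cases per 100,000).
# The values are invented purely for illustration.
records = [
    ("Ohio", 1960, 120.0),
    ("Ohio", 1961, 95.5),
    ("Ohio", 1962, 60.2),
    ("Texas", 1960, 210.3),
    ("Texas", 1961, 180.0),
    ("Texas", 1962, 40.1),
]


def to_grid(rows):
    """Pivot (state, year, value) rows into a states-by-years grid.

    Returns the sorted state labels, the sorted year labels, and a list of
    rows (one per state) with one cell per year. Missing cells are None.
    """
    states = sorted({state for state, _, _ in rows})
    years = sorted({year for _, year, _ in rows})
    lookup = {(state, year): value for state, year, value in rows}
    grid = [[lookup.get((state, year)) for year in years] for state in states]
    return states, years, grid


states, years, grid = to_grid(records)
```

Each row of `grid` is one state and each column is one year, so every cell can be colored by its value. A line chart of the same data would need one line per state, which is where the clutter comes from.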
Not surprisingly, many places are publishing the year-in-review-in-graphs, so here’s a list for you:
The Worsts of 2015
Americans United for Life’s (below) horrible, horrible, insulting line chart is beyond bad and shows why readers need to be careful reading graphs, especially from groups with agendas. Time series pie charts are still pretty terrible; this map from the CDC almost made me cry, as did this triplet of 3D exploding pie charts (below) from the BEA. National Review Online had an awesome line chart of global temperatures that shows why zero baselines are not always appropriate. Quartz has a good wrap-up of the most misleading charts of 2015.
Favorite blog posts: Stephen Few on McCandless (though, really the comments were the best part); Science isn’t broken from FiveThirtyEight; A Visual Introduction to Machine Learning; Archie Tse Tweet discussion on plagiarism (and a similar thing from Nate Silver); Jen Christiansen on audience-centered graphics; Paul Ford on code (kind of short-changing this to call it a blog post); Martin Wattenberg and Fernanda Viegas on data visualization critique; Sarah Slobin on empathy in data visualization; and a few posts from Adam Noar at Presentation Panda, especially this one on pulling colors out of images. Also, the various Reddit AMAs with data visualization experts (e.g., Kosara, Bostock, and McCandless) were a lot of fun. (My most popular posts in 2015 were killing paired bar charts and Unicorns.)
Most fun with data: Chris Ingraham at WaPo on goats
Favorite movie: Chef (yes, I know it’s from 2014) and, yes, Star Wars: The Force Awakens. On the family side, Percy Jackson movies (Lightning Thief was good; Sea of Monsters was pretty terrible), and I saw all of the Harry Potter movies this year (my daughter read all the books at least twice, so it was a heavy Harry Potter year, including a HP-themed birthday party). We also did some older family films such as Home Alone and Home Alone 2, which were big hits.
Favorite viz papers: An Evaluation of the Impact of Visual Embellishments in Bar Charts; Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error; ISOTYPE Visualization: Working Memory, Performance, and Engagement; and The Connected Scatterplot for Presenting Paired Time Series.
Favorite podcasts: You know I’m going to lead with The PolicyViz Podcast and Rad Presenters, right? I liked New Tech City, but it seemed to fall off for me this year when it changed to Note to Self. FiveThirtyEight’s podcast What’s the Point is good, as is the ProPublica Podcast (Scott Klein’s voice is so soothing, he should do a meditation podcast); Vox’s The Weeds is too long for my taste. HBR IdeaCast is sort of hit-or-miss for me. Of course, Data Stories is still great and who doesn’t love Moritz and Enrico’s “exotic euro” voices?
Explaining uncertainty with dataviz: Me and Hannah Recht on poverty rates; my colleagues at the Urban Institute on the prison population; Kristoffer Magnusson on Statistical Power and Significance Testing; and Xan Gregg on variability and maps.
Bad stats: These are just everywhere! Folks need to be careful about using bad, unfounded statistics. I wrote about some of these in September, and hope to write more in 2016. Bad statistics about attention spans, for example, are usually completely untrue and end up all being derived from a single, terrible source.
Favorite Goodreads: Santiago Ortiz
Favorite Talks: You should probably just listen through the Rad Presenters podcast–I listed a number over the course of the year. A few stood out: Jen Christiansen at Visualized; Fernanda Viegas, Martin Wattenberg, and Lynn Cherny at the U; Lena Groeger at OpenVisConf.
Favorite new tech: Amazon Echo (I thought this would be stupid, but it’s actually kind of fun); Fitbit (yep, I’m behind a bit, and I admittedly don’t like exercise, but I’m enjoying it); Spotify (not quite sure how this is legal, but I’m enjoying it); and my fancy Blue microphone (and the Blue snowflake for travel).
Favorite conferences I attended in 2015: APPAM (speaker), ASSA (speaker), OpenVisConf, Presentation Summit (speaker), Socrata Customer Summit (speaker), Vis (speaker), Visualized (speaker), and many, many others.
Favorite toy: Lego Millennium Falcon (for me, not for my son), Kindle Fire (daughter, plus a seemingly endless supply of books), Gyroscope (son, though it quickly turned to a variety of Legos and footballs).
(1) Sorry, Ben, I disagree that the Word Cloud has any utility besides looking cool. Maybe as a thumbnail image on a website, but not for communicating any actual data.
(2) In her post on pie chart guidelines, Ann Emery argues that pie charts with only 2 or 3 slices are okay. I just want to point out that if you’re in the camp that argues that pie charts with 2 slices are okay, I’m gonna come back and ask whether you need a chart at all! I’m not sold that you need a (pie) chart in such cases because you’re really only showing a single number. As examples, take these two Tweets, one from the Kaiser Family Foundation and the other from the Committee for a Responsible Federal Budget. They essentially give you the same thing–a single number–but because we’re not very good at discerning quantities from the slices, the pie chart forces you to read the numbers. By comparison, the single number from KFF gives you the information in big, bold text. The chart may be helpful on social media, but I think it’s less useful in other reports, websites, etc. when a single number would suffice.
— Kaiser Family Found (@KaiserFamFound) December 4, 2015
— CRFB.org (@BudgetHawks) December 7, 2015
(3) Quick note on Vox’s non-zero baseline video (Quartz also had something from earlier in the year): I just want to point out that their examples were restricted to line charts, and it is with those charts that I generally agree that the vertical axis need not go to zero (though once you decide not to use a zero baseline, you start to get into issues with cherry-picking the range, which can distort the data). (See this pre-PolicyViz Podcast audio chat with Cole Nussbaumer.) Bar/column charts, however, should start at zero because it is the length/height of the bar/column that we use to discern the quantities (line charts use the position of the data point).
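To put a rough number on the bar-chart point, here is a small Python sketch that compares the true ratio of two values with the ratio of their drawn bar heights when the axis starts somewhere other than zero. The values (55 and 50), the baseline of 45, and the `drawn_ratio` helper are hypothetical and chosen only for illustration.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b.

    A bar for value v on an axis that starts at `baseline` is drawn with
    height (v - baseline), so this is the comparison the reader's eye makes.
    """
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values: 55 is 10 percent larger than 50.
honest = drawn_ratio(55, 50)                  # axis starts at zero
truncated = drawn_ratio(55, 50, baseline=45)  # axis starts at 45

print(f"zero baseline: first bar is {honest:.2f}x the second")
print(f"baseline at 45: first bar is {truncated:.2f}x the second")
```

With a zero baseline the first bar is drawn 1.1 times as tall as the second, which matches the data. With the axis starting at 45 it is drawn twice as tall, even though the underlying difference is still 10 percent. A line chart does not have this problem in the same way because the reader compares positions, not lengths.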