Last month, Ann Emery wrote a post about visualizing statistically significant point estimates. She took a standard table with nine variables spread across three countries—27 numbers in total plus asterisks denoting four different levels of statistical significance—and remade it to visualize the “big-picture findings.”

The filled squares represent statistically significant results and empty squares represent results that were not statistically significant. The cutoff was a p-value under 0.05; anything higher than that—notice that there are two estimates in the original table that have a higher p-value—remained empty.

Ann’s remade table is more visually appealing than the original: the use of shapes and colors draws the eye in a way that the text and asterisks do not. I wanted to take a deeper look because leaving out the actual numbers didn’t feel right to me. But before I get to my critique, let me state up front that I don’t know the final product, context of the results, or the actual variables or their meaning. I also don’t know Ann’s client, and it sounds like they were really happy with the results. My goal here, therefore, is not so much to critique Ann’s approach, but instead to continue this discussion about how to visualize uncertainty and statistical significance, a challenge I regularly face.

I think there are two main flaws with this approach of using squares in the table: First, it hides the **magnitude** of the results. Yes, you can distinguish the statistically significant results from those that are not, but you don’t know how large any of them are. And the fact that I can’t figure out whether the effect of variable 1a is twice as large or half as large as variable 1b is a problem.

In her post, Ann spends a good chunk of time reminding us how important it is to keep the audience in mind (emphasis mine):

And remember our audience: the organization’s internal leaders who were not researchers by training. … They simply needed to know whether their treatment group had done better than their control group so that they could decide whether to expand, shrink, or adjust their approach to programming.

They didn’t need the exact numbers in order to make those decisions. And I worry that providing the exact numbers can actually distract busy leaders from the big picture. The exact numbers go in the appendix of the report. The big picture findings go in the body of the report.

I’m a big fan of the approach Ann espouses here of putting the exact numbers in an appendix or accompanying detailed table, but I think it must be the case (and, again, I don’t know the client here, so I’m speaking generally) *that you need to know relative magnitudes to make any sort of decision about this programming*. It’s possible that this is a situation where the binary result (of statistical significance) is what matters, but I suspect in most (all?) cases, the magnitudes are vitally important. If there’s an 85% chance that some variable will add $1 million in profits to my business (a p-value>0.05) and a 98% chance that some other variable will add $1,000 in profits to my business (a p-value<0.05), I might be willing to take my chances with the former!
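To make that arithmetic concrete, here is a minimal sketch that weights each payoff by its stated chance. It takes the percentages above at face value as probabilities (a loose reading of what a p-value actually measures), and the names `options` and `expected_gain` are made up for illustration.

```python
# Probabilities and payoffs come from the example in the text; treating the
# percentages as plain probabilities is a simplifying assumption.
options = {
    "variable A (p > 0.05)": {"probability": 0.85, "payoff": 1_000_000},
    "variable B (p < 0.05)": {"probability": 0.98, "payoff": 1_000},
}


def expected_gain(probability: float, payoff: float) -> float:
    """Probability-weighted payoff, ignoring costs and risk aversion."""
    return probability * payoff


gains = {name: expected_gain(**opt) for name, opt in options.items()}

for name, gain in gains.items():
    print(f"{name}: expected gain of about ${gain:,.0f}")
```

Under these assumptions the "not significant" variable has an expected gain several hundred times larger than the "significant" one, which is exactly the information a filled or empty square cannot convey.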

Also, while I would certainly argue that there is, generally speaking, a lack of statistical numeracy, and thus expecting our readers to understand what a p-value is or even what a percentile is can be courageous, there is a time and place to help *educate* our readers about how to read statistical results. I think this is *especially* the case when we are presenting results to decision-makers for whom it is vital to understand what statistical significance is and what it means. Ann’s legend in her remade table, to some extent, does just that. If the full table results can be put in an Appendix, can a more in-depth discussion of statistical significance also appear there? As an example of good annotating/labeling, I think the key in these box-and-whisker plots from the Washington Post does a nice job helping readers understand how the plots work.

The second issue is that by visualizing only statistical significance, this table introduces a form of p-hacking. For those who don’t know, p-hacking is when you look for (and sometimes only represent or publish) results that are statistically significant at or above a particular level (FiveThirtyEight did a great explainer). Notice here how there is a conscious effort to categorize those results that are only statistically significant up to the 5% level; if an estimate was significant at the 7% level and thus would get a + sign next to the number, it would be given an empty square in the remake. This doesn’t mean we always need to show *every possible* level of statistical significance, but without the standard errors, confidence bounds, or asterisks, the reader is not able to make those determinations.
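The information lost in that step can be sketched in a few lines. The cutoffs below (0.001, 0.01, 0.05, and 0.10 for the + sign) are an assumption about what the four levels in the original table were, and both function names are hypothetical.

```python
def asterisk_marker(p_value: float) -> str:
    """Table-style marker; the four cutoffs here are assumed, not sourced."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    if p_value < 0.10:
        return "+"
    return ""


def square_marker(p_value: float, cutoff: float = 0.05) -> str:
    """Binary marker as in the remade table: filled below the cutoff."""
    return "■" if p_value < cutoff else "□"


# Hypothetical p-values, chosen only to hit each side of the 5% cutoff.
for p in (0.004, 0.03, 0.07, 0.40):
    print(f"p = {p:<5}  table: {asterisk_marker(p):<3}  squares: {square_marker(p)}")
```

With these made-up inputs, the estimate at p = 0.07 and the one at p = 0.40 both come out as an empty square, even though the original table would have flagged the first with a + sign.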

**Other Options**

Okay, onto my remakes. I took two primary approaches here: Adding visuals to a table and making separate data visualizations.

**Table with Bars or Circles**

I like the idea of embedding a graph within a table, though it can be a little challenging to do. Here, I inserted horizontal bars into the original table (using Excel’s Conditional Formatting tool) and colored them based on whether they are statistically significant (I also kept the */+ signs for readers).

In this version, I used circles instead of Ann’s squares. I only did so because in her post she noted how difficult it was to draw the circle shapes and get them to line up. I used Symbols in Excel here, which was really easy, but again it’s probably not something I would use because of the aforementioned reasons.

**Numbers as Bar Charts**

In these two tables, I created dot plots with numbers instead of dots or another marker (FiveThirtyEight has a nice example of this technique). This expands the width of the table, but I’m able to keep the symbols denoting statistical significance while also visualizing the relative magnitudes. I also used boldface text for the statistically significant results to help further highlight them.

In this version, I take the same approach and also fill the statistically significant cells. Because the cells are pretty wide (driven by the 0.567 value in the first column), I’m not sure I love the way this looks.

**Dot Plots**

Moving away from the tables, I played around with different dot plots. Of course, I could have used a standard column chart or some kind of box-and-whisker chart, but I think the dot plot might be the best choice (plus, the column chart approach has some issues). I also really like the approach Bogdan Micu took in his response to Ann’s post (also see Dana Wanzer’s post).

I tried four different dot plots, just playing around with different kinds of labeling:

– *Version 1* plots just the estimates with the error bars (that I made up based on the level of statistical significance).

– *Version 2* adds labels to that basic dot plot.

– In *Version 3*, I tried playing around with the labels. I didn’t really like how the labels looked in Version 2, so here I put them in the middle of the point. This required me to shrink the text size just a bit and blow up the size of the bubble. It’s not necessarily great because it hides some of the error bars, so you can’t quite tell which are statistically significant and which are not.

– Finally, in *Version 4*, I added color to the dot plot by filling in those markers that are statistically significant and leaving empty those estimates that are not statistically significant. This has the same issue with hiding the error bars as in *Version 3*, but the addition of the marker color helps denote which estimates are statistically significant and which are not. I also think this helps satisfy Ann’s “Squint Test.”
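For anyone who wants error bars that are consistent with a reported p-value rather than made up, one possible approach is to back out an implied standard error. This is a sketch under a normal approximation and a two-sided test, not the method used for the charts above; the function names and the example estimates are hypothetical.

```python
from statistics import NormalDist


def implied_interval(estimate: float, p_value: float, level: float = 0.95):
    """Confidence interval implied by an estimate and its two-sided p-value.

    Assumes the test statistic is estimate / standard error and is
    approximately standard normal.
    """
    if not 0 < p_value < 1:
        raise ValueError("p_value must be strictly between 0 and 1")
    z_observed = NormalDist().inv_cdf(1 - p_value / 2)
    std_error = abs(estimate) / z_observed
    z_critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_critical * std_error
    return estimate - half_width, estimate + half_width


def marker_fill(low: float, high: float) -> str:
    """Filled marker when the interval excludes zero, hollow otherwise."""
    return "hollow" if low <= 0 <= high else "filled"


# Hypothetical (estimate, p-value) pairs, for illustration only.
for estimate, p in [(0.50, 0.01), (0.50, 0.07), (-0.20, 0.03)]:
    low, high = implied_interval(estimate, p)
    print(f"{estimate:+.2f} (p = {p}): [{low:+.3f}, {high:+.3f}] {marker_fill(low, high)}")
```

The interval endpoints can feed the error bars of a dot plot, and `marker_fill` gives the filled versus empty choice described for Version 4.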

**Wrap-Up**

I’m basically left where I began: While I like Ann’s approach because it makes the table easier to read and adds visual elements, it ignores the different magnitudes of the point estimates, which are likely to be important for any sort of policy recommendation or action. I get the idea of making something very visual in the body of a report and leaving the details to the back or someplace else, but I think there are alternative visual approaches rather than getting rid of the numbers altogether.

Great article. I’m so glad that you’re keeping the conversation going.

Geez, it’s hard to write blog posts that anonymize the client’s data and still provide enough context for the reader.

I’ll add that:

– This was the first page of the Results chapter of the report. The real table was much longer (closer to 20 variables instead of my nine simplified ones for the blog post). Chapter 1 was a short intro (about the program, the countries they worked in, etc.). Chapter 2 was the results. So consider this Chapter 2, page 1.

– The rest of Chapter 2 (pages 2 through 21ish) were deeper dives into each of the 20ish variables. On those pages, we included both the significant and non-significant variables.

– We considered leaving out the non-significant variables, but felt it was important to show where the program wasn’t effective… especially for areas that we assumed would’ve been. Like if you provide meals to children in an orphanage then you would assume that their nutritional outcomes would be better off, but this wasn’t necessarily the case–lots of good discussions came from the report!

– The dot plots are really promising. I hope that you/we/others continue exploring them.

– Yep, we defined statistical significance (a paragraph in the Results chapter and a page in the Appendix). The Appendix had lots of methodological nerdy goodies. We didn’t actually expect the report’s audience to look at the appendix, but I’m a huge fan of including tons of details in appendices for the sake of transparency and accountability.

To be continued. Have to run and give a webinar now. Good job!

Thanks to both of you for this discussion. I’m currently dealing with three levels of statistical significance for a client. In one case, I used a heat map approach, with the lightest for p-value<0.1 and darkest for p-value<0.01. In another case, it's a connected scatterplot and I used the empty/filled markers approach of Ann's initial design and Jon's circles above. I might write a blog post when I'm done.

One suggestion I might make for your designs above is to also differentiate the variable name format when the result is significant across the board (bold, italic, background…) – it would spare the reader the work of figuring it out for each row.