It almost goes without saying, but tables are a way to visualize data.
Tables of regression results are a common type of table researchers create. But they can be frustrating to create well—point estimates, standard errors, asterisks, variable names, column headers, and notes means there are a lot of potential objects to corral. The “outreg2” command in Stata, for example, enables you to output your regression results to Excel files, and will get you pretty close; and I’ve been told (but haven’t tried) the broom package in R. In any case, there are likely times when your table results don’t look great because you’re moving from one tool to another.
Take this table, for example. It’s just a random example you could likely find in any academic journal article or working paper. Start with the first column of numbers and notice how the decimals are not lined up for many of them— -0.8094, for example, is shifted slightly to the right. It’s even a little worse when there are asterisks denoting statistical significance. Look at the third column—the 0.0016*** at the top and 0.00054*** about halfway down are not lined up with the rest of the numbers.
Aligning numbers along the decimal/comma makes it easier for your reader to find large and small values. Overall, it also seems to make tables easier to navigate. In which table is it easier to find the largest and smallest values?
For creating regression tables in Excel, what’s the best way to align all of the various table elements together? One challenge is that if you have point estimates with asterisks denoting statistical significance, Excel treats those values as Text and not as Numbers. Thus, the standard formatting approach where you would simply format the cells as numbers is not going to work. I think I’ve come up with a workaround using Custom Number Formats that will enable you to format tables of regression results using formulas so that you can easily apply them to all of your tables.
If you have a different method, please share it in the comments section below—I’m curious to see what approaches others are taking!
A Quick Primer on Custom Number Formats
Custom number formats allow you to control how numbers are displayed in Excel. The primary benefit of custom number formats is that the formats stay the same while the numbers can change. I won’t go into a full treatment of how they work here but if you’re interested I highly recommend Dave Bruns’ blog post over at ExcelJet.
The basic structure of custom number formats may look complex, but they have a specific syntax. Semicolons are used to parse the formats into four sections:
- Positive values
- Negative values
- Zero values
- Text values
You don’t need to specify all four values, and formats can handle different colors, special symbols, and other formatting.
Using Number Formats
Let’s say you want to format two numbers—5,000 and -5,000—and you want to align them following three rules:
- they should both have a comma after the 5,
- the negative number is placed within parentheses, and
- both numbers are aligned along the comma.
In other words, you want “5,000” and “(5,000)” aligned along the comma.
We can use Excel’s built-in Custom Format menu to apply this preferred format. Select the two cells, right-click, and select Format Cells. At the bottom of that menu is the Custom area, in which we can select this default built-in format: #,##0_);(#,##0). This format is doing something pretty simple:
- Positive numbers (the first argument) are formatted with a comma and any trailing zeros; the “_)” term adds a space the width of a closed parentheses after the number. (They underscore says ignore the next object and insert a space the width of that object, here a closed-parentheses.)
- Negative numbers (the second argument) are similarly formatted with a comma and a trailing zero and the number is surrounded by parentheses.
Another way to apply these formats is to use Excel’s TEXT function. The TEXT function has two arguments:
Value: the number to be formatted
Format_text: the number format to be applied
In this case, we put the number formats in their own cells and then refer to them in the TEXT formula: =TEXT(A1,B1). As you can see in cells C1 and C2, this gives you the same result as before, but you don’t need to click on anything.
Formatting a Regression Table
Now that we (kind of) understand Excel’s custom number formatting menu and how to use the TEXT function, we can put them together to format a table of regression coefficients.
The first step is to move the numbers from the original table (above) into Excel. A simple copy and paste yields a table that is even more unaligned than the original.
One potentially easy way to resolve the formatting issue is to put the asterisks in their own columns. If we do that and merge the column headers across the pairs of columns, the digits look lined up, but the asterisks look a little far from the numbers. This isn’t the worst table in the world and at least the numbers are aligned along the commas. This has been the approach I’ve been using for a while.
Now let’s try using the TEXT function to create a table in which the asterisks are close to the numbers and everything is nicely aligned.
Using the same approach as before, I’m going to first determine how many spaces I need after the number so that the numbers are aligned along the decimal. That determination is made based on the statistical significance (one, two, or three asterisks). To help make that assessment, I use the table above in which the numbers are in one cell and the asterisks are by themselves in a neighboring cell.
You can see in this little lookup table, I list the various formats I’m going to use in the main table. If the number is statistically significant, I’ll include asterisks in the format; if not, I’ll include some number of spaces after the number to get the spacing right.
I can now embed the TEXT formula within an IF formula to reference each format. In the second estimate in Model 1 [“Percent Black” in cell AL4: 0.2628**], for example, the formula is:
All this says is if the point estimate has one star, place the point estimate in cell J3 (the original number) and use the number format in cell AL18; if it has two stars, use the point estimate in cell J3 and use the number format in cell AL17; and so on. As you can see, the final format is a number with a leading zero, four decimals, and two asterisks.
The last part is to format the column titles. At the top of the table, you can see the differences between aligning the numbers to the left, right, or center. Personally, I’m partial to customizing the placement of the title—as long as I’m doing the numbers, I might as well do the text too. I again use the TEXT formula to apply the format specified at the bottom of the lookup table.
I haven’t provided you with a full treatment of custom number formats in Excel here (again, see Dave Bruns’ post if you’re interested in learning more), but you can hopefully see how you could use these formats to your advantage to help style your tables, graphs, and more.