Think about the last time you filled out a hand-written survey. Not an online form where you quickly click the Yes/No or Agree/Disagree radio buttons and then ‘Submit,’ but a true, in-person survey where you write your answers on a piece of paper. Maybe it was a petition outside your grocery store or an intake form at your doctor’s office.

Now think about what your handwriting looked like on that form. Was it curly and light? Did you press hard and use block letters? All caps? What kind of pen did you use? Was it yours or was it given to you? Did your writing cross lines on the page or extend into the margins?

I’ve been thinking about these attributes of data collection lately. They are not the data we intended to collect, but they are data nonetheless. In a recent conversation with my friend Stefanie Posavec, she called them “lost data.”

“There’s data that’s collected by a study; then there’s data that isn’t collected….so there’s all this data that’s actually being lost. Because it’s not being collected….it’s like additional information about these people. But we’ve chosen not to collect it.”

-Stefanie Posavec

Last May, Christine Emba of the Washington Post asked attendees of the National Gun Show in Chantilly, Virginia, to write down what gun ownership meant to them. Most replied with the word “protection.” But, for me, what’s impactful about Emba’s story are the respondents’ handwritten answers to that prompt. Respondents didn’t blindly type something into a digital form or check a box on a survey; they took time to provide that information. And the different handwriting styles enable me to imagine who each person is, along with their experience and identity.

Animated gif of different handwritten notes of why people like guns from the Washington Post

Over the last three years or so, I’ve written and edited six issues of the Urban Institute’s Do No Harm project. Those guides challenge those of us who work with data to think about taking an equitable and inclusive approach to our work. One of the guides’ core tenets is to listen to people so that we can better understand their perspectives and experiences. In the first report, we argued that:

“Quantitative researchers and analysts especially should consider how best to incorporate qualitative methods when conducting research that engages lived experiences. Long-form surveys, interviews, and focus groups can provide an important opportunity for community members to share their experiences and lift up their voices. Such questions can also help answer “why” questions and surface themes in the responses.”

Do No Harm Guide: Applying Equity Awareness in Data Visualization

In doing this qualitative work, I wonder whether we can do a better job of capturing this “lost data.”

If we separate the task of data visualization from the rest of the data workflow (e.g., collecting and analyzing), lost data overlaps with Giorgia Lupi’s Data Humanism concept, in which “Data, if properly contextualized, can be an incredibly powerful tool to write more meaningful and intimate narratives.” But contextualizing only the data we collect leaves out the data we miss. This is the challenge of lost data.

Sometimes the most meaningful data are not necessarily what is available to us. So often when working with data we tend to focus only on the numbers that are already there without realizing that they can actually become much more meaningful if we’re able to uncover a more nuanced and expressive type of data along them.

-Giorgia Lupi, University of Michigan

All of this being said, recording “lost data” doesn’t come without concerns. How would we address the privacy considerations with data that are not necessarily part of the “official” data collection process? How might our biases and preconceptions influence how we collect lost data? So while I think we should consider expanding our data collection efforts, we must first ask whether those data should be collected at all.

As researchers and data visualization practitioners, how can we explore these “lost data”? Can we collect such data? And, if so, can we communicate them in a way that allows us to learn more about people and communities? Perhaps we should pay closer attention to the “data behind the data”: the attitudes, handwriting, and emotions people bring as they answer our surveys and take part in our data collection.

I’ll be thinking more about lost data in the coming weeks as I work on another Do No Harm project, but I’m eager to hear what you think, so reach out and let me know.

Thanks to Alice Feng and Wesley Jenkins for reading early drafts of this post.