
This figure has been published in a book chapter on the epidemiology of autism spectrum disorders (ASD), in which we discuss methodological factors impacting the estimation of prevalence and the interpretation of changes in prevalence estimates over time. None of the co-authors particularly loved this figure, but at the time, we could not figure out a better way to illustrate the concept.
The main point of the figure (using simplistic hypothetical numbers) is to highlight a methodological confound in studies that use what are called “referral statistics”, that is, studies that estimate prevalence of ASD from the number of individuals with ASD who are referred for specialist services or listed on special education registers. Studies that rely solely on referral statistics to estimate prevalence of ASD can show large increases in prevalence of ASD between two time points, but this increase may not be due to an actual increase in the number of individuals affected by ASD. Patterns in referral statistics for ASD are necessarily confounded by referral patterns, availability of services, heightened public awareness, decreasing age at diagnosis, and changes over time in diagnostic concepts and practices.
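The arithmetic behind this confound can be sketched in a few lines. This is only an illustration in Python (it could be ported to R), and every number in it is a made-up assumption, not data from the chapter: a true count of 100 cases per 10,000 that stays constant, with 20% of cases accessing services at the first time point and 60% at the second.

```python
# Hypothetical illustration: the true number of cases never changes,
# but a referral-based study only sees cases that access services.

def referral_count(true_cases, percent_accessing_services):
    """Cases visible to a study that counts only service users."""
    return true_cases * percent_accessing_services // 100

TRUE_CASES_PER_10K = 100   # assumed, identical at both time points
PERCENT_TIME_1 = 20        # assumed share accessing services at time 1
PERCENT_TIME_2 = 60        # assumed share accessing services at time 2

estimate_time_1 = referral_count(TRUE_CASES_PER_10K, PERCENT_TIME_1)
estimate_time_2 = referral_count(TRUE_CASES_PER_10K, PERCENT_TIME_2)
apparent_fold_increase = estimate_time_2 / estimate_time_1

print("Referral-based estimate, time 1:", estimate_time_1, "per 10,000")
print("Referral-based estimate, time 2:", estimate_time_2, "per 10,000")
print("Apparent fold increase:", apparent_fold_increase)
print("True fold increase:", TRUE_CASES_PER_10K / TRUE_CASES_PER_10K)
```

Under these assumptions the referral-based estimate triples while the true number of cases does not move at all, which is the whole confound in miniature.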
I don’t have any immediate plans for using this figure again, but would love some ideas for illustrating this point visually. Solutions in R would be most helpful, but any ideas are welcome.
My first impression is that the pie chart is redundant, but I’m not even sure you need a visual at all to make the point. The text seems pretty clear without the hypothetical data. As presented, I as a reader am suspicious of how hypothetical the data is (though likely it’s clearer in context). I would want some assurance that it’s typical and representative of real data.
Assuming you had some real or realistically simulated data, one view that might add support is to show all the typical estimates, not just two time points. Now you can make the point that picking any two values in the first set is safe but picking two values in the second set is dangerous. Maybe even color two points in each set if you want to keep your two-time example.
I’ve centered both estimates on the true value, figuring any estimate would be calibrated to reduce overall bias. However, this “univariate scatterplot” works whether that’s the case or not. All the right-hand dots would just be below 100.
Hi Xan,
Thanks for your thoughts. I understand your skepticism about hypothetical data, but it must be here for 2 reasons: (1) the true prevalence of ASD is always an unknown number, we can only estimate it, with some estimates being better than others; and (2) relatedly, the proportion of individuals with ASD accessing services is also unknown since we don’t know how many are not accessing services. At the risk of quoting Donald Rumsfeld, these are “unknowable unknowns.” However, this figure is not meant to display real data, but to encourage researchers and doctors to interpret time trends in prevalence estimates of ASD with caution, as they are *often* misinterpreted (i.e., the autism “epidemic”). I like your idea, but I think it misses the most important element, which is showing the trend over time. It also suggests that one method of estimation at a single point in time is associated with less variable prevalence estimates, which may not be the case (and not the point we are trying to highlight here). But I really appreciate your response- it certainly got me thinking about better ways to explain this in the text as well!
Sincerely,
Alison
Hi Alison,
Here are two versions with time. One with two time values but still showing all expected values. The other shows time as continuous with the kind of trending described. I’ve highlighted two time points using each method to hopefully make your point about the danger of relying on just two data points with the service access estimate. The point pairs could be connected by lines to further emphasize the implied trend.
I also thought it was important to show there is also some random variation in the population survey, too (but presumably less).
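For anyone who wants to reproduce this kind of picture, here is a minimal sketch of how such hypothetical data might be simulated. It is not the code behind the charts in this thread. It is written in Python rather than R, and every parameter is an assumption chosen for illustration: a constant true value of 100, a share of cases accessing services that rises linearly from 20% to 60%, small noise in the population survey, slightly larger noise in the service-access estimate, and arbitrary year labels.

```python
import random

def simulate_estimates(years, true_cases=100, share_start=0.2, share_end=0.6,
                       survey_sd=2.0, service_sd=3.0, seed=1):
    """Return (year, survey_estimate, service_estimate) rows.

    All defaults are hypothetical. The true value is constant; only the
    share of cases visible to services changes over time.
    """
    rng = random.Random(seed)
    rows = []
    for i, year in enumerate(years):
        progress = i / (len(years) - 1)
        share = share_start + progress * (share_end - share_start)
        survey = true_cases + rng.gauss(0, survey_sd)
        service = true_cases * share + rng.gauss(0, service_sd)
        rows.append((year, survey, service))
    return rows

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

years = list(range(2000, 2011))  # hypothetical year labels
rows = simulate_estimates(years)
survey_slope = slope(years, [row[1] for row in rows])
service_slope = slope(years, [row[2] for row in rows])

for year, survey, service in rows:
    print(f"{year}  survey={survey:6.1f}  service={service:6.1f}")
print(f"survey trend per year:  {survey_slope:.2f}")
print(f"service trend per year: {service_slope:.2f}")
```

The rows can be passed to any plotting tool. The population survey stays roughly flat around the true value, while the service-access estimate climbs steadily even though nothing about the underlying population has changed.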
At first I had the same thought as Xan: I get it–does this really need a chart? But then, my wife is a grade school teacher and we talk about this phenomenon once in a while.
I see two things to improve on the original:
1. The fact that there are two charts makes this harder to figure out than needed.
2. The trend line clearly slopes upward but doesn’t make you compare the start and end points.
Here’s one approach that addresses both. It needs more labeling, etc., but you get the idea.
J
Hi Jeff,
This is great- thank you! I recreated your version with bars, and also played around with a version with dots (inspired by Xan’s post). Unfortunately, I cannot seem to attach them here as I keep getting an error message. Just for context, the book this was published in is intended for people learning about the epidemiology of autism, so we introduce several concepts that we hope will encourage some healthy critical thinking (using hypothetical data) for interpreting time trends in prevalence estimates (i.e., not all increases are evidence for the “epidemic” hypothesis).
Thanks again,
Alison
Hi, Alison. Glad it was helpful. That makes sense. My wife has been paying attention to ASD for some time, so she’s been aware of increasing diagnosis rates, and her school just got an autism program a year ago. Now, even though she understands the phenomenon you’re describing, she has to actively fight her impression that the numbers have exploded.