Recently, I’ve gotten some questions about news reports that the Centers for Disease Control and Prevention (CDC) and many states have been conflating two different types of COVID-19 tests in their reporting. The substance of both the news reports and the subsequent questions was that this conflation made the data unreliable and that the situation could be worse—with respect to the virus and the pandemic—than we had thought. Indeed, it is a reasonable concern. And given that I use various coronavirus data here on the blog, I thought it would be worth writing about what we know about the virus and how we know it.
The truth is that there are many issues with the data we have. To name a few: Which deaths should be attributed to COVID-19? Those due to the virus, or those where the virus was present but the cause of death was something else? And how do we tell? How do we account for deaths that might have been due to the virus but where no testing was done? Is there political bias in the reporting? And on and on.
When you look at these questions, it might be easy to conclude that the data we have is so flawed that it’s worthless. For some purposes, that is indeed the case. The news reports on the testing, for example, correctly pointed out that the conflation of two types of tests makes the data useless for epidemiologists. That is indeed a problem.
Are things getting better (or worse)?
Most of us, however, are not epidemiologists, and what we need is much simpler than what they need: an answer to the question, are things getting better or worse? Once we have that answer, we can then try to dig deeper. But that is the foundational question we need to answer.
To find that answer, we can look at several different metrics that vary in terms of timeliness and robustness. The number of deaths is the most robust data set, in that someone is either dead or not. But it suffers from timeliness, as there is a delay of weeks between catching the infection and dying. The deaths statistic also suffers from the question I posed earlier (i.e., can we define the cause of death?).
The number of new cases is a more timely indicator than deaths, as there is a shorter lag between infection and the onset of symptoms. This number does not account, however, for undiagnosed or asymptomatic cases, and it does not define how a new case is diagnosed. It is dependent on the extent and reliability of testing, as well as on the flexibility of the diagnostic criteria.
Any metric we look at will have weaknesses and shortfalls, of course. So, the answer is to look at all of them, try to triangulate the truth, and stay aware of the weaknesses in the conclusions drawn. This strategy is nothing new in either medicine or economics. But when the stakes are so high, we need to be even more conscious of the potential risks in our data.
With this context, we can consider how and whether we can use the CDC reports, which conflated different types of tests. First, let’s think about what we are losing from that report: the number of tests will be unreliable. Remember, however, that we are not looking for the number. We are looking for the trend: whether things are getting better or worse.
With the assumption that the mix of tests remains reasonably consistent over time (which does not seem unreasonable), we can indeed continue to draw a trend line. That trend line shows that testing continues to expand. We can cross-check it against the new cases number: as long as the diagnostic criteria remain the same, that trend should be reliable, and it continues down even as testing rises, which suggests that things are indeed getting better. We can further confirm that trend with the death numbers, which again tell us the same thing for the same reasons.
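To see why a consistent mix preserves the trend even when the raw count is unreliable, here is a small sketch. All numbers are hypothetical and purely illustrative: if one test type makes up a roughly fixed share of the total, the blended series rises and falls with the underlying series.

```python
# Illustrative sketch (hypothetical numbers): if the share of a second
# test type in the daily total stays roughly constant, the combined
# count trends in the same direction as the first test type alone.

viral = [100, 110, 120, 130, 140]  # hypothetical daily viral-test counts
mix_ratio = 0.25                    # assumed constant share of the second test type

# Back out the second series so it holds a fixed share of the total.
antibody = [round(v * mix_ratio / (1 - mix_ratio)) for v in viral]
combined = [v + a for v, a in zip(viral, antibody)]

def rising(series):
    """True if each value is at least the previous one."""
    return all(b >= a for a, b in zip(series, series[1:]))

print(rising(viral))     # True
print(rising(combined))  # True: the blended series preserves the trend
```

The point is not the specific numbers, which are unreliable by assumption, but the direction: as long as the mix does not shift much, the conflated total still answers the better-or-worse question.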
The big picture
It makes sense to ask questions about the data we use and to be very cautious about what we believe. If, however, we think through the weaknesses and concerns, understand the limitations, and cross-check as much as we can through different sources, we can still draw useful conclusions despite the weakness of any individual data series—which, in the case of the recent reports, is the testing data.
Am I worried things might get worse? Yes, because I always am, and that possibility is a real risk. But, at the moment, that is not what the data is telling us here in the U.S.—and we can rely on that. The news about the CDC and states misreporting some testing data does not change the big picture.