There have been a lot of questions about the @HHSGov hospitalization data. Because of the political appointees at the department, many people looked askance at HHS producing this data. We can now present some new analysis showing that the data is...
good. https://covidtracking.com/blog/what-weve-learned-about-the-hhs-hospitalization-data
When we account for variations in state data definitions, and for the fact that CTP hospitalization data generally lags HHS data by one day, we see that the two datasets are nearly perfectly matched.
If HHS current hospitalization data is bad, then the state data is bad, too.
The nearly perfect matches hold across 40+ states, when what the states are reporting is properly understood.
In only two cases (Mississippi and New Mexico) do we see the state reporting more hospitalizations than HHS.
You can see for yourself the numbers move in concert.
This may be surprising to you. In the early days of this dataset, it had some serious problems. And the new reporting methods that were used to create it caused some problems, too. We covered this over the summer. https://covidtracking.com/blog/hospitalization-data-reported-by-the-hhs-vs-the-states-jumps-drops-and-other
But the data got better and better. The civil servants inside and outside HHS have provided more and better data (and metadata). And that's led to this. These datasets—with CTP state data idiosyncrasies properly accounted for—are within 2% almost all the time now.
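For anyone who wants to try this kind of check themselves, here is a minimal Python sketch of the comparison: pair each HHS day with the CTP figure from one day later, then count how often the two are within 2%. The function names, the dict-of-dates input format, and measuring the difference relative to the HHS figure are all assumptions for illustration. This is not CTP's actual analysis code. It also assumes both series already use the same definition (e.g., confirmed+suspected on both sides), which is the other adjustment that has to be made first.

```python
from datetime import date, timedelta


def align_with_lag(hhs, ctp, lag_days=1):
    """Pair each HHS value with the CTP value reported lag_days later.

    hhs and ctp are dicts mapping datetime.date -> current hospitalizations.
    Assumption (illustrative): CTP's figure for day d + lag matches HHS's day d.
    """
    pairs = {}
    for day, hhs_value in hhs.items():
        ctp_day = day + timedelta(days=lag_days)
        if ctp_day in ctp:
            pairs[day] = (hhs_value, ctp[ctp_day])
    return pairs


def percent_difference(hhs_value, ctp_value):
    """Difference of the CTP figure from the HHS figure, as a percent of HHS."""
    if hhs_value == 0:
        return 0.0 if ctp_value == 0 else float("inf")
    return 100.0 * (ctp_value - hhs_value) / hhs_value


def share_within_tolerance(hhs, ctp, tolerance_pct=2.0, lag_days=1):
    """Fraction of aligned days where |percent difference| <= tolerance_pct."""
    pairs = align_with_lag(hhs, ctp, lag_days)
    if not pairs:
        return 0.0
    close = sum(
        1
        for hhs_value, ctp_value in pairs.values()
        if abs(percent_difference(hhs_value, ctp_value)) <= tolerance_pct
    )
    return close / len(pairs)
```

Running it with lag_days=0 versus lag_days=1 on the same state is a quick way to see why the one-day shift matters before judging how well the two datasets agree.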
Recently, an enterprising article in Science got hold of a CDC analysis comparing the state (i.e., CTP) and HHS datasets. @cpiller is a great journalist, but I don't think his sources got this one right. https://www.sciencemag.org/news/2020/11/federal-system-tracking-hospital-beds-and-covid-19-patients-provides-questionable-data
The CDC analysis seems, to my eye, quite flawed. It does not take into account published notes on the variability of state data definitions and specific features of the HHS methods.
Just as an example, one of his sources in Alabama seemed to compare the state's data favorably to HHS data. But just look for yourself here. Alabama's data is good—exactly as good as HHS—but the state only reports "confirmed" hospitalizations, rather than confirmed+suspected.
I think the worries about this data gathering going into HHS were justified. There were all kinds of weird things going on at HHS when the change happened. But the fact that the data produced there is SO concordant with what states publish is, in fact, an achievement.
As someone who has devoted this year to compiling national statistics: the Federal government should do this! And they should do it transparently and well.
In the case of current hospitalization data, having worked with it for months, I think the HHS effort now clears that bar.
Let me also say that this analysis is the product of months of working with this data by @anthropoco @PeterJ_Walker @NotoriousRSG @betsyladyzhets @cat_pollack @jessicamalaty @kissane and of course the hundreds of volunteers who create the @COVID19Tracking dataset
Oh, and an important addendum from former US Deputy CTO and @COVID19Tracking board member @rypan: https://twitter.com/rypan/status/1335015768681979905