How not to do a #COVID19 seroprevalence study
New paper out today from CDC that is so problematic I want to cry.
(great material for @callin_bull)
Quick thread. https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2768834
Background
We've known since Feb(!) that only a fraction of SARS-CoV-2 infections cause illness & fewer cause severe illness. This led to huge uncertainty & confusion about fatality of #COVID19:
https://twitter.com/DiseaseEcology/status/1252844190070829056
& contributes to Qs about herd immunity: https://twitter.com/DiseaseEcology/status/1275595167936868352
Best way to measure all infections is a seroprevalence study, which estimates the fraction of people w/ antibodies (Ab) from past infection. Method isn't perfect; e.g. a small % won't mount Ab & titres will wane, but most (>90%) will still be detectable (https://twitter.com/DiseaseEcology/status/1283282941729103872).
The challenge is doing the study properly. If you want to estimate the fraction of a population (& subgroups w/in it) that have been exposed you need a few key things. Most important by far is a random sample of people. If you don't get this part of the study right, the data are poor.
(Note: I've been whining about this for months:
https://twitter.com/DiseaseEcology/status/1252473766476541952)
Note: Bias can go in either direction
You can get seroprev too high if you invite people to join a study (e.g. w/ a Facebook ad) & those w/ symptoms are more likely to join; also if you sample people who have high-risk jobs. https://twitter.com/DiseaseEcology/status/1251225273871134721
Bias can be huge: 3.5x+
Compare, e.g. seroprev of Hispanic & white people in NYC. If you didn't know you were sampling different pops you could get very diff answers.
https://www.sciencedirect.com/science/article/pii/S1047279720302015
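To see how much the mix of people in a sample can move the answer, here is a minimal sketch. All numbers are hypothetical (picked only so one group has 3.5x the seroprev of the other); they are not from the NYC paper, and `pooled_seroprevalence` is just an illustrative helper.

```python
def pooled_seroprevalence(prev_a, prev_b, share_a):
    """Seroprevalence observed in a sample where a fraction share_a
    comes from group A and the rest from group B."""
    return share_a * prev_a + (1 - share_a) * prev_b

# Hypothetical values: group A has 3.5x the seroprevalence of group B.
PREV_A, PREV_B = 0.35, 0.10
TRUE_SHARE_A = 0.30  # hypothetical share of group A in the real population

truth = pooled_seroprevalence(PREV_A, PREV_B, TRUE_SHARE_A)
for share in (0.0, 0.3, 0.6, 1.0):
    est = pooled_seroprevalence(PREV_A, PREV_B, share)
    print(f"sample share of A = {share:.0%}: estimate {est:.1%} (truth {truth:.1%})")
```

If you don't know the sample's mix, you can't tell which of these rows you are looking at, and the estimate can land anywhere between the two group values.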
You can get seroprev too low if you sample people that have been avoiding high risk activities: people who can work from home; who can pay for grocery delivery; or who can choose not to leave house.
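Both directions of bias can be illustrated with a toy simulation. This is a sketch under made-up assumptions: true seroprev of 10%, and join probabilities that depend on infection status. None of these numbers come from any real study, and `simulate_volunteer_study` is a hypothetical name.

```python
import random

def simulate_volunteer_study(n_pop=200_000, true_prev=0.10,
                             join_if_infected=0.03, join_if_not=0.01, seed=1):
    """Return the seroprevalence seen among people who end up in the study,
    when the chance of being sampled depends on infection status."""
    rng = random.Random(seed)
    joined = 0
    positives = 0
    for _ in range(n_pop):
        infected = rng.random() < true_prev
        p_join = join_if_infected if infected else join_if_not
        if rng.random() < p_join:
            joined += 1
            positives += infected  # True counts as 1
    return positives / joined

# Previously infected people 3x more likely to join (e.g. they had symptoms).
too_high = simulate_volunteer_study(join_if_infected=0.03, join_if_not=0.01)
# Previously infected people 3x less likely to be sampled (e.g. sample is mostly low-risk people).
too_low = simulate_volunteer_study(join_if_infected=0.01, join_if_not=0.03)
print(f"true 10.0% | over-sampled infected: {too_high:.1%} | under-sampled infected: {too_low:.1%}")
```

With these inputs the expected estimates are roughly 25% and 3.6%, both from a population where the truth is 10%. In a real study the join probabilities are unknown, which is why the bias can't be adjusted away afterwards.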
Unfortunately, MOST seroprevalence studies I've read have substantial bias & in most cases we can't assess amount of bias so we can't adjust study to get accurate picture. Examples: blood donors; hospital patients admitted for non-covid reasons; HCWs, grocery store customers, etc
So what about today's paper by CDC in JAMA (respected public health org, top medical journal)?
Amazingly bad.
Note: CDC previously wrote a book on how to collect a random sample from communities (https://www.cdc.gov/nceh/casper/overview.htm) so we know they know how to do it & how important it is.
But did they implement this procedure? No. Instead they looked in the freezer of clinical labs to see what was left over & decided the dregs were good enough to inform the public during a pandemic.
No effort at all was made to collect a randomized sample.
Really CDC?
Did they make a herculean effort to address potential biases by pulling dozens of variables on samples that might make up for sampling a biased pile of tubes left in the freezer? Nope. None at all. Only adjustments were for test characteristics, age, sex.
How much does this matter? A ton, it appears. A quick perusal of the results from the paper makes it clear that the data are essentially uninterpretable.
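For context on what a test-characteristics adjustment can and can't do, here is a sketch of the standard Rogan-Gladen correction. This is an assumption about the general kind of adjustment, not necessarily the exact method used in the paper, and the sensitivity/specificity values below are hypothetical.

```python
def rogan_gladen(raw_positive_fraction, sensitivity, specificity):
    """Correct an observed positive fraction for imperfect test accuracy.
    Result is clipped to [0, 1]."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (raw_positive_fraction + specificity - 1) / denom
    return min(1.0, max(0.0, adjusted))

# Hypothetical inputs: 5% of tubes test positive, 96% sensitivity, 99% specificity.
adjusted = rogan_gladen(0.05, sensitivity=0.96, specificity=0.99)
print(f"raw 5.0% -> adjusted {adjusted:.1%}")
```

Note what goes in: only the fraction of sampled tubes that tested positive. The correction fixes errors made by the assay; it knows nothing about whose blood ended up in the freezer, so it can't repair a non-random sample.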
How about some data on a topic no one cares about right now: safety of school re-openings. (clarity note: sarcasm)
Study shows age breakdown for 10 locations. Some might initially get excited & see data to support their arguments about children being less susceptible, but...
Look carefully! There are patterns here for everyone!
In LA, NYC, PA serop in 19-49yr is 3x that of kids (0-18). 6x in CT!
But in MN, kids 0-18 have HIGHEST serop by a ton! Also FL 0-18 2.5x 19-49. SF 1.6x higher.
Am I cherry picking? You bet!
Aren't they all good data b/c CDC, JAMA?
Nope! Besides obvious & apparent issues (small N for many age groups/sites so huge CIs), there are unmeasurable biases.
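To put the small-N point in numbers, here is a sketch using the Wilson score interval for a proportion. The counts are hypothetical (not taken from the paper's tables); both examples have the same 5% point estimate.

```python
import math

def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = positives / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical counts: same 5% positive, very different sample sizes.
for positives, n in ((3, 60), (150, 3000)):
    lo, hi = wilson_interval(positives, n)
    print(f"{positives}/{n}: {positives / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With a few dozen samples in an age group, a 3x difference between groups can sit comfortably inside the intervals. And the interval only covers sampling noise; it says nothing about the unmeasurable biases.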
I know of no reasonable biological reason that serop would be much higher in kids in 3 places but lower in 4. But I can imagine many BS reasons!
There's only one way to safely use these data: toss them in the circular filing bin. (for clarity: trash)
Alternatives:
If the patterns fit your pre-conceived ideas, they are reasonably robust.
If they contradict it, they are problematic.
Why collect bad data in the 1st place?
Answer: Because it is easy.
Instead of designing a proper study w/ a randomized sample (see e.g. https://twitter.com/DiseaseEcology/status/1280303269269532672), they literally just took blood samples they already had lying around.
Also, one bonus gift for you...
Winner for most misleading graph:
Did you know that over time fewer people have antibodies? Waning Ab! (joke)
That's what this graph seems to show. @callin_bull this one is for you!
There are so many ways this could have been presented in an informative way. This isn't one of them.
Are the authors aware of the limitations of this study? You bet they are! They lay it out pretty clearly in the Discussion. But if they have to write this in the Discussion, doesn't it make you wonder why they did the study in the 1st place?
The only thing worse than this type of study is doing it again, and guess what? That's what they're doing. It's a recurring study! 2nd round has already been done & is ready for mis-interpreting (have at it!): https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/commercial-labs-interactive-serology-dashboard.html
Dear CDC,
Please read the book you wrote on randomized sampling & conduct a study that would actually be useful in helping us understand this pandemic.
Sincerely,
me