Thread by @matt_zefferman

Ok. Here's a simple model of this I made last night while waiting to see what happened with the GA and PA vote totals. Can high COVID infection rates in Trump-voting counties be explained by sampling variability? https://apnews.com/article/counties-worst-virus-surges-voted-trump-d671a483534024b5486715da6edb6ebf https://twitter.com/matt_zefferman/status/1324495455116406784

I should say that I am not an epidemiologist, so consider this a thought experiment by a non-expert.

I downloaded the 2019 Census Bureau population estimates for each US county. https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html

I could not find a dataset of 2020 presidential winners by county (and the AP did not cite its data source), so I used the 2016 winners as a proxy. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ

I combined these into an R dataframe. I dropped Alaska and Hawaii because the two datasets code their county equivalents differently, so this model covers CONUS only.

There are around 3,000 counties or county equivalents in the dataset. Their populations look roughly power-law distributed, with very many small counties and a few big ones:

Same thing on a log scale.

Most counties (about 84%) voted Republican for president in 2016, and the larger counties disproportionately voted Democratic.

In the first-cut model, each county has a number of "outbreaks" that scales with population size. Specifically, the number of outbreaks is drawn at random from a binomial distribution with n = the county's population and p = 1/10,000.
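The outbreak-count step can be sketched in a few lines. This is a Python/NumPy translation under stated assumptions, not the author's R code. The names `draw_outbreak_counts` and `OUTBREAK_P` are hypothetical, and the example populations are made up rather than taken from the census file.

```python
import numpy as np

# From the thread: each person independently seeds an outbreak with p = 1/10,000.
OUTBREAK_P = 1 / 10_000

def draw_outbreak_counts(populations, p=OUTBREAK_P, rng=None):
    """Number of outbreaks per county, drawn from Binomial(n=population, p)."""
    rng = np.random.default_rng() if rng is None else rng
    populations = np.asarray(populations, dtype=np.int64)
    # NumPy broadcasts the scalar p over the array of county populations.
    return rng.binomial(populations, p)

# Hypothetical county populations for illustration (not the census data).
example_pops = np.array([1_000, 25_000, 1_000_000])
print(draw_outbreak_counts(example_pops, rng=np.random.default_rng(42)))
```

The expected counts are n·p, which gives 0.1, 2.5, and 100 outbreaks for these three populations. The smallest county will usually draw zero outbreaks.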

https://en.wikipedia.org/wiki/Binomial_distribution

Outbreaks vary in size. I followed the simplest (non-superspreader) model in this paper: https://www.researchgate.net/publication/7476817_Superspreading_and_the_Effect_of_Individual_Variation_on_Disease_Emergence Each outbreak starts with one infected person, and each infected person transmits to a number of people drawn from a Poisson distribution with lambda = 0.8.
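The outbreak rule above is a Galton-Watson branching process with Poisson offspring. Here is a minimal Python sketch of one outbreak. The function name `outbreak_size` is hypothetical, and the explicit check that lambda < 1 is an added assumption so that the loop is guaranteed to end.

```python
import numpy as np

def outbreak_size(lam=0.8, rng=None):
    """Total number infected in one outbreak seeded by a single case.

    Each infected person infects a Poisson(lam) number of new people.
    For lam < 1 the process dies out with probability 1, so the loop ends.
    """
    if lam >= 1:
        raise ValueError("this sketch assumes a subcritical process (lam < 1)")
    rng = np.random.default_rng() if rng is None else rng
    total = 1    # the index case
    current = 1  # people infected in the current generation
    while current > 0:
        # The sum of `current` independent Poisson(lam) draws is Poisson(lam * current).
        current = int(rng.poisson(lam * current))
        total += current
    return total

rng = np.random.default_rng(0)
print([outbreak_size(0.8, rng) for _ in range(10)])
```

Drawing each generation as one Poisson(lam × current) variate gives the same result as drawing a separate count for each infected person, and it avoids an inner loop.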

Lambda is the *average* number of people infected by each infected person. Individuals vary around that average: some infect more than lambda people and some infect fewer. Because lambda is less than 1, every outbreak eventually dies out. https://en.wikipedia.org/wiki/Poisson_distribution
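For reference, standard branching-process results (not calculated in the thread) give the expected outbreak size directly. Start an outbreak from one case and let $Z_g$ be the number of cases in generation $g$, so $Z_0 = 1$ and $\mathbb{E}[Z_g] = \lambda^g$. The total size $S$ then satisfies:

$$
\mathbb{E}[S] = \sum_{g=0}^{\infty} \lambda^{g} = \frac{1}{1-\lambda}, \qquad \lambda < 1 .
$$

With Poisson offspring, $S$ follows the Borel distribution:

$$
P(S = n) = \frac{e^{-\lambda n}(\lambda n)^{n-1}}{n!}, \quad n = 1, 2, \dots, \qquad \operatorname{Var}(S) = \frac{\lambda}{(1-\lambda)^{3}} .
$$

For λ = 0.8, 0.9, and 0.98, the mean outbreak size is 5, 10, and 50. The variance is 100, 900, and 122,500, so the tail becomes much heavier as λ approaches 1.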

I took the value of 0.8 from this post by Chris Moore at @sfiscience, but as we will see, it doesn't matter much to the result. https://www.santafe.edu/news-center/news/transmission-t-024-cristopher-moore-on-the-heavy-tail-of-outbreaks

So each county has some number of outbreaks of different sizes. The fraction infected in a county is just the sum of its outbreak sizes divided by its population. Most counties end up with a small fraction infected, and a few have a high fraction.
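The per-county fraction can be sketched in vectorized form. This is a hypothetical `fraction_infected` helper, not the author's code. It grows all of a county's outbreaks together, which relies on a standard fact: k independent single-seeded Poisson branching processes have the same total-size distribution as one process seeded with k cases. Nothing caps infections at the population, so this model can return fractions above 1 when λ is close to 1.

```python
import numpy as np

def fraction_infected(populations, p=1e-4, lam=0.8, rng=None):
    """Fraction infected per county: total size of its outbreaks / population."""
    if lam >= 1:
        raise ValueError("this sketch assumes a subcritical process (lam < 1)")
    rng = np.random.default_rng() if rng is None else rng
    populations = np.asarray(populations, dtype=np.int64)
    current = rng.binomial(populations, p)  # index cases = outbreaks per county
    total = current.copy()
    while current.any():
        # Next generation in every county at once: Poisson(lam * current cases).
        current = rng.poisson(lam * current)
        total += current
    # No susceptible depletion, so this ratio is not bounded by 1.
    return total / populations

print(fraction_infected([2_000, 50_000, 1_000_000], rng=np.random.default_rng(7)))
```

The expected fraction is p/(1 − λ) = 5 × 10⁻⁴ for every county regardless of size. The small county's value, however, swings far more from run to run.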

Here it is on a log scale.

Finally, we can plot population size against fraction infected for each county. Red counties voted Republican and blue counties voted Democratic. Note that this is a log-log plot. The horizontal lines mark off the top 10% and bottom 10% of counties by fraction infected.

As you might expect, most counties are red, and large counties are disproportionately blue. The AP looked at Republican support among the top 10% of counties by infection (those above the top line).
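Here is a sketch of the tail comparison as a hypothetical helper, `gop_share_in_tails`. The thread doesn't say how ties at the cut lines were handled. This version includes ties in the tails, which is an assumption. It matters because many small counties draw zero outbreaks and all tie at a fraction of 0.

```python
import numpy as np

def gop_share_in_tails(frac_infected, voted_gop, q=0.10):
    """GOP share among all counties and among the top/bottom q by fraction infected."""
    frac_infected = np.asarray(frac_infected, dtype=float)
    voted_gop = np.asarray(voted_gop, dtype=bool)
    hi_cut = np.quantile(frac_infected, 1 - q)  # upper horizontal line
    lo_cut = np.quantile(frac_infected, q)      # lower horizontal line
    # Ties at a cut (e.g. many zero-infection counties) can put more than
    # a fraction q of counties into a tail.
    top = frac_infected >= hi_cut
    bottom = frac_infected <= lo_cut
    return {
        "overall": float(voted_gop.mean()),
        "top": float(voted_gop[top].mean()),
        "bottom": float(voted_gop[bottom].mean()),
    }

# Toy input for illustration only.
frac = np.arange(10) / 10
gop = [True, False, True, False, False, False, False, False, False, False]
print(gop_share_in_tails(frac, gop))
```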

94% of the highly infected counties are Republican, which is close to the AP's figure of 93% and much higher than the 84% of all counties that voted Republican. This is just because small counties have larger variance in both the number of outbreaks per capita and the average outbreak size.
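Some standard compound-distribution algebra shows why small counties dominate both tails. This is a back-of-the-envelope check, not a calculation from the thread. For a county of population $N$:

- let $K \sim \mathrm{Binomial}(N, p)$ be the number of outbreaks;
- let each outbreak size be independent with mean $\mu = 1/(1-\lambda)$ and variance $\sigma^2 = \lambda/(1-\lambda)^3$;
- let $F$ be the total infected divided by $N$.

By the law of total variance:

$$
\mathbb{E}[F] = p\,\mu, \qquad
\operatorname{Var}(F) = \frac{\mathbb{E}[K]\,\sigma^{2} + \operatorname{Var}(K)\,\mu^{2}}{N^{2}}
= \frac{p\left(\sigma^{2} + (1-p)\,\mu^{2}\right)}{N}.
$$

The mean fraction does not depend on $N$, but the variance shrinks like $1/N$. With $\lambda = 0.8$ and $p = 10^{-4}$ ($\mu = 5$, $\sigma^2 = 100$), $\operatorname{SD}(F) \approx 0.112/\sqrt{N}$. Against a mean of $5\times10^{-4}$, that is about 7 times the mean for $N = 1{,}000$ and about a fifth of the mean for $N = 1{,}000{,}000$.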

The same is true for the counties with the *lowest* fraction infected (below the bottom line in the plot). In this example, 89% of these counties voted Republican, vs. 84% overall.

This result is consistent across runs. Here are the results of 100 runs of the simulation. The highest- and lowest-infected counties are disproportionately GOP-voting, and the red line marks the overall fraction of GOP counties. The high share among the most infected counties is consistent with the AP study.
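Here is a hypothetical driver that repeats the simulation and collects the GOP share in each tail per run. It is not the author's code. The toy county populations and party labels are invented for illustration and are not the census or election data, so the printed shares say nothing about the real result.

```python
import numpy as np

def simulate_fraction_infected(pops, p, lam, rng):
    # Outbreaks per county, then all of a county's outbreaks grown together as
    # one Poisson branching process seeded with that many cases (lam < 1).
    current = rng.binomial(pops, p)
    total = current.copy()
    while current.any():
        current = rng.poisson(lam * current)
        total += current
    return total / pops

def tail_gop_shares(pops, voted_gop, n_runs=100, p=1e-4, lam=0.8, q=0.10, seed=None):
    """GOP share in the top and bottom q of counties by fraction infected, per run."""
    if lam >= 1:
        raise ValueError("this sketch assumes a subcritical process (lam < 1)")
    rng = np.random.default_rng(seed)
    pops = np.asarray(pops, dtype=np.int64)
    voted_gop = np.asarray(voted_gop, dtype=bool)
    top, bottom = [], []
    for _ in range(n_runs):
        frac = simulate_fraction_infected(pops, p, lam, rng)
        hi_cut, lo_cut = np.quantile(frac, [1 - q, q])
        top.append(voted_gop[frac >= hi_cut].mean())     # ties go into the tail
        bottom.append(voted_gop[frac <= lo_cut].mean())
    return np.array(top), np.array(bottom)

# Hypothetical toy counties: 150 small GOP-voting, 50 large Democratic-voting.
pops = np.array([2_000] * 150 + [500_000] * 50)
gop = np.array([True] * 150 + [False] * 50)
top, bottom = tail_gop_shares(pops, gop, n_runs=100, seed=0)
print(f"overall GOP share: {gop.mean():.2f}")
print(f"top 10% GOP share, mean over runs: {top.mean():.2f}")
print(f"bottom 10% GOP share, mean over runs: {bottom.mean():.2f}")
```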

The parameters don't matter much. Here it is with a reproduction rate of 0.9 instead of 0.8.

With a reproduction rate of 0.98, though, outbreaks get large enough to overwhelm more of the larger counties, which weakens the effect.

I think this is the point @juemos made to me yesterday about how you can't treat an infectious disease the same way you would a non-infectious disease like cancer.

However, at this infection rate something like an SIR model would be way more appropriate, since you have to worry about frequency dependence, etc. (For example, some of the counties in the above plot have more infected people than residents.)
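Here is a minimal deterministic SIR sketch illustrating frequency dependence. The thread gives no SIR setup, so the function `sir_epidemic`, the recovery rate `gamma`, the step size `dt`, and the forward-Euler scheme are all assumptions. New infections are proportional to S·I/N, so they slow as susceptibles run out, and the cumulative number infected cannot exceed the population.

```python
def sir_epidemic(n, i0, r0, gamma=0.1, dt=0.5, t_max=20_000.0, tol=1e-6):
    """Deterministic SIR with frequency-dependent transmission (forward Euler).

    Returns the final (S, I, R). Stops when I drops below `tol` or at `t_max`.
    """
    beta = r0 * gamma  # transmission rate chosen so that beta / gamma = R0
    s, i, r = float(n - i0), float(i0), 0.0
    t = 0.0
    while t < t_max and i > tol:
        new_inf = beta * s * i / n * dt  # frequency dependent: scales with S * I / N
        new_rec = gamma * i * dt
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        t += dt
    return s, i, r

for r0 in (0.8, 0.98, 2.5):
    s, i, r = sir_epidemic(n=10_000, i0=10, r0=r0)
    print(f"R0={r0}: ever infected about {10_000 - s:.0f} of 10,000")
```

With these step sizes (beta·dt and gamma·dt well below 1), S and I stay non-negative and S + I + R stays at N. The branching-process model has no such bound.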

I don't think the fraction of infected people in any county in the real world approaches one.

Keeping the original infection rate of 0.8 and increasing the outbreak rate from 1/10,000 to 1/1,000 does not change things much.

Finally, @juemos suggested that the outbreak rate could be higher in larger counties, because people in larger counties may travel more than those in small counties.
