OK. Here's a simple model I made last night while waiting to see what happened with the GA and PA vote totals. Can high COVID infection rates in Trump-voting counties be explained by sampling variability? https://apnews.com/article/counties-worst-virus-surges-voted-trump-d671a483534024b5486715da6edb6ebf https://twitter.com/matt_zefferman/status/1324495455116406784
I should say that I am not an epidemiologist, so consider this a thought experiment by a non-expert.
I downloaded the 2019 census population estimates for each US county. https://www.census.gov/data/datasets/time-series/demo/popest/2010s-counties-total.html
I could not find a dataset of 2020 presidential vote winners by county (and the AP did not cite their data source), so I used 2016 vote winners as a proxy. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
I combined these into an R dataframe. However, I dropped Alaska and Hawaii since the databases coded their county-equivalents differently, so this model is CONUS only.
There are around 3000 counties or equivalents in the dataset. Their populations look power-law-like, with many, many small counties and a few big ones:
About 84% of counties voted Republican for president in 2016, and the larger counties disproportionately voted Democratic.
In the first-cut model, each county has a number of "outbreaks" that scales with population size. Specifically, the number of outbreaks is drawn randomly from a binomial distribution with n = the county's population size and p = 1/10000.
https://en.wikipedia.org/wiki/Binomial_distribution
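As a rough sketch of that outbreak-count step (in Python with only the standard library; the original R code is not shown, so the function name `sample_outbreak_count` and the sampling method are assumptions for illustration): rather than flipping one coin per resident, the draw below jumps straight from one "success" to the next using geometric gaps, which gives an exact binomial sample in roughly n*p steps.

```python
import math
import random

def sample_outbreak_count(population, p=1 / 10000, rng=random):
    """Draw from Binomial(population, p) by jumping between successes.

    Hypothetical helper: the gap to the next success is geometric, so we
    add up gaps until we run past the last resident.
    """
    if population <= 0 or p <= 0:
        return 0
    if p >= 1:
        return population
    log_q = math.log1p(-p)   # log(1 - p), negative
    count = 0
    position = 0             # index of the most recent success
    while True:
        u = 1.0 - rng.random()                 # uniform on (0, 1]
        gap = int(math.log(u) / log_q) + 1     # geometric, support 1, 2, ...
        position += gap
        if position > population:
            return count
        count += 1

# Example: one draw for a county of 100,000 people (about 10 on average).
rng = random.Random(0)
n_outbreaks = sample_outbreak_count(100_000, rng=rng)
```

With p = 1/10000 the expected count is simply population / 10000, so a county of 1,000 people usually has zero outbreaks while a county of a million has around 100.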
Outbreaks vary in size. Following the simplest (non-superspreader) model in this paper: https://www.researchgate.net/publication/7476817_Superspreading_and_the_Effect_of_Individual_Variation_on_Disease_Emergence Each outbreak starts with one infected person, and each infected person transmits to a number of people drawn from a Poisson distribution with lambda = 0.8.
Lambda is the *average* number of people each infected person goes on to infect. The actual number varies, with some people infecting more and some infecting fewer than lambda. Since lambda is less than 1, outbreaks don't continue forever. https://en.wikipedia.org/wiki/Poisson_distribution
I took the value of 0.8 from this post by Chris Moore @sfiscience, but as we will see, it doesn't really matter much to the result. https://www.santafe.edu/news-center/news/transmission-t-024-cristopher-moore-on-the-heavy-tail-of-outbreaks
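Here is a minimal sketch of a single outbreak under that description, again in standard-library Python rather than the original R. The names `sample_poisson` and `outbreak_size` and the safety `cap` are assumptions for illustration. For a branching process like this with lambda < 1, the expected total size is 1 / (1 - lambda), so about 5 people when lambda = 0.8.

```python
import math
import random

def sample_poisson(lam, rng=random):
    """Poisson draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def outbreak_size(lam=0.8, rng=random, cap=10**6):
    """Total number of people infected in one outbreak seeded by one case.

    `cap` is a hypothetical safety limit so the loop also stops if someone
    passes lam >= 1; it is not part of the model described in the thread.
    """
    total = 1    # the index case
    active = 1   # cases in the current generation
    while active > 0 and total < cap:
        new_cases = sum(sample_poisson(lam, rng) for _ in range(active))
        total += new_cases
        active = new_cases
    return total

# Example: sizes of a handful of simulated outbreaks.
rng = random.Random(1)
sizes = [outbreak_size(0.8, rng) for _ in range(10)]
```

Most outbreaks fizzle after one or two people, but the size distribution has a long right tail, which is where the occasional large outbreak comes from.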
So each county has some number of outbreaks of different sizes. The fraction of people infected is just the sum of the sizes of the outbreaks in a county divided by its population. Most counties end up with a small fraction of infected folks and a few have a high fraction.
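Putting the pieces together for one county might look like the sketch below. It is standard-library Python, not the original R, and all function names are assumptions for illustration; the populations of 1,000 and 1,000,000 are made-up examples, not rows from the census data. The point it tries to show is that small and large counties have the same expected fraction infected, but the small ones scatter far more widely around it.

```python
import math
import random
import statistics

def sample_binomial(n, p, rng):
    """Binomial(n, p) draw by jumping geometric gaps between successes."""
    count, position, log_q = 0, 0, math.log1p(-p)
    while True:
        position += int(math.log(1.0 - rng.random()) / log_q) + 1
        if position > n:
            return count
        count += 1

def sample_poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method."""
    threshold, k, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def outbreak_size(lam, rng):
    """Total cases in one outbreak seeded by a single case (needs lam < 1)."""
    total = active = 1
    while active > 0:
        active = sum(sample_poisson(lam, rng) for _ in range(active))
        total += active
    return total

def simulate_fraction_infected(population, rng, p=1 / 10000, lam=0.8):
    """Sum of outbreak sizes divided by population. Note: nothing caps this at 1."""
    n_outbreaks = sample_binomial(population, p, rng)
    infected = sum(outbreak_size(lam, rng) for _ in range(n_outbreaks))
    return infected / population

rng = random.Random(2020)
small = [simulate_fraction_infected(1_000, rng) for _ in range(500)]
large = [simulate_fraction_infected(1_000_000, rng) for _ in range(200)]
print("spread, small counties:", statistics.pstdev(small))
print("spread, large counties:", statistics.pstdev(large))
```

Most of the small counties come out at exactly zero, with a few landing well above anything the large counties reach.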
Finally, we can plot population size against the fraction infected in each county. Here red counties voted Republican and blue counties voted Democratic. Note that this is a log-log plot. The horizontal lines mark off the top 10% and bottom 10% of counties by fraction infected.
As you might expect, most counties are red. Large counties are disproportionately blue. The AP compared fraction infected with Republican support in the top 10% of counties by infection (those counties above the top line).
94% of highly infected counties are Republican - close to the AP number of 93% - and much higher than the 84% of all counties that are Republican voting. Party plays no role in the model, so this is just due to small counties having larger variance in both the number of outbreaks and their average size.
The same is true for counties with the *lowest* fraction infected (below the bottom line in the plot). In this example 89% of these counties voted Republican vs 84% of the total.
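The decile comparison itself is simple bookkeeping. A minimal sketch in Python, assuming each county is a `(fraction_infected, voted_gop)` pair; the function name and the toy numbers in the usage example are made up for illustration and are not the simulation's data:

```python
def gop_share_in_deciles(counties):
    """Return (share_top, share_bottom, share_all) of GOP-voting counties.

    `counties` is a list of (fraction_infected, voted_gop) pairs.
    Assumption: ties in fraction_infected are broken by input order
    (Python's sort is stable), which matters when many small counties
    sit at exactly zero.
    """
    ranked = sorted(counties, key=lambda county: county[0])
    k = max(1, len(ranked) // 10)   # size of one decile

    def share(rows):
        return sum(1 for _, voted_gop in rows if voted_gop) / len(rows)

    return share(ranked[-k:]), share(ranked[:k]), share(ranked)

# Toy usage: 20 made-up counties with fractions 0..19, four of them Democratic.
toy = [(i, i not in {1, 5, 6, 7}) for i in range(20)]
top, bottom, overall = gop_share_in_deciles(toy)
```

If the model is right, both the top and bottom shares should sit above the overall share, because both tails are dominated by small counties.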
This pattern is consistent across runs. Here are the results of 100 runs of the simulation. The highest and lowest infected counties are both disproportionately GOP voting, with the red line representing the total fraction of GOP counties. The high number is consistent with the AP study.
For a reproduction rate of 0.98, though, the effect shrinks: outbreaks get large enough that they can start to overwhelm more of the larger counties.
I think this is the point that @juemos made to me yesterday about how you can't treat an infectious disease the same way you would a non-infectious disease like cancer.
However, with a reproduction rate this high, something like an SIR model would be way more appropriate, as you have to worry about frequency dependence etc. (Some of the counties in the above plot have more infected people than population, for example.)
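For comparison, here is a bare-bones deterministic SIR sketch in Python. It is not part of the model in this thread; the function name, the `beta` and `gamma` values, and the simple Euler time-stepping are all assumptions for illustration. What it shows is the frequency dependence: new infections are proportional to the fraction of people still susceptible, so the number ever infected can never exceed the population.

```python
def sir_fraction_ever_infected(population, initial_infected, beta, gamma,
                               dt=0.1, steps=5000):
    """Euler-stepped SIR model; returns the fraction ever infected.

    beta  = transmission rate per unit time (hypothetical value)
    gamma = recovery rate per unit time (hypothetical value)
    beta / gamma plays the role of the reproduction rate.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    for _ in range(steps):
        # Frequency dependence: transmission scales with susceptible / population.
        new_infections = beta * infected * susceptible / population * dt
        new_infections = min(new_infections, susceptible)
        recoveries = gamma * infected * dt
        susceptible -= new_infections
        infected += new_infections - recoveries
    return (population - susceptible) / population

# Usage: a reproduction rate just below 1 vs. one well above 1.
low = sir_fraction_ever_infected(10_000, 1, beta=0.098, gamma=0.1)
high = sir_fraction_ever_infected(10_000, 1, beta=0.3, gamma=0.1)
```

Unlike the branching-process sums, both results stay between 0 and 1 no matter how large the reproduction rate is.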
I don't think the fraction of infected people in any county in the real world approaches one.
Keeping the original reproduction rate (lambda) of 0.8 and increasing the outbreak rate from 1/10,000 to 1/1,000 does not change things much.
Finally, @juemos suggested that the outbreak rate could be higher in larger counties because people from larger counties may travel more than those in small counties.