So, a friend of mine last night sent me some tweets from an (anonymous) account using an analysis related to Benford’s “law”. The analysis concludes that Biden’s first-digit vote totals are “extremely anomalous” compared to Trump’s.
I’m going to explain in this thread why this analysis is ignorant junk at best.
(For anyone who doesn't know me & cares about credentials, I'm a lawprof/former econ prof with a PhD in economics from MIT & have published many stats-forward articles https://scholar.google.com/citations?user=nYi57uEAAAAJ&hl=en)
When Benford’s “law” applies, we expect a distribution of numbers to exhibit more values that begin with a small leading digit than a larger one.
The reason is that, eg, to get to 20 you have to first go through 10, 11, …, 19, so there are ten numbers that begin with a “1” before any that begin with a “2”.
Under some conditions this means you expect to see a first-digit distribution like this one:
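For anyone who wants the numbers rather than a picture: the usual statement of Benford's "law" puts the expected share of leading digit d at log10(1 + 1/d). Here's a minimal Python sketch of that formula (the function name is mine, not something from the github repo):

```python
import math

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's distribution."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the expected share for each leading digit.
for digit, share in benford_expected().items():
    print(f"{digit}: {share:.1%}")
```

That works out to roughly 30% of values leading with a "1", falling to under 5% leading with a "9".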
Here’s the Wikipedia discussion of Benford’s “law” for those who are interested.
https://en.wikipedia.org/wiki/Benford%27s_law
The anonymous @statsguyphd did an initial analysis of precinct-level data. Someone else picked up their ball and has posted data and code at https://github.com/cjph8914/2020_benfords (see @statsguyphd’s pinned tweet image below)
I went to the github link, and here are the results of that code applied to Chicago precinct-level data.
WOW!!!! EXTREMELY ANOMALOUS!!!!!
Biden/Harris have way too few precincts with leading-digit 1 and Trump/Pence has too *many*! Anomaly!
Right??? Right?
right?
Um: Nope.
I downloaded the Chicago csv file provided on the github link (Chi was first on the list) & with a little work found that of 2k+ precincts, only 9 had at least 1000 votes.
Only 2 of those 9 precincts have at least 1000 *Biden* votes & *none* has at least 1000 Trump votes.
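The "little work" is just counting rows over a threshold. A rough sketch of that kind of count, with loud caveats: the column names `biden` and `trump` are hypothetical (I'm not reproducing the repo's actual csv layout), the total here is just the sum of those two columns, and the three sample rows are made up for illustration.

```python
import csv
import io

def count_large_precincts(csv_text, threshold=1000):
    """Count precincts at or above `threshold` by total, Biden, and Trump votes.

    Assumes hypothetical columns named `biden` and `trump`; the total is
    taken as their sum, which ignores any other candidates.
    """
    counts = {"total": 0, "biden": 0, "trump": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        biden, trump = int(row["biden"]), int(row["trump"])
        if biden + trump >= threshold:
            counts["total"] += 1
        if biden >= threshold:
            counts["biden"] += 1
        if trump >= threshold:
            counts["trump"] += 1
    return counts

# Made-up rows, NOT the Chicago data.
sample = "precinct,biden,trump\nA,1200,90\nB,700,350\nC,300,40\n"
print(count_large_precincts(sample))
```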
Why does that matter?
Three reasons.
First, it means essentially the only way a candidate can have a leading “1” is to have precinct-level votes of 1, 10-19, or 100-199.
Second, Biden's avg precinct-level total in the data set is about 379.
Third, Trump's avg precinct-level total is 78.
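To make the first point concrete, here's a small sketch that lists which vote totals below 1000 start with a "1". The helper names are mine and nothing here touches the real data:

```python
def leading_digit(n):
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

# Every possible precinct total from 1 to 999 whose leading digit is 1.
leading_ones = [n for n in range(1, 1000) if leading_digit(n) == 1]

print(len(leading_ones))                     # how many such totals exist
print(leading_ones[:11])                     # 1, then 10 through 19
print(leading_ones[11], leading_ones[-1])    # the rest run from 100 to 199
```

So a candidate whose totals mostly sit between 200 and 999 has almost no way to produce a leading "1", and a candidate whose totals mostly sit in the 10s and 100s has plenty.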
Put it all together & you see Trump has a lot more leading-digit “1” values than Benford’s “law” predicts because Trump's precinct-level vote totals are *low*.
Chi voters don't dig Trump; he has many precinct totals in 1/10-19/100s.
Chi voters like Biden; his totals are > 200.
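If you want to see the mechanism without any real data, here's a toy simulation. Big assumptions, all mine: precinct totals are drawn from a normal distribution, rounded, and floored at 1; the means 78 and 379 are the averages quoted above, but the standard deviations (40 and 100) and the normal shape are made up purely for illustration.

```python
import random

def leading_digit(n):
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

def leading_one_share(totals):
    """Fraction of totals whose leading digit is 1."""
    return sum(leading_digit(t) == 1 for t in totals) / len(totals)

def synthetic_totals(mean, sd, n, rng):
    """Draw n made-up precinct totals: rounded normal draws, floored at 1."""
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n)]

rng = random.Random(0)  # fixed seed so reruns give the same draws
low_support = synthetic_totals(78, 40, 2000, rng)     # low-support candidate
high_support = synthetic_totals(379, 100, 2000, rng)  # high-support candidate

print(f"low-support leading-1 share:  {leading_one_share(low_support):.1%}")
print(f"high-support leading-1 share: {leading_one_share(high_support):.1%}")
```

No fraud anywhere in that code, and the low-support candidate still ends up with far more leading 1s than the high-support one.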
As with so much else, the ANOMALY!!! analysis boils down to suggesting that Democrats' popularity with Democratic voters is somehow surprising. Extremely anomalous! Maybe even *fraudulent*.
Step off.
Benford’s “law” ain't a law; that's why I've been using quotes.
The distribution in question provides a good fit under conditions that apply sometimes but not others. It fails, eg, when, as here, data have a fixed max & are concentrated away from 1/10-19/100-199.
https://en.wikipedia.org/wiki/Benford%27s_law
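A quick way to see "applies sometimes but not others": compare numbers that span many orders of magnitude with numbers stuck in a narrow range. The sketch below uses powers of 2 (a standard textbook example, not anything from this data set) against the integers 200 through 999.

```python
import math

def leading_digit(n):
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

def leading_one_share(values):
    """Fraction of values whose leading digit is 1."""
    values = list(values)
    return sum(leading_digit(v) == 1 for v in values) / len(values)

powers_of_two = [2 ** k for k in range(1000)]  # spans about 300 orders of magnitude
bounded = range(200, 1000)                     # capped, and never starts with a 1

print(leading_one_share(powers_of_two))  # 0.301
print(leading_one_share(bounded))        # 0.0
print(math.log10(2))                     # Benford's expected share for a leading 1
```

Same "law", two totally different fits, and the only thing that changed is the range the numbers live in.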
Once a trained statistician (or even a kind of thoughtful person) sees this is precinct-level data & thinks about the size of precincts & the candidates' relative popularity, it should take that person about 14 seconds to realize Benford’s “law” analysis is garbage-in/garbage-out.
Still don’t believe me? Ok, let's do a simple analysis of the precinct-level vote *totals*.
There are 30 precincts with a leading-digit “1” and WOW THAT’S EVEN FEWER THAN FOR BIDEN ALONE!!!!! ZOMG THE FRAUD EXTENDS TO THE PRECINCT-LEVEL **TOTAL** NUMBERS OF VOTES!!!!
Nah.
Stop with the caps and !.
You need to calm down / You’re being too loud.
This @statsguyphd analysis isn’t evidence of anything even slightly anomalous.
In the data, anyway.
What it is evidence of is - to be charitable - deeply irresponsible sloppiness. The anonymous person behind the account should retract and apologize to those they duped.
If you still think there’s something here, I challenge you to go get precinct-level data from heavily pro-Trump areas with a similar distribution of total precinct-level votes, and redo this analysis. See what you get.
Ok, I have spent enough time on this crap.
/fin