Thread by @gelbach

So, a friend of mine last night sent me some tweets from an (anonymous) account using an analysis related to Benford’s “law”. The analysis concludes that Biden’s first-digit vote totals are “extremely anomalous compared to Trump’s”.

I’m going to explain in this thread why this analysis is ignorant junk at best.

(For anyone who doesn't know me & cares about credentials, I'm a lawprof/former econ prof with a PhD in economics from MIT & have published many stats-forward articles https://scholar.google.com/citations?user=nYi57uEAAAAJ&hl=en)

Jonah Gelbach

University of California at Berkeley Law School - Cited by 9,058 - Civil Procedure - Applied Econometrics - Evidence - Legislation - Public Economics

When Benford’s “law” applies, we expect a distribution of numbers to exhibit more values that begin with a small leading digit than a larger one.

The reason is that, eg, to get to 20 you have to first go through 10, 11, …, 19, so there are ten numbers that begin with a “1” before any that begin with a “2”.

Under some conditions this means you expect to see a first-digit distribution like this one:

Here’s the Wikipedia discussion of Benford’s “law” for those who are interested.
https://en.wikipedia.org/wiki/Benford%27s_law
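
For anyone who wants numbers rather than a picture: the standard Benford first-digit formula is P(d) = log10(1 + 1/d). Here's a minimal Python sketch of it. The function name `benford_expected` is my own label for illustration, not something from the github repo.

```python
import math

def benford_expected():
    """Return {digit: probability} for leading digits 1-9 under Benford's distribution."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

if __name__ == "__main__":
    # Prints one line per leading digit, from about 30% for "1" down to under 5% for "9".
    for digit, p in benford_expected().items():
        print(f"{digit}: {p:.1%}")
```

So when the "law" applies, roughly 3 in 10 values should start with a “1”.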

The anonymous @statsguyphd did an initial analysis of precinct-level data. Someone else picked up their ball and has posted data and code at https://github.com/cjph8914/2020_benfords (see @statsguyphd’s pinned tweet image below)

I went to the github link, and here are the results of the analysis applied to Chicago precinct-level data.

WOW!!!! EXTREMELY ANOMALOUS!!!!!

Biden/Harris have way too few precincts with leading-digit 1 and Trump/Pence has too *many*! Anomaly!

Right??? Right?

right?

Um: Nope.

I downloaded the Chicago csv file provided on the github link (Chi was first on the list) & with a little work found that of 2k+ precincts, only 9 had at least 1000 votes.

Only 2 of those 9 precincts have at least 1000 *Biden* votes & *none* has at least 1000 Trump votes.

Why does that matter?

Three reasons.

First, it means the only way a candidate can have a leading “1” is to have precinct-level votes of 1, 10-19, or 100-199 (setting aside the 2 precincts where Biden tops 1000).

Second, Biden's avg precinct-level total in the data set is about 379.

Third, Trump's avg precinct-level total is 78.

Put it all together & you see Trump has a lot more leading-digit “1” values than Benford’s “law” predicts because Trump's precinct-level vote totals are *low*.

Chi voters don't dig Trump; he has many precinct totals in 1/10-19/100s.

Chi voters like Biden; his totals are > 200.
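
To make the mechanics concrete, here's a rough Python sketch. The two ranges are HYPOTHETICAL, picked only to mimic "low" vs "high" precinct totals that stay under 1000; they are not the Chicago data. The helper names are mine too.

```python
def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def share_leading_one(counts):
    """Fraction of the counts whose first digit is 1."""
    counts = list(counts)
    return sum(leading_digit(c) == 1 for c in counts) / len(counts)

# HYPOTHETICAL ranges (assumptions for illustration, not real vote totals).
low_totals = range(20, 200)    # low-popularity candidate: every total under 200
high_totals = range(200, 600)  # high-popularity candidate: every total 200 or more

if __name__ == "__main__":
    print(f"low totals:  {share_leading_one(low_totals):.1%}")   # 100 of 180 values start with 1
    print(f"high totals: {share_leading_one(high_totals):.1%}")  # no value starts with 1
```

Same "law", same arithmetic, wildly different leading-“1” shares, purely because of where the totals sit.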

As with so much else, the ANOMALY!!! analysis boils down to suggesting that Democrats' popularity with Democratic voters is somehow surprising. Extremely anomalous! Maybe even *fraudulent*.

Step off.

Benford’s “law” ain't a law; that's why I've been using quotes.

The distribution in question provides a good fit under conditions that apply sometimes but not others. It fits poorly when, as here, data have a fixed max & are distributed away from 1/10-19/100-199.

https://en.wikipedia.org/wiki/Benford%27s_law

Once a trained statistician (or even a kind of thoughtful person) sees this is precinct-level data & thinks about the size of precincts & the candidates' relative popularity, it should take that person about 14 seconds to realize Benford’s “law” analysis is garbage-in/garbage-out.

Still don’t believe me? Ok, let's do a simple analysis of the precinct-level vote *totals*.

There are 30 precincts with a leading-digit “1” and WOW THAT’S EVEN FEWER THAN FOR BIDEN ALONE!!!!! ZOMG THE FRAUD EXTENDS TO THE PRECINCT-LEVEL **TOTAL** NUMBERS OF VOTES!!!!

Nah.

Stop with the caps and !.

You need to calm down / You’re being too loud.

This @statsguyphd analysis isn’t evidence of anything even slightly anomalous.

In the data, anyway.

What it is evidence of is, to be charitable, deeply irresponsible sloppiness. The anonymous person behind the account should retract and apologize to those they duped.

If you still think there’s something here, I challenge you to go get precinct-level data from heavily pro-Trump areas with a similar distribution of total precinct-level votes, and redo this analysis. See what you get.
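
If you want to take up that challenge, here's one way the tabulation might look in Python. It's a sketch under assumptions: `first_digit_table` is a name I made up, and the toy counts are HYPOTHETICAL placeholders, so swap in a real column from whatever csv you download.

```python
import math
from collections import Counter

def first_digit_table(counts):
    """Return rows of (digit, observed share, Benford share) for positive counts."""
    # Zero-vote precincts have no leading digit, so they are skipped.
    digits = [int(str(c)[0]) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    tally = Counter(digits)
    n = len(digits)
    return [(d, tally[d] / n, math.log10(1 + 1 / d)) for d in range(1, 10)]

if __name__ == "__main__":
    # HYPOTHETICAL counts standing in for one candidate's precinct totals.
    toy_counts = [412, 388, 95, 301, 150, 0, 276, 519, 44, 367]
    for d, obs, exp in first_digit_table(toy_counts):
        print(f"{d}: observed {obs:.1%}, Benford {exp:.1%}")
```

Run it separately for each candidate and for the precinct totals, and look at the size of the counts before reading anything into the gaps.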

Ok, I have spent enough time on this crap.

/fin
