Benford's Law and election fraud (a thread)

1) What is Benford's Law?

It's an observation that the 1st (left-most) digits of values in many real data sets of counts fall close to a logarithmic distribution, where a 1st digit is more likely to be 1 (about 30% of the time) than any other value.
There are several explanations (https://en.wikipedia.org/wiki/Benford%27s_law#Explanations) for WHY this occurs, but that isn't so relevant here. The main thing to know is that it has been used to detect fraud across several different domains because made-up or randomly generated numbers typically will not fit this distribution.
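For reference, here is a minimal sketch of the expected first-digit shares, using the standard Benford formula P(d) = log10(1 + 1/d). The function names are my own and are only for illustration.

```python
import math

def benford_first_digit_probs():
    """Expected share of each first digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """Left-most digit of a positive integer count."""
    return int(str(n)[0])

probs = benford_first_digit_probs()
for d, p in probs.items():
    # Shares fall from about 30.1% for digit 1 to about 4.6% for digit 9
    print(d, f"{p:.1%}")
```

Comparing these shares with the observed first digits of a data set is the basic Benford check discussed in the rest of this thread.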
2) What claims are being made?

Some are claiming that in certain places, Biden's vote counts do not follow Benford while Trump's do. The idea is that this must indicate fraud. Here's an example. Looks pretty bad, right?

First, let's look at what type of fraud this implies.
3) What kind of fraud would this be?

Since this is across thousands of precincts, we are talking about fraud that would need to be pervasive in order to show up in the data. Basically, the numbers would need to be adjusted across most precincts.
If Democrats were going to perform such fraud, it would require a lot of coordination and lots of people staying quiet.

Also, it makes zero sense to commit such fraud in Chicago, which will never affect the Electoral College. Of course, there are similar claims in other cities.
So even if we assume that Democrats are some super secret cabal that could pull off fraud on a massive scale (while deciding to not win back the Senate), why is it that Benford's Law doesn't make sense to use on this kind of election data?
4) An example:

I pulled the 2016 data from Cook Co (Chicago) (https://www.cookcountyclerk.com/service/precinct-canvasses) and took a look. Here are histograms showing the total votes and Clinton% per precinct. See if you can figure out why Benford is not going to make sense.
Think about what circumstances would lead to a first digit of 1. Either Clinton gets less than 50% of the votes in precincts with 400 or fewer total votes (100-199 votes), or she gets a very high % in precincts with over 1000 votes (1000-1999 votes). That's really hard w/ this data.
Since the vast majority of precincts were 400-1000 total votes and Clinton got at least 45% in almost all precincts, it's really hard for her to get a vote count that starts with 1. And that shows in the data. (Also note that Clinton's distribution is almost identical to Biden's)
But what about Trump? His vote share is much lower than Clinton's, so he actually ends up with a first digit of 1 a lot more. And importantly, his values span multiple orders of magnitude (1-9, 10-99, 100-999). This is actually a key requirement for Benford's Law.
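To make the size argument concrete, here is a rough simulation on SYNTHETIC precincts. This is not the Cook County data. The 400-1000 total-vote range and the 45%-95% share for the leading candidate are assumptions loosely based on the description above, and third-party votes are ignored.

```python
import random

def first_digit(n):
    """Left-most digit of a positive integer count."""
    return int(str(n)[0])

def share_starting_with_1(counts):
    """Fraction of positive counts whose first digit is 1."""
    counts = [c for c in counts if c > 0]
    return sum(first_digit(c) == 1 for c in counts) / len(counts)

rng = random.Random(0)  # fixed seed so the run is repeatable
leader, trailer = [], []
for _ in range(5000):
    total = rng.randint(400, 1000)    # ASSUMED precinct size range
    share = rng.uniform(0.45, 0.95)   # ASSUMED vote share of the leading candidate
    leader_votes = round(total * share)
    leader.append(leader_votes)
    trailer.append(total - leader_votes)  # two-candidate simplification

leader_ones = share_starting_with_1(leader)
trailer_ones = share_starting_with_1(trailer)
print(f"leader: {leader_ones:.1%} start with 1, trailer: {trailer_ones:.1%} start with 1")
```

With no fraud anywhere in this made-up data, the leading candidate's counts almost never start with 1, while the trailing candidate's counts start with 1 far more often. The gap comes purely from precinct sizes and vote shares.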
5) Benford actually has been used to detect vote fraud

There is a version of Benford's Law (see https://en.wikipedia.org/wiki/Benford%27s_law#Election_data) for *2nd* digits that has been used to detect vote fraud. As you can see, there is very little diff b/w Trump and Clinton in that regard.
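For anyone who wants to try the 2nd-digit check themselves, here is a small sketch of the expected 2nd-digit shares. It uses the standard generalization of Benford's Law (sum over all possible 1st digits); the function names are my own. The expected distribution is much flatter than the 1st-digit one, running from about 12% for 0 down to about 8.5% for 9.

```python
import math

def benford_second_digit_probs():
    """Expected share of each second digit 0-9 under Benford's Law."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }

def second_digit(n):
    """Second digit of a positive integer, or None if it has only one digit."""
    s = str(n)
    return int(s[1]) if len(s) > 1 else None

second_probs = benford_second_digit_probs()
for d, p in second_probs.items():
    print(d, f"{p:.1%}")
```

Counts with a single digit have no 2nd digit, so they would need to be dropped before comparing observed shares with these expected ones.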
tl;dr:
Benford's Law doesn't make sense to use on precinct level data because the counts across precincts are too similar in size. Deviations from Benford (1st digits) are not evidence of fraud. Also, any fraud implied by Benford would need to be widespread and coordinated. /fin
Please share. Also, I would love feedback if I missed something. I'm certainly not an expert on this, but felt the need to put something together. Excuse my quick Google Sheets graphs.

@abmakulec @SenhorRaposa @jncar76
You can follow @chadmbol.