If someone in your TL is screaming "Benford's Law" in the same breath as "election fraud," this is worth a listen.
Also worth a listen if you like learning things. https://twitter.com/agolis/status/1328389237234929668
I am not a mathematician, but here's a simple, broad-brush breakdown of why Benford's law works for forensic accounting, but does NOT work for forensic analysis of election data.
Benford's law says that for decimal representations of naturally occurring data sets, 1 will appear as the 1st digit around 30% of the time, while 9 will only appear there a bit under 5% of the time.
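For anyone who wants the actual numbers: the standard statement of Benford's law is that digit d shows up as the 1st digit with probability log10(1 + 1/d). Here's a minimal sketch that prints the expected share for each digit. The function name is just something picked for illustration.

```python
import math

def benford_first_digit_probability(d):
    """Expected share of numbers whose first digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

# Print the expected first-digit share for each digit.
for d in range(1, 10):
    print(d, f"{benford_first_digit_probability(d):.1%}")
```

That works out to about 30.1% for 1 and about 4.6% for 9, and the nine shares add up to 100%.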
This happens because in naturally occurring data sets, big numbers don't get to be big numbers without first being smaller numbers.
(I said broad brush. VERY BRUSH. SUCH BROAD.)
If you look at the number of Twitter followers of each of the people who follow you (this is an experiment you can perform right now!), you'll see more 1s than 9s as the 1st digit of the various follower counts.
Do this enough times, and you'll get Benford percentages.
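If you'd rather not count followers by hand, here's a rough simulation of the "big numbers were small numbers first" idea. Everything in it is an assumption made up for illustration, not real Twitter data: each pretend account starts at 1 follower and grows by a random 0% to 20% per step, for a random number of steps.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the run is repeatable

def first_digit(n):
    """Leading decimal digit of a number that is at least 1."""
    return int(str(int(n))[0])

def simulate_follower_count():
    """Hypothetical growth model: start at 1, grow 0-20% per step."""
    followers = 1.0
    for _ in range(random.randint(0, 200)):
        followers *= 1 + random.uniform(0.0, 0.2)
    return followers

# Tally the first digits of 20,000 simulated follower counts.
counts = Counter(first_digit(simulate_follower_count()) for _ in range(20000))
total = sum(counts.values())
shares = {d: counts[d] / total for d in range(1, 10)}

for d in range(1, 10):
    print(d, f"{shares[d]:.1%}")
```

Because the counts spread across many orders of magnitude, the 1s should land somewhere near 30% and the 9s near 5%, though a toy model like this won't match Benford exactly.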
In forensic accounting, investigators are looking for people who cook their books. Those people are trying to HIDE MONEY. Unless they're very, VERY clever about it, removing money from a naturally occurring data set creates an unnatural data set, with broken Benford everywhere.
And this is the beginning of the explanation of why you can't use Benford to evaluate precinct tallies in an election.
First, a precinct is not naturally-occurring data. It's an artificial construct, designed so that all the precincts in the region are roughly the same size. Precinct sizes themselves break Benford's law. We might see all 1s and 2s, with no 9s.
Second, the election divides that non-naturally-occurring number into two numbers (with some tiny remainders for independent candidates).
Assuming the election isn't a blowout, those two numbers will ALSO fail to adhere to Benford.
Precinct has 1,000 voters? We can reasonably expect the tallies to be something like 403 to 597. Not very Benfordy AT ALL.
Why? Because even though the division is "organic" or "naturally occurring," it was a division of a human-generated population.
Which means that Benford's law, with the percentages for naturally-occurring 1st-digits, can't be used here.
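Here's a small sketch of the precinct argument, with entirely made-up numbers (not real election data): precincts of 800 to 1,200 voters, a two-candidate race where one candidate gets between 40% and 60%, and no independents. It just counts the 1st digits of the resulting tallies.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the run is repeatable

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

tally_digits = Counter()
for _ in range(5000):
    # Assumption: precincts are drawn to be roughly the same size.
    precinct_size = random.randint(800, 1200)
    # Assumption: a close-ish two-way race, no third candidates.
    share_a = random.uniform(0.40, 0.60)
    votes_a = round(precinct_size * share_a)
    votes_b = precinct_size - votes_a
    tally_digits[first_digit(votes_a)] += 1
    tally_digits[first_digit(votes_b)] += 1

for d in range(1, 10):
    print(d, tally_digits[d])
```

Under those assumptions every tally falls between 320 and 720, so the 1st digits are all 3 through 7. Zero 1s, zero 9s, and nobody cheated.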
Now, there are some really complicated tools for applying Benfordesque analysis to non-naturally-occurring data sets, but oh wait I AM NOT A MATHEMATICIAN.
Suffice it to say, when analysts look at the election data from this election, using the fancy math tools they use to detect fraud in sets where Benford does NOT apply, there's no evidence of fraud.
And because of the way these statistical tools work, "no evidence of fraud" means that whatever tallying fraud may or may not have been committed, it happened at a scale which didn't even affect the PRECINCT, much less the overall election.
In the first tweet I said that the @Radiolab podcast is worth a listen if you've got people screaming about "Benford" and "election fraud" in your timeline.
I feel like I owe you an apology.
The podcast won't change THEIR minds.
It'll protect YOURS.