Thread by @lyskoi

hello! on april 11th 2017, @jon_bois uploaded a video called "What if Barry Bonds had played without a baseball bat? | Chart Party." It's a great video, but it has a stats issue -- one i've decided to address in this thread.

firstly, if you haven't watched the video, you should watch it before reading further.

second, i call this a "stats issue," but it's really not a flaw in the video. it's really not. it is an excellent story told with admirable rigor.

HOWEVER, i am me. so let's continue.

the video concerns the single-season On Base Percentage record. OBP is, roughly, how often you get on base. (roughly, because, baseball.) Here are the three hundred best single-season OBP numbers an MLB player has reached since 1920. In bold: Barry Bonds.

Here's just the top of that chart. In red is the single-season OBP record: 2004 Barry Bonds, .6094. Below it in blue is the second-best: 2002 Barry Bonds, .5817. And there, in orange, is Jon's simulated BBw/oaB 2004: .6078.

The "point" of Jon's video is that, according to his simulation, 2004 Barry Bonds without a Bat would still beat out 2002 Barry Bonds for the all-time single-season OBP record.

But at the end of the video, Jon wonders if his simulation has a flaw. And it does.

Before we continue
-- again, you should really watch the video to (re)familiarize yourself with the constraints that Jon's (and my) simulations use
-- baseballreference's OBP data is given as a four-digit decimal, which i use as well, but multiplied by 1000 for readability.

ANYWAY: as many commenters on the video have noted, Jon uses a monte carlo simulation of BB's 2004, where pitches are resimulated depending on whether contact was involved. The problem: Jon only did one simulation.

I first learned about monte carlo simulations as a way to throw hot dogs at a circle to approximate pi. They're real cool. But you have to do them a lot of times. Like, at least hundreds. Preferably thousands.
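For anyone who hasn't seen the pi trick, here's a minimal sketch. It assumes the usual version: throw uniformly random points at a square with a circle inscribed in it, and count how many land inside the circle. The function name `estimate_pi` and the throw counts are made up for illustration.

```python
import random


def estimate_pi(num_throws, seed=None):
    """Estimate pi by throwing random points at the square [-1, 1] x [-1, 1].

    The unit circle covers pi/4 of that square, so the fraction of
    points landing inside the circle, times 4, approximates pi.
    """
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    inside = 0
    for _ in range(num_throws):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_throws


if __name__ == "__main__":
    # One throw can only ever say 0 or 4; the estimate settles as throws pile up.
    for n in (1, 100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

The one-throw case is the whole point of this thread in miniature: a single sample is a legitimate data point, but it tells you very little on its own.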

Just one simulation is a data point, but in the monte carlo method, you take the *average* of all your data points, and *that's* your result. Which raises some questions:

what if Jon's result was an outlier?

what if 2004 BBw/oaB wasn't as great as we thought?

what if 2004 BBw/oaB wasn't (excluding 2004 BBw/aB) the greatest single-season OBP of all time?

i set out to answer this question.

so i did my duty: downloaded all of 2004 baseball as a 380k line text file; lead-fistedly found all the lines with barry bonds in them; parsed those lines into at-bats; separated them into walks, strikeouts, and balls in play; wrote some "code" to simulate pitches; and then, sim.

(you will later have an opportunity to fully understand how absolutely cork-brained it is to do this without a scripting language, which i did, out of bull-hearted weakness or weak-hearted bullishness, i can't tell. really going for the object-bodypart metaphors this morning)
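For readers who do have a scripting language handy, the "run it many times, then average" idea might look something like the sketch below. To be loud about it: this is NOT the spreadsheet's method and NOT Jon's pitch-by-pitch rules (those are in the video). It is a toy model where every plate appearance reaches base with one fixed probability, and the 600 plate appearances and 0.6 probability are placeholder values, not Bonds' real numbers. All function names are hypothetical.

```python
import random
import statistics


def simulate_season(plate_appearances, on_base_prob, rng):
    """Simulate one season and return its OBP in points (OBP x 1000).

    Toy model: each plate appearance independently reaches base with
    probability on_base_prob. A stand-in for real pitch-level rules.
    """
    times_on_base = sum(
        1 for _ in range(plate_appearances) if rng.random() < on_base_prob
    )
    return 1000.0 * times_on_base / plate_appearances


def monte_carlo(num_runs, plate_appearances, on_base_prob, seed=None):
    """Run the season num_runs times (at least 2); return (mean, stdev, results)."""
    rng = random.Random(seed)
    results = [
        simulate_season(plate_appearances, on_base_prob, rng)
        for _ in range(num_runs)
    ]
    return statistics.mean(results), statistics.stdev(results), results


if __name__ == "__main__":
    # Placeholder inputs, chosen only to make the demo run.
    mean, sd, results = monte_carlo(
        num_runs=2000, plate_appearances=600, on_base_prob=0.6, seed=0
    )
    print(f"first single run: {results[0]:.1f}")
    print(f"mean of all runs: {mean:.1f} (stdev of a single run: {sd:.1f})")
```

The first run stands in for a one-off simulation; the mean over all runs is the number the monte carlo method actually asks for.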

The result? Jon's result was 1.5 standard deviations above the mean: a higher-than-normal result.

But it didn't matter.

2004 Barry Bonds without a Bat averaged an OBP of 586.6 -- less than 5 points ahead of 2002 Barry Bonds -- to take home the all-time OBP crown.

Here are some more details on those results, including the final all-time chart, a fit-curve of a normal distribution that looks okay, a results timeline, and a little table with more info. It worked!

Also, i forgot the most important part of the result: confidence interval 99.9%.
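The numbers quoted in this thread can be cross-checked with a little arithmetic. The sketch below backs out the spread of a single run from "1.5 standard deviations above the mean," then asks how often one lone run would land below 2002 Barry Bonds. Two assumptions to flag: the 1.5 is treated as exact (it is presumably rounded), and single runs are treated as normally distributed, which leans on the "looks okay" normal fit. It does not try to reproduce the 99.9% figure. The function names are made up.

```python
from statistics import NormalDist

# Values quoted in the thread, in OBP points (OBP x 1000).
JON_SINGLE_RUN = 607.8  # Jon's one simulated 2004 season
SIM_MEAN = 586.6        # mean of the repeated simulation
BONDS_2002 = 581.7      # real 2002 season, the mark to beat
JON_Z_SCORE = 1.5       # "1.5 standard deviations above the mean" (rounded)


def implied_sd(single_run, mean, z_score):
    """Back out the standard deviation from a value and its z-score."""
    return (single_run - mean) / z_score


def share_of_runs_below(threshold, mean, sd):
    """Share of single runs expected below threshold, ASSUMING normality."""
    return NormalDist(mu=mean, sigma=sd).cdf(threshold)


sd = implied_sd(JON_SINGLE_RUN, SIM_MEAN, JON_Z_SCORE)
margin = SIM_MEAN - BONDS_2002
share = share_of_runs_below(BONDS_2002, SIM_MEAN, sd)

print(f"implied stdev of one run: {sd:.1f} points")
print(f"mean's margin over 2002:  {margin:.1f} points")
print(f"single runs below 2002:   {share:.0%}")
```

Under those assumptions the implied spread is about 14 points against a margin of 4.9, so roughly a third of individual runs would lose to 2002 Barry Bonds even though the average wins. That is why one run, in either direction, can't settle it.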

Anyway, that's it. If you want to see my work, the spreadsheet's here: https://docs.google.com/spreadsheets/d/16pNgA_dVmC9Y1Pf5LE8BdodR3qphMHbjs4sHjwquObQ/edit. It includes my horrible methodology and a funny story i found in the data, so check it out.

I'd recommend following @secretbase for more video content.

That's all, thanks!

Barry Bonds Without A Bat Redux: Full Monte Carlo Simulation

Experiment TO UNDERSTAND WHAT I'M ABOUT TO DO, YOU NEED TO WATCH THIS VIDEO: https://www.youtube.com/watch?v=JwMfT2cZGHg What I'm going to do is the same experiment described in this video, but...

https://docs.google.com/spreadsheets/d/16pNgA_dVmC9Y1Pf5LE8BdodR3qphMHbjs4sHjwquObQ/edit
