NFL WR Edition: Have you ever wondered what it would be like to stick a bunch of WR data into a machine learning model?
We'll walk through how we created the model and take a look at the top 3 most important traits in a WR




First we need to define a success metric. A common dynasty player success metric is "# of top 24 seasons" so to keep it simple we'll stick with that
To further simplify it, we'll convert this value to "did this WR have a top 24 season at any point in their career?"
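A minimal sketch of that conversion, assuming each player row carries a count of top 24 seasons. The field name `top24_seasons` and the player rows are made up for illustration, not taken from the database:

```python
# Hypothetical rows: each WR with a made-up count of top 24 seasons.
players = [
    {"name": "WR A", "top24_seasons": 3},
    {"name": "WR B", "top24_seasons": 0},
    {"name": "WR C", "top24_seasons": 1},
]


def to_binary_label(top24_seasons: int) -> int:
    """Return 1 if the player ever had a top 24 season, else 0."""
    return 1 if top24_seasons > 0 else 0


labels = [to_binary_label(p["top24_seasons"]) for p in players]
print(labels)  # [1, 0, 1]
```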
Next we need a player database. @pahowdy has an incredible database we can use and a bunch of free content for you to play with. We would highly recommend you follow him if you haven't already https://twitter.com/pahowdy/status/1349835451985768463
From his WR database, we removed columns that would give the answer away to the model
For example, it seems obvious that "# of 1000 yard receiving seasons" is highly correlated with a WR24 season
We're also going to remove 'Draft Round' and 'Draft Pick' so we can predict WRs without having that future knowledge
The question we're trying to answer: How can we find a WR gem without knowing these values?
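Dropping those columns could look something like the sketch below. The column names and the sample row are assumptions for illustration; the real database may label them differently. With a pandas DataFrame, `DataFrame.drop(columns=[...])` does the same job:

```python
# Columns we don't want the model to see (names are hypothetical).
LEAKY_COLUMNS = {"1000_yard_seasons", "draft_round", "draft_pick"}


def drop_columns(row: dict, columns: set) -> dict:
    """Return a copy of the row without the listed columns."""
    return {key: value for key, value in row.items() if key not in columns}


# Made-up example row, not a real player.
row = {
    "name": "WR A",
    "college_yards": 1200,
    "1000_yard_seasons": 2,
    "draft_round": 1,
    "draft_pick": 12,
}
features = drop_columns(row, LEAKY_COLUMNS)
print(features)  # {'name': 'WR A', 'college_yards': 1200}
```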




Next, let's split the data set into 80% training and 20% test. This means that only 80% of the data gets fed into the model and then the model makes predictions on the remaining 20% that it hasn't seen yet 


The model is then graded on how accurate it was for the test data
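Here is a rough, standard-library-only sketch of the split and the grading step. The function names, the seed, and the toy data are our own choices for illustration; libraries such as scikit-learn provide equivalents:

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


rows = list(range(100))  # stand-in for 100 player rows
train, test = train_test_split_rows(rows)
print(len(train), len(test))  # 80 20

# Three of four made-up predictions are right.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

The model only ever sees `train`; `test` is kept aside purely for grading.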



We'll use a popular model often used to win data science competitions - XGBoost
The model itself is fairly complex, but we can implement it in code pretty simply. If you want to try it out check out our repository: https://github.com/LeoXia360/nfl-data/blob/main/nfl/wide-receivers.ipynb


Without extensive tuning, our model reached 85% accuracy on the test data
This means that for 85% of the WRs in the test set, the model correctly predicted whether or not they had a top 24 season
However, the #1 question for us is which columns were most important to the model?






..and you guessed it

Perhaps it shouldn't be a surprise that the top 3 traits are all physical attributes



While we found positive correlations between these top 3 WR traits, the correlations weren't *super* strong (the max possible correlation is a value of 1).
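For reference, the usual correlation measure here is the Pearson coefficient, which runs from -1 to 1. We are assuming that is the measure meant above. The sketch below computes it by hand on made-up numbers (not real WR measurements), with one perfectly correlated pair and one weaker pair:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
trait_a = [1.0, 2.0, 3.0, 4.0]
perfect = [2.0, 4.0, 6.0, 8.0]  # moves in lockstep with trait_a
noisy = [2.0, 1.0, 4.0, 3.0]    # trends upward, but not in lockstep

r_perfect = pearson(trait_a, perfect)
r_noisy = pearson(trait_a, noisy)
print(round(r_perfect, 2), round(r_noisy, 2))  # 1.0 0.6
```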



So what does this mean for us?





If you liked this read then consider subscribing to our YouTube channel where we try to explain analytics in a simple way

We break down popular metrics, discuss trade value, and talk about how you can build a championship-winning roster






