DEFENSES DON'T MATTER, Episode 2:
A follow-up to projecting weekly fantasy points using player data, team data, and betting data.
In case you missed episode 1: https://twitter.com/rbkeeney/status/1309935287007576064?s=20
Projecting weekly fantasy points is challenging for a variety of reasons: new players, injuries, small sample sizes, etc.
This thread is primarily about debunking the use of team betting lines.
Reminder from episode 1:
Player data >>> everything else, but especially *especially* defensive fantasy points allowed data.
Want an example?
This is a tweet from late October. https://twitter.com/jagibbs_23/status/1322296303192317954?s=20
Now, this isn't an attack on Josh; I'm sure he's a good guy. But check out this tweet from the great Scott Pianowski: https://twitter.com/scott_pianowski/status/1338881050546728960?s=20
So, in the span of about a month, Seattle went from being the best (worst?) defense for QB/WRs to the worst (best?).
Cool. cool cool cool.
I hope you didn't start a mediocre WR3 against them hoping they'd get that Seahawk bump.
Okay, back to our primary topic: betting lines.
I've been using betting lines in my projections. While they don't *hurt* my projections, they don't really add any value either.
So... what does help?
Let's talk about a few things we can do to improve weekly projections.
1. Get lucky
2. Use an appropriate sample size
3. Use the best data
Let's break each of those areas down:
Getting lucky:
There are a lot of things in projections that aren't easy to model. We can get close, but even the BEST weekly projections from a simple modeling perspective are going to hover around an adjusted R-squared of 0.2-0.4, depending on the position.
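For anyone unfamiliar with the term: adjusted R-squared is ordinary R-squared shrunk to account for how many predictors a model uses, so piling on junk features can't inflate it. A minimal Python sketch of the standard formula (the example inputs are made up, not from my models):

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: penalizes ordinary R-squared for the number of predictors.

    r2           -- ordinary R-squared of the fitted model
    n_obs        -- number of observations (e.g. player-weeks) used in the fit
    n_predictors -- number of features, not counting the intercept
    """
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical: a WR model with R-squared 0.30, fit on 500 player-weeks with 6 features.
print(round(adjusted_r2(0.30, 500, 6), 3))  # → 0.291
```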
Now, you can improve on that by incorporating additional information such as injuries, depth charts, QB-based adjustments, up-to-date NFL knowledge, etc.
But the edge is small. It's still an edge, but it's small.
Consider this: we don't have to go very far to see that weekly projections are all about variance.
Go check out the returns of the best DFS players... ~15% over the long run?
A good process with lots of variance is still going to be wrong quite often (hopefully less than 50% of the time, but you get the point).
Please remember this the next time you ask for start-sit advice from your favorite analyst.
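A toy simulation makes the variance point concrete. Assume (hypothetically) that player A truly averages 12 points, player B truly averages 10, and each week's score is an independent normal draw with a standard deviation of 7. Even with perfect knowledge of those averages, "start A" is the wrong call a lot. A minimal Python sketch under those assumptions:

```python
import random


def upset_rate(mean_a: float, mean_b: float, sd: float,
               n_sims: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated weeks in which player B outscores player A.

    Simplifying assumption: each weekly score is an independent normal draw
    around the player's true mean, with the same standard deviation for both.
    """
    rng = random.Random(seed)
    upsets = 0
    for _ in range(n_sims):
        score_a = rng.gauss(mean_a, sd)
        score_b = rng.gauss(mean_b, sd)
        if score_b > score_a:
            upsets += 1
    return upsets / n_sims


# Hypothetical start/sit: A's true mean is 12, B's is 10, weekly SD of 7 for both.
print(upset_rate(12.0, 10.0, 7.0))
```

Analytically, A minus B is normal with mean 2 and an SD of about 9.9, so B outscores A roughly 42% of the time, and the simulation should land close to that. Real fantasy score distributions are skewed and correlated, so treat this strictly as an illustration.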
On to point #2: sample size.
Better data, specifically with stats, comes in the form of % of team totals, yards per route run, snap data, etc., etc.
We combine these stats within models to make better projections or descriptive stats (like expected points) depending on our goal.
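For anyone newer to these usage stats, here's a minimal sketch of two of them computed from a single game line. The `WeekLine` fields are hypothetical names, not tied to any particular data source:

```python
from dataclasses import dataclass


@dataclass
class WeekLine:
    """One player's receiving line for one game (hypothetical field names)."""
    targets: int
    team_targets: int
    routes_run: int
    receiving_yards: float


def target_share(week: WeekLine) -> float:
    """Player's share of team targets for the game, from 0.0 to 1.0."""
    return week.targets / week.team_targets if week.team_targets else 0.0


def yards_per_route_run(week: WeekLine) -> float:
    """Receiving yards divided by routes run; 0.0 if the player ran no routes."""
    return week.receiving_yards / week.routes_run if week.routes_run else 0.0


week = WeekLine(targets=9, team_targets=36, routes_run=30, receiving_yards=84.0)
print(target_share(week), yards_per_route_run(week))  # → 0.25 2.8
```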
I've tried my hand at building lots of different models for projections: linear, non-linear, Elo-type, etc.
What I really want to focus on is the question: How much data do you need?
Put another way: How much data is required before your models stop improving?
Wait. That's right. I said the model stops improving.
When projecting seasonal data, MOAR data = better (usually). Same for college prospects.
For weekly projections, you PEAK with about 4-5 weeks worth of averages.
That varies by stat and model, and you don't lose a ton by going with 2-3 weeks, but you certainly don't add anything by using 6-10... and in a few cases, 10+ weeks actually starts to make the model worse!
*Obviously using 1 week of data results in a poor projection.
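One way to run this kind of test: project each week with the trailing N-game average and compare errors across window sizes over the same set of weeks. A minimal sketch with hypothetical function names and a made-up target-share series for a player whose role grows from index 4 onward (toy data, not real results):

```python
from __future__ import annotations

from statistics import mean


def trailing_average(values: list[float], week: int, window: int) -> float:
    """Mean of the `window` games before `week` (0-indexed); `week` itself is excluded."""
    if week < window:
        raise ValueError("not enough prior games for this window")
    return mean(values[week - window:week])


def window_mae(values: list[float], window: int, start_week: int) -> float:
    """Mean absolute error of trailing-average projections from `start_week` onward.

    Using the same `start_week` for every window size keeps the comparison fair.
    """
    errors = [abs(values[w] - trailing_average(values, w, window))
              for w in range(start_week, len(values))]
    return mean(errors)


# Toy target-share series: role jumps from ~12% to ~24% at index 4 (made-up numbers).
shares = [0.12, 0.13, 0.11, 0.12, 0.22, 0.24, 0.23, 0.25, 0.24, 0.23, 0.25, 0.24]
for window in (1, 2, 4, 8):
    print(window, round(window_mae(shares, window, start_week=8), 4))
```

On this toy series the 8-game window does worst because it keeps averaging in the old role, and the 1-game window is noisier than the 4-game one. One player proves nothing; the real test is the same comparison across many players and seasons. The trailing window never includes the week being projected, so there's no peeking at the answer.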
That sort of seems obvious in retrospect because our projection is trying to reflect the current state of the team!
In fact, some other studies on non-FF data show similar results for team offensive and defensive strength, with correlation peaking around games 6-8.
Point #3: better data
I've talked a little about what stats to use (player usage data!) but there's something else I'd like to mention as well.
A suggestion for how to incorporate betting data.
Here's a quick thought experiment that got me started on this path.
If a player is on a high-scoring offense, what's more important... their stats or the team implied total?
Hopefully, it's obvious at this point that it's 99.9% the player's usage data.
Next, take another player, on a bad offense. Again, it's all about their usage data.
Now, suppose the players swap teams. How should we handle that?
Well, we assume players will be used the same way, but their environments will change.
We can do that by modeling their usage + their prior team's strength (e.g. average implied point total) and then adjusting for their new team's implied point total.
Create a model with a bunch of players on a bunch of different teams and you've got your framework.
And yeah. That's the trick.
Just measure the difference between the player's team's implied total this week and its running average over their last 4-5 games.
That's it.
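As a sketch, the feature really is tiny. This assumes (hypothetically) that you already have the team's implied totals for recent games in chronological order; the function name is illustrative:

```python
from __future__ import annotations

from statistics import mean


def implied_total_delta(current_implied: float, recent_implied: list[float],
                        window: int = 5) -> float:
    """Current team implied total minus its average over the last `window` games.

    `recent_implied` must be oldest-first; only the last `window` entries are used.
    """
    if not recent_implied:
        raise ValueError("need at least one prior game")
    return current_implied - mean(recent_implied[-window:])


# Made-up example: a team implied for ~24.6 points lately draws a low total this week.
history = [27.0, 24.5, 23.0, 26.0, 25.5, 24.0]  # 27.0 falls outside the 5-game window
print(round(implied_total_delta(20.5, history), 2))  # → -4.1
```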
Guess what that little trick covers?
- Weather
- Defensive matchups
- QB changes
- Team injuries
- etc, etc.
The projections are still dominated by player usage data, but now the betting data actually helps FF projections.
And that wraps it up.
1. Your fantasy analyst doesn't hate you, it's just variance.
2. Use ~4-5 game averages to make *single-game* stat calculations
3. Only adjust projections further if the current implied team total is significantly different from the average of the last ~5 games
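Point 3 as a sketch: leave the usage-based projection alone unless the implied-total delta clears a threshold. The `slope` and `threshold` defaults are placeholders, not fitted values; you'd estimate the slope from your own model and pick the threshold by testing:

```python
def adjust_projection(base_projection: float, implied_delta: float,
                      slope: float = 0.4, threshold: float = 3.0) -> float:
    """Nudge a usage-based projection only when the implied-total delta is large.

    base_projection -- fantasy points projected from player usage
    implied_delta   -- current team implied total minus its recent (~5-game) average
    slope           -- points per point of implied total (placeholder; fit your own)
    threshold       -- smallest |delta| worth acting on (placeholder)
    """
    if abs(implied_delta) < threshold:
        return base_projection
    return base_projection + slope * implied_delta


print(adjust_projection(14.0, 1.5))   # small delta: projection unchanged
print(adjust_projection(14.0, -5.0))  # big drop in implied total: projection lowered
```

A hard cutoff is the simplest version of "only adjust if significantly different"; a smoother alternative would be shrinking small deltas toward zero instead of ignoring them outright.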
< Insert standard disclaimer that I'm learning and this is by no means comprehensive. Just passing along a few insights for other folks making projections >