I've noticed some new followers after some of my goals saved above expected (GSAx) data was featured last night on @Sportsnet.
For anybody who isn't entirely familiar with how expected goals (xG) works or just wants a refresher, here's a quick high-level overview:
My xG model uses various information derived from the NHL's Play-By-Play (PBP) data to determine the probability that an unblocked shot (on goal or missed) is a goal.
The info used includes but is not limited to distance & angle from the net, shot type, and game strength state.
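To make the distance and angle inputs concrete, here's a rough sketch of how they could be derived from shot coordinates. This is an illustration, not my actual feature code. It assumes PBP-style coordinates in feet with center ice at (0, 0) and the goal line at x = 89, and it assumes every shot is aimed at the net on the same side of center as the shot location. The function name and constant are made up for the example.

```python
import math

# Assumption: coordinates in feet, center ice at (0, 0), goal line at x = +/-89.
GOAL_LINE_X = 89.0


def shot_distance_and_angle(x, y):
    """Return (distance in feet, angle in degrees) from a shot location to the net.

    Simplifying assumption: the shot targets the net on the same side of
    center ice as the shot itself, so we can fold the rink with abs(x).
    """
    dx = GOAL_LINE_X - abs(x)   # feet out from the goal line (negative if behind the net)
    distance = math.hypot(dx, y)
    # 0 degrees = straight out in front of the net; bigger = sharper angle.
    angle = math.degrees(math.atan2(abs(y), dx))
    return distance, angle


# A shot 30 feet out and 40 feet off the center line.
print(shot_distance_and_angle(59.0, 40.0))
```

A real model would feed these two numbers in alongside shot type, strength state, and the rest of the PBP-derived info.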
Why include missed shots? Because forcing shots wide is a skill that goalies demonstrate by taking up more of the net and playing their angles well, while hitting the net is a skill that shooters possess by shooting more accurately. Both should be credited for their proficiency.
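Here's a minimal sketch of how GSAx falls out of this: add up the xG on every unblocked shot a goalie faced (misses included) and subtract the goals they allowed. The shot values below are invented for illustration, and the function name is hypothetical.

```python
def goals_saved_above_expected(shots):
    """shots: list of (xg, is_goal) pairs for every unblocked shot faced.

    Missed shots are included with is_goal = 0, so a goalie who forces
    shots wide still gets credit for the xG on those attempts.
    """
    expected_goals = sum(xg for xg, _ in shots)
    goals_allowed = sum(is_goal for _, is_goal in shots)
    return expected_goals - goals_allowed


# Made-up example: five unblocked shots, one of which went in.
example_shots = [(0.05, 0), (0.30, 1), (0.10, 0), (0.02, 0), (0.45, 0)]
print(round(goals_saved_above_expected(example_shots), 2))
```

A positive number means the goalie allowed fewer goals than the model expected; a negative number means more.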
I built my model using extreme gradient boosting, an efficient machine learning technique.
Basically, I showed my computer 3 years of shots, told it which were goals, and taught it to identify patterns that lead to goals. Now it's guessing which of this year's shots will score.
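If you're curious what "boosting" means mechanically, here's a toy sketch. To be clear about the assumptions: this is plain gradient boosting with one-split trees (stumps) on a single feature, written from scratch so it runs without any libraries. It is not XGBoost and not my model, which uses many more features. The shot data, function names, and settings are all invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the threshold split on x that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left value, right value)


def fit_boosted(xs, ys, rounds=50, lr=0.3):
    """Each round fits a stump to what the current model still gets wrong."""
    base_rate = sum(ys) / len(ys)
    f0 = math.log(base_rate / (1.0 - base_rate))  # start from the overall goal rate
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        scores = [s + lr * (lv if x <= t else rv) for x, s in zip(xs, scores)]
    return f0, stumps, lr


def predict(model, x):
    """Return the predicted goal probability for a shot from distance x."""
    f0, stumps, lr = model
    score = f0 + sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)
    return sigmoid(score)


# Invented training data: shot distance in feet, and whether it scored.
distances = [5, 8, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]
goals = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0]
model = fit_boosted(distances, goals)
print(predict(model, 5), predict(model, 50))
```

The idea carries over to the real thing: lots of small, simple trees, each one correcting the mistakes of the ones before it.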
How well is it guessing? I'd say damn well, considering the limitations. The PBP data is notoriously inaccurate and missing key info like passes and traffic.
Despite these limitations, the area under curve (AUC) for the model is 0.785 at all strengths and 0.797 at even strength.
A model with an AUC between 0.7 and 0.8 is considered "fair" and an AUC over 0.8 is considered "good." So, these values are strong enough that I'm comfortable using xG as a descriptive measure of how teams control quality shots and how shooters & goalies contribute to them scoring.
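For anyone wondering what AUC actually measures: it's the probability that a randomly chosen goal was given a higher xG than a randomly chosen non-goal. Here's a small sketch that computes it by brute force over every goal/non-goal pair. The labels and xG values are invented for illustration, and the function name is hypothetical; this is not how I scored my model, just the definition written out.

```python
def auc(labels, scores):
    """Share of (goal, non-goal) pairs where the goal got the higher score.

    Ties count as half a win. An AUC of 0.5 is a coin flip; 1.0 is perfect ranking.
    """
    goal_scores = [s for label, s in zip(labels, scores) if label == 1]
    miss_scores = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for g in goal_scores:
        for m in miss_scores:
            if g > m:
                wins += 1.0
            elif g == m:
                wins += 0.5
    return wins / (len(goal_scores) * len(miss_scores))


# Made-up example: 1 = goal, 0 = no goal, paired with the model's xG for each shot.
labels = [1, 0, 1, 0, 0]
xg_values = [0.30, 0.05, 0.08, 0.10, 0.02]
print(auc(labels, xg_values))
```

In this example the model ranks the goal above the non-goal in 5 of the 6 possible pairs.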
Want data from my xG model? My Tableau contains team, goalie, and skater level data and visualizations. It's updated nightly.
@JFreshHockey's Patreon also contains some slick, exclusive visualizations with my xG data that make things easy to understand. https://public.tableau.com/profile/topdownhockey#!/
Looking for a more in-depth writeup on expected goals and my model in particular? Here's my full write-up that I put out when I dropped the model.
Any other questions about xG and hockey analytics in general? My DMs are open and I'm happy to clarify. https://topdownhockey.medium.com/a-new-expected-goal-model-that-is-better-than-corsi-at-predicting-future-goals-ecfa44dc84e9