I used data to find Arsenal an alternative to Willian… (THREAD)
It’s well documented that Willian is getting on: he turns 32 on Sunday, and giving a big-money contract to an ageing footballer is never ideal. With Özil now permanently absent for whatever reason, the onus has been squarely on Nico Pépé to create.
@Arsenal / @m8arteta has identified the need for an experienced head to help tutor the likes of Saka, Nelson, Smith-Rowe, Willock and Martinelli without permanently blocking their path to the first team. It seems sensible, but are there better alternatives?
The aim is to explore Europe’s top 5 leagues using a combination of two statistical methods, Principal Component Analysis (PCA) and K-means clustering, to find players who share similar statistical traits with Willian.
I appreciate that not every aspect of football is statistically measurable, and that looking at individual stats within a team sport only gives a fragmented view of the bigger picture (not to mention the issues of comparing stats across different leagues), but I’m all in anyway.
PCA lets you condense a large set of stats into a small number of main components. Here, that means taking all the stats from @fbref (I’ve excluded GKs, dead-ball metrics and anyone not lucky enough to play 1,000 minutes) and extracting 2 new "super-components", each a weighted combination of all the original stats.
These 2 principal components are the two combinations of the stats that capture as much of the variation between players as possible, i.e. the best 2-metric summary of how players differ from one another.
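For anyone curious about the mechanics, here is a minimal, standard-library-only sketch of the filtering and PCA steps. It is not the exact pipeline behind this thread: the player fields ("position", "minutes", "stats") are hypothetical, dead-ball metrics are assumed to be dropped from "stats" beforehand, and power iteration with deflation is used in place of a library eigen-solver.

```python
import math
import random

# Hypothetical row shape: {"name": ..., "position": ..., "minutes": ..., "stats": [per-90 values]}
# Dead-ball metrics are assumed to have been left out of "stats" already.
def eligible(players, min_minutes=1000):
    """Drop goalkeepers and anyone below the minutes threshold."""
    return [p for p in players if p["position"] != "GK" and p["minutes"] >= min_minutes]

def standardise(rows):
    """Z-score each column so stats on different scales are comparable."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    # A constant column has zero spread; map it to 0 instead of dividing by zero.
    return [[(r[j] - means[j]) / sds[j] if sds[j] else 0.0 for j in range(d)] for r in rows]

def covariance(z):
    """Covariance matrix of already-centred data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]

def top_eigenpair(m, iters=1000, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    d = len(m)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v

def pca_2d(rows):
    """Project each player's stats onto the first two principal components."""
    z = standardise(rows)
    cov = covariance(z)
    d = len(cov)
    components = []
    for _ in range(2):
        lam, v = top_eigenpair(cov)
        components.append(v)
        # Deflate: subtract the component just found so the next pass finds the runner-up.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # The sign of each axis is arbitrary, so "top" vs "bottom" on a plot depends on it.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in z]
```

Usage would look something like `coords = pca_2d([p["stats"] for p in eligible(players)])`, giving one (x, y) point per player for the plot. Standardising first matters: without it, high-volume stats like passes would dominate the components simply because their numbers are bigger.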
For anyone who wasn’t at the 2015 @OptaPro presentation (kudos to Will Gürpınar-Morgan): each point on the plot represents a player.
Players at the bottom score highly in defence. Players on the right do well in involvement stats. The higher a player sits, the better their attacking output. On the left are those who do well in the air. I’ve annotated a few players to hammer home the point.
Willian finds himself on the creative side but sits fairly low for an attack-minded player, suggesting there are some defensive qualities in his game too.
The goal of clustering is to split the data into groups (clusters) of players who share similar statistical traits. Given the need for experience, I only looked at players aged 27 and over, and ideally wanted those with PL pedigree.
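A rough sketch of the clustering step, again standard library only. Several assumptions here are mine, not from the thread: K-means runs on the 2D PCA coordinates, k=8 is a hypothetical choice (the thread doesn't say how many clusters were used), the age filter is applied to the target's cluster-mates after clustering, and a deterministic farthest-first seeding stands in for the usual k-means++ initialisation. The "name" and "age" fields are hypothetical.

```python
import math
import random

def farthest_first(points, k, seed=0):
    """Deterministic seeding (a simple stand-in for k-means++): spread the starting centroids out."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Pick the point furthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns one cluster label per point."""
    centroids = farthest_first(points, k, seed)
    labels = None
    for _ in range(iters):
        # Assignment step: each player joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # nothing moved, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_mates(players, coords, target, k=8, min_age=27):
    """Players in the target's cluster who meet the age filter (k=8 is a hypothetical choice)."""
    labels = kmeans(coords, k)
    names = [p["name"] for p in players]
    target_label = labels[names.index(target)]
    return [p["name"] for p, lab in zip(players, labels)
            if lab == target_label and p["name"] != target and p["age"] >= min_age]
```

Something like `cluster_mates(pool, coords, "Willian")` would then return the shortlist before any human judgement is applied. K-means is sensitive to its starting centroids, which is why the seeding is spelled out rather than left to chance.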
The players the algorithm spat out were Groß, Hazard, Lamela, Özil and Mkhitaryan. All things considered, only Lamela made the shortlist. I ventured outside the PL, adding a couple more from the cluster who I believe are good enough to improve Arsenal: Perišić and Insigne.
The 5 metrics used are the primary features of Willian’s cluster, and he's up there with the elite in all of them.
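The thread doesn't spell out how the "primary features" were picked, so here is one plausible reading as a hedged sketch: rank each stat by its average z-score within the cluster and keep the top 5, then express a player's standing in each as a percentile of the whole pool. Both function names are hypothetical.

```python
def primary_features(z_rows, labels, cluster, feature_names, top=5):
    """Features whose mean z-score is highest within one cluster (one reading of 'primary')."""
    members = [r for r, lab in zip(z_rows, labels) if lab == cluster]
    means = [sum(col) / len(members) for col in zip(*members)]
    # Highest cluster average first; ties keep their original column order.
    ranked = sorted(range(len(feature_names)), key=lambda j: means[j], reverse=True)
    return [feature_names[j] for j in ranked[:top]]

def percentile_rank(values, x):
    """Percentage of the pool whose value is less than or equal to x."""
    return 100.0 * sum(v <= x for v in values) / len(values)
```

"Hanging with the elites" would then translate to high percentile ranks (say, 90+) in each of the selected metrics, though any cut-off is a judgement call.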
Finally, I dropped the stragglers, Perišić and Lamela, and compared Willian against Insigne (valued at £43.20m on Transfermarkt).
A lot has been made of paying @willianborges88 £100k+ a week, but when you consider the average age of the academy products and the fact that bringing in anyone similar would demand a transfer fee, @Arsenal could do a lot worse if they're looking for a similar alternative.
Thanks to @biscuitchaser who answered all my PCA questions. I had given a similar presentation with @JoshPandz and @PeteWarwick and wanted the fruits of my labour to be online.
@AFHStewart excuse the @, but it's data-driven scouting so (kind of) relevant...
You can follow @jonollington.