Yes, even the machines want Connor McJesus on their side! Connor vs Connor! Cano a cano?
Wait, let me rewind a bit.
It’s a foregone conclusion the Oilers draft McDavid, right? The big question of course is: just how good is he?
Traditionally, the way to estimate this is using NHLE (NHL Equivalencies, a la Desjardins and others). Using an NHLE of 0.3 from CHL to NHL, that puts McDavid at about 62 to 63 points over a full season.
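For the curious, the arithmetic behind that figure is short. Here is a minimal sketch, assuming an 82-game season and McDavid's pre-draft rates of 0.936 goals and 1.617 assists per game (the same figures used in the tables further down). The function name is just for illustration.

```python
# NHLE projection: junior points per game, scaled by the league
# translation factor, then stretched over a full NHL schedule.
NHL_GAMES = 82      # assumed full-season length
CHL_TO_NHL = 0.3    # NHLE factor quoted above


def nhle_points(junior_ppg, factor=CHL_TO_NHL, games=NHL_GAMES):
    """Projected NHL points over a full season (illustrative helper)."""
    return junior_ppg * factor * games


mcdavid_ppg = 0.936 + 1.617   # goals/game + assists/game
projection = nhle_points(mcdavid_ppg)
print(f"{projection:.1f}")    # lands in the 62 to 63 range
```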
But those NHLE are for the entire league, and history suggests they work well for the average player but underestimate elite players.
I took a swing at estimating an NHLE for elite players a while back, using the pretty much standard approach of multiple regression. Because I restricted this to elite players drafted in the top 6 from the CHL in the last 10 years, small sample size (only 17 players) was a significant problem. Nonetheless, the results arguably made more sense, putting McDavid closer to 72 to 74 points.
But still. Sample size!
As I mulled the issue, I wondered if there were any machine learning algorithms that might be worth trying.
And indeed, after a bit of digging, there is an algorithm that at least looks like it applies and will produce some interesting results. It’s called a support vector machine, SVM for short.
The way this technique works is completely different from multiple regression. Rather than trying to fit a model to the data, the SVM takes the parameters of your source data and creates a “space” out of them. With only two parameters, that space is a conceptual plane. With three, it is an ordinary 3D space. With more than three, you are creating a conceptual hyperspace. Cool, huh!
What the SVM then does is it takes your source data, also called training data, puts it in that hyperspace, and then divides the hyperspace up into sections (forming complex multidimensional ‘surfaces’) that neatly form spaces around each one of those training points.
Once you’ve done this, you can feed new data (a player) into it, and rather than trying to make a prediction based on that new data, the SVM determines which pre-existing point your new data is closest to and returns that.
In other words, it’s a sophisticated way of doing what we humans do all the time – generating player comparables!
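To make the "closest comparable" idea concrete, here is a stripped-down sketch. To be clear, it is not an SVM: it is a plain nearest-neighbour lookup, which is the simplest version of the behaviour described above. The three comparables and their numbers are invented for illustration (they are not the 17 players in the real dataset), and the function name is hypothetical.

```python
import math

# Hypothetical comparables: name -> (goals/game, assists/game).
# These values are made up for illustration only.
COMPARABLES = {
    "Comparable A": (0.90, 1.60),
    "Comparable B": (0.65, 1.25),
    "Comparable C": (0.50, 0.45),
}


def closest_comparable(player, comparables=COMPARABLES):
    """Return the name of the comparable nearest to `player`,
    measured by straight-line (Euclidean) distance."""
    return min(comparables, key=lambda name: math.dist(player, comparables[name]))


# Prospect numbers taken from the tables below in this post.
print(closest_comparable((0.936, 1.617)))  # McDavid
print(closest_comparable((0.518, 0.393)))  # Crouse
```

Adding age as a third parameter just means each tuple gets a third number. At that point it may be worth rescaling the columns so that no single parameter dominates the distance.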
The cool thing is that an SVM will work even when the dataset is smaller than the dimensionality of the data.
For this SVM, I used the same dataset of 17 elite players as I used previously – these 17 then become the comparables. Then I trained two different SVMs: one with two parameters (gpg, apg), and one with three parameters (gpg, apg, age). Lastly, I fed in five players of interest to each of these SVMs to get a prediction (or more properly, a classification).
The results were certainly “interesting”!
Here’s what the SVM comes up with for similarity using just goals per game and assists per game as the data parameters:
SVM Classification Results – goals and assists only:

| Player | Goals/game | Assists/game | NHL pts/game | Closest comparable |
|---|---|---|---|---|
| Connor McDavid | 0.936 | 1.617 | 1.259 | Sidney Crosby |
| Dylan Strome | 0.662 | 1.235 | 0.646 | Taylor Hall |
| Lawson Crouse | 0.518 | 0.393 | 0.519 | Jordan Staal |
| Mitch Marner | 0.698 | 1.302 | 0.646 | Taylor Hall |
| Pavel Zacha | 0.432 | 0.486 | 0.519 | Jordan Staal |
Connor and Sid. Makes sense. As for the rest … hmmmm.
Now bear in mind, this classifier is NOT telling you that Dylan Strome is going to be the next Taylor Hall. What it tells you is that, of the 17 players in the comparables list, Dylan Strome’s pre-draft numbers are closest to Taylor Hall’s pre-draft numbers.
Interesting that these five players gravitate towards just three different season archetypes: Crosby, Hall, and J. Staal.
The second SVM adds in age as a parameter, and it gets even more interesting:
SVM Classification Results – goals, assists, and age:

| Player | Goals/game | Assists/game | Age | NHL pts/game | Closest comparable |
|---|---|---|---|---|---|
| Connor McDavid | 0.936 | 1.617 | 18.4 | 0.878 | Patrick Kane |
| Dylan Strome | 0.662 | 1.235 | 18.2 | 0.562 | Alex Galchenyuk |
| Lawson Crouse | 0.518 | 0.393 | 17.9 | 0.519 | Jordan Staal |
| Mitch Marner | 0.698 | 1.302 | 18.1 | 0.839 | Ryan Nugent-Hopkins |
| Pavel Zacha | 0.432 | 0.486 | 18.2 | 0.679 | Matt Duchene |
Note: the age parameter is the age of the player on June 1 of their draft year.
Oooooh. McJesus now looks less like Sid and more like Patrick Kane.
While Patrick Kane is hardly a slouch at hockeying … should we be concerned? I don’t think so.
Crosby’s data point is way old, and from a different scoring era. If you run Crosby through the regression model I built earlier, he comes in at 77 points. And in this era – I bet that would be bang on.
The SVM model puts Connor at about 72 points in a full season, and I suspect that may be about right.
And make no mistake, in this day and age, that will be an incredible rookie season.
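The 72-point figure is just a per-game rate stretched over a full schedule. Here is a quick sketch, assuming an 82-game season and assuming the numeric column just before each comparable's name in the goals/assists/age table is NHL points per game (a reading that is consistent with the 72 figure).

```python
NHL_GAMES = 82  # assumed full-season length

# Prospect -> per-game rate from the goals/assists/age table above.
svm_rate = {
    "Connor McDavid": 0.878,
    "Dylan Strome": 0.562,
    "Lawson Crouse": 0.519,
    "Mitch Marner": 0.839,
    "Pavel Zacha": 0.679,
}

# Stretch each per-game rate over a full season.
full_season = {name: rate * NHL_GAMES for name, rate in svm_rate.items()}

for name, points in full_season.items():
    print(f"{name}: {points:.0f} points")
```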
Other than that, there isn’t much I’d read from this prediction, for him and the others. It was more for fun. I will continue to keep a lookout for where other machine learning algorithms can be applied. Hmm, maybe a neural network!
In any case, there is one thing we can be sure the machines are saying to Connor (McDavid, not John or Sarah):