Assessing NHL defensemen is hard.
Just ask MacT about Petry, Fayne, Nikitin, Ference, etc. On second thought, don’t ask MacT!
Anyway … when it comes to assessing players, we (hopefully) already recognize that the eye can be a notoriously unreliable measure. Our eyes tend to notice the big plays and the big mistakes, while ignoring the quietly effective (or ineffective) plays that define 80% of the time spent on ice.
A player who makes the big hit or takes a big shot from the point but is constantly hemmed in the zone might actually look better to the eye than someone who effectively shuts down zone entries time and time again, but never makes a spectacular play and occasionally gets walked. Add in recency bias and confirmation bias and you’ve got yourself a real situation.
This is exactly the kind of problem where we hope that stats, fancy or otherwise, might help us out – by looking objectively at the entire picture of a defenseman’s game, not just the highlights or lowlights.
*** HERESY WARNING ***
Unfortunately, it appears the current crop of popular fancystats don’t help a lot when it comes to assessing defensemen.
You can see this for yourself by running some simple exercises. Pick a common measure, fancy or plain (e.g. Corsi, or points/60), and pull up the list of ‘top’ defensemen by this measure.
Yeah, you’ll see some good defensemen at the top of the list.
You’ll also see a bunch of lousy ones, or at least ones that don’t belong.
Same with the middle of the list. And the bottom of the list.
Consistently inconsistent. It’s the real relation, the underlying theme.
Now, no fancystat type worth his or her salt will ever use a single stat on its own. Ever!
Rather, what we (you?) try to do is use a combination of stats that spotlight and assess specific aspects of the game, and synthesize that (in conjunction with the observations you presumably already have) to form a more complete and objective picture.
You can do this with forwards pretty easily. Take a look at a player’s boxcars and a Vollman player usage chart (showing TOI, ZS, quality of competition, and Corsi together), and you’ll have a pretty decent profile of that player to complement what you see on the ice.
But not for defensemen.
With defensemen, it doesn’t seem to work that way. With defensemen, I find myself looking at even more data than usual – in addition to boxcars and Vollmans, I’ll typically also look at HERO charts and WOWY (without you with you) data to try and get a more complete picture.
Even when you add these, which show how a defenseman does on a comparative basis on a variety of measures, and his effect on his teammates, you still don’t necessarily get a particularly clear picture.
The reason is that these are comparative measures. If the data underlying the comparisons is itself not very useful for assessing defensemen, then what?
For example, WOWY data (good sources for this info are puckalytics.com and stats.hockeyanalysis.com), which I personally find to be probably the most valuable assessment tool for defensemen, typically only tells you how they’re doing Corsi-wise with and without particular teammates.
And therein lies the conundrum.
If a defenseman has great Corsi with one specific D partner, but poor Corsi without him – and Corsi itself is a poor measure of a defenseman’s contributions … how much have you learned?
Crunching Some Numbers
I ran a little experiment to demonstrate this. Actually, no, scratch that. What I did is I ran an experiment to try and find which fancystats might be good measures for assessing defensemen. After I wrote my previous article about the fancystats mystery that was Nikita Nikitin, I figured a little investigational experimentation was in order.
The problem is that my experiment was a failure.
Because it turns out that, out of a whole slew of fancystats, only the simplest measure – time on ice – showed any real promise.
Here’s what I did.
First, I took a ranking of the ‘top 25’ defensemen in the league from a somewhat mainstream publication (Yahoo!). The only things I wanted from this ranking were ‘knowledgeable observer’ subjectivity (i.e. it didn’t mention fancystats anywhere), and recency (so that I could use 2014 season data to compare it against).
Then I collated the rankings of those 25 defensemen for a variety of different measures:
1 – all-situations TOI. The comparison list for ranking included all D who played at least 1,000 minutes in all situations last year.
2 – CF% (Corsi For %). This is a 5v5 measure of the share of shot attempts that went his team’s way while a D was on the ice. If his team generated 100 shot attempts for and gave up 90 against while he was on the ice, he’d get a 52.6% (100 divided by 190). The comparison list included all D who played at least 750 minutes at 5v5 last year.
3 – CA/60 (Corsi Against/60). This is a 5v5 shot-attempts-against metric: how many shot attempts did the team give up per hour while the player was on the ice? It would be considered a purely defensive measure. Same list criteria as above.
4 – CF% Rel – this measures a player vs his teammates: what was the balance of 5v5 shot attempts while the player was on the ice vs off the ice? For example, a +2% would mean the balance of shot attempts was 2% better with the player on the ice than with him off it. Same list criteria as above.
5 – CA/60 Rel – as above, but for shot attempts against only. Same comparison criteria: 5v5, 750 minutes.
6 – GF% – like CF%, but now only measuring goals for and against while the player was on the ice. For the people who insist that only goals matter! Again, an even strength measure, 5v5, 750 minutes.
7 – HSCA/60 – this is a war-on-ice statistic, measuring ‘high danger chances against’ per hour. In theory, we’re trying to measure defensemen who are good at keeping down the deadly chances against. Same list criteria.
8 – dCorsi, or delta Corsi. A statistic introduced by Steve Burtch that attempts to adjust Corsi so that we’re measuring how a player did relative to how an average player would have done with the same teammates and same competition. (Note: the list criteria for this one are different from those for the other data items. The ‘lab’ providing the data does not have an effective filtering mechanism, so I can’t restrict the data to defensemen who’ve played a lot of games. I’m also not entirely sure if it includes EV or all situations. It also appears to include some playoff games. So of necessity, take this list with a bigger grain of salt than usual.)
9 – Corsi Rel QoC – I added this at the suggestion of Darcy McLeod (aka “Woodguy”) of Because Oilers. He’s trying to see if this measure can be used to build a better Quality of Competition metric, especially for defensemen.
The Corsi Rel QoC data was sourced from behindthenet.ca. All of the rest of the data above was sourced from war-on-ice.com. The data for dCorsi can be found under ‘Labs’.
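For anyone who wants the arithmetic behind the shot-attempt measures spelled out, here is a minimal sketch. It assumes the ‘Rel’ versions are a plain on-ice minus off-ice difference, which is my reading of the descriptions above and not necessarily the exact formula the data sites use. The function names and the off-ice numbers are made up for illustration.

```python
def cf_pct(cf, ca):
    """Corsi For %: share of all shot attempts that went the player's team's way."""
    return 100.0 * cf / (cf + ca)


def per_60(events, minutes):
    """Convert a raw count into a per-60-minutes rate (used for CA/60)."""
    return 60.0 * events / minutes


def cf_pct_rel(on_cf, on_ca, off_cf, off_ca):
    """CF% Rel, assumed here to be on-ice CF% minus off-ice CF%."""
    return cf_pct(on_cf, on_ca) - cf_pct(off_cf, off_ca)


def ca60_rel(on_ca, on_minutes, off_ca, off_minutes):
    """CA/60 Rel, assumed here to be on-ice CA/60 minus off-ice CA/60."""
    return per_60(on_ca, on_minutes) - per_60(off_ca, off_minutes)


# The example from the list: 100 attempts for, 90 against.
print(round(cf_pct(100, 90), 1))  # 52.6

# Hypothetical off-ice totals (450 for, 450 against), purely for illustration.
print(round(cf_pct_rel(100, 90, 450, 450), 1))
```

Note that for the ‘against’ measures a lower (or more negative) number is the good one, which matters when you turn the values into rankings.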
(you can skip this part if you ain’t a numbers geek)
Because all of this data is rankings-based, a standard (Pearson) correlation measure doesn’t work well. Instead, I used what is called a Spearman rank correlation, which is specifically for measuring the similarity of rankings.
Otherwise, the interpretation of the correlation (rho) is the same: -1 is a perfect negative correlation, +1 is a perfect positive correlation, and zero is uncorrelated. The p value is the two-tailed probability of seeing a correlation at least this strong if the true rho were zero, so a small p value is evidence against the hypothesis of rho=0. I have included p values, but the toolkit I am using tells me these are not particularly reliable until you get to a large dataset (n=500), and I’m nowhere close to that, so take with a grain of salt.
(Lot of salt around here. Watch your blood pressure!)
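If you want to see what a Spearman rank correlation actually does, here is a small self-contained sketch: convert both lists to ranks (ties get the average rank), then run an ordinary Pearson correlation on the ranks. This is not the toolkit I used, just an illustration of the technique, and the five-player rankings are invented. It does not compute a p value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: a subjective ranking of five players vs. their rank by some stat.
subjective_rank = [1, 2, 3, 4, 5]
stat_rank = [2, 1, 4, 3, 5]
print(round(spearman_rho(subjective_rank, stat_rank), 3))  # 0.8
```

If you have SciPy installed, `scipy.stats.spearmanr` returns both rho and a p value in one call, which saves you the trouble.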
Here are the correlations for each of the fancystats when compared to the Yahoo! ranking.
The results should be pretty clear. The only stat that matches well with the subjective list is TOI. Otherwise … nada. Zip. Zilch. Nothing else comes even close.
Now of course, this is highly dependent on whether you think the Yahoo list itself is reasonable. Personally? It seems OK to me. Not exactly the one I would have made, but it’s certainly not bad. I deliberately chose one that I did not generate so as to avoid that bias.
A Different Look
A slightly different way of looking at the same data is to see if it is pointing you to styles of play. Style of play is particularly relevant for defensemen, where we speak of offensive and defensive and powerplay specialist defensemen.
You can target your fancystats for these categories. For example, you’d hope a defensive defenseman would limit shot attempts and so look good via CA/60, while for an offensive defenseman you’d expect good results to be related mostly to points (and maybe CF% as well).
To test that idea, I looked at each of the stats above and asked two questions. Which of the original list of 25 defensemen ranked well in that category? And how well do those results match up against my subjective assessment of their styles of play? Maybe at the very least, we can use these stats to ‘bucketize’ the D by styles of play.
But even that isn’t very promising, to be honest.
What I did is simply count how many of the list of 25 were also in the Top 30 for each statistic. Here are the results:
|Metric|Count|Defensemen on Yahoo list also in Top 30 for that Metric|
|---|---|---|
|TOI|16|Drew Doughty, Duncan Keith, Mark Giordano, Erik Karlsson, PK Subban, Shea Weber, Ryan Suter, Alex Pietrangelo, Oliver Ekman-Larsson, Kris Letang, Zdeno Chara, Roman Josi, Brent Burns, Niklas Kronwall, John Carlson, Tyler Myers|
|Points/60|12|Mark Giordano, Erik Karlsson, PK Subban, Victor Hedman, Kris Letang, Roman Josi, Brent Burns, Niklas Kronwall, Dustin Byfuglien, Tyson Barrie, Kevin Shattenkirk, John Carlson|
|CF%|7|Drew Doughty, Duncan Keith, Victor Hedman, Kris Letang, Zdeno Chara, Anton Stralman, Kevin Shattenkirk|
|CA/60|5|Drew Doughty, Victor Hedman, Anton Stralman, Niklas Kronwall, Kevin Shattenkirk|
|CF% Rel|9|Duncan Keith, Mark Giordano, Erik Karlsson, PK Subban, Oliver Ekman-Larsson, Kris Letang, Anton Stralman, Tyson Barrie, Kevin Shattenkirk|
|CA/60 Rel|5|Mark Giordano, PK Subban, Oliver Ekman-Larsson, Anton Stralman, Tyson Barrie|
|GF%|5|Victor Hedman, Shea Weber, Roman Josi, Anton Stralman, Kevin Shattenkirk|
|HSCA/60|3|Drew Doughty, Dustin Byfuglien, Kevin Shattenkirk|
|dCorsi|6|Duncan Keith, Mark Giordano, Oliver Ekman-Larsson, Kris Letang, Zdeno Chara, Anton Stralman|
|Corsi Rel QoC|11|Mark Giordano, Shea Weber, Ryan McDonagh, Oliver Ekman-Larsson, Kris Letang, Marc-Edouard Vlasic, Zdeno Chara, Roman Josi, Niklas Kronwall, John Carlson, Tyler Myers|
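The counting itself is nothing fancy. A rough sketch of the idea, with placeholder player names and made-up stat values (the function name and the `higher_is_better` switch are my own, not anything from the data sites):

```python
def top_n_overlap(subjective_list, stat_values, n=30, higher_is_better=True):
    """Return the players on the subjective list who are also in the top n by a stat.

    stat_values maps player name -> stat value for every qualifying defenseman.
    Set higher_is_better=False for 'against' stats such as CA/60, where low is good.
    """
    ranked = sorted(stat_values, key=stat_values.get, reverse=higher_is_better)
    top = set(ranked[:n])
    return [name for name in subjective_list if name in top]


# Placeholder data: five qualifying defensemen, top 3 instead of top 30.
toi_per_game = {"D1": 25.1, "D2": 22.0, "D3": 24.3, "D4": 19.5, "D5": 23.7}
subjective = ["D2", "D3", "D9"]  # D9 did not meet the minutes cutoff

matches = top_n_overlap(subjective, toi_per_game, n=3)
print(len(matches), matches)  # 1 ['D3']
```

The one thing to watch is the direction of each stat: sort TOI, Points/60 and CF% from high to low, but CA/60 and HSCA/60 from low to high, or the ‘top 30’ will be the wrong end of the list.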
TOI, as we saw, works pretty well. Points/60 also gives you an OK list for offensive defensemen, though you still see some odd inclusions and exclusions.
With the other stats, you can see the busts pretty quickly. Guys you’d expect would be good ‘box protection’ defensemen like Chara, Weber, Kronwall, and Vlasic don’t show up on the HSCA/60 list – but Shattenkirk does.
Similarly, you’d hope good defensive guys would show up in CA/60, but that’s not a particularly good list. Nor is the CF% list.
Well, I’d say it’s pretty clear. It appears that most of the fancystats we use don’t really add much value, if any, when it comes to ranking defensemen. Certainly not on their own.
This creates a few considerations we should understand:
1 – We want to take any assessment of defensemen that is fancystat based (but does not use TOI) with a bigger grain of salt than usual. Make the author demonstrate the relevance of the stat before you accept the results. I’d say a combination of subjective and objective criteria is a must.
2 – This may set some context as to why a fancystat darling like Mark Fayne has struggled since coming over from New Jersey, or why the unlamented Nikita Nikitin looked better statistically than by eye.
3 – A more subtle consequence: this might explain to some degree why the Corsi-based “Quality of Competition” metric struggles to differentiate players. For the 40% of the skaters on the ice, aka defensemen, it’s using a stat that has close to no meaning. Garbage in, garbage out. It also explains why the TOI-based QoC metric is generally regarded as superior. Because it is! TOI has relevance. But even that might not be so great, since we don’t know if TOI and forward quality correlate at a similar level as do TOI and D quality.
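To make the ‘garbage in, garbage out’ point concrete, here is a minimal sketch of how a quality of competition number can be built: a weighted average of some rating for each opponent, weighted by the ice time spent head to head. That is an assumption about the general shape of these metrics, not the exact formula any particular site uses, and all names and numbers are invented.

```python
def quality_of_competition(head_to_head_minutes, opponent_rating):
    """Ice-time-weighted average of opponents' ratings.

    head_to_head_minutes maps opponent -> minutes faced at 5v5.
    opponent_rating maps opponent -> whatever stat is used to rate him
    (a Corsi-based number, or TOI per game for a TOI-based version).
    """
    total_minutes = sum(head_to_head_minutes.values())
    if total_minutes == 0:
        raise ValueError("player has no head-to-head ice time")
    weighted = sum(
        minutes * opponent_rating[opponent]
        for opponent, minutes in head_to_head_minutes.items()
    )
    return weighted / total_minutes


# Invented example: 30 minutes against opponent X, 10 against opponent Y.
minutes_faced = {"X": 30, "Y": 10}
toi_rating = {"X": 24.0, "Y": 16.0}    # TOI per game as the opponent rating
corsi_rating = {"X": 1.0, "Y": -3.0}   # a Corsi-based opponent rating

print(quality_of_competition(minutes_faced, toi_rating))    # 22.0
print(quality_of_competition(minutes_faced, corsi_rating))  # 0.0
```

Whichever stat feeds the weighting, the output is only as meaningful as that stat is for the opponents being rated.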
This has also led to the rather (IMO) nonsensical result that many fancystaters have concluded that the importance of quality of competition is low – because the stats don’t show much value.
This is completely at odds with the actual game on the ice, where in-game coaching is all about the matchups. There isn’t a team in the league that wouldn’t jump with joy if they could constantly match their first line against the other team’s third pairing and fourth line all night. (Snarky Ed. note: Or in the case of the Oilers, all of their lines and pairings.)
The results I expect out of Nugey’s line are almost entirely dependent on who he’s facing that game.
“Quality of competition has no value” is a misguided conclusion, in other words. The flaw here is in the stat, not the results.
We need a better measure for competition before we can hope to effectively measure quality of competition.
Until then, caveat emptor.
Especially when it comes to defensemen.
Note: Keep checking Darcy’s website, because he has promised he’s going to crack this exceedingly difficult nut! 🙂
Also, if you want to recreate any of this work yourself (prove me wrong … please!), you can take a look at/use the ranking data I gathered as a starting point. Download the CSV file here.