Assessing NHL defensemen is hard.
Just ask MacT about Petry, Fayne, Nikitin, Ference, etc. On second thought, don’t ask MacT!
Anyway … when it comes to assessing players, we (hopefully) already recognize that the eye can be a notoriously unreliable measure. Our eyes tend to notice the big plays and the big mistakes, while ignoring the quietly effective (or ineffective) plays that define 80% of the time spent on ice.
A player who makes the big hit or takes a big shot from the point but is constantly hemmed in the zone might actually look better to the eye than someone who effectively shuts down zone entries time and time again, but never makes a spectacular play and occasionally gets walked. Add in recency bias and confirmation bias and you’ve got yourself a real situation.
This is exactly the kind of problem where we hope that stats, fancy or otherwise, might help us out – by looking objectively at the entire picture of a defenseman’s game, not just the highlights or lowlights.
*** HERESY WARNING ***
Unfortunately, it appears the current crop of popular fancystats don’t help a lot when it comes to assessing defensemen.
You can see this for yourself by running some simple exercises. Pick a common measure, fancy or plain (e.g. Corsi, or points/60), and pull up the list of ‘top’ defensemen by this measure.
Yeah, you’ll see some good defensemen at the top of the list.
You’ll also see a bunch of lousy ones, or at least ones that don’t belong.
Same with the middle of the list. And the bottom of the list.
Consistently inconsistent. That's the underlying theme.
Now, no fancystat type worth his or her salt will ever use a single stat on its own. Ever!
Rather, what we (you?) try to do is use a combination of stats that spotlight and assess specific aspects of the game, and synthesize that (in conjunction with the observations you presumably already have) to form a more complete and objective picture.
You can do this with forwards pretty easily. Take a look at a player's boxcars and a Vollman player usage chart (showing TOI, ZS, quality of competition, and Corsi together), and you'll have a pretty decent profile of that player to complement what you see on the ice.
But not for defensemen.
With defensemen, it doesn’t seem to work that way. With defensemen, I find myself looking at even more data than usual – in addition to boxcars and Vollmans, I’ll typically also look at HERO charts and WOWY (with or without you) data to try and get a more complete picture.
Even when you add these, which show how a defenseman does on a comparative basis on a variety of measures, and his effect on his teammates, you still don’t necessarily get a particularly clear picture.
And the reason is because these are comparative measures – and if the data underlying the comparisons is itself not very useful for assessing defensemen, then what?
For example, WOWY data (a good source for this is puckalytics.com or stats.hockeyanalysis.com), which I personally find probably the most valuable assessment tool for defensemen, typically only tells you how a defenseman is doing Corsi-wise with and without teammates.
And therein lies the conundrum.
If a defenseman has great Corsi with one specific D partner, but poor Corsi without him – and Corsi itself is a poor measure of a defenseman’s contributions … how much have you learned?
Crunching Some Numbers
I ran a little experiment to demonstrate this. Actually, no, scratch that. What I actually did was run an experiment to try to find which fancystats might be good measures for assessing defensemen. After I wrote my previous article about the fancystats mystery that was Nikita Nikitin, I figured a little investigational experimentation was in order.
The problem is that my experiment was a failure.
Because it turns out that, out of a whole slew of fancystats, only the simplest measure – time on ice – showed any real promise.
Here’s what I did.
First, I took a ranking of the ‘top 25’ defensemen in the league from a somewhat mainstream publication. The only things I wanted from this ranking were ‘knowledgeable observer’ subjectivity (i.e. it didn’t mention fancystats anywhere), and recency (so that I could use 2014 season data to compare it against).
Then I collated the rankings of those 25 defensemen for a variety of different measures:
1 – all-situations TOI. The comparison list for ranking included all D who played at least 1,000 minutes in all situations last year.
2 – CF% (Corsi For %). This is a 5v5 measure of how many shot attempts a D was on the ice for vs against. If his team generated 100 shots for and gave up 90 shots against while he was on the ice, he’d get a 52.6% (100 divided by 190). The comparison list included all D who played at least 750 minutes at 5v5 last year.
3 – CA60 (Corsi Against/60). This is a 5v5 shots against metric – how many shot attempts did the team give up per hour while the player was on the ice. Would be considered a purely defensive measure. Same list criteria as above.
4 – CF% Rel – this measures a player vs his teammates: the balance of 5v5 shot attempts while the player was on the ice vs off the ice. For example, a +2% would mean the balance of shots was 2% better with the player on the ice than with him off it. Same list criteria as above.
5 – CA60 Rel – as above, but for shot attempts against only. Same comparison criteria: 5v5, 750 minutes.
6 – GF% – like CF%, but now only measuring goals for and against while the player was on the ice. For the people who insist that only goals matter! Again, an even strength measure, 5v5, 750 minutes.
7 – HSCA/60 – this is a war-on-ice statistic, measuring ‘high danger chances against’ per hour. In theory, we’re trying to measure defensemen who are good at keeping down the deadly chances against. Same list criteria.
8 – dCorsi, or delta Corsi. A statistic introduced by Steve Burtch that attempts to adjust Corsi so that we’re measuring how a player would have done relative to an average player with the same teammates and same competition. (Note: the list criteria for this is different than the other data items. The ‘lab’ providing the data does not have an effective filtering mechanism, so I can’t restrict the data to defensemen who’ve played a lot of games. I’m also not entirely sure if it includes EV or all situations. It also appears to include some playoff games. So of necessity, take this list with a bigger grain of salt than usual.)
9 – Corsi Rel QoC – I added this at the suggestion of Darcy McLeod (aka “Woodguy”) of Because Oilers. He’s trying to see if this measure can be used to build a better Quality of Competition metric, especially for defensemen.
The CorRelQoC data was sourced from behindthenet.ca. All of the rest of the data above was sourced from war-on-ice.com. The data for dCorsi can be found under ‘Labs’.
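For concreteness, here's a minimal sketch (in Python, with the made-up numbers from the CF% description above) of how the basic Corsi measures are computed. The function names are mine, not anything from war-on-ice or behindthenet:

```python
def cf_pct(cf, ca):
    """CF%: the share of on-ice shot attempts taken by the player's team."""
    return 100.0 * cf / (cf + ca)

def ca60(ca, toi_minutes):
    """CA60: shot attempts allowed per 60 minutes of ice time."""
    return ca * 60.0 / toi_minutes

def cf_pct_rel(on_cf, on_ca, off_cf, off_ca):
    """CF% Rel: on-ice CF% minus the team's CF% with the player off the ice."""
    return cf_pct(on_cf, on_ca) - cf_pct(off_cf, off_ca)

# The example from the CF% description: 100 attempts for, 90 against
print(round(cf_pct(100, 90), 1))  # 52.6
```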
(you can skip this part if you ain’t a numbers geek)
Because all of this data is rankings-based, a standard (Pearson) correlation measure doesn’t work well. Instead, I used what is called a Spearman rank correlation, which is specifically for measuring the similarity of rankings.
Otherwise, the interpretation of the correlation (rho) is the same: -1 is a perfect negative correlation, +1 is a perfect positive correlation, and zero is uncorrelated. The p value gives the two-tailed significance for the null hypothesis that rho = 0. I have included p values, but the toolkit I am using tells me these are not particularly reliable until you get to a large dataset (n=500), and I'm nowhere close to that, so take them with a grain of salt.
(Lot of salt around here. Watch your blood pressure!)
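For the curious: when the rankings have no ties, Spearman's rho reduces to a simple closed form. A quick sketch in Python, using toy rankings rather than the actual defensemen data:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings with no ties:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the per-item difference in rank."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Identical rankings correlate perfectly; reversed ones perfectly negatively.
print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```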
Here are the correlations for each of the fancystats when compared to the Yahoo! ranking.
The results should be pretty clear. The only stat that matches well with the subjective list is TOI. Otherwise … nada. Zip. Zilch. Nothing else comes even close.
Now of course, this is highly dependent on whether you think the Yahoo list itself is reasonable. Personally? It seems OK to me. Not exactly the one I would have made, but it’s certainly not bad. I deliberately chose one that I did not generate so as to avoid that bias.
A Different Look
A slightly different way of looking at the same data is to see if it is pointing you to styles of play. Style of play is particularly relevant for defensemen, where we speak of offensive and defensive and powerplay specialist defensemen.
You can target your fancystats for these categories. For example, you'd hope a defensive defenseman would limit shot attempts and so look good via CA/60, while for an offensive defenseman you'd expect good results to be related mostly to points (and maybe CF% as well).
To test that idea, I looked at each of the stats above and asked the question, which of the defensemen ranked well in that category out of the original list of 25? And how well do those results match up against my subjective assessment of their styles of play? Maybe at the very least, we can use these stats to ‘bucketize’ the D by styles of play.
But even that isn’t very promising, to be honest.
What I did is just figure out how many of the list of 25 were also in the Top 30 for any particular statistic. Here are the results:
| Metric | Count | Defensemen on Yahoo list also in Top 30 for that Metric |
| --- | --- | --- |
| TOI | 16 | Drew Doughty, Duncan Keith, Mark Giordano, Erik Karlsson, PK Subban, Shea Weber, Ryan Suter, Alex Pietrangelo, Oliver Ekman-Larsson, Kris Letang, Zdeno Chara, Roman Josi, Brent Burns, Niklas Kronwall, John Carlson, Tyler Myers |
| P60 | 12 | Mark Giordano, Erik Karlsson, PK Subban, Victor Hedman, Kris Letang, Roman Josi, Brent Burns, Niklas Kronwall, Dustin Byfuglien, Tyson Barrie, Kevin Shattenkirk, John Carlson |
| CF% | 7 | Drew Doughty, Duncan Keith, Victor Hedman, Kris Letang, Zdeno Chara, Anton Stralman, Kevin Shattenkirk |
| CA/60 | 5 | Drew Doughty, Victor Hedman, Anton Stralman, Niklas Kronwall, Kevin Shattenkirk |
| CF% Rel | 9 | Duncan Keith, Mark Giordano, Erik Karlsson, PK Subban, Oliver Ekman-Larsson, Kris Letang, Anton Stralman, Tyson Barrie, Kevin Shattenkirk |
| CA/60 Rel | 5 | Mark Giordano, PK Subban, Oliver Ekman-Larsson, Anton Stralman, Tyson Barrie |
| GF% | 5 | Victor Hedman, Shea Weber, Roman Josi, Anton Stralman, Kevin Shattenkirk |
| HSCA/60 | 3 | Drew Doughty, Dustin Byfuglien, Kevin Shattenkirk |
| dCorsi | 6 | Duncan Keith, Mark Giordano, Oliver Ekman-Larsson, Kris Letang, Zdeno Chara, Anton Stralman |
| Corsi Rel QoC | 11 | Mark Giordano, Shea Weber, Ryan McDonagh, Oliver Ekman-Larsson, Kris Letang, Marc-Edouard Vlasic, Zdeno Chara, Roman Josi, Niklas Kronwall, John Carlson, Tyler Myers |
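The counts in that table are just set intersections. A sketch of the bookkeeping, with deliberately truncated lists for illustration (the real inputs were the full 25-name and Top-30 lists):

```python
def overlap(yahoo_list, metric_top30):
    """Count and name the defensemen on the subjective list who also
    crack a metric's Top 30."""
    common = sorted(set(yahoo_list) & set(metric_top30))
    return len(common), common

# Truncated example lists, for illustration only
yahoo = ["Drew Doughty", "Duncan Keith", "Kevin Shattenkirk"]
hsca_top30 = ["Drew Doughty", "Kevin Shattenkirk", "Dustin Byfuglien"]
print(overlap(yahoo, hsca_top30))  # (2, ['Drew Doughty', 'Kevin Shattenkirk'])
```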
TOI as we saw works pretty well. Points/60 also gives you an OK list for offensive defensemen, though you still see some odd inclusions and exclusions.
With the other stats, you can see the busts pretty quickly. Guys you’d expect would be good ‘box protection’ defensemen like Chara, Weber, Kronwall, and Vlasic don’t show up on the HSCA/60 list – but Shattenkirk does.
Similarly, you’d hope good defensive guys would show up in CA/60, but that’s not a particularly good list. Nor is the CF% list.
Well, I’d say it’s pretty clear. It appears that most of the fancystats we use don’t really add much value, if any, when it comes to ranking defensemen. Certainly not on their own.
This creates a few considerations we should understand:
1 – We want to take any assessment of defensemen that is fancystat based (but does not use TOI) with a bigger grain of salt than usual. Make the author demonstrate the relevance of the stat before you accept the results. I’d say a combination of subjective and objective criteria is a must.
2 – This may set some context as to why a fancystat darling like Mark Fayne has struggled since coming over from New Jersey, or why the unlamented Nikita Nikitin looked better statistically than by eye.
3 – A more subtle consequence: this might explain to some degree why the Corsi-based “Quality of Competition” metric struggles to differentiate players. For the 40% of the skaters on the ice, aka defensemen, it’s using a stat that has close to no meaning. Garbage in, garbage out. It also explains why the TOI-based QoC metric is generally regarded as superior. Because it is! TOI has relevance. But even that might not be so great, since we don’t know if TOI and forward quality correlate at a similar level as do TOI and D quality.
This has also led to the rather (IMO) nonsensical result that many fancystaters have concluded that the importance of quality of competition is low – because the stats don’t show much value.
This is completely at odds with the actual game on the ice, where in-game coaching is all about the matchups. There isn't a team in the league that wouldn't jump for joy if they could constantly match their first line against the other team's 3rd pairing and fourth line all night. (Snarky Ed. note: Or in the case of the Oilers, all of their lines and pairings).
The results I expect out of Nugey’s line are almost entirely dependent on who he’s facing that game.
“Quality of competition has no value” is a misguided conclusion, in other words. The flaw here is in the stat, not the results.
We need a better measure for competition before we can hope to effectively measure quality of competition.
Until then, caveat emptor.
Especially when it comes to defensemen.
Note: Keep checking Darcy’s website, because he has promised he’s going to crack this exceedingly difficult nut! 🙂
Also, if you want to recreate any of this work yourself (prove me wrong … please!), you can take a look at/use the ranking data I gathered as a starting point. Download the CSV file here.
23 thoughts on “Fancystats and defensemen – an uneasy combination”
So. Schultz’s relatively high TOI should mark him as likely the Oilers’ best D?
Well, there’s two ways you can look at it.
First, even TOI has only about a 60% correlation with the list of ‘top defensemen’ … that’s a lot of room to maneuver. So you can certainly assume Schultz to be part of the ‘other’ group.
Alternatively, if you look at last night's game, Schultz was once again the TOI leader on D. That makes four straight coaches who have played Schultz more than anyone else. We have to believe that *at least one* of those coaches was competent.
So sadly, we may indeed have to accept that Schultz is the Oilers’ best D. Doesn’t mean he’s good. Just that he’s the best the Oilers have got. And that explains a lot.
When it comes to the “defensive” stats like CA/60, perhaps we are falling into the trap that Sutter pointed out. The defensemen in that list are very “offensive” AND have a lot of TOI. Perhaps it is because they are attacking so much that it limits the chances against that they see. (Also perhaps a function of ZS).
You bet. Ultimately, that’s what speaks to one of my concluding points, which is that what defensemen do is hard to categorize and hard to describe, and even when we think we’re isolating a specific aspect of the game (“CA/60 for defensive prowess”), it seems like we aren’t. So: “I’d say a combination of subjective and objective criteria is a must.”
Great work G.
IMO the fly in the ointment when using fancystats on defensemen is that all defensemen are on the ice when many goals get scored and it skews all the numbers.
There’s much less logging of a clear success-or-failure outcome than with other positions – unlike batters and pitchers in baseball, all defensemen fail to keep the puck out of the net.
I agree. This is one of the reasons I use WOWY (with or without you) data, as it shows how defensive tandems work with each other. I think the more players on the ice you are looking at, the greater the validity of what Corsi (and other shot metrics) tell you.
I’m also working on scripts, and likely starting in a couple of weeks, I will be publishing, after every game, the shot metrics (including distance- and type-adjusted shot metrics, i.e. how dangerous they were) for every forward line and defense pairing.
The fact TOI correlates with good defensemen is not overly surprising. The problem is, TOI does not create a good defenseman. A good defenseman is given TOI. So you can only use TOI to identify defensemen that have already been identified as good. Not useful at all.
I would be interested in looking at these defensemen prior to them being in the "Top 25", or when they had less TOI per game, and seeing what made them different from the other dmen around them.
For example, perhaps Fayne and… Shattenkirk had similar TOI 3 or 4 years ago. One has jumped up into the top 25. One…. has not. Why? What was different between those two players when their TOI was similar, that could have indicated that one would go one way, and one would go another?
Putting players into buckets based on TOI, and seeing what happens to their TOI in the coming years would tell a story (some players go from 15:00 to 20:00, and some go from 15:00 to 10:00).
This type of analysis would tell you
1. What players could we trade for, and give 20:00 a night, and see success?
2. What players currently at 20 minutes will probably see a decline in minutes in the coming years, because of age, or other factors (ex. Perhaps Chara would stand out as a 20:00 minute guy that is destined to drop in TOI quickly, or maybe slowly)?
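The bucketing idea could be sketched roughly like this (Python; the 2-minute threshold and the function name are arbitrary choices of mine, not anything from the article):

```python
def toi_trajectory(toi_per_game_by_season, threshold=2.0):
    """Classify a player's per-game TOI path (minutes) across seasons as
    'rising', 'falling', or 'flat', based on the first-to-last change."""
    delta = toi_per_game_by_season[-1] - toi_per_game_by_season[0]
    if delta >= threshold:
        return "rising"
    if delta <= -threshold:
        return "falling"
    return "flat"

# e.g. a player going from 15:00 to 20:00 a night over three seasons
print(toi_trajectory([15.0, 17.5, 20.0]))  # rising
print(toi_trajectory([15.0, 12.0, 10.0]))  # falling
```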
That could be an interesting way of looking at things. It would be confounded by the aging factor though. I did look at aging curves for defensemen quite a while ago (I might even have published it as a Fanpost over at Copper ‘n’ Blue), and TOI does show a distinct bowl shape.
Agree with your conclusion and appreciate the work to get there.
So then, what is a good metric for D?
I am not familiar with the information already being captured and available but one of the first measures I would check is rebound chances given up. One of the primary duties of a defenseman (and centre) is to reduce these 2nd opportunities. A Dman needs to do quite a few things right to eliminate these (positioning, strength, communication, “battle”) and therefore seems like a reasonable starting point for a proxy–of course, we (you?) need to interpret within the context of other measures.
Are rebounds/second chance opportunities tracked anywhere?
If so, how do they correlate?
Correlate with the Yahoo top 25, I should clarify.
Rebounds aren’t explicitly tracked anywhere, but you can ‘estimate’ them with a time-gap rule: if a shot attempt happens within three or fewer seconds of a previous attempt, count it as a rebound.
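Estimating rebounds that way is straightforward once you have shot-attempt timestamps. A sketch (Python, with hypothetical event times in game seconds):

```python
REBOUND_WINDOW = 3  # seconds; an attempt this soon after another counts as a rebound

def flag_rebounds(attempt_times):
    """Given sorted shot-attempt timestamps (in seconds) against one team,
    flag each attempt that comes within REBOUND_WINDOW of the previous one."""
    flags = []
    prev = None
    for t in attempt_times:
        flags.append(prev is not None and t - prev <= REBOUND_WINDOW)
        prev = t
    return flags

print(flag_rebounds([100, 102, 130, 132, 140]))
# [False, True, False, True, False]
```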
War-on-ice calculates that metric as part of their scoring chances data, and I've added it to my scripts as well. I used that data in my article about "NSF" (Nikitin Schultz Ference), comparing how many rebound and rush shots they gave up. Take a look, you might find it interesting.
I plan to publish related stats for every Oiler game starting in a couple of weeks (when my scripts are polished and debugged). I will be estimating rebounds as part of that. I will also be adjusting for "shot danger" – that is to say, looking at what type of shot it was and how far away it was taken, not just whether a shot occurred. I trialed that capability in the NSF article as well.
I’ll be showing that data for every player, every defensive pair, and every forward line. Maybe even every five man unit!
That *might* help to understand better which players are doing a good job and which aren’t.
We know it’s you Staples 😉
Did the chicken come before the egg?
All of these defencemen except for OEL are on playoff caliber teams. Does having a good team system improve the quality of these defencemen or do you need a good defenceman to make the playoffs?
You pretty much nailed it. It *is* a chicken and egg question.
Life would be easier if we had an answer.
Maybe the best you can do is hedge your bets and only sign defensemen from non-playoff teams if they are good (based on a wide variety of measures) AND not too expensive.
I’m likely misunderstanding, but isn’t all you did here prove that Yahoo’s list and fancystats don’t agree? Yahoo’s list could be completely wrong, or maybe even be based on TOI! To me there is no reason to take the Yahoo list as the “correct” baseline here.
That’s right. It tells you the correlation between the Yahoo list and a bunch of fancystats.
So, much of the analysis indeed depends on whether you think that list is a reasonable list of ‘top’ defensemen or not. (In my books it definitely is – I wouldn’t change a lot of names. If you would, I’d be curious to hear how different your list would be).
If you accept it’s a reasonable list – and therefore that most reasonable lists would be similar (not identical, but similar) – then what you *should* see is modest correlations between the list and the fancystats regardless, reflecting their similarity but not exact agreement.
But you don’t. It’s particularly damning that you *do* see that decent correlation with TOI. But nothing else. And that’s a problem (for the fancystats I mean).