It has been a while since I posted anything here at OilersNerdAlert, though since the season ended I have had the occasional rant over at BeerLeagueHeroes.com.
The quietness is not indicative of a lack of activity though! Woodguy (@Woodguy55, becauseoilers.blogspot.ca) and I have been hammering and chiseling and bulldozing away on what we feel is potentially groundbreaking new analytical work around quality of competition.
It’s looking fantastic, and so we’ll soon be publishing a whole boatload of that work at BLH and at Because Oilers, along with explanations, charts, and data feeds – an analytical smorgasbord!
A healthy aspect of this new work relies on my Dangerous Fenwick statistic. I explained what it is and how it is calculated last October. I also alluded at that time that I’d done some quick statistical work on it that gave me comfort that it was a useful and valid statistic, but I never published that work.
As we’re going forward with this new work, it will likely put a spotlight on DangerFen, so I figure I better get off my duff and explain to you why it’s OK to believe.
If you’re a skeptic and need this info, please read on.
And if you’re not – that’s perfectly OK, stay tuned for a ‘quality of competition’ tidal wave!
Split Half Reliability
A key initial test for any statistic is that it should have reasonable split half reliability – that is, if you randomly take half the games in a season for any given team, and compare that to the other half of the games, they should show a strong relationship.
If they fail to show that relationship, then the statistic in question may very well not be measuring anything repeatable enough to have value.
Shot metrics have a second hurdle, which is that they should show reliability that is at least approximately in line with old venerable Corsi. Otherwise, whatever methods are being used to generate the new statistic are arguably adding more noise than value.
Here’s how I tested this: for each NHL team, I used random sampling to split the 2015-2016 season’s games into two groups of 41 (some use even-odd splits for this, but it’s a particular oddity of mine that I prefer statistical tests be based on random samples). I then calculated DFF% for that team in each of the resulting two halves.
This was repeated for every team, and the correlation between the first and second halves of the resulting data set was calculated. As a control, I also calculated the correlation for Corsi and for Fenwick (on which Dangerous Fenwick is based) using the same random samples.
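The split-half procedure described above can be sketched in a few lines of Python. Everything below is synthetic: the game-log structure (a tuple of dangerous-Fenwick events for and against per game) and all the numbers are invented for illustration, not the real 2015-2016 data.

```python
import random
import statistics

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def split_half_dff(team_games, rng):
    """Randomly split a team's 82 games into two halves of 41 and
    return the DFF% of each half.  Each game is a hypothetical
    (df_for, df_against) tuple of danger-weighted event totals."""
    games = team_games[:]
    rng.shuffle(games)

    def dff_pct(half):
        f = sum(g[0] for g in half)
        a = sum(g[1] for g in half)
        return 100.0 * f / (f + a)

    return dff_pct(games[:41]), dff_pct(games[41:])

rng = random.Random(42)

# Synthetic 30-team league whose underlying "talent" varies by team.
teams = []
for _ in range(30):
    talent = rng.uniform(0.45, 0.55)
    teams.append([(rng.gauss(30 * talent, 4), rng.gauss(30 * (1 - talent), 4))
                  for _ in range(82)])

halves = [split_half_dff(t, rng) for t in teams]
r = pearson_r([h[0] for h in halves], [h[1] for h in halves])
print(round(r, 3))
```

With real data, the two columns would be each team’s DFF% in the two random halves, and the r across all 30 teams is the split-half reliability.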
Here’s how that looks.
Dangerous Fenwick Split Half
(The r value is shown; square it for R^2.) All data shown here is 5v5 data. The data values for Dangerous Fenwick were generated by me; all other values are from corsica.hockey. As you can see, there is a healthy correlation between the two splits for Dangerous Fenwick.
Corsi and Fenwick Split Half
Unsurprisingly, both Corsi and Fenwick show good split half reliability, with Corsi in the lead. The split half correlations of all three metrics are very close, and all have high statistical significance.
It’s commonplace to run this many times and use the average results of the runs. However, these results are in line with other tests I’ve reviewed, so I’m comfortable that this is a valid result.
Even so, I did rerun the calculation a handful of times (each time generates a different random sample so the results are always slightly different) to confirm. The pattern seen here is quite consistent and representative: the correlations are high (generally .72 to .82), the three metrics are close, and Corsi generally has a slight edge in correlation followed by DangerFen then Fenwick.
All of which suggests to me that the adjustments that are being made to Fenwick to incorporate “Danger” are not adding noise, and instead are adding useful information to the metric.
The ‘xGF’ Test
There are a variety of further tests that could be used to build more confidence in the metric, but I felt there was a useful shortcut available to me.
It’s my opinion that Emmanuel Perry’s (@MannyElk, corsica.hockey) xGF metric is arguably the gold standard for danger-weighted metrics. (Perhaps Nick Abe and a few others might disagree).
My thought process is that if Danger Fenwick indicates much the same things as xGF does, I have comfort that what DFF is telling me has validity.
(In fact, if I had access to how Manny’s xGF is calculated, I would even consider replacing Danger Fen with xGF in my stats programs, but as I don’t have that, I’m using Danger Fen as an acceptable replacement. And I’ll show you here why I think it’s just fine as a substitute!)
First, here’s a panel of charts showing a variety of scatter plots (with regression line) visualizing the relationship between DFF% (Dangerous Fenwick), xGF% (Perry’s Expected Goals), CF% (unadjusted Corsi), FF% (unadjusted Fenwick), and GF% (goals) for all 30 teams for the 2015-2016 season:
You can see where the metrics have a lot of similarities, and where they don’t. In particular, note how well DFF and xGF track each other.
Specifically, here are selected correlation values:
- DFF% and xGF%, r = 0.954. I’ve casually mentioned before that DFF and xGF are well correlated – and when I say well correlated, I do mean *well* correlated.
- DFF% and CF% correlation is r = 0.853, and xGF% and CF% is r = 0.852, so both metrics correlate with Corsi equally well.
- The correlation of DFF% to GF% is r = 0.462, the correlation of xGF% and GF% is r = 0.431, and the correlation of CF% to GF% is r = 0.345. In many ways, shot metrics are simply a large-sample proxy for what really counts, which is goals. As a metric, goals (or GF%) has too much noise to be of much use within a single season, but it’s still of interest to see how the large-sample metrics track goals scored. In this sample, DFF% actually correlates with goals a hair *better* than xGF% does (though I don’t believe that what we’re seeing is likely to be a sustainable difference). Perhaps more importantly, both metrics outperform raw Corsi by quite a notable margin.
And with that, I’ll end this ‘light’ statistical analysis.
The split half test tells me that Dangerous Fenwick is at least as reliable as raw Fenwick, and almost on par with Corsi.
The correlations with the wider set of metrics tell me that the danger-weight adjustments being made to Fenwick to create Dangerous Fenwick add significant value to the metric over raw Fenwick or Corsi.
Additionally, the way in which Danger Fen tracks xGF puts it at least close to being in the same league as that metric, despite DFF being a simpler metric.
I can test more, and in the future I might – but for now, I’m comfortable with where it’s at.