So far I’ve crunched three seasons’ worth of data: 2014/2015, 2013/2014, and 2012/2013.
They are in CSV format, and hopefully the content is self-explanatory. I’ll be refreshing these files with fixes, updates, and enhancements on a regular basis. Two more seasons to go.
You can download the CSVs here:
- Updated! Season 2014/2015, all 1230 games (to end of season, April 12th, 2015)
- Season 2013/2014, 1230 games, missing games 108, 109, 855, and 871 due to parsing errors.
- Season 2012/2013, 624 games, missing game 578 due to parsing errors.
Dataset for the predictive/comparable variables post:
If you’re using this data, check back regularly for additions and updates.
Data TO DO List:
- Finish parsing and converting seasons 2011/2012 and 2010/2011
- Add the missing games for each season (if I can; some of it appears to be bad data on the NHL site … GIGO)
- Fix some errors with shot locations (shots that occur in the same second can get the location information duplicated … haven’t figured out a clever way to fix this yet)
- Add scores and situation columns (e.g. “close”, “up 2”, and so on)
- Add shot quality location indicators – low, medium, high, specific to shot type, based on the location data generated from the same three seasons
- Add temporal indicators – rush, rebound, cycle
- Split the players on ice from a single column into 14 separate columns (players 1 to 6 plus goalie, for each team). Six skater slots are needed for when the goalie is pulled.
- I may parse and convert the other related RTSS files (shifts and roster).
- I may try to adjust for rink shot location bias.
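Two of the items above, the score situation column and the duplicated shot locations, can be sketched roughly in Python. This is only an illustration under assumptions, not the code behind these files: the field names (`game`, `period`, `second`, `x`, `y`) and all three function names are hypothetical, and the "close" rule used here (within one goal in the first two periods, tied from the third period on) is one common convention that may not match the definition eventually used in the data.

```python
from collections import defaultdict


def score_situation(goals_for, goals_against):
    """Label the score state from one team's point of view, e.g. 'up 2'."""
    diff = goals_for - goals_against
    if diff == 0:
        return "tied"
    direction = "up" if diff > 0 else "down"
    return f"{direction} {abs(diff)}"


def is_close(goals_for, goals_against, period):
    """Assumed 'close' rule: within 1 goal in periods 1-2, tied afterwards."""
    diff = abs(goals_for - goals_against)
    return diff == 0 if period >= 3 else diff <= 1


def flag_duplicate_locations(shots):
    """Return indices of shots that share game, period, second AND x/y
    with at least one other shot (candidates for a duplicated location)."""
    groups = defaultdict(list)
    for index, shot in enumerate(shots):
        key = (shot["game"], shot["period"], shot["second"], shot["x"], shot["y"])
        groups[key].append(index)
    return sorted(i for members in groups.values() if len(members) > 1 for i in members)


# Hypothetical rows; real column names in the CSVs may differ.
shots = [
    {"game": 1, "period": 1, "second": 75, "x": 60, "y": -10},
    {"game": 1, "period": 1, "second": 75, "x": 60, "y": -10},
    {"game": 1, "period": 1, "second": 75, "x": 81, "y": 4},
    {"game": 1, "period": 2, "second": 75, "x": 60, "y": -10},
]
suspects = flag_duplicate_locations(shots)
```

Flagging only identifies the suspect rows; it does not say which of the shots owns the recorded location, so the fix itself would still need another source for the true coordinates.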
Metrics for defenders:
- Does the conversion rate of Corsi (or Fenwick or shots) to scoring chances (per war-on-ice) provide a useful metric for evaluating defenders?
Orange for McDavid Project:
- NHLE using only the Top 5 picks – DONE!
I should add one last key link … before I started collecting the event data myself (since I could not find any other published dataset), the numbers I crunched were collected from a variety of sites. However, the key data came from Rob Vollman, who has made some enormous datasets freely available. While some of them are more esoteric (e.g. GVT), Vollman has published spreadsheets (up to 350 data points each) on every player and goalie since 1967.
Staggering. Such an amazing and generous contribution. Making my data available is just my feeble attempt at walking in those same footsteps.
The link to his data page, if you are interested, is here: http://www.hockeyabstract.com/testimonials
(Not sure why the page is linked with ‘testimonials’; it actually takes you to his statistics page.)