Sunday, 5 October 2014

Relative Zone Start Charts: Better Visualizing Coaches' Deployment

I've been thinking for a while about what we miss when looking at zone starts. On a micro basis, with just one team, we can look at a player usage chart and see a visualization of how coaches deploy players: typically there's a cluster of players in the middle, a few 'shutdown'/third line guys on the left with few offensive zone starts, and a couple of sheltered rookies or offensive specialists over on the right with a fairly high offensive zone start percentage. Here's a fairly typical example from last season in Dallas (min. 30 games played):
My favorite part about team usage charts is how they combine deployment and production. It's sort of the coaching intent and the outcome rolled into one graph. But when you compare two players on different teams with similar offensive zone start percentages, the human element gets lost. Thanks to hockey-reference's recently unveiled advanced stats, including team zone start percentages, for every team from 2007-14, I've been able to express those differences more clearly.

Last season, teams' offensive zone start percentage ranged from 42.1% (Toronto) to 55% (Chicago). The most sheltered skater for Toronto who played more than half the season was Nazem Kadri at 49.3% and their toughest deployments went to Jay McClement with 28.6%. In fact, Toronto was so bad last season that Rob Vollman's usage charts have even Kadri categorized as a "shut-down" player.  Ignoring Chicago for the moment because they give all the worst zone starts to just one line, LA is next at 53.6%. Their zone starts range from 51.8% (Willie Mitchell) to 59.3% (Andrew Campbell). Just to emphasize how absolutely absurd this is:
See where this is leading? Most-sheltered Nazem Kadri would have the toughest zone starts on a Kings squad. That's amazing. And when you consider that he's probably the kind of player whose production you would want to see in sheltered starts, it certainly lends perspective to his production.

My solution to this problem of cross-team visualization is to use a version of a statistic that used to appear on ExtraSkater (EDIT: I'm now told that war-on-ice has them as well) for relative zone starts compared to the team average:

Relative offensive zone start percentage = Individual player's offensive zone start percentage - Team average offensive zone start percentage

(Disclaimer: this isn't strictly a relative stat, at least not in its current incarnation. That would require recalculating to compare the average of all the zone starts when the player was not on the ice, rather than using a team average that includes his deployment, to his own offensive zone start percentage.)
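To make the arithmetic concrete, here is a minimal Python sketch of the formula above, using the Toronto and Los Angeles numbers quoted earlier. The function names are mine, and the second function is only one way the "true" relative version from the disclaimer might look, assuming zone start percentage is offensive starts divided by offensive plus defensive starts.

```python
def relative_ozs(player_ozs_pct, team_ozs_pct):
    """Player OZ start % minus team average OZ start %, in percentage points."""
    return round(player_ozs_pct - team_ozs_pct, 1)


def oz_start_pct(oz_starts, dz_starts):
    """Assumed definition: OZ starts as a share of OZ + DZ starts."""
    return 100 * oz_starts / (oz_starts + dz_starts)


def relative_ozs_off_ice(player_oz, player_dz, team_oz, team_dz):
    """Sketch of the stricter version: compare the player's own percentage
    to the team's percentage on starts when he was NOT on the ice."""
    on_ice = oz_start_pct(player_oz, player_dz)
    off_ice = oz_start_pct(team_oz - player_oz, team_dz - player_dz)
    return on_ice - off_ice


# Percentages quoted in the post
kadri = relative_ozs(49.3, 42.1)     # Kadri vs. the Toronto team average
mitchell = relative_ozs(51.8, 53.6)  # Mitchell vs. the Los Angeles team average
print(kadri, mitchell)
```

Kadri comes out well above his team's average and Mitchell below his, even though Mitchell's raw percentage is the higher of the two.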

Pretty simple. Because I wanted a visualization that included this variety of deployment more effectively than standard player usage charts, I made a league-wide chart that has relative offensive zone start percentage on the x-axis and relative Corsi on the y-axis. (The size of the bubble doesn't mean anything here; if you want something fancy, feel free to put it together yourself). Clicking on bubbles should bring up the player's name, or you can go directly to the Tableau page here and play around further. 

I think this accomplishes what I wanted. For production purposes, Rob Vollman's player usage charts will always be the gold standard. But if you wanted to compare players who are sheltered relative to their teams across the league to see how well they tend to produce compared to their teams, you can do that. For example, Jeremy Morin (topmost dot on the right) and Nail Yakupov (rightmost dot near the horizontal axis) have nearly the same zone start percentage, but Morin posted a better relative Corsi than Yakupov did. Improving this graph by adding QoC or QoT would add a much-needed dimension that I suspect would explain some of the discrepancy.

I also think it's interesting to think about this kind of visualization in the context of trades or free agent signings. Teams look for players to fill specific kinds of roles. It would be easy to miss a Nazem Kadri in the search for a center who produces decently in carefully curated offensive zone starts--just as easily as missing Willie Mitchell when seeking someone who will play tougher minutes. Zone starts by themselves simply aren't enough when looking for these players, because on a team as bad as Toronto was last year, there isn't an opportunity for Kadri to get the kind of deployment that would likely benefit his production. Despite griping by me and many bloggers about nonsensical deployment decisions, on the whole I do value where coaches choose to play their skaters. And obviously teams and GMs do even more so.

Aptly, Matt has a few resources (some already up and some arriving soon) to present and tweak these stats. On the Player Stats tab, you have the option to adjust for zone starts and score state. And Progressive Hockey will soon have both team zone start percentages and player relative zone start statistics--the latter calculated via better methodology than mine.

Megan blogs intermittently about whichever hockey stats catch her fancy at She can be reached on twitter at @butyoucarlotta, or via email at shinnystats at gmail.

Wednesday, 1 October 2014

Guess What? The Pre-Season Doesn't Matter

I looked at NHL teams' goal differential in the pre-season and compared it with their goal differential in the regular season that followed. I found very little evidence of correlation between the two.

Last season's edition of the New York Rangers had a calamitous 29% goal differential in the pre-season before going on to the Stanley Cup Final, while the Sabres had a 56% differential going into their not-so-great campaign.

You probably already knew that you shouldn't take much away from your team's performance in the pre-season, but now you have the evidence!
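If you want to run this kind of check yourself, a rough sketch follows. It assumes "goal differential" as a percentage means goals for divided by total goals, and it uses a plain Pearson correlation, which may not be the model used here. The function names are mine, and you would feed in real pre-season and regular-season numbers for each team.

```python
from math import sqrt


def goal_share(goals_for, goals_against):
    """Assumed reading of a percentage 'goal differential': GF / (GF + GA)."""
    return goals_for / (goals_for + goals_against)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Usage: one entry per team, pre-season share first, regular season second.
# preseason = [goal_share(gf, ga) for gf, ga in preseason_goals]
# regular = [goal_share(gf, ga) for gf, ga in regular_season_goals]
# print(pearson_r(preseason, regular))
```

A value near zero means pre-season results tell you little about the regular season.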

For those who are interested, here's some more statistical information on the model:

Thursday, 25 September 2014

Can Players Impact Their On-Ice Faceoff Percentage?

This is a question that's popped up recently due to the new adjustments we've had for zone starts. It's obvious that zone starts affect a player's on-ice outcomes, and Tyler Dellow has shown how much a faceoff win or loss can further impact a player's performance.

So it's clear that assessing player production requires further context: not just zone starts, but the faceoff percentage in those zone starts. The question now is, for whom is this merely context, and for whom is it actually part of their performance?

I've heard arguments either way about whether wingers and defensemen are able to affect faceoff percentage, and today I decided to find out definitively.

The Process

I took every player's on-ice faceoff percentage with every center the player was paired with, and compared that with the center's faceoff percentage without that player on the ice.

To get an idea of what the data looks like, here's Phil Kessel's FO percentage "WOWY"

We then take the change in faceoff percentage observed when the faceoff taker is on the ice with Kessel compared to when they're not, and weight that by the number of faceoffs Kessel was on the ice for with that taker. In total, Kessel had a +1.9% effect on his on-ice faceoff percentage.
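For anyone who wants to reproduce the weighting step, here is a minimal sketch. The tuple layout and the sample numbers are hypothetical, not Kessel's actual data; the idea is simply a faceoff-weighted average of the with-versus-without differences.

```python
def weighted_fo_impact(pairings):
    """Faceoff-weighted average change in a taker's FO% with a given player.

    pairings: list of (faceoffs_together, fo_pct_with, fo_pct_without),
    one tuple per faceoff taker the player shared the ice with.
    """
    total_faceoffs = sum(n for n, _, _ in pairings)
    weighted_change = sum(n * (with_pct - without_pct)
                          for n, with_pct, without_pct in pairings)
    return weighted_change / total_faceoffs


# Hypothetical winger paired with two centers
example = [
    (600, 52.0, 50.0),  # center A: +2.0 points over 600 shared faceoffs
    (400, 49.0, 48.5),  # center B: +0.5 points over 400 shared faceoffs
]
impact = weighted_fo_impact(example)
print(round(impact, 2))
```

The center with more shared faceoffs pulls the result toward his own difference, so a handful of draws with a rarely used linemate can't swing the number.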

The Results

This graph plots every player's change in on-ice FO% against the total number of faceoffs he's been on the ice for.

The narrowing of the statistic is obvious, demonstrating the normalization that would be expected with a larger sample size of faceoffs.

Here's a graph of players who've been on the ice for more than 2000 faceoffs in their career, and their impact on faceoff percentage by faceoff zone, with boxplots to demonstrate the range of values observed:

Corey Perry is the champion of this statistic. In all three faceoff zones, Perry has had an unbelievable 4.1% positive effect on his centers' faceoff percentage.

In the offensive zone, wingers seem to have a larger potential for impact and conversely in the defensive zone defensemen seem to be able to have a larger average deviation from the mean. 


The majority of players with a large enough sample size have had less than a 1% effect on their center's faceoff percentage, but the success of players like Corey Perry and the failures of Ryan Callahan suggest some players can affect it in a marginal way.

For those interested in playing around with this statistic, check out the interactive graph below.

From Winger to Center: How Players Are Affected By Moving Positions

I don't trust the positions that the NHL provides. Wingers are often labeled as centers, and that can make certain types of analysis hard to do. So to figure out who's actually playing in the middle, I look at the faceoffs: Players who take the majority of the draws while they're on the ice are listed as C's, and those who aren't are listed as wingers.
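A bare-bones version of that rule might look like the sketch below. The function name, inputs and the strict 50% cutoff are my reading of "majority of the draws", and it assumes you've already filtered down to forwards.

```python
def infer_forward_position(faceoffs_taken, faceoffs_on_ice):
    """Label a forward 'C' if he took more than half of the draws that
    happened while he was on the ice, otherwise 'W'."""
    if faceoffs_on_ice == 0:
        return "W"  # no faceoff data, so no evidence he plays the middle
    share = faceoffs_taken / faceoffs_on_ice
    return "C" if share > 0.5 else "W"


# Hypothetical season totals
print(infer_forward_position(820, 1100))  # takes most draws
print(infer_forward_position(45, 900))    # rarely takes draws
```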

More centers come into the league than there are spots available for them, because the best forwards frequently play center when they are the dominant player on their team. So pre-season is the time to talk about position swapping. As players with some good NHL performances under their belt start rattling their cages to move to most forwards' preferred spot on the ice, down the middle, I was interested in how the players who do transition from winger to center perform.

I used Progressive Hockey's own Rel. Exp GF% to compare their season at their new position to how they did the previous year. I'm using it because:

a) It takes care of possible variance in usage (teammates, competition, zone starts and score state).

b) Weighting Fenwick shot differential by a shot quality model means I don't have to do too much extra thinking about parts of their production I might be excluding.

I found 103 cases since 2008 of wingers converting to center. Here's how they did.


Converted wingers on average saw a 1.9% decrease in their Rel. Exp GF% in their first season at center. They fared slightly better in their second season at the position, seeing only a 1.5% decrease from their last season at winger, but did even worse in their third season if they stuck with it, with a 2.1% decrease from their last season at wing in a smaller sample (55). Converted wingers who had a Rel. Exp GF% above 0 (average) the previous season saw a 5% decline during their subsequent season at center.

Conversely, centers transitioning to wingers have gotten a bump in their production.

Remember that Rel. Exp GF% is adjusting for things like the strength of the line and zone starts, so this is as close to a pure indication of performance as one can get.

What is shown here is actually somewhat straightforward: center is a tough job with added responsibilities, and the majority of wingers who make the jump see a somewhat drastic drop in their performance. Notable exceptions include Nazem Kadri, Maxime Talbot and Claude Giroux.

The first reaction to this evidence is to think that teams are trying too hard to turn their wingers into centers, but one should consider the defensive scale in baseball.

There's a range of difficulty to each position in baseball, so if a player moves up or down a position (in terms of difficulty) he would theoretically see a boost or hit to his performance at that position. Because of this, his competence is not intrinsically tied to his value. A bad shortstop is still more valuable than an excellent fielding first baseman.

I think we could make the following conclusions:

1) Most players who make the transition from winger to center are going to see a decline in their production. 63% of them get worse, and that is cast in a more negative light by the fact that the average age of these players is at a point where they should be seeing their most rapid improvements.

2) Production, in the way I'm using it, is not the same as value. As center is a more valuable position, there is more scarcity of elite center talent, so one would be willing to accept a certain loss of production based on one's own particular roster needs.

I think this is a first step to figuring out some basic economic tenets of the hockey marketplace. Teams have been willing to give up a better winger for a slightly worse center, and that is, at the very least, objective evidence for something most analysts have known all along.

Wednesday, 24 September 2014

Goalies and Confidence Intervals: A Study in Uncertainty

How many saves does it take to see a goaltender's true save percentage? If you want to be pedantic, you could say we can never know a goalie's true talent--all we have over time is a steadily accruing sample that gets us closer and closer to some "true" number. One game is definitely too few; one thousand probably gets us where we want to go. But what about the in between? There's a long period of time where we sort of know what's going on, but it's hard to get objectively specific about a goalie's potential to have a career .920 save percentage versus a .925.

Imagine sitting on one side of a dark partition. On the other side is someone with a ten-sided die, some sides of which have been colored red. The person rolls the die and tells you whether they rolled white or red. After 10 rolls, how close are you to knowing how many sides are red? After 50, maybe you can say it's probably between 2 and 5. 100, perhaps a 2 or a 3. You can imagine getting more and more sure that you know how many sides are red. That's the role a confidence interval plays: taking your sample (however big or small) and using it to tell you the values the whole population probably lies between--for a given probability. This is the process of finding out how good a goalie is: looking at roll after roll of a many-sided die to narrow down the possibilities.

Here is a Wilson binomial confidence interval for every active NHL goalie who played a game last season, sorted by their current team or organization. It's at 95% confidence, so only 1 in 20 times would we expect it to be wrong. 95% is the threshold most commonly used for statistical significance, and the point at which--crudely speaking--we basically 'know' something. Don't be intimidated by the giant table; it's just for your future reference. I'll break it down below.

SA = Shots Against
SV% = Career Save Percentage
Low CI = the low value/floor of their 95% confidence interval
High CI = the high value/ceiling of their 95% confidence interval
CI Width = the difference between high and low--a way to eyeball how much we don't know
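If you'd like to compute these intervals yourself, here is a short sketch of the Wilson score interval in plain Python. The function name is mine, and the example uses a hypothetical goalie with roughly one starter's season of shots (1650) and a save percentage around .915, not any specific goalie's real totals.

```python
from math import sqrt


def wilson_interval(saves, shots, z=1.96):
    """Wilson score interval for a save percentage.

    z = 1.96 corresponds to 95% confidence.
    Returns (low, high) as proportions.
    """
    p = saves / shots
    denom = 1 + z * z / shots
    center = (p + z * z / (2 * shots)) / denom
    half_width = z * sqrt(p * (1 - p) / shots
                          + z * z / (4 * shots * shots)) / denom
    return center - half_width, center + half_width


# Hypothetical goalie: 1510 saves on 1650 shots
low, high = wilson_interval(1510, 1650)
print(f"SV% {1510 / 1650:.3f}, 95% CI {low:.3f} to {high:.3f}, "
      f"width {high - low:.3f}")
```

Multiply the shot count by ten at the same save percentage and the interval tightens considerably, which is the whole story of the table.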

The main takeaway for me is just how little we can say definitively about goalies who haven't faced a lot of shots. I picked a few goalies from that list (long-time starters, recent starters, career backups, about-to-be starters or backups), and sorted by career saves.

A starting NHL goalie plays about 55 games and faces an average of 30 shots a game, so they see roughly 1650 shots per season. Thomas Greiss, the new Penguins backup, has seen about a season's worth of shots in his career thus far, and yet we can only pinpoint his career save percentage to between .901 and .927. Part of the problem is that most goalies are above a .900 save percentage, but there's still a huge difference between .910 and .920 in terms of goals allowed. But generally, we should be a bit more cautious about assuming a goalie who played a stellar half season will keep it up in the future. I really like Eddie Lack, but at the same time we can't say a lot about him.

It's fair to say that you're willing to accept more risk of being wrong in exchange for a tighter confidence interval. Are you willing to bet that 4 of 5 times a goalie will pan out in that range? To flip a coin?

I took three of my favorite goalies with very different career shots faced and tried out different risk levels on them (the rightmost column). You can see how that affects the confidence interval. Even if we used a confidence interval for Eddie Lack that only had a 50% likelihood of including his career save percentage, we could only posit that it falls between .906 and .918. That's still a pretty big variation.
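Here is a sketch of how the risk level changes the interval. It recomputes a Wilson interval with a z-value taken from the standard normal distribution for each confidence level. The save and shot totals are hypothetical stand-ins, not Eddie Lack's actual numbers.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(saves, shots, confidence=0.95):
    """Wilson score interval at an arbitrary two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = saves / shots
    denom = 1 + z * z / shots
    center = (p + z * z / (2 * shots)) / denom
    half_width = z * sqrt(p * (1 - p) / shots
                          + z * z / (4 * shots * shots)) / denom
    return center - half_width, center + half_width


SAVES, SHOTS = 912, 1000  # hypothetical goalie with a small sample

widths = {}
for level in (0.95, 0.80, 0.50):
    low, high = wilson_interval(SAVES, SHOTS, level)
    widths[level] = high - low
    print(f"{level:.0%} confidence: {low:.3f} to {high:.3f}")
```

Each step down in confidence buys a narrower range, but even at a coin flip the range doesn't collapse to a single number.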

Confidence intervals aren't too helpful for decisions on minute differences between goalies; they're more suitable for reinforcing what we already know about the unreliability of small sample sizes. Over the long haul, we can look at who the objectively best goalies in the league are--those whose low-end confidence interval values are the highest. Here are those goalies:

And here are the worst goalies--those with the lowest ceiling:

(Fancy meeting you here, Ondrej...)

Confidence intervals are helpful in forcing us to check our expectations about unproven goalies. In the long run, they can do more to help us determine relative value than save percentage alone. As is often said, goalies are voodoo. We would do well to remember that.

Megan blogs intermittently about whichever hockey stats catch her fancy at She can be reached on twitter at @butyoucarlotta, or via email at shinnystats at gmail.

Wednesday, 17 September 2014

Shot Suppression Is The Name Of The Game

As the “Summer of Analytics” wraps up and various NHL camps get underway, there is a palpable urgency apparent from some NHL front offices to find the key to success by expanding their analytics departments. The task teams are currently undertaking is to identify their strengths and weaknesses by conventional and progressive means. Once they do this, they can then come up with a plan to exploit their strengths and improve the areas of weakness.

While there is no secret weapon or magic trick that will suddenly turn a bad team into a good one, there are some team strengths that are more important to success than others. In looking at successful teams over the past several seasons, one such strength stands out from the rest: Shot suppression.

Shot suppression is a fairly basic concept, but because it is not as exciting as a high powered offense or as easy to identify as say an excellent penalty kill, it is not often discussed during broadcasts or major media analysis shows. Shot suppression is one of the true measures of the quality of a team’s defensive structure and systems. Even in analytics, this component of team play can be overlooked when we use percentages such as CF% (Corsi For) or FF% (Fenwick For). Percentages are terrific and useful for many things, but one of their shortcomings is that they mask Event Rates.

Event Rates are often expressed as whatever metric is being used “Per 20” or “Per 60”. To understand how aggressive an offense is, CF or FF rates are very useful. For example, the San Jose Sharks had the highest CF60 in the league at Score Close last season with a mark of 63.6 and were third in the league in FF% (the most popular team possession measurement tool) at 54.9%. The Ottawa Senators were second in the league in CF60 with a rate of 63.2, but were twelfth in the league in FF% at 50.8%.

When the CA and FA rates are added into the mix, we can see which teams allow more shots than others. When used in combination with the team’s CF and FF marks we get a picture of a team’s event rates.
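To illustrate how a percentage can hide event rates, here is a small sketch. The function names, the ice-time figure and the shot-attempt counts are all hypothetical; the two made-up teams are chosen so that they share the same CF% while playing at very different paces.

```python
def per_60(events, toi_minutes):
    """Convert a raw event count into a rate per 60 minutes of ice time."""
    return events / toi_minutes * 60


def share_pct(events_for, events_against):
    """Percentage of events that went in the team's favor (e.g. CF%)."""
    return 100 * events_for / (events_for + events_against)


TOI = 1000.0  # hypothetical minutes of Score Close ice time for both teams

teams = {
    "High-event team": (1060, 940),  # (Corsi for, Corsi against)
    "Low-event team": (795, 705),
}

for name, (cf, ca) in teams.items():
    cf60 = per_60(cf, TOI)
    ca60 = per_60(ca, TOI)
    print(f"{name}: CF% {share_pct(cf, ca):.1f}, "
          f"CF60 {cf60:.1f}, CA60 {ca60:.1f}, total pace {cf60 + ca60:.1f}")
```

Both teams post the same CF%, but one allows far fewer attempts per 60, which is exactly the information the percentage throws away.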

You will note that the best or most successful teams in the league are not at either extreme in terms of event rates. They are not super low event like the New Jersey Devils nor are they super high event like the Ottawa Senators. Teams with very low event rates both in terms of shots for and shots against often struggle to produce enough offense to consistently win games. This was obvious last season, when Devils goalie Cory Schneider played very well but was consistently losing games because of a lack of offensive support. Likewise, teams with very high event rates in shots for and against tend to score quite a bit, but they also tend to give up a lot of goals.

The real question is: what is most important? Shot suppression or a prolific offense? Looking back over the past several seasons at teams that were successful both during the regular season and the playoffs may lead us to an answer.

Monday, 8 September 2014

Updates: Sept 8th

Shot Quality

After some serious fence-sitting I've decided to do what many have been asking for: adjusting Shot Quality, and therefore Expected GF%, by the player's previous shooting% record. This means that Exp. GF%, shot quality and all the other statistics that rely on that model to predict shooting percentage will be stronger, as they now use not just the variables of the individual shot, as was previously the case, but also the shooter's shooting% from the past 3 seasons, if applicable.

Multiple Seasons

In the player and team stat pages, you can now compute statistics for multiple seasons at a time.

Enjoy the updates! Much more to come.