Thursday, 25 September 2014

Can Players Impact Their On-Ice Faceoff Percentage?

This is a question that's popped up recently due to the new adjustments we've had for zone starts. It's obvious that zone starts affect a player's on-ice outcomes, and Tyler Dellow has shown how much a faceoff win or loss can further impact a player's performance.

So it's clear that assessing player production requires not just zone starts as context, but also the faceoff percentage in those zone starts. The question now is: for whom is this merely context, and for whom is it actually part of their performance?

I've heard arguments either way about whether wingers and defensemen are able to affect faceoff percentage, and today I decided to find out.

The Process

I took every player's on-ice faceoff percentage with every center the player was paired with, and compared that with the center's faceoff percentage without that player on the ice.

To get an idea of what the data looks like, here's Phil Kessel's FO percentage "WOWY"


We then take the change in faceoff percentage observed when the faceoff taker is matched with Kessel compared to when he isn't, and weight it by the number of faceoffs the taker was on the ice for with Kessel. In total, Kessel had a +1.9% effect on his on-ice faceoff percentage.
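The weighted average described above can be sketched in a few lines. The numbers here are invented for illustration; they are not Kessel's actual splits:

```python
# Sketch of the weighted WOWY calculation: each tuple holds
# (FO% with the winger on ice, FO% without him,
#  faceoffs the center took with him on ice).
pairings = [
    (0.52, 0.50, 800),   # hypothetical center A
    (0.48, 0.47, 500),   # hypothetical center B
    (0.55, 0.51, 200),   # hypothetical center C
]

total_fo = sum(n for _, _, n in pairings)
# Weight each center's change in FO% by the faceoffs shared with the winger
effect = sum((with_pct - without_pct) * n
             for with_pct, without_pct, n in pairings) / total_fo
print(f"Weighted on-ice FO% effect: {effect:+.1%}")
# → Weighted on-ice FO% effect: +1.9%
```

With these made-up splits the winger comes out at +1.9%, the same magnitude of effect reported for Kessel above.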

The Results

This graph shows a point for every player: his change in on-ice FO% plotted against the total number of faceoffs he's been on the ice for.


The narrowing of the spread is obvious, demonstrating the convergence that would be expected with a larger sample of faceoffs.

Here's a graph of players who've been on the ice for more than 2,000 faceoffs in their careers and their impact on faceoff percentage by faceoff zone, with boxplots to demonstrate the range of values observed:


Corey Perry is the champion of this statistic. Across all three faceoff zones, Perry has had an unbelievable 4.1% positive effect on his centers' faceoff percentage.

In the offensive zone, wingers seem to have a larger potential for impact; conversely, in the defensive zone, defensemen seem able to deviate further from the mean.

Conclusion


The majority of players with a large enough sample size have had less than a 1% effect on their center's faceoff percentage, but the success of players like Corey Perry and the failures of Ryan Callahan suggest some players can affect it in a marginal way.

For those interested in playing around with this statistic, check out the interactive graph below.



From Winger to Center: How Players Are Affected By Moving Positions


I don't trust the positions that the NHL provides. Wingers are often labeled as centers, and that can make certain types of analysis hard to do. So to figure out who's actually playing in the middle, I look at the faceoffs: players who take the majority of the draws while they're on the ice are listed as centers, and those who don't are listed as wingers.

More centers come into the league than there are spots available for them, because the best forwards frequently play center when they are the dominant player on their team. So pre-season is the time to talk about position swapping. As players with some good NHL performances under their belts start rattling cages about moving to most forwards' preferred spot on the ice, down the middle, I was interested in how the players who do transition from winger to center perform.

I used Progressive Hockey's own Rel. Exp GF% to compare each player's season at his new position to how he did the previous year. I'm using it because:

a) It accounts for possible variance in usage (teammates, competition, zone starts and score state)

and

b) Weighting fenwick shot differential by a shot quality model means I don't have to do too much extra thinking about parts of their production I might be excluding.
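As a sketch of the comparison just described, here's how the year-over-year change might be computed. The player names and values below are hypothetical, not the actual dataset:

```python
import statistics

# Hypothetical records: each converted winger's Rel. Exp GF%
# in his last season at wing and his first season at center.
conversions = [
    {"player": "Player A", "wing": 1.2, "center": -0.9},
    {"player": "Player B", "wing": -0.5, "center": -2.0},
    {"player": "Player C", "wing": 2.4, "center": 0.8},
]

# Change in Rel. Exp GF% after the position switch, per player
deltas = [p["center"] - p["wing"] for p in conversions]
print(f"Mean change at center: {statistics.mean(deltas):+.1f}%")
```

The real study applies the same subtraction across the 103 conversions found since 2008.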

I found 103 cases since 2008 of wingers converting to center. Here's how they did.

Results


Converted wingers on average saw a 1.9% decrease in their Rel. Exp GF% in their first season at center. They fared slightly better in their second season at the position, seeing only a 1.5% decrease from their last season at winger, but did even worse in their third season if they stuck with it, with a 2.1% decrease from their last season at wing in a shrinking sample (55). Converted wingers who had a Relative Exp. GF% above 0 (average) the previous season saw a 5% decline during their subsequent season at center.

Conversely, centers transitioning to wing have gotten a bump in their production.



Remember that Rel. Exp GF% is adjusting for things like the strength of the line and zone starts, so this is as close to a pure indication of performance as one can get.

What's shown here is actually somewhat straightforward: center is a tough job with added responsibilities, and the majority of wingers who make the jump see a somewhat drastic drop in their performance. Notable exceptions include Nazem Kadri, Maxime Talbot and Claude Giroux.

The first reaction to this evidence is to think that teams are trying too hard to turn their wingers into centers, but one should consider the defensive scale in baseball.

There's a range of difficulty to each position in baseball, so if a player moves up or down a position (in terms of difficulty), he would theoretically see a boost or hit to his performance at that position. Because of this, his competence at the position is not intrinsically tied to his value: a bad shortstop is still more valuable than an excellent fielding first baseman.

I think we can draw the following conclusions:

1) Very few players who make the transition from winger to center avoid a decline in their production. 63% of them get worse, and that is cast in a more negative light by the fact that the average age of these players is at a point where they should be seeing their most rapid improvements.

2) Production, in the way I'm using it, is not the same as value. As center is a more valuable position and elite center talent is scarcer, one might be willing to accept a certain loss of production based on one's own particular roster needs.

I think this is a first step toward figuring out some basic economic tenets of the hockey marketplace. Teams have been willing to give up a better winger for a slightly worse center, and that is, at the very least, objective evidence for something most analysts have known all along.

Wednesday, 24 September 2014

Goalies and Confidence Intervals: A Study in Uncertainty

How many saves does it take to see a goaltender's true save percentage? If you want to be pedantic, you could say we can never know a goalie's true talent--all we have over time is a steadily accruing sample that gets us closer and closer to some "true" number. One game is definitely too few; one thousand probably gets us where we want to go. But what about everything in between? There's a long period of time where we sort of know what's going on, but it's hard to get objectively specific about a goalie's potential to have a career .920 save percentage versus a .925.

Imagine sitting on one side of a dark partition. On the other side is someone with a ten-sided die, some sides of which have been colored red. The person rolls the die and tells you whether they rolled white or red. After 10 rolls, how close are you to knowing how many sides are red? After 50, maybe you can say it's probably between 2 and 5. 100, perhaps a 2 or a 3. You can imagine getting more and more sure that you know how many sides are red. That's the role a confidence interval plays: taking your sample (however big or small) and using it to tell you the values the whole population probably lies between--for a given probability. This is the process of finding out how good a goalie is: looking at roll after roll of a many-sided die to narrow down the possibilities.
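The narrowing described in the die analogy is easy to simulate. This sketch (with an assumed hidden count of red sides) shows the crude normal-approximation uncertainty shrinking as rolls accumulate:

```python
import random

random.seed(42)
RED_SIDES = 3          # the hidden "true" number of red sides
rolls = [random.randint(1, 10) <= RED_SIDES for _ in range(200)]

# After n rolls: our estimate of the red fraction and how it tightens
for n in (10, 50, 200):
    p = sum(rolls[:n]) / n
    # normal-approximation 95% half-width; shrinks like 1/sqrt(n)
    half = 1.96 * (p * (1 - p) / n) ** 0.5
    print(f"n={n:3d}: estimate {p:.2f} ± {half:.2f}")
```

The half-width falls roughly with the square root of the number of rolls, which is exactly why a goalie's interval tightens so slowly as shots accrue.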

Here is a Wilson binomial confidence interval for every active NHL goalie who played a game last season, sorted by their current team or organization. It's at 95% confidence, so only 1 in 20 times would we expect it to be wrong. 95% is the threshold most commonly used for statistical significance, and the point at which--crudely speaking--we basically 'know' something. Don't be intimidated by the giant table; it's just for your future reference. I'll break it down below.

SA = Shots Against
SV% = Career Save Percentage
Low CI = the low value/floor of their 95% confidence interval
High CI = the high value/ceiling of their 95% confidence interval
CI Width = the difference between high and low--a way to eyeball how much we don't know
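For those curious, the Wilson score interval itself is straightforward to compute. Here's a minimal sketch; the save and shot totals are illustrative, not any particular goalie's:

```python
from math import sqrt

def wilson_interval(saves, shots, z=1.96):
    """Wilson score interval for a save percentage at ~95% confidence."""
    if shots == 0:
        return (0.0, 1.0)
    p = saves / shots
    denom = 1 + z**2 / shots
    centre = (p + z**2 / (2 * shots)) / denom
    half = z * sqrt(p * (1 - p) / shots + z**2 / (4 * shots**2)) / denom
    return (centre - half, centre + half)

# e.g. a hypothetical goalie with 1500 saves on 1650 shots (a .909 SV%)
low, high = wilson_interval(1500, 1650)
print(f"Low CI: {low:.3f}, High CI: {high:.3f}")
```

Even after a full season's worth of shots, the interval for this hypothetical goalie spans roughly .894 to .922, which is the point the table makes.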


The main takeaway for me is just how little we can say definitively about goalies who haven't faced a lot of shots. I picked a few goalies from that list (long-time starters, recent starters, career backups, about-to-be starters or backups), and sorted by career saves.


A starting NHL goalie plays about 55 games and faces an average of 30 shots a game, so they see roughly 1650 shots per season. Thomas Greiss, the new Penguins backup, has seen about a season's worth of shots in his career thus far, and yet we can only pinpoint his career save percentage to between .901 and .927. Part of the problem is that most goalies are above a .900 save percentage, but there's still a huge difference between .910 and .920 in terms of goals allowed. But generally, we should be a bit more cautious about assuming a goalie who played a stellar half season will keep it up in the future. I really like Eddie Lack, but at the same time we can't say a lot about him.

It's fair to say you might accept more risk of being wrong in exchange for a tighter confidence interval. Are you willing to bet that a goalie will pan out in that range 4 times out of 5? On a coin flip?


I took three of my favorite goalies with very different career shots faced and tried out different risk levels on them (the rightmost column). You can see how that affects the confidence interval. Even if we used a confidence interval for Eddie Lack that only had a 50% likelihood of including his career save percentage, we could only posit that it falls between .906 and .918. That's still a pretty big variation.

Confidence intervals aren't too helpful for decisions on minute differences between goalies; they're more suitable for reinforcing what we already know about the unreliability of small sample sizes. Over the long haul, we can look at who the objectively best goalies in the league are--those whose low-end confidence interval values are the highest. Here are those goalies:


And here are the worst goalies--those with the lowest ceiling:


(Fancy meeting you here, Ondrej...)

Confidence intervals are helpful in forcing us to check our expectations about unproven goalies. In the long run, they can do more to help us determine relative value than save percentage alone. As is often said, goalies are voodoo. We would do well to remember that.

Megan blogs intermittently about whichever hockey stats catch her fancy at shinnystats.wordpress.com. She can be reached on twitter at @butyoucarlotta, or via email at shinnystats at gmail.

Wednesday, 17 September 2014

Shot Suppression Is The Name Of The Game

As the “Summer of Analytics” wraps up and various NHL camps get underway, there is a palpable urgency apparent from some NHL front offices to find the key to success by expanding their analytics departments. The task teams are currently undertaking is to identify their strengths and weaknesses by conventional and progressive means. Once they do this, they can then come up with a plan to exploit their strengths and improve the areas of weakness.

While there is no secret weapon or magic trick that will suddenly make a good team from a bad team, there are some team strengths that are more important to success than others. In looking at successful teams over the past several seasons, one such strength stands out from the rest: Shot suppression.

Shot suppression is a fairly basic concept, but because it is not as exciting as a high powered offense or as easy to identify as, say, an excellent penalty kill, it is not often discussed during broadcasts or major media analysis shows. Shot suppression is one of the true measures of the quality of a team's defensive structure and systems. Even in analytics, this component of team play can be overlooked when we use percentages such as CF% (Corsi For percentage) or FF% (Fenwick For percentage). Percentages are terrific and useful for many things, but one of their shortcomings is that they mask event rates.

Event rates are often expressed as whatever metric is being used "Per 20" or "Per 60". To understand how aggressive an offense is, CF or FF rates are very useful. For example, the San Jose Sharks had the highest CF60 in the league at score close last season with a mark of 63.6 and were third in the league in FF% (the most popular team possession measurement tool) at 54.9%. The Ottawa Senators were second in the league in CF60 with a rate of 63.2, but were twelfth in the league in FF% at 50.8%.

When the CA and FA rates are added into the mix, we can see which teams allow more shots than others. When used in combination with the team’s CF and FF marks we get a picture of a team’s event rates.
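For reference, converting raw counts into "Per 60" rates is simple arithmetic. The totals below are invented for illustration, not any team's actual numbers:

```python
def per60(events, toi_minutes):
    """Event rate per 60 minutes of ice time."""
    return events * 60 / toi_minutes

# Hypothetical season totals: Corsi for, Corsi against, 5v5 minutes
cf, ca, toi = 3200, 2900, 3000

cf60, ca60 = per60(cf, toi), per60(ca, toi)
print(f"CF60 = {cf60:.1f}, CA60 = {ca60:.1f}")   # → CF60 = 64.0, CA60 = 58.0
print(f"CF% = {cf / (cf + ca):.1%}")             # → CF% = 52.5%
```

Note how two teams could share that 52.5% CF% while having very different CF60/CA60 pairs, which is exactly the information the percentage masks.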



You will note that the best or most successful teams in the league are not at either extreme in terms of event rates. They are not super low event like the New Jersey Devils, nor are they super high event like the Ottawa Senators. Teams with very low event rates, both in shots for and shots against, often struggle to produce enough offense to consistently win games. This was obvious last season, when Devils goalie Cory Schneider played very well but consistently lost games because of a lack of offensive support. Likewise, teams with very high event rates in shots for and against tend to score quite a bit, but they also tend to give up a lot of goals.

The real question is: what is most important? Shot suppression or a prolific offense? Looking back over the past several seasons at teams that were successful both during the regular season and the playoffs may lead us to an answer.

Monday, 8 September 2014

Updates: Sept 8th

Shot Quality

After some serious fence-sitting I've decided to do what many have been asking for: adjusting shot quality, and therefore Expected GF%, by the player's previous shooting% record. This means that Exp. GF%, shot quality and all the other statistics that rely on that model to predict shooting percentage will be stronger, as they no longer use just the variables of the individual shot, as was previously the case, but also the shooter's shooting% from the past 3 seasons, where applicable.

Multiple Seasons

In the player and team stat pages, you can now compute statistics for multiple seasons at a time.

Enjoy the updates! Much more to come.

Tuesday, 2 September 2014

A Beginning

Hello and welcome to ProgressiveHockey.com. This site will be a place where you can access advanced hockey statistics and analysis.

You will notice some small and large differences to what you may have found at other fancystats websites. Here are a few big features to our stats.

Relative:

Relative stats have taken on a whole new meaning. Instead of simply using the entire team's performance when the player is off the ice, relative is calculated as the average performance of his actual on-ice teammates and competition without the player in question, weighted by their ice time with him. This provides a much more meaningful metric for evaluating players.
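As an illustration of the weighting, here's a minimal sketch with hypothetical teammates (the site's actual calculation also incorporates competition):

```python
# Each entry: a teammate's GF% without the player on the ice,
# and that teammate's ice time with him (minutes). Values invented.
teammates = [
    (0.51, 600),
    (0.47, 400),
    (0.53, 250),
]

player_on_ice = 0.54  # the player's own on-ice GF% (hypothetical)

total_toi = sum(t for _, t in teammates)
# Baseline: teammates' performance without him, weighted by shared TOI
baseline = sum(pct * t for pct, t in teammates) / total_toi
relative = player_on_ice - baseline
print(f"Weighted baseline: {baseline:.3f}, Relative: {relative:+.3f}")
```

A teammate who barely plays with the player barely moves the baseline, which is the point of weighting by shared ice time.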

Exp GF%:

You will notice a 4th stat besides corsi, fenwick and goals for percentages: Exp. GF%, or expected GF%. Exp GF% is simply a player's on ice fenwick weighted by the quality of the shot. I outline the methodology for calculating shot quality here, and a player's own average shot quality is also listed.
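In rough terms, the calculation looks like this sketch: each unblocked shot attempt carries a modeled probability of becoming a goal, and Exp. GF% compares the expected-goal totals. The per-shot probabilities below are invented for illustration:

```python
# Modeled shooting% for each on-ice fenwick event (hypothetical values)
shots_for = [0.09, 0.04, 0.15, 0.06]
shots_against = [0.05, 0.12, 0.03]

# Sum of per-shot probabilities = expected goals for/against
exp_gf = sum(shots_for)
exp_ga = sum(shots_against)
exp_gf_pct = exp_gf / (exp_gf + exp_ga)
print(f"Exp. GF% = {exp_gf_pct:.1%}")
```

Unlike raw fenwick, a flurry of low-danger shots counts for less here than a handful of high-quality chances.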

Adjusting for Score State and Zone starts:

Score close metrics have been firmly rooted in advanced hockey stat methodology for some time now, but there are many issues:

1)  There is still variance in shooting rates even within the score close definition. I go into detail about this here.

2) Score close removes a whole swath of data, which significantly cuts down on the power of the sample size.

To solve this, and to account for the variance in the ratio of offensive to defensive zone starts between players, a logistic regression is used to parse out the player-neutral odds that can affect shot differentials. If you wish to use this method, simply click 'Yes' on the Adjusting for Zone Starts and Score State option.
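As a rough sketch of the idea (not the site's actual model), one could fit a logistic regression predicting whether a shot attempt belongs to a player's team from player-neutral context like score state and zone start, then use the modeled odds as a baseline. The data and feature encoding here are simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
score_diff = rng.integers(-2, 3, n)        # goal differential at the event
off_zone_start = rng.integers(0, 2, n)     # 1 = offensive-zone start

# Simulated truth: trailing teams and o-zone starts tilt attempts toward you
logit = 0.3 * off_zone_start - 0.2 * score_diff
p = 1 / (1 + np.exp(-logit))
attempt_for = rng.random(n) < p

X = np.column_stack([score_diff, off_zone_start])
model = LogisticRegression().fit(X, attempt_for)

# Player-neutral expectation for a tied game, defensive-zone start
baseline = model.predict_proba([[0, 0]])[0, 1]
print(f"Context-neutral baseline: {baseline:.2f}")
```

A player's observed shot share can then be judged against this context-driven baseline instead of a raw 50%.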

I think there's a lot for hockey fans to love here, and this site will continuously develop so be sure to check back often!