Part 2: Development of Defensive Adjusted Plus/Minus estimates using advanced statistics

 

 
by Scott Sereday

In many ways, defense has always been the primary weakness of basketball statistics (and of statistics in many other sports, for that matter). Many important aspects of defense are measured poorly or not at all. There have been strides toward statistics that better capture impact on the defensive end of the floor, but few of them are publicly available in large quantities. Plus-minus numbers are useful, but the least biased of these figures, adjusted plus-minus, is highly variable, and it is difficult to assess value accurately from limited data.

The problems with current box score statistics are obvious. The longest-standing defensive statistic, fouls, is treated as a negative in some formulas, such as PER, yet committing a foul is clearly the best option in some situations. How valuable was Rajon Rondo’s foul on Brad Miller in Game 5 of the Bulls-Celtics series? The Celtics arguably would have lost the game had he not committed it. (Of course, not blowing the defensive assignment in the first place would usually be preferable to giving up free throw attempts.) Steals and blocked shots do indicate defensive value, but they can often merely reflect high-risk, high-reward play.

Most importantly, blocks and steals account for fewer than 12 plays per game for the average NBA team. A team cannot choose whom to “use” on defense as effectively as it can on offense, so all 5 players have assignments on nearly every play. If a team records 12 blocks and steals on 100 defensive possessions per game, and if we assume that fouls have no significant positive or negative impact (on average), then a player’s impact goes unmeasured on 488 of 500 individual defensive possessions, or nearly 98% of the time. Measuring defense by blocks and steals is as silly as awarding cornerbacks trips to the Pro Bowl based on the number of interceptions they accumulate. Wait a second… I think that is how cornerbacks earn a Pro Bowl berth.

Perhaps it is best to determine the best defenders visually. That way, the experts can come to a consensus and reputation will determine who is valuable defensively. Of course, Rafael Palmeiro once won a Gold Glove on reputation, when the voters decided he was the most valuable defensive first baseman in the AL despite playing only 28 games at first base and serving as designated hitter in the other 135 games he played. Basketball is clearly not immune to leaning too heavily on reputation when assessing defensive value.
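The coverage figure in the blocks-and-steals argument above is simple arithmetic, sketched here in Python. The function name and the default of five defenders per possession are illustrative choices, not part of the original analysis.

```python
def unmeasured_share(blocks_and_steals: float,
                     team_possessions: float,
                     defenders_on_floor: int = 5) -> float:
    """Share of individual defensive possessions with no block or steal recorded."""
    player_possessions = team_possessions * defenders_on_floor
    return (player_possessions - blocks_and_steals) / player_possessions


# 12 blocks and steals over 100 team possessions, five defenders each.
share = unmeasured_share(blocks_and_steals=12, team_possessions=100)
print(f"{share:.1%}")  # 97.6%
```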

In part two of my four-part series, I will develop a regression model that uses advanced statistics to estimate defensive adjusted plus/minus, based on data from the 2005-2006 season through the 2007-2008 season.

As I have done in developing an offensive adjusted plus/minus model using advanced statistics, I ran a baseline model using the significant variables from Dan’s original defensive model, which employs only the basic box score statistics:

Model 1 – Defensive Plus/Minus estimate using box score statistics
Mean Squared Residual: 13.38

| Variable | Coefficient | Standard Error | t Stat | P-value |
| --- | --- | --- | --- | --- |
| Intercept | -4.69887 | 0.732856 | -6.41172 | <.0001 |
| Blocks | 0.80195 | 0.18072 | 4.437531 | <.0001 |
| Defensive Rebounds | 0.413919 | 0.073503 | 5.631314 | <.0001 |
| Fouls Committed | -0.07551 | 0.094249 | -0.80117 | 0.423255 |
| Steals | 1.597019 | 0.257176 | 6.209828 | <.0001 |
| Turnovers | -0.26385 | 0.159047 | -1.65894 | 0.097494 |
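As a quick consistency check on the Model 1 output, each t statistic should equal its coefficient divided by its standard error. The short Python sketch below recomputes them from the published values; the dictionary layout and function name are illustrative only.

```python
# (coefficient, standard error, reported t stat) copied from the Model 1 table.
MODEL_1 = {
    "Intercept": (-4.69887, 0.732856, -6.41172),
    "Blocks": (0.80195, 0.18072, 4.437531),
    "Defensive Rebounds": (0.413919, 0.073503, 5.631314),
    "Fouls Committed": (-0.07551, 0.094249, -0.80117),
    "Steals": (1.597019, 0.257176, 6.209828),
    "Turnovers": (-0.26385, 0.159047, -1.65894),
}


def t_stat(coefficient: float, standard_error: float) -> float:
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


for name, (coef, se, reported) in MODEL_1.items():
    print(f"{name:<20} t = {t_stat(coef, se):8.4f}   reported = {reported:8.4f}")
```

The recomputed values agree with the table up to rounding of the published coefficients, including the small t statistic for Fouls Committed that marks it as insignificant in this model.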

Many statistical formulas (including Hollinger’s PER) treat fouls as a negative; however, many regression models indicate that their value is insignificant or even positive. The value of blocks and steals is obvious and is reflected in the model above. Beyond the value of the play itself, a player who accumulates steals and blocks is typically more likely to force turnovers and missed shots, and that value is reflected above as well. Interestingly, a player who gambles for steals and blocks can be prone to getting beaten and hurting the rest of his team when he comes up empty, while a player whose defensive style is physical, aggressive and intelligent could be prone to fouls but also to breaking up offensive plays. I figured that the interaction of these stats might shed some additional light on defensive value.

One of the aspects I wanted to capture was a shot-blocking presence beyond the average value of an individual block. I expect this presence to be more valuable the more height and length a defender has, because such a player can more easily contest a greater number of shots while maintaining an advantageous defensive position. Conversely, a smaller player who frequently contests shots will often be taken out of proper defensive position and be more likely to be beaten on the play or to commit an unnecessary foul. I attempted to account for these tendencies by including the interaction of blocks and height (in inches above 5 feet).

I tried several other combinations. In the end, the interaction of fouls and steals proved insignificant, and the coefficient for the average value of a block on its own (without interaction with either height or fouls) was highly variable. I suspect two effects offset each other in the steals-fouls interaction: it would lose value from a player’s tendency to get beaten when his gamble fails, but it would gain value from his tendency to force turnovers on physical plays on the ball that are not credited to him as either steals or fouls.

Finally, I included Opponent PER plus/minus in my model. (I used 16.5 – Opponent PER). I eliminated all insignificant variables for my second model.
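The derived variables described above can be sketched in a few lines of Python. This is a minimal illustration, not the original workflow: the `PlayerLine` class, its field names, and the example numbers are hypothetical, and the rate basis of the counting stats (per game, per minute or per possession) is not stated in the article.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    height_in: float  # listed height in inches
    blocks: float     # block rate (rate basis assumed, not stated in the article)
    fouls: float      # fouls committed, same rate basis
    opp_per: float    # PER of the player's positional opponents


def derived_features(p: PlayerLine) -> dict:
    """Build the interaction and plus/minus variables described in the text."""
    return {
        "opp_per_pm": 16.5 - p.opp_per,                      # Opponent PER plus/minus
        "height_x_blocks": (p.height_in - 60.0) * p.blocks,  # inches above 5 feet
        "fouls_x_blocks": p.fouls * p.blocks,
    }


# Hypothetical 7-footer: 2 blocks, 3 fouls, opponents held to a 14.0 PER.
print(derived_features(PlayerLine(height_in=84, blocks=2.0, fouls=3.0, opp_per=14.0)))
```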

Model 2 – Defensive Plus/Minus estimate using interaction variables and advanced statistics
Mean Squared Residual: 12.52

| Variable | Coefficient | Standard Error | t Stat | P-value |
| --- | --- | --- | --- | --- |
| Intercept | -6.2912 | 0.76882 | -8.18293 | <.0001 |
| Opponent PER Plus/Minus | 0.395194 | 0.055894 | 7.070478 | <.0001 |
| Defensive Rebounds | 0.353939 | 0.077503 | 4.566752 | <.0001 |
| Fouls Committed | 0.392725 | 0.134159 | 2.927306 | 0.00351 |
| Steals | 1.712713 | 0.251091 | 6.821082 | <.0001 |
| Turnovers | -0.29954 | 0.154094 | -1.94386 | 0.05224 |
| [Height (in inches) – 60] * Blocks | 0.082477 | 0.020888 | 3.948517 | <.0001 |
| Fouls Committed * Blocks | -0.22292 | 0.082745 | -2.69401 | 0.007198 |

This model seems to greatly improve the accuracy of predicting defensive adjusted plus/minus. The negative interaction between fouls and blocks could indicate a player who often attempts to block shots and gets beaten, producing a higher scoring expectation than if the block had not been attempted in the first place. The Opponent PER coefficient captures about 40% of the deviation from average; with a perfect measure, we would expect a coefficient very close to 100%. Since PER includes both defensive and offensive statistics, some of the “missed” value is due to the opponent’s defensive statistics. However, I suspect that much of the “missed” value is a result of teammate help (or lack thereof) and interaction. In observation and in other studies, I have noticed that fouls on help defense seem to be particularly valuable.
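Because blocks enter Model 2 only through interaction terms, the implied value of one more block depends on the player's height and foul rate, and the implied value of one more foul depends on his block rate. The Python sketch below works out those marginal effects from the published coefficients; the function names are illustrative, and the inputs are assumed to be in whatever rate units the regression used.

```python
# Coefficients copied from the Model 2 table.
HEIGHT_X_BLOCKS = 0.082477
FOULS_X_BLOCKS = -0.22292
FOULS = 0.392725


def marginal_block_value(height_in: float, fouls: float) -> float:
    """Change in the Model 2 estimate per additional block."""
    return HEIGHT_X_BLOCKS * (height_in - 60.0) + FOULS_X_BLOCKS * fouls


def marginal_foul_value(blocks: float) -> float:
    """Change in the Model 2 estimate per additional foul."""
    return FOULS + FOULS_X_BLOCKS * blocks


# Block rate at which an extra foul stops adding to the estimate.
break_even_blocks = FOULS / -FOULS_X_BLOCKS

print(f"7-footer, no fouls: {marginal_block_value(84, 0.0):.3f} per block")
print(f"foul value with no blocks: {marginal_foul_value(0.0):.3f}")
print(f"foul value turns negative above {break_even_blocks:.2f} blocks")
```

Under these coefficients an extra foul adds to the estimate only for players whose block rate is below roughly 1.76, which is one way to read the interaction discussed above.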

Although the second model is much improved over the first one, I was still not content with the results. There were three more factors I attempted to include.

First, I decided to include offensive fouls drawn. Complete estimates are available on 82games.com at the following address:
http://www.82games.com/FSORT10.HTM
I made estimates for incomplete data since some of the data for the 2006 and 2007 seasons was not available. The coefficient attributable to charges drawn was slightly larger when I used the same variables to estimate just the 2008 defensive adjusted plus/minus, but after including the data and estimates from 2006 and 2007, the coefficient was much more statistically significant.

I also included a variable to measure the “non statistical” impact attributable to a specific player. To determine this figure, I took the weighted sum of the defensive adjusted plus/minus for each team and subtracted out the defensive statistical plus/minus obtained from Model 2. In order to try to distribute weight to players more responsible for the success and failures of their respective teams, I tried interactions with this “non statistical” value and other variables.
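The team-level “non statistical” figure described above can be sketched as follows. This is only one reading of the description: the article does not say what the weights are, so the sketch assumes each player is weighted by his share of team minutes, and the roster numbers are hypothetical.

```python
from typing import Iterable, Tuple

# (weight, actual defensive adjusted plus/minus, Model 2 estimate) per player.
PlayerRow = Tuple[float, float, float]


def team_non_statistical(players: Iterable[PlayerRow]) -> float:
    """Weighted actual adjusted plus/minus minus weighted Model 2 estimate."""
    return sum(weight * (actual - estimate) for weight, actual, estimate in players)


# Hypothetical three-man rotation; weights are assumed minute shares.
roster = [
    (1.0, 2.0, 1.0),   # outperforms his box score by 1.0
    (0.5, -1.0, 0.0),  # underperforms by 1.0 in half the minutes
    (0.5, 0.5, 0.5),   # exactly as estimated
]
print(team_non_statistical(roster))  # 0.5
```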

Finally, I attempted to differentiate the impact of different types of fouls committed. There was no significant difference between the coefficients for personal fouls and shooting fouls. There appear to be differences among the other types of fouls, but because those fouls occur infrequently, their coefficients were not statistically significant. The most common types of “Other Fouls Committed” were offensive fouls, loose ball fouls, illegal defense fouls and technical fouls. After testing the impact of these foul types individually, I tested to see which fouls had significant interactions with the other variables.

After trying several combinations, I decided upon the following model:

Model 3 – Final Defensive Plus/Minus estimate using advanced statistics
Mean Squared Residual: 11.59

| Variable | Coefficient | Standard Error | t Stat | P-value |
| --- | --- | --- | --- | --- |
| Intercept | -6.23933 | 0.711252 | -8.77231 | <.0001 |
| Opponent PER Plus/Minus | 0.378744 | 0.049016 | 7.726944 | <.0001 |
| Defensive Rebounds | 0.284151 | 0.071079 | 3.997689 | <.0001 |
| Steals | 1.768025 | 0.230399 | 7.673736 | <.0001 |
| Turnovers | -0.29683 | 0.141911 | -2.09167 | 0.036732 |
| [Height (in inches) – 60] * Blocks | 0.088827 | 0.015558 | 5.709401 | <.0001 |
| Shooting Fouls * Blocks | -0.4278 | 0.109313 | -3.91352 | <.0001 |
| Shooting/Personal Fouls | 0.346479 | 0.130095 | 2.663282 | 0.007869 |
| Other Fouls Committed | 1.26013 | 0.458565 | 2.747986 | 0.006109 |
| Charges Drawn | 0.678873 | 0.392199 | 1.730942 | 0.083785 |
| Shooting Fouls * Team Non Statistical Plus/Minus | 0.045195 | 0.006843 | 6.604568 | <.0001 |
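To make the final model concrete, the sketch below applies the Model 3 coefficients to one player's line in Python. Several things here are assumptions rather than facts from the article: the argument names, the reading of “Shooting/Personal Fouls” as shooting plus personal fouls combined, the rate units of the inputs, and the example player, who is entirely hypothetical.

```python
# Coefficients copied from the Model 3 table; the keys are illustrative names.
MODEL_3 = {
    "intercept": -6.23933,
    "opp_per_pm": 0.378744,
    "def_reb": 0.284151,
    "steals": 1.768025,
    "turnovers": -0.29683,
    "height_x_blocks": 0.088827,
    "shooting_fouls_x_blocks": -0.4278,
    "shooting_personal_fouls": 0.346479,
    "other_fouls": 1.26013,
    "charges_drawn": 0.678873,
    "shooting_fouls_x_team_ns": 0.045195,
}


def model3_estimate(*, opp_per, def_reb, steals, turnovers, height_in, blocks,
                    shooting_fouls, personal_fouls, other_fouls,
                    charges_drawn, team_non_stat):
    """Linear combination of the Model 3 terms for one player."""
    terms = {
        "opp_per_pm": 16.5 - opp_per,
        "def_reb": def_reb,
        "steals": steals,
        "turnovers": turnovers,
        "height_x_blocks": (height_in - 60.0) * blocks,
        "shooting_fouls_x_blocks": shooting_fouls * blocks,
        # Assumption: the combined foul variable is shooting + personal fouls.
        "shooting_personal_fouls": shooting_fouls + personal_fouls,
        "other_fouls": other_fouls,
        "charges_drawn": charges_drawn,
        "shooting_fouls_x_team_ns": shooting_fouls * team_non_stat,
    }
    return MODEL_3["intercept"] + sum(MODEL_3[k] * v for k, v in terms.items())


# Hypothetical big man on a team whose defense beats its box score by 2 points.
estimate = model3_estimate(
    opp_per=14.0, def_reb=8.0, steals=1.0, turnovers=2.0, height_in=83,
    blocks=1.5, shooting_fouls=2.0, personal_fouls=1.5, other_fouls=0.5,
    charges_drawn=0.2, team_non_stat=2.0,
)
print(f"Estimated defensive plus/minus: {estimate:.2f}")
```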

Notice that the residual term has been greatly reduced. Keep in mind that while a low residual is good, even a perfect measure cannot push the residual below the variability of defensive adjusted plus/minus itself.

In determining my non statistical interaction variable, I used a measure that is similar to Dean Oliver’s method of using team defensive rating to value individual Defensive Ratings. In my model, I eliminated players who didn’t play enough minutes to be included in the adjusted plus/minus model and I reflected the interaction of shooting fouls. Although a variable of non statistical plus/minus that excludes the interaction with shooting fouls was significant and did improve the accuracy of the model, I excluded it because the coefficient was so small (0.005). Some quick math shows that if an average team commits 10 shooting fouls per 100 defensive possessions, the variable of the interaction between shooting fouls and non statistical plus/minus accounts for slightly less than 50% of the defense that is not statistically measured in Model 2. This is a significant improvement, but far from perfect.
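The “quick math” above can be reproduced directly. Summed over a team, the interaction term is the coefficient times the team's non statistical plus/minus times the team's shooting fouls, so the share recovered is simply the coefficient times the shooting-foul rate. The Python sketch below assumes, as the paragraph does, 10 shooting fouls per 100 defensive possessions, and assumes that player foul rates add up to the team rate.

```python
SHOOTING_FOULS_X_TEAM_NS = 0.045195  # Model 3 interaction coefficient


def share_of_team_non_stat(team_shooting_fouls: float) -> float:
    """Fraction of team non statistical plus/minus captured by the interaction."""
    return SHOOTING_FOULS_X_TEAM_NS * team_shooting_fouls


print(f"{share_of_team_non_stat(10):.1%}")  # 45.2%
```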

After testing a few combinations, I found that the only type of foul that significantly interacted with blocked shots was shooting fouls. It is interesting to note that fouls committed, a statistic that had such insignificant value in Model 1, now appears in four variables and is significant in all of them.

Let’s look at a sample of defensive plus/minus estimates for players in the 2007-2008 season.

 

2008 Estimated Adjusted Plus/Minus (Defensive)

| Name | Actual | Statistical | Advanced Statistical | Barzilai/Ilardi |
| --- | --- | --- | --- | --- |
| Hayes, Chuck | 8.9 | 3.4 | 5.7 | 9.8 |
| Garnett, Kevin | 9.8 | 3.7 | 5.1 | 7.6 |
| O'Neal, Shaquille | 5.6 | 1.5 | 5.0 | 5.5 |
| Camby, Marcus | 2.9 | 5.6 | 4.8 | 4.3 |
| Diop, Desagana | 6.8 | 2.3 | 4.6 | 5.5 |
| Bynum, Andrew | 1.9 | 2.7 | 4.4 | 0.3 |
| Thomas, Kurt | 6.1 | 3.3 | 4.4 | 4.6 |
| Howard, Dwight | 5.8 | 3.6 | 4.3 | 1.7 |
| Duncan, Tim | 3.7 | 3.2 | 4.1 | 4.5 |
| Moon, Jamario | 2.8 | 3.3 | 4.0 | 3.2 |
| Frye, Channing | 3.6 | 0.8 | 3.6 | 3.8 |
| Wallace, Rasheed | 4.8 | 3.5 | 3.5 | 3.7 |
| Foyle, Adonal | 5.4 | 1.9 | 3.4 | 0.6 |
| O'Neal, Jermaine | 6.2 | 1.0 | 3.2 | 4.9 |
| Noah, Joakim | 6.3 | 2.2 | 3.1 | 2.7 |
| Ming, Yao | 2.2 | 1.1 | 3.0 | 4.6 |
| Allen, Tony | 3.7 | 0.2 | 2.9 | 4.5 |
| Ginobili, Manu | 4.3 | 1.0 | 2.5 | 3.7 |
| Nowitzki, Dirk | 2.2 | 0.9 | 2.0 | 1.4 |
| James, LeBron | -0.7 | 1.6 | 1.9 | 2.4 |
| Artest, Ron | 5.3 | 1.5 | 1.8 | 4.5 |
| Battier, Shane | 1.1 | -0.1 | 1.3 | 1.8 |
| Bryant, Kobe | 1.2 | 0.6 | 0.8 | -0.5 |
| Stoudemire, Amare | -5.3 | 2.2 | 0.7 | -2.6 |
| Paul, Chris | -10.4 | 1.6 | 0.6 | -4.5 |
| Bell, Raja | 2.9 | -1.5 | 0.3 | 2.2 |
| Bowen, Bruce | 1.7 | -1.2 | -1.2 | 2.8 |
| Prince, Tayshaun | 4.6 | -1.3 | -1.3 | 3.3 |
| Billups, Chauncey | -3.3 | -0.9 | -0.3 | -2.5 |
| Nash, Steve | -3.2 | -2.8 | -4.3 | -3.2 |

Actual: defensive adjusted plus/minus
Statistical: the estimate using the coefficients in Model 1
Advanced Statistical: the estimate using the coefficients in Model 3
Barzilai/Ilardi: figure taken from 82games.com
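One simple way to compare the two estimates against the actual figures is the mean absolute gap over the listed players. The Python sketch below uses the 2008 rows above; the choice of mean absolute error as the yardstick is an assumption of this sketch, and the list is a hand-picked sample drawn from the seasons used to fit the models, so the comparison is illustrative rather than a validation.

```python
# (actual, statistical, advanced statistical) for the 30 players listed for 2008.
ROWS_2008 = [
    (8.9, 3.4, 5.7), (9.8, 3.7, 5.1), (5.6, 1.5, 5.0), (2.9, 5.6, 4.8),
    (6.8, 2.3, 4.6), (1.9, 2.7, 4.4), (6.1, 3.3, 4.4), (5.8, 3.6, 4.3),
    (3.7, 3.2, 4.1), (2.8, 3.3, 4.0), (3.6, 0.8, 3.6), (4.8, 3.5, 3.5),
    (5.4, 1.9, 3.4), (6.2, 1.0, 3.2), (6.3, 2.2, 3.1), (2.2, 1.1, 3.0),
    (3.7, 0.2, 2.9), (4.3, 1.0, 2.5), (2.2, 0.9, 2.0), (-0.7, 1.6, 1.9),
    (5.3, 1.5, 1.8), (1.1, -0.1, 1.3), (1.2, 0.6, 0.8), (-5.3, 2.2, 0.7),
    (-10.4, 1.6, 0.6), (2.9, -1.5, 0.3), (1.7, -1.2, -1.2), (4.6, -1.3, -1.3),
    (-3.3, -0.9, -0.3), (-3.2, -2.8, -4.3),
]


def mean_abs_error(actual, estimate):
    """Average absolute gap between paired actual and estimated values."""
    pairs = list(zip(actual, estimate))
    return sum(abs(a - e) for a, e in pairs) / len(pairs)


actual = [row[0] for row in ROWS_2008]
mae_statistical = mean_abs_error(actual, [row[1] for row in ROWS_2008])
mae_advanced = mean_abs_error(actual, [row[2] for row in ROWS_2008])

print(f"Box score estimate: {mae_statistical:.2f}")
print(f"Advanced estimate:  {mae_advanced:.2f}")
```

On these 30 rows the gap comes out to roughly 3.3 points for the box score estimate and roughly 2.4 for the advanced estimate.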

Almost all of the players with defensive reputations rate higher using advanced statistics than using only box score statistics. One player who jumped out at me is Chuck Hayes, who has an incredible defensive adjusted plus/minus but “only” averaged 1.1 steals, 0.5 blocks, 2.7 fouls and 3.7 defensive rebounds per game (in 20 minutes). Using my advanced statistical model, he tops the list (on a per-minute basis). It does appear that my advanced statistical model captures defensive value beyond box score statistics better for post defenders than for perimeter defenders, but by my estimates it still adds about 1 point per 100 defensive possessions for players with reputations as elite perimeter defenders. It also seems to fall short of these players’ average defensive adjusted plus/minus by approximately the same amount. As most statistical and plus/minus models confirm, post players tend to be more valuable defensively and perimeter players tend to be more valuable offensively.

Notice that the greatest values for the defensive advanced statistical estimate are lower, on average, than the greatest values for the offensive advanced statistical estimate. This occurs despite the defensive model having a lower residual than the offensive model. (A model with less powerful independent variables will tend to produce estimates with less extreme variation and a higher residual, or error term.) The lower absolute defensive estimates could result from limits on the defensive impact an elite defender can generate. The more valuable a player is, the more important it is to keep him on the floor and, although fouls can be used to increase the success of the team in a given possession, they can also limit the minutes of the player who commits them. Look at the players on the list above and consider how many play far fewer minutes than the top offensive players. This reduction in minutes could also stem from the toll that defensive effort takes on a player’s stamina. It doesn’t appear likely to me that coaches undervalue defensive contributions. (That’s the fans’ job.)

Finally, I have also included an estimate of advanced statistical plus/minus for the 2008-2009 season. Since charges drawn are not available for this season yet, I made estimates based on data from the prior season.

 

 

2009 Estimated Adjusted Plus/Minus (Defensive)

| Name | Actual | Statistical | Advanced Statistical |
| --- | --- | --- | --- |
| Howard, Dwight | 8.4 | 5.4 | 8.5 |
| Garnett, Kevin | 7.5 | 3.9 | 6.0 |
| James, LeBron | 6.6 | 2.8 | 5.5 |
| Kirilenko, Andrei | 4.1 | 2.4 | 5.4 |
| Allen, Tony | 4.8 | 1.9 | 5.2 |
| Przybilla, Joel | 7.7 | 4.1 | 5.2 |
| Ilgauskas, Zydrunas | 2.5 | 1.8 | 4.9 |
| Camby, Marcus | 10.3 | 5.2 | 4.7 |
| Wallace, Gerald | 6.1 | 2.9 | 4.7 |
| Paul, Chris | 4.5 | 3.3 | 4.6 |
| Ariza, Trevor | 1.0 | 3.1 | 4.6 |
| Thomas, Kurt | 4.7 | 2.9 | 4.5 |
| Wallace, Ben | 5.1 | 4.0 | 4.4 |
| Odom, Lamar | 9.1 | 2.9 | 4.4 |
| Dalembert, Samuel | 1.7 | 3.7 | 4.3 |
| Duncan, Tim | 8.0 | 3.0 | 4.2 |
| Gortat, Marcin | 4.6 | 4.1 | 4.2 |
| Wade, Dwyane | 4.4 | 2.2 | 4.0 |
| Perkins, Kendrick | -3.5 | 1.5 | 3.9 |
| Ming, Yao | 6.5 | 1.9 | 3.9 |
| Thomas, Tyrus | -3.7 | 3.9 | 3.8 |
| Hilario, Nene | 3.2 | 2.4 | 3.8 |
| Evans, Reggie | 0.6 | 1.9 | 3.1 |
| Artest, Ron | 5.5 | 0.9 | 3.0 |
| Hayes, Chuck | 0.7 | 2.9 | 2.2 |
| Battier, Shane | 2.3 | 0.4 | 1.9 |
| Bryant, Kobe | 1.1 | 0.5 | 1.5 |
| Bowen, Bruce | 3.0 | -0.9 | 0.7 |
| Hinrich, Kirk | 7.2 | 0.4 | 0.7 |
| Prince, Tayshaun | -7.6 | -0.8 | -0.2 |
| Bell, Raja | -0.8 | -1.6 | -0.9 |

Many of the estimates differ significantly from the previous year’s. This could mean that teammate interaction is very important in assessing player value on defense and that a player can be much more valuable in one system than in another. Chuck Hayes’ value dropped dramatically. It is possible that his production in 2007-2008, in 2008-2009, or in both seasons was a fluke owing to his limited minutes. According to 82games.com, he has also always been less effective defensively at the center position, and he spent a much larger share of his time at that slot in 2008-2009: about 80%, based on his 82games.com data! Since he is listed at 6’6”, it would be no surprise if this greatly reduced his defensive value.

There are still many ways to improve both offensive and defensive statistical estimates of player value. Teammate interaction is still largely unexplored, and the defensive assignments a player draws could be attributed more accurately. Of course, we have long been waiting for better defensive statistics, such as successfully contested shots or points allowed, to become publicly available. I will delve into a more comprehensive study at some other time, but in my next section I will look at the perceived value of the statistics used in my advanced offensive and defensive models. Perhaps the analysis of coach- and scouting-related ratings will help shed some additional light on the non statistical impact of offensive and defensive performance.

 

 

 

 

Copyright © 2009 Basketball-Statistics.com