The following is part of a weekly series at the Orlando Magic blog, Third Quarter Collapse.
A week ago I tracked the hustle plays in a game between the Los Angeles Clippers and the Memphis Grizzlies. Tracking hustle plays is presumably something most, if not all, NBA teams do. After all, box scores are pretty limited. Even if we use the play-by-play data to do thorough analysis, it still doesn’t include things such as diving for loose balls, deflections, missed blockouts, etc. But teams would like to know these things, so they must track them themselves.
I decided to track the hustle plays during last Saturday’s game between the Magic and the Golden State Warriors. During the game, I kept track of five things. First, I tracked players going for loose balls. In my experience with a college team, we only record plays where a player dove for a loose ball. But since this is the NBA, and effort is often lacking, I included all plays in which a player ended up with the ball, regardless of whether or not he dove. The second thing I tracked was drawn charges. You can somewhat glean this from the play-by-play data, but it is much easier to just record it yourself.
Third, I kept track of good sprints. I define these as plays in which a player creates a play for himself or others by sprinting the floor and forcing the defense to adjust. For this game featuring the fast-paced Warriors, I had to be more selective in my criteria or else we’d have a lot of good sprints. The fourth thing I tracked for this game was deflections. This is relatively easy to define and track. Basically it includes any deflection that is not recorded as a steal, rebound, etc. Finally, I kept track of missed blockouts. These were most noticeable when they led to an easy offensive rebound, and they were much rarer in this game than in my first one.
Of course, these aren’t all the hustle plays that players can make. Traditional box score stats such as offensive rebounds and steals often reflect hustle plays. Defense is also largely a product of effort, but that is something I will track another time.
Below is a link to a spreadsheet that contains the hustle stats for the Magic-Warriors game. On the left side of each tab are the raw numbers. On the right side are the per-minute numbers. Instead of presenting them as “statistic per minute,” I present them as “minutes per statistic.” I did this because the raw per-minute numbers are so low. As it turns out, this method is not too difficult to grasp conceptually. For positive statistics such as deflections, a lower number is better (a blank means the player did not record any deflection at all, which obviously is bad). For negative statistics such as missed blockouts, blanks are the best and low numbers are the worst.
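For readers who want to reproduce the “minutes per statistic” column, here is a minimal sketch in Python. The player names and numbers are made up for illustration, and the function name `minutes_per_stat` is my own; the actual spreadsheet may be built differently.

```python
from typing import Optional


def minutes_per_stat(minutes: float, count: int) -> Optional[float]:
    """Return minutes played per recorded event, or None when there were no events."""
    if count == 0:
        return None  # corresponds to a blank cell in the spreadsheet
    return minutes / count


# Hypothetical players: (minutes played, deflections recorded)
example = {"Player A": (36.0, 3), "Player B": (24.0, 0)}

table = {name: minutes_per_stat(m, c) for name, (m, c) in example.items()}
print(table)
```

Player A records one deflection every 12 minutes, while Player B gets a blank because he recorded none.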
I have a few observations about the data:
The Magic did not win this game because they outhustled the Warriors. In terms of effort, both teams were solid and about even. The Magic won, obviously, because of a huge run late in the fourth quarter in which they hit their shots and the Warriors made silly plays.
Up next I’d like to track the defense of the Magic. With a few games in the data set, we may be able to rate the defense of Magic players in other ways besides Defensive Rating, plus-minus, etc.
One thing many people have wondered is whether or not there are diminishing returns for rebounds. Basically, what that would mean is that not all of a player’s rebounds would otherwise have been taken by the opponent; some would have been collected by teammates. Therefore, starting five league leaders in rebounds would probably be overkill because eventually they’d just steal them from each other. At some point, there are only so many rebounds a team can grab, and some are just bound to end up in the hands of the opponent.
This principle is very important to statisticians who wish to develop player ratings systems. These ratings often assign weights to different statistics (including offensive and defensive rebounds), so knowing that a defensive rebound collected by one player would most likely otherwise have been collected by a teammate makes that stat less “valuable” in terms of producing wins.
To test the effect of diminishing returns of rebounds, I decided to go through the play-by-play data (available at Basketball Geek) and compare each lineup’s projected rebounding rates (the sum of each player’s individual rebound rates for the season) to their actual rebounding rates (what percentage of rebounds that lineup grabbed while it was on the floor). After doing some research, I found out a very similar study was done by Eli Witus (formerly of CountTheBasket.com, currently of the Houston Rockets). Before you proceed with the rest of my article, you should read his. Although my method is slightly different, he provides a great explanation of why it’s useful to do the research this way and he also lists some advantages and disadvantages of this method.
Before I show you the results, I should explain the intricacies of my research and also some of the differences between Eli’s study and mine. The individual rebound rates I used were ones I calculated myself using the play-by-play data. Because both the individual rates and the lineup rates were calculated from the same data, there’s less risk of error due to silly things such as differences in calculations or incomplete data. Also, to reduce the effects of small sample sizes due to lineups that didn’t receive a lot of minutes together, Eli chose to group lineups into bins based on their projected rebound rates. He then regressed each bin’s actual rebound rate on its projected rebound rate (a bin being a collection of different lineups with similar projected rebound rates).
When I was coming up with my idea, I chose to do things a little differently, although the purpose is the same. Instead of grouping the lineups into bins, I simply selected the lineups that met a minimum qualification for plays. Only lineups that appeared in at least 400 plays were included in my study. This left me with a sample size of 475 lineups. Like Eli, I then regressed the actual rebounding rates on the projected rebounding rates. One final difference between the two of us is that his article was written in February of 2008, so I’m presuming he used data from the 2007-08 season. I’m using data from the 2008-09 season.
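The method described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the four lineups and their rates are invented, the helper `fit_line` is a plain least-squares fit written for this example, and the real study used 475 lineups from play-by-play data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


MIN_PLAYS = 400  # qualification threshold from the article

# Hypothetical lineups: (plays together, five individual rebound rates, actual lineup rate)
lineups = [
    (650, [0.02, 0.03, 0.05, 0.08, 0.10], 0.27),
    (420, [0.03, 0.04, 0.06, 0.09, 0.12], 0.30),
    (510, [0.01, 0.02, 0.04, 0.06, 0.09], 0.24),
    (120, [0.05, 0.05, 0.08, 0.10, 0.12], 0.45),  # dropped: fewer than 400 plays
]

# Projected lineup rate = sum of the five individual rates.
qualified = [
    (sum(rates), actual_rate)
    for plays, rates, actual_rate in lineups
    if plays >= MIN_PLAYS
]
projected = [p for p, _ in qualified]
actual = [a for _, a in qualified]

slope, intercept = fit_line(projected, actual)
print(f"lineups used: {len(qualified)}, slope: {slope:.2f}")
```

A slope of 1 would mean no diminishing returns; the further the slope falls below 1, the more a player's rebounds come at the expense of his own teammates.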
Offensive Rebound Rate
The graph for Offensive Rebound Rate is below:
The key to understanding this graph is looking at the slope of the line. Here, it is 0.7462 (close to the 0.77 that Eli got). If there were no diminishing returns for offensive rebounds, the slope would be 1. This would mean that for each additional rebound a player could offer to his lineup, he would actually add one rebound to the lineup’s total. Because the slope is less than one in this case, each additional offensive rebound by the player adds only about 0.75 to the lineup’s total, because some of those would have been taken by his teammates anyway. The slope I have here is pretty high, though, indicating that the diminishing returns effect for offensive rebounds isn’t too strong.
Defensive Rebound Rate
In his study, Eli found that the diminishing returns effect was much stronger for defensive rebounds. Can I replicate his results? Below is the graph for defensive rebounds:
Eli found a slope of 0.29. Mine was close, but slightly higher at 0.3331. Regardless of the minor difference, we both can come to the same conclusion: there is a much stronger diminishing returns effect at play with defensive rebounds than there is with offensive rebounds. While each offensive rebound adds 0.75 to the lineup’s total, each defensive rebound only adds 0.33, indicating that many defensive rebounds are taken away from teammates. Of course, individual cases can vary.
These results help explain why a lot of player rating systems make defensive rebounds “worth” less than offensive rebounds. Eli has a good explanation of it at the end of the article here. For example, in his PER system, John Hollinger assigns offensive rebounds a value more than double the value of defensive rebounds. This is partly due to the diminishing returns effect we found here today and originally in Eli’s work. As it turns out, my numbers indicate that offensive rebounds are in fact worth a little more than double the value of defensive boards. So hats off to Hollinger and his many contemporaries who have managed to weight rebounds appropriately.
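The “a little more than double” claim follows directly from the two slopes reported above. A quick check in Python (the two-point improvement is a hypothetical example, not a number from the study):

```python
OFFENSIVE_SLOPE = 0.7462  # slope reported for offensive rebounds
DEFENSIVE_SLOPE = 0.3331  # slope reported for defensive rebounds

# Relative value of an offensive rebound versus a defensive one.
ratio = OFFENSIVE_SLOPE / DEFENSIVE_SLOPE
print(f"offensive / defensive: {ratio:.2f}")

# Hypothetical example: swap in a player whose individual rebound rate
# is 2 percentage points higher than the player he replaces.
individual_gain = 2.0
lineup_gain_off = OFFENSIVE_SLOPE * individual_gain
lineup_gain_def = DEFENSIVE_SLOPE * individual_gain
print(f"expected lineup gain: {lineup_gain_off:.2f} (off), {lineup_gain_def:.2f} (def)")
```

The ratio comes out to roughly 2.24, so a two-point edge in individual rate translates to about 1.5 points for the lineup on the offensive glass but only about 0.7 on the defensive glass.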
I could stop here, but I’d like to take this research a little further and see what other insights we can come up with. First, I’d like to break down the data by location (home and away).
One thing to note is that the projected rebounding rates for the lineups are based on overall individual ratings, not just for home games. If rebounding was usually in favor of the home teams, this would lead the projected lineup rebounding rates to usually underestimate the actual rates in this case. However, since it would presumably do this for all lineups, we can still take a look at the effect of diminishing returns.
With that being said, how does the home data compare to the overall data? For offensive rebounds, the slope is flatter, indicating a stronger effect of diminishing returns. However, for defensive rebounds, the slope is slightly higher, indicating a lesser effect. The differences are minor, though.
We can also take a look at the away data:
As you would expect given what we now know about the home data, the effect of diminishing returns appears to be much weaker on the road for offensive rebounds. In fact, as we can see, the slope is getting close to 1. This indicates that there isn’t much in terms of diminishing returns for this type of rebound. Intuitively, this makes sense. If teams rebound the ball better at home, there are fewer offensive rebound opportunities for the visiting team. Therefore, it is more likely that an offensive rebound by a visiting player would otherwise have been grabbed by the opponent as opposed to one of his teammates, which in turn makes good offensive rebounders more valuable on the road. The same pattern doesn’t follow for defensive rebounds, though. In both cases, the difference isn’t gigantic, so we should be hesitant to draw any serious conclusions.
The one difference that is large and consistent is the difference in slopes between offensive and defensive rebounds, no matter the location. Confirming what Eli found in his original studies, this data says that the effect of diminishing returns is much stronger on defensive rebounds than it is on offensive ones. Therefore, offensive rebounding is a more “valuable” skill in terms of how you rate players, and some of the best player rating systems do take this into consideration.
So far, this whole article has been about the diminishing returns of rebounds. However, we can also use the same lineup-based approach to look at other statistics. Today I’ll also explore the diminishing returns of blocks, steals, and assists. Eli already used his method to take a crack at the usage vs. efficiency debate, and I recommend you read that article for some fascinating insight.
Block Rate, for a lineup, is defined as the percentage of shots by the opposing team that are blocked by one of the players in the lineup.
Blocks are an interesting statistic to examine. After all, there are only so many block opportunities around the basket and occasionally on the perimeter. When you also take into consideration the fact that teams often funnel players into the waiting arms of a dominant shot-blocker, it seems as though the diminishing return for blocks should be relatively strong. That is, if you add a shot blocker that normally blocks 4% of the opposing team’s shots to your lineup, you shouldn’t expect to block nearly that many more as a team because of diminishing returns. To see if this is true, I used the same methodology that I did for rebounding and came up with this graph:
As it turns out, the slope is at 0.6015. This puts Block Rate somewhere in the middle between Offensive Rebounds and Defensive Rebounds. A lineup full of good shot blockers will almost certainly block more shots than a weaker lineup, but the difference may not be as much as you might think due to effects of diminishing returns.
Up next we have Steal Rate. For an individual, it is defined as the percentage of opponent possessions that end with the given player stealing the ball. Therefore, for a lineup, it would be defined as the percentage of opponent possessions that end with a steal by anyone from that lineup. The graph for Steal Rate is below:
Here, we see the slope is nearly 1. This indicates that there is practically no diminishing returns effect on steals. If you add a player 2% better than average in terms of steals to your average lineup, you should expect to steal the ball almost 2% more than you currently do. Another way to put it is that usually, if a given player steals the ball, it’s not likely that someone else would have stolen the ball if he failed. Of course, like with every graph so far, the R^2 is still very low. This means that we can’t really predict how many steals a lineup will get simply by adding the Steal Rates of all of its players.
Finally, we have Assist Rate. For an individual, it means the percentage of field goals made by a player’s teammates that he assisted on. For a lineup, it means the percentage of made field goals that were set up by an assist. The graph is below:
Of any graph presented on this page so far, this one has by far the lowest slope. Normally this would indicate that there is a huge diminishing returns effect for assists. However, I’m not sold on this explanation just yet for various reasons, so for now I will just present the data as is.
I discussed a number of different issues today, so I think it’s good to recap what I’ve presented. First, using a method similar to the one Eli Witus used at CountTheBasket.com, I found that there is a large diminishing returns effect for defensive rebounds that is significantly larger than the effect for offensive rebounds. This confirms the common belief that offensive rebounds are “worth” more than defensive ones. When we split the data into home and away, it appears that individual offensive rebounding skill is particularly important on the road, indicated by a very high slope on the graph. Finally, I took a look at the diminishing returns of a few other advanced statistics and found the strongest effect on assists and a weaker but still significant effect on blocks.
If you have suggestions or comments about my work, please e-mail me at firstname.lastname@example.org. And again, much credit must go to Eli Witus, who originally thought of these ideas well before I did.
(Note: These stats are updated through November 27. Games from this past weekend aren’t included.)
For those who are unaware, every year I calculate a statistic called Composite Score (numbers are here and here) for each player. Composite Score is a rating system that combines six different advanced statistics, with three measuring offense and three measuring defense. The offensive statistics are Offensive Rating, Offensive Plus-Minus, and PER. The defensive statistics are Defensive Rating, Defensive Plus-Minus, and Counterpart PER (the estimated PER allowed on defense by a player). These numbers can be obtained from Basketball-Reference.com and 82games.com.
Although I can’t compute Composite Score for Magic players just yet (because of the way it’s calculated, I need the stats for every player in the league first), I can still present how every Magic player has fared in the individual components. I will break things down into offense and defense. Below is a table presenting every Magic player’s offensive performance so far, as measured by the three offensive statistics I mentioned earlier:
Dwight Howard has been excellent as usual. Jason Williams has been a pleasant surprise and has been particularly efficient, posting an Offensive Rating of 123. His Offensive Rating is second on the team to J.J. Redick. Believe it or not, the best newcomer offensively this year for the Magic has been Ryan Anderson. Of course, don’t go crazy about his offensive plus-minus just yet. That number can be volatile, and it is particularly unreliable this early in the year. I included it for the sake of completeness, but I rarely use it for reference this early in the season.
By his standards, Vince Carter has struggled offensively. His PER is still relatively good, but his Offensive Rating is below the league average. Before his injury, Jameer Nelson was also failing to meet expectations, but again, he wasn’t bad either. With the exception of Brandon Bass, many of the reserves have struggled somewhat on the offensive end, posting PERs and Offensive Ratings below league average. Following his return from suspension, Rashard Lewis has struggled perhaps as much as anyone else on the team. His efficiency has been well below his usual rates.
Next, let’s take a look at defense:
Again, I wouldn’t draw too many conclusions from the plus-minus numbers. As you can see from the table, if we were to take them at full value, we’d think the Magic is a team that is half defensive superstars and half defensive liabilities.
To start, Dwight Howard has been just as good on the defensive end as he’s been offensively. A Counterpart PER of 13.2 for a center is particularly impressive. Perhaps riding the coattails of players like Howard, Williams has posted good defensive numbers as well. He’s never had the reputation of being a lockdown defender, but his effort has been solid.
Looking down the list of defensive stats, we see nothing out of the ordinary except for a few things. Nelson’s CPER is very high, especially for a point guard. Carter’s, on the other hand, is very low (more on this later). Anthony Johnson’s plus-minus is comically bad, although that’s almost certainly the result of a small sample size. Finally, Matt Barnes looks great defensively according to Defensive Rating, but below average according to CPER. We’ll see how those numbers progress as the season goes on.
Back to Vince Carter’s defense. A couple of weeks ago, I wrote an article comparing his defense to Hedo Turkoglu’s. In the article, I said:
Here, Turkoglu strikes back. Carter looks below average in just about every category, and this supports his reputation. Turk, on the other hand, recorded numbers well above average in every category. The trickiest part about these comparisons is team context. It is something I’ve mentioned constantly when talking about my Composite Score numbers. Because of the way stats are tracked (at least publicly), it’s very difficult to separate a player’s individual contribution to his defense. How much of this is Hedo’s own doing, and how much of it is due to the fact that Orlando featured a very strong all-around defense? It’s hard to say, but I do think Turkoglu was probably a better defender than Carter.
Looking at Carter’s numbers in the early going, we can see that the team you’re on sure has a huge impact on your defensive numbers. He is better in every defensive category. How does he compare to Turkoglu now?
Turkoglu’s Defensive Rating has skyrocketed to 116, but his other defensive stats are still very impressive. This year, it’s hard to tell which player is better on defense. Carter has a low Defensive Rating and his plus-minus is very, very negative, but we don’t know how much that means yet. I think the lesson to take from this is that defensive statistics are pretty unreliable, especially this early in the season.
We’ll have to return to these numbers in about a month or so. The longer we wait, the clearer the picture becomes.
Something I’ve wanted to do for a while, and something I imagine every team does, is watch a game from start to finish and track all of the hustle plays made by both teams. In a league in which not every player is always giving 110%, sometimes a little bit of hustle and effort can go a long way.
With that in mind, the game I chose to track was Sunday afternoon’s contest between the Los Angeles Clippers and the Memphis Grizzlies. The Grizzlies have been playing well lately, including an impressive victory over Portland. Altogether, they came into the game having won five of their last seven contests. The Clippers were also relatively hot, having won three of their last four. As one would expect from a Grizzlies-Clippers game, the matchup was pretty lackluster through three quarters in terms of excitement and competitiveness. Then the fourth quarter arrived and the Clippers exploded, including a 22-0 run that won them the game. Los Angeles outscored Memphis 33-7 in the fourth quarter. After it was all said and done, the game ended up being a very memorable one for the Clippers and one the Grizzlies would like to forget soon.
Hustle, of course, played a large role. Below is a link to a spreadsheet which has the results of the statistics I tracked for the game. Those statistics included loose ball attempts, charges drawn, good sprints down the court (on either offense or defense), deflections, and missed blockouts.
A couple of non-hustle related notes:
As players get older, the belief is that they learn the tricks of the trade and get better at defense. During their first few years, they’re ill-equipped and unable to have a positive impact on defense, despite their superior athleticism and energy.
Do the numbers support these beliefs? We must turn to the always-useful Basketball-Reference.com. Using its Player Season Finder, I put together a spreadsheet containing every season from every player (minimum 500 minutes played) for the past five years. Using this data, we can see how Defensive Ratings change as players get older. Defensive Rating was developed by Dean Oliver, and it estimates the number of points a player allows per 100 possessions. Obviously, a lower number is better. To read more about it, check out the Basketball-Reference glossary. Let’s take a look at the chart:
I limited the age range from 19 to 36 to avoid outliers. On the x-axis, we have the age, and on the y-axis, the average Defensive Rating for that age. The results seem to confirm the common belief. Younger players tend to post higher (worse) Defensive Ratings than older players. Real life doesn’t work perfectly, so there are some fluctuations. However, the correlation is strong, indicated by the relatively large R^2 (explanation here). Therefore, there does appear to be something to the notion that players get better defensively as they get older.
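The chart described above boils down to averaging Defensive Rating by age and fitting a line through the averages. Here is a rough sketch with invented player-seasons; the real data came from the Player Season Finder, and the helper `fit_line_r2` is written just for this example.

```python
from collections import defaultdict


def fit_line_r2(xs, ys):
    """Least-squares slope, intercept, and R^2 for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, (sxy * sxy) / (sxx * syy)


MIN_MINUTES = 500          # minimum minutes played, as in the article
AGE_RANGE = range(19, 37)  # ages 19 through 36, as in the article

# Hypothetical player-seasons: (age, minutes played, Defensive Rating)
seasons = [
    (20, 1800, 110.0), (20, 900, 108.0), (20, 300, 115.0),  # last one: too few minutes
    (27, 2400, 107.0), (27, 1500, 106.0),
    (34, 1200, 104.0), (34, 700, 105.0),
    (38, 600, 101.0),                                        # outside the age range
]

by_age = defaultdict(list)
for age, minutes, drtg in seasons:
    if minutes >= MIN_MINUTES and age in AGE_RANGE:
        by_age[age].append(drtg)

avg_by_age = {age: sum(v) / len(v) for age, v in sorted(by_age.items())}
slope, intercept, r_squared = fit_line_r2(list(avg_by_age), list(avg_by_age.values()))
print(avg_by_age, f"slope={slope:.3f}", f"R^2={r_squared:.3f}")
```

A negative slope here means the average Defensive Rating falls (improves) as age rises, which is the pattern the chart shows.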
We can also produce a similar graph using Defensive Win Shares (DWS), a measure similar to Defensive Rating (for more information, check the glossary again). Basically, DWS is the number of wins a player adds to his team through his defense. The chart is below:
The R^2 is slightly smaller, but the general idea is the same. Players get better defensively as they get older. Not considerably so, but statistically significantly so.
However, we must approach these results with caution. Let’s say, hypothetically, that big men generally have lower Defensive Ratings. Let’s also say, hypothetically, that big men stay in the league longer than their shorter counterparts. These two scenarios would combine to make it look like players get better defensively with age. What’s a simple way to account for complications such as this? Take a look at the data position by position.
To start, let’s look at centers:
The results appear to be clear as day here. The line is a little wavy, but centers sure seem to get better defensively as they get older. The average for 35-year-olds is over three points per 100 possessions lower than the averages for 19-, 20-, and 21-year-olds. Do power forwards react the same way to age?
Simply put, yes. These results tend to go with common logic. Many raw and young big men commit silly fouls, ignore help defense, go for the spectacular block too often, etc. However, we should not treat these results as gospel, as I will explain later.
How about small forwards?
Just like the previous two positions, it appears small forwards age well, at least on the defensive end. The magic number for this position appears to be 29. Small forwards that were at least 29 years of age during the last five seasons performed much better on the defensive end than their younger counterparts did. Let’s take a look at shooting guards:
We keep seeing the same results. No matter what position you look at, the story is the same. Players get better on defense as they get older. Finally, let’s take a look at the inevitable and see how point guards get better defensively with age:
Whoops. That trend line has an oh-so-slightly negative slope, but it’s not exactly a great fit for the data (the R^2 is practically 0). Clearly, then, point guards don’t follow the same path as other positions. Older is not better in this case. For a position that often relies so much on speed and quickness, this makes sense. However, even point guards in their prime (around the age of 27) don’t perform significantly better than the young ones.
To wrap this up, we can make the following statement based on the data: Except for point guards, players generally get better on the defensive end as they get older. However, there are a number of issues to address before we go too far and actually believe that bold statement I just made:
UPDATE: After doing some more research, we may have to re-think things. Thanks to suggestions by Ryan Parker and Mike G at the APBRmetrics board, I decided to plot the average change in Defensive Rating (the difference between the current year and the last) for each age. It is below:
Looking at the graph above, we notice a couple of things. First, over the last five years, players of all ages tend to get worse defensively on a year-by-year basis. Whether it’s because of improving offenses or declining defenses, scoring has increased during each of the last five years.
More importantly for this study, we see that older players are declining faster than younger players are. For example, during the last five years, a 26 year-old is likely to have a Defensive Rating 0.5 points higher than he did a year ago. On the other hand, a 35 year-old is likely to have a Defensive Rating 1.5 points higher than he did a year ago. The difference between old and young isn’t much, but we can probably say that old isn’t definitively better than young.
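The year-over-year comparison can be sketched like this. The three players and their ratings are invented (chosen only to echo the 0.5 and 1.5 figures above), and treating “age minus one” as the previous season is a simplifying assumption of this example.

```python
from collections import defaultdict

# Hypothetical data: player -> {age: Defensive Rating in that season}
ratings = {
    "Player A": {25: 106.0, 26: 106.5, 27: 107.0},
    "Player B": {34: 103.0, 35: 104.5},
    "Player C": {26: 108.0},  # only one season, so there is no change to measure
}

# Collect (current season - previous season) differences, keyed by current age.
changes = defaultdict(list)
for by_age in ratings.values():
    for age, drtg in by_age.items():
        if age - 1 in by_age:
            changes[age].append(drtg - by_age[age - 1])

avg_change = {age: sum(d) / len(d) for age, d in sorted(changes.items())}
print(avg_change)
```

A positive value means a player's Defensive Rating rose (got worse) relative to his previous season. Because each player is compared only with himself, this view is less exposed to the problem of weaker defenders dropping out of the league as they age.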
Like I said in my original post, selection bias may be a problem. After all, this most recent research doesn’t dispute the fact that as a whole, when you look at all the old players, they tend to be better defensively than the young players. But that’s not because they got better as they got older; the year-to-year data shows this. What we may be able to say now is that aging doesn’t improve your defensive abilities, and if you want to stay in this league as a veteran, you’d better be good at defense, because teams will “selectively remove” you from the league if you’re not.
A couple of weeks ago, Eddy Rivera e-mailed me this:
“I was wondering if you could look at the progression of the Orlando Magic defense this year, in comparison to how the team progressed in its first month under Stan Van Gundy when he arrived in 2007. The reason why I ask is because that’s the first year the Magic were adjusting to SVG’s defensive scheme and eventually, they were ranked 6th in defensive efficiency. Given that this year is a new team of sorts, with so many new players, I wanted to see how the squad was adjusting on defense (SCHOENE projects them to finish 5th).”
It’s about time to take a look at this question. With 14 games (through Sunday) now under their belts, the Magic have developed at least a tiny bit of a sample to look at their defense.
Eddy’s question seems pretty straightforward at first. To find the answer, shouldn’t we just look at how many points the Magic are giving up each game this season? Well, we already know that’s not going to work because you have to factor in pace. Once you factor in pace, though, the study is still lacking. After all, if the Magic play a bunch of offensively inept teams in games 1-7 and a lot of great offensive teams in games 8-14, it’s going to look like their defense is getting worse no matter what. So we must also factor in the level of competition.
With that setup in mind, I took a look at the Magic’s defensive progression through 14 games for each of the last three seasons. For each year, I calculated the points scored per 100 possessions of each opponent and compared that to their season average. I called that difference (between the game total and the season average) “Defensive Score.” I then plotted, for each season, the game number versus the Defensive Score for the first 14 games. Let’s start by taking a look at 2007-08, Van Gundy’s first season with the Magic:
As you can see, the Magic were all over the place in their first 14 games, producing a wide range of Defensive Scores. They allowed some teams to score nearly 30 points per 100 possessions above their season average (game #13 against San Antonio) but also held some teams to more than 30 below their season average (game #10 against New Jersey). The fact that the two performances I just mentioned came in the same week shows how up and down the Magic were as they were adjusting to the defensive schemes of their new head coach. I included a trend line in the chart, but don’t pay too much attention to it because we can see from the line’s information (on the right side) that it is a terribly poor fit. In other words, there was no real progression (either good or bad) from the Magic in the first 14 games of 2007-08.
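For anyone who wants to compute Defensive Score themselves, here is a minimal sketch. The game lines are invented, the function names are my own, and how possessions are counted or estimated is left as an assumption (the inputs here are simply given).

```python
def points_per_100(points: float, possessions: float) -> float:
    """Points scored per 100 possessions in a single game."""
    return 100.0 * points / possessions


def defensive_score(points_allowed: float, possessions: float,
                    opponent_season_avg: float) -> float:
    """Opponent's efficiency in this game minus the opponent's season average.

    Negative values mean the defense held the opponent below its usual output.
    """
    return points_per_100(points_allowed, possessions) - opponent_season_avg


# Hypothetical games: (game number, points allowed, possessions,
#                      opponent's season points per 100 possessions)
games = [
    (1, 102, 96, 104.0),
    (2, 88, 92, 108.0),
]

scores = {g: defensive_score(pts, poss, avg) for g, pts, poss, avg in games}
for game, score in scores.items():
    print(f"game {game}: {score:+.2f}")
```

In this made-up example, game 1 is slightly worse than the opponent's norm and game 2 is a strong defensive performance, so the second point would sit well below 0 on the chart.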
How about 2008-09? Let’s take a look at the chart:
From the get-go, the Magic were dominating opponents on defense. Most of the points on the chart are below 0, meaning the Magic were almost always holding their opponents to lower than their season average. In addition, there weren’t any real stinkers. Now in his second year at the helm, Van Gundy had his defense at midseason form early in 2008-09.
Finally, let’s look at this year’s Magic team, a squad that has certainly had its struggles on defense:
True to form, the Magic were quite poor in their first seven games this year (with decent performances in the middle). However, things started to change in games 8-9, when the Magic at least held their opponents to close to their season averages. Lately, though, they’ve been playing great defense. In four of their last five games, Orlando has held their opponent below their season average. The one slipup was November 16, when the normally putrid Bobcats were able to put up a few points in Orlando. Overall, though, there appears to be a clear progression and a sign that the Magic’s defense is getting better. Unlike the last two trend lines, which had very poor line statistics, this line appears to be a pretty good fit. If you want the details as to why and are unfamiliar with R^2, click on this link, read about it, and check back here. Basically, there does appear to be something positive going on with the Magic defense.
I think these graphs are pretty enlightening. They show that this year’s Magic defense just needs time to get to its 2008-09 levels. I will check back in with these numbers in the future.
Last week, I calculated my own versions of various advanced statistics, such as Rebound Rate, Assist Rate, and Usage Rate. The difference between my versions and the ones you normally see is that mine were based on actual play-by-play data, rather than estimates. Although my method isn’t perfect (partly because the play-by-play isn’t always reliable), I figured it was more accurate to base our stats on stuff that has actually happened as opposed to estimates of what happened.
Under that assumption, the question is how accurate are the numbers we’ve grown to know and love? Although they’re not too difficult to calculate, the play-by-play figures aren’t always available, so we need to know if we can count on the data that is most common. How far off are these estimations? Are there certain types of players for which these stats are usually inaccurate?
To recap, these are the stats in question:
Let’s start with a simple test. How well do the estimated numbers correlate with the play-by-play numbers? Below is a table that includes the R^2 (explanation) and standard error of each linear regression, as well as the average difference between the two types:
Thankfully, we see that all of the estimations appear to be pretty darn accurate. The R^2’s are all extremely high, and the standard errors are low. Of the seven stats I’m examining, Steal Rate appears to be the most inaccurate. It fares the worst in each of the three table columns. Overall Rebound Rate appears to be the most accurate. From this table, we are given no reason to doubt the validity of the box score estimations.
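A comparison like the one in that table can be sketched in a few lines of Python. This is only an illustration, not the script behind the table: the function name and the sample numbers are made up, and it assumes the "average difference" is the plain signed mean of estimate minus play-by-play value.

```python
from math import sqrt

def compare_estimates(estimated, actual):
    """Regress play-by-play values on box score estimates (simple OLS)."""
    n = len(estimated)
    mean_x = sum(estimated) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, actual))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares around the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(estimated, actual))

    return {
        "r_squared": 1 - ss_res / ss_tot,
        # Standard error of the regression (two fitted parameters)
        "std_error": sqrt(ss_res / (n - 2)),
        # Assumption: signed mean of (estimate - play-by-play)
        "avg_diff": sum(x - y for x, y in zip(estimated, actual)) / n,
    }

# Hypothetical rebound rates for five players, not real data
box_score = [10.0, 12.0, 14.0, 16.0, 18.0]
play_by_play = [9.8, 12.1, 13.9, 15.7, 17.9]
result = compare_estimates(box_score, play_by_play)
print(result)
```

A high R^2 together with a small standard error and an average difference near zero is what "pretty darn accurate" looks like in this setup.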
Although they may be accurate as a whole, perhaps these numbers are inaccurate just for certain players. Specifically, I was wondering if players that rate either really high or really low in a certain statistic are generally rated accurately by the box score estimation. To try to answer that question, I ran another regression. This time, the box score estimation was the independent variable, and the difference between the box score and play-by-play was the dependent variable. The results are in the table below:
There are some things to look out for. Although the adjusted R^2’s are all quite low, even negative sometimes, the slopes are all positive. This would indicate that as a player’s box score estimate in a given statistic gets higher, the box score data is more likely to overrate him in that category, though the low R^2’s mean the relationship is weak for most stats. The biggest problems occur with Assist Rate, which has a moderately sized R^2 value.
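To make the slope and adjusted R^2 concrete, here is a small Python sketch of this second regression, with the box score estimate as the independent variable and the difference (estimate minus play-by-play) as the dependent variable. The function name and both data sets are hypothetical. The adjusted R^2 uses the standard formula for one predictor, which is why it can dip below zero when the fit explains almost nothing.

```python
def regress_difference(estimates, differences):
    """Fit difference = intercept + slope * estimate; return slope, R^2, adjusted R^2."""
    n = len(estimates)
    mean_x = sum(estimates) / n
    mean_y = sum(differences) / n
    sxx = sum((x - mean_x) ** 2 for x in estimates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimates, differences))
    ss_tot = sum((y - mean_y) ** 2 for y in differences)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(estimates, differences))

    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 with one predictor: penalizes R^2 for the small sample
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, r2, adj_r2

# Hypothetical stat where high estimates are overrated more (positive slope)
rising = regress_difference([5, 15, 25, 35, 45, 55],
                            [0.1, -0.2, 0.6, 0.9, 1.4, 3.0])

# Hypothetical stat where the difference is pure noise (adjusted R^2 below zero)
flat = regress_difference([1, 2, 3, 4, 5],
                          [0.1, -0.1, 0.0, -0.1, 0.1])

print(rising)
print(flat)
```

A positive slope here means the gap between estimate and play-by-play value grows as the estimate grows, which is the pattern described for Assist Rate.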
If that table doesn’t seem intuitive, I’ve also decided to present the results graphically. In each chart below, the x-axis is the box score estimate’s value, and the y-axis is the difference between the estimate and the play-by-play calculation.
All three Rebound Rates look pretty accurate, although they become more unpredictable as the numbers get high, especially with respect to Defensive Rebound Rate. When the Rate is around 10, the errors are pretty closely scattered around 0. However, when you get to 17.5 or 20, the errors become larger.
As I mentioned before, Assist Rate seems to have some major issues. For low Assist Rates, the differences are pretty small. However, when you get to the top assist men, the differences can be quite large. For example, Chris Paul’s Assist Rate for last season, according to the box score data, was 54.5. However, the play-by-play data has it at 51.2. For someone like him, where the number is astronomically high no matter which method you choose, the difference might seem trivial. But it does appear that top assist men are overrated the most by Assist Rate.
There’s not much to gather from the Steal Rate chart, although it becomes clear that my play-by-play computations are generally lower than the box score estimates.
Like Rebound Rate, Block Rate becomes particularly difficult to estimate when the numbers get high. As a percentage of the Block Rate, though, the difference is actually pretty consistent.
Finally, we have Usage Rate. There aren’t any major issues except for one outlier at the bottom, which is the result of complications due to the weirdness of Luc Richard Mbah a Moute’s name (seriously).
In conclusion, my research has shown me that, despite some minor issues, the box score estimations of things such as available rebounds are actually pretty close. They aren’t always perfect, and they can be particularly unreliable when the numbers get large, but overall they do a good job. Hopefully this work will provoke discussion on how we can continue to perfect those stats.
Another quick update. I removed the 0.44 estimator I was using for Steal Rate and Usage Rate to calculate possessions. Instead, I totaled the possessions from the play-by-play data itself. The updated numbers are below.
Some of the best stats out there, ones that most fans familiar with advanced stats know about, are actually based on estimates using box score data. For example, when we calculate Marcin Gortat’s Offensive Rebound Rate, we’re trying to determine what percentage of available offensive rebounds he collected while he was on the court. However, we don’t really know how many rebounds were available. We have to estimate based on how things usually go for the Magic and their opponents, and assign a portion of that to Gortat.
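For reference, the box score estimate usually takes the form published in the Basketball-Reference glossary: the player's offensive rebounds are scaled by his share of team minutes and divided by the team's offensive rebounds plus the opponents' defensive rebounds. The Python sketch below assumes that form, and the input numbers are invented purely to show the arithmetic.

```python
def estimated_orb_rate(orb, minutes, team_minutes, team_orb, opp_drb):
    """Box score estimate of Offensive Rebound Rate, in percent.

    team_minutes is the total of all players' minutes (five per minute of
    game time), so team_minutes / 5 is the team's time on the floor.
    """
    # Estimated offensive rebound chances while the player was on the court
    available = (minutes / (team_minutes / 5)) * (team_orb + opp_drb)
    return 100.0 * orb / available

# Invented season totals, not any real player's line
rate = estimated_orb_rate(orb=100, minutes=1000, team_minutes=19680,
                          team_orb=900, opp_drb=2400)
print(round(rate, 1))
```

The key assumption baked into this estimate is that rebound chances are spread evenly across the team's minutes, which is exactly what the play-by-play data lets us check.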
Using box score data, that’s the best we can do. But we also have play-by-play data, and we don’t have to estimate. We (actually, a programming script) can go through the hundreds of thousands of recorded plays from the 2008-09 NBA season and find how many of those resulted in offensive rebound opportunities for Gortat. From there we just total how many offensive boards he had and divide that by the number of available ones.
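The counting itself might look something like the Python sketch below. It is not the actual script: the event layout (a list of dictionaries with `type`, `shooting_team`, `rebounder`, and `on_court` fields) is a hypothetical format chosen for illustration, and real play-by-play logs need extra work to figure out who is on the court.

```python
def offensive_rebound_rate(events, player, team):
    """Percent of available offensive rebounds the player grabbed while on court."""
    available = 0
    grabbed = 0
    for event in events:
        # Only rebounds that happened while the player was on the floor count
        if event["type"] != "rebound" or player not in event["on_court"]:
            continue
        # An offensive rebound chance exists only after his own team's miss
        if event["shooting_team"] != team:
            continue
        available += 1
        if event["rebounder"] == player:
            grabbed += 1
    return 100.0 * grabbed / available if available else 0.0

# Tiny made-up log: three chances with the player on court, one collected
events = [
    {"type": "rebound", "shooting_team": "ORL", "rebounder": "Gortat",
     "on_court": {"Gortat", "Nelson"}},
    {"type": "rebound", "shooting_team": "ORL", "rebounder": "Opponent A",
     "on_court": {"Gortat", "Nelson"}},
    {"type": "rebound", "shooting_team": "ORL", "rebounder": "Nelson",
     "on_court": {"Gortat", "Nelson"}},
    # Opponent's miss: a defensive rebound chance, so it is skipped
    {"type": "rebound", "shooting_team": "OPP", "rebounder": "Gortat",
     "on_court": {"Gortat", "Nelson"}},
    # Player on the bench: skipped
    {"type": "rebound", "shooting_team": "ORL", "rebounder": "Nelson",
     "on_court": {"Nelson"}},
]
pbp_rate = offensive_rebound_rate(events, "Gortat", "ORL")
print(round(pbp_rate, 1))
```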
This method removes some of the guessing game, and the results of this method on various stats for the Magic will be discussed today. For a full explanation of how everything works, I will refer you to the article I wrote over at Basketball-Statistics.com last Thursday, which is here. Let’s start by comparing the estimated rebound rates to the actual ones, as calculated from the play-by-play data:
We can see that the estimates are pretty darn close. Amazingly, though, Dwight Howard is an even better rebounder than we thought (by 0.3%). Gortat’s offensive rebounding may have been slightly overestimated, but his defensive rebounding was underestimated. The biggest differences were for Keith Bogans and Rafer Alston, who were actually not rebounding as well as we thought.
Now let’s move on to some stuff for the little guys. Here are the comparisons for assists and steals:
Jameer Nelson’s Assist Rate may have been inflated, while Anthony Johnson didn’t receive enough credit. When we use the play-by-play data instead of the estimates, the difference between the two shrinks from 10.9% to 7%. My play-by-play steal rates are slightly lower for every player, and that may have something to do with differences in the way I calculated possessions.
Finally, let’s look at blocks and usage rate:
Again, we see that each player’s PBP block rate is lower than his estimated one. This is not a Magic-only thing. The difference is again due to different calculations. Block percentage is normally calculated as the percentage of opponents’ two-point attempts that were blocked by the player in question. My calculations counted three-point attempts as well. I feel that this way is more appropriate because, even though it’s rare, three-pointers do get blocked. With usage rates, we again see that the estimates were actually pretty close to the real thing.
Because the differences between the estimates and the play-by-play data are usually small, this information may seem trivial. In many ways, it is. However, it’s nice to get that warm fuzzy feeling when you know the numbers you’re looking at are thoroughly calculated instead of just estimations.
What, does nobody else get that feeling?
When I posted my recalculated stats using play-by-play data over at the APBRmetrics board, I learned that the Block Rates at Basketball-Reference are actually calculated using only opposing two-point attempts. In other words, a player’s Block Rate is the percentage of opposing two-point field goal attempts that the player blocked.
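The effect of the denominator is easy to see with a quick Python sketch. The function and the totals are made up for illustration. The only thing that changes between the two calls is whether opposing three-point attempts are counted.

```python
def block_rate(blocks, opp_two_pt_attempts, opp_three_pt_attempts=0,
               include_threes=False):
    """Blocks as a percent of opposing attempts taken while the player was on court."""
    attempts = opp_two_pt_attempts
    if include_threes:
        # Counting three-point attempts makes the denominator larger
        attempts += opp_three_pt_attempts
    return 100.0 * blocks / attempts

# Invented on-court totals, not real data
two_point_only = block_rate(50, 1000)
all_attempts = block_rate(50, 1000, 250, include_threes=True)
print(two_point_only, all_attempts)
```

Because the larger denominator can only pull the rate down, counting threes makes every player's Block Rate lower than the two-point-only figure.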
With that new piece of information, I recalculated the Block Rate for every player. The new figures, along with the rest of the recalculated stats, are posted below: