BasketballStatistics.com  Innovative Stats and Analysis  




by Jon Nichols

Composite Score was never developed with the intention of predicting the future. The goal was to create a stat that accurately reflects why teams win basketball games and which players contribute the most to that winning. CS does not take into account age, health, and other factors that vary from year to year.

Despite all that, I’ve always been curious to see how good a job Composite Score would do at predicting a team’s success. To test this, I did a bit of regression analysis (this article will have some statistical stuff in it, but I’ll try to explain things so that people can understand it whether or not they’re experienced with stats).

For this simple analysis, I collected three variables for each team. One was the team’s win percentage in 2006-07. Another was the team’s win percentage in 2007-08. The third and final variable was each 07-08 team’s weighted 06-07 Composite Score. For the weighted 06-07 Composite Score, I looked at the percentage of the team’s minutes each player on the 07-08 roster played (this data is available at 82games.com) and multiplied that by the player’s 06-07 Composite Score, giving each team a single minutes-weighted score. This simulates a situation where, if you knew how your favorite team’s minutes were going to be distributed this season and you knew all the players’ Composite Scores from last season, you could predict the team’s win total.

To sum it all up, I’m testing which is better at predicting a team’s 07-08 success: its 06-07 record or its weighted 06-07 Composite Score. (One note: players on winning teams have higher Composite Scores, so these two factors aren’t totally unrelated.)

06-07 Record vs. 07-08 Record
Residual standard error: 0.1513 on 28 degrees of freedom

As you can see from the chart, there is a noticeable positive correlation between a team’s record last year and its record this year. This is supported by the numbers above (which look like nonsense to many of you). The p-value of .00821 is strong evidence that the two variables really are related (the lower the p-value, the less likely it is that a pattern like this would show up by chance if there were no relationship). The R^2 value is .2242, which indicates the strength of the linear relationship (R^2 ranges from 0 to 1, with numbers closer to 1 indicating a stronger relationship). In other words, last season’s record explains about 22% of the variation in this season’s record. More on this later.

06-07 Composite Score vs. 07-08 Record
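The predictor in this second regression is the minutes-weighted 06-07 Composite Score described earlier. As a minimal sketch of that weighting, here is one way it could be computed, assuming the per-player products are simply summed. The function name, the roster, the minute shares, and the scores are all hypothetical and are not taken from the actual data set.

```python
def weighted_composite_score(players):
    """Return a minutes-weighted Composite Score for one team.

    `players` is a list of (minute_share, prior_season_cs) pairs, where
    minute_share is the fraction of the team's minutes the player received
    this season and prior_season_cs is his Composite Score from last season.
    Assumption: the products are summed into a single team number.
    """
    return sum(share * cs for share, cs in players)


# Invented three-player "roster" purely for illustration.
roster = [
    (0.20, 10.0),  # 20% of team minutes, last season's CS of 10.0
    (0.30, 5.0),   # 30% of team minutes, last season's CS of 5.0
    (0.50, 2.0),   # 50% of team minutes, last season's CS of 2.0
]

team_score = weighted_composite_score(roster)
print(round(team_score, 2))  # 2.0 + 1.5 + 1.0 = 4.5
```

Rookies have no score from the previous season, so in a sketch like this they would simply be left out of the list, which matches the limitation noted elsewhere in the article.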
Residual standard error: 0.1384 on 28 degrees of freedom

The scatter plot above looks pretty similar to the previous one, so it’s hard to tell if Composite Scores are a better predictor by simply looking at the charts. Instead, we’ll turn to the statistics. As you recall, the p-value of the last regression was .00821. With Composite Score, it is even lower, at .000566. This is an argument in Composite Score’s favor. In addition, the R^2 for Composite Score (.3506) is higher than the R^2 for the 06-07 record (.2242), another argument in CS’s favor. Finally, the MSE (mean squared error) of the record regression (0.02288) is larger than the MSE of the CS regression (0.01915), which shows that predictions based on the 06-07 record are more variable and less reliable. In other words, if we made two models for projecting a team’s winning percentage, with one based on last season’s record and the other based on last season’s weighted Composite Score, the one based on the records would miss by more on average.

Taking a step back, I think this is significant, but there are a few catches. I think it’s significant because teams generally play pretty similarly from year to year, so last season’s win percentage should generally be a good predictor of the next season’s win percentage. However, using Composite Scores (and of course somehow knowing how many minutes each player would play) is an even more accurate way of predicting the team’s record. If I knew nothing of basketball and was asked to predict a team’s record, I would prefer to know the Composite Scores of its players (and how many minutes they would play) over how that team did last season.

Now, for the catches. As I mentioned before, Composite Score is slightly based on team success, so it does cheat a little bit. In addition, the Composite Score method takes into account free agent signings and trades (although it ignores rookies and injuries), which gives it an edge that last season’s record cannot have.
There are also some statistical limitations. Correlation does not mean causation. In this case, Composite Score and Win % are both reflecting something else: the talents of the players themselves. Finally, this study is not comparing my rating system to any of the other great ones out there. It’s simply showing that Composite Scores are not a bunch of random numbers and that they do have some predictive value.
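For readers who want to try this kind of comparison themselves, here is a minimal sketch of fitting two one-variable regressions and comparing their R^2 and MSE. It uses plain Python rather than the stats package used for the article, it leaves out the p-value, and the five "teams" below are made-up numbers, not the real 06-07 and 07-08 data.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # n - 2 residual degrees of freedom (30 teams -> 28)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / syy,
        "mse": mse,
        "residual_std_error": math.sqrt(mse),
    }


# Hypothetical data for five imaginary teams (illustration only).
prev_win_pct = [0.30, 0.45, 0.50, 0.60, 0.70]
weighted_cs = [1.0, 2.5, 2.0, 3.5, 4.0]
next_win_pct = [0.35, 0.50, 0.40, 0.65, 0.70]

by_record = simple_regression(prev_win_pct, next_win_pct)
by_cs = simple_regression(weighted_cs, next_win_pct)

for name, fit in (("record", by_record), ("composite score", by_cs)):
    print(name, "R^2:", round(fit["r_squared"], 3), "MSE:", round(fit["mse"], 5))

# The MSE is the square of the residual standard error, which ties the
# article's reported figures together:
print(round(0.1513 ** 2, 5))  # 0.02289, in line with the reported 0.02288
print(round(0.1384 ** 2, 5))  # 0.01915
```

The model with the higher R^2 and lower MSE is the one whose predictions sit closer to the actual win percentages, which is the comparison made in the article.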
Copyright © 2009 BasketballStatistics.com 
