It’s hard to keep up with data analysis, data viz, and coding skills consistently in grad school, since you often go long stretches without any data that needs working with. There are plenty of accessible data sets to play around with, but it’s more fun to use living, breathing data like sports stats. I decided to play around with data from Basketball Reference, which also gave me a chance to work on web scraping. I’ll walk through a number of relationships I wanted to examine and include a little bit of methodology on how I got there (check my GitHub repo for all my scripts from this project).
As a native Minnesotan, I’m required to be a Wolves fan, though I fully understand the inevitable heartache and loss that Minnesota sports will bring me. A prominent narrative surrounding the Wolves this year has been the heavy minutes laid on their top 5 and the toll it’s been taking on the players. Just looking at total minutes played this season, two of the top five players (Andrew Wiggins and Karl-Anthony Towns) are Wolves.
Player | Minutes |
---|---|
Andrew Wiggins | 2384 |
Bradley Beal | 2341 |
Khris Middleton | 2341 |
LeBron James | 2335 |
Karl-Anthony Towns | 2313 |
Russell Westbrook | 2283 |
However, I wanted to look a little deeper at how teams distribute minutes between their top players and their bench, so I used cumulative distribution plots: rank each team’s players by minutes played, then plot the running share of the team’s total minutes as you move down the roster. That makes it easy to answer questions like “what % of a team’s minutes do their top 5 play?”
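The curve behind each line is simple to compute. Here’s a minimal Python sketch of the idea. The plots in this post were built in R, so this is a translation of the concept rather than the code from the repo, and the minute totals are hypothetical.

```python
from itertools import accumulate


def cumulative_minutes_share(minutes):
    """Cumulative share of team minutes by player rank.

    Element i is the fraction of all team minutes played by the
    top i + 1 players, where players are ranked by minutes played.
    """
    ranked = sorted(minutes, reverse=True)  # heaviest-used player first
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]


# Hypothetical season minute totals for a 10-man roster (not real data).
team_minutes = [1000, 2200, 400, 1800, 2000, 600, 1300, 200, 1700, 800]
curve = cumulative_minutes_share(team_minutes)

for rank, share in enumerate(curve, start=1):
    print(f"top {rank:2d}: {share:.1%}")
```

Plotting `curve` against rank 1 through n for every team gives one line per team, and reading off the value at rank 5 answers the top-5 question directly (75.0% for this made-up roster).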
Here’s an interactive plot of all NBA teams (I used plotly’s `ggplotly()` function to put this together). Mouse over a line to see the team name and data for that particular point, or click on teams in the legend to remove them from the plot. If you double-click a team in the legend, it’ll show only that team; you can then re-add teams if you want to compare a few.
The Wolves and Nets are the two extremes, and they differ drastically in how they distribute playing time. The MIN curve rises very quickly, with nearly 70% of the team’s minutes coming from their top 5. It ends in a short, flat tail, since they’ve only played 13 guys all season and most of those deep bench guys are getting only a few percent of the team’s minutes.
BRK is the opposite, with their top 5 making up 50% of their total minutes and a full 20 guys getting minutes this season. Not only do the Nets use their bench, but the relatively steep slope all the way through to the end of the bench indicates that these guys are getting fairly significant minutes.
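The two summary numbers in that comparison (the share of minutes going to the top 5, and how many players have seen the floor) can be pulled out per team in a few lines. This is a hypothetical Python sketch, not the code from the repo: it assumes a long table of (team, minutes) rows, one per player, and the two teams and their minutes are made up. If the rows come straight from a Basketball Reference totals table, the combined multi-team rows for traded players would need to be dropped first so nobody is double-counted.

```python
from collections import defaultdict


def rotation_summary(rows, k=5):
    """Summarize how each team spreads its minutes.

    rows: iterable of (team, minutes) pairs, one per player.
    Returns {team: (top_k_share, players_used)}.
    """
    by_team = defaultdict(list)
    for team, minutes in rows:
        if minutes > 0:  # only count players who actually got on the floor
            by_team[team].append(minutes)

    summary = {}
    for team, minutes in by_team.items():
        ranked = sorted(minutes, reverse=True)
        summary[team] = (sum(ranked[:k]) / sum(ranked), len(ranked))
    return summary


# Hypothetical teams: "A" leans on a short rotation, "B" goes deep.
team_a = [1600, 1500, 1400, 1300, 1200, 1000, 900, 600, 300, 200]
team_b = [1100, 1050, 1000, 950, 900, 800, 700, 600, 500, 500,
          400, 400, 300, 300, 200, 100, 100, 50, 50]
rows = [("A", m) for m in team_a] + [("B", m) for m in team_b] + [("B", 0)]

print(rotation_summary(rows))  # {'A': (0.7, 10), 'B': (0.5, 19)}
```

Passing `k=8` instead gives the top-8 share used in the Pacers comparison.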
I noticed there was one team that wasn’t too far off from the Wolves in several aspects of their distributions.
Now, I haven’t seen any discussion of the Pacers as another small-rotation team, but the Pacers actually match the Wolves if you look at the contributions by their top 8 guys. The Pacers are certainly distributing minutes more evenly among these guys, but they’re still well above most of the league at this point.
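To put a number on “well above most of the league,” one option is a percentile: the fraction of other teams whose top-8 share is lower. A hedged sketch, assuming each team’s top-8 share (the fraction of team minutes played by its eight heaviest-used players) has already been computed; the team labels and shares below are hypothetical.

```python
def share_percentile(shares, team):
    """Fraction of the *other* teams whose top-k minutes share is lower."""
    target = shares[team]
    others = [share for name, share in shares.items() if name != team]
    return sum(share < target for share in others) / len(others)


# Hypothetical top-8 shares for a five-team league (not real data).
top8_shares = {"A": 0.85, "B": 0.84, "C": 0.78, "D": 0.80, "E": 0.75}
print(share_percentile(top8_shares, "B"))  # 0.75
```

The same function works for the top-5 share, which is one way to rank teams by the load on their starters.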
Let’s take a look at the top 3 teams from each division.
Houston is actually comparable to the Wolves and Pacers.
Ok, fine, not in every way…
We all know Golden State doesn’t care about the regular season, and it shows here. They get significant contributions from a bunch of bench guys while their starters play fairly typical minutes. The Spurs play their starters even less and have a pretty significant bench to boot.
On the other end, Houston is actually among the top few teams in the NBA in the load on their starters. They’re not all that far off from Minnesota or Indiana. I think the narrative is usually that the Spurs/Warriors model lends itself to playoff success, and combining that with Houston’s past playoff flops makes me wonder whether the Rockets’ streak can continue through the postseason (but come on, they’re just so good).
Yeah.
But let’s dig into a more specific question that often comes up regarding the Spurs: does a stable roster correlate with team success?
More generally, can we leverage ecological methods for NBA data analysis? Is there beta diversity between teams? What kind of ecological sampling methods can we apply to NBA data? What are the similarities between ecological techniques and other methods, and how can we map between them or make them more broadly approachable?
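For the roster-stability question, one ecological tool fits almost directly: ecologists compare the species lists of two sites with the Jaccard similarity index (shared species divided by total species), and swapping species for players gives a measure of how much a team’s roster carries over from one season to the next. Below is a hypothetical Python sketch of that idea plus a plain Pearson correlation against win percentage, written out by hand to avoid extra dependencies. The rosters use placeholder names and the stability and win-percentage numbers are invented, so the output says nothing about the real NBA.

```python
from math import sqrt


def jaccard(roster_a, roster_b):
    """Jaccard similarity: shared players / all players across both rosters."""
    a, b = set(roster_a), set(roster_b)
    return len(a & b) / len(a | b)


def roster_stability(rosters_by_season):
    """Jaccard similarity between each pair of consecutive seasons."""
    return [jaccard(prev, curr)
            for prev, curr in zip(rosters_by_season, rosters_by_season[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical rosters for one team over three seasons (placeholder names).
seasons = [
    {"p1", "p2", "p3", "p4"},
    {"p1", "p2", "p3", "p5"},
    {"p1", "p2", "p6", "p7"},
]
print(roster_stability(seasons))  # [0.6, 0.3333333333333333]

# Hypothetical league-wide pairs: stability score vs. win percentage.
stability = [0.2, 0.4, 0.6, 0.8]
win_pct = [0.30, 0.40, 0.50, 0.60]
print(round(pearson(stability, win_pct), 3))
```

An unweighted Jaccard index treats a garbage-time call-up the same as a starter; a minutes-weighted dissimilarity such as Bray-Curtis would be a natural next step, and with only 30 teams per season any correlation should be read cautiously.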