Are Top 5 picks that much better? (split)

Capt. Factorial

trifolium contra tempestatem subrigere certum est
Staff member
#61
Curious, what is the standard error when you are using the entire population and not a sample? It's been a while since my statistics class, but in our case we used the entire population of All-Stars going back to 1948. I think the error goes away, since it is a sampling error and we aren't sampling; we are using the universe.
You're actually running into a difference between descriptive statistics and predictive statistics. From a descriptive point of view, if you have the entire population then the standard error really is meaningless. It won't calculate to zero, but there's no uncertainty about what the mean is (from a descriptive point of view) if you have the whole population.

But clearly a goal here is to apply the knowledge we have in order to make predictions about future drafts. In that sense we don't have all the data, because some of it hasn't happened yet, and we're using our sample (drafts that have already happened) to estimate the true values of our population (all drafts past and future).

Another interesting question is whether the ability to find an All-Star in later rounds has changed over time and, if so, in what direction. I suppose that would require grouping the years in some increment. I'm curious about your thoughts on the appropriate grouping. I'm guessing 15-year increments might be best.
Grouping in increments works a little, but it's not very easy to make a statistical case because you start running short of data points. That's the kind of thing where a linear regression on the individual (ungrouped) data points is often useful, but with data like this (lots of zeros punctuated with values here and there) I don't think linear regression does a very good job. You could try in 10-15 year increments to see what it looks like, but you may have trouble convincing folks (myself included) that any small trend is real.
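For anyone who wants to try the grouping, here is a rough sketch of what binning into 15-year increments looks like. The function name `bin_counts` and the list of draft years are made up for illustration (they are not real All-Star data); the point is only that roughly 70 years of drafts collapse into about five data points, which is why it gets hard to make a statistical case.

```python
from collections import defaultdict

def bin_counts(draft_years, start_year, width):
    """Count events per bin of `width` years, with bins starting at `start_year`."""
    counts = defaultdict(int)
    for year in draft_years:
        bin_start = start_year + ((year - start_year) // width) * width
        counts[bin_start] += 1
    # Note: bins with no events do not appear in the result.
    return dict(sorted(counts.items()))

# Hypothetical draft years of late-pick All-Stars (made-up values, NOT real data).
late_pick_all_star_years = [1950, 1957, 1966, 1971, 1978, 1984, 1986, 1999, 2005, 2011]

bins = bin_counts(late_pick_all_star_years, start_year=1948, width=15)
for bin_start, n in bins.items():
    print(f"{bin_start}-{bin_start + 14}: {n}")
```

With only a handful of bins, a difference of one or two players between bins is hard to tell apart from noise.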
 
#62
Ah yes. Thanks for the clarification and reminder. So how do you calculate the standard error in a predictive model like you described, since all drafts in the future extend into infinity?


Yeah I was wondering about sample size. Do you have another approach you would use to see if the deviation is changing?
 

Capt. Factorial

#63
Ah yes. Thanks for the clarification and reminder. So how do you calculate the standard error in a predictive model like you described, since all drafts in the future extend into infinity?
The standard error is simply the standard deviation divided by the square root of the number of observations in the sample. That's the basic formula. It doesn't actually take into account whether there are infinitely many drafts in the future, or only one, or whatever. I imagine (this is out of my league) that the calculation as devised probably assumes an infinite population.
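A minimal sketch of that basic formula in Python, using only the standard library. The second function is an addition that does not come from this thread: it applies the textbook finite population correction, sqrt((N - n) / (N - 1)), for a sample of size n drawn without replacement from a population of known size N. The function names and the sample values are made up for illustration.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    return statistics.stdev(values) / math.sqrt(n)

def standard_error_finite(values, population_size):
    """Standard error scaled by the finite population correction.

    Assumes `values` were drawn without replacement from a population
    of `population_size` items (an assumption, not part of the thread).
    """
    n = len(values)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return standard_error(values) * fpc

# Made-up values, not real draft data.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

se = standard_error(sample)
se_whole_population = standard_error_finite(sample, population_size=len(sample))
se_huge_population = standard_error_finite(sample, population_size=10**9)
print(se, se_whole_population, se_huge_population)
```

The basic formula gives a nonzero number even when the sample is the whole population. The correction factor is what drives it to zero in that case, and it approaches 1 as the population grows, which is consistent with the guess that the plain formula treats the population as effectively infinite.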

I wouldn't get too hung up on the standard error as it's not a be-all end-all in any way. There just happened to be a question about the distinction between standard error and standard deviation, and it's a topic I've had to explain before in my work so I tackled it.

Yeah I was wondering about sample size. Do you have another approach you would use to see if the deviation is changing?
I'm not really sure off the top of my head. Again, my gut says linear regression is the right way to go while at the same time saying that linear regression probably won't work well given the nature of the data set. If I were tackling this (thank goodness I'm not! ;)) I would probably try to find a different measure than number of All-Stars in late picks (I'm assuming you're thinking second round) to get a richer data set. Something like maybe career win shares (or better, average win shares per year to allow comparison of more recent drafts where players have not yet finished their careers) for all players in the second round would give an analysis like this a little more meat to chew on.
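As a rough illustration of the per-year idea, the sketch below divides career win shares by seasons played so that a finished career and a career still in progress land on the same scale. The function name, the field names, and both players are hypothetical; no real numbers are used.

```python
def win_shares_per_year(career_win_shares, seasons_played):
    """Average win shares per season played."""
    if seasons_played <= 0:
        raise ValueError("seasons_played must be positive")
    return career_win_shares / seasons_played

# Hypothetical players (made-up values, NOT real data).
players = [
    {"name": "Player A", "draft_year": 1995, "career_ws": 60.0, "seasons": 12},
    {"name": "Player B", "draft_year": 2015, "career_ws": 15.0, "seasons": 3},
]

rates = {
    p["name"]: win_shares_per_year(p["career_ws"], p["seasons"])
    for p in players
}
print(rates)
```

On career totals, Player A looks four times as valuable as Player B, but per season they come out the same, which is the comparison you want when recent draftees have not finished their careers.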

Note that there are a couple of pitfalls here if you want to go all the way back in NBA history, one being that for quite some time there were a lot more than two rounds of the NBA draft, and another that there haven't always been 30 teams, so the "second round" now is not exactly comparable to the "second round" 40 years ago.
 
#66
Well, I have already gone back as far as the All-Star model allows. One or two players are past pick 60, but that is about it, so I don't think rounds matter.

I saw the curve you fitted for win shares, which was consistent with the data we saw with All-Stars. Are you saying win shares give enough data to see whether the curve shape becomes more linear over a shorter time span? That would allow us to compare curves and see if they were flatter.
 

Capt. Factorial

#67
Before I launch into anything that is beside the point, I was under the impression you were interested in a question about (in my own words) whether the quality of players available in the second round (or later in the draft) has changed over time. If that's not what you're after, steer me straight.
 
#68
No, sorry for the miscommunication. I am interested in whether scouting has improved across the league generally, such that finding that key 3-time All-Star caliber player with later draft picks has become easier or harder.

Basically, a draft is going to have two to four 3-time All-Stars. That means if you are drafting 8-10, two thirds of your colleagues are going to have to screw up before you get your chance. Is that more or less likely now than 30 years ago?
 

Capt. Factorial

#69
OK, well I think that All-Star status is a somewhat simplistic "abbreviation" of player quality. For instance, I don't think that scouts are ignoring players who aren't going to be All-Stars and just making random selections. So if you really want a solid look at the question of scouting ability you'd probably want to look at some metric (as a first offer, say career Win Shares) that gives a look at overall talent. Using just 3x All-Stars would make it a much easier analysis, but will only give a look at talent evaluation of the very best players, and again it reduces your number of data points so it might be harder to find a real trend if one exists.

But it sounds to me like, for your purposes, you should probably just figure out, from year to year, the average position at which a 3xAS is drafted. Does that go down, stay steady, or go up over time?
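One way to sketch that year-to-year calculation: average the pick number of the 3x All-Stars in each draft, then fit an ordinary least-squares line to those averages against draft year. The helper names and the records are invented for illustration (not real draft results), and a straight-line fit is just one simple choice for reading the direction of the trend.

```python
from collections import defaultdict

def mean_pick_by_year(records):
    """Average draft position per draft year from (year, pick) pairs."""
    picks = defaultdict(list)
    for year, pick in records:
        picks[year].append(pick)
    return {year: sum(p) / len(p) for year, p in sorted(picks.items())}

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical (draft_year, pick) pairs for 3x All-Stars (made-up, NOT real data).
records = [(1990, 2), (1990, 6), (1991, 1), (1991, 9), (1992, 3), (1992, 11)]

averages = mean_pick_by_year(records)
slope = ols_slope(list(averages.keys()), list(averages.values()))
print(averages, slope)
```

A positive slope would mean 3x All-Stars are being found later in the draft as time goes on, a negative slope earlier, and a slope near zero no change. With so few such players per draft, the yearly averages will be noisy.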
 
#70
The dispersion is a lot higher than I expected in this subject. There's a clear trend though no matter what data you use. Kind of obvious but still fun to look at!
1521744053116.png
Source: Basketball-reference.com
 
#71
That's actually interesting. I've always gotten the impression that a lot of guys from picks 5-8 don't pan out as hoped. I'm wondering if teams are betting on long-term star potential at those spots (since the top guys are gone), leaving the best players to be taken at the 9-12 spots. This year you'll probably see Carter/Sexton/Bridges going around those spots, and they could easily turn out to be some of the better players in the draft.

There may be nothing in the above, but it would be interesting to do a deeper dig.
 
#72
To illustrate Sactowndog's point further, if you take out the top 5 picks the slope of the trendline decreases by a whopping 40% :eek:
1521747850850.png
 

Capt. Factorial

#75
The dispersion is a lot higher than I expected in this subject. There's a clear trend though no matter what data you use. Kind of obvious but still fun to look at!
View attachment 7487
Source: Basketball-reference.com
The strange thing about this is that it doesn't show - at all - the exponential character of the plot further up in the thread.

Win Shares per 48 is a bit of a strange stat. In fact, I'm pretty sure that it's not a good metric here. Because Win Shares are cumulative, floor time makes a difference, and WS/48 factors out floor time.

And there's not much room for variation. If I understand the definition right, a completely average player on a .500 team should get 0.1 WS/48. But the ceiling of the metric is 1.0 WS/48, and that would be for a one man team that went undefeated. Obviously something like 0.4 WS/48 is probably a reasonable ceiling for a LeBron type player, and even then that's a guess and might be generous.

Something like career WS, or WS/year to allow using more recent drafts, would probably give something closer to that exponential graph.
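To make the floor-time point concrete, here is a small sketch using the definition of WS/48 as win shares per 48 minutes played. The function name and both players are hypothetical, with numbers chosen only so the arithmetic is easy to follow.

```python
def ws_per_48(win_shares, minutes_played):
    """Win shares per 48 minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return 48.0 * win_shares / minutes_played

# Hypothetical careers (made-up values, NOT real data).
starter = {"ws": 50.0, "minutes": 24000}
reserve = {"ws": 5.0, "minutes": 2400}

starter_rate = ws_per_48(starter["ws"], starter["minutes"])
reserve_rate = ws_per_48(reserve["ws"], reserve["minutes"])
print(starter_rate, reserve_rate)
```

Both players come out at about 0.1 WS/48 even though one produced ten times the career win shares of the other. That difference in floor time is exactly what the rate stat removes and what a cumulative measure keeps.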
 
#76
That's why I only went up to 2013. I think WS/48 tells the story fine, but tbh I never even heard of it until today haha

Maybe since you're the stats guru you can show us how a top 5 draft wins you more championships? I am not a good basketball stats guy, obviously!