Whoa, whoa, whoa, whoa, whoa, guys. We've got all kinds of things going sideways in this discussion and we need a little bit of a reset.
Took the liberty of making a graphic (I like graphs)
Although 6 years is a small sample, looks like there is no statistical significance for getting 3XAllstars in the 1-15 seeds. Good news for the anti-tank crowd. *The >20 bin does not have the same sample size so significance can't be used there.
View attachment 7469
When sactowndog said "starting" in 2012, I'm certain he meant starting in 2012 and going *backwards* in time - players from 2013 and onwards have very little chance of being 3-time All-Stars (and from 2016/2017 have *no* chance of being three-time All-Stars yet) due to their short time in the league. So the sample size here is not 6 years. I can't say how far back sactowndog went, but I'm going to guess the sample size was quite a bit larger than 6 years.
I believe the "not the same sample size" comment is actually about the fact that there are 40 picks per year in the >20 bin (picks 21-60) compared to 5 picks per year in each of the other bins. That does mean the number in that bin is not a per-pick figure: the raw value as listed (somewhere around 0.4 3x All-Stars per draft taken after pick 20) is spread over eight times as many picks, so it can't be compared per pick to the other bins without first dividing by the number of picks. But nothing prevents comparing the raw numbers themselves. "Are there more 3x All-Stars drafted 1-5 than drafted 21-60?" is a perfectly answerable question; it's just probably not the question most people are interested in.
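To make the per-pick point concrete, here is a minimal sketch of the normalization. Only the roughly 0.4 figure for the >20 bin comes from the graph (and that is eyeballed); the 1.0 value for the 1-5 bin is a made-up placeholder, and the function name `per_pick_rate` is mine.

```python
def per_pick_rate(allstars_per_draft, picks_in_bin):
    """Convert a per-draft count for a bin into a per-pick rate."""
    return allstars_per_draft / picks_in_bin


# Roughly 0.4 3x All-Stars per draft after pick 20 (eyeballed from the graph),
# spread over the 40 picks from 21 through 60.
late_rate = per_pick_rate(0.4, 40)

# HYPOTHETICAL placeholder for the 1-5 bin, NOT taken from the real data.
early_rate = per_pick_rate(1.0, 5)

print(f"per-pick rate, picks 21-60: {late_rate:.3f}")
print(f"per-pick rate, picks 1-5 (hypothetical): {early_rate:.3f}")
```

The raw bin totals answer "how many come from this range of the draft", while the per-pick rates answer "what is one pick in this range worth", which is the question most of this thread actually cares about.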
Considering the 3rd Quartile (assuming a box and whiskers) lands on 1.5, and has a max of 2.4ish I would say that it might be useful to pick a different argument. Out of the 35 players selected in that range, your 3rd quartile ends at 1.5. That's not a big number, statistically speaking.
These aren't box-and-whisker plots; it's a simple bar graph with error bars indicating the standard deviation. Quartiles are not indicated. And overlap of standard deviations is not a very good way of assessing statistical significance. In fact, it's relatively easy to have two distributions that are statistically significantly different even though their standard deviations overlap - it's really a matter of sample size, because the uncertainty in an average shrinks as the sample grows while the standard deviation does not.
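Here is a rough sketch of that sample-size effect, using invented summary statistics (none of these numbers come from the draft data). It uses Welch's standard error with a normal approximation for the p-value, which is a simplification I chose so that it runs on the Python standard library alone; a real t-test would give somewhat larger p-values for the small sample.

```python
from math import sqrt
from statistics import NormalDist


def sd_intervals_overlap(mean_a, sd_a, mean_b, sd_b):
    """True if the mean +/- one standard deviation ranges overlap."""
    return (mean_a - sd_a) <= (mean_b + sd_b) and (mean_b - sd_b) <= (mean_a + sd_a)


def approx_two_sided_p(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for a difference in means.

    Uses Welch's standard error and a normal approximation, which is
    reasonable for large samples and only rough for small ones.
    """
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_b - mean_a) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# INVENTED summary statistics: same means and standard deviations throughout,
# only the sample size changes.
mean_a, sd_a = 1.0, 1.0
mean_b, sd_b = 1.5, 1.0

print("SD ranges overlap:", sd_intervals_overlap(mean_a, sd_a, mean_b, sd_b))
for n in (6, 200):
    z, p = approx_two_sided_p(mean_a, sd_a, n, mean_b, sd_b, n)
    print(f"n = {n:3d} per group: z = {z:.2f}, approximate p = {p:.2g}")
```

The error bars overlap in both cases, yet the same difference that is nowhere near the 0.05 threshold with 6 observations per group clears it easily with 200.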
Statistical significance is a percentage. Generally something in the 3-5% range is the typical margin for error. In this case it is over a 100% difference. Not only is it statistically significant it is an exponential relationship.
Statistical significance is a concept. The fundamental idea is that if we have observed two (or more) populations, and we find that these populations have different averages, we want to know how confident we can be that the two populations are actually different - as opposed to the differences just being accidental. Statistical significance is expressed in terms of a p-value, which is an estimate of the probability that two samples randomly drawn, by chance, from a single underlying distribution would end up at least as different as the ones we observed. P-values are most often calculated by a statistical test such as a Student's t-test or an ANOVA (though there are other methods) - they are not simply reflections of the percentage difference between the averages. For many scientific purposes, a p-value of 0.05 or lower (meaning that, if the two observed populations really had been drawn from a single population, there would be an estimated 5% or lower probability of seeing a difference this large) is used as a threshold for declaring confidence that the result is "real". Results that have p-values of 0.05 or lower are often colloquially referred to as "statistically significant", indicating a high confidence that the two observations (technically the distributions underlying the two observations) are in fact different.
Without having the actual data in hand, it's not possible to perform a proper statistical test. However, given my long experience performing statistical tests on a wide variety of data, an offhand look at the data presented, and the belief that the data set probably spans 20 years or more, my best guess is that if a proper statistical test were performed, the probability of finding a 3x All-Star with a pick from 1-5 would be "statistically significantly" greater than the probability of finding one with a pick from 6-10.
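For anyone who wants to run this once the real numbers are available, here is a minimal sketch of what such a test could look like. To be clear about the assumptions: the per-draft counts below are entirely made up (20 invented draft years per bin), and I've used a permutation test rather than a t-test or ANOVA because it needs nothing beyond the Python standard library and follows the definition of a p-value directly. The function name and parameters are mine.

```python
import random


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Pools both samples, repeatedly reshuffles them into two groups of the
    original sizes, and counts how often the reshuffled difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        group_a = pooled[:len(a)]
        group_b = pooled[len(a):]
        diff = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# MADE-UP counts of eventual 3x All-Stars per draft year, for illustration only.
picks_1_to_5 = [2, 1, 3, 2, 1, 2, 0, 1, 2, 3, 1, 2, 2, 1, 0, 2, 1, 3, 2, 1]
picks_6_to_10 = [0, 1, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]

p = permutation_p_value(picks_1_to_5, picks_6_to_10)
print(f"estimated p-value: {p:.4f}")
print("below the 0.05 threshold" if p <= 0.05 else "not below the 0.05 threshold")
```

Swapping the real per-year counts into the two lists is all it would take; whether the result comes out the way I'm guessing depends entirely on the actual data.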