Perhaps necessary, but not sufficient

Data that fall along a straight line on a normal probability plot may be normally distributed. For a small data set, passing this check may be necessary for concluding normality, but it is definitely not sufficient.

The conclusion also has to make sense. Consider the example of the racing times in seconds of a greyhound named “Barbies Bomber” on the 5/16 mile track (2.5 furlongs):

31.26, 31.35, 31.91, 32.06, 32.37, and 32.52 (ranked low to high).

With only 6 data points, one cannot check normality by drawing a histogram, so one uses a normal probability plot (NPP) instead.

It makes sense that race times would be approximately normally distributed: dogs are about average most days and have about equal numbers of good and bad days. The owner can also keep the dog from racing when it is really sick, run down, or excessively tired, so really bad racing times might never be seen. That cuts off the slow tail, skewing the distribution a bit and making it less normal.

But are these six times a reasonable representation of what we would expect to see if we had enough data to plot the histogram?

Indeed, these 6 points fit a normal probability plot rather well: the linear correlation coefficient is 0.970, which comfortably exceeds the critical value of 0.888 for 6 observations.

Reference: Sullivan III, Michael. Statistics: Informed Decisions Using Data (Page 379). Pearson Education. Kindle Edition.
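The correlation check can be reproduced with a short sketch. It assumes the expected z-scores come from the plotting position f_i = (i - 0.375)/(n + 0.25); that formula is not stated in this post, but it gives z-scores of -1.28 and +1.28 for the fastest and slowest of six times. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean

times = [31.26, 31.35, 31.91, 32.06, 32.37, 32.52]  # seconds, ranked low to high


def expected_z_scores(n):
    # Assumed plotting position: f_i = (i - 0.375) / (n + 0.25),
    # converted to a z-score with the inverse standard normal CDF.
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    # Plain Pearson linear correlation coefficient.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


z = expected_z_scores(len(times))
r = pearson_r(sorted(times), z)

CRITICAL_VALUE_N6 = 0.888  # critical value for 6 observations, as quoted above

print("expected z-scores:", [round(v, 2) for v in z])
print("correlation:", round(r, 3))
print("exceeds critical value:", r > CRITICAL_VALUE_N6)
```

Under that assumption the correlation comes out at about 0.970, matching the figure quoted above.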

But this plot assigns 31.26 to a z-score of -1.28 and 32.52 to a z-score of +1.28. The ratio of the slowest time to the fastest is only 1.04. On the NPP, this 4% spread spans 2.56 standard deviations, which implies that the coefficient of variation (%CV) of this dog’s racing times would be only about 1.6%. That seems far too low to be biologically believable.
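The arithmetic behind the 1.6% figure can be checked with a minimal sketch. It treats the max/min spread as covering 2.56 standard deviations, as described above, and also computes the ordinary sample %CV for comparison; the variable names are illustrative only.

```python
from statistics import mean, stdev

times = [31.26, 31.35, 31.91, 32.06, 32.37, 32.52]  # seconds

# Ratio of slowest to fastest time, and the same spread as a percentage.
ratio = max(times) / min(times)
spread_pct = 100 * (ratio - 1)

# The NPP places the extremes at z = -1.28 and z = +1.28,
# so the spread covers 2.56 standard deviations.
z_span = 2 * 1.28
implied_cv_pct = spread_pct / z_span

# For comparison: the usual sample %CV (sample SD divided by the mean).
sample_cv_pct = 100 * stdev(times) / mean(times)

print("max/min ratio:", round(ratio, 2))
print("implied %CV from the NPP:", round(implied_cv_pct, 1))
print("sample %CV:", round(sample_cv_pct, 1))
```

Both routes land at roughly 1.6%, so the low %CV is not an artifact of how it is estimated.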

Most reasonable people would say we are concluding normality from what is really a narrow swath of times. The real distribution may be approximately normal, but for the reasons given above it is more likely somewhat skewed away from really bad racing times, though not entirely, since owners also make mistakes. Relying on the NPP alone is therefore not likely to be a valid SOP.

Yet it is reasonable to suppose that racing times are fairly close to normally distributed. It is also plausible that the full %CV, the one that would appear if the dog had to race every day on schedule, is attenuated by an insightful and compassionate owner who knows that it is bad for the dog to race when he/she is sick.

