The Normal Distribution
Data is said to be normally distributed when its distribution, or occurrence of scores, follows the normal curve. It doesn’t have to be perfect, but it should be approximately normal in order for us to use the typical statistical analyses that we need to answer research questions and/or address hypotheses.
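One rough way to eyeball “approximately normal” is to check that the mean and median nearly agree and that the skewness is close to zero. Here is a minimal sketch using simulated scores (all numbers are hypothetical, chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
scores = rng.normal(loc=50, scale=10, size=2_000)  # hypothetical test scores

# Rough normality checks: mean should be close to the median,
# and skewness should be close to 0 for a symmetric, bell-shaped curve.
mean, median = scores.mean(), np.median(scores)
z = (scores - mean) / scores.std()
skewness = np.mean(z ** 3)
print(round(mean - median, 2), round(skewness, 2))
```

For real data you would also look at a histogram or a normal Q-Q plot; these summary numbers are just a quick first pass.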
Why is this important? When performing statistical analyses, a common assumption is that the sample used to gather the data was randomly drawn from a normal population. This ensures (hopefully!) that we have managed to get representative scores from every region of the normal curve. In the population, regardless of what that population happens to be, about 68% of the scores will be within 1 standard deviation of the mean of that population. About 95% will be within 2 standard deviations of the mean, and over 99% will be within 3 standard deviations of the mean.
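You can verify the 68/95/99+ rule by simulation. This sketch draws a large number of scores from a normal distribution (the mean of 100 and standard deviation of 15 are arbitrary, IQ-like choices) and counts what fraction land within 1, 2, and 3 standard deviations of the mean:

```python
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(loc=100, scale=15, size=1_000_000)  # hypothetical scores

mean, sd = scores.mean(), scores.std()
for k in (1, 2, 3):
    # Fraction of scores within k standard deviations of the mean
    within = np.mean(np.abs(scores - mean) <= k * sd)
    print(f"within {k} SD: {within:.3%}")
```

With a million simulated scores the fractions come out very close to the theoretical 68.27%, 95.45%, and 99.73%.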
If our sample isn’t from a normally distributed population, it makes our inferences from sample data back to the population less accurate. (There is an exception, though. The Central Limit Theorem states that as long as your sample size is 30 or greater, the distribution of sample means will be approximately normal, whether or not the population your sample came from is normal.) So why should the data be normally distributed? It’s because of the mean. Most of the typical analyses use the mean, and of course the mean is the most reasonable measure of central tendency to use with normally distributed data.
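The Central Limit Theorem is easy to see in a simulation. This sketch starts from a deliberately non-normal (right-skewed) population, repeatedly draws samples of size 30, and shows that the means of those samples cluster symmetrically around the population mean (the exponential population and sample counts here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A clearly non-normal, right-skewed population of scores
population = rng.exponential(scale=2.0, size=100_000)

# Draw many samples of size 30 and record each sample's mean
sample_means = np.array([
    rng.choice(population, size=30).mean() for _ in range(5_000)
])

def skew(x):
    # Simple skewness estimate: mean of cubed standardized scores
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

print("population mean:", round(population.mean(), 2))
print("mean of sample means:", round(sample_means.mean(), 2))
print("population skewness:", round(skew(population), 2))
print("sample-mean skewness:", round(skew(sample_means), 2))
```

The population is strongly skewed, but the distribution of the sample means is nearly symmetric and centered on the population mean, which is exactly what the theorem promises.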
Another reason involves something called the Sampling Distribution of the Mean. I will explain this in the next post.