Sampling Distribution of the Mean
When we conduct research and gather data to analyze, we wind up with two things to compare: the mean of our treated sample and the mean of the population that has not been treated. For example, suppose we draw a sample from a population of people who never drink soda. We then tell the people in our sample to drink 5 sodas a day (this is the “treatment”) to see if they gain weight. At the end of the data-gathering period, we would want to compare the mean weight of these soda-drinkers to the mean weight of the population of non-soda-drinkers. If the two means are found to be significantly different from each other, there are two possible explanations. One is that the treatment really did make a difference and the soda made them gain weight. The other is just sheer dumb luck, or chance, and they would have gained weight anyway.
Many of the common analyses exist to determine whether the difference between the treated sample mean and the untreated population mean is due merely to chance or to something the researcher did (in this case, giving them 5 sodas a day), so knowing the population mean would be very useful. Trouble is, we usually don’t know what the population mean is. But there is a solution. When we use statistical analyses to address hypotheses or answer research questions, those analyses are based on a theoretical distribution known as the sampling distribution of the mean (or of means).
The sampling distribution of the mean is the distribution made up of the means of all theoretically possible samples of a certain size, size “n”. For a reasonably large n, it is approximately normal. Though we really can’t collect every possible sample from a population, suppose that we could. If we could, we would decide on a sample size, such as n = 30, draw every possible sample of 30 people, calculate the mean of each of those samples, and then plot them all on a frequency distribution. What we would find is a bell-shaped curve: tall in the middle and tapering off into a tail on either side.
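Drawing every possible sample is out of reach, but a simulation gives a rough picture of the same idea. The sketch below is only an illustration: the population is made up (10,000 hypothetical body weights in kg), and 5,000 random samples of n = 30 stand in for “every possible sample.” It prints a crude text histogram of the sample means.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up body weights in kg (illustration only).
population = [random.gauss(70, 12) for _ in range(10_000)]

n = 30               # size of each sample
num_samples = 5_000  # a stand-in for "every possible sample"

# Draw many random samples of size n and record the mean of each one.
sample_means = [
    statistics.mean(random.sample(population, n))
    for _ in range(num_samples)
]

# Sort the sample means into equal-width bins to build a frequency distribution.
bins = 15
lowest, highest = min(sample_means), max(sample_means)
width = (highest - lowest) / bins
counts = [0] * bins
for m in sample_means:
    index = min(int((m - lowest) / width), bins - 1)  # keep the maximum in the last bin
    counts[index] += 1

# Print a text histogram: one row per bin, one '#' per 20 sample means.
for i, count in enumerate(counts):
    print(f"{lowest + i * width:6.1f} | {'#' * (count // 20)}")

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(sample_means), 2))
```

The rows near the middle of the printout should be the longest and the rows at either end the shortest, and the two means printed at the bottom should land very close to each other.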
Those means in the middle are the most common means, and those in the tails are the least common. The mean of this distribution of means is equal to the population mean. Since it usually isn’t feasible to take every possible sample from a population, we rely on a mathematical result that tells us what this distribution will look like. The Central Limit Theorem describes this distribution in terms of its shape, central tendency, and variability. More on this next time!
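As a small check on the claim that the mean of the distribution of means equals the population mean, here is a toy case where every possible sample really can be listed. The eight weights and the sample size of 3 are made up for illustration; with a population this small, samples are drawn without replacement and exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical tiny population of eight weights (made-up values).
population = [52, 58, 61, 64, 70, 73, 79, 85]
n = 3  # sample size

population_mean = Fraction(sum(population), len(population))

# Every possible sample of size n (without replacement), and the mean of each.
sample_means = [Fraction(sum(sample), n) for sample in combinations(population, n)]

# The mean of all those sample means.
mean_of_means = sum(sample_means) / len(sample_means)

print("number of possible samples:", len(sample_means))
print("population mean:           ", float(population_mean))
print("mean of the sample means:  ", float(mean_of_means))
print("exactly equal?             ", mean_of_means == population_mean)
```

Every value in the population appears in the same number of samples, which is why the two means agree exactly here rather than only approximately.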