Measures of Central Tendency – The Median
Of the three measures of central tendency, the mean is the most common, but it isn’t appropriate unless our data is continuous and roughly normally distributed. What if we have continuous data, but it is skewed instead of normal?
First, what is skewed data? When we think of normally distributed data, we might imagine the normal curve where most of the values are in the middle of the distribution with fewer values on both ends of the curve. In this case, the curve is symmetric around the mean, so half of the values will be above the mean and half will be below it. By comparison, a skewed distribution will have most of the values on one end or the other, with only a few on the opposite end. In this case, the few extreme values act as outliers, and mathematically they will pull the mean artificially higher or lower. For example, if most school children spend 1-2 hours on homework each day, but a few children spend 6 hours on homework, we would see a positively skewed distribution. Depending on how many children are in our sample, the mean might be pulled well above 2 hours, but this clearly wouldn’t be the midpoint. If most children are spending 1-2 hours studying, the midpoint will fall in that range, and a mean above it exaggerates how long a typical child studies.
(Figure 8.3 from http://allpsych.com/researchmethods/distributions.html )
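To make the homework example concrete, here is a minimal sketch in Python. The ten values in `hours` are made up for illustration (eight children at 1-2 hours, two at 6 hours); they are not taken from any real sample.

```python
from statistics import mean

# Hypothetical daily homework hours for ten children (invented data):
# eight spend 1-2 hours, two spend 6 hours.
hours = [1, 1, 1, 1.5, 1.5, 2, 2, 2, 6, 6]

average = mean(hours)  # 24 total hours / 10 children = 2.4

# Count how many children fall below the mean.
below_mean = sum(1 for h in hours if h < average)

print(f"mean = {average} hours")
print(f"{below_mean} of {len(hours)} children are below the mean")
```

In this made-up sample, eight of the ten children sit below the mean, so the mean is describing the pull of the two long-study children rather than the middle of the group.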
Because of the lopsided nature of skewed data, the mean won’t give us an accurate picture of its midpoint. Instead, we use the median, which is the midpoint in terms of the actual values. The median is simply the middle value in the data set after the values have all been put in ascending order, so half the values will be at or above the median and half will be at or below it. If the data set has an even number of values, there is no single middle value, and the median is the average of the two middle values. The median is not nearly as sensitive to extreme values as the mean; after all, the middle value is still the middle value regardless of how extreme the outliers might be. The median, however, tends to vary more from one sample to the next than the mean does when the data is roughly normal, so it is used mostly for describing the data set. The mean, on the other hand, is useful in many inferential analyses because it tends to be more consistent from one sample to another and is therefore a better estimate for the population the sample comes from.
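The procedure for finding the median can be sketched in a few lines of Python. The function name `median_of` and both data sets are hypothetical, chosen only to illustrate the steps; Python's standard library also provides `statistics.median`, which does the same job.

```python
from statistics import mean

def median_of(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)  # put the values in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Invented homework hours: eight children at 1-2 hours, two outliers.
hours = [1, 1, 1, 1.5, 1.5, 2, 2, 2, 6, 6]
# Same sample, but with the outliers made twice as extreme.
extreme = [1, 1, 1, 1.5, 1.5, 2, 2, 2, 12, 12]

print(median_of(hours), mean(hours))      # median 1.75, mean 2.4
print(median_of(extreme), mean(extreme))  # median 1.75, mean 3.6
```

Doubling the two outliers moves the mean from 2.4 to 3.6 hours, while the median stays at 1.75 hours because the middle of the sorted list has not changed.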
As useful as the mean and median are for describing the midpoint of data, neither will work if the data is categorical. Tune in next week for a discussion of the mode, and how all three measures of central tendency are related.