Correlation – Part 2
We know from the previous post that correlation gives us a number that tells us about the relationship between two variables (usually named X and Y). There are several types of correlation, but the most common one you’re likely to see discussed in research articles is Pearson’s r.
The correlation coefficient that Pearson’s r produces is a ratio of how the variables vary together (their covariability) to how they vary separately. The formula tells us to divide SP (the sum of products of deviations) by the square root of SS_X times SS_Y (the sums of squared deviations for X and Y), and it looks like this:
$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$$
And here’s what it means:
$$r = \frac{\text{covariability of X and Y together}}{\text{variability of X and Y separately}} = \frac{\text{how much they vary together}}{\text{how much they vary individually}}$$
Recall that covariability means that as one variable changes, the other one changes in a somewhat predictable manner. Ignoring its sign, the top number can never be larger than the bottom one. But as the amount of covariability approaches the amount of separate variability (as the top number approaches the same value as the bottom one), we approach the value of 1, indicating a perfect relationship. So, setting the sign aside, the ratio gives us a value between 0 and 1 that indicates the strength of the relationship between the two variables. The closer we get to 1, the stronger the relationship; the closer we get to 0, the weaker it is. The numerical value alone therefore tells us about the strength of the relationship between the variables.
Pearson’s r also tells us about the direction of that relationship. This is indicated by either no sign at all (a positive relationship) or a negative sign (a negative relationship), so the full range of r runs from -1 to +1. The sign has nothing to do with the strength of the relationship; it only indicates direction. Therefore, correlations of -.68 and .68 are equally strong.
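To see that the sign only carries direction, here is a small Python sketch. The scores are made up for illustration, and `pearson_r` is just a helper name chosen here, not something from a particular library. The second Y list holds the same scores in reverse order, so the relationship points the other way while staying equally strong.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r as SP / sqrt(SS_X * SS_Y), for paired lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical scores for five people (illustration only)
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 5]    # Y tends to go up as X goes up
y_falling = [5, 4, 5, 4, 2]   # the same scores in reverse order

r_positive = pearson_r(x, y_rising)
r_negative = pearson_r(x, y_falling)

print(round(r_positive, 2), round(r_negative, 2))
# Strength is the size of r with the sign set aside
print(round(abs(r_positive), 2) == round(abs(r_negative), 2))
```

The two coefficients differ only in sign, which is the same situation as the -.68 and .68 example.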
To find the correlation between two variables, we need a sample in which each person has a score on both variables. Also, these variables should be continuous (interval or ratio). First, we would find out how much the variables vary together (SP), and then we’d find out how much they vary separately (SS_X and SS_Y). If you’re interested in doing this by hand, you can find the formulas for SP, SS_X and SS_Y in any good stats book. Then, to get the final correlation, we’d take the ratio.
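The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual deviation-score definitions (SP as the sum of products of deviations from the means, SS as the sum of squared deviations). The function name `pearson_r` and the study-hours data are hypothetical, invented for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Compute Pearson's r as SP / sqrt(SS_X * SS_Y) for paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Step 1: how much X and Y vary together (SP, sum of products of deviations)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Step 2: how much each varies separately (SS, sum of squared deviations)
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable does not vary")
    # Step 3: take the ratio
    return sp / sqrt(ss_x * ss_y)

# Hypothetical sample: each of five people has a score on both variables
hours_studied = [1, 2, 3, 4, 5]
exam_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_studied, exam_score)
print(round(r, 2))  # SP = 6, SS_X = 10, SS_Y = 6, so r = 6 / sqrt(60), about 0.77
```

The guard for a zero SS is a design choice for this sketch: if everyone has the same score on a variable, the bottom of the ratio is zero and no correlation can be computed.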
The correlation coefficient we get can be affected by outliers. Just one person whose scores on X and Y are very different from everyone else’s can change the correlation in both size and direction, which could be very misleading. Correlation can also be affected by restricted range, which occurs when the scores in the sample cover only a narrow part of the possible range and are therefore very close together. This situation can make it appear that two variables are not related, when actually they are.
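Both problems are easy to reproduce with toy numbers. The sketch below uses invented data and a hypothetical helper named `pearson_r`; it is meant only to illustrate the two effects, not to represent any real study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r as SP / sqrt(SS_X * SS_Y), for paired lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)

# Outlier: five people show a perfect negative relationship...
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_without_outlier = pearson_r(x, y)            # -1.0
# ...then a sixth person scores far above everyone else on both variables
r_with_outlier = pearson_r(x + [20], y + [20])  # about +0.92

# Restricted range: a strong relationship across the full range of X...
x_full = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_full = [1, 2, 3, 6, 4, 7, 5, 8, 9, 10]
r_full = pearson_r(x_full, y_full)              # about 0.94
# ...disappears if we only sample people with X between 4 and 7
restricted = [(a, b) for a, b in zip(x_full, y_full) if 4 <= a <= 7]
x_restricted = [a for a, _ in restricted]
y_restricted = [b for _, b in restricted]
r_restricted = pearson_r(x_restricted, y_restricted)  # 0.0

print(round(r_without_outlier, 2), round(r_with_outlier, 2))
print(round(r_full, 2), round(r_restricted, 2))
```

In the first case a single extra person flips the direction of the correlation; in the second, looking at only the middle of the range hides a relationship that is strong in the full sample.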
We must remember that “correlation does not imply causation” (a well-known saying that happens to be true). Correlation can only tell us about the relationship between variables, and nothing about whether one variable is causing the other. Even though correlation is limited in this way, it is still important to know whether variables are related. After all, if variables aren’t even related to each other, the more extensive analyses we might perform to address our research hypotheses would be meaningless.