Correlation – Part 1
Correlation is a procedure that will tell us about the relationship between two variables. Specifically, it tells us about the nature of their covariability, or how they behave together. Are they related to each other? Is their relationship positive or negative? How strong or weak is their relationship? Correlation answers these questions for us.
Suppose we are interested in whether students who sit and fret over a test do better or worse than students who simply answer the questions and turn it in to the instructor. We can use correlation to give us an idea by finding out how time spent on a test and grades are related. If we found a strong negative relationship between the two (for example, r = -.68), this would indicate that the longer students lingered over a test, the lower their grades tended to be. The negative sign tells us the direction of the relationship (negative, meaning that as one variable increases the other decreases), while the size of the number (.68) tells us the relationship is strong. How do we know it is strong? Because correlation coefficients range from -1 to +1, and the closer a value is to either extreme, the stronger the relationship. The sign plays no part in strength: a correlation of -.68 is exactly as strong as one of +.68. A correlation of .28 would be much weaker than .68 because it is closer to zero. Of course, the strongest possible relationship (known as a perfect correlation) would be -1 or +1, and is far less common than non-perfect relationships.
So does this mean that lingering too long over a test causes lower grades? No, it only means that the two variables are related. Maybe students who linger are more anxiety-ridden, and extreme anxiety tends to interfere with a student’s ability to produce what she knows, which in turn lowers her grade. Or perhaps those who linger are fretting because they neglected to study, which usually lowers one’s grade. There are many possible reasons why these students’ grades might be lower, so we must be cautious about assuming that a correlation means one variable is causing the other. The only real way to find out if this is the case is through a controlled experiment. But knowing whether the variables are correlated is a good, and necessary, first step.
One useful practice is to square the correlation in order to get the coefficient of determination. This value tells us what percentage of the variance in one variable is associated with the variance in the other variable. Here’s what I mean: if we take our -.68 and square it, we get (-.68)² ≈ .46. This shows us that 46% of the variance in grades is associated with the variance in time spent lingering over the test. Note that it is "associated with" and not "due to," because correlation alone cannot establish cause. It also tells us that the remaining 54% (100% – 46%) of the variance in grades is associated with other factors that we haven’t even considered yet. Maybe it is anxiety, or maybe it is lack of studying in the days before the test. Maybe it is something else. We won’t know unless we measure other variables and do more correlations.
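If you want to check the arithmetic yourself, it fits in a few lines of Python. This is only a sketch using the example value r = -.68 from above; the variable names are my own.

```python
r = -0.68  # the example correlation from the text

# Square the variable rather than typing -0.68 ** 2 directly:
# Python parses -0.68 ** 2 as -(0.68 ** 2), which gives a negative
# number and is not the square of r.
r_squared = r ** 2         # coefficient of determination
remaining = 1 - r_squared  # variance associated with other factors

print(f"coefficient of determination: {r_squared:.2f}")
print(f"variance shared by the two variables: {r_squared:.0%}")
print(f"variance associated with other factors: {remaining:.0%}")
```

Because squaring removes the sign, a correlation of +.68 would give exactly the same coefficient of determination.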
In the next post I’ll show you how to arrive at your correlation using the very commonly used Pearson’s r.