Regression – Part 2
In the previous post we saw how the size of the correlation between two variables affects the accuracy of prediction. We can see this reflected in the slope of the regression line when it is illustrated on a scatterplot. Take a look at this one:
This plot is based on a positive correlation of r = .78. Notice how the regression line has some steepness to it. When the two variables are measured on comparable scales, a stronger relationship between them gives a steeper regression line. But what is this regression line? How does one know where to draw it on the scatterplot?
First of all, the regression line is made up of all of the values of one variable (known as “Y”) that are predicted using another variable (known as “X”). When all of these predicted values are lined up together, they form a line known as the “least-squares regression line”. Of all the lines that could be drawn through the scatterplot, this is the one with the smallest sum of squared differences (or squared errors) between the actual Y values and the predicted ones, and the size of those errors reflects the accuracy of the prediction. When two variables have a strong correlation, there will be little difference between actual Y values and predicted ones (i.e. the data points will all be close to the line), making the errors smaller and, on comparable scales, the slope of the line steeper. A weak correlation will show wide gaps between the data points and the regression line, and will also have a regression line that is much less steep. If two variables aren’t related, the data points will be scattered all over the plot and the regression line will be flat (horizontal).
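To make the idea of "smallest squared errors" concrete, here is a minimal Python sketch. The data values and the function names (`least_squares_line`, `sum_squared_errors`) are invented for illustration and do not come from the plot above. It fits a line to five hypothetical points, then nudges that line to show that any change makes the total squared error larger.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares, both taken around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Add up the squared gaps between actual Y values and predicted ones."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up purely for this illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, slope, intercept)

# Any other line through the same data should do worse.
steeper = sum_squared_errors(xs, ys, slope + 0.2, intercept)
flatter = sum_squared_errors(xs, ys, slope - 0.2, intercept)
shifted = sum_squared_errors(xs, ys, slope, intercept + 0.5)

print(f"least-squares line: Y' = {slope:.2f}(X) + {intercept:.2f}")
print(f"squared error for that line: {best:.2f}")
print(f"steeper line: {steeper:.2f}, flatter line: {flatter:.2f}, shifted line: {shifted:.2f}")
```

Whichever way the line is tilted or moved, the total squared error goes up, which is what "least squares" means.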
But how do we know where to draw this line? We use formulas to find both the slope and the Y-intercept (where the line will meet the Y axis), and this gives us a regression equation:
Y’ = .68(X) + 1.67
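The post does not spell out the formulas, so as an assumption here is one common textbook form: the slope is r multiplied by the ratio of the standard deviations (SD of Y divided by SD of X), and the Y-intercept is the mean of Y minus the slope times the mean of X. The Python sketch below applies that form to made-up data; the data and helper names are hypothetical, and it is not meant to reproduce the equation above.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_coefficients(xs, ys):
    """Return (r, slope, intercept) using slope = r * SD(Y) / SD(X)."""
    r = pearson_r(xs, ys)
    slope = r * sample_sd(ys) / sample_sd(xs)
    intercept = mean(ys) - slope * mean(xs)
    return r, slope, intercept


# Hypothetical data: the second Y list is the first one stretched to twice the spread.
xs = [1, 2, 3, 4, 5]
ys_narrow = [2, 3, 5, 4, 6]
ys_wide = [3, 5, 9, 7, 11]

r_narrow, slope_narrow, intercept_narrow = regression_coefficients(xs, ys_narrow)
r_wide, slope_wide, intercept_wide = regression_coefficients(xs, ys_wide)

print(f"narrow Y: r = {r_narrow:.2f}, Y' = {slope_narrow:.2f}(X) + {intercept_narrow:.2f}")
print(f"wide Y:   r = {r_wide:.2f}, Y' = {slope_wide:.2f}(X) + {intercept_wide:.2f}")
```

Both data sets have the same correlation, but the one with the wider spread in Y gets the steeper line. In this form the slope depends on the units of X and Y as well as on r, so comparing steepness across plots works best when the variables are on similar scales.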
By working this regression equation, we find out what Y is predicted to be based on some value of the X variable. It is only necessary to work the equation once or twice, using different values of X, to develop a couple of points through which to draw the line. Though unnecessary, you could conceivably work the equation using every whole value of X and show that all of the predicted Y points would eventually form the line.
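Working the equation is simple enough to sketch in a few lines of Python. The slope and intercept come from the regression equation above; the X values chosen (0 and 10) and the name `predict_y` are just assumptions for the example.

```python
SLOPE = 0.68       # from the regression equation Y' = .68(X) + 1.67
INTERCEPT = 1.67


def predict_y(x):
    """Predicted Y for a given X, using the regression equation."""
    return SLOPE * x + INTERCEPT


# Two hypothetical X values are enough to give two points for drawing the line.
x_low, x_high = 0, 10
point_low = (x_low, predict_y(x_low))
point_high = (x_high, predict_y(x_high))
print(f"draw the line through ({point_low[0]}, {point_low[1]:.2f}) "
      f"and ({point_high[0]}, {point_high[1]:.2f})")

# Working the equation for every whole value of X in between:
predictions = [(x, predict_y(x)) for x in range(x_low, x_high + 1)]

# Each one-unit step in X raises the predicted Y by the same amount (the slope),
# which is why all of these points fall on a single straight line.
steps = [predictions[i + 1][1] - predictions[i][1]
         for i in range(len(predictions) - 1)]
print("change in predicted Y per unit of X:", [round(s, 2) for s in steps])
```

When X is 0 the prediction is just the Y-intercept, which makes 0 a convenient first point whenever it falls within the range of the plot.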
As useful as this simple form of regression is, most researchers will want to use more than one variable at a time for prediction. This is known as multiple regression, and we’ll take a look at that next time.