# Correlation Solution

### Example homework problem:

You work for an automotive magazine, and you are investigating the relationship between a car’s gas mileage (in miles-per-gallon) and the amount of horsepower produced by a car’s engine. You collect the following data:

Automobile: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |

Horsepower: | 95 | 135 | 120 | 167 | 210 | 146 | 245 | 110 | 160 | 130 |

MPG: | 37 | 19 | 26 | 20 | 24 | 30 | 15 | 32 | 23 | 33 |

Is there a significant correlation between horsepower and MPG (alpha = .05)?

The goal of your analysis is to assess the naturally-occurring relationship between these two variables. To do this, we will compute the __Pearson Product Moment Correlation Coefficient__:

The correlation is equal to the ratio of the covariance between your variables to the variance within your variables. *n* is equal to the number of pairs of scores.

The __Covariance__ between the two variables is equal to:

where ΣXY is equal to the sum of the crossproducts: you multiply each pair of scores and then add up the products.

**Compute test statistic**. Begin by computing the crossproduct scores, and then compute ΣXY, ΣX, and ΣY:

Automobile: | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Sum |

Horsepower: | 95 | 135 | 120 | 167 | 210 | 146 | 245 | 110 | 160 | 130 | 1518 |

MPG: | 37 | 19 | 26 | 20 | 24 | 30 | 15 | 32 | 23 | 33 | 259 |

Crossproduct: | 3515 | 2565 | 3120 | 3340 | 5040 | 4380 | 3675 | 3520 | 3680 | 4290 | 37125 |

Here are the preliminary sums: ΣX = 1518, ΣY = 259, and ΣXY = 37125. *n* = 10.

Compute the variance for each variable. You will find that the variance of X (horsepower) is equal to 2127.51, and the variance of Y (gas mileage) is equal to 48.99. If you would like help with these computations, see the documentation for descriptive statistics. Now, compute the covariance between the two variables:

Let’s pause here for a moment, and place these three statistics into a __Variance / Covariance Matrix__:

Variable: | Horsepower (X) |
MPG (Y) |

Horsepower (X): | 2127.5111 | -243.4667 |

MPG (Y): | 48.9889 |

where the values along the diagonal of the matrix are the variances of X and Y, and the value off the diagonal is the covariance between X and Y. It is a good habit to construct a variance/covariance matrix when you are working on correlation and regression problems.

Now that you have the variances and the covariance, you will find that the correlation is equal to:

**Conduct hypothesis test**. Our correlation will have *df* equal to the number of pairs of scores minus 2. In our case, we have 10 automobiles, so we would have *df* = 10 – 2 = 8.

Alpha was set at .05 and we will conduct a two-tailed test. When you consult your table of critical values for *r*, you will find that if our obtained value of *r* is greater than .632, then we would conclude that the relationship between horsepower and gas mileage is significant — i.e., that *r* is significantly different from zero.

Since the obtained *r* (-.75) is greater in absolute value than the critical *r* (.632), we would conclude that there is a significant negative relationship between the horsepower level of the automobile and its gas mileage — i.e., the greater the horsepower, the lower the gas

mileage.

If you are not testing *r* directly in your class, then you are probably testing the significance of the correlation coefficient with the *t* distribution:

This *t* test will have *df* equal to n – 2, the same as *r*. In our case, *df* = 8. With alpha = .02 and assuming a two-tailed test, we would compare *t* to a critical value of 2.306.

Since our obtained *t* (-3.25) is greater in absolute value than our critical value of *t* (2.306), we would conclude that the correlation between horsepower and gas mileage was significant. Note that these two approaches are perfectly equivalent, and will always result in the same decision.

**Effect Size**. The correlation is an effective descriptive statistic in its own right. You know that it ranges from 0 to +1 and from 0 to -1, where 0 = no relationship and +1/-1 = perfect positive/negative relationship. In *Statistical Power Analysis*, Cohen suggested that we interpret a correlation of .10 as a small effect, a correlation of .30 as a moderate effect, and a correlation of .50 or greater as a large effect.

In addition, the square of the correlation (*r*²) is a very useful effect size estimate called the __Coefficient of Determination__. *r*² represents the proportion of variance in X and Y that is shared — i.e., the proportion of variance in one of the variables that can be accounted for by variation in the other variable. In our case, *r*² = -.7541² = .57. Thus, if you were predicting gas milage from horsepower, you could account for 57% of the variance in gas milage by variations in horsepower.

Return to Correlation Procedure

## Leave a Reply