## Correlation or Regression?

Correlation makes no a priori assumption as to whether one variable is dependent on the other(s), and it does not attempt to describe the form of the relationship between variables; instead it estimates the degree of association between them. In effect, correlation analysis tests for interdependence of the variables.

Regression, by contrast, attempts to describe the dependence of a variable on one (or more) explanatory variables. It implicitly assumes that there is a one-way causal effect from the explanatory variable(s) to the response variable, regardless of whether the path of effect is direct or indirect. There are more advanced methods that allow a relationship not based on dependence to be described (e.g. Principal Components Analysis, or PCA), and these will be touched on later.

The best way to appreciate this difference is by example.

Take, for instance, samples of leg length and skull size from a population of elephants. It would be reasonable to suggest that these two variables are associated in some way, as elephants with short legs tend to have small heads and elephants with long legs tend to have big heads. We may, therefore, formally demonstrate that an association exists by performing a correlation analysis. However, would regression be an appropriate tool to describe a relationship between skull size and leg length? Does an increase in skull size cause an increase in leg length? Does a decrease in leg length cause the skull to shrink? As you can see, it is meaningless to apply a causal regression analysis to these variables: they are interdependent, and neither is wholly dependent on the other. More likely, both depend on some other factor (e.g. food supply, genetic makeup).
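A small simulation can make the elephant example concrete. The sketch below is only an illustration under invented assumptions: the hidden `growth` factor, the coefficients, and the noise levels are all made up, and no real elephant data are involved. It shows two measurements that are strongly correlated only because both depend on a shared factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical hidden common factor (standing in for food supply or genetics).
growth = rng.normal(0.0, 1.0, n)

# Both measurements depend on the common factor plus independent noise;
# neither one depends on the other. All numbers are synthetic.
leg_length = 2.0 + 0.5 * growth + rng.normal(0.0, 0.2, n)
skull_size = 1.0 + 0.3 * growth + rng.normal(0.0, 0.2, n)

# Correlation between the two measurements.
r = np.corrcoef(leg_length, skull_size)[0, 1]

# Subtract the part of each measurement explained by the common factor,
# then correlate what is left over.
leg_resid = leg_length - np.polyval(np.polyfit(growth, leg_length, 1), growth)
skull_resid = skull_size - np.polyval(np.polyfit(growth, skull_size, 1), growth)
r_resid = np.corrcoef(leg_resid, skull_resid)[0, 1]

print(f"correlation of leg length and skull size: {r:.2f}")
print(f"correlation after removing common factor: {r_resid:.2f}")
```

In this synthetic setup the association largely disappears once the common factor is accounted for, which is why a causal regression of one measurement on the other would be misleading here.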

Consider two variables: crop yield and temperature. These are measured independently, one by the weather station thermometer and the other by Farmer Giles' scales. While correlation analysis would show a high degree of association between these two variables, regression analysis would be able to demonstrate the dependence of crop yield on temperature. However, careless use of regression analysis could also demonstrate that temperature is dependent on crop yield: this would suggest that if you grow really big crops you'll be guaranteed a hot summer!

And that's as close as you'll come to a joke in statistics, so make the most of it...
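The crop yield example can also be sketched numerically. The data below are synthetic, and the temperature range, slope, and noise level are invented for illustration. The point is that the correlation coefficient is the same whichever way round the variables are given, whereas a least-squares regression gives a different line depending on which variable is treated as the response, and the software will happily fit either one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic data: yield is generated from temperature, not the reverse.
temperature = rng.normal(18.0, 3.0, n)
crop_yield = 2.0 + 0.4 * temperature + rng.normal(0.0, 1.0, n)

# Correlation is symmetric: swapping the arguments changes nothing.
r_xy = np.corrcoef(temperature, crop_yield)[0, 1]
r_yx = np.corrcoef(crop_yield, temperature)[0, 1]

# Regression is not symmetric: each direction gives its own fitted line.
# np.polyfit with degree 1 returns (slope, intercept).
slope_yield_on_temp, _ = np.polyfit(temperature, crop_yield, 1)
slope_temp_on_yield, _ = np.polyfit(crop_yield, temperature, 1)

print(f"r(temperature, yield) = {r_xy:.3f}")
print(f"r(yield, temperature) = {r_yx:.3f}")
print(f"slope of yield on temperature = {slope_yield_on_temp:.3f}")
print(f"slope of temperature on yield = {slope_temp_on_yield:.3f}")
print(f"reciprocal of first slope     = {1 / slope_yield_on_temp:.3f}")
```

The two slopes are not simply reciprocals of each other (for least-squares fits their product equals the squared correlation), and nothing in the arithmetic says which direction is meaningful. That choice has to come from what you know about the variables.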

Thus, it is vital that you work out precisely what you are trying to determine (and why) before you begin a regression or correlation analysis.
