On-line statistics

**Correlation** makes no *a priori* assumption as
to whether one variable is dependent on the other(s), and it is not
concerned with the form of any relationship between the variables;
instead it gives an estimate of the degree of **association** between
them. In effect, correlation analysis tests for **interdependence**
of the variables.
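A minimal sketch in Python of Pearson's product-moment correlation coefficient, one common measure of association (the text above does not name a specific coefficient, so this choice is an assumption). It shows that the coefficient is symmetric: swapping the two variables gives the same value, because neither is treated as dependent. The measurements are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented paired measurements (leg length and skull size, arbitrary units).
leg = [1.6, 1.8, 1.9, 2.1, 2.3]
skull = [0.52, 0.55, 0.60, 0.61, 0.66]

print(round(pearson_r(leg, skull), 3))
# The order of the arguments does not matter: no variable is "the dependent one".
print(math.isclose(pearson_r(leg, skull), pearson_r(skull, leg)))
```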

**Regression**, by contrast, attempts to describe the dependence of a
variable on one (or more) explanatory variables; it implicitly
assumes that there is a **one-way causal effect** from the
explanatory variable(s) to the response variable, regardless of
whether the path of effect is direct or indirect. There are more advanced
methods that allow a relationship not based on dependence
to be described (e.g. Principal Components Analysis, or PCA), and
these will be touched on later.
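The asymmetry can be made concrete using ordinary least squares, a standard fitting method (the text does not specify one, so this is an assumption). Writing $S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)$, with $S_{xx}$ and $S_{yy}$ defined analogously, the slopes of the two possible regression lines and the correlation coefficient are:

```latex
b_{y|x} = \frac{S_{xy}}{S_{xx}}, \qquad
b_{x|y} = \frac{S_{xy}}{S_{yy}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
\quad\Longrightarrow\quad
b_{y|x}\, b_{x|y} = r^{2}
```

The correlation coefficient $r$ is unchanged if $x$ and $y$ are swapped, but the regression slopes are not: the line of $x$ on $y$, redrawn with $y$ as the vertical axis, has slope $1/b_{x|y}$, which equals $b_{y|x}$ only when $r^2 = 1$.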

The best way to appreciate this difference is by example.

Take, for instance, samples of leg length and skull size from
a population of elephants. It would be reasonable to suggest that
these two variables are associated in some way, as elephants with
short legs tend to have small heads and elephants with long legs
tend to have big heads. We may, therefore, formally demonstrate
that an association exists by performing a correlation analysis. However,
would regression be an appropriate tool to describe a **relationship**
between head size and leg length? Does an increase in skull size
**cause** an increase in leg length? Does a decrease in leg
length cause the skull to shrink? As you can see, it is meaningless
to apply a causal regression analysis to these variables: they
are interdependent, and neither is wholly dependent on the other.
More likely, both depend on some other factor that affects them
together (e.g. food supply, genetic makeup).
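A small simulation, with entirely invented numbers, of the situation just described. A hypothetical common factor `food` (standing in for food supply) affects both leg length and skull size, and neither trait is computed from the other. The two traits come out strongly correlated, yet the correlation largely disappears once the common factor is accounted for (a partial correlation, computed here from least-squares residuals). This is an illustration of the argument, not a model of real elephants.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, z):
    """Residuals of y after a least-squares straight-line fit on z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    intercept = my - slope * mz
    return [b - (intercept + slope * a) for a, b in zip(z, y)]

random.seed(1)
n = 2000
# Hypothetical common factor (e.g. a food-supply index) driving both traits.
food = [random.gauss(0.0, 1.0) for _ in range(n)]
# Neither trait is computed from the other: each depends only on `food` plus its own noise.
leg = [2.0 + 0.3 * f + random.gauss(0.0, 0.1) for f in food]
skull = [0.6 + 0.05 * f + random.gauss(0.0, 0.02) for f in food]

r_raw = pearson_r(leg, skull)
# Partial correlation: correlate what is left of each trait once `food` is accounted for.
r_partial = pearson_r(residuals(leg, food), residuals(skull, food))
print(f"leg vs skull: r = {r_raw:.2f}")
print(f"after removing food: r = {r_partial:.2f}")
```

A strong raw correlation with a near-zero partial correlation is what one would expect if the association runs through the shared factor rather than from one trait to the other.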

Consider two variables: crop yield and temperature. These are
measured independently, one by the weather station thermometer
and the other by Farmer Giles' scales. While correlation analysis
would show a high degree of association between these two variables,
regression analysis would be able to demonstrate the dependence
of crop yield on temperature. However, careless use of regression
analysis could also appear to demonstrate that temperature is dependent
on crop yield: this would suggest that if you grow really big
crops you'll be guaranteed a hot summer!
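The careless reverse regression is easy to reproduce: least squares will fit a line in either direction, and nothing in the arithmetic says which direction is meaningful. A minimal sketch in Python, using invented temperature and yield figures and a hypothetical helper `ols` (ordinary least squares, assumed here as the fitting method):

```python
def ols(x, y):
    """Least-squares (intercept, slope) for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Invented illustrative data: mean summer temperature (deg C) and crop yield (t/ha).
temp = [14.0, 15.5, 16.0, 17.2, 18.1, 19.0]
crop_yield = [5.1, 5.9, 5.8, 6.6, 6.9, 7.4]

a_yx, b_yx = ols(temp, crop_yield)  # yield on temperature: the sensible direction
a_xy, b_xy = ols(crop_yield, temp)  # temperature on yield: fits just as easily
print(f"yield = {a_yx:.2f} + {b_yx:.3f} * temp")
print(f"temp  = {a_xy:.2f} + {b_xy:.3f} * yield")
# Rearranging the second line gives a yield-on-temperature slope of 1 / b_xy,
# which differs from b_yx unless the points lie exactly on a straight line.
print(f"b_yx = {b_yx:.3f}, 1 / b_xy = {1 / b_xy:.3f}")
```

Both fits are arithmetically valid; which one makes sense has to come from knowing how the data arose (here, that the harvest does not control the weather).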

Thus, it is vital that you work out precisely what you are trying to
determine (*and why*) with regression or correlation
analysis **before** you begin.


This page written by Dr Jon Read, April 1998.


Ted Gaten, Department of Biology (gat@le.ac.uk). Entry approved by the Head of Department. Last updated: May 2000.