## On-line statistics

Data transformations

One advantage of using parametric statistics is that they make it much easier to describe your data. If you have established that your data follow a normal distribution, you can be confident that a particular set of measurements is properly described by its mean and standard deviation. If your data are not normally distributed, you should not use any of the tests that assume that they are (e.g. ANOVA, t test, regression analysis). However, it is often possible to normalise such data by transforming them.

Transforming data to allow you to use parametric statistics is completely legitimate. People often feel uncomfortable when they transform data because it seems to improve their results artificially, but this is only because they feel happiest with linear or arithmetic scales. There is no reason not to use other scales (e.g. logarithms, square roots, reciprocals or angles) where appropriate (Sokal & Rohlf, 1995; see pages 411-422).

Different transformations work for different data types:

Logarithms : Growth rates are often exponential and log transforms will often normalise them. Log transforms are particularly appropriate if the variance increases with the mean.

Reciprocal : If a log transform does not normalise your data you could try a reciprocal (1/x) transformation. This is often used for enzyme reaction rate data (see Fowler & Cohen, 1990).

Square root : This transform is often of value when the data are counts, e.g. blood cells on a haemocytometer or woodlice in a garden. Counts of this kind often follow a Poisson distribution, in which the variance equals the mean; a square root transform makes the variance roughly independent of the mean and brings the distribution closer to normal.

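Arcsine : This transformation is also known as the angular transformation and is especially useful for percentages and proportions. It is calculated as the arcsine of the square root of the proportion, so percentages must first be divided by 100.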
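As a rough illustration, the four transformations listed above can be sketched in a few lines of Python using only the standard library. The function names and the example values are made up for this sketch and do not come from any particular statistics package.

```python
import math

def log_transform(values):
    """Natural logarithm; only defined for strictly positive values."""
    return [math.log(v) for v in values]

def reciprocal_transform(values):
    """1/x; not defined for zero."""
    return [1.0 / v for v in values]

def sqrt_transform(values):
    """Square root; intended for non-negative data such as counts."""
    return [math.sqrt(v) for v in values]

def arcsine_transform(proportions):
    """Angular transform: arcsin(sqrt(p)) in radians, for p between 0 and 1."""
    return [math.asin(math.sqrt(p)) for p in proportions]

# Hypothetical data, chosen only to make the results easy to check by eye.
counts = [0, 1, 4, 9, 16]
proportions = [0.0, 0.25, 0.5, 1.0]

print(sqrt_transform(counts))
print(arcsine_transform(proportions))
```

Whichever transform you choose, check the transformed values for normality before going on to a parametric test.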

Some problems with transformations

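To present a mean value on the linear scale it is necessary to back-transform the mean of the transformed data. Note that this back-transformed mean is generally not the same as the arithmetic mean of the raw data; after a log transform, for example, it is the geometric mean. The standard deviation of the transformed data is of no value on the linear scale, so you should compute confidence limits for the transformed data and then convert these to the linear scale. The converted limits will usually be asymmetric about the mean.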
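The back-transformation described above might look like the following for log-transformed data. This is a minimal sketch: the function name, the example data and the choice of natural logarithms are assumptions made for illustration, and the critical value of t has to be supplied by you (from tables or a statistics package) for n - 1 degrees of freedom.

```python
import math
import statistics

def back_transformed_summary(values, t_crit):
    """Return (mean, lower limit, upper limit) on the linear scale.

    The mean and confidence limits are computed on the natural-log
    scale and then converted back with exp().
    t_crit is the two-tailed critical value of t for len(values) - 1
    degrees of freedom, supplied by the caller.
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(n)  # standard error on the log scale
    lower_log = mean_log - t_crit * se_log
    upper_log = mean_log + t_crit * se_log
    return math.exp(mean_log), math.exp(lower_log), math.exp(upper_log)

# Hypothetical measurements; 4.303 is the 5% two-tailed t value for 2 d.f.
data = [1.0, 10.0, 100.0]
geo_mean, lower, upper = back_transformed_summary(data, t_crit=4.303)
print(geo_mean, lower, upper)
```

For these values the back-transformed mean is the geometric mean of the data, and the upper limit lies further from it than the lower limit does, which is why a single standard deviation cannot describe the spread on the linear scale.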

The product moment correlation coefficient, as generated by most statistics packages, can change when the data are transformed, so values calculated on different scales are not directly comparable. Care should be taken in this situation to make sure that the particular correlation coefficient you use is robust. See, for example, Kvalseth, T.O. (1985). Cautionary note about r squared. Amer. Stat., 39(4):279-285, and Scott, A. and Wild, C. (1991). Transformations and r-squared. Amer. Stat., 45(2):127-128.
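To see how much the product moment correlation coefficient can depend on the scale, the sketch below calculates it for the same made-up data before and after a log transform. The `pearson_r` helper is written out by hand for illustration and the data are hypothetical (y doubles with each unit of x).

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical exponential growth data.
x = [1, 2, 3, 4, 5, 6]
y = [2.0 ** v for v in x]

r_raw = pearson_r(x, y)                         # correlation on the linear scale
r_log = pearson_r(x, [math.log(v) for v in y])  # correlation after log-transforming y

print(r_raw, r_log)
```

Here the relationship is perfectly linear after the transform, so the coefficient on the log scale is essentially 1, while the coefficient on the linear scale is noticeably lower. Neither value is wrong, but they describe the fit on different scales.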

Ted Gaten  Department of Biology  gat@le.ac.uk
Entry approved by the Head of Department. Last Updated: May 2000