 On-line statistics


Overriding limitations of statistics


There is little in life that can compare with the sheer excitement of data analysis, especially if done early on a Sunday morning after a good party. However wonderful statistics are, they are not flawless and have some very important limitations.

How representative is your sample?

Most of us use samples to represent a population, with the number of data points collected in the sample depending on the resources available. If we were to catch the nearest beetle and measure its length, would we be able to say that this length is typical (average) of that species? Not with any degree of confidence! It may have just emerged as an adult, or it may have just escaped from a growth-hormone development laboratory and be 20 times larger than any beetle recorded before. For reasons such as these we must collect as many measurements as we can to get an idea of the variability within a population. Then we can find out how precisely the sample mean estimates the population mean by calculating the standard error (the sample standard deviation divided by the square root of the number of measurements). The lower the standard error is relative to the mean, the closer the sample mean is likely to be to the population mean, provided the sample was collected without bias.
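As a rough illustration of the calculation described above, here is a minimal Python sketch. The beetle lengths are made-up numbers used only for illustration, and the names `standard_error`, `beetle_lengths_mm` and `relative_se` are hypothetical, not part of this site.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical beetle lengths in mm (invented data, for illustration only).
beetle_lengths_mm = [12.1, 11.8, 12.6, 13.0, 12.3, 11.9, 12.4, 12.7]

mean_length = statistics.mean(beetle_lengths_mm)
se = standard_error(beetle_lengths_mm)

# Standard error expressed relative to the mean, as discussed in the text.
relative_se = se / mean_length

print(f"mean = {mean_length:.2f} mm, SE = {se:.3f} mm, SE/mean = {relative_se:.1%}")
```

Because the standard error shrinks with the square root of the sample size, halving it takes roughly four times as many measurements.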

Samples that do not represent the population adequately can still give statistically valid test results, but those results will have little relevance to the population the samples came from.


How well does your data fit the requirements of the tests?

All statistical tests are limited in the types of data they can be applied to. For instance, some tests are based on the data having a normal distribution. They rely on this "normal" pattern to calculate their results, so those results can be misleading if the data do not follow it.
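One crude way to screen data against the normal-distribution requirement is to look at its skewness, which is close to zero for a symmetric distribution such as the normal. The Python sketch below is only a rough check, not a formal test of normality; the function name `sample_skewness` and both data sets are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical, skewness is undefined")
    return m3 / m2 ** 1.5


# Invented data: one roughly symmetric set, one with a long right-hand tail.
symmetric = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
right_skewed = [1, 1, 2, 2, 3, 3, 4, 25]

print(f"symmetric:    skewness = {sample_skewness(symmetric):+.2f}")
print(f"right-skewed: skewness = {sample_skewness(right_skewed):+.2f}")
```

A skewness far from zero suggests that the normal pattern is a poor description of the data; a value near zero does not prove that the data are normal.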

NOTE: Many books and web sites refer to these "limitations" as "assumptions". The author feels that this word is too obscure and plays down their importance. Therefore, in order to give them the right emphasis they will be referred to as "REQUIREMENTS" throughout this web site. Where applicable, requirements that are not as rigid in application as others will be highlighted.


What the P-value says about your hypotheses

It would be nice if the P-value or "probability value" told us the probability that the null hypothesis (H0) is true. If only life were so simple! The P-value is the probability that differences at least as large as those observed (e.g. between the means in a t-test, or between the slope of the line and zero in a correlation) would occur by chance alone, that is, if the null hypothesis were true. We then use the reverse logic that if differences this large occur by chance so seldom (typically when P<5%), real differences are likely to exist. This has serious implications for what you say about the hypothesis you accept:

By accepting an alternative hypothesis at the 5% significance level you can say that, if there were really no difference or relationship, a result at least as extreme as yours would be found in fewer than 5% of repeated samples from the same area. You cannot say that 95% of repeated samples would show the difference or relationship.

Accepting a null hypothesis does not mean that the samples are the same or that there is no relationship, just that the evidence in the samples is not strong enough to support the alternative. In practice, however, most would proceed as if the two samples were from the same population (comparison) or the slope of the line were equal to zero (correlation).
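The idea of differences "occurring only by chance" can be made concrete with a permutation test. This is a different technique from the t-test mentioned above, used here only because its logic is easy to see: shuffling the measurements between the two groups imitates a world in which the null hypothesis is true. The Python sketch below is a minimal illustration; the function name `permutation_p_value`, its parameters and the two samples are all hypothetical.

```python
import random
import statistics


def permutation_p_value(sample_a, sample_b, n_shuffles=10000, seed=0):
    """Estimate a two-sided P-value for the difference between two means.

    The P-value is the fraction of random relabellings of the pooled data
    that give a difference at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # group labels are meaningless under H0
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)


# Invented beetle lengths (mm) from two hypothetical sites.
site_1 = [12.1, 11.8, 12.6, 13.0, 12.3, 11.9]
site_2 = [13.4, 13.1, 12.9, 13.8, 13.5, 13.2]

p = permutation_p_value(site_1, site_2)
print(f"estimated P-value = {p:.4f}")
```

Note what the number does and does not say: it is the proportion of chance relabellings that match or beat the observed difference, not the probability that the null hypothesis is true.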





 Ted Gaten  Department of Biology  gat@le.ac.uk
Entry approved by the Head of Department. Last Updated: May 2000