
Grouping data in a histogram

A continuous category, such as age, may have a large number of possible values, and this could result in a complex histogram with so many columns that it becomes difficult to interpret. For this reason the data in a histogram are often grouped to reduce the number of categories. For example, instead of drawing a bar for each individual age from 16 onwards, the data in the histogram below have been grouped into a series of consecutive age ranges: 16-24, 25-34 etc.
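The grouping step described above can be sketched in a few lines of Python. This is only an illustration: the list of ages is invented, and the full set of age ranges (including the final open-ended group starting at 75) is an assumption based on the ranges named in the text.

```python
from collections import Counter

# Hypothetical ages, invented purely for illustration.
ages = [16, 19, 23, 24, 25, 31, 34, 35, 40, 44, 52, 67, 80]

# Assumed age ranges as (lowest age, highest age), both inclusive.
# None as the highest age marks an open-ended final group.
groups = [(16, 24), (25, 34), (35, 44), (45, 54),
          (55, 64), (65, 74), (75, None)]


def label(low, high):
    """Return a readable label such as '16-24' or '75+'."""
    return f"{low}+" if high is None else f"{low}-{high}"


def group_of(age):
    """Return the label of the age range that contains this age."""
    for low, high in groups:
        if age >= low and (high is None or age <= high):
            return label(low, high)
    raise ValueError(f"age {age} is not covered by any range")


# Count how many ages fall into each range.
counts = Counter(group_of(age) for age in ages)

for low, high in groups:
    name = label(low, high)
    print(name, counts[name])
```

Thirteen individual values are reduced to seven categories, which is the simplification that grouping is intended to achieve.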

[Figure: hist2.gif, histogram with the data grouped into age ranges]

However, when grouping the original data, it is important to remember that in a histogram the size of each category (the number of observations it contains) is represented by the area of its bar and not by its height alone. A common error when constructing histograms is to overlook this relationship, which can produce a distorted view of the data.

This usually occurs if the data have been grouped into unevenly sized categories. For example, if the age ranges were 0-10, 11-15 and 16-21, each would cover a different number of years (11, 5 and 6 respectively), and therefore the corresponding bars in the histogram would need different widths to maintain the relationship between area and category size.
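The relationship between area and category size can be checked with a little arithmetic. The sketch below assumes that each range is counted in whole years of age, inclusive of both ends (so 0-10 covers 11 years), and uses invented counts. The height of each bar is taken as the count divided by the width, so that width multiplied by height gives back the count.

```python
# Age ranges as (lowest age, highest age), both inclusive.
ranges = [(0, 10), (11, 15), (16, 21)]

# Hypothetical number of people in each range, invented for illustration.
counts = [22, 20, 18]

# Width of each bar: the number of whole years of age the range covers.
widths = [high - low + 1 for low, high in ranges]

# Height of each bar: the count spread evenly across the width.
heights = [count / width for count, width in zip(counts, widths)]

# Area of each bar: this should equal the original count.
areas = [width * height for width, height in zip(widths, heights)]

for (low, high), width, height, area in zip(ranges, widths, heights, areas):
    print(f"{low}-{high}: width={width}, height={height}, area={area}")
```

With these figures the 0-10 group contains the most people but has the lowest bar, because its count is spread over the widest range. Drawing all three bars at the same width with heights equal to the raw counts would overstate the wider groups.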

In the example above, the bar representing the age range 16-24 is slightly narrower than those for the age ranges 25-34, 35-44 etc. because it includes only 9 years whereas the others include 10. In the same way, the bar representing ages of 75 and over is broader than the other bars since it represents an open-ended category.

Histograms and Excel

The spreadsheet package Excel does not include histograms amongst its standard chart types. It is, however, possible to draw basic histograms in Excel by selecting either the column or bar chart type. By default these chart types include a gap between the columns representing each category, but this gap can be removed so that adjacent columns butt onto one another and the chart appears as a histogram.

This is achieved in the following way:

  1. Highlight any of the columns in the graph by clicking on it with the mouse;
  2. Select Format selected data series from the main menu bar. This will bring up a dialogue box titled Format data series;
  3. In the dialogue box select the Options heading;
  4. Adjust the gap size setting to 0.

Unfortunately, Excel does not include the ability to alter the width of the columns or bars drawn for each category, and therefore it is not suitable for drawing histograms for grouped data when the categories represent groups of different sizes.
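Where bars of different widths are needed, the position, width and height of each bar can be worked out beforehand and then drawn with any tool that accepts them. The Python sketch below is one way of doing this under stated assumptions: the counts are invented, the ranges are counted in whole years inclusive of both ends, and the open-ended final group is closed at age 89 purely so that it has a finite width.

```python
# Age groups as (lowest age, highest age), both inclusive.
# Closing the open-ended final group at 89 is an assumption made
# here for illustration only.
groups = [(16, 24), (25, 34), (35, 44), (45, 54),
          (55, 64), (65, 74), (75, 89)]

# Hypothetical number of people in each group.
counts = [18, 30, 25, 20, 15, 10, 6]

# Each bar is described as (left edge, width, height).
rectangles = []
for (low, high), count in zip(groups, counts):
    width = high - low + 1      # whole years of age covered by the group
    height = count / width      # keeps the bar's area equal to the count
    rectangles.append((low, width, height))

for left, width, height in rectangles:
    print(f"left={left}, width={width}, height={height}")
```

Because each bar's left edge plus its width equals the next bar's left edge, the bars butt onto one another with no gaps, as a histogram requires.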