Your findings can be presented with a range of graphical and mapping techniques. You should be able to justify each one.
Bar chart
- Nominal data.
- Categories on x-axis.
- Bar height represents frequency.
- Leave gaps between bars as data are discontinuous.
There are many variations on the basic bar chart, such as divided bar chart, percentage bar chart and bi-polar analysis bar chart.
Histogram
- Interval or ratio data.
- Bar area shows frequency.
- Bars are not necessarily of equal width.
- No gaps between bars as data are continuous.
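Because bar area, not bar height, shows frequency, the height of each bar is the frequency density (frequency divided by class width). The sketch below illustrates this for classes of unequal width; the class boundaries and counts are made up for illustration and do not come from this guide.

```python
# Hypothetical classes (lower bound, upper bound) and frequencies; illustration only.
classes = [(0, 10), (10, 20), (20, 50)]
frequencies = [4, 6, 6]

def frequency_density(bounds, freq):
    """Bar height for a histogram: frequency divided by class width."""
    lower, upper = bounds
    return freq / (upper - lower)

densities = [frequency_density(b, f) for b, f in zip(classes, frequencies)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    # width * height recovers the frequency, i.e. the area of the bar
    area = (upper - lower) * d
    print(f"{lower}-{upper}: frequency {f}, bar height {d:.2f}, bar area {area:.0f}")
```

The last two classes hold the same number of values, but the wider class is drawn with a shorter bar so that the two areas stay equal.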
Pie chart
- Nominal or ordinal data.
- Area of circle segment represents proportion.
- Multiple pie charts can be used with the radius of the circle having meaning.
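The angle of each pie segment is that category's share of the total, scaled to 360 degrees. A minimal sketch, using hypothetical category names and counts:

```python
# Hypothetical category counts; illustration only.
counts = {"Grass": 12, "Moss": 6, "Bare ground": 6}

total = sum(counts.values())

# Each segment's angle is its proportion of the total multiplied by 360 degrees.
angles = {name: 360 * n / total for name, n in counts.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```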
Line graph
- Ordinal, interval or ratio data.
- Both axes are numerical.
- If time is one of the variables, always plot it on the x-axis.
- Only join up the points if the data are continuous.
Scattergraph
- This needs one independent variable (on x-axis) and one dependent variable (on y-axis).
- Both axes need interval or ratio data, and both must be continuous data.
- Do not join up each point, but use a line of best fit instead.
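A line of best fit is often drawn by eye, but it can also be calculated. The sketch below uses least-squares regression, which is one common method and an assumption here, since this guide does not say how the line should be fitted. The paired values are hypothetical.

```python
# Hypothetical paired measurements (x = independent, y = dependent); illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = best_fit(x, y)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

A least-squares line always passes through the point (mean of x, mean of y), which is a useful check when drawing the line by hand.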
Normal distribution
The symmetrical bell-shaped distribution from a large series of measurements plotted on a frequency histogram. The mean is in the middle, with an equal number of smaller and larger values either side of it.
Averages
Mean
Add all the measurements together then divide by the number of measurements taken. The mean can only be used if the data approximate to a normal distribution, and have an interval or ratio measurement scale.
Median
Arrange the data in order, and take the middle value as the median. It can be used for data which are not normally distributed. Suitable for variables with an ordinal scale.
Mode
The value which occurs most often. Suitable for variables with a nominal scale.
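The three averages can be compared on one data set. This sketch uses Python's standard `statistics` module and the ten values from the range example in this section; the variable names are illustrative.

```python
import statistics

# The ten values from the range example in this section.
values = [12, 17, 21, 23, 24, 24, 25, 26, 29, 31]

mean = statistics.mean(values)      # sum of the values divided by how many there are
median = statistics.median(values)  # middle value; mean of the two middle values when n is even
mode = statistics.mode(values)      # the value that occurs most often

print(f"mean = {mean}, median = {median}, mode = {mode}")
```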
Dispersion is the spread of data around the average.
If the data are normally distributed, use the interquartile range or the standard deviation. The usual way of expressing dispersion is as
mean ± interquartile range or mean ± standard deviation
If the data are not normally distributed, use the median and range.
Range
Range is the difference between the highest and lowest values.
Example: 12 17 21 23 24 24 25 26 29 31
Range = 31 − 12 = 19
Interquartile range is the part of the range that covers the middle 50% of the data.
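As a sketch, the range and interquartile range of the example values can be calculated as follows. There are several conventions for locating quartiles, so the interquartile range from `statistics.quantiles` (used here with its default method) may differ slightly from a value worked out by hand.

```python
import statistics

# The ten values from the range example.
values = [12, 17, 21, 23, 24, 24, 25, 26, 29, 31]

# Range: highest value minus lowest value
data_range = max(values) - min(values)

# quantiles(..., n=4) returns the three cut points that split the data into quarters:
# lower quartile, median, upper quartile
q1, q2, q3 = statistics.quantiles(values, n=4)

# Interquartile range: the spread of the middle 50% of the data
iqr = q3 - q1

print(f"range = {data_range}")
print(f"lower quartile = {q1}, upper quartile = {q3}, interquartile range = {iqr}")
```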
If the variable has an interval or ratio scale and if the data are normally distributed, use the interquartile range or the standard deviation. Otherwise use the median and range to show dispersion.
Standard deviation
This is calculated using the formula below. In a normal distribution, 68% of values are within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations of the mean.
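\(s=\sqrt{\frac{\sum x^2-\frac{(\sum x)^2}{n}}{n-1}}\)
- \(s\) = the standard deviation
- \(\sum x^2\) = each individual piece of data squared and then added up
- \((\sum x)^2\) = all the data added up and the total then squared
- \(n\) = the number of items of data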
Most scientific calculators will do the above calculations if you know how to operate their statistical functions. The instructions will tell you how to do this.
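The percentages quoted for a normal distribution can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with a mean of 0 and a standard deviation of 1; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```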
Worked example
A biology student measured the length of ribwort plantain leaves in three different areas of grassland. The results are as follows.
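Leaf lengths are in metres.
Area | Leaf 1 | Leaf 2 | Leaf 3 | Leaf 4 | Leaf 5 | Leaf 6 | Leaf 7 | Leaf 8 | Leaf 9 | Leaf 10 | Mean |
---|---|---|---|---|---|---|---|---|---|---|---|
Area 1 | 0.01 | 0.09 | 0.04 | 0.05 | 0.05 | 0.05 | 0.05 | 0.06 | 0.05 | 0.05 | 0.05 |
Area 2 | 0.04 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.06 | 0.05 |
Area 3 | 0.01 | 0.02 | 0.05 | 0.08 | 0.05 | 0.07 | 0.04 | 0.05 | 0.06 | 0.07 | 0.05 |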
Notice that the mean for all three areas is the same (0.05 m). However, the data sets clearly differ in how the individual values are scattered about the mean. In Area 1, six of the ten leaves are 0.05 m long (the mean value), but there are two shorter leaves and two longer ones. Area 2 is very consistent: eight of the ten leaves measure 0.05 m, with just one shorter and one slightly longer. The Area 3 lengths are much more spread out around the mean.
These are the standard deviation values for the data above:
σ n-1 area 1 = 0.0194 m
σ n-1 area 2 = 0.0047 m
σ n-1 area 3 = 0.0221 m
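The standard deviations for the three areas can be reproduced with a short script. This sketch applies the shortcut formula for the sample standard deviation (divisor n − 1) to the ten leaf lengths for each area, in metres, and cross-checks the result against `statistics.stdev`. The function name `sample_sd` is illustrative.

```python
import math
import statistics

# Leaf lengths in metres, copied from the worked-example table (ten leaves per area).
areas = {
    "Area 1": [0.01, 0.09, 0.04, 0.05, 0.05, 0.05, 0.05, 0.06, 0.05, 0.05],
    "Area 2": [0.04, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.06],
    "Area 3": [0.01, 0.02, 0.05, 0.08, 0.05, 0.07, 0.04, 0.05, 0.06, 0.07],
}

def sample_sd(xs):
    """Sample standard deviation: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(xs)
    sum_x = sum(xs)                  # all the data added up
    sum_x2 = sum(x * x for x in xs)  # each value squared, then added up
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

results = {name: sample_sd(xs) for name, xs in areas.items()}

for name, sd in results.items():
    # statistics.stdev also divides by n - 1, so it serves as a cross-check
    check = statistics.stdev(areas[name])
    print(f"{name}: s = {sd:.4f} m (statistics.stdev: {check:.4f} m)")
```

Area 2 has the smallest standard deviation and Area 3 the largest, matching the description of how tightly each set of lengths clusters around the mean.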
Simpson’s Diversity Index
Simpson’s Diversity Index is a measure both of species richness (i.e. the number of different species present) and species evenness (i.e. how evenly distributed each species is).
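\(D = \frac{N\times(N-1)}{\sum n\times(n-1)}\)
A different form is sometimes used, which gives values between 0 and 1:
\(D = 1- \sum(\frac{n}{N})^2\)
The two forms give different numerical values, so use the same one throughout a comparison.
- \(D\) = Simpson’s Diversity Index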
- \(n\) = the number of individuals of each species
- \(N\) = the total number of individuals
Worked example
A biologist is comparing species diversity at two sites within Traeth-y-goes sand dunes. Raw data from point quadrats are used. Each number is the total number of hits for that species at that site.
Species | Mobile dune | Fixed dune |
---|---|---|
Marram grass | 10 | 4 |
Sea holly | 3 | 0 |
Sand fescue | 2 | 11 |
Saltwort | 2 | 0 |
Dandelion | 0 | 8 |
Calculate \(n\), \(n\times(n-1)\), \(N\) and \(D\) for each site.
Species | Mobile dune \(n\) | Mobile dune \(n\times(n-1)\) | Fixed dune \(n\) | Fixed dune \(n\times(n-1)\) |
---|---|---|---|---|
Marram grass | 10 | 90 | 4 | 12 |
Sea holly | 3 | 6 | 0 | 0 |
Sand fescue | 2 | 2 | 11 | 110 |
Saltwort | 2 | 2 | 0 | 0 |
Dandelion | 0 | 0 | 8 | 56 |
TOTAL | 17 | 100 | 23 | 178 |
Mobile dune: \(D = \frac{17\times16}{100} = 2.72\)
Fixed dune: \(D = \frac{23\times22}{178} = 2.84\)
The larger the value of D, the higher the species diversity. A low value of D could be due to low overall species richness or to the dominance of one species.
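The worked example can be reproduced in a few lines. This sketch implements both forms of the index; the function names are illustrative. Sand fescue on the mobile dune is entered as 2 hits, the value consistent with the worked totals \(N = 17\) and \(\sum n\times(n-1) = 100\).

```python
# Point-quadrat hits per species from the worked example.
# Assumption: sand fescue on the mobile dune is 2 hits, the value consistent
# with the worked totals N = 17 and sum of n(n-1) = 100.
mobile_dune = {"Marram grass": 10, "Sea holly": 3, "Sand fescue": 2, "Saltwort": 2, "Dandelion": 0}
fixed_dune = {"Marram grass": 4, "Sea holly": 0, "Sand fescue": 11, "Saltwort": 0, "Dandelion": 8}

def simpson_reciprocal(counts):
    """D = N(N-1) / sum(n(n-1)); a larger D means higher diversity."""
    total = sum(counts)
    # Undefined (division by zero) if no species has more than one individual
    return total * (total - 1) / sum(n * (n - 1) for n in counts)

def simpson_one_minus(counts):
    """D = 1 - sum((n/N)^2); values lie between 0 and 1."""
    total = sum(counts)
    return 1 - sum((n / total) ** 2 for n in counts)

for site, hits in (("Mobile dune", mobile_dune), ("Fixed dune", fixed_dune)):
    counts = list(hits.values())
    print(f"{site}: D = {simpson_reciprocal(counts):.2f} "
          f"(1 - sum form: {simpson_one_minus(counts):.2f})")
```

Species with zero hits add nothing to either sum, so they can be left in the data without affecting the result.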