Maths skills

Your findings can be presented with a range of graphical and mapping techniques. You should be able to justify each one.

Bar chart

Nominal data.
Categories on x-axis.
Bar height represents frequency.
Leave gaps between bars as data are discontinuous.

There are many variations on the basic bar chart, such as divided bar chart, percentage bar chart and bi-polar analysis bar chart.

Histogram

Interval or ratio data.
Bar area shows frequency.
Bars are not necessarily of equal width.
No gaps between bars as data are continuous.
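
Because bar area, not bar height, shows frequency, the height of each bar is the frequency divided by the class width (often called frequency density). The sketch below illustrates this; the class boundaries and counts are invented for illustration and do not come from any real survey.

```python
# Hypothetical classes, for illustration only: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 12), (20, 50, 18)]

def frequency_density(lower, upper, frequency):
    """Bar height chosen so that height x width equals the frequency."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]

for (lower, upper, freq), height in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, bar height {height:.2f}")
```

The widest class (20-50) holds the most measurements but does not get the tallest bar, because its frequency is spread over a width of 30.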

Pie chart

Nominal or ordinal data.
Area of circle segment represents proportion.
Several pie charts can be shown together, with the size of each circle scaled to represent a further quantity, such as the total count at each site.

Line graph

Ordinal, interval or ratio data.
Both axes are numerical.
If time is one of the variables, always plot it on the x-axis.
Only join up the points if the data are continuous.

Scattergraph

This needs one independent variable (on x-axis) and one dependent variable (on y-axis).
Both axes need interval or ratio data, and both must be continuous data.
Do not join up each point, but use a line of best fit instead.
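
A line of best fit can be drawn by eye, but it can also be calculated. The sketch below uses least-squares regression, which is one common method and an assumption here, since this guide does not name a method. The paired measurements are hypothetical.

```python
from statistics import mean

# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5]              # independent variable (x-axis)
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # dependent variable (y-axis)

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope: how much y changes, on average, per unit change in x.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))

# The least-squares line always passes through the point (mean x, mean y).
intercept = y_bar - slope * x_bar

print(f"y = {slope:.2f}x + {intercept:.2f}")
```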

Normal distribution

The normal distribution is the symmetrical, bell-shaped curve that a large series of measurements often produces when plotted as a frequency histogram. The mean is in the middle, with an equal number of smaller and larger values on either side of it.
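
The symmetry of the curve can be checked numerically. The sketch below uses Python's built-in `statistics.NormalDist` with a mean of 0 and a standard deviation of 1. These values are chosen only for convenience; any normal curve gives the same proportions.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
curve = NormalDist(mu=0, sigma=1)

# Symmetry: half of all values lie below the mean.
below_mean = curve.cdf(0)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return curve.cdf(k) - curve.cdf(-k)

print(f"below the mean: {below_mean:.0%}")
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```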

Averages

Mean

Add all the measurements together then divide by the number of measurements taken. The mean can only be used if the data approximate to a normal distribution, and have an interval or ratio measurement scale.

Median

Arrange the data in order, and take the middle value as the median. It can be used for data which are not normally distributed. Suitable for variables with an ordinal scale.

Mode

The value which occurs most often. Suitable for variables with a nominal scale.
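
As a quick check of the three averages, the sketch below applies Python's built-in `statistics` module to the ten values used in the range example in this guide.

```python
from statistics import mean, median, mode

values = [12, 17, 21, 23, 24, 24, 25, 26, 29, 31]

avg = mean(values)      # total of the values divided by how many there are
mid = median(values)    # middle value; with an even count, the mean of the middle two
common = mode(values)   # the value that occurs most often

print(avg, mid, common)
```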

Dispersion is the spread of data around the average.

If the data are normally distributed, use the interquartile range or the standard deviation. The usual way of expressing dispersion is as

mean ± interquartile range or mean ± standard deviation

If the data are not normally distributed, use the median and range.

Range

Range is the distance between the highest and lowest value.

Example: 12 17 21 23 24 24 25 26 29 31
Range = 31 − 12 = 19 (the values run from 12 to 31)

Interquartile range is the part of the range that covers the middle 50% of the data.
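
The sketch below works out the range and interquartile range for the ten example values. It assumes the "median of each half" convention for quartiles. Other conventions exist and give slightly different quartiles for small data sets.

```python
from statistics import median

values = sorted([12, 17, 21, 23, 24, 24, 25, 26, 29, 31])

# Range: distance between the highest and lowest value.
data_range = max(values) - min(values)

# Split the ordered data into a lower and an upper half.
half = len(values) // 2
lower_half = values[:half]
upper_half = values[-half:]   # leaves out the middle value when the count is odd

q1 = median(lower_half)       # lower quartile
q3 = median(upper_half)       # upper quartile
iqr = q3 - q1                 # spread of the middle 50% of the data

print(data_range, q1, q3, iqr)
```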

If the variable has an interval or ratio scale and if the data are normally distributed, use the interquartile range or the standard deviation. Otherwise use the median and range to show dispersion.

Standard deviation

This is calculated using the formula below. In a normal distribution, about 68% of values are within one standard deviation, about 95% within two standard deviations, and about 99.7% within three standard deviations of the mean.

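\(s=\sqrt{\frac{\sum x^2-\frac{(\sum x)^2}{n}}{n-1}}\)

\(s\) = the standard deviation
\(\sum x^2\) = each individual piece of data squared and then added up
\((\sum x)^2\) = all the data added up, and the total then squared
\(n\) = the number of items of data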

Most scientific calculators can do these calculations using their statistical functions. The calculator's instructions explain how to do this.
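
The calculation can also be followed step by step in code. The sketch below computes the sample standard deviation for the ten example values. It squares each value and adds them up (Σx²), squares the total of all the values ((Σx)²), divides the second by n, and divides the difference by n − 1. The result is then compared with Python's built-in `statistics.stdev` as a check.

```python
from math import sqrt
from statistics import stdev

values = [12, 17, 21, 23, 24, 24, 25, 26, 29, 31]

n = len(values)
sum_x = sum(values)                     # all the data added up
sum_x_sq = sum(x * x for x in values)   # each value squared, then added up

# Sample standard deviation (n - 1 in the denominator).
s = sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))

print(f"by formula: {s:.2f}")
print(f"statistics.stdev: {stdev(values):.2f}")
```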

Worked example

A biology student measured the length of ribwort plantain leaves in three different areas of grassland. The results are as follows.

Leaf length (m)	1	2	3	4	5	6	7	8	9	10	Mean
AREA 1	0.01	0.09	0.04	0.05	0.05	0.05	0.05	0.06	0.05	0.05	0.05
AREA 2	0.04	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.06	0.05
AREA 3	0.01	0.02	0.05	0.08	0.05	0.07	0.04	0.05	0.06	0.07	0.05

Notice that the mean for all three areas is the same (0.05 m). However, the groups clearly differ in how the individual data points are scattered about the mean. Most of the Area 1 leaves are 0.05 m long (the mean value), but there are two shorter leaves and two longer ones. The Area 2 leaves are very consistent in length: eight of the ten measured are 0.05 m, with just one shorter leaf and one slightly longer leaf. The Area 3 lengths are much more spread out around the mean.

These are the standard deviation values for the data above:

σ n-1 area 1 = 0.019 m

σ n-1 area 2 = 0.005 m

σ n-1 area 3 = 0.022 m
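
The standard deviations can be recomputed directly from the leaf measurements. The sketch below assumes that each area has ten measurements and that the final 0.05 in each row of the table is the mean, not an eleventh leaf. It uses `statistics.stdev`, which divides by n − 1.

```python
from statistics import mean, stdev

# Ten leaf lengths per area, in metres (the final 0.05 in each table row
# is taken to be the mean, which is an assumption about the table layout).
areas = {
    "Area 1": [0.01, 0.09, 0.04, 0.05, 0.05, 0.05, 0.05, 0.06, 0.05, 0.05],
    "Area 2": [0.04, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.06],
    "Area 3": [0.01, 0.02, 0.05, 0.08, 0.05, 0.07, 0.04, 0.05, 0.06, 0.07],
}

# Mean and sample standard deviation for each area.
results = {name: (mean(lengths), stdev(lengths)) for name, lengths in areas.items()}

for name, (m, s) in results.items():
    print(f"{name}: mean = {m:.2f} m, standard deviation = {s:.3f} m")
```

The standard deviation has the same unit as the measurements (metres), so each value should be small compared with the 0.05 m mean.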

Simpson’s Diversity Index

Simpson’s Diversity Index is a measure both of species richness (i.e. the number of different species present) and species evenness (i.e. how evenly distributed each species is).

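\(D = \frac{N\times(N-1)}{\sum n\times(n-1)}\)

An alternative form of the index, which gives values between 0 and 1 rather than values of 1 or more, is

\(D = 1- \sum(\frac{n}{N})^2\)

The two forms give different numerical values, so only compare values that were calculated with the same formula.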

\(D\) = Simpson’s Diversity Index
\(n\) = the number of individuals of each species
\(N\) = the total number of individuals
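
Both forms of the index are short to write as functions. The sketch below is illustrative only: the function names are made up for this example, and the sample counts are hypothetical (two species with five individuals each).

```python
def simpson_reciprocal(counts):
    """D = N(N - 1) / sum of n(n - 1). Needs at least one species with 2+ individuals."""
    total = sum(counts)
    return total * (total - 1) / sum(n * (n - 1) for n in counts)

def simpson_one_minus(counts):
    """D = 1 - sum of (n / N) squared."""
    total = sum(counts)
    return 1 - sum((n / total) ** 2 for n in counts)

# Hypothetical counts, for illustration only: two equally common species.
even_sample = [5, 5]

d_reciprocal = simpson_reciprocal(even_sample)
d_one_minus = simpson_one_minus(even_sample)

print(d_reciprocal, d_one_minus)
```

The same sample gives a different number under each formula, so results are only comparable when one formula is used throughout.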

Worked example

A biologist is comparing species diversity at two sites within Traeth-y-goes sand dunes. Raw data from point quadrats are used. Each number is the total number of hits for that species at that site.

Species	Mobile dune	Fixed dune
Marram grass	10	4
Sea holly	3	0
Sand fescue	2	11
Saltwort	2	0
Dandelion	0	8

Calculate \(n\), \(n\times(n-1)\), \(N\) and \(D\) for each site.

	Mobile dune	Mobile dune	Fixed dune	Fixed dune
Species	\(n\)	\(n\times(n-1)\)	\(n\)	\(n\times(n-1)\)
Marram grass	10	90	4	12
Sea holly	3	6	0	0
Sand fescue	2	2	11	110
Saltwort	2	2	0	0
Dandelion	0	0	8	56
TOTAL	17	100	23	178
	D = 17(16) / 100		D = 23(22) / 178
	D = 2.72		D = 2.84

The larger the value of D, the higher the species diversity. A low value of D could be due to low overall species richness or to the dominance of one species.
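
The worked example can be reproduced in a few lines. The sketch below takes the mobile dune count for sand fescue as 2, the value consistent with the totals N = 17 and Σn(n − 1) = 100 in the calculation table. The function name `simpson_index` is made up for this example.

```python
def simpson_index(counts):
    """D = N(N - 1) / sum of n(n - 1), where n is the count for each species."""
    counts = list(counts)
    total = sum(counts)                              # N
    return total * (total - 1) / sum(n * (n - 1) for n in counts)

# Total hits per species at each site (sand fescue on the mobile dune taken as 2).
mobile_dune = {"Marram grass": 10, "Sea holly": 3, "Sand fescue": 2,
               "Saltwort": 2, "Dandelion": 0}
fixed_dune = {"Marram grass": 4, "Sea holly": 0, "Sand fescue": 11,
              "Saltwort": 0, "Dandelion": 8}

d_mobile = simpson_index(mobile_dune.values())
d_fixed = simpson_index(fixed_dune.values())

print(f"Mobile dune D = {d_mobile:.2f}")
print(f"Fixed dune D = {d_fixed:.2f}")
```

Species with a count of 0 add nothing to either N or Σn(n − 1), so they can be left in the data without changing D.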

Big Bio Data

Use our resource to find out how big data in biology can tell us more about the places where we do fieldwork: Enhancement module_16-18 bio_big data and GIS

The NBN Atlas is a powerful online tool for exploring biodiversity data across the UK. It brings together millions of records from scientists, conservation groups, and the public.

Check it out here: https://nbnatlas.org/