Sophisticated data analysis will help you spot patterns, trends and relationships in your results. Data analysis can be qualitative and/or quantitative, and may include statistical tests. Several techniques for measuring how evenly a variable is distributed over space are outlined below.

Lorenz curves

The Lorenz curve is a graph showing how evenly distributed a variable is over space.
The diagonal black line represents a perfectly even distribution. The blue and red lines show uneven distributions. The further these coloured lines are from the black line, the more uneven the distribution.

You can draw Lorenz curves based on ordinal data (see worked example 1 below) or interval data (see worked example 2 below).

Worked example 1: Lorenz curve for ordinal data

There are 32844 Lower-layer Super Output Areas (LSOAs) in England. Each has been given an Index of Multiple Deprivation (IMD) score, and then ranked from 1 (the most deprived) to 32844 (the least deprived). The LSOAs can be divided into five quintiles. The table shows how many LSOAs fall in each of the five quintiles for Barking and Dagenham and for Hillingdon.

| Quintile (all LSOAs in England) | LSOAs in Barking and Dagenham | LSOAs in Hillingdon |
| --- | --- | --- |
| 1st (most deprived 20%) | 66 | 6 |
| 2nd | 35 | 52 |
| 3rd | 8 | 30 |
| 4th | 0 | 31 |
| 5th (least deprived 20%) | 0 | 33 |
| SUM | 109 | 152 |

From the raw data, it looks like there is a greater number of deprived LSOAs in Barking and Dagenham. In contrast, Hillingdon contains a more even distribution. Calculate the percentages for all three columns.

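| Quintile | England % | Barking and Dagenham raw data | Barking and Dagenham % | Hillingdon raw data | Hillingdon % |
| --- | --- | --- | --- | --- | --- |
| 1st | 20 | 66 | 60.6 | 6 | 3.9 |
| 2nd | 20 | 35 | 32.1 | 52 | 34.2 |
| 3rd | 20 | 8 | 7.3 | 30 | 19.7 |
| 4th | 20 | 0 | 0 | 31 | 20.4 |
| 5th | 20 | 0 | 0 | 33 | 21.7 |
| SUM | 100 | 109 | 100 | 152 | 100 |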

Now calculate the cumulative percentages for all three columns.

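| Quintile | England % | England cumulative % | Barking and Dagenham raw data | Barking and Dagenham % | Barking and Dagenham cumulative % | Hillingdon raw data | Hillingdon % | Hillingdon cumulative % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1st | 20 | 20 | 66 | 60.6 | 60.6 | 6 | 3.9 | 3.9 |
| 2nd | 20 | 40 | 35 | 32.1 | 92.7 | 52 | 34.2 | 38.2 |
| 3rd | 20 | 60 | 8 | 7.3 | 100 | 30 | 19.7 | 57.9 |
| 4th | 20 | 80 | 0 | 0 | 100 | 31 | 20.4 | 78.3 |
| 5th | 20 | 100 | 0 | 0 | 100 | 33 | 21.7 | 100 |
| SUM | 100 | 100 | 109 | 100 | 100 | 152 | 100 | 100 |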
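The percentage and cumulative percentage steps can also be done in a few lines of code. The following is a minimal Python sketch, not part of the original method: the function name `cumulative_percentages` and the `counts` dictionary are made up for illustration, and the raw counts are copied from the first table.

```python
# Raw counts of LSOAs per national deprivation quintile (1st = most deprived),
# copied from the first table of this worked example.
counts = {
    "Barking and Dagenham": [66, 35, 8, 0, 0],
    "Hillingdon": [6, 52, 30, 31, 33],
}


def cumulative_percentages(values):
    """Return (percentages, cumulative percentages) for a list of counts."""
    total = sum(values)
    percentages = [100 * v / total for v in values]
    cumulative = []
    running = 0.0
    for p in percentages:
        running += p  # add each quintile's share to the running total
        cumulative.append(running)
    return percentages, cumulative


for borough, values in counts.items():
    pct, cum = cumulative_percentages(values)
    print(borough)
    print("  %:           ", [round(p, 1) for p in pct])
    print("  cumulative %:", [round(c, 1) for c in cum])
```

The cumulative values are the y-coordinates of the Lorenz curve; the x-coordinates are the cumulative percentages for England (20, 40, 60, 80, 100).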

Plot a scattergraph with axes as follows:

  • x-axis: cumulative percentages for England
  • y-axis: cumulative percentages for a single London Borough

The black line shows a perfectly even distribution. This is the distribution of deprivation ranks across England as a whole, because 20% of England's LSOAs fall in each quintile. The further a line is from this, the more uneven the distribution. As suspected, Barking and Dagenham has a more uneven distribution of IMD ranks than Hillingdon.

Worked example 2: Lorenz curve for interval data

Lorenz curves can also be constructed for interval data, but there are some extra steps.

Bristol City Council have divided up the city into 14 ‘Neighbourhood Areas’. For each Neighbourhood Area, the total population has been counted, along with the number of people with a ‘severe limiting long-term illness’.

This information can be used to help answer the question: do certain areas of Bristol contain a greater concentration of severely ill people than other areas? Or by contrast, are severely ill people evenly distributed throughout Bristol?

| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | Number of severely ill in this Neighbourhood Area |
| --- | --- | --- |
| Ashley | 47782 | 3514 |
| Avonmouth | 20237 | 2074 |
| Bishopston | 36713 | 1383 |
| Clifton | 41192 | 1537 |
| Dundry View | 28771 | 3411 |
| Filwood | 38778 | 3553 |
| Bedminster | 22664 | 1864 |
| Brislington | 22107 | 1686 |
| Fishponds | 27575 | 3503 |
| Henbury | 24253 | 2691 |
| Hengrove | 28786 | 2963 |
| Henleaze | 31412 | 2127 |
| Horfield | 23912 | 2141 |
| St Georges | 24052 | 2123 |
| TOTAL | 418234 | 34570 |

Calculate the percentages for the ‘total population’ and ‘number of severely ill’ columns. This shows the percentage of Bristol’s population and number of severely ill people in each Neighbourhood Area. For example, Fishponds contains 6.59% of Bristol’s population and 10.13% of Bristol’s severely ill people.
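As a check on the Fishponds figures, each percentage is the area's value divided by the Bristol total, multiplied by 100:

\(\mathsf{\%\;population} = \frac{27575}{418234} \times 100 = 6.59\)

\(\mathsf{\%\;severely\;ill} = \frac{3503}{34570} \times 100 = 10.13\)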

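| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | % of total population | Number of severely ill in this Neighbourhood Area | % of severely ill |
| --- | --- | --- | --- | --- |
| Ashley | 47782 | 11.42 | 3514 | 10.16 |
| Avonmouth | 20237 | 4.84 | 2074 | 6.00 |
| Bishopston | 36713 | 8.78 | 1383 | 4.00 |
| Clifton | 41192 | 9.85 | 1537 | 4.45 |
| Dundry View | 28771 | 6.88 | 3411 | 9.87 |
| Filwood | 38778 | 9.27 | 3553 | 10.28 |
| Bedminster | 22664 | 5.42 | 1864 | 5.39 |
| Brislington | 22107 | 5.29 | 1686 | 4.88 |
| Fishponds | 27575 | 6.59 | 3503 | 10.13 |
| Henbury | 24253 | 5.80 | 2691 | 7.78 |
| Hengrove | 28786 | 6.88 | 2963 | 8.57 |
| Henleaze | 31412 | 7.51 | 2127 | 6.15 |
| Horfield | 23912 | 5.72 | 2141 | 6.19 |
| St Georges | 24052 | 5.75 | 2123 | 6.14 |
| TOTAL | 418234 | 100 | 34570 | 100.00 |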

Calculate the ratio between the two percentage columns.

\(\mathsf{ratio = \frac{\%\;severely\;ill}{\%\; population}}\)

For example, in Ashley, the ratio is \(\frac{10.16}{11.42} = 0.89\)

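| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | % of total population | Number of severely ill in this Neighbourhood Area | % of severely ill | Ratio of % severely ill to % population |
| --- | --- | --- | --- | --- | --- |
| Ashley | 47782 | 11.42 | 3514 | 10.16 | 0.89 |
| Avonmouth | 20237 | 4.84 | 2074 | 6.00 | 1.24 |
| Bishopston | 36713 | 8.78 | 1383 | 4.00 | 0.46 |
| Clifton | 41192 | 9.85 | 1537 | 4.45 | 0.45 |
| Dundry View | 28771 | 6.88 | 3411 | 9.87 | 1.43 |
| Filwood | 38778 | 9.27 | 3553 | 10.28 | 1.11 |
| Bedminster | 22664 | 5.42 | 1864 | 5.39 | 1.00 |
| Brislington | 22107 | 5.29 | 1686 | 4.88 | 0.92 |
| Fishponds | 27575 | 6.59 | 3503 | 10.13 | 1.54 |
| Henbury | 24253 | 5.80 | 2691 | 7.78 | 1.34 |
| Hengrove | 28786 | 6.88 | 2963 | 8.57 | 1.25 |
| Henleaze | 31412 | 7.51 | 2127 | 6.15 | 0.82 |
| Horfield | 23912 | 5.72 | 2141 | 6.19 | 1.08 |
| St Georges | 24052 | 5.75 | 2123 | 6.14 | 1.07 |
| TOTAL | 418234 | 100 | 34570 | 100.00 | |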

Rank the ratio column from highest number to lowest number. You can either do this by hand or by using the Sort command in Excel.

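| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | % of total population | Number of severely ill in this Neighbourhood Area | % of severely ill | Ratio of % severely ill to % population | Rank |
| --- | --- | --- | --- | --- | --- | --- |
| Ashley | 47782 | 11.42 | 3514 | 10.16 | 0.89 | 11 |
| Avonmouth | 20237 | 4.84 | 2074 | 6.00 | 1.24 | 5 |
| Bishopston | 36713 | 8.78 | 1383 | 4.00 | 0.46 | 13 |
| Clifton | 41192 | 9.85 | 1537 | 4.45 | 0.45 | 14 |
| Dundry View | 28771 | 6.88 | 3411 | 9.87 | 1.43 | 2 |
| Filwood | 38778 | 9.27 | 3553 | 10.28 | 1.11 | 6 |
| Bedminster | 22664 | 5.42 | 1864 | 5.39 | 1.00 | 9 |
| Brislington | 22107 | 5.29 | 1686 | 4.88 | 0.92 | 10 |
| Fishponds | 27575 | 6.59 | 3503 | 10.13 | 1.54 | 1 |
| Henbury | 24253 | 5.80 | 2691 | 7.78 | 1.34 | 3 |
| Hengrove | 28786 | 6.88 | 2963 | 8.57 | 1.25 | 4 |
| Henleaze | 31412 | 7.51 | 2127 | 6.15 | 0.82 | 12 |
| Horfield | 23912 | 5.72 | 2141 | 6.19 | 1.08 | 7 |
| St Georges | 24052 | 5.75 | 2123 | 6.14 | 1.07 | 8 |
| TOTAL | 418234 | 100 | 34570 | 100.00 | | |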

Rearrange the rows in the table according to the ranks that you have just made.

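| Neighbourhood Area | % total population | % severely ill | Ratio | Rank |
| --- | --- | --- | --- | --- |
| Fishponds | 6.59 | 10.13 | 1.54 | 1 |
| Dundry View | 6.88 | 9.87 | 1.43 | 2 |
| Henbury | 5.80 | 7.78 | 1.34 | 3 |
| Hengrove | 6.88 | 8.57 | 1.25 | 4 |
| Avonmouth | 4.84 | 6.00 | 1.24 | 5 |
| Filwood | 9.27 | 10.28 | 1.11 | 6 |
| Horfield | 5.72 | 6.19 | 1.08 | 7 |
| St Georges | 5.75 | 6.14 | 1.07 | 8 |
| Bedminster | 5.42 | 5.39 | 1.00 | 9 |
| Brislington | 5.29 | 4.88 | 0.92 | 10 |
| Ashley | 11.42 | 10.16 | 0.89 | 11 |
| Henleaze | 7.51 | 6.15 | 0.82 | 12 |
| Bishopston | 8.78 | 4.00 | 0.46 | 13 |
| Clifton | 9.85 | 4.45 | 0.45 | 14 |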

Calculate cumulative figures for the two % columns.

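| Neighbourhood Area | Total population % | Total population cumulative % | Severely ill % | Severely ill cumulative % |
| --- | --- | --- | --- | --- |
| Fishponds | 6.59 | 6.59 | 10.13 | 10.13 |
| Dundry View | 6.88 | 13.47 | 9.87 | 20.00 |
| Henbury | 5.80 | 19.27 | 7.78 | 27.78 |
| Hengrove | 6.88 | 26.15 | 8.57 | 36.36 |
| Avonmouth | 4.84 | 30.99 | 6.00 | 42.35 |
| Filwood | 9.27 | 40.26 | 10.28 | 52.63 |
| Horfield | 5.72 | 45.98 | 6.19 | 58.83 |
| St Georges | 5.75 | 51.73 | 6.14 | 64.97 |
| Bedminster | 5.42 | 57.15 | 5.39 | 70.36 |
| Brislington | 5.29 | 62.44 | 4.88 | 75.24 |
| Ashley | 11.42 | 73.86 | 10.16 | 85.40 |
| Henleaze | 7.51 | 81.37 | 6.15 | 91.55 |
| Bishopston | 8.78 | 90.15 | 4.00 | 95.55 |
| Clifton | 9.85 | 100.00 | 4.45 | 100.00 |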

Finally it is time to draw the Lorenz curve! Plot the cumulative % total population on the x-axis. Plot the cumulative % severely ill on the y-axis.
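The steps of this worked example (percentages, ratio, ranking, cumulative percentages) can be sketched in code as an alternative to a spreadsheet. This is a minimal Python illustration rather than a prescribed method: the function name `lorenz_points` and the `areas` list are made up for this example, and the figures are copied from the first table.

```python
# (Neighbourhood Area, total population, number severely ill),
# copied from the first table of this worked example.
areas = [
    ("Ashley", 47782, 3514),
    ("Avonmouth", 20237, 2074),
    ("Bishopston", 36713, 1383),
    ("Clifton", 41192, 1537),
    ("Dundry View", 28771, 3411),
    ("Filwood", 38778, 3553),
    ("Bedminster", 22664, 1864),
    ("Brislington", 22107, 1686),
    ("Fishponds", 27575, 3503),
    ("Henbury", 24253, 2691),
    ("Hengrove", 28786, 2963),
    ("Henleaze", 31412, 2127),
    ("Horfield", 23912, 2141),
    ("St Georges", 24052, 2123),
]


def lorenz_points(rows):
    """Return (name, cumulative % population, cumulative % severely ill),
    ordered from the highest ratio to the lowest."""
    total_pop = sum(pop for _, pop, _ in rows)
    total_ill = sum(ill for _, _, ill in rows)

    table = []
    for name, pop, ill in rows:
        pct_pop = 100 * pop / total_pop
        pct_ill = 100 * ill / total_ill
        table.append((name, pct_pop, pct_ill, pct_ill / pct_pop))

    # Rank by ratio, highest first
    table.sort(key=lambda row: row[3], reverse=True)

    points = []
    cum_pop = cum_ill = 0.0
    for name, pct_pop, pct_ill, _ in table:
        cum_pop += pct_pop
        cum_ill += pct_ill
        points.append((name, cum_pop, cum_ill))
    return points


points = lorenz_points(areas)
for name, x, y in points:
    print(f"{name:<12} x = {x:6.2f}   y = {y:6.2f}")
```

Each printed pair gives one point on the curve: x is the cumulative % total population and y is the cumulative % severely ill.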

Gini coefficient

Lorenz curves are a useful visual technique for presenting your data. But it is sometimes difficult to see how one uneven distribution compares to another. The Gini coefficient is a summary statistic that will provide a precise answer.

\(\mathsf{Gini\;coefficient = \frac{area\;of\;graph\; between\;the\;diagonal\;and\;the\;curve}{area\;of\;graph \;above\;the\;diagonal}}\)

The result for the Gini coefficient ranges from 0 (completely even distribution) to 1 (completely uneven distribution).

Worked example of Gini coefficient

There are 32844 LSOAs in England. Each has been given an IMD score, and then ranked from 1 (the most deprived) to 32844 (the least deprived). The LSOAs can be divided into five quintiles. The table shows how many LSOAs fall in each of the five quintiles for Barking and Dagenham and for Hillingdon.

| Quintile (all LSOAs in England) | LSOAs in Barking and Dagenham | LSOAs in Hillingdon |
| --- | --- | --- |
| 1st (most deprived 20%) | 66 | 6 |
| 2nd | 35 | 52 |
| 3rd | 8 | 30 |
| 4th | 0 | 31 |
| 5th (least deprived 20%) | 0 | 33 |
| SUM | 109 | 152 |

Lorenz curves were plotted for the data.

To calculate the area of the graph above the diagonal, and the area of graph between the diagonal and the curve, you can count the number of squares on graph paper. Include fractions for part-squares.

There are 625 squares shown.

  • 312.5 squares are above the black diagonal line
  • There are 61 squares between the diagonal and the curve for Hillingdon
  • There are 109 squares between the diagonal and the curve for Barking and Dagenham

For Barking and Dagenham: \(\mathsf{Gini\;coefficient} = \frac{109}{312.5}=0.35\)

For Hillingdon: \(\mathsf{Gini\;coefficient} = \frac{61}{312.5}=0.20\)
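Counting squares can also be approximated numerically. The sketch below is not part of the method described above. It assumes that the Lorenz curve is drawn as straight lines between the plotted points, that both axes are cumulative percentages from 0 to 100, and that the curve stays on one side of the diagonal. The function name `gini_from_lorenz` and the example points are made up for illustration. A numerical estimate like this will not necessarily match a hand count of squares on a sketched graph.

```python
def gini_from_lorenz(x, y):
    """Estimate the Gini coefficient from Lorenz curve points.

    x and y are cumulative percentages (0 to 100) in plotting order.
    The area under the curve is found with the trapezium rule.
    """
    # Make sure the curve starts at the origin (a repeated 0 is harmless)
    xs = [0.0] + list(x)
    ys = [0.0] + list(y)

    area_under_curve = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        area_under_curve += width * (ys[i] + ys[i - 1]) / 2

    # The diagonal splits the 100 x 100 graph into two equal halves
    half_graph = 100 * 100 / 2

    # Area between the diagonal and the curve, as a share of one half
    return abs(area_under_curve - half_graph) / half_graph


# Hypothetical curve through (50, 25) and (100, 100)
print(gini_from_lorenz([50, 100], [25, 100]))

# A perfectly even distribution lies on the diagonal
print(gini_from_lorenz([20, 40, 60, 80, 100], [20, 40, 60, 80, 100]))
```

Taking the absolute value means the function works whether the curve lies above or below the diagonal.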

Location Quotient

The Location Quotient is another mathematical technique for showing how unevenly distributed a variable is over space.

\(\mathsf{Location\;Quotient = \frac{\%\;in\;one \;area}{\%\;the\;whole\;population}}\)

Location Quotient (LQ) varies from 0 to infinity.

If LQ is less than 1, the variable is under-represented in a particular area. If LQ is greater than 1, the variable is over-represented in a particular area.

Worked example

Bristol City Council have divided up the city into 14 ‘Neighbourhood Areas’. For each Neighbourhood Area, the number of people in different age bands has been counted. Here is the total number of people aged 16-24 for each area.

| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | Total number of people aged 16-24 |
| --- | --- | --- |
| Ashley | 47782 | 7519 |
| Avonmouth | 20237 | 2364 |
| Bishopston | 36713 | 8351 |
| Clifton | 41192 | 14003 |
| Dundry View | 28771 | 3621 |
| Filwood | 38778 | 4288 |
| Bedminster | 22664 | 2762 |
| Brislington | 22107 | 2294 |
| Fishponds | 27575 | 5535 |
| Henbury | 24253 | 2631 |
| Hengrove | 28786 | 3137 |
| Henleaze | 31412 | 4160 |
| Horfield | 23912 | 3773 |
| St Georges | 24052 | 2566 |
| TOTAL | 418234 | 67004 |

Calculate the percentages for the ‘total population’ and ‘number aged 16-24’ columns. This shows the percentage of Bristol’s population and number of people aged 16-24 in each Neighbourhood Area.

For example, Avonmouth contains 3.53% of all the 16-24 year olds in Bristol. Be careful not to get confused here. This does not mean that 3.53% of Avonmouth’s population is aged 16-24.

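| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | % of total population | Total number of people aged 16-24 in this Neighbourhood Area | % of people aged 16-24 |
| --- | --- | --- | --- | --- |
| Ashley | 47782 | 11.42 | 7519 | 11.22 |
| Avonmouth | 20237 | 4.84 | 2364 | 3.53 |
| Bishopston | 36713 | 8.78 | 8351 | 12.46 |
| Clifton | 41192 | 9.85 | 14003 | 20.90 |
| Dundry View | 28771 | 6.88 | 3621 | 5.40 |
| Filwood | 38778 | 9.27 | 4288 | 6.40 |
| Bedminster | 22664 | 5.42 | 2762 | 4.12 |
| Brislington | 22107 | 5.29 | 2294 | 3.42 |
| Fishponds | 27575 | 6.59 | 5535 | 8.26 |
| Henbury | 24253 | 5.80 | 2631 | 3.93 |
| Hengrove | 28786 | 6.88 | 3137 | 4.68 |
| Henleaze | 31412 | 7.51 | 4160 | 6.21 |
| Horfield | 23912 | 5.72 | 3773 | 5.63 |
| St Georges | 24052 | 5.75 | 2566 | 3.83 |
| TOTAL | 418234 | 100 | 67004 | 100 |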

The Location Quotient is the ratio between the two percentage columns.
\(\mathsf{Location\;Quotient = \frac{\%\;aged\;16-24}{\%\;whole\;population}}\)

For example, in Avonmouth, the LQ is \(\frac{3.53}{4.84} = 0.73\)

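| Name of Neighbourhood Area in Bristol | Total population of this Neighbourhood Area | % of total population | Total number of people aged 16-24 in this Neighbourhood Area | % of people aged 16-24 | Location Quotient |
| --- | --- | --- | --- | --- | --- |
| Ashley | 47782 | 11.42 | 7519 | 11.22 | 0.98 |
| Avonmouth | 20237 | 4.84 | 2364 | 3.53 | 0.73 |
| Bishopston | 36713 | 8.78 | 8351 | 12.46 | 1.42 |
| Clifton | 41192 | 9.85 | 14003 | 20.90 | 2.12 |
| Dundry View | 28771 | 6.88 | 3621 | 5.40 | 0.79 |
| Filwood | 38778 | 9.27 | 4288 | 6.40 | 0.69 |
| Bedminster | 22664 | 5.42 | 2762 | 4.12 | 0.76 |
| Brislington | 22107 | 5.29 | 2294 | 3.42 | 0.65 |
| Fishponds | 27575 | 6.59 | 5535 | 8.26 | 1.25 |
| Henbury | 24253 | 5.80 | 2631 | 3.93 | 0.68 |
| Hengrove | 28786 | 6.88 | 3137 | 4.68 | 0.68 |
| Henleaze | 31412 | 7.51 | 4160 | 6.21 | 0.83 |
| Horfield | 23912 | 5.72 | 3773 | 5.63 | 0.98 |
| St Georges | 24052 | 5.75 | 2566 | 3.83 | 0.67 |
| TOTAL | 418234 | 100 | 67004 | 100 | 1 |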

The calculated figures show that people aged 16-24 are under-represented in a number of areas, such as Avonmouth, Brislington and St Georges. But people aged 16-24 are over-represented in other areas, such as Clifton, Bishopston and Fishponds. The LQ results show that the greatest concentration of young adults is in Clifton: can you find any other data to help explain this?
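The Location Quotient calculation can be sketched in code as follows. This is an illustrative Python example only: the function name `location_quotients` and the `areas` list are made up here, and the figures are copied from the first table of this worked example.

```python
# (Neighbourhood Area, total population, number of people aged 16-24),
# copied from the first table of this worked example.
areas = [
    ("Ashley", 47782, 7519),
    ("Avonmouth", 20237, 2364),
    ("Bishopston", 36713, 8351),
    ("Clifton", 41192, 14003),
    ("Dundry View", 28771, 3621),
    ("Filwood", 38778, 4288),
    ("Bedminster", 22664, 2762),
    ("Brislington", 22107, 2294),
    ("Fishponds", 27575, 5535),
    ("Henbury", 24253, 2631),
    ("Hengrove", 28786, 3137),
    ("Henleaze", 31412, 4160),
    ("Horfield", 23912, 3773),
    ("St Georges", 24052, 2566),
]


def location_quotients(rows):
    """Return {area name: LQ}, where LQ is the area's share of the group
    divided by the area's share of the whole population."""
    total_pop = sum(pop for _, pop, _ in rows)
    total_group = sum(group for _, _, group in rows)
    return {
        name: (group / total_group) / (pop / total_pop)
        for name, pop, group in rows
    }


lq = location_quotients(areas)
# List areas from most over-represented to most under-represented
for name, value in sorted(lq.items(), key=lambda item: item[1], reverse=True):
    label = "over-represented" if value > 1 else "under-represented"
    print(f"{name:<12} {value:.2f}  {label}")
```

Because the calculation uses unrounded shares, a value may occasionally differ in the second decimal place from one worked out by hand from rounded percentages.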

Index of Dissimilarity

The Index of Dissimilarity is used to compare the distribution of two variables, such as two socio-economic groups or two ethnic groups in a particular area.

\(\mathsf{Index\;of\;dissimilarity} = \frac{1}{2} \sum \left\vert \frac{x_i}{X} - \frac{y_i}{Y} \right\vert\)
  • \(x_i\) is the population of group \(x\) in small area \(i\)
  • \(X\) is the total population of group \(x\) in the whole area
  • \(y_i\) is the population of group \(y\) in small area \(i\)
  • \(Y\) is the total population of group \(y\) in the whole area

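It helps answer the question: how differently are group \(x\) and group \(y\) distributed across the small areas of a particular place? When \(\frac{x_i}{X}\) and \(\frac{y_i}{Y}\) are expressed as percentages, the index ranges from 0 (complete integration) to 100 (complete segregation).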

Worked example 1 of Index of Dissimilarity

Census 2011 data for wards in Sandwell (West Midlands) can be obtained from Nomis. An extract is shown below.

| Name of ward in Sandwell | Number of persons identifying their ethnicity as White in the ward (this is \(x_i\)) | Number of persons identifying their ethnicity as Asian in the ward (this is \(y_i\)) |
| --- | --- | --- |
| Abbey | 9078 | 1271 |
| Blackheath | 10808 | 870 |
| Bristnall | 9064 | 1814 |
| Charlemont with Grove Vale | 8903 | 1918 |
| Cradley Heath and Old Hill | 11913 | 1009 |
| Friar Park | 11335 | 619 |
| Great Barr with Yew Tree | 8300 | 3105 |
| Great Bridge | 10393 | 1626 |
| Greets Green and Lyng | 6925 | 3244 |
| Hateley Heath | 10295 | 2182 |
| Langley | 10135 | 1448 |
| Newton | 7879 | 2178 |
| Old Warley | 9388 | 1399 |
| Oldbury | 7648 | 4011 |
| Princes End | 11847 | 369 |
| Rowley | 10648 | 609 |
| St Pauls | 4252 | 7822 |
| Smethwick | 7128 | 4522 |
| Soho and Victoria | 3854 | 6881 |
| Tipton Green | 9262 | 2625 |
| Tividale | 10616 | 913 |
| Wednesbury North | 10331 | 1734 |
| Wednesbury South | 9132 | 2232 |
| West Bromwich Central | 6337 | 4857 |
| TOTAL | 215471 (this is \(X\)) | 59258 (this is \(Y\)) |

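Calculate, for each ward, the percentage of Sandwell's White population (\(\frac{x_i}{X}\)) and of Sandwell's Asian population (\(\frac{y_i}{Y}\)) that lives there. For example, Princes End contains 5.50% of the people identifying as White in Sandwell. Be careful not to get confused here. This does not mean that 5.50% of the population of Princes End is White.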

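| Ward | White raw data | White % of Sandwell’s population (this is \(\frac{x_i}{X}\)) | Asian raw data | Asian % of Sandwell’s population (this is \(\frac{y_i}{Y}\)) |
| --- | --- | --- | --- | --- |
| Abbey | 9078 | 4.21 | 1271 | 2.14 |
| Blackheath | 10808 | 5.02 | 870 | 1.47 |
| Bristnall | 9064 | 4.21 | 1814 | 3.06 |
| Charlemont | 8903 | 4.13 | 1918 | 3.24 |
| Cradley Heath | 11913 | 5.53 | 1009 | 1.70 |
| Friar Park | 11335 | 5.26 | 619 | 1.04 |
| Great Barr | 8300 | 3.85 | 3105 | 5.24 |
| Great Bridge | 10393 | 4.82 | 1626 | 2.74 |
| Greets Green | 6925 | 3.21 | 3244 | 5.47 |
| Hateley Heath | 10295 | 4.78 | 2182 | 3.68 |
| Langley | 10135 | 4.70 | 1448 | 2.44 |
| Newton | 7879 | 3.66 | 2178 | 3.68 |
| Old Warley | 9388 | 4.36 | 1399 | 2.36 |
| Oldbury | 7648 | 3.55 | 4011 | 6.77 |
| Princes End | 11847 | 5.50 | 369 | 0.62 |
| Rowley | 10648 | 4.94 | 609 | 1.03 |
| St Pauls | 4252 | 1.97 | 7822 | 13.20 |
| Smethwick | 7128 | 3.31 | 4522 | 7.63 |
| Soho and Victoria | 3854 | 1.79 | 6881 | 11.61 |
| Tipton Green | 9262 | 4.30 | 2625 | 4.43 |
| Tividale | 10616 | 4.93 | 913 | 1.54 |
| Wednesbury N | 10331 | 4.79 | 1734 | 2.93 |
| Wednesbury S | 9132 | 4.24 | 2232 | 3.77 |
| West Bromwich C | 6337 | 2.94 | 4857 | 8.20 |
| SUM | 215471 | 100.00 | 59258 | 100.00 |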

Calculate \(\left\vert\frac{x_i}{X}-\frac{y_i}{Y}\right\vert\) for each ward.

This is the difference between the two columns of percentages. Ignore any minus signs, so that every difference is a positive number.

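| Ward | White raw data | White % | Asian raw data | Asian % | Differences (this is \(\left\vert\frac{x_i}{X}-\frac{y_i}{Y}\right\vert\)) |
| --- | --- | --- | --- | --- | --- |
| Abbey | 9078 | 4.21 | 1271 | 2.14 | 2.07 |
| Blackheath | 10808 | 5.02 | 870 | 1.47 | 3.55 |
| Bristnall | 9064 | 4.21 | 1814 | 3.06 | 1.15 |
| Charlemont | 8903 | 4.13 | 1918 | 3.24 | 0.90 |
| Cradley Heath | 11913 | 5.53 | 1009 | 1.70 | 3.83 |
| Friar Park | 11335 | 5.26 | 619 | 1.04 | 4.22 |
| Great Barr | 8300 | 3.85 | 3105 | 5.24 | 1.39 |
| Great Bridge | 10393 | 4.82 | 1626 | 2.74 | 2.08 |
| Greets Green | 6925 | 3.21 | 3244 | 5.47 | 2.26 |
| Hateley Heath | 10295 | 4.78 | 2182 | 3.68 | 1.10 |
| Langley | 10135 | 4.70 | 1448 | 2.44 | 2.26 |
| Newton | 7879 | 3.66 | 2178 | 3.68 | 0.02 |
| Old Warley | 9388 | 4.36 | 1399 | 2.36 | 2.00 |
| Oldbury | 7648 | 3.55 | 4011 | 6.77 | 3.22 |
| Princes End | 11847 | 5.50 | 369 | 0.62 | 4.88 |
| Rowley | 10648 | 4.94 | 609 | 1.03 | 3.91 |
| St Pauls | 4252 | 1.97 | 7822 | 13.20 | 11.23 |
| Smethwick | 7128 | 3.31 | 4522 | 7.63 | 4.32 |
| Soho and Victoria | 3854 | 1.79 | 6881 | 11.61 | 9.82 |
| Tipton Green | 9262 | 4.30 | 2625 | 4.43 | 0.13 |
| Tividale | 10616 | 4.93 | 913 | 1.54 | 3.39 |
| Wednesbury N | 10331 | 4.79 | 1734 | 2.93 | 1.87 |
| Wednesbury S | 9132 | 4.24 | 2232 | 3.77 | 0.47 |
| West Bromwich C | 6337 | 2.94 | 4857 | 8.20 | 5.26 |
| SUM | 215471 | 100.00 | 59258 | 100.00 | 75.29 |

Add up the differences column.

In this example, \(\sum \left\vert\frac{x_i}{X}-\frac{y_i}{Y}\right\vert = 75.29\) when calculated from unrounded percentages (adding the rounded figures shown in the table gives 75.33).

Calculate \(\mathsf{Index\;of\;dissimilarity} = \frac{1}{2} \sum \left\vert \frac{x_i}{X} - \frac{y_i}{Y}\right\vert\)

In this example \(\mathsf{Index\;of\;dissimilarity} = \frac{1}{2} \times 75.29 = 37.65\)

This means that 37.65% of the Asian population of Sandwell would need to change residence to a different ward in order to have the same relative distribution as the White population of Sandwell.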
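The whole calculation can be sketched in code as a cross-check. This is a minimal Python illustration, not the method used to produce the tables: the function name `index_of_dissimilarity` and the `wards` list are made up here, and the counts are copied from the Nomis extract above. Because it works from unrounded proportions, its result may differ slightly from a hand calculation based on rounded table values.

```python
# (ward, White residents x_i, Asian residents y_i), copied from the extract
wards = [
    ("Abbey", 9078, 1271),
    ("Blackheath", 10808, 870),
    ("Bristnall", 9064, 1814),
    ("Charlemont with Grove Vale", 8903, 1918),
    ("Cradley Heath and Old Hill", 11913, 1009),
    ("Friar Park", 11335, 619),
    ("Great Barr with Yew Tree", 8300, 3105),
    ("Great Bridge", 10393, 1626),
    ("Greets Green and Lyng", 6925, 3244),
    ("Hateley Heath", 10295, 2182),
    ("Langley", 10135, 1448),
    ("Newton", 7879, 2178),
    ("Old Warley", 9388, 1399),
    ("Oldbury", 7648, 4011),
    ("Princes End", 11847, 369),
    ("Rowley", 10648, 609),
    ("St Pauls", 4252, 7822),
    ("Smethwick", 7128, 4522),
    ("Soho and Victoria", 3854, 6881),
    ("Tipton Green", 9262, 2625),
    ("Tividale", 10616, 913),
    ("Wednesbury North", 10331, 1734),
    ("Wednesbury South", 9132, 2232),
    ("West Bromwich Central", 6337, 4857),
]


def index_of_dissimilarity(rows):
    """Return the index on a 0 to 100 scale for rows of (name, x_i, y_i)."""
    total_x = sum(x for _, x, _ in rows)  # X
    total_y = sum(y for _, _, y in rows)  # Y
    # Half the sum of absolute differences between the two groups' shares,
    # multiplied by 100 to express the shares as percentages
    return 50 * sum(abs(x / total_x - y / total_y) for _, x, y in rows)


print(round(index_of_dissimilarity(wards), 2))
```

Two groups with identical relative distributions give 0, and two groups that never share a ward give 100.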

Worked example 2 of Index of Dissimilarity

Census 2011 data for wards in Sandwell (West Midlands) can be obtained from Nomis. The Index of Dissimilarity has been calculated from ward-level data for each pair of the 7 largest ethnic groups of residents (excluding people of mixed ethnicity). A summary of the results is shown in the table.

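| | White British | White Other | Indian | Pakistani | Bangladeshi | Black Caribbean | Black African |
| --- | --- | --- | --- | --- | --- | --- | --- |
| White British | | 31.76 | 37.39 | 54.01 | 57.81 | 33.86 | 33.48 |
| White Other | | | 18.68 | 39.80 | 45.22 | 15.96 | 21.07 |
| Indian | | | | 34.03 | 42.73 | 15.58 | 25.60 |
| Pakistani | | | | | 43.42 | 33.71 | 26.17 |
| Bangladeshi | | | | | | 48.79 | 54.75 |
| Black Caribbean | | | | | | | 18.16 |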
