Sophisticated data analysis will help you spot patterns, trends and relationships in your results. Data analysis can be qualitative and/or quantitative, and may include statistical tests. An example of a statistical test is outlined below.

## Lorenz curves

The Lorenz curve is a graph showing how evenly distributed a variable is over space.
The diagonal black line represents a perfectly even distribution. The blue and red lines show uneven distributions. The further these coloured lines are from the black line, the more uneven the distribution.

You can draw Lorenz curves based on ordinal data (see worked example 1 below) or interval data (see worked example 2 below).

### Worked example 1: Lorenz curve for ordinal data

There are 32844 Lower-layer Super Output Areas (LSOAs) in England. Each has been given an Index of Multiple Deprivation (IMD) score, and then ranked from 1 (the most deprived) to 32844 (the least deprived). The ranked LSOAs can be divided into five equal groups (quintiles). The table shows how many LSOAs are in each of the five quintiles for Barking and Dagenham and for Hillingdon.

From the raw data, it looks like there is a greater number of deprived LSOAs in Barking and Dagenham. In contrast, Hillingdon has a more even distribution.

Calculate the percentages for all three columns (England, Barking and Dagenham, and Hillingdon).

Now calculate the cumulative percentages for all three columns.

Plot a scattergraph with axes as follows:

• x-axis: cumulative percentages for England
• y-axis: cumulative percentages for a single London Borough

The black line shows a perfectly even distribution. This shows the distribution of deprivation ranks in England. The further a line is from this, the more uneven the distribution. As suspected, Barking and Dagenham has a more uneven distribution of IMD ranks than Hillingdon.
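The percentage and cumulative percentage steps can be sketched in a few lines of Python. This is a minimal illustration only: the quintile counts below are hypothetical placeholders, not the real figures for any borough, and the function name `cumulative_percentages` is made up for this example.

```python
from itertools import accumulate


def cumulative_percentages(counts):
    """Turn raw counts per quintile into cumulative percentages."""
    total = sum(counts)
    percentages = [100 * c / total for c in counts]
    return list(accumulate(percentages))


# Hypothetical LSOA counts per quintile, most deprived first (NOT real data).
england = [20, 20, 20, 20, 20]  # quintiles split England into equal fifths
borough = [50, 30, 15, 5, 0]    # an illustrative, unevenly distributed borough

x = cumulative_percentages(england)  # x-axis values
y = cumulative_percentages(borough)  # y-axis values

# Start the curve at the origin, then add one point per quintile.
points = [(0.0, 0.0)] + list(zip(x, y))
for px, py in points:
    print(f"{px:6.1f} {py:6.1f}")
```

Each pair in `points` is one point on the scattergraph. Joining them in order gives the Lorenz curve for the borough.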

### Worked example 2: Lorenz curve for interval data

Lorenz curves can also be constructed for interval data, but there are some extra steps.

Bristol City Council have divided up the city into 14 ‘Neighbourhood Areas’. For each Neighbourhood Area, the total population has been counted, along with the number of people with a ‘severe limiting long-term illness’.

This information can be used to help answer the question: do certain areas of Bristol contain a greater concentration of severely ill people than other areas? Or by contrast, are severely ill people evenly distributed throughout Bristol?

Calculate the percentages for the ‘total population’ and ‘number of severely ill’ columns. This shows the percentage of Bristol’s population and number of severely ill people in each Neighbourhood Area. For example, Fishponds contains 6.59% of Bristol’s population and 10.13% of Bristol’s severely ill people.

Calculate the ratio between the two percentage columns.

$$\mathsf{ratio = \frac{\%\;severely\;ill}{\%\; population}}$$

For example, in Ashley, the ratio is $$\frac{10.16}{11.42} = 0.89$$

Rank the ratio column from highest number to lowest number. You can either do this by hand or by using the Sort command in Excel.

Rearrange the rows in the table according to the ranks that you have just made.

Calculate cumulative figures for the two % columns.

Finally it is time to draw the Lorenz curve! Plot the cumulative % total population on the x-axis. Plot the cumulative % severely ill on the y-axis.
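The steps above (percentages, ratio, ranking, cumulative figures) can also be sketched in Python. This is an illustration under stated assumptions: the three areas and their counts are invented for the example and are not the Bristol data, and `lorenz_points` is a hypothetical helper name.

```python
def lorenz_points(areas):
    """areas: list of (name, population, cases) tuples.

    Returns Lorenz curve points as (cumulative % population,
    cumulative % cases), with areas ordered from highest ratio to lowest.
    """
    total_pop = sum(pop for _, pop, _ in areas)
    total_cases = sum(cases for _, _, cases in areas)

    rows = []
    for _, pop, cases in areas:
        pct_pop = 100 * pop / total_pop
        pct_cases = 100 * cases / total_cases
        ratio = pct_cases / pct_pop  # % severely ill divided by % population
        rows.append((ratio, pct_pop, pct_cases))

    # Rank from highest ratio to lowest, as in the worked example.
    rows.sort(key=lambda row: row[0], reverse=True)

    points = [(0.0, 0.0)]
    cum_pop = cum_cases = 0.0
    for _, pct_pop, pct_cases in rows:
        cum_pop += pct_pop
        cum_cases += pct_cases
        points.append((cum_pop, cum_cases))
    return points


# Hypothetical areas (NOT the Bristol figures): name, population, cases.
example_areas = [("A", 1000, 50), ("B", 3000, 90), ("C", 6000, 60)]
curve = lorenz_points(example_areas)
print(curve)
```

The first value in each pair goes on the x-axis and the second on the y-axis. The sketch assumes every area has a non-zero population.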

## Gini coefficient

Lorenz curves are a useful visual technique for presenting your data. But it is sometimes difficult to see how one uneven distribution compares to another. The Gini coefficient is a summary statistic that will provide a precise answer.

$$\mathsf{Gini\;coefficient = \frac{area\;of\;graph\; between\;the\;diagonal\;and\;the\;curve}{area\;of\;graph \;above\;the\;diagonal}}$$

The result for the Gini coefficient ranges from 0 (completely even distribution) to 1 (completely uneven distribution).

### Worked example of Gini coefficient

There are 32844 Lower-layer Super Output Areas (LSOAs) in England. Each has been given an Index of Multiple Deprivation (IMD) score, and then ranked from 1 (the most deprived) to 32844 (the least deprived). The ranked LSOAs can be divided into five equal groups (quintiles). The table shows how many LSOAs are in each of the five quintiles for Barking and Dagenham and for Hillingdon.

Lorenz curves were plotted for the data.

To calculate the area of the graph above the diagonal, and the area of graph between the diagonal and the curve, you can count the number of squares on graph paper. Include fractions for part-squares.

• There are 625 squares shown
• 312.5 squares are above the black diagonal line
• There are 61 squares between the diagonal and the curve for Hillingdon
• There are 109 squares between the diagonal and the curve for Barking and Dagenham

For Barking and Dagenham:

$$\mathsf{Gini\;coefficient} = \frac{109}{312.5}=0.35$$

For Hillingdon:

$$\mathsf{Gini\;coefficient} = \frac{61}{312.5}=0.20$$
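If you have the plotted points rather than graph paper, the same ratio of areas can be estimated numerically. The sketch below is one possible approach, not the method used above: it assumes the Lorenz curve is made of straight lines between plotted points and applies the trapezium rule, so its result may differ slightly from counting squares under a smooth hand-drawn curve. The points and the name `gini_from_points` are hypothetical.

```python
def gini_from_points(points):
    """Estimate the Gini coefficient from Lorenz curve points.

    points: (x, y) cumulative percentages running from (0, 0) to (100, 100).
    Assumes straight lines between consecutive points (trapezium rule).
    """
    area_under_curve = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area_under_curve += (x1 - x0) * (y0 + y1) / 2  # one trapezium

    triangle = 100 * 100 / 2  # area on either side of the diagonal
    # abs() lets the curve lie on either side of the diagonal.
    return abs(area_under_curve - triangle) / triangle


# Hypothetical Lorenz curve points (NOT taken from the worked example).
example_points = [(0, 0), (10, 25), (40, 70), (100, 100)]
gini = gini_from_points(example_points)
print(round(gini, 2))
```

A perfectly even distribution, with only the points (0, 0) and (100, 100), gives a result of 0.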

## Location Quotient

The Location Quotient is another mathematical technique for showing how unevenly distributed a variable is over space.

$$\mathsf{Location\;Quotient = \frac{\%\;in\;one \;area}{\%\;in\;the\;whole\;population}}$$

Location Quotient (LQ) varies from 0 to infinity.

If LQ is less than 1, the variable is under-represented in a particular area. If LQ is greater than 1, the variable is over-represented in a particular area.

### Worked example

Bristol City Council have divided up the city into 14 ‘Neighbourhood Areas’. For each Neighbourhood Area, the number of people in different age bands has been counted. Here are the total numbers of people aged 16-24 and 65-74 for each area.

Calculate the percentages for the ‘total population’ and ‘number aged 16-24’ columns. This shows the percentage of Bristol’s population and number of people aged 16-24 in each Neighbourhood Area.

For example, Avonmouth contains 3.53% of all the 16-24 year olds in Bristol. Be careful not to get confused here. This does not mean that 3.53% of Avonmouth’s population is aged 16-24.

The Location Quotient is the ratio between the two percentage columns.

$$\mathsf{Location\;Quotient = \frac{\%\;aged\;16-24}{\%\;whole\;population}}$$

For example, in Avonmouth, the LQ is $$\frac{3.53}{4.84} = 0.73$$

The calculated figures show that people aged 16-24 are under-represented in a number of areas, such as Avonmouth, Brislington and St Georges. But people aged 16-24 are over-represented in other areas, such as Clifton, Bishopston and Fishponds. The LQ results show that the greatest concentration of young adults is in Clifton: can you find any other data to help explain this?
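The Location Quotient calculation can be sketched in Python as follows. The area names and counts are hypothetical and chosen only to make the arithmetic easy to follow, and `location_quotients` is a made-up function name.

```python
def location_quotients(group_counts, population_counts):
    """Return the Location Quotient for each area.

    group_counts: people in the group of interest, per area.
    population_counts: total population, per area (same area names).
    """
    total_group = sum(group_counts.values())
    total_pop = sum(population_counts.values())

    result = {}
    for area, group in group_counts.items():
        pct_group = 100 * group / total_group               # % of the group living here
        pct_pop = 100 * population_counts[area] / total_pop  # % of everyone living here
        result[area] = pct_group / pct_pop
    return result


# Hypothetical counts (NOT the Bristol figures).
aged_16_24 = {"North": 200, "South": 300, "East": 500}
everyone = {"North": 4000, "South": 3000, "East": 3000}

lq = location_quotients(aged_16_24, everyone)
for area, value in lq.items():
    status = "over-represented" if value > 1 else "under-represented" if value < 1 else "in proportion"
    print(f"{area}: LQ = {value:.2f} ({status})")
```

In this invented example the group is under-represented in North, in proportion in South and over-represented in East.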

## Index of Dissimilarity

The Index of Dissimilarity is used to compare the distribution of two variables, such as two socio-economic groups or two ethnic groups in a particular area.

$$\mathsf{Index\;of\;dissimilarity} = \frac{1}{2} \sum \left\vert \frac{x_i}{X} - \frac{y_i}{Y} \right\vert$$

• $$x_i$$ is the population of group $$x$$ in small area $$i$$
• $$X$$ is the total population of group $$x$$ in the whole area
• $$y_i$$ is the population of group $$y$$ in small area $$i$$
• $$Y$$ is the total population of group $$y$$ in the whole area

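It helps answer the question: is group X more evenly distributed in a particular place than group Y? When $$\frac{x_i}{X}$$ and $$\frac{y_i}{Y}$$ are expressed as percentages, as in the worked examples below, the index ranges from 0 (complete integration) to 100 (complete segregation).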

### Worked example 1 of Index of Dissimilarity

Census 2011 data for wards in Sandwell (West Midlands) can be obtained from Nomis. An extract is shown below.

This means that Princes End contains 5.50% of people identifying as White in Sandwell. Be careful not to get confused here. This does not mean that 5.50% of the population of Princes End is White.

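Calculate $$\left\vert \frac{x_i}{X} - \frac{y_i}{Y} \right\vert$$ for each ward.

This is the difference between the two columns of percentages. Ignore any minus signs, so that every difference is a positive number.

Calculate $$\sum \left\vert \frac{x_i}{X} - \frac{y_i}{Y} \right\vert$$

This is the sum of all the values in the differences column.

In this example, $$\sum \left\vert \frac{x_i}{X} - \frac{y_i}{Y} \right\vert = 75.21$$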

Calculate $$\mathsf{Index\;of\;dissimilarity} = \frac{1}{2} \sum \left\vert \frac{x_i}{X} - \frac{y_i}{Y} \right\vert$$

In this example $$\mathsf{Index\;of\;dissimilarity} = \frac{1}{2} \times 75.21 = 37.61$$

This means that 37.61% of the Asian population of Sandwell would need to change residence to a different ward in order to have the same relative distribution as the White population of Sandwell.
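The whole calculation can be sketched in Python. The ward counts below are hypothetical and are not the Sandwell figures, and `index_of_dissimilarity` is a made-up function name. The sketch works in percentages, so the result lies between 0 and 100.

```python
def index_of_dissimilarity(x_counts, y_counts):
    """Index of Dissimilarity on a 0 to 100 scale.

    x_counts, y_counts: population of each group in each small area,
    listed in the same area order.
    """
    total_x = sum(x_counts)  # X in the formula
    total_y = sum(y_counts)  # Y in the formula

    sum_of_differences = 0.0
    for x_i, y_i in zip(x_counts, y_counts):
        pct_x = 100 * x_i / total_x
        pct_y = 100 * y_i / total_y
        sum_of_differences += abs(pct_x - pct_y)  # ignore the sign

    return sum_of_differences / 2


# Hypothetical populations in three wards (NOT the Sandwell data).
group_x = [100, 300, 600]
group_y = [50, 30, 20]

index = index_of_dissimilarity(group_x, group_y)
print(round(index, 2))
```

Two groups with identical relative distributions give 0, and two groups that never share a ward give 100.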

### Worked example 2 of Index of Dissimilarity

Census 2011 data for wards in Sandwell (West Midlands) can be obtained from Nomis. The Index of Dissimilarity has been calculated for ward-level data for the 7 largest ethnic groups of residents (excluding people of mixed ethnicity). A summary of the results is shown in the table.
