Reworking Booth: Geodemographics of Housing

[Update January 2013 – Scottish SIMD 2012 map added, more details.]

I’ve created a new visualisation, a dasymetric map of housing demographics, which you can see here. It attempts to improve on the common thematic (a.k.a. choropleth) maps, where areas across the country are colour-coded according to some attribute; a traditional example is shown below. My visualisation clips the colour-coding to the building outlines in each area, leaving open ground, parks etc. uncoloured.

The Traditional Approach

The shortcoming of choropleth maps is that each area is coloured uniformly. If the attribute being measured is a property of the houses in that area, as much of the census data is, then choropleth maps colour not only the houses in each area but also any parks, rivers and mountains it contains, even though the data being displayed arguably only applies to the houses. This means that geodemographic classification results that predominate in rural areas tend to overwhelm a map at smaller scales, as can be seen in the map on the right, where the green represents a countryside geodemographic.

An alternative to choropleth maps is to use cartograms. These distort each area elastically, into tessellating hexagonal groups or into circles (Dorling cartograms), so that its size typically matches population rather than geographic extent. The colours are then represented more fairly, but cartograms are very difficult for most people to interpret and relate to familiar physical features. They can look very “alien”. One further alternative is dot distribution maps, which place dots of colour randomly within each area. This correctly reduces the colour density in sparsely populated areas, but it distributes the dots evenly across empty parks and rows of houses if both are in a single area, and it implies single points of population.

Clipping the Choropleth Maps

My visualisation attempts to be the best of both worlds, by retaining the familiar geographic shape of the UK and its towns and cities, but not swamping the map with colours in all areas, and indeed ensuring that unpopulated areas have no colour. This is possible because Ordnance Survey Open Data includes Vector Map District. The second release of this dataset improved the quality of building outlines considerably, allowing distinct rows of buildings on streets, and even individual detached houses, to be seen. Unfortunately building classifications are not included, so the process necessarily colours all buildings, rather than just the residential ones that formed part of the census data. This is why, for example, the Millennium Dome in Greenwich appears, even though no one (hopefully!) lives there.

The major shortcoming of doing this is that it falsely implies a higher level of precision within each Output Area, by often showing and colouring individual buildings, whereas the colour represents an average of the properties in the area concerned, rather than telling you something about that particular building itself. That is, the technique shows no new or more detailed data than can be seen in the traditional choropleth maps, but tends to mislead the viewer into thinking otherwise. This is balanced by making the map seem more realistic, by not uniformly covering everything in the area with a giant blob of a single colour.

The map can be considered to be a dasymetric map, albeit one where the spatial qualifier, population density, is one of two values – high (in a building) or zero (not in a building).
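
As a rough illustration of this binary idea, here is a minimal Python sketch. It is not the workflow actually used for the map; it simply assigns each building the colour of the area that contains its centroid, and leaves anything outside every area uncoloured. The coordinates, the area and building IDs, and the colour for the "Multicultural" class are all made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count crossings of a ray running right from pt."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(poly: Polygon) -> Point:
    """Vertex average; a simplification that suits small, simple outlines."""
    xs, ys = zip(*poly)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def colour_buildings(
    buildings: Dict[str, Polygon],
    areas: Dict[str, Polygon],
    area_class: Dict[str, str],
    palette: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Give each building the colour of the area containing its centroid."""
    coloured: Dict[str, Optional[str]] = {}
    for b_id, outline in buildings.items():
        c = centroid(outline)
        coloured[b_id] = None  # default: no colour (outside every area)
        for a_id, boundary in areas.items():
            if point_in_polygon(c, boundary):
                coloured[b_id] = palette.get(area_class.get(a_id))
                break
    return coloured


# Hypothetical data: two square "Output Areas" and three building outlines.
areas = {
    "OA1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "OA2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
area_class = {"OA1": "Countryside", "OA2": "Multicultural"}
palette = {"Countryside": "green", "Multicultural": "purple"}
buildings = {
    "house_a": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "house_b": [(12, 3), (14, 3), (14, 5), (12, 5)],
    "shed_outside": [(30, 30), (31, 30), (31, 31), (30, 31)],
}

result = colour_buildings(buildings, areas, area_class, palette)
print(result)
```

Every building in an area gets the same colour, which is exactly the falsely implied precision discussed above: the colour describes the area, not the individual building.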

Booth’s Poverty Map

An inspiration for this kind of map is the Charles Booth Poverty Map of 1898-9, although my example is considerably less sophisticated. For this map, Booth (and his assistants) visited every house, to determine the demographic of the house, and then painstakingly coloured in the houses, along the streets. His map therefore did not suffer from the falsely implied accuracy – his map really was as accurate as it looks. The Museum of London, incidentally, has a “walk in” Booth poverty map; I featured it on the Mapping London blog last year.

The photo above compares Booth’s map (from a photo of the map in the Museum exhibition, including a friend’s hand) with my map, for the Hackney area in London.

OAC, IMD and London

My main geodemographic map shows the OAC (Output Area Classification), which was created by Dan Vickers in Sheffield in 2005, and is based on data from the 2001 census. The areas used are Output Areas; there are around 210,000 of them in the UK, each one with a population of roughly 250 people in 2001.

The OAC map is not particularly illuminating for London. The capital is considerably more ethnically diverse than most other parts of the country, but because the clustering process used to create OAC is run across the whole country uniformly, only one Supergroup appears to show such ethnically diverse areas – “7” (Multicultural) – rather than showing the variety within this group that extends across the capital. With this in mind I have created an alternative map, which colours the housing according to the IMD (Index of Multiple Deprivation) rankings. This covers England only, and the data is only available at larger spatial units, called LSOAs (Lower Super Output Areas), but it is more up-to-date, being from 2010, and shows considerably more variety across London. Use the link at the bottom of the visualisation to switch between the two.

You can view the map here. It uses geolocation to attempt to zoom to your local area, if you allow it to – it will probably ask you to allow this when you visit the site.

22 thoughts on “Reworking Booth: Geodemographics of Housing”

  1. Really effective and visually engaging technique Ollie. I expect the OS would be interested in your innovative use of Vector Map, as there’s not many applications yet for this dataset.

    Worth considering trying out this technique on unit postcode data too. Unit postcodes would allow other open data housing information such as house-prices to be mapped, plus would help with the ecological fallacy issue, due to the smaller zone sizes. You could also consider putting in the zone boundaries as dashed lines, though might spoil the clean cartographic style.

    1. Agreed, there are many potential datasets that could be mapped in this way, and using postcode areas would show more precise maps.

      Regarding zone boundaries – unfortunately postcode boundaries are not part of the Open Data release. Postcode centroids are, but these would be less useful.

      Interestingly Google has recently started delivering generalised postcode boundaries in vector format to browsers – but they will have had to licence this specially, or else they’ve done something very clever with the data they’ve gathered.

  2. Hi Ollie,

    Really nice. Did you do a spatial-join between output areas and the building outlines to achieve this? Or some other method?

    There are some strange results in the census data. Anyone who has ever been to Cleaver Square in Kennington (51.487663153896456, -0.10861873626708984) would note there is no way that it is in the 2nd/3rd most deprived decile. It positively oozes money. I guess it’s the effect of some of the nearby council housing, though I’m still surprised.


    1. Hi James

      Yes, it was a spatial join using Quantum GIS. The resulting shapefile was then put into a PostgreSQL/PostGIS database, and an attribute join done with the resulting polygons and the IMD data when rendering the tiles.

      The boundaries are generalised a bit (to 20m max allowable deviation) so this may explain some discrepancies. Also, the IMD is based on LSOAs rather than OAs – these are quite big (typical population of each is 1500 people) so the combination of a very poor and very rich area in the same LSOA will result in the middling rank that you identified.
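
To make the averaging effect concrete, here is a toy Python calculation. The populations and scores are invented, and the real IMD is built from several weighted domains rather than a simple mean, so treat this only as a sketch of how one area-wide figure can hide two very different neighbourhoods.

```python
from typing import List, Tuple


def area_score(groups: List[Tuple[int, float]]) -> float:
    """Population-weighted mean of (population, score) pairs."""
    total_population = sum(pop for pop, _ in groups)
    return sum(pop * score for pop, score in groups) / total_population


# Hypothetical LSOA of 1500 people: an estate with a high deprivation score
# next to a larger affluent neighbourhood with a low one.
lsoa = [(500, 60.0), (1000, 6.0)]
combined = area_score(lsoa)
print(combined)  # 24.0
```

The single combined figure of 24.0 matches neither the estate (60.0) nor the affluent streets (6.0), yet it is the only value available to colour every building in the area.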

  3. Holland Park Villas and Oxford gardens both mapped as orange/red. That can’t be right. Both streets are full of expensive mansions. Expensive owner occupied streets in w10 are also coloured red. Including David Cameron’s street.

  4. I’m not expert enough for this but impressed by the results and the care you took. But isn’t there also a weakness in the IMD because it aims to capture the presence / scale of (each dimension of) deprivation but, as a result, pays no attention to who the rest of the people in the zone are? So the Cleaver Square / Holland Gardens problems people have commented on don’t actually flow from averaging but from something else. I read about this snag somewhere but can’t recall…. Michael (UCL Bartlett)

  5. I too was very interested in this excellent tool, and noticed too some of the anomalies, until I realised (as Michael Edwards does above) that this only recognises the people who are deprived within each zone, and not the millionaires who live in the same street.

    Thus in Aberdeen Park N5 for instance, where a one bedroom flat in a Day block costs £542 per sq ft, the map shows 2nd most deprived decile because there is a small estate of council houses in the middle. One need only compare the Acorn data available for this street, which shows that the area is full of ‘senior management’ ‘professional’ couples ‘without children’.

    Incidentally, I understand that apart from his conversations with police, Booth based some of his analysis simply on the frontage width of properties, easily derived from the street. Thus very wide frontages were the highest class (“Upper-middle and Upper classes. Wealthy”), and narrow terraces, the lowest class (“Lowest class. Vicious, semi-criminal”).

    That’s not directly comparable now, as a narrow fronted terrace in Chelsea might be inhabited by millionaires, while a wide-fronted house in Newington Green might be owned by Hackney Council and divided into small flats occupied by welfare recipients…

    1. Hi – this is reasonably straightforward to do in Excel or Google Sheets – you don’t need to map it – however the tricky step is assigning an LSOA to each address (if you are using the Land Registry sold prices data) or otherwise getting your house prices data into LSOAs. If you have it for OAs or postcodes then you can aggregate from postcode to OA to LSOA using the lookup tables available on the ONS geography portal. Then it’s simple maths. NB The 2015 IMD measures are coming out at the beginning of October so you may want to wait for those rather than using the current data from 2010.
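
For anyone who would rather script the aggregation than use a spreadsheet, a minimal Python sketch of the postcode to OA to LSOA step might look like the following. The lookup dictionaries, codes and prices are placeholders invented for the example; in practice the two lookups would be loaded from the published lookup tables. The median is just one reasonable choice for the "simple maths".

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# Placeholder lookups (hypothetical codes, not real postcodes or areas).
postcode_to_oa = {
    "AA1 1AA": "OA_001",
    "AA1 1AB": "OA_001",
    "AA1 2AA": "OA_002",
    "BB2 1AA": "OA_003",
}
oa_to_lsoa = {"OA_001": "LSOA_A", "OA_002": "LSOA_A", "OA_003": "LSOA_B"}

# Placeholder sold-price records: (postcode, price in pounds).
sales = [
    ("AA1 1AA", 250000),
    ("AA1 1AB", 310000),
    ("AA1 2AA", 900000),
    ("BB2 1AA", 180000),
    ("ZZ9 9ZZ", 400000),  # deliberately missing from the lookup
]


def median_price_by_lsoa(
    sales: List[Tuple[str, int]],
    postcode_to_oa: Dict[str, str],
    oa_to_lsoa: Dict[str, str],
) -> Tuple[Dict[str, float], List[str]]:
    """Chain the two lookups, then take the median price per LSOA."""
    prices = defaultdict(list)
    unmatched = []
    for postcode, price in sales:
        oa = postcode_to_oa.get(postcode)
        lsoa = oa_to_lsoa.get(oa)
        if lsoa is None:
            unmatched.append(postcode)  # keep track rather than drop silently
            continue
        prices[lsoa].append(price)
    return {lsoa: median(p) for lsoa, p in prices.items()}, unmatched


medians, unmatched = median_price_by_lsoa(sales, postcode_to_oa, oa_to_lsoa)
print(medians)
print(unmatched)
```

The resulting per-LSOA figures can then be set alongside the IMD rank or decile for the same LSOA codes. Reporting the unmatched postcodes is worthwhile, since new or retired postcodes are often absent from a given lookup table.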

  6. I’m looking at the Aberdeen (and Westhill) data. Something is definitely off – part of Queens Road from the Kingsgate roundabout east is showing as orange, yet looking at the houses on Street View you can tell they’re expensive.

    Also out at Westhill there are a large number of new builds that are coming up light green when they are expensive. Plus the houses up Westhill Heights are light green but they’re in the 7-800K range.

    1. A couple of points on why this may be happening.

      Firstly, the note at the bottom of the page, where it says: “IMPORTANT NOTE: Classifications are an average across the local Output Area/DZ or LSOA, rather than distinguishing between single houses (or even streets), therefore the colour coding on an individual house is not necessarily accurate.” is particularly pertinent in this case. The DZ concerned (which likely covers ~400 houses – DZs have a typical population of 750) likely has both areas which are deprived and areas which are not deprived (the latter doesn’t necessarily mean expensive/affluent; it is possible for an area to be not deprived and not be affluent). ONS are not great.

      Secondly, the index is from 2012 and so was likely calculated from 2010 data, so the area may have changed if there have been substantial new builds since then. The map can also be a little misleading because the outlines of the buildings don’t necessarily correspond with the date of the index (or data) but rather are generally from 2012/13.
