Category Archives: CDRC

Big Data Here: The Code

So Big Data Here, a little pop-up exhibition of hyperlocal data, has just closed, having run continuously from Tuesday evening to this morning, as part of Big Data Week. We had many people peering through the windows of the characterful North Lodge building beside UCL’s main entrance on Gower Street, particularly during the evening rush hour, when the main projection was obvious through the windows in the dark, and some interested visitors were also able to come inside the room itself and take a closer look during our open sessions on Wednesday, Thursday and Friday afternoons.

Thanks to the Centre for Advanced Spatial Analysis (CASA) for loaning the special floor-mounted projector and the iPad Wall, the Consumer Data Research Centre (CDRC) for arranging for the exhibition with UCL Events, Steven Gray for helping with the configuration and setup of the iPad Wall, Bala Soundararaj for creating visuals of footfall data for 4 of the 12 iPad Wall panels, Jeff for logistics help, Navta for publicity and Wen, Tian, Roberto, Bala and Sarah for helping with the open sessions and logistics.

The exhibition website is here.

I created three custom local data visualisations for the big screen that was the main exhibit in the pop-up. Each of these was shown for around 24 hours, but you can relive the experience from the comfort of your own computer:

bdh_buses

1. Arrival Board

View / Code

This was shown from Tuesday until Wednesday evening, and consisted of a live souped-up “countdown” board for the bus stop outside, alongside one for Euston Square tube station just up the road. Both bus stops and tube stations in London have predicted arrival information supplied by TfL through a “push” API. My code was based on a nice bit of sample code from GitHub, created by one of TfL’s developers. You can see the Arrival Board here or download the code on GitHub. This is a slightly enhanced version that includes additional information (e.g. bus registration numbers) that I had to hide during the exhibition due to space constraints.

Customisation: Note that you need to specify a Naptan ID on the URL to show your bus stop or tube station of choice. To find it out, go here, click “Buses” or “Tube…”, then select your route/line, then the stop/station. Once you are viewing the individual stop page, note the Naptan ID forms part of the URL – copy it and paste it into the Arrival Board URL. For example, the Naptan ID for this page is 940GZZLUBSC, so your Arrival Board URL needs to be this.
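As a rough illustration of what a countdown board does with the predictions once they arrive, here is a minimal Python sketch. It is not the code linked above and it does not connect to TfL's push API. The field names (lineName, destinationName, timeToStation in seconds) are an assumption modelled on TfL's arrival prediction records, and the sample routes and times are invented.

```python
def format_board(arrivals, limit=5):
    """Return countdown-board lines, soonest arrival first."""
    # Sort by seconds until arrival and keep only the first few rows.
    rows = sorted(arrivals, key=lambda a: a["timeToStation"])[:limit]
    lines = []
    for a in rows:
        minutes = a["timeToStation"] // 60
        due = "due" if minutes < 1 else f"{minutes} min"
        lines.append(f'{a["lineName"]:<4} {a["destinationName"]:<20} {due:>6}')
    return lines


# Invented sample predictions, for illustration only.
sample = [
    {"lineName": "390", "destinationName": "Archway", "timeToStation": 425},
    {"lineName": "18", "destinationName": "Sudbury", "timeToStation": 45},
    {"lineName": "73", "destinationName": "Stoke Newington", "timeToStation": 130},
]

for line in format_board(sample):
    print(line)
```

A live board would simply re-run the formatting step each time a new batch of predictions is pushed.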

bdh_traffic2

2. Traffic Cameras

View / Code

This was shown from Wednesday evening until Friday morning, and consisted of a looping video feed from the TfL traffic camera positioned right outside the North Lodge. The feed is a 10 second loop and is updated every five minutes. The exhibition version then had 12 other feeds, surrounding the main one and representing the nearest camera in each direction. The code is a slightly modified version of the London Panopticon, the code for which is also available on GitHub.

Customisation: You can specify a custom location by adding ?lat=X&lon=Y to the URL, using decimal coordinates – find these out from OpenStreetMap. (N.B. TfL has recently changed the way it makes available the list of traffic cameras, so the list used by London Panopticon may not be completely up-to-date.)
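One way to pick "the nearest camera in each direction" around a given lat/lon is to split the compass into equal sectors and keep the closest camera in each. The Python sketch below does that. It is an assumption about the general approach rather than the London Panopticon code, and the camera dictionaries (name, lat, lon) are a hypothetical structure.

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees (0 to 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360


def nearest_per_sector(lat, lon, cameras, sectors=12):
    """Map sector index (0 is centred on north, increasing clockwise) to nearest camera name."""
    width = 360 / sectors
    best = {}
    for cam in cameras:
        d = distance_km(lat, lon, cam["lat"], cam["lon"])
        if d == 0:
            continue  # skip a camera exactly at the chosen location
        b = bearing_deg(lat, lon, cam["lat"], cam["lon"])
        sector = int(((b + width / 2) % 360) // width)
        if sector not in best or d < best[sector][0]:
            best[sector] = (d, cam["name"])
    return {sector: name for sector, (d, name) in best.items()}
```

Sectors with no camera are simply absent from the result, so a display would need to decide what to show in those slots.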

bdh_census

3. Census Numbers

View / Code

Finally, the screen showed randomly chosen statistical numbers, for the local Bloomsbury ward that UCL is in, from the 2011 Census. Again, you can see it in action here (wait 10 seconds for each change, or refresh), and download the code from GitHub.

Customisation: This one needs a file for each area it is used in and unfortunately I have, for now, only produced one for Bloomsbury. The data originally came, via the NOMIS download service, from the Office for National Statistics and is Crown Copyright.

bdh_traffic3

Visit the new oobrien.com Shop
High quality lithographic prints of London data, designed by Oliver O'Brien

Population Density and Urban/Rural Split of the UK

popdens1

A new map on CDRC Maps shows perhaps one of the simplest demographic metrics – residential population density – how many people live in each hectare across the UK. The data is available at the smallest statistical area available (output areas in GB and small areas in NI) and I have combined this with the various urban/rural classifications used by the three national statistical agencies across the UK, to produce a single map. Colour is the urban/rural classification, and lightness/darkness shows how densely populated each area is. Because urban areas are so much more densely populated than rural ones are, I’ve used a series of scales to gradate the representation of density on the map – the scale used depends on the classification. This is the best way to let both high-density and low-density areas show local variations.
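To make the "different scale per classification" idea concrete, here is a minimal Python sketch. The two class names and their upper bounds are made-up values for illustration, not the ones used on the map.

```python
# Hypothetical density (people per hectare) at which each class reaches its darkest shade.
SCALE_MAX = {"urban": 200.0, "rural": 20.0}


def density_per_hectare(population, area_ha):
    """Residential population density for one small area."""
    return population / area_ha


def darkness(population, area_ha, classification):
    """0.0 is the lightest shade and 1.0 the darkest, relative to that class's own scale."""
    d = density_per_hectare(population, area_ha)
    return min(d / SCALE_MAX[classification], 1.0)


# 100 people/ha in an urban area and 10 people/ha in a rural one get the same shade depth.
print(darkness(300, 3, "urban"), darkness(300, 30, "rural"))
```

With a single shared scale the rural area here would be almost white, which is why the scale is chosen per classification.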

A few observations:

  • Many linear blocks along roads in east London have a notably high density compared to the rest of suburbia – there are no tower blocks here, just terraces, so maybe this is a sign of overcrowding?
  • The centre of Birmingham is extremely low density – very few residential blocks here.
  • There is a significant contrast between high-density Portsmouth, hemmed in on three sides by water, and the much lower density Southampton, not far away, which is not so constrained by the sea.
  • Many cities, such as Cardiff (above) show a distinct pattern where the inner city has two parallel zones of high-density population, either side of a relatively sparse CBD core. Other cities where this is seen include Plymouth, Glasgow and Leicester.

popdens2a

There are flaws in this method of combining datasets across national boundaries. The different agencies calculate in different ways. Notably, in Scotland, the small areas are themselves smaller in population and are designed to better encapsulate the urban part only of settlements, with different small areas for the rural parts. As such, Scottish villages tend to show up as higher density than their English counterparts, which by necessity often need to include a substantial rural element in order to hit their population threshold. This is a statistical quirk.

The other significant difference is that England/Wales define “sparseness”, while Scotland and Northern Ireland use “remoteness” and measure this quantitatively in terms of driving time to the nearest settlement of over 10000 people. The definition of sparseness does not relate to distance from such settlements and therefore there are some “urban” areas with population of over 10000 but in a sparse setting. For consistency, I consider these alongside remote settlements in the other nations, which are considered rural. The raw data download, on CDRC Data, includes a simple urban/rural flag if you prefer to use the strict urban/rural definitions.

See the map here.

popdens3

As ever, please note that maps on CDRC Maps show all buildings but the data is generally for residential buildings only. The data is a single value across the whole small area, not a measurement of population in individual buildings.

Population Change in Great Britain 2011-14

popchange_doncaster

The ONS publish small-area population estimates annually, for England and Wales, and the NRS similarly do for Scotland. By taking two of these datasets, we can see how the population of Great Britain is changing – births, deaths, internal and international migration and military deployments/homecomings all act to fluctuate the population.

I’ve taken the 2011 and 2014 “mid-year” population estimates for LSOA and DZs – statistical areas with a typical population of 1000-1500 people – and compared them, to derive small-area population changes. You can see the resulting map here.
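The comparison itself is a simple percentage change per area. A minimal Python sketch, with invented area names and figures:

```python
def percent_change(pop_2011, pop_2014):
    """Percentage change in population between the two mid-year estimates."""
    return (pop_2014 - pop_2011) / pop_2011 * 100.0


# Invented mid-year estimates: area -> (2011, 2014).
estimates = {"area_A": (1500, 1650), "area_B": (1200, 1140)}

changes = {
    area: round(percent_change(p2011, p2014), 1)
    for area, (p2011, p2014) in estimates.items()
}
print(changes)
```

This only makes sense where an area keeps the same boundary in both years, which is an assumption of the sketch.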

In London, a couple of striking patterns appear. Inner West London – Kensington & Chelsea, Fulham, Wandsworth – is seeing a striking depopulation (orange on the map). This may be due to the tendency of landlords in these wealthy areas to convert old housing stock, that was split into multiple flats, back into houses for the (very) rich. In a few exceptional cases, houses themselves are being knocked together. The unaffordability of the area and its old-age population may also have something to do with it. Further east in Tower Hamlets, increased immigration and a high rate of births to immigrants may be contributing to the rapid rise in the population here (10%+ in many areas – dark purple on the map) in just 3 years. The increase across GB in total, from 2011-14, is 2.1%. Some of the large increases can be due to new university campus accommodation opening up, while large falls are often an indication of housing estates being demolished and redeveloped.

Many cities across Great Britain show a characteristic of newly-desirable city centres increasing in population, as denser housing developments pack people in, while the suburbs decrease in population. The Liverpool/Wirral conurbation is a fine example of this. An exception is Milton Keynes, where no Green Belt constrains its expansion, and new housing estates keep being built in the outer “blocks” of this grid city. Some smaller places with special employment constraints on them seem to be almost universally decreasing, such as Barrow in Furness, as well as Thurso and Greenock, both in Scotland.

Explore the map on CDRC Maps, and download the data on CDRC Data.

Mapping Data: Beyond the Choropleth

I recently gave a presentation as part of an NCRM Administrative Data Research Centre England course: Introduction to Data Visualisation. The presentation focused on adapting choropleths to create better “real life” maps of socioeconomic data, showing the examples of CDRC Maps and named. I also presented some work from Neal Hudson, Duncan Smith and Ben Hennig.

Contents:

  • Technology Summary for Web Mapping
  • Choropleth Maps: The Good and the Bad
  • Moving Beyond the Choropleth
  • Example: CDRC Maps
  • Example: named – KDE “heatmap”
  • Case example: Country of Birth Map – concerns of the data scientist & digital cartographer

Here’s my slidedeck:

(or you can view it directly on Slidedeck).

A Map of Country of Birth Across the UK

eastse_countryofbirth

Above: Areas of east and south-east London with more than 8% of inhabitants being originally from (from top to bottom) India (in East Ham), Lithuania (in Beckton) and Nigeria & Nepal (in Abbey Wood).

[Updated] Ever wondered why some branches of Tesco, the ubiquitous supermarket, have an American food section, while others have a Polish food chiller? Alternatively, it might have a catch-all “World Food” aisle, or it might not. The supermarket is, of course, catering to the local community. Immigrants to the UK do not uniformly spread out across the country, but tend to cluster in particular localities.

The latest map that I’ve published on CDRC Maps is a Country of Birth map, which attempts to summarise such communities in one view. It uses the same technique as Top Industry: it maps the most common country of birth (excluding the home nation) of residents in each small area, as of the 2011 Census. The purpose of the map is to identify and map the approximate extent of single-country communities within the UK. For example, to see how big London’s Chinatown is, or whether a Little Italy in the capital still exists.

This map reveals such communities although there is an important caveat when looking at it. I have set out below the rules I applied when constructing it, the most important of which is that only 8% of inhabitants need to share a single country of birth, for it to appear on the map. Bear in mind that, across the UK, 87% of people were born here. These people do not appear on the map, unless they are outside their home nation (and not at all if they are English).

countryofbirth_key

There are a number of rules I have needed to apply to make this a map that tells an interesting story in a measured and fair way:

  • I don’t map native births – the English-born people in England, Welsh-born in Wales, Northern-Irish born in Northern Ireland or Scottish-born in Scotland. There are almost no areas anywhere in the UK where people born in a single foreign-born country outnumber the native-born. If I did map such native births, then the map would be almost completely dominated by them, and would not tell much of a story.
  • I also don’t map the English-born within the other home-nations, because the population of England is so much larger than in Scotland, Wales etc such that even the small percentage of them moving into the other home nations would dominate the map of Scotland/Wales/NI, if included.
  • I only map a single-country foreign born area if at least 8% of local residents are from that country. This sounds like a low threshold and it is – if an area is coloured a particular colour, it might still have up to 92% of the local residents actually being native-born.
  • The above rule means that some very multi-cultural areas don’t get mapped, because they have a large number of non-native residents, but these are split amongst various countries such that none reaches the 8% threshold.
  • Necessarily, in the source data, some countries are combined together into regions, either for a whole region (e.g. Central America) or for other countries in a region (e.g. Other East Asia, not including China/Japan etc). This is how the underlying Census statistics are represented. This can have the effect of making a result (for a region) appear when it wouldn’t otherwise appear (for any country in the region). However the number of places where this happens is small so it does not overly bias the map.
  • A slight quirk of the census results is that Scotland and Northern Ireland chose, based on their own sum populations, to aggregate some of the smaller-UK-population countries in a different way. For example, Northern Ireland doesn’t break out “Other Old EU” (e.g. Belgium) and “Other New EU” (e.g. Bulgaria) into separate categories. The Somalian population in Scotland is not presented as a distinct statistic, but it is in NI (and England/Wales). Again, this only affects countries/regions with smaller UK populations so doesn’t overly distort the map.
  • I don’t colour the map where it would be showing data for fewer than 10 people. This causes the most noticeable rationalisation of the map in Scotland, because the small areas here have a lower population (typically 125 instead of 250 people). This means Scotland’s country-of-birth diversity is a little underrepresented when compared with the other regions of the map.
  • I’ve used colour hues and brightnesses in an ordered way, to group together continents and regions. Greens = UK nations, Olives = Old EU, Browns = New EU, Yellows = North America, Pinks = Central America, Blues = Africa, Purples = Oceania, Reds = Asia. There is no particular meaning to the colours picked beyond this, but be aware that the eye is naturally drawn to some colour hues more than others.
  • If a second country of birth also scores over 8%, but with a smaller local population than the first, then this is shown in striped lines over the first, and labelled as such in the interactive key.
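The threshold rules above can be sketched in a few lines of Python. This is an illustration of the rules as described, not the code used to build the map. It assumes the native-born and English-born counts have already been removed from the input, treats both thresholds as "at least", and breaks ties arbitrarily.

```python
THRESHOLD = 0.08   # minimum share of local residents from one country
MIN_PEOPLE = 10    # do not colour an area on the basis of fewer people than this


def classify_area(total_residents, born_in):
    """Return (top, second) country of birth for one small area, or None where no country qualifies.

    born_in maps country -> resident count, excluding native-born and English-born residents.
    """
    qualifying = [
        (count, country)
        for country, count in born_in.items()
        if count >= MIN_PEOPLE and count / total_residents >= THRESHOLD
    ]
    qualifying.sort(reverse=True)  # largest count first
    top = qualifying[0][1] if qualifying else None
    second = qualifying[1][1] if len(qualifying) > 1 else None  # shown as stripes
    return top, second


print(classify_area(250, {"India": 30, "Poland": 22, "Nigeria": 12}))
print(classify_area(250, {"India": 15, "Poland": 15, "Nigeria": 15}))
```

The second call shows the multi-cultural case: 45 non-native residents in total, but no single country reaches 8%, so the area stays uncoloured.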

Have a look at the map, and mouse around to find the meaning for the current colour, or see the scrollable key on the right.

Why 8%? I found that dropping this threshold (I tried initially at 5%) results in a lot of “noise” on the map, where only two large families need to move to an area, for it to acquire their birth-country colour. Increasing this threshold (e.g. to 10%, which I tried) results in many of the interesting patterns disappearing.

Interestingly, some famous “immigrant” areas of London virtually disappear on this map. Brixton and Hackney are still associated with the Jamaican communities moving there in the 1940s/50s, but, at the 8% threshold, they virtually disappear. Only at 5% does a significant community pattern appear. Similarly, Wandsworth and Shepherds Bush are known for their Australian communities but these also almost vanish when moving from 5% to 8%. At a 5% threshold, Hackney and Islington show a “patchwork” effect of integrated multicultural communities of Irish, Turkish, Nigerian and Jamaican-born immigrants. These also largely disappear from the map at the 8% threshold. Remnants of the Irish migration to Kentish Town are more obvious.

London remains a fascinating mix where people from many different countries have set up their home in neighbourhoods with established communities and retail that cater for them. While the UK’s other cities have “international” quarters too, none shows the diverse nature of these communities. Virtually every country in the key has a London neighbourhood. (N.B. Places where there are pockets of many nations in a small area in London, and elsewhere, often indicate a student population at a globally well-known university).

Away from London, the Scottish-origin communities in Corby and Blackpool stand out, while the Americans on military bases in East Anglia also dominate the map there. Luton has a Polish, Pakistani and Irish diaspora.

As ever, I am mapping small-area statistics, not those for individual houses (I don’t have that information!) and the representation of a particular house on the map is indicative of the local area rather than each house itself. The addition of houses on CDRC Maps maps is intended to make the map more relatable to the population structure of towns and cities, but it can make the data more detailed than it actually is. The map also includes non-residential buildings – there’s no easy way to filter these with the open dataset used, and the great majority of buildings in the UK are residential.

[Update – See this excellent article written by CityLab on this map, which explains some of the above nuances in a better way than I attempted to.]

Below: There is a Little Italy, but it’s in Peterborough now.

peterborough_countryofbirth

Working Nation

leicester_industry

Top Industry maps the most popular employment for each of the ~220000 statistical small areas* within the UK. I’ve reused the “top result” (i.e. modal only) technique that has produced interesting maps for travel to work, to look at the Industry of Employment tables produced by the national statistics agencies, from the 2011 Census.

The tables I’ve used group each job into a Standard Industry Classification (SIC) category, and I’ve then mapped which of these is the most popular. I’m mapping the home locations of workers, rather than where they work. I’m also only mapping where at least 20% of the working population falls into one of the categories. The “G: Wholesale retail trade, repair” category dominates throughout the UK – we are a nation of shopkeepers – so I’ve used a muted off-white colour to represent areas where this is the most popular. Other, rarer categories have more vivid colours.

swales_industry

All sorts of interesting patterns appear:

The map shows that the UK is far from homogenous when it comes to the industries and occupations that people work in. It reveals many areas where manufacturing remains the key employer for the local working community – typically mid-sized towns – while showing the diverse and uneven nature of the employment landscape in the larger cities. While remembering that the map is only showing the “top” (and second-top where relevant) industry category, and that other industry workers can also live in the same places, it still shows a structure and pattern consistent both with historical reasons for many of the communities’ development, but also the realities of the modern workforce, with new technology industries, and social work, becoming increasingly prevalent.

See the interactive map on CDRC Maps.
The data is available on CDRC Data.

edinburgh_industry

* Known as Output Areas in Great Britain and Small Areas in Northern Ireland.

sengland_industry

The Age of Buildings

liverpool_houseages

We don’t have individual building age open data in the UK, unlike in some other countries (the data has been used to great effect in New York City and Amsterdam) but the Valuation Office Agency, which amongst other things decides council tax bandings for residential properties, has published some interesting data on how old houses are in England and Wales – it’s their “dwelling ages” dataset. A separate governmental organisation, the ONS, publishes house price summaries, at a relatively small-area* scale, on a quarterly basis for the previous year. I have combined both these datasets into a record on CDRC Data, and have mapped them both on CDRC Maps.

bristol_houseages

The dwelling age data is supplied grouped in approximately ten-year age bands (+ a Pre-1900 catch-all) with a count of the number of houses in each band, for each small area (LSOA) in England/Wales. I’ve mapped just the modal band, that is, the band with the greatest number of houses in it**. In some cases, houses were steadily built in an area throughout the 20th century, so that the band assigned to that area is not actually very representative of the houses there – this can be spotted by looking at the “Classif. %” number which appears on the right.

Many UK cities show a pattern of Pre-1900 inner-city (dark grey on the map), with early 20th century houses out towards the edge (lightening blues). The “Green Belts” of the 1940s stopped this radial outward development, so, some old housing was instead overhauled to build 1960s-70s housing estates (shown in yellow) and more recently, the urban core has seen much of the recent housebuilding activity. This shows up on the map as an area of red in the centre of many cities. There are some exceptions – Milton Keynes is a large, and new, town, its map showing mainly yellows and reds.

Not all areas are constrained by Green Belts but some have other, physical constraints, such as the sea. Weston-super-Mare, for example, with the sea to its west, has steadily expanded eastwards over the last 150 years:

westonsupermare_ages

A second map concentrates just on post-WW2 (1945+) building, showing the proportion of such houses in each area. Hello, riverside east London:

london_riverside

The house price pattern in England/Wales is quite familiar to many people – basically London is eye-wateringly expensive, particularly in the centre and west, along with some satellite towns and cities (e.g. Oxford and Cambridge) but not others (e.g. Luton and Harlow). I’ve mapped the median house prices for each small-area as I think this better provides an indicator of a typical price paid. 50% of properties sold in the previous 12 months, in each area, sold for less than this amount, and 50% for more. As only a few houses in an area typically get sold in a year (I have included this number in the metric data) it is worth noting that the values can jump around a lot.

Explore the interactive maps:

houseprices

* There are also individual house transactions (with prices) released regularly by a third organisation, the Land Registry; however, I have not mapped these at this time.

** Where an area is fairly equally split between two bands, I’ve included the “runner up” band as well, shown as thinner vertical stripes. This only appears where the runner up housing count is at least 90% of the modal band’s count, and the two bands account for more than half of the total housing. I’m using Mapnik compositing operations to get the vertical stripes, rather than a very long and repetitive stylesheet. I calculated the modal band in Excel from the original VOA dataset by using MAX (to find the value) and nested IFs (to display the category). Calculating runner up (i.e. second from mode) was a little more tricky, but I was able to do this by using COUNTIF and LARGE (to find the value – which could be the same as the mode, i.e. multimodal) and then nested IFs/ANDs to display the category.
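For anyone who would rather not wrestle with nested IFs, the same modal and runner-up logic is short in Python. This is a sketch of the rule as described in the footnote, not a port of the spreadsheet. The band labels are placeholders, "90%" is read as "at least 90%", and an area is assumed to contain at least one house.

```python
def modal_and_runner_up(band_counts):
    """Return (modal band, runner-up band or None, modal band's share of houses in %).

    band_counts maps an age-band label to the number of houses in that band.
    """
    # Rank bands by house count, largest first; ties keep their input order.
    ranked = sorted(band_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(band_counts.values())
    mode_band, mode_n = ranked[0]
    runner = None
    if len(ranked) > 1:
        runner_band, runner_n = ranked[1]
        # Runner-up is shown only if it is close to the mode and the pair is a majority.
        if runner_n >= 0.9 * mode_n and (mode_n + runner_n) > total / 2:
            runner = runner_band
    return mode_band, runner, 100.0 * mode_n / total


# Placeholder band labels and counts, for illustration only.
print(modal_and_runner_up({"Pre-1900": 200, "1930-1939": 190, "1965-1972": 50}))
```

The third returned value gives a feel for how representative the modal band is: a low share means the area's houses are spread across many bands.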