OpenStreetMappers of London

IMG_1370

I contributed a number of graphics to LONDON: The Information Capital, a book co-written by Dr James Cheshire, also of UCL Geography. Two of my graphics that made it into the book were based on data from OpenStreetMap, a huge dataset of spatial data covering the whole world. One of them, featured in this post, forms one of the chapter intro pages. It colours all the roads, streets and paths in the Greater London Authority area (around 160,000 “ways”, which are discrete sections of road or path) according to the person who most recently updated them. Over 1500 individual users helped create and refine the map, and all are featured here. I was pleased to discover I was the 21st most prolific, with 1695 ways most recently modified by me at the time the graphic was produced.
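
The ranking above comes from counting, for each way, the user who last edited it. Here is a minimal sketch of that count in Python. It assumes the ways have already been extracted into simple (way id, last user) pairs; the function name `rank_last_editors` and the sample data are made up for illustration and are not the workflow used for the book.

```python
from collections import Counter

def rank_last_editors(ways):
    """Rank users by how many ways they were the most recent editor of.

    `ways` is an iterable of (way_id, last_user) pairs: a hypothetical,
    simplified form of the user recorded against the latest version of each way.
    Returns a list of (rank, user, way_count), most prolific first.
    """
    counts = Counter(user for _way_id, user in ways)
    # most_common() orders users by count, highest first
    return [(rank, user, n)
            for rank, (user, n) in enumerate(counts.most_common(), start=1)]

# Made-up sample data
sample = [(1, "alice"), (2, "bob"), (3, "alice"), (4, "carol"), (5, "alice"), (6, "bob")]
for rank, user, n in rank_last_editors(sample):
    print(rank, user, n)
```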

The more active users will typically have areas around home and work which they intensively map, plus other, smaller areas such as contributions made during a mapping party or other social event organised by/for the London OSM community. Here’s an example filtering for just one user:

osm_dan

Putting the users together reveals a patchwork of key authors and more minor contributors, together forming a comprehensive map of the city. Detail levels vary, partly as the fabric of the city varies from area to area, but also as some contributors will be careful to map every path and alleyway, while others will concentrate on the driveable road network.

osm_detail

The data was obtained from a local copy of the OpenStreetMap database for Great Britain, which I maintain for various pieces of work including OpenOrienteeringMap. You can obtain the data files from GeoFabrik (this link is to their new London-only version). The data was captured in early February 2014.

I used QGIS to assemble the data and applied the temp-c colour ramp, classifying across all the contributors – I then changed any that had been assigned white to green. The colours used in the book are slightly different, as some additional editing took place after I handed the graphic over. The colour ramp is relatively coarse, so multiple users will have the same colour assigned to them. However, because OSM contributions have a very long tail (only a small number of people make the great majority of edits), most major contributors still have a unique colour.
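
To illustrate why a coarse ramp makes users share colours, here is a rough Python sketch. It is not how QGIS implements its categorised renderer internally: it simply assumes users are spread evenly along a short, hypothetical list of ramp colours (a stand-in for temp-c) and snapped to equal-width steps, with white swapped for green as described above.

```python
def assign_ramp_colours(users, ramp, white="#ffffff", replacement="#00a000"):
    """Map each user to one colour from a coarse, discrete colour ramp.

    users: user names in the order the categories are listed.
    ramp: a short list of hex colours (a hypothetical stand-in for temp-c).
    Users are spread evenly along the ramp and snapped to one of its steps,
    so with more users than colours, several users share a colour.
    """
    n = len(users)
    steps = len(ramp)
    colours = {}
    for i, user in enumerate(users):
        t = i / (n - 1) if n > 1 else 0.0              # position along ramp, 0..1
        colour = ramp[min(int(t * steps), steps - 1)]  # equal-width bins
        # White is invisible on a white page, so swap it for green
        colours[user] = replacement if colour.lower() == white else colour
    return colours

print(assign_ramp_colours(["a", "b", "c", "d", "e"], ["#0000ff", "#ffffff", "#ff0000"]))
```

With five users and three colours, two pairs of users end up sharing blue and red, and the middle user gets green in place of white.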

osm_book

Download:

Note that these files cover an area slightly larger than the Greater London Authority extent – a buffer from Ordnance Survey Open Data Boundary-Line is used to mask out the non-GLA areas.

If you like this thing, it’s worth noting that Eric Fischer independently produced a similar graphic last year, for the whole world. (Interactive version).

Visit the new oobrien.com Shop
High quality lithographic prints of London data, designed by Oliver O'Brien

Tube Tongues – The Ward Edition

wardwords

If you are a Londoner but felt that Tube Tongues passed you by, maybe because you live in south-east London or another part of the city that doesn’t have a tube station nearby, then here’s a special version of Tube Tongues for you. Like the original, it maps the most popularly spoken language after English (based on 2011 Census aggregate tables released by the ONS, via NOMIS) but instead of examining the population living near each tube station, it looks at the population of each ward in London. There are 630* of these, with a typical population of around 10,000. I’ve mapped the language as a circle placed at the geographic centroid of each ward. This is similar to the technique I used for my local election “Political Colour” maps of London.

A few new languages appear, as the “second language” (after English) in particular wards: Swedish, Albanian and Hebrew. Other languages, which were previously represented by a single tube station, become more prominent – Korean around New Malden, German-speaking people around Richmond, Nepalese speakers in Woolwich, Yiddish in the wards near Stamford Hill and Yoruba in Thamesmead. Looking at the lists of all languages spoken by >1% of people in each ward, Swahili makes it on to a list for the first time – in Loxford ward (and some others) in east London. You can see the lists as a popup, by clicking on a ward circle. As before, the area of the circles corresponds to the percentage of people speaking a language in a particular ward. The very small circles in outer south-east London don’t indicate a lack of people – rather that virtually everyone there speaks English as their primary language.

English remains the most popularly spoken language in every ward, right across London. Indeed, there are only three wards, all in north-west London, where it doesn’t have an absolute majority (over 50%). London may seem very multilingual on a map like this, but it is still very much Europe’s English-speaking capital. See the graphic below, which shows the equivalent circle sizes for English speakers, or click the “Show/hide English” button on the interactive map.

Here’s the interactive map. There’s also a ward version of Working Lines.

* I’ve ignored the tiny City of London ones except for Cripplegate, which contains the Barbican Estate.

Background map uses data which is copyright OpenStreetMap contributors. Language data from the ONS (2011 Census).

wardwords_english


DataShine: Local Area Rescaling & Data Download

Cross-posted from the Datashine Blog.

DataShine Census has two new features – local area rescaling and data download. The features were launched at the UK Data Service’s Census Research User Conference, last week at the Royal Statistical Society.

Local Area Rescaling

This helps draw out demographic variations in the current view. You may be in a region where a particular demographic has very low (or high) values compared to the national average, but because the colour breakout is based on the national average, local variation may not be shown clearly. Clicking the “Rescale for current view” button on the key recolours the map using just the areas in the current view.

For example, because London’s Underground network is so heavily used by its large population, usage of metros or trams in other cities is harder to pick out. So, in Birmingham, the Midland Metro can be hard to spot (interactive version):

metro1

Upon rescaling, just the local results are used when calculating the average and standard deviation, allowing usage variations along the line to be more clearly seen:

metro2

As another example, rescaling can help “smooth” the colours for measures which have a nationally very small count, but locally high numbers – it can remove the “speckle” effect caused by single counts, and help focus on genuinely high values within a small area.

Hebrew speakers in Stamford Hill, north-east London (interactive version):

hebrew1

Upon rescaling, a truer indication of the shape of the core Hebrew-speaking community there can be seen:

hebrew2

Occasionally, the local average/standard deviation values will mean that the colour breakout (or “binning”) adopts a different strategy. This may actually make the local view worse, not better – so click “Reset” to restore the normal colour breakout. Panning/zooming the map will retain the current colour breakout. PDFs created from the current view also include the rescaled colours.
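
DataShine's exact binning rules aren't spelled out here, so the following is only a sketch of the general idea. It assumes class breaks placed at the mean plus or minus multiples of the (population) standard deviation; the multiples and all the numbers are made up. It shows how values that all land in one class under the national statistics spread across several classes once the mean and standard deviation come from the local areas alone.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def sd_breaks(values, multiples=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Class breaks at the mean plus/minus multiples of the standard deviation."""
    m, s = mean(values), pstdev(values)
    return [m + k * s for k in multiples]

def classify(value, breaks):
    """Index of the colour class (0 .. len(breaks)) that a value falls into."""
    return bisect_right(breaks, value)

# Made-up percentages travelling by metro/tram
national = [0.1, 0.2, 0.3, 5.0, 8.0, 12.0, 20.0, 30.0]  # all areas
local = [0.1, 0.2, 0.3, 0.5]                            # areas in the current view

nat_breaks = sd_breaks(national)  # the normal, national colour breakout
loc_breaks = sd_breaks(local)     # "Rescale for current view"
print([classify(v, nat_breaks) for v in local])
print([classify(v, loc_breaks) for v in local])
```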

Data Download

On clicking the new “Data” button on the bottom toolbar, you can now download a CSV file containing the census data used in the current view. Like the local area rescaling functionality, this data download includes all output areas (or wards, if zoomed out) in your current view. This file includes geography codes, so it can be combined with the relevant geographical shapefiles to recreate views in GIS software such as QGIS.
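
In QGIS this combination is a table join on the geography code. The same idea in plain Python looks roughly like this; the column name `GeographyCode` and the simplified feature dictionaries are assumptions for illustration, so check the header of the CSV you actually download.

```python
import csv
import io

def join_on_code(csv_text, features, code_field="GeographyCode"):
    """Attach CSV columns to features that share the same geography code.

    features: dicts with a "code" key, a simplified stand-in for the
    attribute table of an output area shapefile.
    Features with no matching CSV row are dropped.
    """
    rows = {row[code_field]: row for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for feature in features:
        row = rows.get(feature["code"])
        if row is not None:
            # Copy the feature and add every census column except the code itself
            extra = {k: v for k, v in row.items() if k != code_field}
            joined.append({**feature, **extra})
    return joined

csv_text = "GeographyCode,pct\nE00000001,12.5\nE00000003,4.0\n"
features = [{"code": "E00000001", "name": "a"}, {"code": "E00000002", "name": "b"}]
print(join_on_code(csv_text, features))
```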

Next on the DataShine project, we are looking to integrate further datasets – either aggregating certain census ones or including non-census ones such as IMD and IDACI deprivation measures, or pollution.

Working Lines

workinglines_northern

As a follow-up to Tube Tongues I’ve published Working Lines, which is exactly the same concept, except it looks at the occupation statistics from the 2011 census and shows the most popular occupation by tube station. Again, there is lots of spatial clustering of results, and some interesting trends come out – for example, the prevalence of teachers in Zones 3-4, a stop on the Central line in north-east London which serves a lot of taxi drivers, and the fact that bodyguards really are big business serving the rich and famous around Knightsbridge.

The Northern line (above) stands out as one that serves a community of artists (to the north) and, less excitingly, a community of business administrators (to the south). Tottenham/Seven Sisters has a predominance of cleaners, and, perhaps unsurprisingly, plenty of travel agents live near Heathrow. I never knew that the western branch of the Central line, towards West Ruislip, was so popular with construction workers. Etc etc.

Only the actively working population is included, rather than the full population of each area. This makes the numbers included in each buffer smaller, so I’ve upped the lower limit to the greater of 3% and 30 people, to cut down on small-number noise and minimise the effect of any statistical record swapping.
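
The “greater of 3% and 30 people” rule can be written as a one-line threshold. This is a sketch with made-up counts; whether the limit is inclusive (`>=`) is an assumption.

```python
def significant_occupations(counts, min_share=0.03, min_people=30):
    """Occupations that clear the noise threshold, most popular first.

    counts: {occupation: number of working residents near a station}.
    The threshold is the greater of 3% of the working total and 30 people.
    """
    total = sum(counts.values())
    threshold = max(min_share * total, min_people)
    kept = [(occ, n) for occ, n in counts.items() if n >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Made-up counts: 3% of 500 is 15, so the 30-person floor applies here
print(significant_occupations({"Teachers": 120, "Taxi drivers": 40,
                               "Bodyguards": 20, "Cleaners": 320}))
```

For small buffers the 30-person floor dominates; once an area has more than 1,000 working residents, the 3% share takes over.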

Tube Tongues

tubetongues

I’ve extended my map of tube journeys and busy stations (previous article here) to add in an interesting metric from the 2011 census – that of the second most commonly spoken language (after English) that people who live nearby speak. To do this I’ve analysed all “output areas” which lie wholly or partly within a 200m radius of the tube station centroid, and looked at the census aggregate data for the metric – which was a new one, added for the most recent census.
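
The “wholly or partly within 200m” test is a polygon/circle intersection. Here is a pure-Python sketch, assuming output area boundaries as lists of projected (x, y) vertices in metres (e.g. British National Grid) rather than latitude/longitude. It is an illustration only; in a GIS the same job is done with a buffer and an intersect.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its end points
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_in_polygon(p, ring):
    """Even-odd ray-casting test; ring is a list of (x, y) vertices."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def within_radius(ring, centre, radius=200.0):
    """True if the polygon lies wholly or partly within `radius` of `centre`."""
    if point_in_polygon(centre, ring):  # station sits inside the area
        return True
    edges = zip(ring, ring[1:] + ring[:1])
    return any(point_segment_distance(centre, a, b) <= radius for a, b in edges)

station = (0.0, 0.0)
near = [(150, -50), (250, -50), (250, 50), (150, 50)]
far = [(300, 300), (400, 300), (400, 400), (300, 400)]
print(within_radius(near, station), within_radius(far, station))
```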

See the new map here.

tubetongues_vic

Each tube station has a circle coloured by, after English, the language most spoken by locals. The area of the circle is proportional to the percentage that speak this language – so a circle where 10% of local people primarily speak French will be larger (and a different colour) than a circle where 5% of people primarily speak Spanish.
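
Because it is the area (not the radius) that tracks the percentage, the radius has to grow with the square root of the value. A small sketch; the `scale` constant (the radius given to a 1% value) is a hypothetical tuning parameter.

```python
import math

def circle_radius(percentage, scale=2.0):
    """Radius of a circle whose AREA is proportional to `percentage`.

    Area = pi * r**2, so r must scale with sqrt(percentage).
    `scale` is a hypothetical constant: the radius used for a 1% value.
    """
    return scale * math.sqrt(percentage)

r10, r5 = circle_radius(10), circle_radius(5)
print(r10 / r5)                                # radius ratio, about 1.41
print((math.pi * r10**2) / (math.pi * r5**2))  # area ratio, about 2
```

So the 10% French circle has twice the area of the 5% Spanish one, but only about 1.4 times the radius.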

Language correlates well with some ethnicities (e.g. South Asian) but not others (e.g. African), in London. So some familiar patterns appear – e.g. a popular, and uniform, second language appearing at almost all Tower Hamlets stations. Remember, the map is showing language, not origin – so many of the “Portuguese” speakers, for instance, may be of Brazilian origin.

Click on each station name to see the other languages spoken locally – where at least 1% of local speakers registered them in the census. There is a minimum of 10 people to minimise small number “noise” for tube stations in commercial/industrial areas. In some very mono-linguistic areas of London (typically in Zone 6 and beyond the GLA limits) this means there are no significant second languages, so I’ve included just the most common language after English and no more, even where it is below 1% and/or 10 people.
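
The popup rules above can be sketched as a small filter. It assumes both limits must be met (at least 1% and at least 10 people) and that ties are left in input order; the counts are made up.

```python
def languages_to_list(counts, total, min_share=0.01, min_people=10):
    """Non-English languages to show in a station's popup, most spoken first.

    counts: {language: residents giving it as their main language}.
    total: all local residents, English speakers included.
    A language is listed if it reaches both 1% of residents and 10 people;
    the top non-English language is always kept, even below both limits.
    """
    others = sorted(((lang, n) for lang, n in counts.items() if lang != "English"),
                    key=lambda item: item[1], reverse=True)
    listed = [(lang, n) for lang, n in others
              if n >= min_share * total and n >= min_people]
    if not listed and others:
        listed = [others[0]]  # mono-linguistic area: show just the top one
    return listed

print(languages_to_list({"English": 900, "Polish": 50, "Bengali": 30,
                         "French": 8, "Tamil": 12}, total=1000))
print(languages_to_list({"English": 995, "French": 3, "German": 2}, total=1000))
```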

This measure reveals the most linguistically diverse tube station to be Turnpike Lane on the Piccadilly Line in north-east London, which has 16 languages spoken by more than 1% of the population there, closely followed by Pudding Mill Lane with 15 (though this area has a low population so the confidence is lower). By contrast, almost 98% of people living near Theydon Bois, on the Central Line, speak English as their primary language. English is the most commonly spoken language at every tube station, although at five stations – Southall, Alperton, Wembley Central, Upton Park and East Ham – the proportion is below 50%.

turnpikelane

A revealing map, and I will be looking at some other census aggregate tables to see if others lend themselves well to being visualised in this way.

I’ve also included DLR, Overground, Tramlink, Cable Car and the forthcoming Crossrail stations on the map. Crossrail may not be coming until 2018 but it’s very much making its mark on London, with various large station excavations around the capital.

The idea/methodology is similar to that used by Dr Cheshire for Lives on the Line. The metric was first highlighted by an interesting map, Second Languages, created by Neal Hudson. The map Twitter Tongues also gave me the idea of colour coding dots by language.

One quirk is that speakers of Chinese languages regularly appear on the map at many stations, but show as “Chinese ao” (all other) rather than Cantonese, whereas actually in practice, the Chinese community do mainly speak Cantonese (Yue) in London. This is likely a quirk of the way the question was asked and/or the aggregate data compiled. Chinese ao appears as a small percentage right across London, perhaps due to the traditional desire for Chinese restaurant owners to disperse well to serve the whole capital? [Update – See the comments below for an alternative viewpoint.]

The TfL lines (underground, DLR etc), station locations and names all come from OpenStreetMap data. I’ve put the collated, tidied and simplified data that appears on the map as GeoJSON files on GitHub GIST.

Conference Review: GIScience 2014

IMG_0953c

I was in Vienna for most of last week, presenting at a satellite workshop of the GIScience conference, before joining the main event for the latter part of the week.

GIScience is a biennial international academic conference, alternating between America and Europe, at the intersection of geography, GIS and information visualisation. It is very much academically focused, which contrasts strongly with FOSS4G (GIS technology), WhereCamp (GIS community) and the AGI (GIS business).

My highlights for this year’s conference:

  • Jason Dykes (City) gave a keynote on balancing geovisualisation and information visualisation. As ever with presentations from City’s GICentre unit, the graphics were presented by way of various live demos and compellingly explained.
  • UCL Geography/CEGE had a strong presence at the conference and various of my colleagues gave presentations, a number focusing on using geolocated social media, both as a tool for research (e.g. population synthesis) and as a subject of research in its own right. There was also an unveiling of LOAC (UCL/Liverpool), a classification specially built for London; further details on this will follow soon, as LOAC is signed off and rolled out.
  • Another UCL Geography presentation compared surname clustering and genotype clustering in the UK.
  • An interesting presentation from TU Eindhoven on automatically creating and simplifying network diagrams using circular arcs.
  • Automatic Itinerary Reconstruction from Texts (LIUPPA/Pau) – showed how a fairly accurate map can be made simply by scanning prose, and otherwise unknown locations of places can be roughly determined by their textual relations to other, known places.

Many of the talks appear in an LNCS proceedings book.

Outside of the conference, much Wiener Schnitzel and Gelato was consumed, and historic old Vienna was explored. A highlight was conference drinks in the huge barrelled halls underneath the very grand city hall.

IMG_0963ec

Mapping Geodemographic Classification Uncertainty

oxford_sed

I’m presenting a short paper today at the Uncertainty Workshop at GIScience 2014 in Vienna, looking at cartographic methods of showing uncertainty in the new OAC 2011 geodemographic maps of the UK, using textures and hatching to show the quality of fit of areas to their defined “supergroup” geodemographic cluster.

Mapnik was used – its compositing operations allow the easy combination of textures and hues from the demographic data and uncertainty measure onto the same tile, suitable for displaying on a standard online map.
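
Mapnik exposes compositing through a `comp-op` setting, with operators such as `multiply`. The sketch below does not use Mapnik. It just shows, in plain Python, what a multiply blend does to fully opaque pixels, which is one plausible way a greyscale hatching texture can darken a supergroup hue without replacing it. Which operator the paper actually used is not stated here, so multiply is an assumption, and the tiles are tiny made-up examples.

```python
def multiply(base, texture):
    """Multiply blend of two fully opaque RGB pixels (channels 0-255).

    White texture pixels leave the base colour unchanged;
    darker texture pixels darken it proportionally.
    """
    return tuple(round(b * t / 255) for b, t in zip(base, texture))

def composite_tile(hue_tile, texture_tile):
    """Blend a tile of supergroup colours with a same-sized texture tile."""
    return [[multiply(b, t) for b, t in zip(brow, trow)]
            for brow, trow in zip(hue_tile, texture_tile)]

hue = [[(200, 100, 50)] * 2] * 2                      # one flat supergroup colour
hatch = [[(255, 255, 255), (128, 128, 128)],          # alternating white/grey
         [(128, 128, 128), (255, 255, 255)]]          # "hatching" pattern
print(composite_tile(hue, hatch))
```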

These are my presentation slides (if you get a bandwidth message, try refreshing this webpage, or download here):

You can download a PDF of the short paper from here.

A special version of the OAC map, which includes the special uncertainty layers that you can see in the paper/presentation, can temporarily be found here. Use the extra row of buttons at the top to toggle on/off uncertainty effects, and see the SED scores at the bottom left, as you mouse over areas. Note that this URL is a development one and so likely to change/break at some point soon.

Background mapping is Crown Copyright and Database Right Ordnance Survey 2014, and the OAC data is derived from census data that is Crown Copyright the Office for National Statistics. Both are used under the terms of the Open Government Licence.

A Result/Turnout Correlation for the Scottish Independence Referendum?

graph_corr2

A final update to my Scottish Independence Referendum Data Map – the circle borders now show the turnout percentage, with the highest (>90%) as a solid green, the lowest showing as red.

There is a weak (R^2 = 0.177) negative correlation between the Yes vote %, and the Turnout %, suggesting that the Yes campaign had more difficulty in getting its supporters to vote on the day. This may be due to the traditional tendency for older voters to turn out more than younger ones, and the polls suggesting that younger people were more likely to vote Yes. (The BBC has more on the demographics of the Scottish voters.)
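
For reference, R² here is the square of Pearson's correlation coefficient r, and the sign of r gives the direction, so an R² of 0.177 with a negative relationship corresponds to r of about -0.42. A minimal sketch of the calculation (the per-council Yes/turnout figures are not reproduced here, so the inputs below are toy values):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Toy values, e.g. Yes % against turnout %
r = pearson_r([1, 2, 3, 4], [8, 6, 7, 3])
print(r, r ** 2)          # negative r; r ** 2 is the R^2
print(-sqrt(0.177))       # r implied by the post's R^2, about -0.42
```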

You can see this weak correlation on the map, with green borders (high turnout %) on red circles (low Yes %), and some of the bluer areas (high Yes %) having red borders (low turnout %), although East Dunbartonshire is a noticeable exception.

map_corr

OpenLayers 3

ol-logo

As a learning exercise, I’ve been trying to “migrate” my recent #indyref map from OpenLayers 2.13.1 to the very new version 3.0.0 of the popular mapping API. It seemed a good time to learn this, because the OpenLayers website now shows v3 as the default version for people to download and use. Much of my output in the last few years has been maps based on OpenLayers, so I have considerable interest in the new version. There are some production sites using OpenLayers 3 already – for example, the official Swiss map.

I use the term “migrate” in inverted commas, because, really, OpenLayers 3 is pretty much a rewrite, with an altered object model, and accordingly requires coding a new map from scratch rather than just changing a few lines. So far the conversion has taken me four times as long as creating the original map did, although that is an inevitable consequence of learning as I go along.

I’ll update this blogpost as I discover workarounds.

Shortcomings in v3 that I have come across so far:

  • No Permalink control. This is unfortunate, particularly as “anchor” style permalinks, which update as you move around the map, are very useful for visualisations like DataShine where people share specific views and places, and I can inject extra parameters in. The site linked above suggests this is a feature that should not be in the core mapping library, but instead an additional library can query/construct necessary parameters. Perhaps, but I think layer/zoom/lat/lon parameters are such a key part of a map (as opposed to other interactive content) that they still deserve to be treated specially.
  • The online documentation, particularly the apidoc, is very sparse in places. As mentioned above, there is also some mismatch between the functionality suggested in the online tutorials and what is actually available. Another example is the use of “font”, rather than “fontSize” and “fontStyle”, for styles. This will improve I am sure, and there is at least one book available on OpenLayers 3, but it’s still a little frustrating at this stage.
  • Label centering on the circle vectors is not as good as with OL 2. This is possibly due to antialiasing of the circle itself. You can see the labels “jump” slightly when comparing the two versions – see links below.
  • Much, much slower on my iPhone 4 (and also on a friend’s Android phone). This is not what I was expecting! This is the “killer” problem for me which means I’ve kept my map on OL 2 for now. Wrapping my vector layer in an Image layer is supposed to speed things up, but causes the layer not to display on my iPhone. Disabling the potentially expensive mousemove listener did not make a difference. Adding a viewport meta tag with width=device-width speeded things up a lot so that it was almost as fast as OL 2 (without the meta tag) but then I would need to rewrite my own UI for mobile – something I don’t need to do with the OL 2 version!
  • No support (yet) for UTFGrids. These are a form of vector tiles, for metadata rather than geographic features, which I use on the DataShine project.
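
Without a Permalink control, the anchor has to be built and read by hand. The logic is language-neutral; in the map itself it would live in JavaScript, reading and writing `window.location.hash`. Here it is sketched in Python, assuming a hypothetical `#zoom/lat/lon[/layer]` layout (similar in spirit to the hash openstreetmap.org uses, but not OpenLayers API):

```python
def make_permalink_hash(zoom, lat, lon, layer=None, decimals=4):
    """Build a hypothetical '#zoom/lat/lon[/layer]' anchor for the current view."""
    parts = [str(zoom), f"{lat:.{decimals}f}", f"{lon:.{decimals}f}"]
    if layer:
        parts.append(layer)  # extra parameter injected after the core three
    return "#" + "/".join(parts)

def parse_permalink_hash(fragment):
    """Split an anchor back into (zoom, lat, lon, layer); layer may be None."""
    parts = fragment.lstrip("#").split("/")
    if len(parts) < 3:
        raise ValueError("expected at least zoom/lat/lon")
    zoom, lat, lon = int(parts[0]), float(parts[1]), float(parts[2])
    layer = parts[3] if len(parts) > 3 else None
    return zoom, lat, lon, layer

print(make_permalink_hash(12, 51.5074, -0.1278))
print(parse_permalink_hash("#12/51.5074/-0.1278/age"))
```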

Things which I like about the new version:

  • Smooth vector resizing/repositioning when zooming in/out on a computer. (N.B. This is only when using a Vector layer and a Vector source, rather than Image layer with an ImageVector source that itself uses a Vector source.)
  • Attribution is handled better, it looks nicer.
  • No need to have a 100% width/height on the map div any more.
  • Resolution-specific styling. I’ve used this to hide the labels when zoomed out beyond a certain amount.
  • Can finally specify (in a straightforward fashion) a minimum zoom level.
  • Point coordinates and extents/bounds are specified in a much simpler way.
  • On a more general note, the new syntax is more complete and feels less “hacky”. The developers have taken the opportunity to do it “right” and remove inconsistencies, misplaced functionality and other quirks from the old version. For example, separating out visual UI controls and interaction management controls into two separate classes.
  • Drag-and-drop addition of KML/GeoJSON vector features. Example (use this file as a test).

Some gotchas, which got me for a bit, but I was able to solve:

  • You need to link in a new ol.css stylesheet, not just the Javascript library, in order to get the default controls to display and position correctly.
  • Attribution information is attached to a source object now, not directly to the layer. A layer contains a source.
  • Attribute-based vector styling is a lot more complicated to specify. You need to create a function which you feed in to an attribute. The function has to return a style wrapped in an array – this may be the closure syntax in Javascript that I have not come across before.
  • Hover/mouseover events are not handled directly by OpenLayers any more – but click events are, so the two event types require quite different setups.
  • Minor differences between the debug and regular versions of the library. The example I noticed is that the debug version allows ol.control.ScaleLineUnits.METRIC to be specified as an attribute for the ScaleLine control, but the non-debug version needs to use an explicit string “metric”.
  • No opacity control on individual styles – only on layers. This means I can’t have the circles with the fill at 80% opacity but the text at 100% opacity. Opacity can be set on styles, but has to be specified as part of the colour, in RGBA format (where A is the alpha, i.e. opacity, you want) rather than as a separate attribute. This is contrary to the tutorials on the website. Layer opacity can continue to be specified as a separate attribute.
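
Since per-style opacity has to travel inside the colour, a small helper that turns a hex colour plus an opacity into a CSS `rgba()` string saves repetition. It is sketched here in Python for clarity; in the map the same logic would be a JavaScript function, and the result would go wherever a style expects a colour (OpenLayers 3 styles accept CSS colour strings as well as `[r, g, b, a]` arrays).

```python
def rgba(hex_colour, opacity):
    """Turn '#rrggbb' plus an opacity (0-1) into an 'rgba(r, g, b, a)' string."""
    h = hex_colour.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a colour in #rrggbb form")
    if not 0 <= opacity <= 1:
        raise ValueError("opacity must be between 0 and 1")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {opacity})"

print(rgba("#ff0000", 0.8))  # circle fill at 80% opacity
print(rgba("#000000", 1))    # label text left fully opaque
```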

My OpenLayers 3 version of the #indyref map is here – compare with the OpenLayers 2 one. Note that, since first writing this blogpost, I’ve subsequently updated the OpenLayers 2 one to change the cartography there further.

Scottish Independence Referendum: Data Map

indyref

Scotland’s population is heavily skewed towards the central belt (Glasgow/Edinburgh) which will affect likely reporting times of the independence referendum in the early hours of Friday 19 September, this being dependent both on the overall numbers of votes cast in each of the 32 council areas, and the time taken to get ballot boxes from the far corners of each area to the counting hall in each area. Helicopters will be used, weather permitting, in the Western Isles!

There is also likely to be significant variation in the result that each area declares – with regions next to England (so dependent on trade with it) and furthest away from it (so benefiting most from support) likely to vote strongly “No”, the major cities being difficult to call, and the rural areas and smaller, less affluent cities of the central belt much more likely to vote “Yes”. Note that unlike a constituency election, which is “first past the post” in each area, the referendum is a simple sum-total for everyone, so while it will be interesting hearing each individual result, ultimately we won’t know the outcome until almost every area has declared and the lead for one side becomes unassailable (areas will declare the size of the vote well before the result, which will make this possible).

A screenshot of a table, in a report “Scotland referendum: Looking through the mist” from the Credit Suisse Economics Research unit, was circulating on Twitter a couple of days ago:

It has estimates on all three of these metrics, so I’ve taken this, combined it with centroids of each of the council areas, and produced a map. Like many of my maps these days, coloured circles are the way I’m showing the data. Redder areas are more likely to vote no, and larger circles have a larger registered population. The numbers show the estimated declaration times. Looks like I’ll be up all night on Thursday. Mouse over a circle for more information.

View the live #indyref map here.

ps. I’ve subsequently got hold of a copy of the report concerned. To quote the methodology for determining the “Yes” rating, it’s

“derived from support for the Scottish National Party in the 2012 local elections. We… show a range from 0 (the lowest local vote [share] for SNP in 2012, excluding Orkney and Shetland where the vote was negligible) to 10 (highest local vote share for SNP).”

This implies the Orkney/Shetland results were not used in the 0-10 scaling, as their very low results for the SNP overly skewed the metric.
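
The quoted method amounts to a min-max rescaling to 0-10 in which the outlying areas are left out when finding the minimum. A sketch with made-up vote shares; clamping the excluded areas to the 0-10 range is my assumption, since the report extract doesn't say how they were scored.

```python
def scale_0_to_10(shares, exclude=()):
    """Rescale vote shares to 0-10, taking min/max only from non-excluded areas.

    Excluded outliers are still scored, but clamped into 0-10 (an assumption).
    """
    basis = [v for area, v in shares.items() if area not in exclude]
    lo, hi = min(basis), max(basis)
    scaled = {}
    for area, v in shares.items():
        score = 10 * (v - lo) / (hi - lo)
        scaled[area] = max(0.0, min(10.0, score))
    return scaled

shares = {"A": 20.0, "B": 30.0, "C": 40.0, "X": 5.0}  # made-up SNP shares, X = outlier
print(scale_0_to_10(shares, exclude={"X"}))
print(scale_0_to_10(shares))  # including X squeezes the other scores upwards
```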