Data Graphics Mashups Technical

Fewer Cities, More Cities

Some bad news and good news about the Bike Share visualisation.

The bad news – the operator behind the schemes in Paris, Seville, Vienna, Dublin, Brussels, Valencia and Toyama asked me to stop getting the current bike share data from their websites. Although I was just loading their webpages, “in practice you are extracting data from [the operator’s] databases and re-utilising it” and “[the] databases are protected under the harmonised sui generis database right, as provided under Directive 96/9/EC: chapter III article 7 (1) and (2).”

For these seven cities, you can still see a historical snapshot from last Monday, when the feeds were switched off, but not the live status, historical animation or trend graphs.

This is despite a quick search on the web revealing a six-month collection of data for one of the schemes (at four minute intervals), the resulting trends being shown at a conference; a better-service campaign website, again for one of the schemes, with regularly updated performance tables; and an iPhone app pulling in the data from numerous schemes run by the operator, amongst others.

Digital Urban also mentioned this in the context of Bike-o-Meter, which uses the aggregated data from my Bike Share maps.

Now for the good news – I’ve added in five more cities – Rennes, Bordeaux, Zaragoza, Mexico City and Rio de Janerio. Yay! The inclusion of Mexico City and Rio should hopefully counter some claims of an European/English-speaking bias! Mexico City’s scheme appears to be concentrated in one very affluent district of the metropolis, while Rio’s is based on the seafront south of the city, rather than in the main urban area.

Rennes is a particularly interesting example, more about that shortly.

[Update – turns out I’m not the first.]


English Counties and UAs in One

Great Britain’s administrative geography is rather complicated, particularly for England – some English areas are “three tier”, made up of counties which are subdivided into districts, and others are “two tier” consisting of unitary authorities. Then there’s London’s boroughs which are in a special category of their own as part of an authority.

The Ordnance Survey Open Data release (easy download page here) includes BoundaryLine, which includes the geography data file for the counties, and a separate for the districts, UAs and boroughs. The latter is complete (and also includes the Scottish and Welsh regions), but the former looks rather strange on a map, with “islands” of counties separated by a “sea”.

I received a request by someone who was interested in having a unified file, at county level for the non-GLA counties, but including the UAs and London boroughs to “fill in” the map. I’ve made such a file by doing a dissolve in Quantum GIS (the districts having the county name as an attribute), and it can be downloaded here (15MB zipped shapefile.) The data is derived from and therefore covered by the OS Open Data licence which requires simply that the original source must be attributed when using it – that is, the data contains Ordnance Survey data © Crown copyright and database right 2010.

The image above is showing the merged data, with the unmerged district data (dotted lines) superimposed.


Easy Webpage Scraping with Python

To produce the tube station usage mashup I obtained the data from the TfL website. Unfortunately the data is not in an immediately usable format – rather than there being a CSV file to download, or a large HTML table, the data is presented as a separate webpage for each station and each year.

Luckily, Python makes it easy to get the data as a CSV file – although you do need to know a little Regex too, to extract the data you want. To construct the regular expressions needed, I used an excellent online tool, RegExr.

Once you have your regular expressions ready, you just use Python’s Urllib, RE and CSV libraries, and some loops, to download the webpages, get the data, and write it into a CSV file.

Here’s the script I used – note I’m using the back-slash character at the end of some lines below to indicate line continuation:

import urllib2, re, csv

stationnums = {2003:4, 2004:4, 2005:4, 2006:4, 
2007:4, 2008:6}

addressPre = "

indRE = '.*?salign=right>([0-9]{1,9}?)</td>.*?'
totalRE = '.*?smillions)s=s([0-9.]{1,9}?)</strong>.*?'
nameRE = '.*?selected>(.*?)</option>'

resFile = open('results.csv', 'w')
resWriter = csv.writer(resFile, quoting=csv.QUOTE_MINIMAL)

for i in range(2003, 2009):
	for j in range(1, stationnums[i]+1):
		address = addressPre + "?id=" + str(j) 
		+ "&agekey=" + str(i)
		html = urllib2.urlopen(address).read()
		indRes = re.findall(indRE, html)
		totalRes = re.findall(totalRE, html)
		nameRes = re.findall(nameRE, html)
		if len(nameRes) > 0:
			resWriter.writerow([i, j, 
			nameRes[0], totalRes[0]] 
			+ [e for e in indRes])

Change the stationnums values for each year to 304 (except 2008, to 306) to get all the data.

Data Graphics Technical

Forests of Great Britain

From the OS OpenData’s Meridian 2 dataset, which was released under a free licence today, here is the extent of forest cover across Great Britain – there is a dedicated polygon shapefile within the distribution showing just this:

As a general rule for orienteering, areas with good forest cover have the best orienteering maps. Scotland and Wales beat England hands down for cover, although Surrey’s doing not too badly at all. The rule doesn’t always hold though – there’s a big patch near Cambridge – more so, it appears than the Lake District, but the latter is considerably finer for orienteering in.

The map contains Ordnance Survey data © Crown copyright and database right 2010.

Mashups Technical

OS OpenData is here

It’s the first of April – but it’s not an April Fool – lots of Ordnance Survey medium-scale data has been released today, under a licence compatible with Creative Commons’ Attribution, i.e. you can do what you like with it as long as you attribute and don’t misrepresent the data source.

The best mirror for the data I’ve found is at MySociety – the OS’s own servers have been apparently overloaded since the release went live.

The first use I’ve made of the data is taking the “CodePoint Open” set of postcodes and locations, I’m now using this data as the postcode lookup for OpenOrienteeringMap. If you type in a UK postcode there, it should now take you to exactly the right place. Before, the lookup was using NPEmap data, which was pretty good in general, but someone did spot some glaring errors when they were using it, coincidently, yesterday.

The attribution statement, by the way, can be seen by mousing over the “Jump to Postcode” text.

OpenStreetMap Technical

Map Adornments with Cairo

For my OpenOrienteeringMap service, I have a added PDF creation facility, which produces a map, ready to print, with a title, north arrow, club logo and link back to the website.

The map itself is rendered in Mapnik, which uses Cairo, using Python and the pycairo bindings. To add the adornments, I’ve also made use of these bindings, and use them at the same time.

The adornments are shown above highlighted in yellow, and the purple control circles and numbers are also added directly using Cairo, rather than being rendered as geospatial objects in Mapnik. This blog post concentrates on the ones at the top of the sheet.

Here’s part of the Python script used to produce the PDF. S2P is a constant used to convert from metres (i.e. map units) to points (i.e. screen/paper units) and its value is 72/0.0254 (72 points per inch, 1/2.54 inches per cm).

I’ve generally use capitals for the names for the various positioning values, with a sort of naming convention – “W” is width, “WM” is west (i.e. left) margin. For example, ADORN_L_W is the width of the logo adornment, in metres. The “surface” sheet is made up of various “contexts” – in effect content boxes, which are positioned and scaled onto the surface, and filled with the content.

We need to import some modules:

import tempfile
import mapnik
import cairo
import urllib

Firstly, set up the PDF sheet, or “surface”, specifying its size:

file = tempfile.NamedTemporaryFile()
surface = cairo.PDFSurface(, PAPER_W*S2P, PAPER_H*S2P)

It’s important to specify the sizes accurately so that printers will print without trying to reduce the PDF, so rendering the scale inaccurate. For example, A4 landscape needs to have a PAPER_W of exactly 0.2970 and PAPER_H of 0.2100.

Then set up the map element – note Mapnik requires integers for the widths and heights:

map = mapnik.Map(int(MAP_W*S2P), int(MAP_H*S2P))
mapnik.load_map(map, styleFile)

Create a context on the surface to draw the map onto, and shift it to allow for margins on the page.

ctx = cairo.Context(surface)
ctx.translate(MAP_WM*S2P, MAP_NM*S2P)
mapnik.render(map, ctx)

Then, just add each adornment in the right place. First the title – the text has been supplied in the URL so is decoded first:

text = urllib.unquote(title)

Then, write the title onto the surface using show_text. Obviously, the server needs to have the fonts installed – I was using Deja Vu Sans initially, but Arial is used for the “regular” Street-O maps that I’m mimicing, so I installed the Microsoft Truetype Core Fonts for Linux:

ctx = cairo.Context(surface)
ctx.select_font_face("Arial Black", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
ctx.translate(MAP_WM*S2P, (MAP_NM-ADORN_T_SM)*S2P)

The image above – and the ones below – are shown at 150% of their actual size on screen.

Now to add a scale caption:

ctx = cairo.Context(surface)
ctx.select_font_face("Arial", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
text = "Scale 1:" + str(scale)
width = ctx.text_extents(text)[4]
ctx.translate((MAP_WM+MAP_W)*S2P-width-(ADORN_A_W+ADORN_L_W)*S2P, (MAP_NM-ADORN_S_SM)*S2P)

..and below it, a scalebar and indicator:

ctx = cairo.Context(surface)
ctx.select_font_face("Arial", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
scaleBarMetres = 500
if scale < 10000:
    scaleBarMetres = 200
text = str(scaleBarMetres) + "m"
width = ctx.text_extents(text)[4]
barCaptionX = (MAP_WM+MAP_W-(ADORN_A_W+ADORN_L_W))*S2P-width
ctx.translate(barCaptionX, (MAP_NM-ADORN_S_SM)*S2P)

..the scalebar itself, making use of stroke:

scaleBarW = scaleBarMetres/float(scale)
ctx.move_to((-scaleBarW-ADORN_S_PADDING)*S2P, 0)
ctx.rel_line_to(0, -ADORN_S_LARGETICK*S2P)
ctx.rel_line_to(0, ADORN_S_LARGETICK*S2P)
ctx.rel_line_to(scaleBarW*S2P/2, 0)
ctx.rel_line_to(0, -ADORN_S_SMALLTICK*S2P)
ctx.rel_line_to(0, ADORN_S_SMALLTICK*S2P)
ctx.rel_line_to(scaleBarW*S2P/2, 0)
ctx.rel_line_to(0, -ADORN_S_LARGETICK*S2P)

The north-arrow is done in a similar way. close_path is used to create a proper triangle, with no “ends” that might result in ugly capping effects. fill makes it solid.

ctx = cairo.Context(surface)
ctx.translate((MAP_WM+MAP_W-ADORN_L_W)*S2P-width, CONTENT_NM*S2P)
ctx.move_to(0, 0)
ctx.line_to(0.001*S2P, 0.002*S2P)
ctx.line_to(-0.001*S2P, 0.002*S2P)

The “N” below the arrow is also drawn with lines, but using round line joins and line caps to produce a smooth letter.

ctx.move_to(0, 0.001*S2P)
ctx.line_to(0, 0.008*S2P)
ctx.move_to(-0.001*S2P, 0.005*S2P)
ctx.rel_line_to(0, -0.002*S2P)
ctx.rel_line_to(0.002*S2P, 0.002*S2P)
ctx.rel_line_to(0, -0.002*S2P)

Finally, a logo is added. This is a bit trickier – the logo is a PNG, but the surface is a PDF. The way I got around this is to create a temporary ImageSurface and then switch the surface that the context is on – the graphic also gets appropriately scaled:

logoSf = cairo.ImageSurface.create_from_png(home+"/logo.png")
ctx = cairo.Context(surface)
width = logoSf.get_width()*ADORN_L_SCALE
ctx.translate((MAP_WM+MAP_W)*S2P-width, CONTENT_NM*S2P)
ctx.set_source_surface(logoSf , 0, 0)

Finally, putting it all together:

return file


A Google Streetview History

Inspired by Alex – all the places I’ve ever lived:

Google’s UK streetview coverage isn’t quite ubiquitous – a couple of the images are snapped looking across fields from the nearest main road, and one of the images is pointing in the correct direction, but the house is completely hidden by all the trees in that general area.

The New York one should be pretty easy to spot, as well as my move to London and its terraced housing. The last picture is of my parent’s new house – the Google camera seems to have fallen onto its side when driving down this particular steep and twisty road.


Prettier Maps with Mapnik 0.7.0

[Updated] The Mapnik 0.7.0 release last month has added a number of features, including one that makes choropleth maps, and others with numerous adjacent polygons, more pretty. The PolygonSymbolizer has a new parameter, gamma, which, when set to a value between 0 and 1, causes the polygon edges to bleed into each other slightly, removing distinctive hairlines that allow the background colour to “shine through” the joins.

Here’s a before-and-after of a choropleth of England & Wales. We don’t care much for the boundaries, as they are Middle-Level Super Output Areas (MSOAs), which are administrative boundaries that people are not familiar with, unlike say county boundaries. So, using the gamma parameter, we can hide them.

Mapnik 0.6

Mapnik 0.7

The new parameter is also useful for my OpenOrienteeringMap project, particularly to make large estuaries, which are typically made up of multiple polygons, appear contiguous. However, applying too low a value appears to affect the anti-aliasing of line features, even ones away from the affected polygons. Here, the gamma is just applied to the “water” style. Note the thin white line crossing the water, near the black dotted line that also crosses it, disappears, but the paths and roads start to distort too:

Mapnik 0.6 – thin white line running across the river.

Mapnik 0.7+patch (Gamma 0.8) – the thin white line has gone. Other features are unchanged.

The map tile, by the way, is showing the Greenwich Foot Tunnel crossing the River Thames in east London.

[Update – The crossed-out section highlights a bug that has now been fixed by the Mapnik development team. I’ve patched my Mapnik build and the gamma changes now only appear where expected.]

OpenStreetMap Technical

Converting OS Eastings/Northings to Grid References in Python

[Updated] I needed to programatically convert a series of Ordnance Survey easting and northings (e.g. 325940, 673060) to “six-figure” grid references (e.g. NT259731) for a project I’m currently working on. It’s a pretty straightforward conversion – no reprojection, just a different way of expressing the same position on the British National Grid.

The specific use is that the 1km x 1km tiles from NPEMap are coding according to eastings & northings, but the calibration files required by TimSC‘s warp-gbos require grid references for the corners and calibration points.

Such a procedure is already well catered for in Perl, PHP and Javascript. Here is a straightforward conversion of some Javascript I found here into Python:

# Derived from # def getOSGridReference(e, n): import math # Note no I gridChars = "ABCDEFGHJKLMNOPQRSTUVWXYZ" # get the 100km-grid indices e100k = math.floor(e/100000) n100k = math.floor(n/100000) if e100k6 or n100k12: return '' # translate those into numeric equivalents # of the grid letters l1 = (19-n100k)-(19-n100k)%5+math.floor((e100k+10)/5) l2 = (19-n100k)*5%25 + e100k%5 letPair = gridChars[int(l1)] + gridChars[int(l2)] # strip 100km-grid indices from easting & northing, # round to 100m e100m = math.trunc(round(float(e)/100)) egr = str(e100m).rjust(4, "0")[1:] if n >= 1000000: n = n - 1000000 # Fix Shetland northings n100m = math.trunc(round(float(n)/100)) ngr = str(n100m).rjust(4, "0")[1:] return letPair + egr + ngr # test print getOSGridReference(96000,906000) # NA960060 print getOSGridReference(465149, 1214051) # HP651141  

[Update – fixed a bug that gave the wrong grid references for the Shetland Islands – their northings have seven figures, and similarly places in the far west or south that only have five figure easting or northings respectively :oops:]


Simple Choropleth Maps in Quantum GIS

It’s straightforward to create attractive choropleth maps in Quantum GIS, but there are a few things that can trip you up in the process.

The choropleth map I want to show is the distribution of deliberate fire-starting across London. A more advanced analysis would weight by ward area or population size, but I’m just showing the raw results, on the probably flawed assumption that wards in London have approximately equal populations and don’t vary in size hugely. Here’s how to do it, in Quantum GIS 1.4 “Enceladus” which was released a few days ago.

1. My boundary spatial data is a shapefile of the wards of London, which Quantum GIS can load without a problem, but my statistical data is in the form of a CSV file, showing the numbers of deliberate fires in each ward. It’s supplied by the new London Datastore, and CSV should not be a problem, but unfortunately the “Join attributes” functionality in Quantum GIS needs the data in DBF format.

Annoyingly, the most recent versions of Microsoft Office applications do not allow you to save data files as DBFs. However, OpenOffice will allow the conversion.

The big gotcha is that Quantum GIS is quite happy to load the CSV file as a layer, and will not complain if you select it as a vector layer to join in the “Join attributes” dialog box. It will simply go ahead and produce a result shapefile containing no data. This was the monitor-throwing bit. You have to instead select “Join dbf table”, and as the caption suggests, it needs to be a DBF.

There is also a bug in the “Join attributes” dialog box – available table columns are not removed when you select different layers in the drop downs, they just get appended to the list, so be careful to select the correct one.

I’ve used the “Continuous Color” option for the symbology setting as this allows me to quickly change the colours and remove the outlines – using “Graduated Symbol” would be a more authentic choropleth as the map would show discrete colours for each grouping.

Here is the resulting choropleth map, with the default adornments. Looks rather nice!