Categories
Data Graphics Mashups OpenLayers OpenStreetMap

CensusGIV Prototype Presentation for CASA

My boss (Dr Pablo Mateos) and I gave this presentation today as part of this term’s CASA Seminar series here at UCL. My bit starts at slide 22 (of 60! – we just about managed it in the hour but only by rushing at the end.)

chorogen
CensusGIV – Geographic Information Visualisation of Census Data

Note that the censusprofiler site mentioned a couple of times in the presentation only has a (very out of date) blog on it at the moment, and the prototype itself is not yet available for general use – with luck, an alpha version will be available to play with by the end of the year.

Categories
Data Graphics

Flow Map

While searching for a way to visualise flows between schools and universities in England, I came across this excellent bit of software by Doantam Phan et al from Stanford: Flow Map (paper). It was presented at InfoVis 2005.

It’s a Java application, simple to install, and with the addition of an appropriately formatted data file, you can have a flow map up quickly. The software allows points to be moved around to declutter the visualisation – allowing a good balance between geographic accuracy and clarity.

Here are the English universities that Southend-on-Sea school pupils choose to go to – N and W England cropped out for simplicity:

sthdaa_crop

London and the other multi-university cities are “exploded” but the universities retain their relative geographic positions.

Categories
Data Graphics

Data Graphics and Beer Don’t Mix

Here’s an example of an outstandingly misleading data-graphic, appearing in this week’s LondonStudent freepaper.

beergraphic

It attempts to show the disparity of bar staff pay across London universities, but:

  • The “empty” pint glass does not correspond to £0.00. To the casual observer it makes it look like CSSD work for free, until the figures are noticed. In fact, the text of the article mentions that, at two other London university bars, the staff do in fact work for free (or for beer – ouch) – but these are not shown on the graphic.
  • The graphic is a 2D (i.e. print) representation of a 3D object (a pint glass, tilted slightly towards the viewer) but the scale appears to vary in 1D – the values form a straight line across the “glass”. Hence the graphic has a large “Lie Factor”, the concept discussed in detail in E. Tufte’s totemic book The Visual Display of Quantitative Information (p57 for those making notes!)
  • LSE’s amount bizarrely isn’t represented at all in the graphic, but appears in a text box above it.
  • The numbers are on their side, even though there is plenty of room to show them horizontally – making the real values harder to read, so the reader concentrates on the misleading graphic representation instead.
  • The actual levels don’t bear any resemblance to the values – the ordering is correct but the relative value differences don’t correspond to the “beer levels”. For example, the drop between £5.90 and £5.95 is larger than the £5.95 to £6.25 drop.
  • Why’s beer being used to represent pay anyway?
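
The “Lie Factor” mentioned above is the ratio of the size of the effect shown in the graphic to the size of the effect in the data. Here is a minimal sketch of that calculation, using two of the pay values quoted above. The drawn “beer levels” are hypothetical numbers chosen purely for illustration, not measurements taken from the graphic.

```python
def effect_size(first, second):
    """Relative change between two values, as a fraction of the first."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by effect present in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


# Two hourly pay values quoted in the list above (pounds).
pay_low, pay_high = 5.90, 6.25

# HYPOTHETICAL drawn heights of the two "beer levels" (mm above the
# bottom of the glass); these are assumptions, not measured values.
drawn_low, drawn_high = 20.0, 60.0

factor = lie_factor(pay_low, pay_high, drawn_low, drawn_high)
print(f"Lie factor: {factor:.1f}")
```

A value of 1 would mean the graphic is faithful to the data. Because the empty glass does not correspond to £0.00, even a modest pay difference can produce a very large factor.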
Categories
Data Graphics Mashups

HEFCE Funding Map

This is the sixth in a series detailing the projects I have worked on at UCL in the last academic year.

hefce

This was a quick mashup to show on a map the latest HEFCE funding round – HEFCE is the government body that decides and awards research funding to the universities around the UK.

This is a vector-based mashup, once again using OpenLayers. For each point, representing a university’s “main” campus, I request a pie-chart from the Google Charts API, and use the resulting image as the marker for the point. There’s no simplification or other generalisation, so, for example, you’ll need to zoom in quite far if you want to make out the different universities in London.
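
As a rough sketch of the marker technique, this is how a pie-chart image URL could be assembled per university. It assumes the legacy Google Image Charts URL format (since deprecated by Google), and uses only the `cht`, `chs` and `chd` parameters. The base URL, the function name and the funding figures are assumptions for illustration, not taken from the mashup itself.

```python
from urllib.parse import urlencode

# Assumed base URL of the legacy Google Image Charts service.
CHART_BASE = "https://chart.googleapis.com/chart"


def pie_marker_url(values, size_px=60):
    """Build a pie-chart image URL; values are rescaled to percentages."""
    total = sum(values)
    shares = ",".join(f"{100 * v / total:.1f}" for v in values)
    params = {
        "cht": "p",                     # chart type: pie
        "chs": f"{size_px}x{size_px}",  # image size in pixels
        "chd": "t:" + shares,           # data in text encoding (0-100)
    }
    # Keep ':' and ',' unescaped so the URL stays readable.
    return CHART_BASE + "?" + urlencode(params, safe=":,")


# HYPOTHETICAL funding split for one university (three funding streams).
url = pie_marker_url([30, 10, 10])
print(url)
```

The resulting URL would then be used as the image for that university's point marker.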

It was cobbled together in about a day; the Thematic Mapping blog was particularly useful for getting the images working as markers.

You can see the mashup here.

Categories
Data Graphics Mashups OpenLayers

HE Profiler

This is the fourth in a series detailing the projects I have worked on at UCL in the last academic year.

The HE Profiler is the last of the three “core” school-profiling map mashups that I have developed over the last year – this has been developed over the last few weeks and indeed was finished only today, my final project of the year.

It is designed to be used by university widening-participation administrators, as a graphical tool to discover and evaluate the schools to target for campaigns to encourage university application. To do this, it makes use of two metrics – the OAC demographics of pupils attending each school, and the POLAR score of their postcode – in simple terms a National Statistics demographic describing the likelihood that people from this postcode go to university.

Again it is powered by OpenLayers, displaying point-based vector information on top of Google Maps image tiles, using NPE data for geocoding postcodes. The most interesting thing about this application is that I’ve started to explore the very powerful rule/attribute based symbolisation for points available in OpenLayers. This sort of symbolisation will be, I expect, very useful in my next year’s project. I am very impressed with what can be done – some quite GIS-like properties present in a popular and freely available web application.

heprofiler

The graphic above shows target schools for a central-London university, based on the proportion of POLAR1/2 pupils (least likely to go to university) compared with the rest. Schools with a majority of pupils in this category are coloured red. The area of each circle represents the number of such pupils present. The poor representation at university of the Thames Gateway region can be clearly seen. As an aside, the OAC demographic, not shown here, does not work well for London due to its size – the OAC is calibrated across the whole UK, and a more London-specific classification (e.g. LOAC) would probably be more useful for schools there.
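
The two symbolisation rules described above (red for a POLAR1/2 majority, circle area proportional to the number of such pupils) can be sketched in a few lines. This is an illustration in Python rather than the actual OpenLayers configuration. The dictionary keys merely echo OpenLayers style property names, and the grey fallback colour, the scale constant and the school data are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class School:
    name: str
    polar_low: int  # pupils in POLAR groups 1-2
    total: int      # all pupils considered


def symbol_for(school, radius_per_sqrt_pupil=1.5):
    """Return a style dict for one school's point marker."""
    share = school.polar_low / school.total
    # Rule 1: a majority of POLAR1/2 pupils turns the marker red.
    colour = "red" if share > 0.5 else "grey"  # grey is an assumed default
    # Rule 2: AREA (not radius) tracks the pupil count, so the
    # radius grows with the square root of the count.
    radius = radius_per_sqrt_pupil * math.sqrt(school.polar_low)
    return {"fillColor": colour, "pointRadius": radius}


# HYPOTHETICAL schools for illustration.
big = symbol_for(School("School A", polar_low=64, total=100))
small = symbol_for(School("School B", polar_low=16, total=100))
print(big, small)
```

Scaling the radius by the square root is what keeps a school with four times as many pupils at four times the circle area, rather than sixteen.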

Categories
Data Graphics Technical

Spatial Interaction Modelling for Access to Higher Education

This is the first in a series detailing the projects I have worked on at UCL in the last academic year.

My main project through the last year has been to test a hypothesis, developed by Professor AG Wilson, that the flows of students moving from school to university can be approximated by spatial interaction modelling (SIM). Put simply, SIM is a variant of the 300-odd year old Newton’s Law of Universal Gravitation, i.e. the attraction between two bodies depends on each of their masses and the distance between them. Replace the masses by the number of final-year pupils at a school, and a university’s capacity, and make the distance decay exponential instead of inverse-square, and that’s the basics of the model. A similar theory has been applied to great effect by Joel Dearden of CASA, in his retail SIM, which has shown a “tipping point” explaining how supermarkets and out-of-town retail developments have become attractive to shoppers over the last forty years.
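
To make the basic idea concrete, here is a minimal sketch of such a model with exponential distance decay. It assumes the common origin-constrained form, in which each school's predicted flows are normalised to sum to its number of final-year pupils. That normalisation, the function name and all the numbers below are assumptions for illustration, not the model actually tested.

```python
import math


def predict_flows(pupils, capacities, distances, beta):
    """Predict school-to-university flows.

    pupils[i]       final-year pupils at school i
    capacities[j]   capacity of university j
    distances[i][j] distance from school i to university j
    beta            distance-decay parameter (larger = more local)
    """
    flows = []
    for o_i, d_row in zip(pupils, distances):
        # Attraction of each university: capacity damped by distance.
        weights = [w * math.exp(-beta * d) for w, d in zip(capacities, d_row)]
        total = sum(weights)
        # Share out this school's pupils in proportion to attraction.
        flows.append([o_i * w / total for w in weights])
    return flows


# HYPOTHETICAL example: two schools, two equally sized universities.
flows = predict_flows(
    pupils=[100, 50],
    capacities=[1000, 1000],
    distances=[[5, 50], [50, 5]],  # km
    beta=0.05,
)
for row in flows:
    print([round(t, 1) for t in row])
```

Each school sends most of its pupils to the nearer university, and increasing `beta` sharpens that preference.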

Of course, it’s a little more complicated than that, and even with the more complex model I’ve tested, a large number of simplifying assumptions have to be made.

The two main extra parameters added to the model are (1) an “attractiveness factor” for universities, above and beyond their size, for which I have used one of the common university league tables; and (2) a distance-decay that is not uniform across all types of school students, but varies by their background. By splitting up the final-year school students by demographic, the variation in the distance-decay can be seen, and this is used to calibrate the model.

simdecay2b

The seven OAC demographic supergroups are shown here – the horizontal scale is distance and is the same in each graph. (Only English-based school students going to English universities are considered in the study.) The vertical scale is the proportion of students, of that OAC supergroup, in each distance bucket. The actual number of students in each supergroup varies dramatically and this is not shown in the graphs.

The graphs show there is indeed considerable variation between supergroups in the “beta value” of the drop-off if approximated as exponential, and also in the “R-squared” fit to true exponential decay.

  1. Blue collar.
  2. City living – this group strongly favours London, Birmingham and Manchester, i.e. the same or other “big cities” in England, hence characteristic peaks appear at these distances – accentuated by the relatively small school-age population in this group.
  3. Countryside – this group rises before falling, as there is a minimum distance they need to travel to get to even their nearest university.
  4. Prospering suburbs – the lowest beta-value, in other words this group attaches the least importance to school-university distance.
  5. Constrained by circumstance – similar to the first group.
  6. Typical traits – the “average” group which encouragingly also has an average looking graph.
  7. Multi-cultural – more distance-sensitive than the others – hence the very steep drop-off. This shows that people living in areas classified as multi-cultural will more strongly desire going to a university that is very local to their home.
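
For readers wondering how a “beta value” and an “R-squared” can be obtained from distance buckets like these, one common approach is ordinary least squares on the logarithm of the proportions. The sketch below assumes that approach. Whether the graphs above were fitted this way is not stated, and the sample data is synthetic.

```python
import math


def fit_exponential_decay(distances, proportions):
    """Fit proportion ~ a * exp(-beta * distance) by log-linear regression.

    Returns (beta, r_squared), with R-squared measured on the log scale.
    Buckets with a zero proportion must be dropped first (log of zero).
    """
    xs = list(distances)
    ys = [math.log(p) for p in proportions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# SYNTHETIC data: a perfect exponential with beta = 0.04 per km.
distances = [10, 30, 50, 70]
proportions = [0.5 * math.exp(-0.04 * d) for d in distances]

beta, r_squared = fit_exponential_decay(distances, proportions)
print(f"beta = {beta:.3f}, R-squared = {r_squared:.3f}")
```

A group like Countryside, which rises before it falls, would show up here as a low R-squared, since its curve is not a simple decay.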

Prof Wilson’s theory also factors in the subject that the student is studying (not all universities offer all subjects, and most are strong in certain subjects and weak in others), and their attainment at school (i.e. they might really want to study Maths at Oxford, and be at a school very near by, but if they get a D in Maths at A-Level, they aren’t going to be able to do that.)

Universities also come in two types – “recruiting”, where there are more places than students genuinely intending to study there, and “selective”, where there are more prospective students than places. One interesting effect of the recent economic downturn is the massive increase in people applying for university in 2009-10 – UCL saw a 12% increase for undergraduate courses, for example. This has had the effect of making more universities selective.

In order to consider two types in the same model, it was necessary to develop what is known as a “partially constrained” SIM. The details are for a future article, but, put simply, an iterative approach, assigning students to a university and then reassigning the weakest for over-capacity universities, is taken.
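
Pending that article, here is one possible reading of the iterative step, as a sketch only. It assumes each student has an ordered list of universities (for example, ranked by predicted attraction) and an attainment score, and that “weakest” means lowest attainment. Those assumptions, the function name and the data are hypothetical and may differ from the actual method.

```python
def assign_students(students, capacities):
    """Assign students to universities, bumping the weakest when full.

    students    list of (student_id, attainment, preferences), where
                preferences lists university codes, best first
    capacities  dict of university code -> number of places
    Students who exhaust their preferences are left unplaced.
    """
    info = {sid: (attainment, prefs) for sid, attainment, prefs in students}
    next_choice = {sid: 0 for sid in info}
    placed = {uni: [] for uni in capacities}
    unplaced = list(info)

    while unplaced:
        sid = unplaced.pop()
        _, prefs = info[sid]
        if next_choice[sid] >= len(prefs):
            continue  # no universities left to try
        uni = prefs[next_choice[sid]]
        next_choice[sid] += 1
        placed[uni].append(sid)
        # Over capacity: release the weakest student to try elsewhere.
        if len(placed[uni]) > capacities[uni]:
            weakest = min(placed[uni], key=lambda s: info[s][0])
            placed[uni].remove(weakest)
            unplaced.append(weakest)
    return placed


# HYPOTHETICAL example: X is "selective" (oversubscribed), Y "recruiting".
students = [
    ("a", 90, ["X", "Y"]),
    ("b", 70, ["X", "Y"]),
    ("c", 80, ["X", "Y"]),
]
placed = assign_students(students, {"X": 2, "Y": 2})
print(placed)
```

The loop always terminates, because every pass either moves a student one step down their preference list or drops a student who has none left.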

I built a GUI in Java – it’s the language I’m most comfortable with for “proper” programming – to quickly visualise the results and compare them with real-life flows. Here’s a bit of it:

simpredicted

This shows the perhaps not very surprising prediction that BIRM7s (multi-cultural school students living in Birmingham) are pretty likely to also go to university in Birmingham (AST = Aston, BCU = Birmingham City University, BIR = University of Birmingham), rather than elsewhere in the country.

When compared with the actual flows:
simactual
…the model under-predicts the flow to Birmingham City University, possibly because BCU’s desirability amongst this demographic group is mis-calibrated. Further-education students are also not present in the predicted model, but are included in the actual flows, so the two are not, as presented, normalised.

The model needs to be developed further before it can be presented formally. In particular, attainment is almost certainly a necessary component.

Categories
Data Graphics

Open Plaques

IMG_0069

The Open Plaques project, currently in alpha, is aiming to catalogue, photograph and georeference the numerous “blue plaques” scattered around London and elsewhere in the UK. Blue plaques generally mark the house where someone famous lived, or the place where some other event happened. The London ones are generally blue and circular, and are put up by English Heritage or the local borough councils. Other towns and cities have their own schemes.

Contributing to the project is as easy as uploading a (georeferenced) photograph to the Open Plaques Flickr group. In due course, a “machine tag” will appear on your photo, linking it to a blue plaque in the Open Plaques database, and the photo itself should also appear on the site, as long as you’ve specified a licence that allows this to be done. If the plaque is missing from the database altogether, then a new entry presumably gets set up.

Note that iPhones automatically georeference photos as you take them; however, the GPS positional accuracy is very poor unless you give it time to lock on to enough satellites, so I manually re-georeferenced the photo in Flickr using the interactive map tool there.

The good news is that the data on Open Plaques is public domain, so can be used for any purpose. Potentially this could include adding the plaques into OpenStreetMap in the future.

Open Plaques derived the London list from the English Heritage website, which has details and addresses, but no maps or photos. This is very similar to something I did for part of my MSc dissertation last summer, which was looking at using modern GIS and geospatial techniques for enhancing Street-O maps.

Street-O events generally involve finding places and noting down a specific answer at the place, to prove you’ve been there. Blue plaques are popular with course planners, as they are generally unique, in one clearly defined location and contain unambiguous information that the competitor is unlikely to already know.

For part of my dissertation, I screen-scraped the English Heritage website, ran the addresses through Google Local to geocode them, and then plotted the results in a GIS – the idea being the race planner could then use these to build up a race map and question sheet, without having to trawl the streets trying to find plaques manually. Around 80% of the plaques were successfully placed on the map in this way, although the geocoding accuracy wasn’t always great, due to the natural inaccuracy and non-systematic placement of street addresses.

I wrote:

(5.4.2) It was decided to look at these features as one example of using a spatial dataset unrelated to orienteering to enhance the process of creating a Street‐O map for an event…

Unfortunately the blue plaque data isn’t freely available in a spatial format – users can search by postal district, but then are presented with a list of addresses rather than a map.

The pictures below show the results for the Islington area, on the left from the dissertation, and on the right the equivalent map currently on Open Plaques. (It would be straightforward to pull the data into a GIS, from the CSV files the site provides, for a proper side-by-side comparison. I’m just being lazy by screen-grabbing the map as-is.)
blueplaquesmsc
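
For anyone less lazy, pulling a CSV of points into a GIS usually just means converting it to a format the GIS reads directly, such as GeoJSON. Here is a minimal sketch. The column names (`latitude`, `longitude`, `inscription`) and the sample rows are assumptions; the real Open Plaques CSV layout may differ.

```python
import csv
import io
import json


def plaques_csv_to_geojson(csv_text, lat_col="latitude", lon_col="longitude"):
    """Convert CSV rows with coordinates into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get(lat_col) or not row.get(lon_col):
            continue  # skip plaques that have no location yet
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,  # remaining columns, e.g. the inscription
        })
    return {"type": "FeatureCollection", "features": features}


# HYPOTHETICAL sample data; the second plaque has no coordinates.
sample = (
    "id,inscription,latitude,longitude\n"
    "1,Example plaque,51.5362,-0.1033\n"
    "2,Not yet located,,\n"
)
collection = plaques_csv_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

The output can be saved to a `.geojson` file and opened as a point layer in most desktop GIS packages.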

Plenty more to be added to Open Plaques. The best way, of course, is to visit them – the locations can’t be copied across from my derived list, unfortunately, as the Google-derived locations are not free of copyright.

Potentially the Open Plaques database, once complete for London, will simplify this process even more, by allowing a one-step import of plaques, inscriptions and, most importantly, accurate locations, into the GIS, for easy map creation.

Categories
Data Graphics

A Collection of Poor Data Graphics

This BBC article on the budget contains no fewer than six data graphics – and there’s something wrong with every single one.

By “something wrong”, I mean either:

  • I have to concentrate on the graphic, rather than just glance at it, to understand what it is trying to show, or
  • The numbers are distorted by the graphic – the worst kind of “wrong” as a glance at it could mislead.

The issues are:

  1. UK Budget Deficits: Apart from the unwieldy x-axis labels, showing every second fiscal year, my main gripe is the projected section of this stacked bar chart. It only works because the three projections don’t “swap over” their values at any point. But I still had to look at it for longer than necessary, to realise that the “upper value” stacked bars run “behind” the lower ones.
  2. Long-Term UK Government Debt: The use of a line chart, with smoothly flowing lines, rather than bars suggests that there are values available on a more frequent basis than every year – or that the joins between each yearly point are just artistic and so misleading. If the former, then having the unwieldy “fiscal year” x-axis, with ticks every five years, is unnecessary – why not just shift the tick marks back by 4 months and have normal years? This would be considerably easier to read. If the latter, then that’s just plain misleading!
  3. Treasury Growth Forecasts: The worst one of all. The addition of direction arrows above the positive bars (or below the negative bars) – with the value between them and the bar, and the arrows coloured the same, made me assume the bar ran up to the top of the arrow – massively increasing the value of the 2010 independent forecast, for instance. Not sure why the colours needed to change from the first chart, either, seeing as at least two of the categories have the same source in both charts.
  4. UK Claimant Count: This is a simple sequence of choropleths and as such really shouldn’t be a Flash-based chart – this is trivial to do in HTML/CSS alone, never mind JavaScript. The colour sequence is odd too – a series of blues suggesting a value-based ordering, which then arbitrarily switches to purple for the final one/two bins. (The legend changes slightly for the last two!) The choropleth is also too small to show the needed detail.
  5. Government Spending/Taxes: 3D pie-charts, tut tut! The tilt exaggerates the values at the front, making them seem bigger than they are.
  6. UK Rescue Plans: The circles are correctly scaled in 2D rather than 1D – a common mistake averted. However, they unnecessarily overlap with each other, partly obscuring the genuine ratios.

The Beeb designers need to take a read of Tufte and not go down the Microsoft Excel route!