I’ve been mining the British Orienteering event results pages and have produced a website that presents the results in a more effective way – i.e. athlete-focused rather than event-focused. I’m also having a go at recalculating the ranking score based on this data.
Unfortunately there are a couple of flaws:
- The BOF ID is not available on the source website, so I have had to construct a key based on name (which can be misspelled on results uploads from time to time) and club (ditto). This mainly works, except where people change club: results they ran under another club, which still count towards their ranking score, won’t be included.
- It turns out that, with each new result upload, the ranking points for all events going back over the whole of the last year – possibly more – are recalculated. This causes old scores to drift slightly – I wasn’t expecting the points to fluctuate in such a way. The effect is mainly small – so far one of my scores has drifted by 1 point – but another person’s score has drifted by 7 points. I could mitigate this by scraping all results over the last year every night, but this would put strain on BOF’s servers and they would probably not appreciate it – it would be over 5,000 page requests over the course of several hours. So, instead, I’m updating the most recent 25 events nightly and may manually resync the whole year on an ad-hoc basis. The result is that, after a while, the scores no longer precisely match those on the source website.
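To illustrate the name-and-club key from the first point, here is a minimal sketch in Python. The function name `athlete_key` and the normalisation steps (lower-casing, stripping accents, collapsing spaces) are my assumptions for illustration, not necessarily what the site does. It only absorbs differences in case, spacing and accents; genuine misspellings and club changes still produce a different key.

```python
import unicodedata


def athlete_key(name: str, club: str) -> str:
    """Build a lookup key from name and club (hypothetical helper)."""

    def norm(text: str) -> str:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", text)
        no_accents = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
        # Lower-case and collapse any run of whitespace to a single space.
        return " ".join(no_accents.lower().split())

    return f"{norm(name)}|{norm(club)}"


# The same person entered with different spacing and case maps to one key...
same = athlete_key("  José  Smith ", "SLOW") == athlete_key("jose smith", "slow")
# ...but a change of club gives a different key, which is the flaw described above.
moved = athlete_key("Jose Smith", "SLOW") != athlete_key("Jose Smith", "LOK")
```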
The toughness scores for each event are just a bit of fun and based on the details of the course, not how well people did on it. The urban shading is also just based on the name of the event, rather than any specific metadata on the event that I am accessing. Such metadata may be available in the event details section of the source website but I am just using the results information here.
The collation of a large number of results has highlighted various data problems, such as results appearing as HH:MM rather than MM:SS, or x,xxx km instead of x.xxx km. Unfortunately one of my own (few) event result uploads suffered the first problem. This doesn’t affect the points at all, because the times within each course are only used on a relative, not absolute, basis, but it does preclude me, for example, totalling the “yearly run hours” for each athlete, without cleaning up the data on my side.
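The sort of clean-up this would need could look something like the sketch below. The function names and the `two_field_format` argument are hypothetical. I am assuming the caller already knows which events were uploaded as HH:MM, since a bare value such as 01:05 cannot be told apart from MM:SS on its own, and that a comma in a distance is always a decimal separator.

```python
def parse_distance_km(text: str) -> float:
    """Parse a distance such as '5.200 km' or '5,200 km' into kilometres."""
    cleaned = text.lower().replace("km", "").strip()
    # Assumption: a comma here is a decimal separator, never a thousands one.
    return float(cleaned.replace(",", "."))


def parse_time_seconds(text: str, two_field_format: str = "MM:SS") -> int:
    """Parse a result time into seconds.

    Three fields are read as H:MM:SS. Two fields are read according to
    two_field_format, which the caller sets per event ('MM:SS' or 'HH:MM').
    """
    parts = [int(p) for p in text.strip().split(":")]
    if len(parts) == 3:
        hours, minutes, seconds = parts
    elif len(parts) == 2 and two_field_format == "MM:SS":
        hours = 0
        minutes, seconds = parts
    elif len(parts) == 2 and two_field_format == "HH:MM":
        hours, minutes = parts
        seconds = 0
    else:
        raise ValueError(f"Unrecognised time {text!r} ({two_field_format})")
    return hours * 3600 + minutes * 60 + seconds
```

With times in seconds, totalling the run hours for an athlete is then just a sum divided by 3600.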
You can see the stats here – type in your name and club to see your stats. See the notes on the search page, e.g. most Level D events are not included. You can also compare two people, looking at where they ran the same courses at the same event.