Categories
London Technical

Me, Geolocated on Twitter

tweets_london

I was prompted by the excellent Twitter Tongues map, where geolocated tweets in London (including mine, and those from hundreds of thousands of others) were mined by Ed Manley over the summer, and then mapped by James Cheshire, to see where I had left my own Twitter footprint.

Many people would probably be quite alarmed to learn that data on the exact locations they have tweeted from – if they’ve allowed geolocation – is freely accessible to anyone, not just themselves, through the Twitter API.

tweets_chancerylane

It’s a bit of a faff to get the data – Twitter is starting to roll out a “download my Tweets” option which may make the first few steps here easier – but here’s how I did it.

  1. I used the user_timeline call on the Twitter API, repeatedly, to pull in my last 3200 tweets (the maximum) in batches (“pages”) of 200. The current Twitter API (1.1) requires OAuth authentication – not of the person whose tweets you are mining, but simply of yourself, so that rate limits can be correctly applied. Registering a dummy application with Twitter gives access to OAuth credentials, and the OAuth tool then generates a cURL command that can be run, with the result redirected to a file ( > pageX.json). I did this 16 times to get all 3200 tweets, using the count, page and include_rts parameters. For this particular case, I’m interested in the locations of my own account but – to stress again – you can do this for anyone else’s account, unless their account is protected and you are not a follower.
  2. The output is a set of JSON files. Lacking a JSON parser, or indeed the skill, I had to do a bit of manual text processing; those with a flexible JSON parser can skip a few steps. I merged the files together (cat *.json > combined.txt) and, in a text editor, put a line break between each },{"crea and replaced ," with ,^" – the caret being an otherwise unused character.
  3. I opened up the file as a text file (not CSV!) in Excel and did a text-to-column on the caret. I then extracted three columns – the date/time, tweet text, and the first coordinates column that occurred. These were the 1st(A), 4th (D) and 28th (AB) columns. I did further find/replace and text-to-columns to remove the keys and quotes, and split the coordinates column into two columns – lat and long.
  4. I removed all the rows that didn’t have a lat/long location. Out of 3186 tweets (14 fewer than 3200, due to deleted tweets), 268 had a location. I also added a header row.
  5. I created a new Google Fusion Table on the Google Drive website, importing in the Excel file from the above step, and assigning the latter two columns to be a two-column location field.
  6. I marked the table as public (viewable with a link). This is necessary as Google doesn’t allow the creation of a map from a private file, except through a paid (business) account. The flip side of course is that this gives Google themselves the right of access to the file contents, although I can’t imagine they are particularly interested in this one.
  7. Finally, I added a map tab to the Google Fusion Table, and then zoomed in and around and took the screenshots below. The map is zoomable and the points clickable as normal. It should be possible to colour-code the dots by year, if the categories are set up accordingly and the relevant part of the date/time field is reformatted in Step 3.
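
For anyone who does have a JSON parser to hand, the manual text processing in Steps 2 to 4 could probably be replaced with something like the Python sketch below. It is not what I actually ran, just an illustration under a few assumptions: that each pageX.json file holds a list of tweet objects, that geolocated tweets carry a "coordinates" field in GeoJSON order (longitude first), and that include_rts is set to true (the value is my guess). The function names, the screen name and the sample tweets are made up for the example, and the URLs are only built, not fetched, as the real requests still need the OAuth signing from Step 1.

```python
import csv
import io
import json
from urllib.parse import urlencode

# Endpoint for the v1.1 user_timeline call described in Step 1.
TIMELINE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"


def timeline_page_urls(screen_name, pages=16, count=200):
    """Build one request URL per page (16 pages x 200 tweets = 3200)."""
    return [
        TIMELINE_URL + "?" + urlencode({
            "screen_name": screen_name,
            "count": count,
            "page": page,
            "include_rts": "true",  # assumed value, not stated in the steps
        })
        for page in range(1, pages + 1)
    ]


def extract_geotagged(tweets):
    """Return (created_at, text, lat, lon) for tweets with a point location."""
    rows = []
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:  # null when the tweet was not geolocated
            continue
        lon, lat = point["coordinates"]  # GeoJSON order: longitude first
        rows.append((tweet["created_at"], tweet["text"], lat, lon))
    return rows


def rows_to_csv(rows):
    """Serialise the rows with a header line, ready to import as a table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["created_at", "text", "lat", "lon"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample standing in for the contents of one pageX.json file.
sample_page = json.loads("""[
  {"created_at": "Mon Dec 03 12:00:00 +0000 2012", "text": "At the office",
   "coordinates": {"type": "Point", "coordinates": [-0.1340, 51.5218]}},
  {"created_at": "Mon Dec 03 13:00:00 +0000 2012", "text": "No location here",
   "coordinates": null}
]""")

urls = timeline_page_urls("example_user")
geotagged = extract_geotagged(sample_page)
print(rows_to_csv(geotagged))
```

The resulting CSV has the same four columns as the spreadsheet from Step 4 (date/time, text, lat, long), so it should import into a table in the same way.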

The whole process, including some trial-and-error, took a little over an hour – not so bad.

In the images above and below, you can see the results – 268 geolocated tweets over the course of two and a half years from my account – many of them precisely and accurately located.

tweets_nweurope

All screenshots from Google Maps.

Categories
Training

Evolving the Shoe, Evolving the Terrain

mizuno_wi9w

I occasionally receive the odd running-related press release, and got an interesting one from Mizuno recently, announcing a couple of new running shoes – the Wave Rider 16 and Wave Inspire 9 – the two being quite similar but with the latter being more of a support shoe and a fraction (10g) heavier.

The shoes look the part as you would expect, and are appropriately vividly coloured and styled – very much the trend these days, and why not – at this time of year, much of the time it’s dark when I’m running, and it makes sense to be as visible as possible.

Anyway I mention the shoes for three reasons.

Firstly I’m impressed that this is the 16th iteration of the Wave Rider shoe. Mizuno clearly know they are on to a good thing – not launching a new brand every year or so, but instead evolving a well-known one. The average running shoe only lasts for 300-400 miles, so a typical club runner might need to buy a new pair twice a year. If the shoe is good, then the club runner will not want to change it for another brand if the old one is no longer available – they might just as easily change the manufacturer altogether, but they would much prefer to stick with the name of the shoe that they know – shoes are the critical tool for a runner. So, give them what they want, and take the opportunity to refine it.

But you also need to keep new people discovering the manufacturer and brand, and also update the look to keep it looking new and relevant. So – relaunch it!

The second reason I mention is that I got a rather nice Mizuno freebie – which just happened to be a Wave Rider 15 – during the launch of an unrelated training shoe by them, earlier this year. Like the new shoes here, it wasn’t a subtle shoe – purple and lime green. When added to my red, white and blue running tops, the look is somewhat psychedelic. But it’s a very comfortable shoe and has become my current running shoe of choice. This is partly due to superstition – I started wearing my previous new shoe when I hadn’t fully recovered from an injury, and I put the resulting niggles down to the shoe and not my injury – d’oh. But it’s surprising just how superstitious you can be when it comes to injuries.

Anyway, long story short, I’ve been very pleased with my “v15” Wave Rider the last few months – I even took it to the Venice Street Race in November, although Venice was underwater at the time* so there was not much running involved. It could well be the v16 that I end up getting next, when the current one wears out – or maybe there will even be a v17 by then? It looks like the Wave Riders will be evolving for a while yet.

The third reason is that the PR came with some photos, of runners running in the shoes, like you would expect. But the locations strongly reminded me of urban orienteering races. None of the running in the photos is taking place on roads, but instead they are along the seafront, through building courtyards, along garden paths – all the places where the best urban orienteering takes place. The campaign’s ad (short video – 30s) even includes the runner ascending some external stairs – very Barbican. You could easily imagine a control in each of these photos. In fact I very nearly doctored the photos to add one in the background. I don’t think Mizuno would have been too impressed at that though.

I’m planning a big urban orienteering race – in fact the second biggest standalone one in the world – next September. It might even be the biggest in the world next year, because the traditional incumbent, Venice, has been cancelled for 2013, after some concerns were raised during this year’s flooded race. Details of the race I’m planning will be up at the end of this month – all I can say for now is that it will have a distinctly watery feel to it. As the planner, I get to pick where the control sites go. And I’ll certainly be aiming to pick ones like the sorts shown in the photos here.

* Resulting in a rather saline shoe now. I’m not sure if it would survive a wash cycle.

mizuno_wi9m

Categories
Bike Share Conferences

Paris Workshop on Bike Sharing Systems

IMG_2856

I attended a one-day workshop last week, hosted by IFSTTAR’s GERI Animatic research group at École des Ponts ParisTech just east of Paris. The workshop was on Bicycle Sharing Systems. I have recently been working with a couple of colleagues, Dr Martin Zaltz-Austwick and Dr James Cheshire, on research relating to bicycle sharing data, and on mapping the systems currently live in various cities around the world, so I was keen to attend, particularly as the agenda was packed with interesting-sounding talks.

My rush-hour commute through Paris proved to be slightly more traumatic than planned (I wonder if Parisian visitors find London Underground stations as confusing as I find those on the Paris metro?) but I arrived at the École des Ponts ParisTech in time to hear the workshop organiser introducing the sessions. First up was Pierre Borgnat talking about network analysis of Lyon’s system. I had seen a paper by him on Lyon before, and the popularity and density of Lyon’s system has allowed for a rich and interesting dataset for mining and community detection. The community detection has been done using both spatial and temporal variables. Pierre’s thorough and technical treatment of the data was backed up with some excellent mapping of the data, which you can see above and below.

IMG_2859

Next up was Jon Froehlich. Jon’s talk was underpinned by a discussion of the different data sources and types available in the field. He focussed on temporal cluster analysis of the Barcelona bicycle sharing system (below) – a particularly interesting city for me as, along with London and Zurich, it is a case study for the EU project I have recently started working on, EUNOIA. Barcelona’s bicycle sharing system is not unlike London’s, in terms of its size, shape and usage characteristics – although the general downward slope of the city causes headaches for its operator. Jon gets bonus points for including not only a quote from this blog in his presentation, but also Martin’s beautiful routed bike-flow animation for London, and Dr Jo Wood’s more recent bi-directional flow animation, again of London.

IMG_2887

Etienne Côme, from the hosting school, was next on, with an analysis of the biggest system (outside of China) of all – the Vélib in Paris. The Vélib is perhaps the holy grail of academic research in the field as its size, and Paris’s multiple commercial and residential zones, means that community and network analysis is likely to be eye-opening. Similar to Pierre, Etienne outlined eight detected communities, by looking at temporal variations in the origin-matrix between the 1200-odd stations on the Vélib network.

IMG_2914

After lunch, Vincent Aguilera was first on, with a switch away from bicycle sharing systems but showing some techniques that have potential for the field – Vincent looked at using mobile phone network data to detect station dwell times and true journey durations on a section of the RER metro in Paris. He compared this data with Twitter messages with appropriate hashtags (below), and the real-time running information supplied by the operator on its website. The availability and structure of the cell-towers on the network allowed a direct comparison to be made – indeed, such data may actually be of better quality than that currently at the operator’s disposal, allowing more fine-tuned operation and monitoring.

IMG_2925

Neal Lathia was next with a look at London’s system – specifically the effects caused by the addition of casual (i.e. non-key, non-member) availability in December 2010. The additional option did see some changes in the usage of certain docking stations. The comparison was done by clustering the network’s docking stations by time, before and after the transition, and then seeing which stations changed cluster. One of the main areas of change was in the very heart of London, around the Trafalgar Square area, suggesting a slight shift away from the (still dominating) railway station-based usage patterns.

IMG_2948

Fabio Pinelli’s talk was wide-ranging – it included system design, routing for Dublin’s (over)used system, and a look at the reliability of the Vélib fleet.

IMG_2950

Finally, Francis Papon from the hosting school took a step back from the modern electronically managed bicycle sharing systems and mobile/social data sources, and looked at change in uses of urban cycling more generally. His dataset stretched over a hundred years, rather than the typically five-year maximum historical range that bicycle sharing systems have. A key trend is that in the largest French cities studied, including Paris, there is a recent (post-2000) renaissance in urban cycling usage, but this is not matched in many of the country’s smaller cities.

The workshop concluded with a general discussion of the research field to date and its direction. What was particularly interesting was that several bike sharing operators were in attendance and fully engaged with the academic research being carried out, asking questions but also revealing some nuggets of information about how the systems are rebalanced, the relative costs of operations and why they thought some systems were more successful than others.

Hopefully there will be more such workshops in the future in Europe – with UCL CASA, Cambridge, City University London and LSHTM all involved in the field, maybe there should be one taking place in London next year?

Categories
Data Graphics London

A Periodic Table for London

Here is a webpage that uses my own CityDashboard API*, to build a Periodic-Table inspired “data artwork” of live London information, as a series of coloured square panels on a website. The squares update regularly with fresh information, and throb red (or blue) if there are particularly extreme values present.

As an artwork, it’s deliberately not 100% clear what it shows. A key on the bottom right will help a bit, but a degree of guesswork will be needed for some of the panels. With a bit of thought, almost all of the panels should be decipherable.

It’s a super-simple webpage. I’m using CSS3 for the animations – no JavaScript is used. The page is customised to be most relevant to the CASA office here in central London – the weather station, bike share stands, air quality monitor and variable message road sign have been chosen accordingly. A more sophisticated version – which doesn’t currently exist but would be simple to do – would use a combination of the location information in the CityDashboard feeds, and the HTML5 geolocation functionality of many browsers, to show a version more relevant to where in London the viewer is.
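
To give a flavour of how that location-aware version might choose its panels, here is a minimal Python sketch that picks the data source nearest to the viewer using the haversine (great-circle) distance. This is only an illustration of the idea, not code from the page or from the CityDashboard API: the station names, coordinates, field names and function names are all invented, and the viewer position stands in for whatever the browser's geolocation would report.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_source(viewer, sources):
    """Pick the source closest to the viewer's (lat, lon) position."""
    return min(
        sources,
        key=lambda s: haversine_km(viewer[0], viewer[1], s["lat"], s["lon"]),
    )


# Hypothetical docking stations; names and coordinates are illustrative only.
stations = [
    {"name": "Station A", "lat": 51.5218, "lon": -0.1340},
    {"name": "Station B", "lat": 51.5033, "lon": -0.1195},
    {"name": "Station C", "lat": 51.5155, "lon": -0.0922},
]

viewer_position = (51.5045, -0.1200)  # stand-in for a browser-reported location
closest = nearest_source(viewer_position, stations)
print(closest["name"])
```

The same selection would be repeated for each kind of panel (weather station, air quality monitor, road sign), each with its own list of candidate locations.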

As the page is so simple, it displays well on mobile browsers – on my iPhone, the webpage shows four panels on each row. On larger displays, it will rearrange appropriately. See the acknowledgements link on the page to see where the data’s coming from – the same sources as CityDashboard, including TfL, DEFRA, Yahoo! Finance and Mappiness, as well as CASA’s own sensors.

I created the piece for the ODI’s recent Data as Art installation competition – I didn’t win, but decided to do it anyway.

Live version here.

*Strictly, I’m using my Bike Share Map data for the individual docking station information – this could be easily added to the CityDashboard API in due course.