Systeme D

9.

Normalising OpenStreetMap tag ambiguities by location.

One of the most interesting questions posed by a talk at today’s State of the Map EU was “Is OpenStreetMap a world map or a set of regional maps?”

Regional (and sub-regional) differences in tagging are one of the challenges I’ve faced while building out cycle.travel’s coverage to Western Europe. As a simple example, highway=track (on its own) on the European mainland generally means somewhere you can cycle; in Britain, it’s usually a private farm track.

To fix this, I use a two-stage approach. First, while importing with osm2pgsql, I use a Lua tag transform script to flag ambiguous tagging. For example, if a way is highway=track and there are no conclusive bike access tags, the Lua script sets a special bicycle=unsure tag.

Then, after import is complete, I run a series of PostGIS UPDATEs to resolve these tags by location:

UPDATE planet_osm_line SET highway='footway'
WHERE bicycle='unsure'
AND ST_Contains(ST_Transform(ST_GeomFromText('POLYGON((2.7905 51.7542, -0.0659 50.1768, -11.4697 48.8213, -10.9204 61.1432, 2.3950 61.2596, 2.7905 51.7542))',4326),900913),way)

In other words, resolve ‘unsure’ tags within the UK as ‘footway, no bike access’. Strictly speaking the Lua step isn’t necessary: you could just run a PostGIS query for a particular combination of tags, especially if you’ve imported the full tag set with hstore. However, Lua gives us a convenient way to unify equivalent values (for example, bicycle=yes/permissive/true/1/designated), and in practice it’s easier to parse the many possibilities here than in a WHERE clause.
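As a rough sketch of that Lua-free alternative, the query below resolves UK tracks in a single pass. It assumes an import made with osm2pgsql’s --hstore-all option, so that every tag is also available in a tags hstore column, and the list of “conclusive” bicycle values is illustrative rather than complete.

```sql
-- Sketch only: resolve ambiguous tracks directly, without a Lua marker tag.
-- Assumes --hstore-all, so tags->'bicycle' is populated wherever the tag exists.
UPDATE planet_osm_line SET highway='footway'
WHERE highway='track'
  -- "No conclusive bike access": the tag is missing, or is not one of these values.
  -- (Illustrative list; real data contains more variants.)
  AND COALESCE(tags->'bicycle', '') NOT IN
      ('yes', 'permissive', 'true', '1', 'designated', 'no', 'private')
  -- Same rough UK polygon as in the query above.
  AND ST_Contains(ST_Transform(ST_GeomFromText('POLYGON((2.7905 51.7542, -0.0659 50.1768, -11.4697 48.8213, -10.9204 61.1432, 2.3950 61.2596, 2.7905 51.7542))',4326),900913),way);
```

Every extra value that counts as conclusive lengthens the WHERE clause, which is the argument for doing the unification in Lua instead.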

(Country polygons, as with so much OSM-based goodness, are available from the Geofabrik download site.)
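With a proper country polygon in place of the hand-drawn box, the same update might look like the sketch below. The region_bounds table, its columns and the 'uk' key are hypothetical names chosen for illustration. It also assumes the polygon has already been loaded and reprojected to match planet_osm_line.way (900913 in this import).

```sql
-- Hypothetical lookup table: one boundary (multi)polygon per region,
-- stored in the same projection as planet_osm_line.way.
CREATE TABLE region_bounds (
    name text PRIMARY KEY,
    geom geometry(MultiPolygon, 900913)
);
CREATE INDEX region_bounds_geom_idx ON region_bounds USING gist (geom);

-- After loading the boundary, resolve 'unsure' ways inside the UK as footways.
UPDATE planet_osm_line AS l SET highway='footway'
FROM region_bounds AS r
WHERE l.bicycle='unsure'
  AND r.name='uk'
  AND ST_Contains(r.geom, l.way);
```

Keeping the boundaries in a table means each regional rule only differs in the name it filters on and the value it sets.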

This can also be used to make a best guess for ambiguous tag values. For example, if a cycleway is tagged highway=path, bicycle=yes and nothing else, what surface might it have?

As a rule of thumb, I say “if it’s in a town, it’s probably asphalt; if it’s in the countryside, it’s probably unsurfaced”. This, again, can be resolved with a post-import PostGIS update, by comparing against a set of built-up area polygons. First of all, because I’ll be using this information to resolve several ambiguities, I set an “in town?” boolean for any ways within these polygons:

UPDATE planet_osm_line SET in_town=true FROM built_up
 WHERE ST_Contains(built_up.the_geom,way) AND built_up.large=true

(where built_up.large is a boolean for built-up areas of a certain size)
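That UPDATE needs an in_town column to write to. A minimal setup sketch follows, assuming the column is not already defined in the osm2pgsql style file and that built_up.the_geom uses the same projection as planet_osm_line.way. The index name is arbitrary.

```sql
-- Sketch only: one-off setup to run after import, before the in_town UPDATE.
-- Defaulting to false means ways outside every built-up polygon are never NULL.
ALTER TABLE planet_osm_line ADD COLUMN in_town boolean NOT NULL DEFAULT false;

-- A spatial index on the built-up polygons helps the ST_Contains join.
CREATE INDEX built_up_the_geom_idx ON built_up USING gist (the_geom);
```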

Then an update for any ambiguous ways based on this:

UPDATE planet_osm_line SET highway='cycleway' WHERE highway='cycle_unsure' AND in_town
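The countryside half of the rule of thumb could be handled by a companion query, run after the in-town one. This is a guess at its shape rather than the query actually used: it assumes a surface column was imported, and the choice of highway='path' with surface='unpaved' is purely illustrative.

```sql
-- Sketch only: the rural rule is not shown in the post, so these values are assumptions.
UPDATE planet_osm_line SET highway='path', surface='unpaved'
WHERE highway='cycle_unsure'
  -- IS NOT TRUE also matches rows where in_town is NULL,
  -- which a plain "NOT in_town" would silently skip.
  AND in_town IS NOT TRUE;
```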

Of course, you can’t fix every single ambiguity in three queries. Every ambiguity and every regional difference needs a new query and a few new lines of Lua. But it’s a reasonably fast and efficient way of moving 95% of the way from chaotic source data towards an understandable, normalised map.

(Doing this with a diff-updated database is, of course, left as an exercise for the reader.)

I’m available for OpenStreetMap consulting and development; find out more and contact me.

Posted on Saturday 14 June 2014.
