Normalising OpenStreetMap tag ambiguities by location.
One of the most interesting questions posed by a talk at today’s State of the Map EU was “Is OpenStreetMap a world map or a set of regional maps?”
Regional (and sub-regional) differences in tagging are one of the challenges I’ve faced while building out cycle.travel’s coverage to Western Europe. As a simple example, highway=track (on its own) on the European mainland generally means somewhere you can cycle; in Britain, it’s usually a private farm track.
To fix this, I use a two-stage approach. First, while importing with osm2pgsql, I use a Lua tag transform script to mark that this is an ambiguous tag. For example, if highway=track and there are no conclusive bike access tags, the Lua script sets a special bicycle=unsure tag.
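The marking step can be sketched as follows. This is a Python illustration of the idea, not the Lua script itself; the function name `mark_ambiguous` and the set of values treated as conclusive are assumptions made for the example.

```python
# Illustrative only: the real step is a Lua tag transform run by osm2pgsql.
# Which bicycle=* values count as "conclusive" is an assumption for this sketch.
CONCLUSIVE_BICYCLE = {"yes", "no", "designated", "permissive", "private"}


def mark_ambiguous(tags):
    """Return a copy of tags, flagging highway=track with no clear bike access."""
    out = dict(tags)
    if out.get("highway") == "track" and out.get("bicycle") not in CONCLUSIVE_BICYCLE:
        # Defer the decision: a later, location-aware step resolves this marker.
        out["bicycle"] = "unsure"
    return out
```

A bare highway=track comes out with bicycle=unsure, while a track that already carries bicycle=yes, or any other kind of highway, passes through unchanged.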
Then, after import is complete, I run a series of PostGIS UPDATEs to resolve these tags by location:
UPDATE planet_osm_line SET highway='footway'
WHERE highway='track' AND bicycle='unsure'
AND ST_Contains(ST_Transform(ST_GeomFromText('POLYGON((2.7905 51.7542, -0.0659 50.1768, -11.4697 48.8213, -10.9204 61.1432, 2.3950 61.2596, 2.7905 51.7542))',4326),900913),way);
In other words, resolve ‘unsure’ tags within the UK as ‘footway, no bike access’. Strictly speaking the Lua step isn’t necessary: you could just run a PostGIS query for a particular combination of tags, especially if you’ve imported the full tag set with hstore. However, Lua gives us a convenient way to unify equivalent values (for example, bicycle=yes/permissive/true/1/designated), and in practice it’s easier to parse the many possibilities here than in a WHERE clause.
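Unifying equivalent values might look something like this. Again it is a Python sketch of the logic rather than the Lua itself; the name `normalise_bicycle` is hypothetical, and folding every listed value to a plain "yes" is an assumption.

```python
# Values treated as "cycling allowed". The list comes from the example above;
# collapsing them all to "yes" is an assumption for this sketch.
BICYCLE_ALLOWED = {"yes", "permissive", "true", "1", "designated"}


def normalise_bicycle(value):
    """Fold equivalent bicycle=* values to "yes".

    Any other value is returned stripped and lowercased; a missing tag stays None.
    """
    if value is None:
        return None
    cleaned = value.strip().lower()
    return "yes" if cleaned in BICYCLE_ALLOWED else cleaned
```

With the values reduced to one spelling at import time, the later SQL only ever has to test for a single value.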
(Country polygons, as with so much OSM-based goodness, are available from the Geofabrik download site.)
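To show what the location test is doing, here is a rough pure-Python equivalent using ray casting. It reuses the polygon coordinates from the query, but works directly in longitude/latitude and only checks a way's vertices, so it approximates ST_Contains rather than reproducing it. The function names are hypothetical.

```python
# Rough UK-and-Ireland outline as (lon, lat) pairs, copied from the query above.
UK_POLYGON = [
    (2.7905, 51.7542), (-0.0659, 50.1768), (-11.4697, 48.8213),
    (-10.9204, 61.1432), (2.3950, 61.2596), (2.7905, 51.7542),
]


def point_in_polygon(lon, lat, ring):
    """Ray casting: count how many ring edges a ray heading east from the point crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # this edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def resolve_unsure(tags, vertices, ring=UK_POLYGON):
    """Turn an 'unsure' way into a footway if every vertex lies inside the ring."""
    out = dict(tags)
    if out.get("bicycle") == "unsure" and all(
        point_in_polygon(lon, lat, ring) for lon, lat in vertices
    ):
        out["highway"] = "footway"
    return out
```

In practice PostGIS does this far faster with a spatial index, which is why the real work happens in the database.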
This can also be used to make a best guess for ambiguous tag values. For example, if a cycleway is tagged highway=path, bicycle=yes and nothing else, what surface might it have?
As a rule of thumb, I say “if it’s in a town, it’s probably asphalt; if it’s in the countryside, it’s probably unsurfaced”. This, again, can be resolved with a post-import PostGIS update, by comparing against a set of built-up area polygons. First of all, because I’ll be using this information to resolve several ambiguities, I set an “in town?” boolean for any ways within these polygons:
UPDATE planet_osm_line SET in_town=true FROM dlua AS built_up WHERE ST_Contains(built_up.the_geom,way) AND built_up.large=true
(where built_up.large is a boolean for built-up areas of a certain size)
Then an update for any ambiguous ways based on this:
UPDATE planet_osm_line SET highway='cycleway' WHERE highway='cycle_unsure' AND in_town
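The surface rule of thumb itself is small enough to write down directly. This Python sketch assumes an `in_town` flag has already been computed as above; the function name `guess_surface` is hypothetical, and using the OSM value "unpaved" for "unsurfaced" is my assumption.

```python
def guess_surface(tags, in_town):
    """Return the tagged surface if there is one, otherwise the rule-of-thumb guess."""
    if tags.get("surface"):
        return tags["surface"]  # an explicit tag always beats the guess
    # Assumption: "unpaved" stands in for "probably unsurfaced" in the countryside.
    return "asphalt" if in_town else "unpaved"
```

An explicit surface tag is never overridden; the guess only fills the gap when the mapper left it empty.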
Of course, you can’t fix every single ambiguity in three queries. For every ambiguity and every regional difference, you need a new query and a few new lines of Lua. But it’s a reasonably fast and efficient method of moving 95% of the way from chaotic source data towards an understandable, normalised map.
(Doing this with a diff-updated database is, of course, left as an exercise for the reader.)
I’m available for OpenStreetMap consulting and development; find out more and contact me.
Posted on Saturday 14 June 2014.