Workflow for fixing Wikipedia entries on OSM

Sunday, 17 February 2013, 17:36, in OSM

Since the OSM blogs cannot display XML code with their Markdown rendering, I am posting this entry here.

(This is mostly a note to myself.)
Download http://toolserver.org/~master/wiwosmlog/broken.php

I will work on a subset of that data:

grep Mississippi broken.php > mississippi

Filter for the ways:

grep '"t":"w","l":""' mississippi > ways_mississippi

Check whether there is a Wikipedia entry for each object, assuming all objects in Mississippi refer to the English WP.
To avoid overloading anything, I limit the lookup to 100 objects:

for i in $(cut -d '"' -f 16 ways_mississippi | head -n 100 | sed 's/ /_/g'); do firefox "http://en.wikipedia.org/wiki/$i"; done
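The lookup above can also be split into a step that only prints the URLs, so the list can be checked before any tab is opened. This is a minimal sketch, not the exact command I ran: the function name `wikipedia_urls` is made up, and it assumes, like the one-liner above, that the article title is field 16 when a line is split on double quotes.

```shell
#!/bin/sh
# Sketch: print one English Wikipedia URL per entry instead of opening tabs.
# Assumption: the title is field 16 when the line is split on '"'.
wikipedia_urls() {
    # $1: file with the filtered entries
    # $2: maximum number of entries to process (defaults to 100)
    cut -d '"' -f 16 "$1" | head -n "${2:-100}" |
        sed -e 's/ /_/g' -e 's|^|http://en.wikipedia.org/wiki/|'
}
```

Something like `wikipedia_urls ways_mississippi | while read -r u; do firefox "$u"; done` would then open the tabs. Reading line by line keeps titles with apostrophes intact.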

Scroll through the opened tabs and watch carefully for glitches.

While doing so, I download the data via the Overpass API into a single file:

for i in $(cut -d '"' -f 4 ways_mississippi | head -n 100); do wget -O - "http://overpass-api.de/api/interpreter?data=way($i);out meta;" >> 100.osm; done
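The download step can be written as two small functions, which makes it easier to see what is requested. This is only a sketch under assumptions: the names `overpass_way_url` and `fetch_ways` are made up, the ids are taken to be way ids because the file was filtered for ways, and the one-second pause is my own guess at being gentle on the server.

```shell
#!/bin/sh
# Sketch: fetch each way with metadata from the Overpass API into one file.
overpass_way_url() {
    # $1: numeric OSM way id
    printf 'http://overpass-api.de/api/interpreter?data=way(%s);out meta;\n' "$1"
}

fetch_ways() {
    # $1: file with the filtered entries (id assumed to be field 4 split on '"')
    # $2: output file; every download is appended to it
    cut -d '"' -f 4 "$1" | head -n 100 | while read -r id; do
        wget -q -O - "$(overpass_way_url "$id")" >> "$2"
        sleep 1   # assumption: short pause between requests
    done
}
```

Calling `fetch_ways ways_mississippi 100.osm` would produce the same kind of concatenated file as the loop above.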

Looking at the WP entries, one tab shows “no article” for “D”. The next entry in the list is Biloxi, so I look at the data in the file near “Biloxi”.

<tag k="wikipedia" v="D"Iberville,_Mississippi"/>  

caused that hiccup. Since the names in several values are also messed up (great import) and a spammy bot added a useless tag, I correct this and go on.
Have a look at this way.

Having found no further glitches while looking at the WP entries, I can search and replace

<tag k="wikipedia" v="

by

<tag k="wikipedia" v="en:
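The search and replace can be done in any editor. As a sketch, the same thing with sed could look like this; the function name `prefix_wikipedia_tags` is made up, and it assumes that no value already carries a language prefix (running it twice would give “en:en:”).

```shell
#!/bin/sh
# Sketch: add the "en:" prefix to every wikipedia tag value in an OSM file.
# Assumption: no value has a language prefix yet.
prefix_wikipedia_tags() {
    # $1: path to the .osm file; rewritten via a temporary file
    sed 's/<tag k="wikipedia" v="/<tag k="wikipedia" v="en:/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

Other tags are left alone because the pattern includes the key.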

Because every download was appended to 100.osm, the file contains one set of XML header and footer lines per download. Remove all of the following lines except the first header and the last footer:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2013-02-17T10:24:01Z"/>
</osm>
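Removing these lines by hand gets tedious with 100 downloads. A possible awk sketch is shown here; the function name `merge_osm_downloads` is made up, and it assumes each header element sits on its own line as in the listing above. It keeps the first copy of each header line, drops every footer and writes a single footer at the end.

```shell
#!/bin/sh
# Sketch: turn concatenated Overpass downloads into one well-formed document.
# Assumption: header and footer elements each occupy a line of their own.
merge_osm_downloads() {
    # $1: concatenated input file; the merged document goes to stdout
    awk '
        /^[[:space:]]*<\?xml /           { if (seen_xml++)  next }
        /^[[:space:]]*<osm /             { if (seen_osm++)  next }
        /^[[:space:]]*<note>/            { if (seen_note++) next }
        /^[[:space:]]*<meta osm_base=/   { if (seen_meta++) next }
        /^[[:space:]]*<\/osm>/           { next }   # drop every footer
        { print }
        END { print "</osm>" }                      # write one footer at the end
    ' "$1"
}
```

Usage would be `merge_osm_downloads 100.osm > merged.osm`, with `merged.osm` as a hypothetical output name.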

To replace underscores with blanks, I do the following:

for i in $(grep ',_' 100.osm | grep wiki | cut -d '"' -f 4); do rpl "$i" "$(echo "$i" | sed 's/_/ /g')" 100.osm; done
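If rpl is not installed, a sed-only variant might do. This is a sketch with a made-up function name, `underscores_to_blanks`, and it differs slightly from the loop above: it touches every wikipedia tag line, not only values that contain “,_”.

```shell
#!/bin/sh
# Sketch: replace underscores with blanks, but only on wikipedia tag lines.
# Difference from the rpl loop: every wikipedia tag is processed, not just
# the values containing ",_".
underscores_to_blanks() {
    # $1: path to the .osm file; rewritten via a temporary file
    sed '/<tag k="wikipedia"/ s/_/ /g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Lines with other keys keep their underscores.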

Now I open the file with JOSM and make a fake edit to tell JOSM that the objects have changed: I add something like “foo=bar” and remove it again immediately.
To see whether the objects to upload are close to each other, I now update the data (ctrl-u), which fetches everything the objects depend on.
For ways that means their nodes, for relations all their members.
Additionally, we should get a conflict if any of the objects has been edited by another user in the meantime. (That has never happened to me so far.)

To avoid creating one awfully big changeset, I do the following:

Zoom to the region where the data is located.
Select one changed object using middle click (changed objects that are not yet uploaded are shown in bold).
If there are other changed objects within a reasonable distance, select them too.
Upload the selection by pressing ctrl-alt-shift-u, space, enter (enter/change the changeset comment), enter.

For better visualisation, I’ve created a filter for “(child user:malenki) | user:malenki”. After uploading some changes I press ctrl-alt-u, and the objects that were selected, changed and uploaded are hidden. Now I can see better which objects still remain.

JOSM with selected and filtered entries