Mathare Building import

From Map Kibera

Jump to: navigation, search

In this page, I (Sébastien) will describe how to import the data of the building extraction of Mathare that we've got from AAAS. See Primož's blog for the initial analysis.


Contents

Changing the SRS

The data was made available as shapefiles in WGS84 Web Mercator Auxiliary Sphere projection, which is known as Web Mercator, spherical Mercator, Google Mercator and is referenced by various ids: EPSG:900913, EPSG:3785 (obsolete), EPSG:3857, ESRI WKID 102100... In QGIS, it's known as Google Mercator, EPSG:900913, ID 3644.

Before working with the data, we need to change the Spatial Reference System (SRS) to be compatible with OSM, that is moving to non-projected WGS84.
For some reason QGIS doesn't recognize the CRS of our shapefile and thus is unable to translate to another CRS. This can be fixed by manually setting the CRS to "Google Mercartor": menu Vector -> Data Management Tools -> Define current projection. Then export to a new projection: menu Vector -> Data Management Tools -> Export to new projection. Then choose the output CRS to be WGS84, EPSG:4326, ID 3452.
The two files that we're going to import are GroundTruthed-wgs84.shp and BuildingsOnTop-wgs84.shp

Setup PostGIS database

PostGIS is relatively easy to install on Ubuntu. I might describe it further later. Anyway, the current environment is PostgreSQL 8.4, PostGIS 1.5.
I found this script from GeoDjango very convenient to create a postgis db template:

$ ./create_template_postgis-debian.sh
$ createdb -T template_postgis mathare

Import the shapefiles to the db

Each shapefile will be imported a separate table with shp2pgsql:

shp2pgsql -s 4326 -c -g the_geom -I -S -W UTF-8 -N skip Groundtruthed-wgs84.shp | psql mathare
shp2pgsql -s 4326 -c -g the_geom -I -S -W UTF-8 -N skip BuildingsOnTop-wgs84.shp | psql mathare

Import OSM data to the db

In this case, we're only interested in buildings from OSM.

Get fresh data from OSM, keep only ways tagged with 'building' and their associated ways:

osmosis \
--read-api left=36.8425655 right=36.881876 top=-1.2480512 bottom=-1.2696752999999998 outPipe.0=1 \
--way-key keyList=building inPipe.0=1 outPipe.0=2 \
--used-node inPipe.0=2 outPipe.0=3 \
--write-xml file=mathare_building.osm inPipe.0=3

Import it to the database:

osm2pgsql -c -d mathare -l -s mathare_building.osm

The following tables will be created:

  • planet_osm_line
  • planet_osm_nodes
  • planet_osm_point
  • planet_osm_polygon
  • planet_osm_rels
  • planet_osm_roads
  • planet_osm_ways

Tagging scheme

Every geometry imported will be tagged with:

 source=AAAS_satellite_extraction_for_Mathare_2011
 

Shapefile column OSM tags
id ignored, since not used in shapefiles
designated see table below
type=* building:type=*
name=* name=*
Designated
Shapefile
Designated=*
OSM tags
Toilet building=yes
amenity=toilet
NGO/CBO building=office
office=ngo
Administration building=office
office=government
CBO building=office
office=ngo
Self-Help Group
building=yes
note=Self-Help Group
fixme=Find an appriopriate tag for "Self help group"
Medical Clinic building=yes
amenity=hospital
health_facility:type=medical_clinic
Business building=commercial
Petrol Station building=yes
amenity=fuel
motor_vehicle=yes
House building=residential
Apartment Building building=apartments
Hardware Shop building=commercial
shop=hardware
Church building=church
amenity=place_of_worship
religion=christian
Bar building=yes
amenity=bar
Bathroom building=yes
amenity=shower
Private Toilet building=yes
amenity=toilet
access=private
fixme=mathare preset scheme
Garage building=garage
Public Toilet building=yes
amenity=toilet
access=public
Market Place

building=yes
amenity=market_place

Cyber Cafe building=yes
amenity=cyber_cafe


Keys:

  • access
  • amenity
  • building
  • building:type
  • fixme
  • health_facility:type
  • motor_vehicle
  • name
  • note
  • office
  • religion
  • shop
  • source

Preparing the database for the new data

DROP TABLE IF EXISTS mathare_building_import;

CREATE TABLE mathare_building_import
(
    shp_gid         integer,
    osm_id          integer,
    layer           integer,
    "access"        text,
    "amenity"       text,
    "building"      text,
    "building:type" text,
    "fixme"         text,
    "health_facility:type" text,
    "motor_vehicle" text,
    "name"          text,
    "note"          text,
    "office"        text,
    "religion"      text,
    "shop"          text,
    "source"        text,
    the_geom        geometry,
    CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
    CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POLYGON'::text OR the_geom IS NULL),
    CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 4326)
) WITH ( OIDS=FALSE );

CREATE INDEX build_the_geom_gist ON mathare_building_import USING gist(the_geom);

Processing the data

INSERT INTO mathare_building_import (
   shp_gid,
   "building",
   "building:type",
   "access",
   "amenity",
   "fixme",
   "health_facility:type",
   "motor_vehicle",
   "name",
   "note",
   "office",
   "religion",
   "shop",
   "source",
   the_geom
) 
SELECT g.gid,
       ( CASE WHEN g.designated IN ('NGO/CBO', 'Administration', 'CBO') THEN 'office'
              WHEN g.designated IN ('Business', 'Hardware Shop' ) THEN 'commercial'
              WHEN g.designated = 'House' THEN 'residential'
              WHEN g.designated = 'Apartment Building' THEN 'apartments'
              WHEN g.designated = 'Church' THEN 'church'
              WHEN g.designated = 'Garage' THEN 'garage'
              ELSE 'yes'
         END
       ) as building,
       lower(replace(g.type, ' ', '_')) as building_type,
       ( CASE WHEN g.designated = 'Public Toilet' THEN 'public'
              WHEN g.designated = 'Private Toilet' THEN 'private'
              ELSE NULL
         END
       ) as access,
       ( CASE WHEN g.designated IN ('Toilet', 'Private Toilet', 'Public Toilet') THEN 'toilet'
              WHEN g.designated = 'Petrol Station' THEN 'fuel'
              WHEN g.designated = 'Medical Clinic' THEN 'hospital'
              WHEN g.designated = 'Church' THEN 'place_of_worship'
              WHEN g.designated = 'Bar' THEN 'bar'
              WHEN g.designated = 'Bathroom' THEN 'shower'
              WHEN g.designated = 'Cyber Cafe' THEN 'cyber_cafe'
              ELSE NULL
          END
        ) as amenity,
        ( CASE WHEN g.designated = 'Self-Help Group' THEN 'Find an appriopriate tag for "Self help group"'
               WHEN g.designated IN ('Public Toilet', 'Private Toilet', 'Toilet', 'Bathroom') THEN 'MapMathare WATSAN-preset scheme missing'
               ELSE NULL
          END
        ) as fixme,
        ( CASE WHEN g.designated = 'Medical Clinic' THEN 'medical_clinic' ELSE NULL END ) as health,
        ( CASE WHEN g.designated = 'Petrol Station' THEN 'yes' ELSE NULL END ) as motor_vehicle,
        g.name as name,
        ( CASE WHEN g.designated = 'Self-Help Group' THEN 'Self-Help Group' ELSE NULL END ) as note,
        ( CASE WHEN g.designated = 'Administration' THEN 'government'
               WHEN g.designated in ( 'NGO/CBO', 'CBO') THEN 'ngo'
               ELSE NULL
          END
        ) as office,
        ( CASE WHEN g.designated = 'Church' THEN 'christian' ELSE NULL END ) as religion,
        ( CASE WHEN g.designated = 'Hardware Shop' THEN 'hardware' ELSE NULL END) as shop,
        'AAAS building extraction from satellite imagery "MM128982" for Mathare 2011' as source,
        g.the_geom
FROM "groundtruthed-wgs84" as g;

Find collisions between OSM and the import candidate

Very few buildings have already been mapped in OSM in Mathare (139 as of 2011.05.09). Yet, we should be careful whilst importing the dataset from aerial imagery extraction. While we can discard the geometry of the existing buildings, we shall preserve their attributes.

Basically, we shall check for geometries that overlaps and apply the attributes of the OSM element to the imported one.

Note: I can see two tricky corner cases, which I don't need to worry about right now since this import does not feature them. However, overlooking these issues in another import will likely result in a loss of data:

  • nodes of the buildings bear attributes
  • multipolygons

We can find the overlapping elements with this query:

mathare#= select g.gid, o.osm_id, from "groundtruthed-wgs84" as g, planet_osm_polygon as o where ST_Intersects(g.the_geom, o.way) ORDER BY g.gid;

This script removes the columns, which are empty:

#!/usr/bin/python

import psycopg2

conn = psycopg2.connect("dbname=mathare")
cur= conn.cursor()
cur2 = conn.cursor()
cur3 = conn.cursor()
cur.execute("select attname from pg_attribute, pg_type where typname='planet_osm_polygon' and typrelid=attrelid ;")
for record in cur:
	column = record[0]
	print column
	query = cur2.mogrify("SELECT count(*) from planet_osm_polygon WHERE \"" + column + "\" IS NOT NULL" )
 	cur2.execute(query)

	for c in cur2:
		if c[0] == 0L:
			query = "ALTER TABLE planet_osm_polygon DROP COLUMN \"" + column + "\" CASCADE;"
			print query
			cur3.execute(query)
			print repr(cur3.statusmessage)
conn.commit()
cur.close()
cur2.close()
cur3.close()
conn.close()


It turns out only one of the pre existing buildings that overlap carries information that should be kept:

mathare=# SELECT DISTINCT m.shp_gid, o.osm_id, o.name, o.amenity, o.religion 
          FROM planet_osm_polygon AS o, mathare_building_import AS m
          WHERE (o.name IS NOT NULL OR o.religion IS NOT NULL or o.amenity IS NOT NULL)
          AND ST_Intersects(m.the_geom, o.way);
 shp_gid |  osm_id  |              name              |     amenity      | religion  
---------+----------+--------------------------------+------------------+-----------
     517 | 96051625 | St. John's & St. Paul's Church | place_of_worship | christian
(1 row)

Buildings on top of others

Some buildings sit on top of others, basically tin shacks built on the roof of apartment buildings.

Set the layer of the buildings that are below:

mathare=# UPDATE mathare_building_import  SET layer = 0
                 WHERE shp_gid IN ( SELECT m.shp_gid from mathare_building_import AS m, "buildingsontop-wgs84" AS bot
                                           WHERE ST_Intersects(m.the_geom, bot.the_geom)
                                  )
          RETURNING layer;

then import the buildings that are on top:

mathare=# INSERT INTO mathare_building_import (the_geom, layer, "building:type", name, building )
                 SELECT the_geom,
                        1,
                        lower(replace(type, ' ', '_')), 
                        name,
                        ( CASE WHEN designated = 'House' THEN 'residential' ELSE 'yes' END ) 
                        FROM "buildingsontop-wgs84";

Note: it turns out that the original shapefile contained duplicate. They can be found:

mathare=# SELECT DISTINCT m1.shp_gid, m2.shp_gid FROM mathare_building_import AS m1,
                                                      mathare_building_import AS m2 
          WHERE m1.the_geom = m2.the_geom AND m1.shp_gid <> m2.shp_gid;

Generate the OSM data to be imported

To my knowledge, there's no tool to convert data from a postgis base to an OSM file.

So, we'll first export the data to a shapefile

pgsql2shp -g the_geom -k -f mathare_building_import.shp mathare

and then convert it to an OSM file with ogr2osm.py:

ogr2osm.py mathare_building_import.shp

The shapefile format has some limitations:

  • the attribute name is limited to 10 characters, which is a problem for keys "building:type", 'health_fac', 'motor_vehi'.
  • integer values strangely starts with empty spaces.

Those can be fixed with sed:

$ cat > shapefile.sed << EOF
s/building:t/building:type/g
s/health_fac/health_facility/g
s/motor_vehi/motor_vehicle/g
s/v=" *\([0-9]*\)"/v="\1"/g
EOF

$ sed -f shapefile.sed mathare_building_import.osm > mathare_building_import_clean.osm

Get the buildings to remove from OSM:

$ pgsql2shp -g way -k -f colliding_buildings.shp mathare "select DISTINCT o.* from \"groundtruthed-wgs84\" as g, planet_osm_polygon as o where ST_Intersects(g.the_geom, o.way) ;"
$ ogr2osm.py colliding_buildings.osm
sed -e 's/v=" *\([0-9]*\)"/v="\1"/g' colliding_buildings.osm > tmp && mv tmp colliding_buildings.osm
Personal tools