An intrinsic approach for the detection and correction of attributive
inconsistencies and semantic heterogeneity in OSM data
Martin Loidl | martin.loidl@sbg.ac.at
Stefan Keller | sfkeller@hsr.ch
AAG Annual Meeting – Workshop OpenStreetMap Studies
Chicago, April 24th 2015
Problem Statement
2
• OSM bottom-up community approach
• Rudimentary data model and attribute structure (tagging scheme: key = value)
• Attributes: recommendations ≠ conventions ≠ formalized standard
• No restriction on tag usage and definition
http://www.openstreetmap.es
Attributive Inconsistencies
3
• Within one way

  highway = motorway
  name = Kennedy Expressway
  bicycle = yes

• Within a succession of ways (e.g. street)

  highway = motorway
  name = Kennedy Expressway
  ref = I 90

  highway = motorway
  name = Fisher Freeway
  ref = I 90

  highway = motorway
  name = Kennedy Expressway
  ref = I 90
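The two cases above can be sketched as simple checks. This is a minimal illustration, not the authors' implementation: it assumes each way is a plain dictionary of tags and that the ways of a street are already in travel order. The function names are hypothetical, and whether `bicycle = yes` on a motorway is really an error depends on local law.

```python
def motorway_bicycle_conflict(tags):
    """Within one way: a motorway that is also tagged as open to bicycles."""
    return tags.get("highway") == "motorway" and tags.get("bicycle") == "yes"


def name_outliers(ways):
    """Within a succession of ways: return the indices of ways whose name
    differs from both neighbours although all three share the same ref."""
    flagged = []
    for i in range(1, len(ways) - 1):
        prev, cur, nxt = ways[i - 1], ways[i], ways[i + 1]
        same_ref = prev.get("ref") == cur.get("ref") == nxt.get("ref")
        if (same_ref and cur.get("ref") is not None
                and prev.get("name") == nxt.get("name")
                and cur.get("name") != prev.get("name")):
            flagged.append(i)
    return flagged


# Tag sets taken from the slide
single_way = {"highway": "motorway", "name": "Kennedy Expressway", "bicycle": "yes"}
street = [
    {"highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
    {"highway": "motorway", "name": "Fisher Freeway", "ref": "I 90"},
    {"highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
]
print(motorway_bicycle_conflict(single_way))
print(name_outliers(street))
```

A flagged index is only a candidate for review: a name may legitimately change along a route that keeps its ref.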
Semantic Heterogeneity
4
• Different (correct) descriptions for one and the same entity
• Specific to crowd-sourced data (≠ authoritative data, which follow strict specifications)

highway = cycleway
foot = designated
width = 3

highway = path
bicycle = designated
foot = yes

highway = footway
bicycle = designated
surface = asphalt
Relevance
5
• Considering attributive inconsistencies and semantic heterogeneity is relevant for …
  • Visualization (data rendering)
  • Descriptive statistics (classification)
  • Spatial analysis (e.g. routing)
• Improve results through
  • Harmonization (remove semantic heterogeneities)
  • Correction through estimation (gaps, inconsistencies)
Data Quality
6
• Spatial data quality
  • Standards (e.g. ISO 19157 = harmonization of multiple preceding standards) and extensive body of literature → of limited use for OSM data
• Quality assessment of OSM data
  • Primarily focusing on positional accuracy and geometrical completeness
  • Reference data set and/or descriptive statistics
  • Comparatively little work on attribute quality
Haklay 2010; Hochmair et al. 2015; Barron et al. 2014
Quality Assessment
7
• Why an intrinsic approach?
• Extrinsic approach requires reference data set, which ideally has:
  • Same geographical coverage
  • Same data model and attribute structure
  • [Koukoletsos et al. (2012): multi-stage process to deal with it to a certain extent]
• Quality of reference data set (authoritative data doesn't necessarily imply better data!)
• Data often created for very different purposes
Elsbethen (Austria): authoritative data – OSM data
Intrinsic Approach
8
• Exclusively based on respective data set (data-centered approach)
• Makes use of:
  • Redundancy
  • Inherent logic, functionally related attributes
Translation into query statements: highway = *, surface = *, tracktype = *
Case Study Area
9
• 4,600 km² in Austrian-Bavarian border region
• ~ 22,600 km total network length
• Rural and urban areas
• Data preparation
  • Extraction from OSM database (April 1st 2015)
  • Conversion to topologically correct graph (edge-node) in GeoDB
Major Road Network
10
• Major road = motorway, primary, secondary (incl. links)
• Consistent for road category (highway = *)
  • Makes features mappable = primary intent/purpose of OSM
• Attributes incomplete (n = 11,951 segments)
  • name = *: 64.6%
  • surface = *: 22.93% [→ can be estimated: asphalt]
  • maxspeed = *: 72.19%
  • lanes = *: 57.86%
• Rather an issue of completeness than of inconsistency and heterogeneity
Local Road Network
11
• Majority of ways in OSM
• Differences in terms of attribute quality (existence, consistency etc.)
• Relevant e.g. for active modes of transport (cycling, hiking etc.)
• In many cases more extensive (spatial coverage, attribute details) than authoritative data
Attributive Inconsistencies
12
• Define set of logical/legal contradictions
• Connect to corresponding tags
• Tag specification according to Wiki
• Query the dataset for contradictions

("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5')
and "surface" = 'asphalt'

approx. 1 in 1,000
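The query statement above can be run as it stands against a small table. This is a sketch only: it uses Python's built-in `sqlite3` in place of the geodatabase used in the study, and the table name `ways`, its columns and the sample rows are assumptions for illustration.

```python
import sqlite3

# WHERE clause as shown on the slide: soft tracks that claim an asphalt surface
CONTRADICTION = """
("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5')
and "surface" = 'asphalt'
"""


def find_contradictions(rows):
    """Load (id, highway, tracktype, surface) rows and return matching ids."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ways (id INTEGER, highway TEXT, tracktype TEXT, surface TEXT)"
    )
    con.executemany("INSERT INTO ways VALUES (?, ?, ?, ?)", rows)
    query = f"SELECT id FROM ways WHERE {CONTRADICTION} ORDER BY id"
    ids = [row[0] for row in con.execute(query)]
    con.close()
    return ids


sample = [
    (1, "track", "grade1", "asphalt"),  # consistent
    (2, "track", "grade4", "asphalt"),  # contradiction
    (3, "track", "grade5", "gravel"),   # consistent
    (4, "track", "grade3", None),       # surface missing: a gap, not a contradiction
]
print(find_contradictions(sample))
```

Rows with a missing `surface` are not flagged because a comparison with NULL never evaluates to true. This keeps completeness issues separate from inconsistencies.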
Spatial Particularities
13
• Distribution of inconsistencies:
  • Regional diversity (national laws?)
  • Spatial clusters (local mappers/communities?)

highway = residential
maxspeed = 80
Correction of Inconsistencies
14
• Correction without ground truthing = estimation
• Quality of estimation depends on number of functionally related attributes
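One possible form of such an estimation is a majority vote among features that share the same functionally related attributes. The sketch below is an assumption about how this could be done, not the method used in the study. The function name `estimate_missing` and the toy features are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_missing(features, target, related):
    """Estimate a missing target tag from features with equal related tags.

    Returns {feature index: (estimated value, share of supporting votes)}.
    """
    votes = defaultdict(Counter)
    for f in features:
        if f.get(target) is not None and all(f.get(k) is not None for k in related):
            votes[tuple(f[k] for k in related)][f[target]] += 1

    estimates = {}
    for idx, f in enumerate(features):
        if f.get(target) is None:
            key = tuple(f.get(k) for k in related)
            if key in votes:  # no estimate without comparable features
                value, count = votes[key].most_common(1)[0]
                estimates[idx] = (value, count / sum(votes[key].values()))
    return estimates


features = [
    {"highway": "track", "tracktype": "grade1", "surface": "asphalt"},
    {"highway": "track", "tracktype": "grade1", "surface": "asphalt"},
    {"highway": "track", "tracktype": "grade1", "surface": "gravel"},
    {"highway": "track", "tracktype": "grade1"},   # surface missing
    {"highway": "track", "tracktype": "grade5"},   # no comparable features
]
estimates = estimate_missing(features, "surface", ("highway", "tracktype"))
print(estimates)
```

The vote share gives a rough indication of how much the estimate can be trusted. Adding more related attributes narrows the comparison group, in line with the second bullet above.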
Heterogeneity
15
• How to map a mixed foot-/cycleway in OSM?
http://www.stadt-salzburg.at
Heterogeneity
16
• How to map a mixed foot-/cycleway in OSM?
• Co-existence vs. “tag war”
• Credibility and reputation (Flanagin & Metzger 2008)

("highway" = 'footway' and ("bicycle" = 'designated' or "bicycle" = 'yes' or "bicycle" = 'official'))
OR
("highway" = 'cycleway' and ("foot" = 'designated' or "foot" = 'yes'))
OR
("highway" = 'path' and ("foot" = 'designated' or "foot" = 'official') and ("bicycle" = 'designated' or "bicycle" = 'official'))
OR
("highway" = 'track' and ("foot" = 'designated' or "foot" = 'official') and ("bicycle" = 'designated' or "bicycle" = 'official'))

Segment counts shown on the slide: 669 segments; 1,202 segments; 2,655 segments; 73 segments
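The four alternatives in the query above can be collapsed into one derived attribute. This is a minimal sketch that translates the slide's conditions into a Python predicate. The function name `is_mixed_foot_cycleway` and the dictionary representation of tags are assumptions for illustration.

```python
def is_mixed_foot_cycleway(tags):
    """Return True if the tags match any of the four tagging variants."""
    highway = tags.get("highway")
    foot = tags.get("foot")
    bicycle = tags.get("bicycle")

    if highway == "footway":
        return bicycle in {"designated", "yes", "official"}
    if highway == "cycleway":
        return foot in {"designated", "yes"}
    if highway in {"path", "track"}:
        # Both modes must be explicitly designated or official
        return (foot in {"designated", "official"}
                and bicycle in {"designated", "official"})
    return False


examples = [
    {"highway": "footway", "bicycle": "designated", "surface": "asphalt"},
    {"highway": "cycleway", "foot": "designated", "width": "3"},
    {"highway": "path", "foot": "designated", "bicycle": "designated"},
    {"highway": "track", "foot": "yes", "bicycle": "yes"},
]
print([is_mixed_foot_cycleway(t) for t in examples])
```

As in the query, a `path` or `track` tagged only with `foot = yes` and `bicycle = yes` is not counted. Whether that is the right cut-off depends on the purpose of the analysis.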
Heterogeneity
17
• Different (correct) views on same entity

highway = cycleway
surface = asphalt
ref = BGL 3
foot = designated
bicycle = designated
segregated = no
Last editor: j_cook

highway = path
surface = asphalt
foot = designated
bicycle = designated
Last editor: pyram
18
highway = track
name = Treppelweg
surface = gravel
tracktype = grade2
foot = yes
bicycle = yes
width = 3

highway = path
name = Treppelweg
surface = gravel
tracktype = grade2
foot = designated
bicycle = designated
width = 3
http://www.bing.com/maps
Harmonization of Heterogeneity
19
• Define derived attributes that fit best for the actual purpose
Loidl & Zagel (2014)
Harmonization of Heterogeneity
20
• OSMAXX
  • Extracts OSM data
  • Data cleaning (capital letters etc.) and harmonization (generalization)
  • Conversion to GIS formats
  • For visualization and geospatial analysis
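The kind of cleaning mentioned above (capital letters etc.) might look like the following. This is a hedged sketch and not OSMAXX code: the function name `clean_tags` and the list of keys treated as categorical are assumptions made for illustration.

```python
# Keys whose values are categorical and can safely be lower-cased (assumed list)
CATEGORICAL_KEYS = {"highway", "surface", "tracktype", "foot", "bicycle"}


def clean_tags(tags):
    """Trim whitespace, lower-case keys, and lower-case categorical values."""
    cleaned = {}
    for key, value in tags.items():
        key = key.strip().lower()
        value = value.strip()
        if key in CATEGORICAL_KEYS:
            value = value.lower()
        cleaned[key] = value
    return cleaned


raw = {"Highway": " Cycleway ", "surface": "Asphalt", "name": "Treppelweg"}
print(clean_tags(raw))
```

Free-text values such as `name` are left in their original case, since lower-casing them would destroy information.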
Wrap-Up
21
• Inconsistency = quality issue
  • Can be detected with intrinsic approach
• Heterogeneity = depends on purpose
  • Definition of derived attributes
• Implement assessment routines during editing or in post-processing?
  • Tag recommender system during editing (Vandecasteele & Devillers 2014)
  • Probabilistic approach and/or functionally related attributes
  • Prevent contradictions
  • Data tuning in post-processing allows specification for actual purpose
  • Combination → prevent – detect – repair (Herzog et al. 2007)
• Data model issue → social complexity of OSM (Spielmann 2014)
@gicycle_
gicycle.wordpress.com


Editor's Notes

  • #8: Figure from GIP report (Elsbethen)
  • #14: + clustering in test data set (around Goldegg … >> check if it's the same author)
  • #15: Graphic: functionally related attributes
  • #21: Open Street Map Arbitrary Excerpt Extraction