Working with OpenStreetMap using Apache Spark and GeoTrellis
What we’ll cover
● OpenStreetMap (OSM) and its data model
● A Missing Maps use case that needed big data tooling to
process OSM History
● OSMesa, what it is, and what it can do
● The future of distributed OSM processing, and what it will
enable
What is OpenStreetMap?
Working with OpenStreetMap using Apache Spark and Geotrellis
OSM Data Model
The OSM data model consists mainly of 3 elements:
● Nodes - Points
● Ways - LineStrings, Polygons
● Relations - GeometryCollections, Polygon with holes,
MultiPolygons
As well as the tag-based metadata that applies to each
element, and changesets grouping edits
OSM Data Model: Relations
OSM Data Model: Changesets
● Edits are grouped into changesets, which have their own
metadata such as user comments (for developers, think
commit messages)
● Adding hashtags to user comments allows downstream
processing to group changes - for example, #HOTLunch
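As a rough illustration of the hashtag grouping above, here is a minimal sketch of pulling hashtags out of a changeset comment. The regular expression and the choice to lower-case and de-duplicate are assumptions made for this example, not a description of how any particular tool does it.

```python
import re

# Assumed pattern: '#' followed by word characters or hyphens.
HASHTAG_RE = re.compile(r"#([\w-]+)")


def extract_hashtags(comment):
    """Return lower-cased, de-duplicated hashtags in order of first appearance."""
    seen = []
    for tag in HASHTAG_RE.findall(comment or ""):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


print(extract_hashtags("Added buildings #HOTLunch #missingmaps #hotlunch"))
# prints ['hotlunch', 'missingmaps']
```

Downstream processing can then key statistics on each returned hashtag.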
Backfilling missing maps
● The Missing Maps leaderboard processes OSM change files to
increment user and campaign statistics
● The statistics were correct from the moment the streaming
calculation started, but edits made before that point did not
count towards users’ totals
● So there was a need to “backfill” the statistics based on
OSM history
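The backfill idea can be sketched in a few lines. This is only an illustration under simplifying assumptions: the statistic is a plain edit count per user, timestamps are integers, and the function and variable names are hypothetical. The real leaderboard tracks richer statistics.

```python
from collections import Counter


def backfill_totals(history, streaming_totals, stream_start):
    """Add edits made before `stream_start` to the totals the stream already has.

    history: iterable of (user, timestamp) pairs taken from OSM history.
    streaming_totals: {user: edit count} accumulated since `stream_start`.
    """
    totals = Counter(streaming_totals)
    for user, timestamp in history:
        # Edits at or after stream_start were already counted by the stream.
        if timestamp < stream_start:
            totals[user] += 1
    return dict(totals)


history = [("alice", 10), ("alice", 20), ("bob", 15), ("alice", 120)]
streaming_totals = {"alice": 1, "carol": 2}
print(backfill_totals(history, streaming_totals, stream_start=100))
# prints {'alice': 3, 'carol': 2, 'bob': 1}
```

The cutoff comparison is what keeps edits from being counted twice.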
● Through the Red Cross and a grant
from Microsoft Philanthropies, Seth
Fitzsimmons of Pacific Atlas was
hired to solve the backfilling problem.
● Seth was previously involved with
releasing OSM data as a public
dataset on AWS and early work on
distributed processing of OSM data
Reducing the “time to first question”
Source: Seth’s blog post about processing OSM with Athena
Backfill: Athena approach
● Seth first tried to use Athena to calculate the backfill
statistics. This approach didn’t work.
● The complexity of the queries made the jobs blow up or
never finish
● Also, Athena's geospatial support hadn't been announced
yet, and once it was, it still didn’t work with the complicated
set of queries
Backfill: New approach
● Seth started showing interest in a set of tools that Azavea
was building at the time, which used Apache Spark and
GeoTrellis to calculate similar statistics
● He ported his complicated SQL queries for Athena to
SparkSQL and started contributing to that effort
Leaderboard 2.0 blog post
What is OSMesa?
● It's a loose term for a workflow for OSM data processing
● Still being defined - useful, but amorphous
● More a group of tools and techniques than, say, a library
● Uses Spark, GeoTrellis and AWS to process OSM data into
geometries, vector tiles, and statistics
Apache Spark
● A distributed computation engine
● An API that lets you work with distributed data as a
collection, including a DataFrames API
● Written in Scala, with language bindings for use with Java,
Python, and R.
● Spark DataFrames provide an API that is similar to R or
Pandas DataFrames; allows working with data in a SQL-like
manner
● Very powerful, and can express complicated queries
● (partially) Abstracts away the complexities of distributed
computing
GeoTrellis
● Core geospatial library in Scala
● Extends Spark with geospatial types and operations
● Generally focused on Raster data, wraps JTS for vector
support
● Vector Tile module for reading and writing vector tiles
OSMesa workflow
[Diagram labels: AWS EMR Cluster, AWS S3, ORC files, ORC, Statistics, Vector Tiles]
Creating features from History
● With OSMesa, we can create full historical geometries.
● To do this, we needed to create a concept of “minor
versions” of geometries
[Diagram, major versions only: way v1 (highway=unclassified) and way v2 (highway=primary), with member nodes at v1 and v2]
[Diagram, with minor versions:]
way v1 (highway=unclassified): node v1, node v1, node v1, node v1
way v1.1 (highway=unclassified): node v1, node v1, node v2, node v2 (minor version change)
way v2 (highway=primary): node v1, node v1, node v2, node v2
Full historical geometries
● With minor versions, we can bake new ORC files that
contain geometries of every element in OSM history, with
ways and relations reflecting every edit to the element itself
as well as to the elements they contain
● Then, we compute statistics per changeset based on
geometries, and roll up the statistics per user and hashtag
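The roll-up step can be pictured with a small in-memory sketch. The field names (`road_km`, `buildings`) and the sample changesets are made up for illustration; the actual pipeline runs this kind of aggregation on Spark rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical per-changeset statistics, already computed from geometries.
changesets = [
    {"id": 1, "user": "alice", "hashtags": ["hotlunch"], "road_km": 1.5, "buildings": 10},
    {"id": 2, "user": "bob", "hashtags": ["hotlunch", "missingmaps"], "road_km": 0.5, "buildings": 4},
    {"id": 3, "user": "alice", "hashtags": [], "road_km": 2.0, "buildings": 0},
]


def roll_up(changesets):
    """Sum per-changeset statistics into per-user and per-hashtag totals."""
    by_user = defaultdict(lambda: {"road_km": 0.0, "buildings": 0})
    by_hashtag = defaultdict(lambda: {"road_km": 0.0, "buildings": 0})
    for cs in changesets:
        # A changeset counts once for its user and once for each distinct hashtag.
        targets = [by_user[cs["user"]]]
        targets += [by_hashtag[tag] for tag in set(cs["hashtags"])]
        for totals in targets:
            totals["road_km"] += cs["road_km"]
            totals["buildings"] += cs["buildings"]
    return dict(by_user), dict(by_hashtag)


by_user, by_hashtag = roll_up(changesets)
print(by_user["alice"])        # prints {'road_km': 3.5, 'buildings': 10}
print(by_hashtag["hotlunch"])  # prints {'road_km': 2.0, 'buildings': 14}
```

In a DataFrame API the same thing would be a group-by on user, and a group-by on the exploded hashtag column.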
Processing OSM data at scale
● Processing of full history into features in under 40 minutes
(cluster of 255 m3.2xlarge nodes)
● This is not a small cluster (≈$65/hour). YMMV with smaller
clusters.
● We are building update mechanisms to avoid refreshing the
entire dataset
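For a sense of scale, the figures above work out as follows. This is back-of-the-envelope arithmetic only, assuming the ≈$65/hour rate covers the whole cluster and ignoring cluster start-up time and storage costs.

```python
NODES = 255                   # m3.2xlarge nodes, from the slide
CLUSTER_COST_PER_HOUR = 65.0  # USD, approximate, from the slide
RUN_MINUTES = 40              # upper bound on run time, from the slide

# Cost of one full-history run, billed pro rata.
cost_per_run = CLUSTER_COST_PER_HOUR * RUN_MINUTES / 60
# Implied blended price of one node for one hour.
cost_per_node_hour = CLUSTER_COST_PER_HOUR / NODES

print(f"about ${cost_per_run:.2f} per full-history run")  # about $43.33
print(f"about ${cost_per_node_hour:.3f} per node-hour")   # about $0.255
```

So a full reprocessing run costs on the order of tens of dollars, which is why incremental updates matter more for freshness than for cost.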
Some data created by OSMesa...
Viewing time slices of Rhode Island OSM
Historical edits for several hashtag campaigns
Global friction surface for cost distance calculations using elevation (SRTM) and OSM roads + water bodies
OSMesa: Other current uses
● Building matching between OSM and other vector datasets
● Generating vector tiles for URCHN containing a subset of
historical data for front-end analytics
This is just the beginning
The Future: Validation workflows, Reputation scores
● Better validation workflows are a big topic in the OSM
community right now (according to SOTM US 2017)
● HOT Tasking Manager does some of this; we can do better
● One way to improve validation workflows is to suggest that
validation be done by veteran mappers, and that the work of
more junior mappers be suggested for validation (a “reputation score”)
● Development Seed, who contribute to and use OSMesa,
have great ideas in this space.
The Future: Data Science notebooks, production workflows
● We are aiming to create a Python notebook environment for
doing data science on OSM, in combination with raster data
● By building on Spark and projects like GeoMesa’s
“JTSFrames”, RasterFrames, and GeoTrellis, we’re creating
a platform that works both for data scientists poking around
in a Jupyter notebook and for production systems.
The Future: Machine Learning pre- and post-processing
● Pre-processing geospatial imagery and OSM into training
chips - a distributed label-maker
● Managing data into and out of Raster Vision
● Post-processing by cleaning the model output, matching to
OSM or other vector data to remove duplicates, conflation
workflows
● Matching OSM to imagery dates: e.g. pre- and post-disaster.
Join in the fun
● There are a lot of interesting development challenges that
need to be met in the OSM world
● OSM has many different voices in the room, but they all
have one goal: building a better map
● Join the effort to build a better map
If you could ask OpenStreetMap any
question, at any scale, what would you ask it?
THANKS!
Rob Emanuele, Azavea
@lossyrob (Twitter, GitHub)
www.azavea.com
Seth Fitzsimmons, Pacific Atlas
@mojodna (Twitter, GitHub)
www.pacatlas.com
github.com/azavea/osmesa
OSM Data Model: Nodes
● Single location; only OSM element with geospatial data
● Can represent points of interest, or be solely for inclusion in
ways
● Represents a Point geometry
OSM Data Model: Ways
● References a sequence of ordered nodes
● Represents a LineString geometry
● Closed ways can represent Polygon geometries
OSM Data Model: Relations
● Group of nodes, ways, and other relations
● Used for representing a Polygon with holes,
MultiPolygons, and more generally GeometryCollections
OSM Data Model: Tags
● Each Node, Way and Relation can have a sequence of
tags, which are string-based keys and values. These
describe the role of each element on the map, e.g.
○ highway=residential
○ landuse=grass
○ amenity=fast_food
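To make the data model concrete, here is a minimal sketch of the three element types as Python dataclasses. The class and field names are chosen for this example and do not come from any OSM library. Tags are modelled as a string-to-string dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lon: float
    lat: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered references to nodes
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A closed ring needs at least three distinct nodes plus the repeated first one.
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    id: int
    members: List[Tuple[str, int, str]]  # (member type, member id, role)
    tags: Dict[str, str] = field(default_factory=dict)


def way_coordinates(way, nodes_by_id):
    """Resolve a way's node references into (lon, lat) pairs; only nodes carry coordinates."""
    return [(nodes_by_id[n].lon, nodes_by_id[n].lat) for n in way.node_ids]


nodes = {1: Node(1, 0.0, 0.0), 2: Node(2, 1.0, 0.0), 3: Node(3, 1.0, 1.0)}
road = Way(11, [1, 2], {"highway": "residential"})
grass = Way(10, [1, 2, 3, 1], {"landuse": "grass"})

print(way_coordinates(road, nodes))         # prints [(0.0, 0.0), (1.0, 0.0)]
print(road.is_closed(), grass.is_closed())  # prints False True
```

Whether a closed way should actually be treated as a Polygon also depends on its tags, which this sketch does not attempt to decide.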
Source: Dongpo Deng, https://www.slideshare.net/dongpo/the-one-and-many-maps-participatory-and-temporal-diversities-in-openstreetmap
https://planet.openstreetmap.org/
Ways to work with OSM snapshots
● Import OSM data into PostGIS
○ osm2pgsql
○ imposm3
● Render into raster tiles or vector tiles
○ Mapnik
○ Tegola
● Utilize for routing software
○ pgRouting
Ways to work with OSM history
● Clip it using osmium, and import a subset into PostGIS
● After that … not a lot of mature tooling available
Why is OSM history useful?
● Calculating user history statistics
● Calculating campaign history statistics
● Calculating complete answers to the question, “what has
changed?”
● Taking a snapshot of OSM at any point in history
● Analytics for research
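As one example of what history enables, taking a snapshot at a point in time comes down to picking, for each element, the latest version at or before that time and dropping elements whose latest version is a deletion. The sketch below assumes each history record carries `type`, `id`, `version`, `timestamp` and `visible` fields, with timestamps as same-format UTC ISO 8601 strings so that string comparison matches time order. The sample records are made up.

```python
def snapshot(history, at):
    """Return {(type, id): record} for the elements that exist at time `at`."""
    latest = {}
    for el in history:
        if el["timestamp"] > at:
            continue  # this version did not exist yet
        key = (el["type"], el["id"])  # ids are only unique within an element type
        current = latest.get(key)
        if current is None or el["version"] > current["version"]:
            latest[key] = el
    # A latest version with visible=False means the element was deleted by then.
    return {key: el for key, el in latest.items() if el["visible"]}


history = [
    {"type": "node", "id": 1, "version": 1, "timestamp": "2012-05-01T00:00:00Z", "visible": True},
    {"type": "node", "id": 1, "version": 2, "timestamp": "2016-03-01T00:00:00Z", "visible": True},
    {"type": "node", "id": 2, "version": 1, "timestamp": "2013-01-01T00:00:00Z", "visible": True},
    {"type": "node", "id": 2, "version": 2, "timestamp": "2014-01-01T00:00:00Z", "visible": False},
]

print(sorted(snapshot(history, "2013-06-01T00:00:00Z")))  # prints [('node', 1), ('node', 2)]
print(sorted(snapshot(history, "2015-01-01T00:00:00Z")))  # prints [('node', 1)]
```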
Why ORC?
● On-demand querying + predicate push-down is possible if
OSM data is in a format that is well understood by the
Hadoop ecosystem
● Bespoke formats have their place, especially when size or
other considerations are all-consuming, but it's really
frustrating to see people continually implementing OSM PBF
parsers to be slightly faster when those parsers are typically
single-use (for a specific application). I wanted to sidestep
the whole process and use a well-known, well-supported
The Approach: Features from OSM data
● Join element data to the other elements that contain them;
for example, join each node to the way(s) it belongs to.
● Assign a minor version to ways and relations modified
because the underlying elements change; e.g. a minor
version increments for a way if someone moves the nodes
belonging to it.
● Create Points, LineStrings, Polygons, and MultiPolygons for each
major and minor version of the element.
ProcessOSM.scala on GitHub
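The minor-version idea can be illustrated with a small, single-machine sketch. This is not the ProcessOSM.scala implementation. It assumes integer timestamps, treats node edits that share a timestamp as one minor version, and folds node edits made at the same moment as a way edit into that major version. All names are hypothetical.

```python
def way_minor_versions(way_versions, node_edits):
    """Derive (major, minor, timestamp) triples for one way.

    way_versions: list of (version, timestamp, node_ids), sorted by timestamp.
    node_edits: list of (node_id, timestamp), one entry per node version.
    """
    result = []
    for i, (version, start, node_ids) in enumerate(way_versions):
        # This major version is current until the next way version begins.
        end = way_versions[i + 1][1] if i + 1 < len(way_versions) else None
        result.append((version, 0, start))
        members = set(node_ids)
        # Moments when a member node changed while this major version was current.
        change_times = sorted({
            ts for node_id, ts in node_edits
            if node_id in members and ts > start and (end is None or ts < end)
        })
        for minor, ts in enumerate(change_times, start=1):
            result.append((version, minor, ts))
    return result


# The way from the diagram: created at t=1, two nodes moved at t=5, retagged at t=9.
way_versions = [(1, 1, [1, 2, 3, 4]), (2, 9, [1, 2, 3, 4])]
node_edits = [(1, 1), (2, 1), (3, 1), (4, 1), (3, 5), (4, 5)]

print(way_minor_versions(way_versions, node_edits))
# prints [(1, 0, 1), (1, 1, 5), (2, 0, 9)]
```

The three triples correspond to way v1, way v1.1 and way v2. A geometry can then be built for each by resolving node references as of that triple's timestamp.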
The Approach: Vector Tile Generation
Analytic Vector Tiles
● The name we’ve been using for Vector Tiles that contain
information for analysis, not (necessarily) for display
● OSMesa/VectorPipe can create sets of Analytic Vector Tiles
from arbitrary subsets of OSM History and publish them to
AWS S3
● Think custom Mapbox QA Tiles, containing relations and
historical elements
● We are creating streaming update workflows to keep
Analytic Vector Tile sets up-to-the-minute (almost).
Other work in this space
● Mapbox’s Jennings Anderson gave a talk at SOTM and
wrote a blog post around quarterly QA tiles
● Uses a work-in-progress project called osm-wayback to
create the historical QA tiles. Goal of project is “...to create
historic geometries for each intermediate version of an OSM
feature.”
● RocksDB on the backend, which creates a ≈600 GB index
● We have been collaborating and are looking to collaborate
further; the work is awesome
Animation of Rhode Island OSM edits over time
Global friction surface for cost distance calculations using elevation (SRTM) and OSM roads + water bodies
How to get started with OSMesa
● GitHub
● Gitter
● Docs are a TODO
An Aside - “Push vs Pull” models for AI tooling for OSM (and in general)
