The Directions Pipeline at Mapbox
The Directions Pipeline at Mapbox - AWS Meetup Berlin June 2015
Goal
• Always have the freshest map data available for routing
• Allows data problems to be fixed in a timely manner
Overview
1. Get OpenStreetMap data
2. Pre-process for Directions
3. Load new data into API servers
4. Repeat
• Every step is its own CloudFormation stack.
OpenStreetMap
• Planet.osm is all OpenStreetMap data in one file
• New version released every week
• Big file: 576 GB uncompressed, 28 GB as compressed protobuf
• Incremental diff updates available every minute
OpenStreetMap
• 1 EC2 c3.2xlarge instance
• Replays latest changesets to generate a new planet file
• New planet every 2 1/2 hours, uploaded to S3
• Updates a file /latest with a reference to the most recent planet
• Old planets are purged via S3 Lifecycle Object Expiration
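The publish step described above could look roughly like the following sketch. The bucket name, the key layout, and the timestamp format are assumptions made for illustration; the slides only state that a /latest file references the most recent planet.

```shell
#!/bin/sh
# Sketch: publish a freshly generated planet file and repoint /latest.
# BUCKET and the key layout are hypothetical; the slides do not specify them.
BUCKET="${BUCKET:-example-osm-planet}"

# Build the S3 key for a planet generated at the given timestamp.
planet_key() {
  printf 'planet/%s.osm.pbf' "$1"
}

# Upload the planet first, then overwrite /latest, so readers never
# follow the pointer to a file that is not fully uploaded yet.
publish_planet() {
  file="$1"
  stamp="$2"
  key="$(planet_key "$stamp")"
  aws s3 cp "$file" "s3://$BUCKET/$key" || return 1
  printf '%s' "$key" | aws s3 cp - "s3://$BUCKET/latest"
}

# Usage (requires AWS credentials):
#   publish_planet planet.osm.pbf "$(date -u +%Y%m%dT%H%M%SZ)"
```

Writing the pointer last is the design choice that matters here: consumers only ever read /latest, so they pick up a new planet once it is complete.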
Overview
1. Get OpenStreetMap data
2. Pre-process for Directions
3. Load new data into API servers
4. Repeat
Directions pre-processing
• 1 EC2 r3.4xlarge on Spot Pricing per profile
• Fetch latest OSM planet from S3 (discovery via /latest)
• Run pre-processing for profile
• Car: 6 hours
• Bicycle: 15 hours
• Walk: 23 hours
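A single pre-processing run for one profile might be sketched as follows. The bucket name and key layout are hypothetical, and the OSRM invocation is an assumption: osrm-extract and osrm-contract are the tool names in current OSRM releases, and the slides do not show which commands the 2015 pipeline ran.

```shell
#!/bin/sh
# Sketch of one pre-processing run for a single profile.
# BUCKET, the key layout, and the OSRM invocation are assumptions.
BUCKET="${BUCKET:-example-osm-planet}"

# Map a pipeline profile name to an OSRM profile file.
# car.lua, bicycle.lua and foot.lua ship with OSRM; the mapping is a guess.
profile_file() {
  case "$1" in
    car)     echo "profiles/car.lua" ;;
    bicycle) echo "profiles/bicycle.lua" ;;
    walk)    echo "profiles/foot.lua" ;;
    *)       echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

preprocess() {
  profile="$1"
  lua="$(profile_file "$profile")" || return 1
  # Discover the newest planet through the /latest pointer.
  key="$(aws s3 cp "s3://$BUCKET/latest" -)" || return 1
  aws s3 cp "s3://$BUCKET/$key" planet.osm.pbf || return 1
  # Extract the routing graph for this profile, then contract it.
  osrm-extract -p "$lua" planet.osm.pbf || return 1
  osrm-contract planet.osrm
}

# Usage on the worker (requires AWS credentials and OSRM binaries):
#   preprocess car
```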
Directions pre-processing
• Upload results to S3 (and update /latest)
• Update CloudFormation stack of API
Overview
1. Get OpenStreetMap data
2. Pre-process for Directions
3. Load new data into API servers
4. Repeat
Directions API
• One CloudFormation stack per profile
• Several EC2 r3.2xlarge instances
• ELB/AutoScaling
• On start, downloads latest Directions data from S3
• EC2 instances are cycled on CloudFormation parameter update
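The "on start" download could be sketched as a small boot script like the one below. DATA_BUCKET, DATA_DIR, and the per-profile key layout are hypothetical names chosen for this example; the slides only say that instances fetch the latest Directions data from S3 when they start.

```shell
#!/bin/sh
# Sketch of a boot script for one Directions API instance.
# DATA_BUCKET, the per-profile key layout and DATA_DIR are hypothetical.
DATA_BUCKET="${DATA_BUCKET:-example-directions-data}"
DATA_DIR="${DATA_DIR:-/mnt/data}"

# S3 URL of the pointer file naming the newest dataset for a profile.
latest_url() {
  printf 's3://%s/%s/latest' "$DATA_BUCKET" "$1"
}

fetch_latest_data() {
  profile="$1"
  # Read the pointer, then sync the dataset it names to local disk.
  prefix="$(aws s3 cp "$(latest_url "$profile")" -)" || return 1
  aws s3 sync "s3://$DATA_BUCKET/$prefix" "$DATA_DIR/$profile"
}

# Usage on instance start (requires an instance role with S3 read access):
#   fetch_latest_data car
```

Because every instance resolves the pointer itself at boot, cycling the instances is enough to roll out a new dataset.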
Directions pre-processing
aws cloudformation update-stack \
  --stack-name "$ApiDirectionsStack" \
  --use-previous-template \
  --capabilities "CAPABILITY_IAM" \
  --parameters $params ParameterKey=LatestTimstamp,ParameterValue="$TimeStamp"
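The slide does not show how $params is built. One plausible reading, sketched below as an assumption, is that it keeps every other stack parameter at its current value through the CLI's UsePreviousValue shorthand, so that only LatestTimstamp changes. The helper name and the parameter names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: build --parameters arguments that keep the listed stack
# parameters at their current values. Assumes names contain no spaces.
previous_params() {
  out=""
  for name in "$@"; do
    out="$out ParameterKey=$name,UsePreviousValue=true"
  done
  # Strip the leading space left by the loop.
  printf '%s' "${out# }"
}

# Usage (InstanceType and KeyName are example parameter names):
#   params="$(previous_params InstanceType KeyName)"
#   aws cloudformation update-stack ... \
#     --parameters $params ParameterKey=LatestTimstamp,ParameterValue="$TimeStamp"
```

$params is deliberately left unquoted in the command so the shell splits it into one argument per parameter.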
Directions API CloudFormation Template
{
  "Parameters": {
    "LatestTimstamp": {
      "Type": "String"
    }
  },
  "Resources": {
    "LaunchConfiguration": {
      "Type": "AWS::AutoScaling::LaunchConfiguration",
      "Properties": {
        "UserData": {
          "Fn::Base64": {
            "Ref": "LatestTimstamp"
          }
        }
      }
    },
    "AutoScalingGroup": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "UpdatePolicy": {
        "AutoScalingRollingUpdate": {}
      }
    }
  }
}
The Good
• Few moving parts and easy concept
• Updating stack parameter mirrors our existing deployment flow
• Data-driven approach decouples stacks
• No queues
• Easy 1:n distribution
• Starting new stacks is easy, they just pick up latest state
The Bad
• Updating CloudFormation stack parameters needs a lot of IAM permissions
• Potential security problem if instance is hacked
• AutoScaling is not scopeable by resource
• Potential for UPDATE_ROLLBACK_FAILED
• Problem with UpdatePolicy, AutoScaling and non-booting EC2s
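A pipeline that updates stacks automatically would probably want to notice the UPDATE_ROLLBACK_FAILED case mentioned above. The following is a minimal sketch of such a check, not something shown in the talk; the function names are hypothetical, while the describe-stacks call and its --query and --output options are standard AWS CLI.

```shell
#!/bin/sh
# Sketch: detect a stack stuck after a failed rolling update.

# Decide whether a stack status means the update got stuck.
is_stuck() {
  case "$1" in
    UPDATE_ROLLBACK_FAILED) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the stack status; return 2 if the stack needs manual attention.
check_stack() {
  status="$(aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text)" || return 1
  if is_stuck "$status"; then
    echo "stack $1 is in $status and needs manual attention" >&2
    return 2
  fi
  echo "$status"
}

# Usage (requires AWS credentials):
#   check_stack "$ApiDirectionsStack"
```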
Open Source
• https://wiki.openstreetmap.org/wiki/Osmosis
• Process and forward OpenStreetMap data
• http://project-osrm.org/
• Open Source Routing Machine for doing Directions queries
• Load OpenStreetMap data
• Pre-processing the data for a profile
• Do queries against the pre-processed data
1. Get OpenStreetMap data
2. Pre-process for Directions
3. Load new data into API servers
4. Repeat
@freenerd
johan@mapbox.com
