Autoscaling near-persistent EBS
Emil Filipov
AWS Meetup Berlin | November, 2017
HERE Technologies
• The company for location intelligence!
• 7000+ employees worldwide
• 1000+ employees in Berlin
• Navteq (2004), merged into Nokia (2011), split out as HERE (2015)
• Privately owned by a consortium of Audi, BMW, Daimler, Intel and others
• One of the biggest AWS customers
• We’re hiring!
https://www.here.com/careers
© 2017 HERE | Public | Autoscaling near-persistent EBS | November, 2017
About me
• Software Engineer @HERE Technologies
• Certified AWS wrangler
• Python zealot
• Professional Slavic Pessimist
• Working on AWS-centric deployment platforms and automation
01
History
…all good things come out of problems
The Platform
• Dozen identical PuppetMaster EC2 instances in 2 regions
• Delivering configuration/changes to several thousand nodes
• Each node polling PMs every few minutes
• Covering ~100 applications with multiple environments for each
• Working across AWS accounts
The Configuration Path
• Each app lives in its own SCM tree
• Each tree sports multiple tags and branches
• Each commit to a tree tag/branch is put through a CI system
• The artifact out of CI is the state of the tag/branch at that commit
• …in the form of an RPM package
• PMs install new RPM packages continuously from a central repo
The Problem
• For a PM to be functional it needs to install thousands of RPMs
• Booting a fresh PM instance takes hours!
  → No dynamic scale
  → Painful updates & maintenance
  → Burning cycles doing the same thing over and over again
• Smells Like Datacenter Spirit
02
Finding a solution
…like a pro
So, I bet you…
• Described in exact detail the pain points of the current system
• Gathered the requirements a potential solution should fulfill
• Conducted scientific research in the space of possible solutions
• Compiled reports of the performance characteristics, pros and cons
• After a vigorous, but objective discussion, a natural winner emerged
Sure…
How about we boot off a backed-up state?
• How are we going to do/store backups?
  ▪ EBS Snapshots!
• Who’s gonna take those Snapshots?
  ▪ Lambda!
• How frequently do we take Snapshots?
  ▪ Every minute!
• Who’s gonna make new EBS volumes and attach them to instances?
  ▪ Puppet!
(Oh well, let me put a second RPM DB on that EBS volume)
03
Implementation
…turned out to be rather easy
The DNA
Basic implementation
• Lambda function
  ▪ Creates a single snapshot of the oldest running instance
  ▪ Tags the new snapshot with an expiration time
  ▪ Deletes all expired snapshots (but never when only one is left standing)
• Configuration management (Puppet) automation
  ▪ Creates an EBS volume from the freshest snapshot
  ▪ Waits until the volume is available
  ▪ Attaches it
  ▪ Mounts it
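The selection and retention rules above can be sketched in plain Python. This is a minimal illustration under assumptions, not the code from the talk: the tag key `ExpiresAt`, the simplified dictionary shapes and all function names are hypothetical, and the snapshot dictionaries are flattened compared with what the EC2 API returns.

```python
from datetime import datetime, timedelta, timezone

EXPIRES_TAG = "ExpiresAt"  # hypothetical tag key, not named in the talk


def oldest_running_instance(instances):
    """Lambda side: pick the running instance with the earliest launch time."""
    running = [i for i in instances if i["State"] == "running"]
    return min(running, key=lambda i: i["LaunchTime"]) if running else None


def expiry_tag(now, ttl):
    """Build the tag that records when a new snapshot may be deleted."""
    return {"Key": EXPIRES_TAG, "Value": (now + ttl).isoformat()}


def snapshots_to_delete(snapshots, now):
    """Return expired snapshots, always leaving at least one standing."""
    expired = [
        s for s in snapshots
        if datetime.fromisoformat(s[EXPIRES_TAG]) <= now
    ]
    if len(expired) == len(snapshots):
        # Everything has expired: spare the newest so a fresh node can still boot.
        newest = max(snapshots, key=lambda s: s["StartTime"], default=None)
        expired = [s for s in expired if s is not newest]
    return expired


def freshest_snapshot(snapshots):
    """Boot side: choose the most recent completed snapshot to restore from."""
    done = [s for s in snapshots if s["State"] == "completed"]
    return max(done, key=lambda s: s["StartTime"]) if done else None


# Example: a snapshot taken now with a (hypothetical) 10 minute lifetime.
now = datetime(2017, 11, 1, 12, 0, tzinfo=timezone.utc)
print(expiry_tag(now, timedelta(minutes=10)))
```

In a real Lambda these helpers would presumably sit between boto3 EC2 client calls such as `describe_instances`, `create_snapshot`, `describe_snapshots` and `delete_snapshot`; keeping the decisions in pure functions makes the "never delete the last one" rule easy to test without touching AWS.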
Achievement unlocked!
• Boot times reduced from hours to minutes
• Lambda packaged together with the PM CloudFormation stack
• Very few settings = simple to add to an existing stack
• No penalty for the working instances in IO latency or CPU
• Little cost overhead
  → Lambda: less than $0.30/month (one execution per minute)
  → Snapshots: (1 * DataSize + ChangeRate) * S3 price
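The cost estimate above can be worked through as a small calculation. The sketch below assumes that "ChangeRate" means the volume of changed blocks still held by retained snapshots (EBS snapshots are incremental, so only the first one is a full copy), and it takes the storage price as a parameter rather than asserting a real AWS rate; all input numbers are hypothetical.

```python
def lambda_invocations_per_month(interval_seconds=60, days=30):
    """Number of scheduled runs in a month at a fixed interval."""
    return days * 24 * 3600 // interval_seconds


def snapshot_storage_gb(data_size_gb, changed_gb_retained):
    """One full copy of the data plus the changed blocks still retained."""
    return 1 * data_size_gb + changed_gb_retained


def monthly_snapshot_cost(data_size_gb, changed_gb_retained, price_per_gb_month):
    """Apply the slide's formula: (1 * DataSize + ChangeRate) * price."""
    return snapshot_storage_gb(data_size_gb, changed_gb_retained) * price_per_gb_month


# Hypothetical inputs: 50 GB of data, 5 GB of retained changes,
# and a made-up price of 0.05 USD per GB-month.
print(lambda_invocations_per_month())  # 43200
print(round(monthly_snapshot_cost(50, 5, 0.05), 2))
```

The point of the formula is that the snapshot bill scales with the data size plus churn, not with the number of snapshots taken, which is why a one-minute cadence stays cheap as long as expired snapshots are cleaned up.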
Here Be Dragons!
• EBS snapshots are only crash-consistent; fsck is a must
• App should ideally be able to handle crashes (but the RPM DB does not!)
  → Prepare for corrupted data on disk!
• Have to handle EBS leaks separately
• Some persistence is lost: changes made after the latest snapshot do not survive
• One more component to monitor
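Two of these caveats lend themselves to a short sketch: deciding whether a restored volume is safe to mount after fsck, and spotting leaked volumes. The exit-code meanings follow the fsck(8) manual page (the code is a bitwise OR of conditions, where 1 means errors were corrected). Everything else is an assumption for illustration: the function names, the 15 minute grace period, the simplified volume dictionaries, and the reading of "leak" as a volume that stays unattached.

```python
from datetime import datetime, timedelta, timezone

# From fsck(8): exit code 0 means no errors, bit 1 means errors were corrected.
FSCK_ERRORS_CORRECTED = 1


def safe_to_mount(fsck_exit_code):
    """Accept only 'no errors' (0) or 'errors corrected' (1); refuse anything else.

    This is a deliberately conservative choice for a data volume.
    """
    return (fsck_exit_code & ~FSCK_ERRORS_CORRECTED) == 0


def leaked_volumes(volumes, now, grace=timedelta(minutes=15)):
    """Return IDs of volumes that stayed unattached longer than the grace period."""
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and now - v["CreateTime"] > grace
    ]


now = datetime(2017, 11, 1, 12, 0, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-old", "State": "available", "CreateTime": now - timedelta(hours=2)},
    {"VolumeId": "vol-new", "State": "available", "CreateTime": now - timedelta(minutes=3)},
    {"VolumeId": "vol-used", "State": "in-use", "CreateTime": now - timedelta(days=1)},
]
print(safe_to_mount(1), safe_to_mount(4))  # True False
print(leaked_volumes(volumes, now))        # ['vol-old']
```

A cleanup job built on this would still need to restrict itself to volumes it created, for example by filtering on a tag, before deleting anything.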
04
Now that I have a Hammer
…I see a lot of nails
When is the near-persistent EBS pattern good?
You can incrementally sync-up data
• You can do “cheap” sync-up from peers or an external location
• Small or temporary differences in the dataset are not problematic
• Examples: replicated CouchDB, read replicas, resilient distributed storage systems in general

You have a lot of static data
• Your application needs to load a lot of data on startup
• Data is too big to conveniently use an AMI or download off central storage
• Examples: ML corpus, map data ☺

You can afford some data loss
• Data is volatile and losing a few minutes’ worth is not problematic
• You’re sure of your app’s ability to deal with filesystem crashes
• Examples: some metric systems, continuous training (think search term autocomplete)
Comparison with other storage strategies
Pattern                 Near-Persistent EBS   EFS/NFS   “Golden” EBS
Fully persistent        -                     ✓         ✓
Multi-instance access   ✓                     ✓         -
Single writer           ✓                     -         ✓
Linear performance      ✓                     -         -
AZ independent          ✓                     ✓         -
05
Demo Time
Source code coming soon at
https://github.com/heremaps
Emil Filipov
Email: emil.filipov (here.com)
IRC: tie (FreeNode)
Thank you!
