Błażej Kasperczyk, Kraków, 05.10.2017
Hey, let's build a PaaS Cloud!
...it's easy, right?
Let's build a PaaS platform, how hard could it be?
Team PaaS...
• DevOps team
• Develop and maintain the platform
• Backend-oriented
• Python 3.x, Tornado
"With the friends you have in your team, you don't really need enemies!"
...And our little cloud, and what runs on it
• Approx. 2300 VMs of varying sizes
• 1400 active applications, 600 of them in Python 3.x
• 9000+ running instances
• ...a third of it is Python, Tornado-based applications
The PaaS layer
• Push-button deployment
• Scale by available resources and the number of applications
• Quick application installation with our build system
• Communications bus between applications
The slow start
• Work started in 2011
• Python 2.7 + gevent
• Works over SSH
• Push model
• Hard limit: 10 applications on each VM
• ...and it works!
The inevitable
• Approx. 150 VMs max
• 300 VMs becomes a hard limit that cannot be bypassed
• A single point of failure
While the panel was primitive, "papyrus" was a top trending colour!
A moment of reinvention:
What if we use our cloud to scale our cloud?
• In place of the old orchestrator: a table of states and a coordinator
• An API that exposes what needs to be done to reach the desired state
• A daemon running on the VM handles the rest
• It's 2013, so let's be modern and do it in Python 3!
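The split described on this slide (a table of desired states, an API that says what needs doing, and a daemon that does it) can be sketched roughly as below. This is only an illustration under assumptions, not the platform's actual code: the names `plan_actions` and `apply_actions`, the action tuples, and the "application name to instance count" representation are all invented for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: application name -> number of instances.
State = Dict[str, int]
Action = Tuple[str, str, int]  # (verb, application, instance count)


def plan_actions(desired: State, actual: State) -> List[Action]:
    """Return the steps that move `actual` towards `desired`.

    This mimics what a coordinator API might hand to an agent:
    a list of things to do, not a full copy of the state table.
    """
    actions: List[Action] = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)
        have = actual.get(app, 0)
        if want > have:
            actions.append(("start", app, want - have))
        elif want < have:
            actions.append(("stop", app, have - want))
    return actions


def apply_actions(actual: State, actions: List[Action]) -> State:
    """Pretend to be the agent: apply each action to a copy of the state."""
    result = dict(actual)
    for verb, app, count in actions:
        delta = count if verb == "start" else -count
        result[app] = result.get(app, 0) + delta
        if result[app] == 0:
            del result[app]
    return result


desired = {"web": 3, "worker": 1}
actual = {"web": 1, "cron": 2}
steps = plan_actions(desired, actual)
print(steps)
```

Because the plan is always recomputed from the difference between the two states, an agent that polls repeatedly ends up with an empty plan once it has caught up, which is one reason a pull model of this kind tends to tolerate missed or repeated polls.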
Scoreboard
• Coordinates cloud management
• PostgreSQL backend
• Responsible for provisioning
• Supports over 2000 machines...
• ...each querying multiple times every minute...
• ...currently.
• It can rebuild itself in case of a database failure
Agent daemon
• Runs on the VM it manages
• Automatically launched with each new VM
• Launches and maintains applications
• Reports statistics for monitoring purposes
• Allows the developer to remotely shut the application down
Density problems
• Over-taxing VMs causes performance issues
• As it is, the allocation is hit and miss.
Weight balancing
• Each VM has a capacity limit
• Each application declares its size:
  • Light (White/Green)
  • Medium (Yellow)
  • Heavy (Red)
• ...that should do it, right?
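A minimal sketch of what capacity-based placement like this could look like. The numeric weights for Light/Medium/Heavy, the VM capacity of 5, the first-fit strategy, and the names `VM` and `place` are all assumptions for illustration; the talk gives no numbers and does not say how a VM is chosen.

```python
from typing import Dict, List, Optional

# Assumed weights for the declared sizes; the talk gives no numbers.
SIZE_WEIGHT = {"light": 1, "medium": 2, "heavy": 4}


class VM:
    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.apps: List[str] = []
        self.used = 0

    def fits(self, weight: int) -> bool:
        return self.used + weight <= self.capacity


def place(app: str, size: str, vms: List[VM]) -> Optional[VM]:
    """First-fit: put the app on the first VM with enough spare capacity."""
    weight = SIZE_WEIGHT[size]
    for vm in vms:
        if vm.fits(weight):
            vm.apps.append(app)
            vm.used += weight
            return vm
    return None  # no VM has room; a new one would have to be provisioned


vms = [VM("vm-1", capacity=5), VM("vm-2", capacity=5)]
placements: Dict[str, Optional[str]] = {}
requests = [("api", "heavy"), ("cron", "medium"), ("site", "light"), ("batch", "heavy")]
for app, size in requests:
    chosen = place(app, size, vms)
    placements[app] = chosen.name if chosen else None
print(placements)
```

In this run the last heavy application finds no VM with room even though three units of capacity remain free on vm-2, which hints at why packing by declared weight alone is not the end of the story.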
Oversized cats
• A worker can spike to 100% CPU usage while averaging only 10%.
• An application can declare high usage but be harmless.
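To make the gap between peak and average concrete, here is a tiny sketch with made-up numbers. The samples are invented to match the 100% spike and 10% average mentioned on the slide; they are not measurements from the platform.

```python
from statistics import mean

# Made-up CPU usage samples (percent) for one worker:
# mostly idle, with one short burst that saturates a core.
samples = [0] * 9 + [100]

average = mean(samples)
peak = max(samples)

print(f"average={average:.0f}% peak={peak}%")
```

A size declared from the average would call this worker light, while one declared from the peak would call it heavy. Neither single number describes what its neighbours on the same VM actually experience.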
The R&D
• Docker?
• LXC?
• ...cgroups?
Docker
• Requires a major overhaul of our application building and deployment...
• ...and would only give us what we already have.
LXC
• The current architecture requires that there be no network translation between the Agent and the application...
• ...and that caused issues when launching applications
cgroups!
• The same mechanism that is used by most containers
• Automatic cleanup
• Simplicity of the solution
Everything is now in a box...
• Applications in the cloud no longer exceed their assigned resources
• CPU is limited for each instance
• The OOM killer kicks in for memory-heavy applications that try to exceed their limits
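As a rough sketch of how limits like these can be expressed, the example below assumes the cgroup v1 interface with the `cpu` and `memory` controllers and their `cpu.cfs_period_us`, `cpu.cfs_quota_us` and `memory.limit_in_bytes` files. The talk does not say which cgroup version or controllers the platform uses, so that choice is an assumption, and the function names and the group name `app-instance-1` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict

CFS_PERIOD_US = 100_000  # 100 ms, the kernel's default CFS period


def cgroup_v1_settings(cpu_cores: float, memory_mb: int) -> Dict[str, str]:
    """Map "controller/file" names to the values that enforce the limits."""
    return {
        "cpu/cpu.cfs_period_us": str(CFS_PERIOD_US),
        # Quota is the CPU time allowed per period: 0.5 core -> 50 ms per 100 ms.
        "cpu/cpu.cfs_quota_us": str(int(cpu_cores * CFS_PERIOD_US)),
        # Exceeding this limit lets the kernel's OOM killer act inside the group.
        "memory/memory.limit_in_bytes": str(memory_mb * 1024 * 1024),
    }


def write_limits(root: Path, group: str, settings: Dict[str, str]) -> None:
    """Write each setting to <root>/<controller>/<group>/<file>."""
    for key, value in settings.items():
        controller, filename = key.split("/")
        target = root / controller / group
        target.mkdir(parents=True, exist_ok=True)
        (target / filename).write_text(value)


settings = cgroup_v1_settings(cpu_cores=0.5, memory_mb=256)

# Demo against a temporary directory so the sketch runs without root.
# On a real host `root` would be the cgroup v1 mount, typically /sys/fs/cgroup.
with tempfile.TemporaryDirectory() as tmp:
    write_limits(Path(tmp), "app-instance-1", settings)
    written = (Path(tmp) / "cpu" / "app-instance-1" / "cpu.cfs_quota_us").read_text()

print(settings, written)
```

On a real system a process joins the group when its PID is written to the group's `cgroup.procs` file; that step is left out here because it needs root and a mounted cgroup hierarchy.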
...time to relax, right?
• Time does not stop, or that time we went Xenial and got eaten by systemd
• The sword of Damocles called the "Impending Knapsack Problem"
• Autoscaling
• ...and a few other things
As a side effect, we actually made a sane frontend.