Airbnb’s Great Migration: From Monolith to Service-Oriented
JESSICA TAI / 11.06.18 / QCON SF
• In 2014, Airbnb's engineering team was about 90 people
• When the gong rang, all engineers went to their desks (the majority coded in the Monolith and helped debug the incident); a ringing gong meant the site was down
• Now, in 2018, there are more than 1,000 engineers: 10x growth since I joined 4 years ago
My name is Jessica and I’m an ex-Monolith engineer :] I am currently on our Core Services infrastructure team, which builds the foundation for the
migration to services.
I’ll be discussing how Airbnb managed to scale its engineering team by redesigning its technical architecture.
Life with monolith
Growing pains
Service design principles
Migration to services
Best practices
Results
Begin the journey by describing how and why we stayed with the monolith for many years
Then describe pain points experienced as the eng team grew rapidly
Next, cover design principles for service building so we could create services in a consistent and scalable manner
Cover how we got started with the migration and safeguards for safely transitioning functionality into services
Best practices developed from mistakes made during the early migration steps
Share some of the results we’ve seen so far
Monorail, our Ruby on Rails monolith
What is a monolith?
• Single-tier unit
• Responsible for client and server-side functionality
• Model, view, and controller (MVC) in the same repository
Easy to get started with a monolith
EARLY AIRBNB
Client
Monolithic
application
Database
Pros
• Quick to bootstrap
• Convenient to develop
• Simple integration testing
Airbnb’s experience
• Perfect for small team in 2008
• Simple to manage, quick to iterate
Database model
host.first_name, message.save!
2014 new hire task
REQUIRED MESSAGE TO HOST
One of my first new hire tasks was to require the guest to write a message to the host
View template
<h1>Tell your host %{host.name}, “Hello”</h1>
2014 new hire task
REQUIRED MESSAGE TO HOST
Controller endpoint logic
/submit_booking
Model View Controller in Monorail
2014 new hire task
REQUIRED MESSAGE TO HOST
All changes in single application, Monorail
Volunteer dev infra & sysops
• Architecture relatively simple to manage
• Developer infrastructure projects were volunteer-based
• Site oncall sysops was purely volunteer engineers
Life was simple and engineers were productive
We were happy
WHY DECIDE TO MIGRATE?
If the Monolithic life was so great, why spend the time and effort to migrate?
• This talk is also a nature lesson identifying animals that migrate
• The Arctic tern migrates 1.5 million miles during its lifetime
• That is 3x to the moon and back
Architecture migration is like a million mile journey
Difficult to scale monoliths
SINGLE MEGA-SERVICE FOR ALL CONCERNS
Client
ModelA ModelB
ModelC ModelD
ConcernB ConcernA
ConcernD ConcernC
Database
Monolith
We reached a point where it was difficult for us to scale Monorail
Monolith can theoretically have well defined encapsulation / services. But… a monolith doesn’t enforce encapsulation.
Airbnb experience
• Part of the problem was also a lack of strong architecture
Tight coupling
• Modules are highly dependent on one another
• A module assumes too many responsibilities
• Or one concern is spread over many modules instead of living in its own module
• Spaghetti entanglement
• Hard to navigate, code, debug
• Single database - more dependencies —> less reliable
• Rapidly growing codebase
Increasingly large Monorail
LINES OF CODE GROWTH
Chart: Monorail lines of code per year, 2009 to 2018 (y-axis from 1,000,000 to 4,000,000)
GROWING PAINS
• Monorail lines of code kept increasing
• Started feeling growing pains as eng team and Monorail grew
• Reduced developer productivity
Growing pains
200+ / 200 / 15h
Commits deployed to Monorail per day / Engineers at Airbnb / Time Monorail deploys blocked per week
Monorail production deploys blocked 15 hours on average per week due to rollbacks or reverts
Message to host
MANY TEAMS
• Checkout page revamped UI
• Ownership & accountability tricky when multiple teams overlap on same product
Attempts to aid: mandatory reviewers per file/directory
Helped a bit, but even at the per-file level many teams were still involved
Message to host
THOUSANDS OF LINES, HUNDREDS OF CONTRIBUTORS
Message model
File sizes grew quickly: thousands of lines, hundreds of contributors, multiple teams, but hard to refactor
More incidents, slower deploy trains
Deploy train pains became worse
• Airbnb has value of “democratic deploys” - every eng empowered & responsible to deploy own code to prod and test it
• Due to magnitude of hours for Monorail deploys, I deployed in morning before other engineers got into work to reduce my chances of merge
conflicts or revert delays
• Eng frustrated
• Dev productivity lower
• Ownership & accountability unclear
Our solution: Service-oriented architecture (SOA)
NETWORK OF LOOSELY-COUPLED SERVICES
Client
API gateway
Service1
Service2
Database2 Database3
Service3
SOA Advantages
• Build & deploy per service
• Scale independently
• Parallelization
• Defined ownership
Checkout page in SOA
Business travel service
Cancellation service
Home demand service
Pricing service
Home service
Reservation service
Review service
Messaging service
Pull out business logic into separate services
Now seems like a lot of services! —> New spaghetti mess?
We were confident Airbnb could give the migration a shot because many other companies had transitioned successfully:
• Netflix
• Amazon
• Twitter
• Uber
SOA DESIGN TENETS
• Wanted shared principles & understanding of service to guide us
• Penguins also have shared understanding, migrate with all of the colonies meeting at the same place at the same time
• Shared design tenets for eng to build services consistently
Services own reads & writes to their data
• Single gatekeeper with access to the data
• Any service interested in a particular dataset must go through the gatekeeper service’s API
• Data consistency
• Encapsulation & isolation
Services address a specific concern
• Avoid creating another monolith
• But don’t go too far the other way and create a polylith
• Service must have a large enough but focused scope
Avoid duplicate functionality
https://www.flickr.com/photos/popilop/331357312
• Shared services and library
• Easier maintainability
Mutations propagate via standard events (open sourced SpinalTap)
• In Rails, callbacks executed as hooks in CRUD lifecycle (Example: when reservation transaction complete, then mark home as unavailable for
the reservation dates)
• Mutation publishing allows services to be aware of changes without directly accessing the data
• SpinalTap (open source: https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f) is a Change Data Capture service
• Detects mutations across different sources and emits standard events with low latency
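The mutation-publishing idea in the notes above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `MutationBus`, its `subscribe`/`publish` methods, and the event shape are hypothetical and are not SpinalTap's API. The sketch shows a consumer reacting to a reservation insert without reading the reservation table itself.

```ruby
# Hypothetical in-process stand-in for a change-data-capture event stream.
# This is NOT SpinalTap's API; it only illustrates the publish/subscribe shape.
class MutationBus
  def initialize
    @subscribers = Hash.new { |hash, table| hash[table] = [] }
  end

  # Register a handler for mutations on one table.
  def subscribe(table, &handler)
    @subscribers[table] << handler
  end

  # Publish a standard event describing a row change.
  def publish(table:, type:, before:, after:)
    event = { table: table, type: type, before: before, after: after }
    @subscribers[table].each { |handler| handler.call(event) }
    event
  end
end

bus = MutationBus.new
unavailable_dates = Hash.new { |hash, home_id| hash[home_id] = [] }

# A home-availability consumer reacts to reservation inserts without
# reading the reservation table directly.
bus.subscribe(:reservations) do |event|
  next unless event[:type] == :insert
  row = event[:after]
  unavailable_dates[row[:home_id]].concat(row[:dates])
end

bus.publish(
  table: :reservations, type: :insert, before: nil,
  after: { id: 1, home_id: 42, dates: ["2018-11-06", "2018-11-07"] }
)
```

This mirrors the Rails callback example from the notes (reservation completes, home becomes unavailable), but the coupling goes through an event rather than a shared model.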
GETTING STARTED WITH
THE MIGRATION
• Monarch butterflies have longer migration cycle than their life cycle
• No one butterfly makes the entire trip
• Similarly, our approaches to migration evolved:
○ Initial ideas were not what we decided to implement
○ Initial services are not examples of current best practices
HOSTED BY BAILEY  
Idyllic home in the trees
$99 PER NIGHT
• Begin at foundation — something impacting the whole site
• Picked Homes data model
• Almost every feature back then involved a Home
! Replace data access methods with service call
! Ruby metaprogramming to override methods
First attempts to break apart Monorail
• Considered replacing callsites with remote procedure calls (RPC), i.e. network calls, but there were thousands of such ActiveRecord data access methods
• Metaprogramming override worked for a few months but we realized…
○ Relations with other models complex and expensive
○ JOIN queries tricky/inefficient
○ Tangled dependencies
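A rough sketch of what a metaprogramming override of a data access method could look like, assuming plain Ruby classes rather than real ActiveRecord models. `Home`, `HomeServiceClient`, and `ServiceRouting` are hypothetical stand-ins, and the service call returns canned data instead of making a network request.

```ruby
# Hypothetical stand-ins: HomeServiceClient and Home are illustrative only.
class HomeServiceClient
  def self.load_homes(host_id:)
    # A real client would make a network call; this returns canned data.
    [{ id: 7, host_id: host_id, title: "Idyllic home in the trees" }]
  end
end

class Home
  # Original data access method (pretend this queries the database).
  def self.find_by_host_id(_host_id)
    raise "database access should have been overridden"
  end
end

# Override the class-level method at runtime so existing callsites keep
# working but are routed to the service instead of the database.
module ServiceRouting
  def self.reroute(model, method_name, &service_call)
    model.define_singleton_method(method_name, &service_call)
  end
end

ServiceRouting.reroute(Home, :find_by_host_id) do |host_id|
  HomeServiceClient.load_homes(host_id: host_id)
end

homes = Home.find_by_host_id(4)
```

The callsite `Home.find_by_host_id(4)` is unchanged, which is the appeal of the approach. The drawbacks listed above (relations, JOINs, tangled dependencies) come from every overridden method needing its own mapping.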
Database
Home.find_by_host_id(4)
ActiveRecord
ActiveRecord adapter
MySQL adapter
Monolith
Migrating Rails’s ActiveRecord
select * from homes where host_id = 4
ActiveRecord = Ruby library that reads & writes business objects to a relational database
Existing pathway:
1. ActiveRecord method (Home.find_by_host_id to load homes data by host id = 4)
2. Wrapped into ActiveRecord
3. Translated to raw sql query
4. Sent via our default mysql ActiveRecord adapter straight to the database
Monolith
Custom ActiveRecord adapter
Home.find_by_host_id(4)
ActiveRecord
query object
ActiveRecord adapter
select * from homes where host_id = 4
Airbnb created a custom ActiveRecord adapter to parse the raw sql string into query object
Parse to request object
ActiveRecord adapter
{
  :type => :select,
  :table => "homes",
  :filters => [{
    :name => "host_id",
    :type => "integer",
    :nullable => false,
    :comparator => :eq,
    :value => 4,
  }],
  :select => ["id", "host_id", "title"],
}
Monolith
/loadHomes
{
  host_id: 4,
  fields: ["id", "host_id", "title"]
}
query object
Query object from custom ActiveRecord adapter identified key parts of the query
• Type of query = select for “select * from homes…”
• Table name = homes
• Filters = where clause
• Select = fields of the MySQL table being selected in the query (in this case, the example only has 3 fields in the homes table)
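To make the query-object idea concrete, here is a deliberately tiny sketch of turning a raw SQL string into a hash with the same key parts. It is an assumption-laden illustration, not Airbnb's adapter: `parse_select` and `SIMPLE_SELECT` are hypothetical names, and the regular expression only handles a single-table select with one integer equality filter.

```ruby
# Simplified, hypothetical parser: handles only
# "select <fields> from <table> where <column> = <integer>".
SIMPLE_SELECT = /\Aselect\s+(?<fields>.+?)\s+from\s+(?<table>\w+)\s+where\s+(?<column>\w+)\s*=\s*(?<value>\d+)\z/i

def parse_select(sql)
  match = SIMPLE_SELECT.match(sql.strip)
  raise ArgumentError, "unsupported query: #{sql}" unless match

  {
    type: :select,
    table: match[:table],
    filters: [{
      name: match[:column],
      type: "integer",
      comparator: :eq,
      value: Integer(match[:value]),
    }],
    select: match[:fields].split(",").map(&:strip),
  }
end

query = parse_select("select id, host_id, title from homes where host_id = 4")
```

A production adapter would need a real SQL parser to cover joins, ordering, and other comparators; the point here is only the shape of the resulting query object.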
Re-route queries to services
ActiveRecord adapter
query object
Database
Request

{host_id: 4
…}
Home service
Monolith
Seems like a roundabout way to get data… but migrating to service calls at the lowest level (raw SQL strings) had benefits:
• Created a flexible service API to support raw SQL request patterns
• Proved we did not need to rewrite thousands of callsites
• Tested that a critical service in the core booking flow could handle the load Monorail was supporting
• Product engineers can still code with ActiveRecord methods (no change to their existing development practices, just change how data retrieved
under the hood)
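The re-routing step described above can be sketched as a translation from a query object to a service request. The `ENDPOINTS` table and `to_service_request` are hypothetical names introduced for illustration; the `/loadHomes` endpoint and the request fields follow the slide, and only equality selects are handled.

```ruby
# Hypothetical mapping from tables to service endpoints.
ENDPOINTS = { "homes" => "/loadHomes" }.freeze

def to_service_request(query)
  endpoint = ENDPOINTS.fetch(query[:table]) do
    raise ArgumentError, "no service owns table #{query[:table]}"
  end
  unless query[:type] == :select && query[:filters].all? { |f| f[:comparator] == :eq }
    raise ArgumentError, "only equality selects are sketched here"
  end

  # Each equality filter becomes a request field, e.g. host_id: 4.
  body = query[:filters].map { |f| [f[:name].to_sym, f[:value]] }.to_h
  body[:fields] = query[:select]
  { endpoint: endpoint, body: body }
end

query = {
  type: :select, table: "homes",
  filters: [{ name: "host_id", type: "integer", comparator: :eq, value: 4 }],
  select: ["id", "host_id", "title"],
}
request = to_service_request(query)
```

Because the translation happens below ActiveRecord, product code that calls `Home.find_by_host_id(4)` does not change.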
SERVICE INTERACTION DESIGN
After initial services, we sought to define how services interact with each other.
Jellyfish migrate by interacting with sunlight, following it from east to west
Similarly, we wanted a strict direction for service requests and interaction with other services.
Service request
API gateway
Service
Monorail
Service
Interim Future
API traffic
SOA requests propagated in specific directions throughout network
Requests originate from mobile or web clients
Service types
STRICT FLOW OF DEPENDENCIES
Presentation service: Synthesize data from services for end users
Middle tier: Shared validation logic
Derived data service (with derived data store): Shared business logic applied to disparate data sources
Data service (with database): Own reads & writes to data entities
API traffic
• Data service
○ Gatekeeper for model
• Derived data service
○ Product feature-related, shared across multiple contexts
• Presentation service
○ Logic for data end user sees
• Middle tier service
○ The need for this tier emerged later, after we had built out more of the SOA network
Checkout page
Checkout page presentation service
Reservation validation middle-tier service
Home demand derived data service
Offline booking trend stats
Reservation data service
Reservation database
Home data service
Home database
Write
Read
API traffic
• Separation of concerns
• Home demand derived data - only get reservation, homes data needed for demand feature (reservation dates, home location)
• Checkout presentation - get data needed for product feature user sees
○ May read different data from derived data service
○ Example: home name shown on checkout page but not needed for demand statistics
COMPARE FOR DIFFERENCES
• Migrate with no intended functionality change
• Ensure no breakage —> Slowly migrate piecemeal
• Walruses migrate by swimming or riding floating ice sheets
• Same functionality but 2 different transport methods moving towards breeding ground
• Compare new data service access pattern with existing monolith
Dual read comparison
Gate: Admin UI configuration
Compare: 1% traffic
Ramp & wait: Gradual increments
Wait: Gather traffic patterns
Switch: All traffic through service only
Monolith
Service
Database
Read path A
Read path B
Reads are idempotent - can issue multiple identical requests and will have the same effect as issuing a single request
• Compare path A’s responses against path B’s response
• Gate dual reads with switch configured by admin UI tool
• With click of a button can ramp dual read traffic up or down
• Can turn off immediately in UI without code changes, review, and deployment needed
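A minimal sketch of a gated dual read, assuming the two read paths can be modelled as callables. `DualReader`, `ramp_percent`, and the `sampler` parameter are hypothetical; `ramp_percent` stands in for the value the admin UI would control, and path A stays the source of truth throughout.

```ruby
# Hypothetical dual-read helper. ramp_percent stands in for the value an
# admin UI would control; mismatches are collected for later inspection.
class DualReader
  attr_reader :mismatches
  attr_accessor :ramp_percent

  def initialize(path_a:, path_b:, ramp_percent: 1, sampler: -> { rand(100) })
    @path_a = path_a          # existing monolith read
    @path_b = path_b          # new service read
    @ramp_percent = ramp_percent
    @sampler = sampler        # returns an integer in 0...100
    @mismatches = []
  end

  def read(key)
    expected = @path_a.call(key)
    if @sampler.call < @ramp_percent
      begin
        actual = @path_b.call(key)
        @mismatches << { key: key, a: expected, b: actual } unless actual == expected
      rescue StandardError => e
        # A failing shadow read must never break the user-facing read.
        @mismatches << { key: key, a: expected, error: e.message }
      end
    end
    expected # path A stays the source of truth until the final switch
  end
end

monolith = ->(id) { { id: id, title: "Idyllic home in the trees" } }
service  = ->(id) { { id: id, title: id == 2 ? "stale title" : "Idyllic home in the trees" } }

reader = DualReader.new(path_a: monolith, path_b: service, ramp_percent: 100)
results = [1, 2].map { |id| reader.read(id) }
```

Setting `ramp_percent` to 0 turns the comparison off without a deploy, which is the property the admin UI switch provides in the talk.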
Begin with a conservative, small amount of production traffic and compare responses looking for mismatches.
Gradually ramp up while comparing the responses for mismatches along the way.
Production traffic sent through the dual read, ramped in steps: 5%, 10%, 25%, 50%, 100%
While at 100%, wait some more!
• Gather enough traffic to cover all access patterns to your read path in migration
• Ensure your service can sustain 100% of Monorail’s path A traffic
Once the comparisons look clean, cut over to reading only through your service and stop the dual reads.
The read path
Write comparison
DUAL WRITE TO SEPARATE DATABASES
Presentation service
Production
database
Shadow
database
Write validation
middle tier service
Write path A
Write path B
Reads
Monolith
Writes are not idempotent, so we cannot dual write to the same database
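A small sketch of comparing writes through a shadow database, with in-memory hashes standing in for the production and shadow databases. All names (`write_path_a`, `write_path_b`, `compare_writes`) are hypothetical, and path B contains a deliberate bug so the comparison has something to find.

```ruby
# Hypothetical in-memory stores stand in for the production and shadow databases.
production_db = {}
shadow_db = {}

# Path A: the monolith's existing write.
write_path_a = lambda do |db, reservation|
  db[reservation[:id]] = reservation.merge(status: "pending")
end

# Path B: the new service write; a deliberate bug drops the message field.
write_path_b = lambda do |db, reservation|
  db[reservation[:id]] = reservation.reject { |k, _| k == :message }.merge(status: "pending")
end

# Return the ids whose stored rows differ between the two databases.
def compare_writes(production_db, shadow_db)
  (production_db.keys | shadow_db.keys).reject do |id|
    production_db[id] == shadow_db[id]
  end
end

reservation = { id: 1, home_id: 42, message: "Hello" }
write_path_a.call(production_db, reservation)  # real write
write_path_b.call(shadow_db, reservation)      # shadow write, never user-visible
mismatched_ids = compare_writes(production_db, shadow_db)
```

Keeping the shadow write in a separate store means a buggy path B cannot corrupt production data while it is being validated.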
Write comparison: services
DUAL WRITE
Presentation service
Production
database
Write validation
middle tier service
The write path
Monolith
Incremental migration
• Valued “Democratic deploy”
○ Each engineer responsible for testing and deploying their changes through to production successfully
• Monorail deploys were a big pain point -> services alleviate this pain
○ Option 1: Build service 100% functionality in shadow then switch
○ Option 2: Build and migrate as more functionality built
○ Airbnb picked option 2
• Goal: get teams to be service owners ASAP
! Compare one endpoint at a time
! Unblock clients with incomplete service
○ e.g. /loadUsers
○ fetch users only by id
Migrate by endpoint
• User service started with one /loadUsers endpoint
○ only loaded users by id for one MySQL table
• Onboarded, unblocked 10 clients while adding more support for various data sources (e.g. more user-related MySQL tables), more endpoints
(e.g. /updateUser), more query patterns (e.g. load users by email)
Migrate by attribute
Service Monolith
Database
Read migrated attributes
Read not-yet-migrated attributes
Database
Presentation service
Production traffic
• Presentation services
○ Not all attributes required by a presentation service may be currently supported by a service
○ Hydrate SOA supported attributes from services
○ Unsupported attributes still hydrated from Monorail
• Gets traffic through presentation service ASAP
• Incremental changes —> more cautious
• Remember we hit the gong when the site was down?
○ Gong hit during initial attempts to migrate
○ First services had rough patches … are not the poster child services of best practices now, but we gained valuable knowledge of what not to
do!
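The migrate-by-attribute hydration described in the presentation-service notes above could look roughly like this. `MIGRATED_ATTRIBUTES`, `hydrate_user`, and the two lambdas are hypothetical; the lambdas stand in for a service client and a Monorail read.

```ruby
# Hypothetical set of attributes the user service already supports.
MIGRATED_ATTRIBUTES = %i[id first_name].freeze

def hydrate_user(user_id, wanted, service:, monolith:)
  from_service = wanted & MIGRATED_ATTRIBUTES
  from_monolith = wanted - MIGRATED_ATTRIBUTES

  user = {}
  user.merge!(service.call(user_id, from_service)) unless from_service.empty?
  user.merge!(monolith.call(user_id, from_monolith)) unless from_monolith.empty?
  user
end

service_calls = []
service = lambda do |id, attrs|
  service_calls << attrs
  { id: id, first_name: "Bailey" }.select { |k, _| attrs.include?(k) }
end
monolith = lambda do |_id, attrs|
  { email: "bailey@example.com" }.select { |k, _| attrs.include?(k) }
end

user = hydrate_user(4, %i[id first_name email], service: service, monolith: monolith)
```

As more attributes move into the service, only the `MIGRATED_ATTRIBUTES` list grows; callers of the presentation layer do not change.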
SOA BEST PRACTICES
Learning from gong-ringing experiences, developed best practices
Wildebeest have dangerous migration path
• developed best practices for keeping their young safe
• place them in center of pack
Use best practices to keep services alive and healthy
Standardize service building
CONSISTENCY
Frameworks: Auto-generate code
Testing & deploying: Replay production traffic
Observability: Standard templates
Service & client setup
Service: Business logic
Service created with the purpose of supporting specific business functionality or data
Needs some additional setup
Service & client setup
Service: Business logic, Endpoint logic, Server transport
Add endpoint exposed for clients
Service & client setup
Service: Business logic, Endpoint logic, Server transport
Java client: Client transport
Ruby client: Client transport
We need to manually write both Java and Ruby clients
This is the minimal setup needed to query a service and get a response
Service & client setup
Service: Business logic, Endpoint logic, Server transport, Server metrics, Server diagnostics, Server resilience, Startup / teardown, Metrics, Data validation
Java client: Client transport, Metrics, Data validation, Error handling, Resilience
Ruby client: Client transport, Metrics, Data validation, Error handling, Resilience, Type checking
This service will be in production — need to add more to the service & client setup
• Metrics
• Data validation
• Error handling
• Resilience
• Type checking for Ruby
○ Type differences between Java & Ruby have been problematic for us before
Service & client setup
Service: Business logic, Endpoint logic, Server transport, Server metrics, Server diagnostics, Server resilience, Startup / teardown, Metrics, Data validation
Java client: Client transport, Metrics, Data validation, Error handling, Resilience
Ruby client: Client transport, Metrics, Data validation, Error handling, Resilience, Type checking
Operations: Dashboards, Alerts, Runbook documentation
Production needs more pieces to operate & maintain the services:
• Dashboard
• Alerts
• Runbook documentation
But what the engineer really wanted to focus on (and what’s unique about this service) is the business logic and data it supports. Wouldn’t it be nice
if we didn’t need to manually write all the boilerplate and setup just to get a service started?
IDL using Thrift
• Invested in a services framework team
○ Their mission is to build the foundation for building and scaling services in a consistent and simple way for engineers.
○ Automate or configure best practices
• Aligned on Thrift as the IDL
• Defined best-practice coding patterns
IDL
• Interface Description Language (IDL)
○ Describe API in language-agnostic way
• Now only write business logic & IDL layer
• Rest autogenerated for free
Coding services
BEFORE
! Difficult to create and maintain services
! Custom Java vs. Ruby clients
AFTER
! Autogenerated code framework & API
! Automated ruby gem client
• [Before IDL] Weeks to create a new service with multiple changes in various repositories that needed to be deployed in the correct order
• Now, run one script and service is pingable and productionized within an hour
• “Make me a service” script sets up a lot for you automatically
○ Boilerplate service
○ Health endpoint
○ Deploy & testing configs
○ Cluster config
• Can create Ruby gems based off of the Thrift IDL for Ruby services to use with a click of a button in our admin UI
Thrift IDL
API FRAMEWORK
/* Batch request */
struct LoadSomeDataRequest {
  1: optional set<i64> ids (non_null)
  2: optional bool fooBar
}
/* id to data response */
struct LoadSomeDataResponse {
  1: optional map<i64, SomeData> data
}
• Self-documenting API with Thrift structs
○ Standard response, request structure
○ Can look at any service’s thrift config to know API
• Strong typing (e.g. interface, storage, communication)
/* /loadSomeData endpoint */
LoadSomeDataResponse loadSomeData(1: LoadSomeDataRequest request)
  throws (1: SomeException exception1)
  (accept_replay = "true", rate_limit = "true")
Thrift IDL
API FRAMEWORK
! Unified client for Java & Ruby
! Simple annotations to autogenerate features
Testing & deploying
BEFORE
! Uncertainty in pre-production environments
! Trigger manual requests
AFTER
! Structured pre-production process
! Automated replayed traffic
Previously testing involved manual curling to trigger a request to our service. Now, we have more automated tools to help us with this.
Testing & deploying
TIMELINE
Local dev -> Staging -> Diffy -> Canary -> Production
Local dev: Dev environment supports local services
Staging: Replayed production traffic with other staging services
Diffy: Compare responses from staging against production
<Next, slight detour, dive into Diffy>
Regression testing
DIFFY
Replayed traffic -> Diffy
Staging (new code)
Primary (old code)
Secondary (old code)
Raw response differences
Non-deterministic noise
Filtered response differences
github.com/twitter/diffy
• New code vs. last known good code
• Filter out the noise from the raw response differences to get the changes that can be attributed to the new code just introduced on staging.
○ Helpful for detecting regression in existing endpoints
○ Useful for ensuring a change is reflected in the response if fixing a bug
Is Diffy SOA specific?
• No, but not practical in our Monorail.
• Monorail: too many endpoints with tightly coupled logic
• SOA-services: fewer, narrowly focused endpoints
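The noise-filtering idea behind Diffy, as described in the notes above, can be sketched at the level of response fields. This is a simplification for illustration and not Diffy's implementation: `differing_fields` and `filtered_differences` are hypothetical helpers operating on flat hashes.

```ruby
# Field-level difference between two flat response hashes.
def differing_fields(left, right)
  (left.keys | right.keys).reject { |field| left[field] == right[field] }
end

# Fields that differ between candidate and primary, minus fields that also
# differ between two instances of the same old code (non-deterministic noise).
def filtered_differences(candidate:, primary:, secondary:)
  raw = differing_fields(candidate, primary)
  noise = differing_fields(primary, secondary)
  raw - noise
end

primary   = { title: "Idyllic home", price: 99,  request_id: "a1" }
secondary = { title: "Idyllic home", price: 99,  request_id: "b2" }
candidate = { title: "Idyllic home", price: 109, request_id: "c3" }

regressions = filtered_differences(candidate: candidate, primary: primary, secondary: secondary)
```

Here `request_id` differs everywhere, so it is treated as noise, while `price` differs only for the new code and is reported.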
Testing & deploying
TIMELINE
Canary: Deploy to single instance of production
Production: Confidently deploy to prod
Observability
BEFORE
! Nonstandard metrics, dashboards
! Inconsistent alerts
AFTER
! IDL templated metrics, dashboards
! IDL annotation alerts
Before: debugging required domain knowledge and searching
• Metrics had uneven coverage and different naming
• Dashboards were inconsistent in completeness, correctness, up-to-date-ness, and interpretation
Now, IDL annotations for alerts
• High p95 latency
• High error rate
• Low queries per second (QPS)
• Now: more consistent, quick understanding with templated graphs
• Each IDL service has the same graphs, toggled simply with a dropdown menu for service name
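As a rough illustration of the three standard alerts listed above, the sketch below evaluates them over a window of request samples. The threshold values, `THRESHOLDS`, `percentile`, and `evaluate_alerts` are all assumptions made for the example; the talk does not give Airbnb's actual thresholds or alerting code.

```ruby
# Nearest-rank percentile over a list of latencies in milliseconds.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.size + 99) / 100 # integer ceiling of pct% of the count
  sorted[[rank, 1].max - 1]
end

# Hypothetical thresholds; a real framework would read them from IDL annotations.
THRESHOLDS = { p95_latency_ms: 500, error_rate: 0.05, min_qps: 1.0 }.freeze

def evaluate_alerts(latencies_ms:, errors:, window_seconds:)
  total = latencies_ms.size
  alerts = []
  alerts << :high_p95_latency if total > 0 && percentile(latencies_ms, 95) > THRESHOLDS[:p95_latency_ms]
  alerts << :high_error_rate if total > 0 && errors.to_f / total > THRESHOLDS[:error_rate]
  alerts << :low_qps if total.to_f / window_seconds < THRESHOLDS[:min_qps]
  alerts
end

# 100 requests in a minute: 90 fast, 10 slow, 10 of them errors.
latencies = Array.new(90, 100) + Array.new(10, 900)
alerts = evaluate_alerts(latencies_ms: latencies, errors: 10, window_seconds: 60)
```

Because every service built on the framework emits the same metrics, one definition like this can apply to all of them.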
HOW IS THE MIGRATION
GOING SO FAR?
● A lot of work so far, are we done?
● Humpback whales have the longest migration of any mammal
● Whales & our migration have long, arduous journeys — long migrations take time.
Airbnb’s SOA progress
Not done yet, still in early stages
250+ services using IDL framework
Leveraged tools and frameworks to scale services
1000+ IDL service endpoints supported
! Faster build & deploy times
○ Hours (Monorail) to minutes (service)
○ Fewer reverts
! Clear service ownership
! Quicker bug fixes
Promising initial results
SUCCESS
● For one service, shipping took 2 hours in Monorail versus 4 minutes as a service
● Meet bug SLAs quicker
! Ruby Monorail single-threaded
! Java services multi-threaded
! Lower latency from parallelization
○ Search results page 3x faster
○ Home description page 10x faster!
Latency results
SUCCESS
● Latency improvements not specific to SOA! Our language change from Ruby -> Java when building services added multi-threading
support.
● Parallelization of requests, more efficient querying of data
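The latency gain from parallelizing independent requests can be illustrated with a small fan-out sketch. The services in the talk are Java; Ruby threads are used here only to keep the examples in one language. `fetch`, `CALLS`, and `load_page_in_parallel` are hypothetical, and `sleep` stands in for network latency.

```ruby
# Hypothetical service calls; sleep stands in for network latency.
def fetch(name, seconds)
  sleep(seconds)
  { name => "ok" }
end

CALLS = { pricing: 0.05, reviews: 0.05, home: 0.05 }.freeze

# Fan out one thread per independent call, then merge the results.
def load_page_in_parallel
  threads = CALLS.map { |name, seconds| Thread.new { fetch(name, seconds) } }
  threads.map(&:value).reduce({}, :merge)
end

page = load_page_in_parallel
```

Issued one after another, the three calls would take roughly the sum of their latencies; issued in parallel, the page waits roughly as long as the slowest call.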
2017: 800+ engineers, 3k deploys per week (about 1 deploy every 3.5 minutes)
2018: 1000+ engineers, 10k deploys per week (about 1 deploy / minute)
Note: this is across all services at Airbnb including the various stages (staging, canary, prod). Some services auto-deploy to staging periodically to
ensure our staging environment is up-to-date.
Product
Frontend
User Interface
Infrastructure
Backend
Services
Services
On-call
Product Infra
• All engineering teams own multiple services
• Teams can test & deploy separately
• No more sysops
Checkout page required message in SOA
Checkout presentation
Pricing data
Home data
Reservation data
Review data
Home demand derived data
Cancellation derived data
Business travel derived data
Messaging data
• Yes, a lot of services for the checkout page BUT
○ Clear ownership
○ Specific functionality
○ Faster deploys per service
• To require the message today, I would need to
○ change the checkout presentation service
○ call an endpoint in the messaging data service
SOA is not for everyone
CAUTION
• Before deconstructing your monolith, be forewarned
• SOA has drawbacks
• Monoliths are beneficial for quick iteration & small teams
• Don’t overcomplicate your architecture if you don’t need to!
Distributed services
CAUTION
• Request involves multiple services
• More network calls —> could be higher latency
• Each remote service call is a chance of failure
• Consistency changes with separate databases
• Observability harder —> distributed tracing
Complex service orchestration
CAUTION
Service owners must learn to manage and monitor own services
• Learning curve for engineers
• Airbnb started with each service owner managing multiple Amazon EC2 instances
• Moving towards Kubernetes for automated container orchestration
High investment cost
CAUTION
• High investment cost -> more tooling, frameworks
• Documentation -> hundreds of services to know about vs single Monorail
! Be ready for a long commitment
! Compare slowly & carefully
! Standardize services
! Frameworks, tools, documentation
SOA migration
TAKEAWAYS
• Migrate with intent!
• Service frameworks are mandatory for scaling microservices quickly and reliably
Look both ways before your Great Migration
Airbnb is having a positive experience so far in our SOA world.
linkedin.com/in/jessicatai

@jessicamtai
Thank you for listening to the Airbnb migration story.

PPTX
Safety Seminar civil to be ensured for safe working.
PPTX
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PDF
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
PPTX
Artificial Intelligence
PPTX
communication and presentation skills 01
PDF
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
PPTX
introduction to high performance computing
Categorization of Factors Affecting Classification Algorithms Selection
Exploratory_Data_Analysis_Fundamentals.pdf
Sorting and Hashing in Data Structures with Algorithms, Techniques, Implement...
Design Guidelines and solutions for Plastics parts
Software Engineering and software moduleing
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
Influence of Green Infrastructure on Residents’ Endorsement of the New Ecolog...
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
COURSE DESCRIPTOR OF SURVEYING R24 SYLLABUS
Module 8- Technological and Communication Skills.pptx
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
Current and future trends in Computer Vision.pptx
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
Safety Seminar civil to be ensured for safe working.
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PREDICTION OF DIABETES FROM ELECTRONIC HEALTH RECORDS
Artificial Intelligence
communication and presentation skills 01
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
introduction to high performance computing

[Annotated] QConSF 2018: Airbnb's Great Migration - From Monolith to Service-Oriented

  • 1. Airbnb’s Great Migration: From Monolith to Service-Oriented. JESSICA TAI / 11.06.18 / QCON SF
  • 2. • In 2014, the Airbnb engineering team was ~90 people • When the gong rang, the site was down: all engineers went to their desks to help debug the incident (the majority coded in the Monolith) • Now, in 2018, there are > 1000 engineers, 10x growth since I joined 4 years ago. My name is Jessica and I’m an ex-Monolith engineer :] I am currently on our Core Services infrastructure team, which builds the foundation for the migration to services. I’ll be discussing how Airbnb managed to scale its engineering team by redesigning its technical architecture.
  • 3. Life with monolith | Growing pains | Service design principles | Migration to services | Best practices | Results. Begin the journey by describing how and why we stayed with the monolith for many years.
  • 4. Then describe the pain points experienced as the eng team grew rapidly.
  • 5. Next, cover the design principles for building services, so we could create services in a consistent and scalable manner.
  • 6. Cover how we got started with the migration, and the safeguards for safely transitioning functionality into services.
  • 7. Best practices developed from mistakes made during the early migration steps.
  • 8. Share some of the results we’ve seen so far.
  • 9. Monorail, our Ruby on Rails monolith. What is a monolith? • Single-tier unit • Responsible for client and server-side functionality • Model, view, controller (MVC) in the same repository
  • 10. Easy to get started with a monolith. EARLY AIRBNB (diagram: Client, Monolithic application, Database). Pros: • Quick to bootstrap • Convenient to develop • Simple integration testing. Airbnb’s experience: • Perfect for a small team in 2008 • Simple to manage, quick to iterate
  • 11. 2014 new hire task: REQUIRED MESSAGE TO HOST. Database model: host.first_name, message.save! One of my first new hire tasks was to require the guest to write a message to the host.
  • 12. View template: <h1>Tell your host %{host.name}, “Hello”</h1>
  • 13. Controller endpoint logic: /submit_booking. Model, view, controller in Monorail: all changes were made in a single application, Monorail.
  • 14. Volunteer dev infra & sysops • Architecture relatively simple to manage • Developer infrastructure projects were volunteer-based • Site on-call sysops was staffed purely by volunteer engineers
  • 15. Life was simple and engineers were productive. We were happy.
  • 16. WHY DECIDE TO MIGRATE? If the Monolithic life was so great, why spend the time and effort to migrate? • This talk is also a nature lesson identifying animals that migrate • The Arctic tern migrates 1.5 million miles during its lifetime • That is 3x to the moon and back. Architecture migration is like a million-mile journey.
  • 17. Difficult to scale monoliths: SINGLE MEGA-SERVICE FOR ALL CONCERNS (diagram: Client; Monolith containing Models A to D and Concerns A to D; Database). We reached a point where it was difficult for us to scale Monorail. A monolith can theoretically have well-defined encapsulation / services. But… a monolith doesn’t enforce encapsulation.
  • 18. Difficult to scale monoliths. Airbnb experience: • Part of the problem was also a lack of strong architecture. Tight coupling: • Modules are highly dependent on one another • A module assumes too many responsibilities • Or one concern is spread over many modules instead of having its own
  • 19. Difficult to scale monoliths (same diagram, with a ModelE added) • Spaghetti entanglement • Hard to navigate, code, debug • Single database: more dependencies -> less reliable • Rapidly growing codebase
  • 20. Increasingly large Monorail: LINES OF CODE GROWTH (chart: lines of code, from 1,000,000 to 4,000,000, by year from 2009 to 2018). GROWING PAINS • Monorail lines of code kept increasing • We started feeling growing pains as the eng team and Monorail grew • Reduced developer productivity
  • 21. Growing pains: 200+ commits deployed to Monorail per day; 200 engineers at Airbnb; 15h per week that Monorail deploys were blocked. Monorail production deploys were blocked for 15 hours per week on average due to rollbacks or reverts.
  • 22. Message to host: MANY TEAMS • Checkout page UI was revamped • Ownership & accountability are tricky when multiple teams overlap on the same product. Attempt to help: mandatory reviewers per file/directory. It helped a bit, but even at the per-file level there were still many teams.
  • 23. Message to host: THOUSANDS OF LINES, HUNDREDS OF CONTRIBUTORS. Message model: file sizes grew quickly (thousands of lines, hundreds of contributors, multiple teams), yet the files were hard to refactor.
  • 25. More incidents, slower deploy trains. Deploy train pains became worse: • Airbnb values “democratic deploys”: every engineer is empowered & responsible to deploy their own code to prod and test it • Because Monorail deploys took so many hours, I deployed in the morning before other engineers got into work, to reduce my chances of merge conflicts or revert delays • Engineers frustrated • Dev productivity lower • Ownership & accountability unclear
  • 26. Our solution: service-oriented architecture (SOA). NETWORK OF LOOSELY-COUPLED SERVICES (diagram: Client, API gateway, Service1, Service2, Service3, Database2, Database3). SOA advantages: • Build & deploy per service • Scale independently • Parallelization • Defined ownership
  • 27. Checkout page in SOA: Business travel service, Cancellation service, Home demand service, Pricing service, Home service, Reservation service, Review service, Messaging service. Pull business logic out into separate services. Now that seems like a lot of services! -> A new spaghetti mess?
  • 28. We were confident Airbnb could give the migration a shot, as many other companies had successfully transitioned: • Netflix • Amazon • Twitter • Uber
  • 29. SOA DESIGN TENETS • We wanted shared principles & a shared understanding of services to guide us • Penguins also have a shared understanding: they migrate with all of the colonies meeting at the same place at the same time • Shared design tenets let engineers build services consistently
  • 30. Services own reads & writes to their data • Single gatekeeper with access to the data • Any service interested in a particular dataset must go through the gatekeeper service’s API • Data consistency • Encapsulation & isolation
  • 31. Services address a specific concern • Avoid creating another monolith • But don’t go too far the other way and create a polylith • A service must have a large enough but focused scope
  • 33. Mutations propagate via standard events (open-sourced SpinalTap) • In Rails, callbacks are executed as hooks in the CRUD lifecycle (example: when a reservation transaction completes, mark the home as unavailable for the reservation dates) • Mutation publishing lets services be aware of changes without directly accessing the data • SpinalTap (open source, https://medium.com/airbnb-engineering/capturing-data-evolution-in-a-service-oriented-architecture-72f7c643ee6f) is a Change Data Capture service • It detects mutations across different sources and emits standard events with low latency
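The idea on slide 33 (a mutation is published as a standard event, and interested services react without reading the owner's data directly) can be sketched in a few lines. This is an illustration only, not SpinalTap's API: `MutationEvent`, `EventBus`, `to_event` and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass
class MutationEvent:
    """Hypothetical standard event shape; the field names are assumptions."""
    table: str
    row_id: int
    kind: str  # "insert", "update" or "delete"
    changed: Dict[str, Tuple[Any, Any]] = field(default_factory=dict)


def changed_columns(before: Optional[Row], after: Optional[Row]) -> Dict[str, Tuple[Any, Any]]:
    """Return {column: (old, new)} for every column whose value differs."""
    before, after = before or {}, after or {}
    return {
        col: (before.get(col), after.get(col))
        for col in sorted(before.keys() | after.keys())
        if before.get(col) != after.get(col)
    }


def to_event(table: str, row_id: int, before: Optional[Row], after: Optional[Row]) -> MutationEvent:
    """Turn a before/after row pair into one standard event."""
    if before is None:
        kind = "insert"
    elif after is None:
        kind = "delete"
    else:
        kind = "update"
    return MutationEvent(table, row_id, kind, changed_columns(before, after))


class EventBus:
    """In-memory stand-in for the event stream that subscribers consume."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[MutationEvent], None]] = []

    def subscribe(self, handler: Callable[[MutationEvent], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: MutationEvent) -> None:
        for handler in self._handlers:
            handler(event)
```

In this sketch a home-availability service would subscribe to `reservations` events and block the dates when it sees a status change, which replaces the in-process Rails callback described on the slide.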
  • 34. GETTING STARTED WITH THE MIGRATION • Monarch butterflies have a longer migration cycle than their life cycle • No one butterfly makes the entire trip • Similarly, our approaches to migration evolved: ○ Initial ideas were not what we decided to implement ○ Initial services are not examples of current best practices
  • 35. (Example listing: “Idyllic home in the trees”, hosted by Bailey, $99 per night.) • Begin at the foundation: something impacting the whole site • We picked the Homes data model • Almost every feature back then involved a Home
  • 36. First attempts to break apart Monorail: (1) replace data access methods with service calls; (2) Ruby metaprogramming to override methods • We considered replacing callsites with Remote Procedure Calls (RPC, i.e. network calls), but there were thousands of such ActiveRecord data access methods • The metaprogramming override worked for a few months, but we realized… ○ Relations with other models were complex and expensive ○ JOIN queries were tricky/inefficient ○ Tangled dependencies
  • 37. Migrating Rails’s ActiveRecord (diagram: Monolith, Home.find_by_host_id(4), ActiveRecord, ActiveRecord adapter, MySQL adapter, Database; query: select * from homes where host_id = 4). ActiveRecord = Ruby library to read & write business objects to a relational db. Existing pathway: 1. ActiveRecord method (Home.find_by_host_id, to load homes data by host id = 4) 2. Wrapped into ActiveRecord 3. Translated to a raw SQL query 4. Sent via our default MySQL ActiveRecord adapter straight to the database
  • 38. Custom ActiveRecord adapter (diagram: Monolith, Home.find_by_host_id(4), ActiveRecord, ActiveRecord adapter, query object; query: select * from homes where host_id = 4). Airbnb created a custom ActiveRecord adapter to parse the raw SQL string into a query object.
  • 39. Parse to request object. Query object produced by the ActiveRecord adapter: :type => :select, :table => "homes", :filters => [{ :name => "host_id", :type => "integer", :nullable => false, :comparator => :eq, :value => 4 }], :select => ["id", "host_id", "title"]. Resulting request from the Monolith: /loadHomes { host_id: 4, fields: ["id", "host_id", "title"] }. The query object from the custom ActiveRecord adapter identified the key parts of the query: • Type of query = select, for “select * from homes…” • Table name = homes • Filters = where clause • Select = fields of the MySQL table being selected in the query (in this case, the example only has 3 fields in the homes table)
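The translation on slide 39, from a parsed query object to a service request, can be sketched as follows. The query object and the `/loadHomes` request mirror the slide; the `ENDPOINTS` table, the function name and the decision to reject anything other than simple equality filters are assumptions made for illustration, not Airbnb's adapter.

```python
from typing import Any, Dict, Tuple

# Hypothetical routing table; only /loadHomes appears on the slide.
ENDPOINTS = {"homes": "/loadHomes"}


def to_service_request(query: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Map a parsed SELECT query object to (endpoint, request body)."""
    if query["type"] != "select":
        raise NotImplementedError("this sketch only routes SELECT queries")
    endpoint = ENDPOINTS[query["table"]]  # KeyError means the table is not migrated

    body: Dict[str, Any] = {}
    for flt in query["filters"]:
        if flt["comparator"] != "eq":
            raise NotImplementedError("this sketch only handles equality filters")
        body[flt["name"]] = flt["value"]  # where host_id = 4  ->  host_id: 4
    body["fields"] = list(query["select"])
    return endpoint, body


# The query object from the slide, written as a Python dict.
query_object = {
    "type": "select",
    "table": "homes",
    "filters": [
        {"name": "host_id", "type": "integer", "nullable": False,
         "comparator": "eq", "value": 4},
    ],
    "select": ["id", "host_id", "title"],
}
```

Calling `to_service_request(query_object)` yields the `/loadHomes` endpoint with a body containing `host_id` and `fields`, which is the shape shown on the slide.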
  • 40. Re-route queries to services (diagram: Monolith, ActiveRecord adapter, query object, Request {host_id: 4 …}, Home service, Database). This seems like a roundabout way to get data… but migrating to service calls at the lowest level (raw SQL strings) had benefits: • Created a flexible service API to support raw SQL request patterns • Proved we didn’t need to rewrite 1000s of callsites • Tested that a critical service in the core booking flow could handle the load Monorail was supporting • Product engineers could still code with ActiveRecord methods (no change to their existing development practices, only a change in how data is retrieved under the hood)
  • 41. SERVICE INTERACTION DESIGN. After the initial services, we sought to define how services interact with each other. Jellyfish migrate by interacting with sunlight, following it from east to west. Similarly, we wanted a strict direction for service requests and interaction with other services.
  • 42. Service request (diagram: API traffic reaches services through Monorail in the interim, and through the API gateway in the future). SOA requests propagate in specific directions throughout the network. Requests originate from mobile or web clients.
  • 43. Service types: STRICT FLOW OF DEPENDENCIES (diagram, following API traffic: Presentation service, synthesizes data from services for end users; Middle tier, shared validation logic; Derived data service with its derived data store, shared business logic over disparate data sources; Data service, owns reads & writes to data entities; Database) • Data service ○ Gatekeeper for a model • Derived data service ○ Product feature-related, shared across multiple contexts • Presentation service ○ Logic for the data the end user sees • Middle tier service ○ The need developed later, after building out more of the SOA network
  • 44. Checkout page (diagram: API traffic; Checkout page presentation service; Reservation validation middle-tier service; Home demand derived data service with offline booking trend stats; Reservation data service and Reservation database; Home data service and Home database; write and read paths) • Separation of concerns • Home demand derived data: only gets the reservation and homes data needed for the demand feature (reservation dates, home location) • Checkout presentation: gets the data needed for the product feature the user sees ○ May read different data from the derived data service ○ Example: the home name is shown on the checkout page but is not needed for demand statistics
  • 45. COMPARE FOR DIFFERENCES • Migrate with no intended functionality change • Ensure no breakage -> slowly migrate piecemeal • Walruses migrate by swimming or riding floating ice sheets • Same functionality, but 2 different transport methods moving towards the breeding ground • Compare the new data service access pattern with the existing monolith
  • 46. Dual read comparison. Steps: Gate (admin UI configuration), Compare (1% traffic), Ramp & wait (gradual increments), Wait (gather traffic patterns), Switch (all traffic through service only). Diagram: Monolith, Service, Database, Read path A, Read path B. Reads are idempotent: we can issue multiple identical requests and they will have the same effect as issuing a single request • Compare path A’s responses against path B’s responses • Gate dual reads with a switch configured in an admin UI tool • With the click of a button we can ramp dual read traffic up or down • We can turn it off immediately in the UI without code changes, review, or deployment
  • 47. Begin with a conservative, small amount of production traffic and compare responses, looking for mismatches.
  • 48. Gradually ramp up while comparing the responses for mismatches along the way.
  • 49. 5% production traffic
  • 50. 10% production traffic
  • 51. 25% production traffic
  • 52. 50% production traffic
  • 53. 100% production traffic
  • 54. While at 100%, wait some more! • Gather enough traffic to cover all access patterns to the read path you are migrating • Ensure your service can sustain 100% of Monorail’s path A traffic
  • 55. Once the comparisons look clean, cut over to reading only through your service and stop the dual reads.
  • 56. The read path (diagram: Monolith, Service, Database).
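The gate, compare and ramp steps of the dual read comparison can be sketched as a small wrapper around the two read paths. Everything here is a hypothetical illustration: the `DualReader` class, its `percent` attribute (standing in for the admin UI setting) and the in-memory mismatch list are assumptions, not Airbnb's tooling.

```python
import random
from typing import Any, Callable, List, Tuple


class DualReader:
    """Serve reads from path A; shadow a percentage through path B and compare."""

    def __init__(
        self,
        read_monolith: Callable[[Any], Any],  # read path A, the source of truth
        read_service: Callable[[Any], Any],   # read path B, the new service
        percent: float = 0.0,                 # ramp setting: 0 turns dual reads off
        rng: Callable[[], float] = random.random,
    ) -> None:
        self.read_monolith = read_monolith
        self.read_service = read_service
        self.percent = percent
        self.rng = rng
        self.compared = 0
        self.mismatches: List[Tuple[Any, Any, Any]] = []

    def read(self, key: Any) -> Any:
        primary = self.read_monolith(key)
        # Gate: only the configured share of traffic takes the second path.
        if self.rng() * 100 < self.percent:
            self.compared += 1
            try:
                shadow = self.read_service(key)
            except Exception as exc:  # a failing service must not break the read
                self.mismatches.append((key, primary, repr(exc)))
            else:
                if shadow != primary:
                    self.mismatches.append((key, primary, shadow))
        # Callers always get path A's answer until the final switch.
        return primary
```

Because reads are idempotent, issuing the second read is safe. Ramping is a matter of raising `percent` in steps (1, 5, 10, 25, 50, 100) while `mismatches` stays empty, and setting it back to 0 is the immediate off switch.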
  • 57. Write comparison: DUAL WRITE TO SEPARATE DATABASES (diagram: Presentation service, Write validation middle tier service, Monolith, Production database, Shadow database, Write path A, Write path B, Reads). Writes are not idempotent, so we cannot dual write to the same database.
  • 58. Write comparison: services, DUAL WRITE (diagram: Presentation service, Write validation middle tier service, Monolith, Production database). The write path.
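A minimal sketch of the dual-write comparison, assuming each write path gets its own store so that the non-idempotent write is never applied twice to the same database. The two write functions and the dict-backed stores are hypothetical stand-ins; the service path deliberately omits a default so the comparison has something to catch.

```python
from typing import Any, Callable, Dict

Store = Dict[int, Dict[str, Any]]
Writer = Callable[[Store, int, Dict[str, Any]], None]


def dual_write(
    key: int,
    payload: Dict[str, Any],
    write_a: Writer,    # write path A: existing monolith logic
    write_b: Writer,    # write path B: new service logic
    production: Store,  # real database, still served to users
    shadow: Store,      # separate database, used only for comparison
) -> bool:
    """Apply the same write through both paths; return True if the rows match."""
    write_a(production, key, payload)
    write_b(shadow, key, payload)
    return production.get(key) == shadow.get(key)


def monolith_write(db: Store, key: int, payload: Dict[str, Any]) -> None:
    # Existing behaviour: fills in a default status when none is given.
    db[key] = {**payload, "status": payload.get("status", "pending")}


def service_write(db: Store, key: int, payload: Dict[str, Any]) -> None:
    # New behaviour under test: forgets the default, so rows can diverge.
    db[key] = dict(payload)
```

A `False` result points at a behavioural difference between the two write paths that would need fixing before switching writes over.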
  • 59. Incremental migration • We valued “democratic deploys” ○ Each engineer is responsible for testing and deploying their changes through to production successfully • Monorail deploys were a big pain point -> services alleviate this pain ○ Option 1: build a service with 100% of the functionality in shadow, then switch ○ Option 2: build and migrate as more functionality is built ○ Airbnb picked option 2 • Goal: get teams to be service owners ASAP
  • 60. Migrate by endpoint: (1) compare one endpoint at a time; (2) unblock clients with an incomplete service ○ e.g. /loadUsers ○ fetch users only by id • The User service started with one /loadUsers endpoint ○ It only loaded users by id for one MySQL table • We onboarded and unblocked 10 clients while adding support for more data sources (e.g. more user-related MySQL tables), more endpoints (e.g. /updateUser), and more query patterns (e.g. load users by email)
  • 61. Migrate by attribute (diagram: Production traffic, Presentation service; read migrated attributes from Service and its Database; read not-yet-migrated attributes from Monolith and its Database) • Presentation services ○ Some attributes required by the presentation layer may not yet be supported by a service ○ Hydrate SOA-supported attributes from services ○ Unsupported attributes are still hydrated from Monorail • Gets traffic through the presentation service ASAP • Incremental changes -> more cautious
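Migrating by attribute amounts to splitting the requested attributes into those a service already supports and those still served by the monolith, then merging the two results. The sketch below assumes hypothetical loader callables and attribute names; it only illustrates the split-and-merge step.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Loader = Callable[[int, List[str]], Dict[str, Any]]


def hydrate(
    entity_id: int,
    wanted: Iterable[str],
    service_attrs: Set[str],     # attributes already migrated to the service
    load_from_service: Loader,
    load_from_monolith: Loader,
) -> Dict[str, Any]:
    """Read migrated attributes from the service and the rest from the monolith."""
    wanted = list(wanted)
    migrated = [a for a in wanted if a in service_attrs]
    legacy = [a for a in wanted if a not in service_attrs]

    result: Dict[str, Any] = {}
    if migrated:
        result.update(load_from_service(entity_id, migrated))
    if legacy:
        result.update(load_from_monolith(entity_id, legacy))
    return result
```

As more attributes move into `service_attrs`, the monolith call shrinks and eventually disappears without callers of `hydrate` changing.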
  • 62. • Remember we hit the gong when the site was down? ○ The gong was hit during our initial attempts to migrate ○ The first services had rough patches… they are not poster-child services for best practices now, but we gained valuable knowledge of what not to do!
  • 63. SOA BEST PRACTICES. Learning from gong-ringing experiences, we developed best practices. Wildebeest have a dangerous migration path • They developed best practices for keeping their young safe • They place them in the center of the pack. Use best practices to keep services alive and healthy.
  • 64. Standardize service building: CONSISTENCY • Frameworks: auto-generate code • Testing & deploying: replay production traffic • Observability: standard templates
  • 65. Service & client setup (diagram: Service, Business logic). A service is created with the purpose of supporting specific business functionality or data. It needs some additional setup.
  • 67. Service & client setup (diagram adds: Endpoint logic, Server transport; Java client and Ruby client, each with Client transport). We need to manually write both the Java and Ruby clients. This is the minimal setup to query a service and get a response.
  • 68. Service & client setup (diagram adds, on the server: Server metrics, Server diagnostics, Startup / teardown, Metrics, Data validation, Server resilience; on each client: Metrics, Data validation, Error handling, Resilience; on the Ruby client: Type checking). This service will be in production, so we need to add more to the service & client setup: • Metrics • Data validation • Error handling • Resilience • Type checking for Ruby ○ Type differences between Java & Ruby have been problematic for us before
  • 69. Service & client setup (diagram adds: Dashboards, Alerts, Runbook documentation). Production needs more pieces to operate & maintain the services: • Dashboard • Alerts • Runbook documentation
  • 70. Service & client setup. But what the engineer really wanted to focus on (and what’s unique about this service) is the business logic and the data it supports. Wouldn’t it be nice if we didn’t need to manually write all the boilerplate and setup just to get a service started?
  • 71. IDL using Thrift • We invested in a services framework team ○ Their mission is to build the foundation for building and scaling services in a consistent and simple way for engineers ○ Automate or configure best practices • Aligned on Thrift as the IDL • Defined best-practice coding patterns
  • 72. IDL (same service & client diagram) • Interface Description Language (IDL) ○ Describes the API in a language-agnostic way • Now we only write the business logic & the IDL layer • The rest is autogenerated for free
  • 73. Coding services. BEFORE: difficult to create and maintain services; custom Java vs. Ruby clients. AFTER: autogenerated code framework & API; automated Ruby gem client • [Before IDL] It took weeks to create a new service, with multiple changes in various repositories that needed to be deployed in the correct order • Now, run one script and the service is pingable and productionized within an hour • The “Make me a service” script sets up a lot for you automatically ○ Boilerplate service ○ Health endpoint ○ Deploy & testing configs ○ Cluster config • Ruby gems can be created from the Thrift IDL for Ruby services to use, with the click of a button in our admin UI
  • 74. Thrift IDL: API FRAMEWORK
       /* Batch request */
       struct LoadSomeDataRequest {
         1: optional set<i64> ids (non_null)
         2: optional bool fooBar
       }
       /* id to data response */
       struct LoadSomeDataResponse {
         1: optional map<i64, SomeData> data
       }
     • Self-documenting API with Thrift structs ○ Standard response and request structure ○ Can look at any service’s Thrift config to know its API • Strong typing (e.g. interface, storage, communication)
  • 75. Thrift IDL: API FRAMEWORK
       /* /loadSomeData endpoint */
       LoadSomeDataResponse loadSomeData (1: LoadSomeDataRequest request)
         throws (1: SomeException exception1)
         (accept_replay = "true", rate_limit = "true")
     • Unified client for Java & Ruby • Simple annotations to autogenerate features
  • 76. Testing & deploying. BEFORE: uncertainty in pre-production environments; manually triggered requests. AFTER: structured pre-production process; automated replayed traffic. Previously, testing involved manually running curl to trigger a request to our service. Now we have more automated tools to help us with this.
  • 80. Regression testing: DIFFY (github.com/twitter/diffy). Diagram: replayed traffic is sent to Staging (new code), Primary (old code) and Secondary (old code); raw response differences minus non-deterministic noise give the filtered response differences • New code vs. last known good code • Filter the noise out of the raw response differences to get the changes that can be attributed to the new code just introduced on staging ○ Helpful for detecting regressions in existing endpoints ○ Useful for ensuring a change is reflected in the response when fixing a bug. Is Diffy SOA-specific? • No, but it was not practical in our Monorail • Monorail: too many endpoints with tightly coupled logic • SOA services: fewer, narrowly focused endpoints
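The noise-filtering idea behind the Diffy setup can be shown on flat responses. This is a simplified illustration of the principle (differences between two instances of the same old code are treated as noise), not Diffy's implementation or API; both function names are hypothetical.

```python
from typing import Any, Dict, Set

Response = Dict[str, Any]


def differing_fields(a: Response, b: Response) -> Set[str]:
    """Top-level fields whose values differ between two responses."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def filtered_differences(primary: Response, secondary: Response, staging: Response) -> Set[str]:
    """Fields that differ because of the new code rather than because of noise.

    primary and secondary both run the old code, so any field that differs
    between them (timestamps, random ids) is non-deterministic noise.
    """
    raw = differing_fields(primary, staging)
    noise = differing_fields(primary, secondary)
    return raw - noise


# Same replayed request answered by the three deployments.
primary = {"title": "Loft", "price": 99, "generated_at": 1}
secondary = {"title": "Loft", "price": 99, "generated_at": 2}
staging = {"title": "Loft", "price": 120, "generated_at": 3}
```

Here `generated_at` differs everywhere and is dropped as noise, while `price` differs only on staging and is reported as attributable to the new code.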
  • 83. Observability. BEFORE: nonstandard metrics and dashboards; inconsistent alerts. AFTER: IDL-templated metrics and dashboards; IDL annotation alerts. Before: debugging required domain knowledge and searching • Metrics had uneven coverage and different naming • Dashboards were inconsistent in completeness, correctness, up-to-date-ness, and interpretation. Now, IDL annotations for alerts: • High p95 latency • High error rate • Low queries per second (QPS)
  • 84. • Now: more consistent, quick understanding with templated graphs • Each IDL service has the same graphs, toggled simply with a dropdown menu for the service name
  • 85. HOW IS THE MIGRATION GOING SO FAR? • A lot of work so far; are we done? • Humpback whales have the longest migration of any mammal • Whales & our migration have long, arduous journeys: long migrations take time
  • 86. Airbnb’s SOA progress. Not done yet, still in the early stages.
  • 87. Services using the IDL framework: 250+. We leveraged tools and frameworks to scale services.
  • 88. IDL service endpoints supported: 1000+
  • 89. Promising initial results: SUCCESS • Faster build & deploy times ○ Hours (Monorail) to minutes (service) ○ Fewer reverts • Clear service ownership • Quicker bug fixes. Notes: • For one service, shipping took 2 hours in Monorail vs. 4 minutes as a service • We meet bug SLAs quicker
  • 90. Latency results: SUCCESS • Ruby Monorail is single-threaded • Java services are multi-threaded • Lower latency from parallelization ○ Search results page 3x faster ○ Home description page 10x faster! Notes: • Latency improvements are not specific to SOA! Our language change from Ruby -> Java when building services added multi-threading support • Parallelization of requests, more efficient querying of data
  • 91. 2017: 800+ engineers, 3k deploys per week (1 deploy / 3.5 minutes)
  • 92. 2017: 800+ engineers, 3k deploys per week. 2018: 1000+ engineers, 10k deploys per week (1 deploy / minute). Note: this is across all services at Airbnb, including the various stages (staging, canary, prod). Some services auto-deploy to staging periodically to ensure our staging environment is up to date.
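As a quick check on the deploy-rate figures, weekly deploy counts can be converted to an average interval, assuming deploys are spread evenly over a full 7-day week (an assumption; the talk does not say how the rate was averaged). The helper name is hypothetical.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def minutes_between_deploys(deploys_per_week: float) -> float:
    """Average gap between deploys if they were spread evenly over the week."""
    return MINUTES_PER_WEEK / deploys_per_week


interval_2017 = minutes_between_deploys(3_000)   # 3k deploys per week
interval_2018 = minutes_between_deploys(10_000)  # 10k deploys per week
```

Under that assumption, 3k deploys per week works out to roughly one deploy every 3.4 minutes, and 10k per week to roughly one deploy per minute.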
  • 94. Services, on-call, product, infra • All engineering teams own multiple services • Teams can test & deploy separately • No more sysops
  • 95. Checkout page required message in SOA (diagram: Checkout presentation; Pricing data, Home data, Reservation data, Review data, Messaging data; Home demand derived data, Cancellation derived data, Business travel derived data) • Yes, a lot of services for the checkout page, BUT ○ Clear ownership ○ Specific functionality ○ Faster deploys per service • I would need to make the change in ○ the checkout presentation service ○ a call to an endpoint in the message data service
  • 96. SOA is not for everyone: CAUTION • Before deconstructing your monolith, be forewarned • SOA has drawbacks • Monoliths are beneficial for quick iteration & small teams • Don’t overcomplicate your architecture if you don’t need to!
  • 97. Distributed services: CAUTION • A request involves multiple services • More network calls -> could mean higher latency • Each remote service call is a chance of failure • Consistency changes with separate databases • Observability is harder -> distributed tracing
  • 98. Complex service orchestration: CAUTION. Service owners must learn to manage and monitor their own services • Learning curve for engineers • Airbnb started with each service owner managing multiple Amazon EC2 instances • Moving towards Kubernetes for automated container orchestration
  • 99. High investment cost: CAUTION • High investment cost -> more tooling, frameworks • Documentation -> hundreds of services to know about vs. a single Monorail
  • 100. SOA migration: TAKEAWAYS • Be ready for a long commitment • Compare slowly & carefully • Standardize services • Frameworks, tools, documentation. Notes: • Migrate with intent! • Service frameworks are mandatory for scaling microservices quickly and reliably
  • 102. linkedin.com/in/jessicatai, @jessicamtai. Thank you for listening to the Airbnb migration story.