Serverless in production
an experience report
what you should know before you go to production
Yan Cui
http://theburningmonk.com
@theburningmonk
Principal Engineer @ DAZN
“Netflix for sports”
offices in London, Leeds, Katowice and Amsterdam
Serverless in production, an experience report (CoDe-Conf)
available in Austria, Switzerland,
Germany, Japan, Canada, Italy
and USA
available on 30+ platforms
~1,000,000 concurrent viewers
WE’RE HIRING!
Visit engineering.dazn.com to learn more.
Follow @dazneng for updates about the engineering team.
apr, 2016
hey guys, vote on this post
and I’ll announce a winner at
10PM tonight
[chart: traffic spikes 70-100x at 10PM]
low utilisation to leave room for spikes
EC2 scaling is slow, so scale earlier
lots of $$$ for unused resources
up to 30 mins for deployment
deployment required downtime
“lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
“what would good
look like for us?”
DEPLOYMENTS SHOULD...
be small
be fast
have zero downtime
have no lock-step
FEATURES SHOULD...
be deployable independently
be loosely-coupled
WE WANT TO...
minimise cost for unused resources
minimise ops effort
reduce tech mess
deliver visible improvements faster
nov, 2016
170 Lambda functions in prod
1.2 GB deployment packages in prod
95% cost saving vs EC2
15x no. of prod releases per month
[chart: "is a good fit" over time, annotated "1st function in prod!" and "?"]
ALERTING
CI / CD
TESTING
LOGGING
MONITORING
Principles: what is good?
Practices: how to make it good?
Tools: with what?
Principles outlast Tools
[chart: "is a good fit" over time, annotated "1st function in prod!", "170 functions" and "? ?"]
SECURITY
DISTRIBUTED TRACING
CONFIG MANAGEMENT
evolving the PLATFORM
rebuilt search
[diagram: Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch, Amazon API Gateway, AWS Lambda]
new analytics pipeline
[diagram: Legacy Monolith, Amazon Kinesis, AWS Lambda, Google BigQuery]
1 developer, 2 days
from design to production
(his 1st serverless project)
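Both pipelines above have a Lambda function consuming events from a Kinesis stream. A minimal sketch of such a consumer follows. It assumes the monolith publishes JSON payloads, which the slides do not state. `makeHandler` and `forward` are hypothetical names standing in for whatever writes to CloudSearch or BigQuery.

```javascript
// Lambda receives Kinesis records with the payload base64-encoded
// in `record.kinesis.data`.
function parseKinesisEvent(event) {
  return event.Records.map((record) => {
    const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
    return JSON.parse(json); // assumption: payloads are JSON
  });
}

// Hypothetical handler factory: `forward` is injected so the same code can
// push to a search index, an analytics store, or a test double.
function makeHandler(forward) {
  return async function handler(event) {
    const items = parseKinesisEvent(event);
    await forward(items);
    return { processed: items.length };
  };
}
```

Injecting `forward` keeps the decoding logic easy to unit test, while the integration with the real downstream service is left to integration and acceptance tests.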
“nothing ever got done
this fast at Skype!”
- Chris Twamley
“lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
Rebuilt
with Lambda
getting PRODUCTION READY
choose a tried-and-tested
deployment framework,
don’t invent your own
http://serverless.com
https://github.com/awslabs/serverless-application-model
http://apex.run
https://apex.github.io/up
https://github.com/claudiajs/claudia
https://github.com/Miserlou/Zappa
http://gosparta.io/
TESTING
amzn.to/29Lxuzu
Level of Testing
1. Unit
do our objects do the right thing?
are they easy to work with?
2. Integration
does our code work against code we
can’t change?
test by invoking the handler
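A minimal sketch of testing by invoking the handler: the test calls the function in-process with a hand-built event. The `handler` below is a hypothetical API Gateway (Lambda proxy) function used only for illustration; in a real integration test it would talk to the real downstream services rather than to mocks.

```javascript
// Hypothetical API Gateway (Lambda proxy) handler, for illustration only.
async function handler(event) {
  const params = event.queryStringParameters || {};
  const name = params.name || "world";
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `hello ${name}` }),
  };
}

// Invoke the handler in-process with a hand-built event, the way an
// integration test would, and decode the response for assertions.
async function invokeHandler(handlerFn, event) {
  const response = await handlerFn(event, {}); // second argument: stub context
  return { statusCode: response.statusCode, body: JSON.parse(response.body) };
}

// Usage: build an event, invoke, then inspect the decoded response.
invokeHandler(handler, { queryStringParameters: { name: "yan" } })
  .then((res) => console.log(res.statusCode, res.body.message));
```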
3. Acceptance
does the whole system work?
[diagram: Level of Testing; unit, integration and acceptance tests, annotated "feedback" and "confidence"]
“…We find that tests that mock external
libraries often need to be complex to
get the code into the right state for the
functionality we need to exercise.
The mess in such tests is telling us that
the design isn’t right but, instead of
fixing the problem by improving the
code, we have to carry the extra
complexity in both code and test…”
Don’t Mock Types You Can’t Change
“…The second risk is that we have to be
sure that the behaviour we stub or mock
matches what the external library will
actually do…
Even if we get it right once, we have to
make sure that the tests remain valid
when we upgrade the libraries…”
Don’t Mock Types You Can’t Change
Don’t Mock Services You Can’t Change
Paul Johnston
The serverless approach to
testing is different and may
actually be easier.
http://bit.ly/2t5viwK
[diagram: API Gateway, Lambda, DynamoDB; annotated "Unit Tests" and "Mock/Stub"]
what unit tests will not tell you…
is our request correct?
is the request mapping set up correctly?
are the API resources configured correctly?
are we assuming the correct schema?
is Lambda proxy configured correctly?
is the IAM policy set up correctly?
is the table created?
observation
most Lambda functions are simple and
have a single purpose; the risk of
shipping broken software has largely
shifted to how they integrate with
external services
optimize towards shipping working
software, even if it means slowing
down your feedback loop…
…if a service can’t provide
you with a relatively easy
way to test the interface in
reality, then you should
consider using another one.
Paul Johnston
“…Wherever possible, an acceptance
test should exercise the system end-to-
end without directly calling its internal
code.
An end-to-end test interacts with the
system only from the outside: through
its interface…”
Testing End-to-End
[diagram: Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch, Amazon API Gateway, AWS Lambda; annotated "Test Input" and "Validate"]
integration tests exercise the
system’s integration with its
external dependencies
acceptance tests exercise the
system end-to-end from
the outside
observation
integration tests differ from
acceptance tests only in HOW the
Lambda functions are invoked
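One way to act on this observation is to write each test case once and switch how the function is invoked. The sketch below is an assumption about how that could look, not the talk's actual code: `makeInvoker`, the `mode` values and the injected `httpGet` client are hypothetical names.

```javascript
// Build an `invoke(path, query)` function for the chosen test mode.
//   "handler": integration test, runs the function code in-process
//   "http":    acceptance test, calls the deployed API from the outside
function makeInvoker({ mode, handler, httpGet }) {
  if (mode === "handler") {
    return async (path, query) => {
      const res = await handler({ path, queryStringParameters: query }, {});
      return { statusCode: res.statusCode, body: JSON.parse(res.body) };
    };
  }
  if (mode === "http") {
    // httpGet is any HTTP client wrapper resolving to { statusCode, body }
    return async (path, query) => httpGet(path, query);
  }
  throw new Error(`unknown test mode: ${mode}`);
}
```

The test body then only sees `invoke`, so the same assertions can run against local code during integration tests and against the deployed stage during acceptance tests.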
CI + CD PIPELINE
Jenkins build config deploys and tests
unit + integration tests
deploy
acceptance tests
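#!/bin/bash
# build.sh: single entry point for deploying and running tests, locally and on CI

# usage is called below but not shown on the slide; this is a minimal stand-in
usage() {
  echo "usage: $0 <deploy|int-test|acceptance-test> <stage> <region> <profile>"
}

if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  # run the Serverless framework installed as a dev dependency
  AWS_PROFILE=$PROFILE node_modules/.bin/sls deploy -s "$STAGE" -r "$REGION"
elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  AWS_PROFILE=$PROFILE npm run "int-$STAGE"
elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  AWS_PROFILE=$PROFILE npm run "acceptance-$STAGE"
else
  usage
  exit 1
fi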
install Serverless framework
as dev dependency
mitigate version conflicts
http://alistair.cockburn.us/Hexagonal+architecture
build.sh allows repeatable builds on both local & CI
Auto Auto Manual
LOGGING
2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
UTC Timestamp | API Gateway Request Id | your log message
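A log line in this shape can be split back into its three fields. This is a small sketch that assumes the fields are separated by whitespace, as displayed on the slide; `parseLogLine` is a hypothetical helper, and runtimes that add further fields (such as a log level) would need a different pattern.

```javascript
// Split "<timestamp> <request id> <message>" into its parts.
// Returns null when the line does not have at least three fields.
function parseLogLine(line) {
  const match = /^(\S+)\s+(\S+)\s+([\s\S]*)$/.exec(line);
  if (!match) return null;
  const [, timestamp, requestId, message] = match;
  return { timestamp, requestId, message };
}
```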
Me
Logs are not easily searchable
in CloudWatch Logs.
[diagram: AWS Lambda stdout is sent asynchronously to CloudWatch Logs; CloudWatch Logs invokes an AWS Lambda function that forwards to any log aggregation service; a later build adds CloudWatch Events]
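When CloudWatch Logs invokes a Lambda function through a subscription, the log events arrive base64-encoded and gzipped under `event.awslogs.data`. A minimal sketch of the decoding step follows; `decodeLogsEvent` is a hypothetical name, and shipping the result to a particular aggregation service is left out.

```javascript
const zlib = require("zlib");

// Decode a CloudWatch Logs subscription event into flat log records.
function decodeLogsEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  const payload = JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
  return payload.logEvents.map((logEvent) => ({
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    timestamp: logEvent.timestamp,
    message: logEvent.message,
  }));
}
```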
DISTRIBUTED TRACING
a user
my followers didn’t receive
my new post!
where could the
problem be?
correlation IDs*
* e.g. request-id, user-id, yubl-id, etc.
wrap HTTP client & AWS SDK clients
to forward captured correlation IDs
kinesis client
http client
sns client
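A rough sketch of the capture-and-forward idea for the HTTP case. The `x-correlation-` header prefix and both function names are assumptions made for illustration; the same wrapping idea would apply to the Kinesis and SNS clients, with the IDs carried in the message payload or attributes instead of headers.

```javascript
const PREFIX = "x-correlation-"; // assumed header prefix, not from the slides

// Collect correlation IDs from incoming headers. If the caller sent none,
// start a new chain using the current invocation's request ID.
function captureCorrelationIds(headers, awsRequestId) {
  const ids = {};
  for (const [name, value] of Object.entries(headers || {})) {
    const key = name.toLowerCase();
    if (key.startsWith(PREFIX)) ids[key] = value;
  }
  if (!ids[PREFIX + "id"]) ids[PREFIX + "id"] = awsRequestId;
  return ids;
}

// Wrap any `send(request)` function so the captured IDs are added
// to the outgoing request headers.
function withCorrelationIds(send, getIds) {
  return (request) =>
    send({ ...request, headers: { ...(request.headers || {}), ...getIds() } });
}
```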
use X-Ray for performance tracing
AWS X-Ray
X-Ray traces do not span
async event sources
https://epsagon.com
MONITORING + ALERTING
no place to install agents/daemons
CloudWatch
• invocation count
• error count
• latency
• throttling
• granular to the minute
• supports custom metrics
Datadog
• same metrics as CloudWatch
• better dashboards
• supports custom metrics
https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/
[diagram: press button, internet, my code, internet, something happens]
those extra 10-20ms for
sending custom metrics
would compound when
you have microservices
and multiple APIs are
called within one slice of
user event
Amazon found every 100ms of latency
cost them 1% in sales.
http://bit.ly/2EXPfbA
no more background processing,
other than what the platform provides
console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
MONITORING | timestamp | metric value | metric type | metric name
metrics
logs
CloudWatch Logs AWS Lambda
ELK stack
logs
metrics
CloudWatch
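The pipe-delimited MONITORING lines above can be produced in the function and picked out again by whatever processes the logs. A minimal sketch, assuming the timestamp is in epoch seconds as in the sample lines; `formatMetric` and `parseMetric` are hypothetical helper names.

```javascript
// Build a metric log line: MONITORING|timestamp|value|type|name
function formatMetric(name, type, value, nowMs = Date.now()) {
  const timestamp = Math.floor(nowMs / 1000); // epoch seconds
  return `MONITORING|${timestamp}|${value}|${type}|${name}`;
}

// Parse a log line; returns null for ordinary (non-metric) log messages.
function parseMetric(line) {
  const parts = line.trim().split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") return null;
  const [, timestamp, value, type, name] = parts;
  return { timestamp: Number(timestamp), value: Number(value), type, name };
}
```

Writing metrics to stdout keeps the publishing cost out of the request path; the log-processing function can call `parseMetric` on each message and publish the non-null results as custom metrics.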
don’t forget to set up dashboards & alarms
CONFIG MANAGEMENT
design for easy & quick
propagation of config changes
me
Environment variables make it
hard to share configurations
across functions.
me
Environment variables make it
hard to implement fine-grained
access to sensitive info.
config service
goes here
SSM Parameter Store
Secrets Manager
sensitive data should be encrypted
in-flight, and at-rest
enforce role-based access to
sensitive configuration values
[diagram: SSM Parameter Store; role-based access, encrypted in-flight (HTTPS), encrypted at-rest]
invest into a robust client library
fetch & cache at cold-start
https://github.com/middyjs/middy
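A minimal sketch of fetch and cache at cold start. It is not middy's API: `createCachedLoader` is a hypothetical helper, `load` stands in for a call to SSM Parameter Store or Secrets Manager, and the expiry (`ttlMs`) is an added assumption so that config changes propagate without a redeploy.

```javascript
// Cache the result of `load()` outside the handler so it survives across
// invocations of a warm container, and refresh it once `ttlMs` has passed.
function createCachedLoader(load, ttlMs, now = Date.now) {
  let cached = null; // promise for the most recent load
  let expiresAt = 0;

  return function get() {
    if (cached === null || now() >= expiresAt) {
      const attempt = Promise.resolve(load());
      cached = attempt;
      expiresAt = now() + ttlMs;
      // do not keep a failed load around; the next call retries
      attempt.catch(() => {
        if (cached === attempt) cached = null;
      });
    }
    return cached;
  };
}
```

Declared at module scope, `const getConfig = createCachedLoader(loadFromParameterStore, 3 * 60 * 1000)` (with `loadFromParameterStore` being whatever function fetches the values) would hit the config service on cold start and then at most once per three minutes per container.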
https://www.cloudzero.com/blog/2018-cloud-and-serverless-ecosystem-survey
take the survey and enter to win a $100 Amazon Gift Card
understand the activity, cost, performance and relationships
that occur in your cloud and serverless environments
@theburningmonk
theburningmonk.com
github.com/theburningmonk
