AWS LAMBDA from the TRENCHES
what you should know before you go to production
hi, I’m Yan Cui
AWS Lambda from the trenches
AWS user since 2009
apr, 2016
hidden complexities and dependencies
low utilisation to leave room for traffic spikes
EC2 scaling is slow, so scale earlier
lots of cost for unused resources
up to 30 mins for deployment
deployment required downtime
“lead time to someone saying thank you is the only reputation metric that matters.”
- Dan North
“what would good look like for us?”
DEPLOYMENTS SHOULD...
be small
be fast
have zero downtime
have no lock-step
FEATURES SHOULD...
be deployable independently
be loosely-coupled
WE WANT TO...
minimise cost for unused resources
minimise ops effort
reduce tech mess
deliver visible improvements faster
nov, 2016
170 Lambda functions in prod
1.2 GB deployment packages in prod
95% cost saving vs EC2
15x no. of prod releases per month
Timeline (x-axis: time): is a good fit → 1st function in prod! → ? → 170 functions, WOOF!
ALERTING · CI / CD · TESTING · LOGGING · MONITORING
? ? SECURITY · DISTRIBUTED TRACING · CONFIG MANAGEMENT
evolving the PLATFORM
rebuilt search
Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch
Amazon API Gateway → AWS Lambda → Amazon CloudSearch
new analytics pipeline
Legacy Monolith → Amazon Kinesis → AWS Lambda → Google BigQuery
1 developer, 2 days from design to production
(his 1st serverless project)
“nothing ever got done this fast at Skype!”
- Chris Twamley
Rebuilt with Lambda
(architecture diagram; external components shown: BigQuery, grapheneDB)
getting PRODUCTION READY
CHOOSE A FRAMEWORK
DEPLOYMENT
Serverless framework: http://serverless.com
AWS SAM: https://github.com/awslabs/serverless-application-model
Apex: http://apex.run
Up: https://apex.github.io/up
Claudia.js: https://github.com/claudiajs/claudia
Zappa: https://github.com/Miserlou/Zappa
Sparta: http://gosparta.io/
TESTING
amzn.to/29Lxuzu
Level of Testing
1. Unit: do our objects do the right thing? are they easy to work with?
2. Integration: does our code work against code we can’t change?
   test by invoking the handler
3. Acceptance: does the whole system work?
Moving from unit to integration to acceptance trades faster feedback for greater confidence.
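Integration testing by invoking the handler can be sketched as follows. This is a minimal illustration, not the talk's test suite: `handler` and `testHandler` are hypothetical names, and the handler is kept self-contained here. In a real integration test it would call the real downstream services it depends on rather than mocks of them.

```javascript
// Hypothetical API Gateway-style handler; in a real integration test it would
// call the real services it depends on rather than mocked versions of them.
const handler = async (event, context) => {
  const params = event.queryStringParameters || {};
  if (!params.name) {
    return { statusCode: 400, body: JSON.stringify({ message: "name is required" }) };
  }
  return { statusCode: 200, body: JSON.stringify({ message: `hello, ${params.name}` }) };
};

// Test by invoking the handler exactly as Lambda would: an event plus a context.
async function testHandler() {
  const context = { awsRequestId: "test-request" }; // minimal stand-in context
  const ok = await handler({ queryStringParameters: { name: "yubl" } }, context);
  if (ok.statusCode !== 200) throw new Error(`expected 200, got ${ok.statusCode}`);
  const missing = await handler({ queryStringParameters: null }, context);
  if (missing.statusCode !== 400) throw new Error(`expected 400, got ${missing.statusCode}`);
  return "passed";
}

module.exports = { handler, testHandler };
```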
“…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise. The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…”
Don’t Mock Types You Can’t Change
“…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do… Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…”
Don’t Mock Types You Can’t Change
Don’t Mock Services You Can’t Change
“…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its interface…”
Testing End-to-End
Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch
Amazon API Gateway → AWS Lambda → Amazon CloudSearch
Test Input goes in from the outside, then we Validate the result from the outside.
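Because the pipeline is asynchronous, an end-to-end test usually has to wait for the test input to show up at the other end before it can validate it. A minimal polling helper, assuming a hypothetical `check` function that queries the system through its public interface (for example the search API), might look like this:

```javascript
// Poll `check` until it resolves to a truthy value or attempts run out.
// Useful when test input goes in one end of an asynchronous pipeline
// (e.g. Kinesis -> Lambda -> CloudSearch) and is validated from the other end.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function eventually(check, { attempts = 10, delayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      const result = await check();
      if (result) return result;
    } catch (err) {
      lastError = err; // e.g. the record has not been indexed yet
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  const reason = lastError ? `: ${lastError.message}` : "";
  throw new Error(`condition not met after ${attempts} attempts${reason}`);
}

module.exports = { eventually };
```

For example, `await eventually(() => searchViaApi(testId), { attempts: 20, delayMs: 2000 })`, where `searchViaApi` is a hypothetical helper that calls the API Gateway endpoint.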
CI + CD PIPELINE
“the earlier you consider CI + CD, the more time you save in the long run”
- me
“…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed… This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…”
Testing End-to-End
“deployment scripts that only live on the CI box are a disaster waiting to happen”
- me
Jenkins build config deploys and tests:
unit + integration tests → deploy → acceptance tests
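#!/usr/bin/env bash
# build.sh: one entry point for deploying and testing, so local and CI builds run the same steps
set -euo pipefail

usage() {
  echo "usage: $0 {deploy|int-test|acceptance-test} <stage> <region> <aws-profile>" >&2
}

if [ "${1:-}" = "deploy" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4
  npm install
  # use the project-local Serverless framework rather than a global install
  AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s "$STAGE" -r "$REGION"
elif [ "${1:-}" = "int-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4
  npm install
  # runs the npm script named int-<stage>
  AWS_PROFILE=$PROFILE npm run "int-$STAGE"
elif [ "${1:-}" = "acceptance-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4
  npm install
  # runs the npm script named acceptance-<stage>
  AWS_PROFILE=$PROFILE npm run "acceptance-$STAGE"
else
  usage
  exit 1
fi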
build.sh allows repeatable builds on both local & CI
Auto → Auto → Manual
LOGGING
2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
(UTC timestamp, Lambda request ID, your log message)
log group and log stream names identify the function name, date, and function version
LOG OVERLOAD
CENTRALISE LOGS
MAKE THEM EASILY SEARCHABLE
the ELK stack (Elasticsearch + Logstash + Kibana)
CloudWatch Logs → AWS Lambda → ELK stack
CloudWatch Events
http://bit.ly/2f3zxQG
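The Lambda function that ships logs from CloudWatch Logs receives them through a subscription filter, which delivers a base64-encoded, gzipped JSON document in `event.awslogs.data`. A minimal decoding step might look like the sketch below; the actual shipping to the ELK stack is left out, and the `handler` shown is hypothetical.

```javascript
const zlib = require("zlib");

// CloudWatch Logs subscriptions deliver a base64-encoded, gzipped JSON
// document in event.awslogs.data. Decode it into flat log records.
function decodeAwsLogs(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  const payload = JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
  // Control messages only check that the subscription works; they carry no logs.
  if (payload.messageType === "CONTROL_MESSAGE") return [];
  return payload.logEvents.map((e) => ({
    logGroup: payload.logGroup,   // e.g. /aws/lambda/<function name>
    logStream: payload.logStream, // includes the date and function version
    timestamp: new Date(e.timestamp).toISOString(),
    message: e.message,
  }));
}

// Hypothetical shipping function: sending to Logstash/Elasticsearch is omitted.
const handler = async (event) => {
  const records = decodeAwsLogs(event);
  // ship `records` to the ELK stack here
  return records.length;
};

module.exports = { decodeAwsLogs, handler };
```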
DISTRIBUTED TRACING
“my followers didn’t receive my new post!”
- a user
where could the problem be?
correlation IDs*
* e.g. request-id, user-id, yubl-id, etc.
ROLL YOUR OWN CLIENTS
kinesis client, http client, sns client
http://bit.ly/2k93hAj
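A minimal sketch of the correlation-ID idea, assuming (as an illustration, not a fixed convention) that IDs travel as HTTP headers prefixed with `x-correlation-`. The function names are hypothetical; a home-grown http, kinesis, or sns client would merge these IDs into every outgoing request or message. State is reset on each capture because module-level variables survive between invocations in the same container.

```javascript
// Hypothetical in-memory store for the current invocation's correlation IDs.
let correlationIds = {};

// Capture any x-correlation-* headers from an API Gateway-style event,
// falling back to the invocation's request ID for x-correlation-id.
function captureCorrelationIds(event, awsRequestId) {
  const headers = (event && event.headers) || {};
  correlationIds = {}; // reset: module state outlives a single invocation
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase().startsWith("x-correlation-")) {
      correlationIds[key.toLowerCase()] = value;
    }
  }
  if (!correlationIds["x-correlation-id"]) {
    correlationIds["x-correlation-id"] = awsRequestId;
  }
  return correlationIds;
}

// A "roll your own" http client would pass its headers through this.
function withCorrelationHeaders(headers = {}) {
  return { ...correlationIds, ...headers };
}

module.exports = { captureCorrelationIds, withCorrelationHeaders };
```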
X-RAY
AWS X-Ray
traces do not span over API Gateway
useful, but hampered by current limitations
http://bit.ly/2s9yxmA
MONITORING + ALERTING
“where do I install monitoring agents?”
you can’t
CloudWatch:
• invocation count
• error count
• latency
• throttling
• granular to the minute
• supports custom metrics
Datadog:
• same metrics as CloudWatch
• better dashboard
• supports custom metrics
https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/
“how do I batch up and send logs in the background?”
you can’t (kinda): once the handler returns, the function is frozen, so background work is unreliable
console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
the first two lines are logs; the last two are metrics, in the format MONITORING|timestamp|metric value|metric type|metric name
CloudWatch Logs → AWS Lambda → logs to the ELK stack, metrics to CloudWatch
http://bit.ly/2gGredx
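The MONITORING log format can be produced and parsed with a couple of small helpers. This is a sketch: `recordMetric` and `parseMetric` are hypothetical names, the timestamp is assumed to be epoch seconds (as in the example lines), and the step that would publish parsed metrics to CloudWatch is omitted.

```javascript
// Emit a metric as a log line: MONITORING|timestamp|value|type|name.
// `now` is injectable so the output is deterministic in tests.
function recordMetric(name, value, type, now = Date.now()) {
  const timestamp = Math.floor(now / 1000); // epoch seconds
  const line = `MONITORING|${timestamp}|${value}|${type}|${name}`;
  console.log(line);
  return line;
}

// Parse a log message back into a metric; returns null for ordinary log lines.
function parseMetric(message) {
  const parts = message.trim().split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") return null;
  const [, timestamp, value, type, name] = parts;
  const numeric = Number(value);
  if (!Number.isFinite(numeric)) return null;
  return { timestamp: new Date(Number(timestamp) * 1000), value: numeric, type, name };
}

module.exports = { recordMetric, parseMetric };
```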
DASHBOARDS
SET ALARMS
TRACK APP-LEVEL METRICS
Not Only CloudWatch
“you really don't want your monitoring system to fail at the same time as the system it monitors”
- me
CONFIG MANAGEMENT
easily and quickly propagate config changes
CENTRALISED CONFIG SERVICE
(config service goes here): EC2 Parameter Store
CLIENT LIBRARY
http://bit.ly/2yLUjwd
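Much of the client library's job is caching: fetch config from the central service, reuse it across invocations, and refresh it periodically so changes propagate quickly. A minimal sketch follows, where `fetchConfig` is a hypothetical stand-in for a call to EC2 Parameter Store and the 60-second default TTL is an arbitrary assumption.

```javascript
// Hypothetical client-library sketch: cache config in memory for `ttlMs`,
// refresh it from the central config service when it expires, and keep
// serving the last known value if a refresh fails.
function createConfigClient(fetchConfig, { ttlMs = 60000, now = Date.now } = {}) {
  let cached;        // last successfully fetched value
  let expiresAt = 0; // time after which the cached value is refreshed

  return async function getConfig() {
    if (cached !== undefined && now() < expiresAt) return cached;
    try {
      cached = await fetchConfig();
      expiresAt = now() + ttlMs;
    } catch (err) {
      if (cached === undefined) throw err; // nothing to fall back on
    }
    return cached;
  };
}

module.exports = { createConfigClient };
```

Serving the last known value when a refresh fails keeps a config service outage from immediately breaking every function; after a failed refresh, the next call tries again.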
sensitive data (credentials, connection strings, etc.) should be encrypted in-flight and at rest
EC2 Parameter Store:
role-based access
encrypted at rest (KMS)
encrypted in-flight (HTTPS)
FRAMEWORK PLUG-INS
PRO TIPS
SERVERLESS FRAMEWORK
max 75 GB total deployment package size*
* limit is per AWS region
CLEAN UP OLD PACKAGES
Janitor Monkey → Janitor Lambda
http://bit.ly/2xzVu4a
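The core decision a Janitor Lambda has to make is which old versions are safe to delete. A sketch, assuming input shaped like the Lambda `ListVersionsByFunction` (`Version`) and `ListAliases` (`FunctionVersion`) responses; the listing and delete calls are left out, and the `keepRecent` rollback buffer is a hypothetical policy choice, not necessarily what the linked Janitor Lambda does.

```javascript
// Decide which published versions of a function are safe to delete:
// never $LATEST, never a version an alias points to, and keep the
// `keepRecent` most recent numbered versions as a rollback buffer.
function versionsToDelete(versions, aliases, keepRecent = 2) {
  const referenced = new Set(aliases.map((a) => a.FunctionVersion));
  const numbered = versions
    .map((v) => v.Version)
    .filter((v) => v !== "$LATEST")
    .sort((a, b) => Number(b) - Number(a)); // newest first
  return numbered
    .slice(keepRecent) // skip the most recent versions
    .filter((v) => !referenced.has(v));
}

module.exports = { versionsToDelete };
```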
disable versionFunctions in serverless.yml
install the Serverless framework as a dev dependency at the project level
dev dependencies are excluded from the deployment package since 1.16.0
http://bit.ly/2vzBqhC
http://amzn.to/2vtUkDU
UNDERSTAND COLDSTARTS
AWS X-Ray trace: the 1st invocation includes a cold start, the 2nd invocation does not
source: http://bit.ly/2oBEbw2
http://bit.ly/2tb7bLJ
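One common way to observe cold starts from inside a function is a module-level flag: module code runs once when a new container initialises, so the flag is only still set on that container's first invocation. A minimal sketch (the logged fields are an arbitrary choice):

```javascript
// Module scope runs once per container, so this flag is true only for the
// first invocation handled by a freshly initialised container (a cold start).
let isColdStart = true;

const handler = async (event, context) => {
  const coldStart = isColdStart;
  isColdStart = false; // every later invocation in this container is warm
  console.log(JSON.stringify({ coldStart, requestId: context && context.awsRequestId }));
  return { coldStart };
};

module.exports = { handler };
```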
EMBRACE NODE.JS & PYTHON
cold start comparison: C#, Java, Node.js, Python (http://bit.ly/2rtCCBz)
what about type safety?
complexity ceiling of a Node.js app
tools for managing complexity: referential transparency, immutability as default, type inference, option types, union types, …
complexity ceiling of a Node.js Lambda function
“if you can limit the complexity of your solution, maybe you won’t need the tools for managing that complexity.”
- me
AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME
USE STATE FOR OPTIMISATION
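Using state for optimisation without assuming a long function lifetime can look like the sketch below: an expensive initialisation is cached in module scope and reused by warm invocations, while a failed initialisation is not cached, so the next invocation retries. `lazy` and the fake client are hypothetical.

```javascript
// Cache an expensive initialisation (e.g. opening a DB connection) in module
// scope so later invocations in the same container can reuse it. Nothing
// assumes the cache exists: a cold container simply initialises again.
function lazy(init) {
  let pending = null;
  return function get() {
    if (!pending) {
      pending = Promise.resolve()
        .then(init)
        .catch((err) => {
          pending = null; // do not cache failures; retry on the next call
          throw err;
        });
    }
    return pending;
  };
}

// Hypothetical usage: a fake client standing in for a real connection.
const getClient = lazy(async () => ({ connectedAt: Date.now() }));
const handler = async () => {
  const client = await getClient(); // fast on warm invocations
  return client.connectedAt;
};

module.exports = { lazy, handler };
```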
AVOID COLDSTARTS
CloudWatch Event → AWS Lambda (ping, ping, ping, ping…)
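Scheduled CloudWatch Events deliver an event whose `source` is `aws.events` and whose `detail-type` is `Scheduled Event`. A handler can recognise such a ping and return early, so warm-up invocations stay cheap. A minimal sketch; note that it treats every scheduled event as a ping, which would be wrong for a function that also runs real scheduled jobs, and that one ping keeps only one container warm.

```javascript
// Scheduled CloudWatch Events arrive with source "aws.events" and
// detail-type "Scheduled Event"; treat them as warm-up pings.
function isWarmupPing(event) {
  return Boolean(event) && event.source === "aws.events" && event["detail-type"] === "Scheduled Event";
}

const handler = async (event) => {
  if (isWarmupPing(event)) {
    return "warmed"; // skip business logic; the container is now warm
  }
  // ...normal (hypothetical) processing goes here
  return "processed";
};

module.exports = { isWarmupPing, handler };
```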
HEALTH CHECKS?
max 5 mins execution time
USE RECURSION FOR LONG RUNNING TASKS
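Given the maximum execution time, a long-running task can be split across invocations: process items while there is time left, then pass the remainder to a new invocation of the same function. A sketch using the Lambda context's `getRemainingTimeInMillis()`; `invokeSelf` and `processItem` are hypothetical, and the 10-second safety margin is an assumption that should exceed the time one item can take.

```javascript
// Assumption: time reserved to hand off the remaining work cleanly.
const SAFETY_MARGIN_MS = 10000;

// Process items until the invocation nears its timeout, then hand the rest
// to a fresh invocation. `invokeSelf` is a hypothetical async function that
// would invoke this same function again (e.g. asynchronously via the Lambda API).
async function processWithRecursion(items, context, processItem, invokeSelf) {
  let i = 0;
  while (i < items.length) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      await invokeSelf({ items: items.slice(i) });
      return { processed: i, handedOff: items.length - i };
    }
    await processItem(items[i]);
    i++;
  }
  return { processed: i, handedOff: 0 };
}

module.exports = { processWithRecursion };
```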
CONSIDER PARTIAL FAILURES
“AWS Lambda polls your stream and invokes your Lambda function. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires…”
http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
should the function fail on partial/any failures?
Kinesis → AWS Lambda: events are processed in chronological order
failed events → SNS → AWS Lambda (shares processing logic): failed events are retried out of sequence
after 3 attempts → SQS
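One way to handle partial failures is to catch errors per record instead of failing the whole invocation, then publish only the failed records for retry. A sketch, with hypothetical `processEvent` and `publishForRetry` functions (the latter would publish to the SNS retry topic, and the retry function would reuse `processEvent`):

```javascript
// Process a batch record by record. Instead of throwing (which would make
// Lambda retry the whole Kinesis batch and hold up the shard), collect
// failures and hand them to a retry channel such as an SNS topic.
async function processBatch(records, processEvent, publishForRetry) {
  const failed = [];
  for (const record of records) {
    // records are processed in order; a failure does not stop the rest
    try {
      await processEvent(record);
    } catch (err) {
      failed.push({ record, error: err.message });
    }
  }
  if (failed.length > 0) {
    // failed events are retried later, out of sequence
    await publishForRetry(failed);
  }
  return { succeeded: records.length - failed.length, failed: failed.length };
}

module.exports = { processBatch };
```

If `publishForRetry` itself throws, the invocation fails and Lambda retries the whole batch, which is the conservative fallback.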
PROCESS SQS WITH RECURSIVE FUNCTIONS
http://bit.ly/2npomX6
AVOID HOT KINESIS STREAMS
“Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.”
http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
“If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.”
http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
when no. of processors goes up…
can have too many Kinesis read operations… (ReadProvisionedThroughputExceeded)
unpredictable spikes in read ‘latency’… (ReadRecords.IteratorAge)
can kinda workaround…
http://bit.ly/2uv5LsH
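A rough back-of-the-envelope check for read contention, assuming each Lambda function subscribed to the stream polls each shard about once per second (Lambda's documented base polling rate) against the 5 reads per second per shard limit quoted above. The 2 MB per second data limit is shared in the same way but is not modelled here.

```javascript
// Documented Kinesis limit: 5 read transactions per second per shard.
const READS_PER_SHARD_PER_SEC = 5;

// Estimate read pressure on one shard from the number of subscribed functions.
// Assumption: each function polls the shard about once per second.
function shardReadUtilisation(consumerFunctions, pollsPerSecond = 1) {
  const reads = consumerFunctions * pollsPerSecond;
  return {
    readsPerSecond: reads,
    utilisation: reads / READS_PER_SHARD_PER_SEC,
    throttlingLikely: reads > READS_PER_SHARD_PER_SEC,
  };
}

module.exports = { shardReadUtilisation };
```

Under these assumptions, a sixth function subscribed to the same stream pushes each shard past its read limit, which is when ReadProvisionedThroughputExceeded errors and iterator-age spikes start to appear.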
clever but costly
“for subsystems that don’t have to be realtime, or are task-based (i.e. order doesn’t matter), consider other triggers such as S3 or SNS.”
- me
@theburningmonk
theburningmonk.com
github.com/theburningmonk
sign up here: http://bit.ly/2xCwJEe
