Building Resilient Serverless Systems
with Non-Serverless Components
Jeremy Daly
CTO, AlertMe.news
@jeremy_daly
Jeremy Daly
• CTO at AlertMe.news
• Consult with companies building in the cloud
• 20+ year veteran of technology startups
• Started working with AWS in 2009 and started using
Lambda in 2015
• Blogger (jeremydaly.com), OSS contributor, speaker
• Publish the Off-by-none serverless newsletter
• Host of the Serverless Chats podcast
Agenda
• What is resiliency and what is serverless?
• Working with “less-than-scalable” RDBMS
• Using unreliable APIs
• Managing API quotas
• Decoupling our services
• Other “non-serverless” components
What is resiliency?
“The ability of a software solution to absorb the impact
of a problem in one or more parts of a system, while
continuing to provide an acceptable service level to the
business.” ~ IBM
IT’S NOT ABOUT PREVENTING FAILURE
IT’S UNDERSTANDING HOW TO GRACEFULLY DEAL WITH IT
What does it mean to be serverless?
• No server management
• Flexible scaling
• Pay for value
• Automated high availability
Serverless versus non-serverless
Serverless: Lambda, Cognito, Kinesis, S3, DynamoDB, SQS, SNS, API Gateway, CloudWatch, AppSync, IoT, Comprehend
Managed or Definitely Not Serverless: ElastiCache, RDS, EMR, Amazon ES, Redshift, Fargate, DocumentDB (MongoDB), Managed Streaming for Kafka, anything “on EC2”
Everything has limits!
• Reserved Concurrency 🚦
• Function Timeouts ⏳
• Memory Limits 🧠
• Network Throughput 🚰
Some components are better than others
Know Your Limits
Simple Serverless Web Service
Client → API Gateway (Highly Scalable) → Lambda (Highly Scalable) → DynamoDB (Highly Scalable)
“I want my, I want my, I want my SQL”
~ Dire Straits
Simple Serverless Web Service (annotated: “^ not so”)
Client → API Gateway (Highly Scalable) → Lambda (Highly Scalable) → RDS (Not That Scalable 😳)
RDBMS and FaaS don’t play nicely together:
• Concurrency model doesn’t allow connection pooling
• Limited number of DB connections available
• Recycled containers create zombies
Ways to Manage DB Connections
• Increase max_connections setting
• Limit concurrent executions
• Lower your connection timeouts
• Limit connections per username
• Close connection before function ends
🤞 😡 ⚠ 🎲 😱 👎
Better Ways to Manage DB Connections
• Implement a good caching strategy 💾
• Buffer events for throttling and durability 🏋️
• Manage connections ourselves 🤔
👎
Implement a good caching strategy
Client → API Gateway → Lambda → ElastiCache (on a hit) or RDS
Key Points:
• Create new RDS connections ONLY on misses
• Make sure TTLs are set appropriately
• Include the ability to invalidate cache
YOU STILL NEED TO SIZE YOUR DATABASE CLUSTERS APPROPRIATELY
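The three key points on the caching slide can be sketched as a cache-aside lookup. This is a minimal illustration, not code from the talk: `TTLCache` is a hypothetical in-memory stand-in for ElastiCache, and `load_from_db` is a hypothetical callable that would open the RDS connection, so the database is only touched on a miss.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a cache such as ElastiCache."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def invalidate(self, key):
        # Explicit invalidation, e.g. after the underlying row is updated.
        self._store.pop(key, None)


def get_record(record_id, cache, load_from_db):
    """Cache-aside read: only call the database on a cache miss."""
    key = f"record:{record_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(record_id)  # the only place a DB connection is needed
    cache.set(key, value)
    return value
```

A write path would call `cache.invalidate(...)` after updating the row, so readers do not have to wait for the TTL to expire.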
Do you really need immediate feedback?
Synchronous Communication ⏳
Services can be invoked by other services and must wait for a reply.
This is considered a blocking request, because the invoking service
cannot finish executing until a response is received.
Asynchronous Communication 🚀
This is a non-blocking request. A service can invoke (or trigger)
another service directly or it can use another type of communication
channel to queue information. The service typically only needs to wait
for confirmation (ack) that the request was received.
Buffer events for throttling and durability
Diagram: Client, API Gateway, Lambda, SQS Queue (with SQS DLQ), Lambda (throttled), RDS
Diagram labels: Synchronous Request; “Asynchronous” Request; ack
Key Points:
• SQS adds durability
• Throttled Lambdas reduce downstream pressure
• Failed events are stored for further inspection/replay
Limit the concurrency to match RDS throughput
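The buffering pattern above can be simulated locally to see the two effects it relies on: a concurrency cap on the consumer and a dead-letter store for messages that keep failing. This is only an in-memory sketch under stated assumptions. `ThrottledConsumer`, `max_concurrency`, and `max_receive_count` are hypothetical names, a thread pool stands in for Lambda's concurrency limit, a plain list stands in for the DLQ, and retries happen immediately rather than after a visibility timeout.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThrottledConsumer:
    """In-memory stand-in for a queue consumer with a concurrency cap and a DLQ."""

    def __init__(self, handler, max_concurrency, max_receive_count=3):
        self.handler = handler
        self.max_concurrency = max_concurrency
        self.max_receive_count = max_receive_count
        self.dead_letters = []    # stand-in for the dead-letter queue
        self.peak_in_flight = 0   # highest number of simultaneous handlers seen
        self._in_flight = 0
        self._lock = threading.Lock()

    def _process(self, message):
        with self._lock:
            self._in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        try:
            # Retry the message up to max_receive_count times in total.
            for _ in range(self.max_receive_count):
                try:
                    self.handler(message)
                    return True
                except Exception:
                    continue
            # Still failing: keep it for inspection/replay instead of losing it.
            with self._lock:
                self.dead_letters.append(message)
            return False
        finally:
            with self._lock:
                self._in_flight -= 1

    def drain(self, messages):
        """Process all messages, never running more than max_concurrency at once."""
        with ThreadPoolExecutor(max_workers=self.max_concurrency) as pool:
            results = list(pool.map(self._process, messages))
        return sum(results)
```

Setting `max_concurrency` to roughly what the database can absorb is the local analogue of limiting the consumer's concurrency to match RDS throughput.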
Manage connections ourselves
1. Count open connections
2. Close connection if connection ratio threshold exceeded
3. Close sleeping connections with high time values
4. Retry connections with exponential back off
Serverless MySQL
https://github.com/jeremydaly/serverless-mysql
Count open connections
Query the processlist to get the total number of active connections
Close connection if over ratio threshold
If we exceed the connection ratio:
• Calculate our timeout
• Try to kill zombies
• If no zombies, terminate connection
Else, just try to kill zombies
Close sleeping connections with high time values
Query processlist for zombies
Kill zombies
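The code shown on these slides did not survive extraction, so the following is a rough Python sketch of steps 1 to 3 and not the serverless-mysql implementation. It assumes a hypothetical `query(sql, params)` callable that runs SQL against MySQL and returns rows as dictionaries, and a hypothetical `close_connection()` callable. The threshold and timeout defaults are illustrative, and the "calculate our timeout" step is simplified to choosing between a short and a long sleep cutoff.

```python
def count_connections(query, user):
    """Step 1: count this user's connections via the MySQL processlist."""
    rows = query(
        "SELECT COUNT(*) AS total FROM information_schema.processlist "
        "WHERE user = %s",
        (user,),
    )
    return rows[0]["total"]


def kill_zombies(query, user, min_sleep_seconds):
    """Step 3: kill connections that have been sleeping for too long."""
    rows = query(
        "SELECT id FROM information_schema.processlist "
        "WHERE command = 'Sleep' AND time >= %s AND user = %s",
        (min_sleep_seconds, user),
    )
    for row in rows:
        query(f"KILL {int(row['id'])}", ())
    return len(rows)


def manage_connections(query, close_connection, user, max_connections,
                       ratio_threshold=0.8, zombie_min_timeout=3,
                       zombie_max_timeout=900):
    """Step 2: run at the end of an invocation to decide what to clean up."""
    used_ratio = count_connections(query, user) / max_connections
    if used_ratio > ratio_threshold:
        # Under pressure: use the short cutoff so more zombies qualify.
        killed = kill_zombies(query, user, zombie_min_timeout)
        if killed == 0:
            # Nothing to reclaim, so give up our own connection.
            close_connection()
            return "closed"
        return "killed"
    # Plenty of headroom: only clear out very old sleepers.
    kill_zombies(query, user, zombie_max_timeout)
    return "kept"
```

Keeping the connection open when there is headroom lets a warm container reuse it on the next invocation, which is the main reason not to close unconditionally.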
Retry connections with exponential back off
If error trying to connect
Retry with Jitter
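Retrying with exponential backoff and jitter might look like the sketch below. The slide does not say which jitter variant is used, so this assumes "full jitter" (a random delay between zero and the capped exponential value). `connect` is a hypothetical callable that raises `ConnectionError` when the database refuses the connection, and all defaults are illustrative.

```python
import random
import time


def connect_with_backoff(connect, max_retries=5, base_delay=0.05, cap=1.0,
                         sleep=time.sleep, rng=random.random):
    """Call connect(), retrying on ConnectionError with capped, jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Exponential growth, capped, then scaled by a random factor in [0, 1).
            delay = rng() * min(cap, base_delay * 2 ** attempt)
            sleep(delay)
```

The random factor spreads out retries from many concurrent functions so they do not all hit the database again at the same instant.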
Does this really work?
• Aurora Serverless (2 ACUs)
• 90 connections available
• 1,024 MB of memory
• 500 users/sec for one minute
• Avg. response time was 41 ms
• ZERO ERRORS
We shouldn’t have to do this!
Amazon Aurora Serverless: Doesn’t solve the max_connections issue
Aurora Serverless Data API: Getting better, but limited to Aurora Serverless
Third-Party APIs
Manage calls to third-party APIs
• Implement a good caching strategy
• Buffer events for throttling and durability
• Implement circuit breakers 🚦
The Circuit Breaker
Diagram: API Consumer, API Gateway, Lambda, Stripe API, with circuit status stored in ElastiCache or DynamoDB
States: CLOSED, OPEN, HALF OPEN
Diagram labels: Status Check; Increment Failure Count; 🔥 🔥
Key Points:
• Cache your cache with warm functions
• Use a reasonable failure count
• Understand idempotency
“Everything fails all the time.” ~ Werner Vogels
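The CLOSED / OPEN / HALF OPEN cycle can be sketched as a small state machine. This is an illustrative version only: it keeps the state in memory, whereas the slide stores it in a shared store so that separate function instances see the same status. `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical names and values.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        # Status check: while OPEN, fail fast until the reset timeout passes.
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # allow a single trial call through
            else:
                raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "CLOSED"
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

Because a trial call in the HALF OPEN state may repeat a request the remote API already processed, the wrapped call should be idempotent, which is the point of the last key point above.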
What about quotas?
• Concurrency has no effect on frequency ⏰
• Stateless functions are not coordinated 😿
• Step Functions would be very expensive 💰
• Adding state wouldn’t prevent needless invocations 🗑
Can we build a better system?
• 100% serverless
• Cost effective
• Scalable
• Resilient
• Efficient
• Coordinated
YES! BUT I DON’T HAVE TIME TO TELL YOU ABOUT IT!
The Lambda Orchestrator
Diagram: CloudWatch Rule (trigger every minute), Lambda Orchestrator (concurrency 1), DynamoDB (Status?), SQS Queue, SQS (DLQ), Lambda Workers (concurrent executions of the SAME function), Gmail API (250 Quota Units per minute)
jeremydaly.com/throttling-third-party-api-calls-with-aws-lambda
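One way to picture the orchestrator's job is as a budgeting step that runs once per trigger: given the quota for the window and the units already spent, decide which pending jobs to hand to workers now. The sketch below is an assumption about how that decision could be made, not the design from the linked post. `plan_dispatch` and the job fields `id` and `units` are hypothetical; the default of 250 comes from the quota shown on the slide.

```python
def plan_dispatch(pending_jobs, quota_per_window=250, units_used=0):
    """Split pending jobs into those that fit the remaining quota and the rest."""
    budget = quota_per_window - units_used
    dispatch, deferred = [], []
    for job in pending_jobs:
        if job["units"] <= budget:
            dispatch.append(job)
            budget -= job["units"]
        else:
            # Not enough quota left this window: leave it for the next trigger.
            deferred.append(job)
    return dispatch, deferred
```

Because a single orchestrator instance makes this decision, workers never have to coordinate with each other, and no worker is invoked just to discover that the quota is already spent.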
Decoupling Our Services
Multicasting with SNS
Key Points:
• SNS has a “well-defined API”
• Decouples downstream processes
• Allows multiple subscribers with message filters
Diagram: Client → SNS (Event Service) → subscribers: HTTP, SMS, Lambda, SQS, Email
Diagram labels: “Asynchronous” Request; ack
Multicasting with EventBridge
Key Points:
• Create up to 100 event buses per account
• Allows multiple subscribers with RULES and EVENT PATTERNS
• Forward events to other accounts
Diagram: Client → Amazon EventBridge (Event Bus) → targets: Lambda, SQS, Step Function, Event Bus, +13 others
Diagram labels: Asynchronous “PutEvents” Request; ack w/ event id
Distribute & Throttle
Diagram: Client, API Gateway, Lambda, SNS Topic (Event Service), ack
Downstream services, each behind its own SQS Queue and concurrency-limited Lambda: Order Service, SMS Alerting Service, Billing Service
Lambda concurrency limits shown: 25, 10, 5
Filters shown: total > $0; status == “order_complete”
External dependencies: RDS, Twilio API, Stripe API
Key Points:
• SNS to SQS is “guaranteed” (100,010 retries)
• Filter events to selectively trigger services
• Manage throttling/quotas per service
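The two filters on this slide map naturally onto SNS subscription filter policies, which support exact string matches and numeric comparisons on message attributes. The sketch below evaluates a simplified subset of that policy format locally, purely to show how the filtering decision works. The matcher itself is hypothetical, the attribute names `total` and `status` are taken from the slide labels, and which service uses which filter is not assumed here.

```python
import operator

_NUMERIC_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def _condition_matches(condition, value):
    """Check one policy condition against one attribute value."""
    if isinstance(condition, dict) and "numeric" in condition:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        spec = condition["numeric"]  # e.g. [">", 0] or [">", 0, "<=", 100]
        pairs = zip(spec[0::2], spec[1::2])
        return all(_NUMERIC_OPS[op](value, operand) for op, operand in pairs)
    return condition == value  # exact match for plain strings


def matches(policy, attributes):
    """Every key in the policy must match; any condition in its list may match."""
    for key, conditions in policy.items():
        if key not in attributes:
            return False
        if not any(_condition_matches(c, attributes[key]) for c in conditions):
            return False
    return True


# Policies corresponding to the two filters shown on the slide.
PAID_ORDERS = {"total": [{"numeric": [">", 0]}]}
COMPLETED_ORDERS = {"status": ["order_complete"]}
```

For example, `matches(PAID_ORDERS, {"total": 25, "status": "pending"})` is true, while the same message does not match `COMPLETED_ORDERS`, so only the first subscriber's queue would receive it.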
Other non-serverless components
• Managed Services
• Legacy Systems
• Our own “serverless” APIs
“Non-serverless” components are inevitable
• Know the limits of your components
• Use a good caching strategy
• Embrace asynchronous processes
• Buffer and throttle events to distributed systems
• Utilize eventual consistency
👈
Blog: jeremydaly.com
Newsletter: Offbynone.io
Podcast: ServerlessChats.com
Lambda API: LambdaAPI.com
GitHub: github.com/jeremydaly
Twitter: @jeremy_daly
Building resilient serverless systems with non-serverless components - Serverlessconf NYC 2019
