How we scaled
push messaging
for millions of
Netflix devices
Susheel Aroskar
Cloud Gateway
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/neflix-push-messaging-scale
Presented at QCon New York
www.qconnewyork.com
Why do we need
push?
Scaling Push Messaging for Millions of Devices @Netflix
How I spend my
time in Netflix
application...
● What is push?
● How you can build it
● How you can operate it
● What can you do with it
Susheel Aroskar
Senior Software Engineer
Cloud Gateway
saroskar@netflix.com
github.com/raksoras
@susheelaroskar
PERSIST
UNTIL
SOMETHING
HAPPENS
Zuul Push
Architecture
[Diagram, built up over several slides]
● Clients hold persistent WebSocket / SSE connections to the Zuul Push servers
● Each Zuul Push server registers its connected users in the Push Registry
● Senders use the Push Library to put messages on the Push Message Queue
● The Message Processor reads the queue, looks up the user's server in the Push Registry, and delivers the message to that Zuul Push server
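The flow in the architecture diagram can be sketched in memory as follows. This is only an illustration under stated assumptions: every class name here (`PushMessage`, `PushServerSketch`, `PushRegistrySketch`, `MessageProcessorSketch`) is hypothetical and not part of Zuul, the queue is a plain in-process queue rather than a real message queue, and dropping messages for users with no registered connection is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** A queued push message: who it is for and what to send (hypothetical type). */
final class PushMessage {
    final String userId;
    final String payload;
    PushMessage(String userId, String payload) {
        this.userId = userId;
        this.payload = payload;
    }
}

/** Stand-in for a push server that holds open client connections. */
final class PushServerSketch {
    final String name;
    final List<String> delivered = new ArrayList<>();
    PushServerSketch(String name) { this.name = name; }
    void deliver(PushMessage message) {
        delivered.add(message.userId + ":" + message.payload);
    }
}

/** Stand-in for the push registry: user id -> server holding that user's connection. */
final class PushRegistrySketch {
    private final Map<String, PushServerSketch> serverByUser = new ConcurrentHashMap<>();
    void register(String userId, PushServerSketch server) { serverByUser.put(userId, server); }
    PushServerSketch lookup(String userId) { return serverByUser.get(userId); }
}

/** Stand-in for the message processor: drain the queue, look up the server, deliver. */
final class MessageProcessorSketch {
    static int drain(Queue<PushMessage> queue, PushRegistrySketch registry) {
        int deliveredCount = 0;
        for (PushMessage m = queue.poll(); m != null; m = queue.poll()) {
            PushServerSketch server = registry.lookup(m.userId);
            if (server != null) {
                server.deliver(m);
                deliveredCount++;
            }
            // Assumption: messages for users without a registered connection are dropped.
        }
        return deliveredCount;
    }
}
```

A sender would add `PushMessage` objects to a `ConcurrentLinkedQueue` and return immediately; `MessageProcessorSketch.drain` then plays the role of the processor.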
Handling millions of
persistent connections
Zuul Push server
C10K challenge
Thread per Connection
[Diagram: two sockets, each served by its own thread (Thread-1, Thread-2) that does the reads and writes for that socket]
Async I/O
[Diagram: a single thread serves multiple sockets, each with its own read callback and write callback]
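A minimal sketch of the single-threaded model, using plain `java.nio` rather than Netty. It is an illustration under assumptions, not how Zuul Push is implemented: in-process `Pipe` channels stand in for sockets, the class name `SingleThreadEventLoopSketch` is hypothetical, and messages are assumed to be short and non-empty so that each arrives in one read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class SingleThreadEventLoopSketch {

    /**
     * Opens one pipe per message (each pipe stands in for one connection),
     * writes the message, then services every pipe from the calling thread.
     */
    static List<String> run(List<String> messages) {
        List<String> received = new ArrayList<>();
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (String message : messages) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                pipe.source().configureBlocking(false);
                // The key attachment plays the role of the per-connection read callback.
                Consumer<String> onRead = received::add;
                pipe.source().register(selector, SelectionKey.OP_READ, onRead);
                pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            }
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (received.size() < messages.size()) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    buffer.clear();
                    int n = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (n > 0) {
                        buffer.flip();
                        @SuppressWarnings("unchecked")
                        Consumer<String> callback = (Consumer<String>) key.attachment();
                        callback.accept(StandardCharsets.UTF_8.decode(buffer).toString());
                    }
                }
            }
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return received;
    }
}
```

The point of the sketch is that no thread is parked per connection: one loop waits on the selector and invokes a callback only for channels that have data.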
Netty
[Diagram: a Channel Pipeline attached to a socket, running from Head to Tail through Channel Inbound Handlers and Channel Outbound Handlers]
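protected void addPushHandlers(ChannelPipeline pl) {
    // HTTP handlers come first: a WebSocket connection starts as an HTTP upgrade request.
    pl.addLast(new HttpServerCodec());
    // Netty requires a maximum aggregated content length; 8192 is an illustrative value.
    pl.addLast(new HttpObjectAggregator(8192));
    pl.addLast(getPushAuthHandler());
    pl.addLast(new WebSocketServerCompressionHandler());
    // Netty requires the WebSocket path; "/push" is an illustrative value.
    // allowExtensions = true is needed for the compression handler above.
    pl.addLast(new WebSocketServerProtocolHandler("/push", null, true));
    pl.addLast(getPushRegistrationHandler());
}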
Authenticate by Cookies, JWT
or any other custom scheme
Plug in your custom authentication
policy
Tracking clients’ connection
metadata in real time
Push Registry
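public class MyRegistration extends PushRegistrationHandler {
    @Override
    protected void registerClient(
            ChannelHandlerContext ctx,
            PushUserAuth auth,
            PushConnection conn,
            PushConnectionRegistry registry) {
        // Register the connection with the local, in-memory registry first.
        super.registerClient(ctx, auth, conn, registry);
        // Then schedule the write to the external push registry as a separate task
        // on the channel's executor. storeInRedis is your own method.
        ctx.executor().submit(() -> storeInRedis(auth));
    }
}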
Push registry features checklist
● Low read latency
● Record expiry
● Sharding
● Replication
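Two items from the checklist, record expiry and sharding, can be illustrated with a small in-memory sketch. It is not how Dynomite or Redis implement them: the class name `ExpiringShardedRegistrySketch`, the hash-modulo shard choice, and the explicit `nowMillis` parameter (used so the behaviour is deterministic) are all assumptions, and replication is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExpiringShardedRegistrySketch {

    /** One registry record: which push server holds the client, and until when. */
    private static final class Entry {
        final String serverId;
        final long expiresAtMillis;
        Entry(String serverId, long expiresAtMillis) {
            this.serverId = serverId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final List<Map<String, Entry>> shards = new ArrayList<>();
    private final long ttlMillis;

    ExpiringShardedRegistrySketch(int shardCount, long ttlMillis) {
        if (shardCount <= 0 || ttlMillis <= 0) {
            throw new IllegalArgumentException("shardCount and ttlMillis must be positive");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
        this.ttlMillis = ttlMillis;
    }

    /** Sharding: the client id alone decides which shard holds the record. */
    int shardIndex(String clientId) {
        return Math.floorMod(clientId.hashCode(), shards.size());
    }

    /** Stores or refreshes a record; it expires ttlMillis after nowMillis. */
    synchronized void put(String clientId, String serverId, long nowMillis) {
        shards.get(shardIndex(clientId)).put(clientId, new Entry(serverId, nowMillis + ttlMillis));
    }

    /** Returns the server id, or null if the record is missing or has expired. */
    synchronized String lookup(String clientId, long nowMillis) {
        Map<String, Entry> shard = shards.get(shardIndex(clientId));
        Entry entry = shard.get(clientId);
        if (entry == null) {
            return null;
        }
        if (entry.expiresAtMillis <= nowMillis) {
            shard.remove(clientId); // record expiry: stale entries are dropped on read
            return null;
        }
        return entry.serverId;
    }
}
```

Expiry matters because a client can disappear without a clean disconnect; without it, stale records would accumulate and point at servers that no longer hold the connection.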
What we use
Dynomite (https://github.com/Netflix/dynomite)
= Redis
+ Auto-sharding
+ Read/Write quorum
+ Cross-region replication
Message
Processing
Queue, Route,
Deliver
We use Kafka message
queues to decouple
message senders from
receivers
Fire and Forget
Cross-region
Replication
Different queues for
different priorities
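One way to picture separate queues per priority is the sketch below. The slides do not say how the queues are consumed, so strict priority order is an assumption here, as are the names `PushPriority` and `PriorityQueuesSketch` and the use of in-process queues in place of Kafka.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical priority levels, declared from most to least urgent. */
enum PushPriority { HIGH, NORMAL, LOW }

class PriorityQueuesSketch {
    private final Map<PushPriority, Queue<String>> queues = new EnumMap<>(PushPriority.class);

    PriorityQueuesSketch() {
        for (PushPriority priority : PushPriority.values()) {
            queues.put(priority, new ArrayDeque<>());
        }
    }

    /** Fire and forget: the sender enqueues and returns immediately. */
    void send(PushPriority priority, String message) {
        queues.get(priority).add(message);
    }

    /** Returns the next message, preferring higher priorities; null if all queues are empty. */
    String next() {
        for (PushPriority priority : PushPriority.values()) { // HIGH, then NORMAL, then LOW
            String message = queues.get(priority).poll();
            if (message != null) {
                return message;
            }
        }
        return null;
    }
}
```

With one shared queue, an urgent message would wait behind every bulk message enqueued before it; separate queues avoid that.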
We run multiple message
processor instances in parallel
to scale our message
processing throughput.
Operating Zuul Push
Different than REST of them
Persistent connections make
Zuul Push server stateful
Long lived stable connections
○ Great for client efficiency
○ Terrible for quick deploy/rollback
If you love your clients set them free...
Tear down connections
periodically
Randomize each
connection’s lifetime
[Chart: effect of randomizing connection lifetime on reconnect peaks. X-axis: time. Y-axis: number of reconnects.]
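Randomizing each connection's lifetime can be as simple as adding jitter around a base value. The sketch below is an assumption-laden illustration: the class name `ConnectionLifetimeSketch`, the uniform distribution, and the example figures (30 minutes, plus or minus 5) are not from the talk, which only says "tens of minutes".

```java
import java.util.Random;

class ConnectionLifetimeSketch {

    /**
     * Picks a lifetime uniformly from [baseSeconds - jitterSeconds, baseSeconds + jitterSeconds].
     * Spreading lifetimes out keeps clients from all reconnecting at the same moment.
     */
    static int randomizedLifetimeSeconds(int baseSeconds, int jitterSeconds, Random rng) {
        if (jitterSeconds < 0 || jitterSeconds > baseSeconds) {
            throw new IllegalArgumentException("jitter must be between 0 and baseSeconds");
        }
        // nextInt(bound) returns a value in [0, bound), so the offset falls in [-jitter, +jitter].
        int offset = rng.nextInt(2 * jitterSeconds + 1) - jitterSeconds;
        return baseSeconds + offset;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Illustrative values only: 30 minutes, plus or minus 5 minutes.
        for (int i = 0; i < 5; i++) {
            System.out.println(randomizedLifetimeSeconds(30 * 60, 5 * 60, rng) + " s");
        }
    }
}
```

The server would compute this once per connection at registration time and tear the connection down, or ask the client to close it, when the timer fires.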
Ask client to close its
connection.
Most connections
are idle!
How to optimize push server
BIG Server, tons of connections
ulimit -n 262144
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 87380 16777216"
Goldilocks strategy
Optimize for cost, NOT instance count
How to auto-scale?
RPS? CPU??
Open
Connections
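Scaling on open connections comes down to dividing the connection count by a per-instance target. The sketch below shows that arithmetic only; the class name `ConnectionAutoScalerSketch`, the minimum-instance floor, and the example numbers are hypothetical, and in practice the metric would feed a cloud auto-scaling policy rather than a hand-rolled function.

```java
class ConnectionAutoScalerSketch {

    /**
     * Returns how many instances are needed so that no instance holds more than
     * targetPerInstance open connections, never going below minInstances.
     */
    static int desiredInstances(long openConnections, long targetPerInstance, int minInstances) {
        if (openConnections < 0 || targetPerInstance <= 0 || minInstances < 0) {
            throw new IllegalArgumentException("invalid scaling inputs");
        }
        // Ceiling division: a partially filled instance still counts as one instance.
        long needed = (openConnections + targetPerInstance - 1) / targetPerInstance;
        return (int) Math.max(minInstances, needed);
    }

    public static void main(String[] args) {
        // Illustrative values only: 1,000,000 connections at 80,000 per instance.
        System.out.println(desiredInstances(1_000_000L, 80_000L, 2)); // prints 13
    }
}
```

RPS and CPU are poor signals here because most connections are idle; the number of open connections tracks the load a push server actually carries.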
Amazon Elastic Load Balancers, when run as HTTP load balancers,
cannot proxy WebSockets.
Solution: run ELB as a TCP load balancer
OSI 7 network layers (conceptual)    HTTP over TCP/IP
7 Application                        HTTP
6 Presentation
5 Session
4 Transport                          TCP
3 Network                            IP
2 Data link                          Ethernet
1 Physical
Layer 7 HTTP (WebSocket Upgrade Request)
Layer 4 TCP
Managing push cluster - a quick recap
● Recycle connections after tens of minutes
● Randomize each connection’s lifetime
● Many smaller servers >> a few BIG servers
● Auto-scale on number of open connections per box
● WebSocket aware vs TCP load balancer
If you build it,
They will push
On-demand diagnostics
Remote recovery
User messaging
WHAT WILL YOU
USE IT FOR?
Call to action
PULL!
https://github.com/Netflix/zuul
In conclusion, push can make you
rich (in functionality),
thin (by getting rid of polling)
and happy!
Thank you.
Questions?
Zuul Push
● Rich, exciting apps
● More efficient systems
● Easy to customize
● Easy to operate
● Battle tested