Cloudius Systems presents:
Seastar
Avi Kivity, April 13 2015
SeaStar Technology
● New technology that runs on physical machines and VMs, under Linux or OSv
● Multi-million IOPS, fully scalable
● Perfect building block for a database, filesystem, or cache
● Shared-nothing, fully asynchronous model
● Open source
SeaStar current performance
SeaStar
[Figure: before, the thread model; after, SeaStar shards.]
Problem with today’s programming model
+ Single-core performance (frequency, IPC) is no longer growing
+ Core counts keep growing, but the cores are hard to utilize; applications don’t scale
+ Locks have costs even without contention
+ Data is allocated on one core, then copied and used on others
+ Software can’t keep up with recent hardware (SSDs, 10 Gbps line rate, NUMA, etc.)
[Figure: traditional stack. Labels: Application, threads, Kernel (Scheduler, TCP/IP, queues), NIC queues, Memory.]
SeaStar Framework
Linear scaling with core count
+ One engine runs on each core
+ Shared-nothing per-core design
+ Fits the existing shared-nothing distributed application model
+ Full kernel bypass, supports zero-copy
+ No threads, no context switches, and no locks
+ Instead, asynchronous lambda invocation
[Figure: four identical per-core stacks. Each has, in userspace: Application, TCP/IP, Task Scheduler with its queues and an smp queue, and a NIC queue on DPDK. Kernel (isn’t involved).]
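The shared-nothing idea above can be illustrated with a toy model. This is only a sketch under stated assumptions: it runs on a single thread, the names `shard`, `toy_cluster`, `submit_to`, and `demo_increment` are hypothetical, and the `inbox` merely stands in for the smp queue shown in the figure. It is not Seastar's API or implementation. The point it shows is that a shard never locks another shard's data; it sends a lambda to the owner instead.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy model only: each shard owns its data and an inbox of lambdas.
struct shard {
    std::unordered_map<std::string, int> data;      // touched only by this shard
    std::deque<std::function<void(shard&)>> inbox;  // stands in for the smp queue
};

class toy_cluster {
public:
    explicit toy_cluster(std::size_t n) : shards_(n) {}

    // Map each key to exactly one owner, so no entry is ever shared.
    std::size_t owner_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

    // Instead of locking remote data, send a lambda to its owner.
    void submit_to(std::size_t target, std::function<void(shard&)> task) {
        shards_[target].inbox.push_back(std::move(task));
    }

    // Drain every inbox. A real engine would run one such loop per core.
    void run_until_idle() {
        bool did_work = true;
        while (did_work) {
            did_work = false;
            for (auto& s : shards_) {
                while (!s.inbox.empty()) {
                    auto task = std::move(s.inbox.front());
                    s.inbox.pop_front();
                    task(s);
                    did_work = true;
                }
            }
        }
    }

    const shard& at(std::size_t i) const { return shards_[i]; }

private:
    std::vector<shard> shards_;
};

// Increment one counter `times` times by messaging its owner shard.
int demo_increment(std::size_t shard_count, const std::string& key, int times) {
    toy_cluster cluster(shard_count);
    const std::size_t owner = cluster.owner_of(key);
    for (int i = 0; i < times; ++i) {
        cluster.submit_to(owner, [key](shard& s) { ++s.data[key]; });
    }
    cluster.run_until_idle();
    return cluster.at(owner).data.at(key);
}
```

Because every update to a key runs on the key's owner, the counter needs no lock or atomic, which is the property the slide attributes to the per-core design.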
SeaStar Framework Comparison
[Figure: the traditional stack beside four per-core SeaStar shards.]
Traditional stack: lock contention, cache contention, NUMA unfriendly
SeaStar’s sharded stack: no contention, linear scaling, NUMA friendly
SeaStar handles millions of connections in parallel!
[Figure: per-CPU comparison of the two stacks.]
Traditional stack: each CPU’s scheduler switches between threads, each with its own stack. A thread is a function pointer; a stack is a byte array from 64k to megabytes. Context switch cost is high, and large stacks pollute the caches.
SeaStar’s sharded stack: each CPU runs its own promises and tasks. A promise is a pointer to an eventually computed value; a task is a pointer to a lambda function. No sharing, millions of parallel events.
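To make the "task is a pointer to a lambda" point concrete, here is a minimal run-to-completion queue. It is a sketch, not Seastar's scheduler: `run_queue`, `demo_many_tasks`, and the two constants are hypothetical names, and the 64 KiB figure is simply the lower bound quoted on the slide. Each pending event costs one small callable object rather than a reserved thread stack.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Toy run queue: a task is just a callable; no stack is reserved per task.
class run_queue {
public:
    void schedule(std::function<void()> task) { tasks_.push(std::move(task)); }

    // Run every task to completion, one after another, on the calling thread.
    std::size_t run_all() {
        std::size_t executed = 0;
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop();
            task();
            ++executed;
        }
        return executed;
    }

    std::size_t pending() const { return tasks_.size(); }

private:
    std::queue<std::function<void()>> tasks_;
};

// Queue `n` pending events, then run them; returns the sum they accumulate.
long long demo_many_tasks(std::size_t n) {
    run_queue q;
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        q.schedule([&sum, i] { sum += static_cast<long long>(i); });
    }
    q.run_all();
    return sum;
}

// Rough per-event footprint comparison (assumed 64 KiB minimum stack).
constexpr std::size_t assumed_stack_bytes = 64 * 1024;
constexpr std::size_t task_handle_bytes = sizeof(std::function<void()>);
static_assert(task_handle_bytes < assumed_stack_bytes,
              "a task handle is far smaller than a thread stack");
```

Calling `demo_many_tasks(1000000)` keeps a million events pending at once on one thread; the exact size of `std::function` varies by standard library, so the comparison is only indicative.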
SeaStar current performance
[Charts: stock TCP stack vs. SeaStar’s native TCP stack.]
Basic model
■ Futures
■ Promises
■ Continuations
F-P-C defined: Future
A future is a result of a computation that may not be available yet.
■ Data buffer from the network
■ Timer expiration
■ Completion of a disk write
■ Result of a computation that requires the values of one or more other futures
F-P-C defined: Promise
A promise is an object or function that provides you with a future, with the expectation that it will fulfil the future.
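The relationship between the two definitions can be sketched with a toy, single-threaded future/promise pair. Everything here is an assumption made for illustration: `shared_state`, `toy_future`, `toy_promise`, and `demo_late_value` are hypothetical names, and Seastar's real classes are considerably more elaborate. The sketch shows the promise as the producing side and the future as the consuming side of one shared slot.

```cpp
#include <functional>
#include <memory>
#include <optional>
#include <utility>

// Shared slot between the producing side (promise) and the consuming
// side (future).
template <typename T>
struct shared_state {
    std::optional<T> value;               // set once the result exists
    std::function<void(T)> continuation;  // set once someone calls then()
};

template <typename T>
class toy_future {
public:
    explicit toy_future(std::shared_ptr<shared_state<T>> s)
        : state_(std::move(s)) {}

    // Register a continuation; run it at once if the value is already there.
    void then(std::function<void(T)> fn) {
        if (state_->value) {
            fn(*state_->value);
        } else {
            state_->continuation = std::move(fn);
        }
    }

    bool available() const { return state_->value.has_value(); }

private:
    std::shared_ptr<shared_state<T>> state_;
};

template <typename T>
class toy_promise {
public:
    toy_promise() : state_(std::make_shared<shared_state<T>>()) {}

    toy_future<T> get_future() { return toy_future<T>(state_); }

    // Fulfil the future: store the value and fire the continuation, if any.
    void set_value(T v) {
        state_->value = std::move(v);
        if (state_->continuation) {
            state_->continuation(*state_->value);
        }
    }

private:
    std::shared_ptr<shared_state<T>> state_;
};

// The value arrives after then() was called, as with a late network packet.
int demo_late_value() {
    toy_promise<int> p;
    toy_future<int> f = p.get_future();
    int seen = 0;
    f.then([&seen](int v) { seen = v + 1; });  // nothing runs yet
    p.set_value(41);                           // the continuation runs now
    return seen;
}
```

The caller never blocks: it hands over a continuation and returns, and the continuation runs when the promise side supplies the value.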
Basic future/promise
future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        put(value + 1).then([] {
            std::cout << "value stored successfully\n";
        });
    });
}
Chaining
future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        return put(value + 1);   // returning the inner future flattens the chain
    }).then([] {
        std::cout << "value stored successfully\n";
    });
}
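How a continuation that returns a future can keep the chain flat is sketched below with a toy type. The assumptions are significant: `toy_future` merges the promise and future roles into one handle, `put` is modelled as a `toy_future<bool>` instead of `future<>`, and all names (`resolve`, `forward_to`, `demo_chain`) are hypothetical. It illustrates the chaining idea and is not Seastar's implementation.

```cpp
#include <functional>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Toy single-threaded future; promise and future share one handle.
template <typename T>
class toy_future {
public:
    toy_future() : state_(std::make_shared<state>()) {}

    // Promise side: store the value and wake the waiter, if any.
    void resolve(T v) {
        state_->value = std::move(v);
        if (state_->waiter) state_->waiter(*state_->value);
    }

    // fn must itself return a toy_future<U>. The returned toy_future<U>
    // resolves only when that inner future does, so the chain stays flat.
    template <typename Fn>
    auto then(Fn fn) -> std::invoke_result_t<Fn&, const T&> {
        std::invoke_result_t<Fn&, const T&> outer;
        subscribe([fn = std::move(fn), outer](const T& v) mutable {
            fn(v).forward_to(outer);
        });
        return outer;
    }

    // Copy this future's eventual value into target.
    void forward_to(toy_future<T> target) {
        subscribe([target](const T& v) mutable { target.resolve(v); });
    }

private:
    struct state {
        std::optional<T> value;
        std::function<void(const T&)> waiter;
    };

    void subscribe(std::function<void(const T&)> fn) {
        if (state_->value) fn(*state_->value);
        else state_->waiter = std::move(fn);
    }

    std::shared_ptr<state> state_;
};

// get() completes first, put() later; the last step runs only after both.
int demo_chain() {
    toy_future<int> source;    // plays the role of get()
    toy_future<bool> stored;   // plays the role of put()'s result
    std::vector<int> storage;
    bool finished = false;

    source.then([&](const int& value) {
        storage.push_back(value + 1);  // start the "put"
        return stored;                 // hand the inner future to the chain
    }).then([&](const bool& ok) {
        finished = ok;
        return toy_future<bool>();     // end of chain; result unused
    });

    source.resolve(41);                // get() completes, put() starts
    const bool before = finished;      // still false: put() is not done
    stored.resolve(true);              // put() completes, last step runs
    return (!before && finished) ? storage.front() : -1;
}
```

The second continuation is attached to the outer future, not nested inside the first lambda, which mirrors the difference between the two slide examples.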
Zero copy friendly
future<temporary_buffer> connected_socket::read(size_t n);
■ temporary_buffer points at driver-provided pages if possible
■ discarded after use
Zero copy friendly (2)
future<size_t> connected_socket::write(temporary_buffer);
■ Future becomes ready when the TCP window allows sending more data (usually immediately)
■ temporary_buffer is discarded after the data is ACKed
■ Discarding can call delete[] or decrement a reference count
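The last bullet suggests a buffer whose release action is chosen by whoever created it. A minimal sketch of that idea follows, assuming a move-only handle with a stored release callback. `toy_buffer`, `make_heap_buffer`, `driver_page`, `view_of_page`, and `demo_refcount` are hypothetical names, and this is not how Seastar's temporary_buffer is actually written.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>

// Toy move-only buffer: it never copies bytes, and on destruction it runs
// whatever release action its creator supplied.
class toy_buffer {
public:
    toy_buffer(char* data, std::size_t size, std::function<void()> release)
        : data_(data), size_(size), release_(std::move(release)) {}

    toy_buffer(toy_buffer&& other)
        : data_(other.data_), size_(other.size_),
          release_(std::move(other.release_)) {
        other.data_ = nullptr;
        other.size_ = 0;
        other.release_ = nullptr;  // the moved-from buffer must not release
    }

    toy_buffer(const toy_buffer&) = delete;
    toy_buffer& operator=(const toy_buffer&) = delete;
    toy_buffer& operator=(toy_buffer&&) = delete;

    ~toy_buffer() { if (release_) release_(); }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
    std::function<void()> release_;
};

// Case 1: heap memory, released with delete[].
toy_buffer make_heap_buffer(std::size_t n) {
    char* p = new char[n]();
    return toy_buffer(p, n, [p] { delete[] p; });
}

// Case 2: a view into a page owned by a (pretend) driver, released by
// decrementing the page's reference count.
struct driver_page {
    char bytes[4096];
    int refs = 0;
};

toy_buffer view_of_page(driver_page& page, std::size_t offset, std::size_t len) {
    ++page.refs;
    return toy_buffer(page.bytes + offset, len, [&page] { --page.refs; });
}

// Returns 10 when the count was 1 while the view lived and 0 afterwards.
int demo_refcount() {
    driver_page page{};
    std::memcpy(page.bytes, "hello", 5);
    int during = 0;
    bool same_bytes = false;
    {
        toy_buffer a = view_of_page(page, 0, 5);
        toy_buffer b = std::move(a);  // ownership moves, bytes stay put
        during = page.refs;
        same_bytes = (b.data() == page.bytes) && (b.size() == 5);
    }                                 // b is discarded: the count drops back
    return same_bytes ? during * 10 + page.refs : -1;
}
```

Code that receives a `toy_buffer` does not need to know which case it holds; discarding the handle does the right thing in both.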
Dual Networking Stack
Networking API
■ Seastar (native) stack: user-space TCP/IP over an interface layer, with DPDK (igb, ixgb), Virtio, and Xen backends
■ POSIX (hosted) stack: Linux kernel (sockets)
Disk I/O
■ Zero copy using Linux AIO and O_DIRECT
■ Some operations use worker threads (open() etc.)
■ Plans for direct NVMe support
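One practical consequence of O_DIRECT is that transfers generally need aligned buffers and lengths. The sketch below shows only that alignment bookkeeping, under assumptions: the 4096-byte value is a guess that depends on the device and filesystem, and `align_up`, `allocate_dma_buffer`, and `is_aligned` are hypothetical helpers, not Seastar functions. It performs no I/O.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Assumed alignment; the real requirement depends on the device and
// filesystem in use.
constexpr std::size_t assumed_block_size = 4096;

// Round n up to the next multiple of alignment.
constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) / alignment * alignment;
}

struct free_deleter {
    void operator()(void* p) const { std::free(p); }
};
using aligned_buffer = std::unique_ptr<char, free_deleter>;

// Allocate a buffer whose address and length are both block-aligned, so it
// could be handed to a read or write on a file opened with O_DIRECT.
aligned_buffer allocate_dma_buffer(std::size_t bytes) {
    void* p = std::aligned_alloc(assumed_block_size,
                                 align_up(bytes, assumed_block_size));
    return aligned_buffer(static_cast<char*>(p));
}

bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A request for 5000 bytes, for example, is rounded up to 8192 so that the length as well as the address meets the assumed alignment.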
Rich APIs
● HTTP Server
● HTTP Client
● RPC client/server
● map_reduce
● parallel_for_each
● distributed<>
● when_all()
● timers
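As the name when_all() suggests, one of these helpers runs a continuation once several asynchronous operations have all completed. The idea can be sketched with a countdown, as below. This is an illustration under assumptions: `toy_when_all`, `slot`, `completion`, and `demo_when_all` are hypothetical names, results are plain ints, and Seastar's actual when_all() has a different interface.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for an asynchronous operation: invoking the callback later
// "completes" the operation with a value.
using completion = std::function<void(int)>;

class toy_when_all {
    struct state {
        std::vector<int> results;
        std::size_t remaining = 0;
        std::function<void(const std::vector<int>&)> done;
    };

public:
    toy_when_all(std::size_t count,
                 std::function<void(const std::vector<int>&)> done)
        : state_(std::make_shared<state>()) {
        state_->results.resize(count);
        state_->remaining = count;
        state_->done = std::move(done);
    }

    // Returns the callback that operation number `index` invokes on finish.
    completion slot(std::size_t index) {
        auto s = state_;
        return [s, index](int value) {
            s->results[index] = value;
            if (--s->remaining == 0) s->done(s->results);
        };
    }

private:
    std::shared_ptr<state> state_;
};

// Three operations finish out of order; the continuation runs only after
// the last one.
int demo_when_all() {
    int total = -1;
    toy_when_all all(3, [&total](const std::vector<int>& r) {
        total = r[0] + r[1] + r[2];
    });
    completion a = all.slot(0);
    completion b = all.slot(1);
    completion c = all.slot(2);
    c(30);
    a(10);
    const int before = total;  // still -1: one operation is outstanding
    b(2);                      // last completion triggers the continuation
    return before == -1 ? total : -1000;
}
```

Results are stored by slot index, so the continuation sees them in submission order regardless of the order in which the operations finished.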
More info
■ http://github.com/cloudius-systems/seastar
■ http://seastar-project.com
Thank you
@CloudiusSystems

Back to the future with C++ and Seastar
