Back to the future with C++ and Seastar

Cloudius Systems presents:
Seastar
Avi Kivity, April 13 2015

SeaStar Technology
● New tech, runs on physical machines, VMs, Linux/OSv
● Multi-million IOPS, fully scalable
● Perfect building block for database/filesystem/cache
● Share-nothing, fully asynchronous model
● Open Source

SeaStar
[Figure: Before: thread model. After: SeaStar shards.]

Problem with today’s programming model
+ Single-core performance (frequency, IPC) is no longer growing
+ Core counts grow, but the cores are hard to utilize; apps don’t scale
+ Locks have costs even without contention
+ Data is allocated on one core, then copied and used on others
+ Software can’t keep up with recent hardware (SSDs, 10 Gbps line rate, NUMA, etc.)
[Figure: Traditional stack. Labels: Application, threads, Kernel (TCP/IP, Scheduler, queues), Memory, NIC queues.]

SeaStar Framework
Linear scaling with core count
+ Each core runs its own engine
+ Shared-nothing per-core design
+ Fits the existing shared-nothing model of distributed applications
+ Full kernel bypass, supports zero-copy
+ No threads, no context switches, and no locks
+ Instead, asynchronous lambda invocation
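The shared-nothing idea above can be sketched in plain C++. This is a single-threaded simulation, not Seastar code: the names `shard`, `sharded_store`, `owner`, `increment`, and `run` are hypothetical, and a real deployment would run one event loop per core instead of draining the inboxes in turn. The point it illustrates is that no shard touches another shard's data; work is routed to the owning shard as a queued lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not Seastar's API.
struct shard {
    std::unordered_map<std::string, int> data;   // owned by this shard only
    std::deque<std::function<void()>> inbox;     // tasks posted to this shard
};

class sharded_store {
public:
    explicit sharded_store(std::size_t n) : shards_(n) {
        assert(n > 0 && "need at least one shard");
    }

    // Route a key to the shard that owns it.
    std::size_t owner(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

    // Post an increment to the owning shard instead of touching its data.
    void increment(const std::string& key) {
        shard& s = shards_[owner(key)];
        s.inbox.push_back([&s, key] { ++s.data[key]; });
    }

    // Stand-in for the per-core event loops: drain each inbox in turn.
    void run() {
        for (shard& s : shards_) {
            while (!s.inbox.empty()) {
                auto task = std::move(s.inbox.front());
                s.inbox.pop_front();
                task();
            }
        }
    }

    // Read a counter from its owning shard (0 if the key was never seen).
    int get(const std::string& key) const {
        const shard& s = shards_[owner(key)];
        auto it = s.data.find(key);
        return it == s.data.end() ? 0 : it->second;
    }

private:
    std::vector<shard> shards_;   // never resized, so references stay valid
};
```

Because each key has exactly one owner and only the owner's tasks mutate its map, the sketch needs no locks; that property is what the per-core design relies on.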
[Figure: SeaStar’s sharded stack, repeated once per core. Labels: Application, TCP/IP, Task Scheduler, queues, smp queue, NIC Queue, DPDK, Userspace, Kernel (isn’t involved).]

SeaStar Framework Comparison
[Figure: Traditional stack (Application, threads, kernel TCP/IP and scheduler, queues, shared memory, NIC queues) next to SeaStar’s sharded stack (per core: Application, TCP/IP, Task Scheduler, NIC Queue, and DPDK in userspace; the kernel isn’t involved).]
Traditional stack: lock contention, cache contention, NUMA unfriendly
SeaStar’s sharded stack: no contention, linear scaling, NUMA friendly

SeaStar handles 1,000,000s of connections in parallel!
[Figure: Traditional stack (threads and their stacks, schedulers, CPUs) next to SeaStar’s sharded stack (promises and tasks per CPU).]
Traditional stack:
■ A thread is a function pointer
■ A stack is a byte array from 64k to megabytes
■ Context switch cost is high; large stacks pollute the caches
SeaStar’s sharded stack:
■ A promise is a pointer to an eventually computed value
■ A task is a pointer to a lambda function
■ No sharing, millions of parallel events
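To make the task-versus-thread contrast concrete, here is a minimal sketch of a single-core run queue whose tasks are plain lambdas. It is an illustration only: `toy_scheduler` and `demo_events` are hypothetical names, not part of Seastar. Each pending event costs a small captured closure rather than a dedicated stack, and a task hands off to its follow-up by scheduling another lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-core scheduler: a FIFO of callables.
class toy_scheduler {
public:
    void schedule(std::function<void()> task) {
        queue_.push_back(std::move(task));
    }

    // Run tasks to completion, including tasks scheduled by other tasks.
    // Returns the number of tasks executed.
    std::size_t run() {
        std::size_t executed = 0;
        while (!queue_.empty()) {
            auto task = std::move(queue_.front());
            queue_.pop_front();
            task();            // no stack is kept alive between tasks
            ++executed;
        }
        return executed;
    }

private:
    std::deque<std::function<void()>> queue_;
};

// Each "connection" is a few captured bytes rather than a thread stack.
std::vector<int> demo_events(int connections) {
    toy_scheduler sched;
    std::vector<int> log;
    for (int id = 0; id < connections; ++id) {
        sched.schedule([&sched, &log, id] {
            // First step for connection `id`: queue the follow-up step.
            sched.schedule([&log, id] { log.push_back(id); });
        });
    }
    sched.run();
    return log;
}
```

Calling `demo_events(n)` runs 2n tasks in FIFO order, so the follow-up steps complete in the order the connections were registered.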

SeaStar current performance
[Figure: Stock TCP stack vs. SeaStar’s native TCP stack]

Basic model
■ Futures
■ Promises
■ Continuations

F-P-C defined: Future
A future is a result of a computation that may not be available yet.
■ Data buffer from the network
■ Timer expiration
■ Completion of a disk write
■ Result of a computation that requires the values from one or more other futures

F-P-C defined: Promise
A promise is an object or function that provides you with a future, with the expectation that it will fulfil the future.
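The relationship between the two halves can be sketched with a toy, single-threaded pair written in standard C++. The names `toy_state`, `toy_future`, and `toy_promise` are hypothetical, and this is not how Seastar implements them; in particular, the sketch runs the continuation directly inside `set_value()`, whereas a real framework would typically hand it to a scheduler as a task.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// State shared by one promise and the future it hands out.
template <typename T>
struct toy_state {
    bool ready = false;
    T value{};                                    // requires default-constructible T
    std::function<void(const T&)> continuation;   // set if then() came first
};

template <typename T>
class toy_future {
public:
    explicit toy_future(std::shared_ptr<toy_state<T>> s) : state_(std::move(s)) {}

    bool available() const { return state_->ready; }

    // Run fn with the value: now if it is ready, otherwise when it arrives.
    void then(std::function<void(const T&)> fn) {
        if (state_->ready) {
            fn(state_->value);
        } else {
            state_->continuation = std::move(fn);
        }
    }

private:
    std::shared_ptr<toy_state<T>> state_;
};

template <typename T>
class toy_promise {
public:
    toy_promise() : state_(std::make_shared<toy_state<T>>()) {}

    toy_future<T> get_future() { return toy_future<T>(state_); }

    // Fulfil the future; a waiting continuation runs immediately.
    void set_value(T v) {
        assert(!state_->ready && "a promise is fulfilled at most once");
        state_->value = std::move(v);
        state_->ready = true;
        if (state_->continuation) {
            state_->continuation(state_->value);
        }
    }

private:
    std::shared_ptr<toy_state<T>> state_;
};
```

The producer keeps the `toy_promise` and calls `set_value()` when the data arrives; the consumer holds only the `toy_future` and attaches a continuation with `then()`.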

Basic future/promise
future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        put(value + 1).then([] {
            std::cout << "value stored successfully\n";
        });
    });
}

Chaining
future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        return put(value + 1);
    }).then([] {
        std::cout << "value stored successfully\n";
    });
}
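One way to see why returning a future from a continuation flattens the nesting is the toy below. It is a sketch under simplifying assumptions, not Seastar's implementation: `chain_state`, `chain_future`, `make_ready`, and `demo_chain` are hypothetical names, everything is single-threaded, every continuation must return a `chain_future`, and there is no equivalent of `future<>`.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

template <typename T>
struct chain_state {
    bool ready = false;
    T value{};                                    // requires default-constructible T
    std::function<void(const T&)> continuation;   // at most one waiter

    void set(T v) {
        assert(!ready && "a state is resolved at most once");
        value = std::move(v);
        ready = true;
        if (continuation) continuation(value);
    }
    void on_ready(std::function<void(const T&)> fn) {
        if (ready) fn(value);
        else continuation = std::move(fn);
    }
};

template <typename T>
struct chain_future {
    using value_type = T;
    std::shared_ptr<chain_state<T>> state;

    // fn maps T to chain_future<U>. The chain_future<U> returned here
    // resolves when the future produced by fn resolves, so the next
    // then() can be attached at the same nesting level.
    template <typename Fn>
    auto then(Fn fn) -> decltype(fn(std::declval<T>())) {
        using next_future = decltype(fn(std::declval<T>()));
        using U = typename next_future::value_type;
        auto next = std::make_shared<chain_state<U>>();
        state->on_ready([fn, next](const T& v) mutable {
            fn(v).state->on_ready([next](const U& u) { next->set(u); });
        });
        return next_future{next};
    }
};

// Build a future whose value is already available.
template <typename T>
chain_future<T> make_ready(T v) {
    auto s = std::make_shared<chain_state<T>>();
    s->set(std::move(v));
    return chain_future<T>{s};
}

// Mirrors the slide: get a value, store value + 1, then report.
int demo_chain() {
    auto source = std::make_shared<chain_state<int>>();   // stands in for get()
    int reported = 0;
    chain_future<int>{source}
        .then([](int value) { return make_ready(value + 1); })   // stands in for put()
        .then([&reported](int stored) {
            reported = stored;
            return make_ready(0);
        });
    source->set(41);   // the awaited value arrives
    return reported;
}
```

Nothing runs until `source->set(41)` is called; at that point both continuations fire in order and `demo_chain()` returns 42.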

Zero copy friendly
future<temporary_buffer> connected_socket::read(size_t n);
■ temporary_buffer points at driver-provided pages if possible
■ discarded after use

Zero copy friendly (2)
future<size_t> connected_socket::write(temporary_buffer);
■ Future becomes ready when the TCP window allows sending more data (usually immediately)
■ temporary_buffer is discarded after the data is ACKed
■ Discarding can call delete[] or decrement a reference count
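The "discard" step can be pictured as a move-only handle that carries its own release action. The sketch below is an assumption-laden illustration, not Seastar's `temporary_buffer`: `toy_buffer`, `make_heap_buffer`, and `consume` are hypothetical names. The release callback is where a real implementation could call `delete[]`, decrement a reference count, or return pages to a driver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical move-only view of bytes with a custom release action.
class toy_buffer {
public:
    toy_buffer(char* data, std::size_t size, std::function<void()> release)
        : data_(data), size_(size), release_(std::move(release)) {}

    // Moving transfers ownership; the source no longer releases anything.
    toy_buffer(toy_buffer&& other)
        : data_(other.data_), size_(other.size_),
          release_(std::move(other.release_)) {
        other.data_ = nullptr;
        other.size_ = 0;
        other.release_ = nullptr;
    }

    toy_buffer(const toy_buffer&) = delete;             // copying would copy ownership
    toy_buffer& operator=(const toy_buffer&) = delete;
    toy_buffer& operator=(toy_buffer&&) = delete;

    ~toy_buffer() {
        if (release_) release_();   // runs exactly once per underlying buffer
    }

    char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    char* data_;
    std::size_t size_;
    std::function<void()> release_;
};

// Heap-backed buffer: the release action is delete[].
toy_buffer make_heap_buffer(std::size_t size) {
    char* p = new char[size]();
    return toy_buffer(p, size, [p] { delete[] p; });
}

// Stand-in for a write path: takes ownership, releases at end of scope.
std::size_t consume(toy_buffer buf) {
    return buf.size();
}
```

Because the handle is move-only, passing it to `consume()` transfers ownership without copying the bytes, and the release action fires once the consumer is done with it.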

Dual Networking Stack
[Figure: layered diagram. The Networking API sits on top of two stacks. Seastar (native) stack: user-space TCP/IP, interface layer, then DPDK (igb, ixgb), Virtio, or Xen. POSIX (hosted) stack: Linux kernel (sockets).]

Disk I/O
■ Zero copy using Linux AIO and O_DIRECT
■ Some operations use worker threads (open(), etc.)
■ Plans for direct NVMe support

Rich APIs
● HTTP Server
● HTTP Client
● RPC client/server
● map_reduce
● parallel_for_each
● distributed<>
● when_all()
● timers

More info
■ http://github.com/cloudius-systems/seastar
■ http://seastar-project.com
