A ScyllaDB Community
Rust, io_uring, ktls:
How fast can we make HTTP?
Amos Wenger
writer, video producer, cat owner
bearcove
Nobody in the Rust space is going
far enough with io_uring
(as far as I'm aware)
Amos Wenger (they/them) aka @fasterthanlime
writer, video producer, cat owner
■ Wrote "Making our own executable packer"
■ Teaching Rust since 2019 with Cool Bear
■ Fan of TLS (thread-local storage & the other one)
Define "HTTP"
Define "fast"
Rust HTTP is already fast
hyper on master is 📦 v1.4.1 via 🦀 v1.80.1
❯ gl --color=always | tail -5
Commit: 886551681629de812a87555bb4ecd41515e4dee6
Author: Sean McArthur <sean.monstar@gmail.com>
Date: 2014-08-30 14:18:28 -0700 (10 years ago)
init
HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
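For reference, the response head above is just text with CRLF line endings: a status line, one `Name: value` line per header field, then an empty line before the body (RFC 9112). A minimal sketch of producing it follows; `write_response_head` is a hypothetical helper, not hyper's API, and it uses `write!` (that is, `std::fmt`) for clarity rather than speed.

```rust
use std::io::{self, Write};

/// Hypothetical helper: serializes an HTTP/1.1 response head.
/// Every line ends in CRLF, and an empty line terminates the head.
fn write_response_head<W: Write>(
    out: &mut W,
    status: u16,
    reason: &str,
    headers: &[(&str, &str)],
) -> io::Result<()> {
    // Status line, e.g. "HTTP/1.1 200 OK".
    write!(out, "HTTP/1.1 {} {}\r\n", status, reason)?;
    // One header field per line.
    for (name, value) in headers {
        write!(out, "{}: {}\r\n", name, value)?;
    }
    // Blank line: the body (if any) starts right after this.
    out.write_all(b"\r\n")
}
```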
/// An HTTP status code (`status-code` in RFC 9110 et al.).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct StatusCode(NonZeroU16);
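One plausible reason to wrap a `NonZeroU16` rather than a plain `u16`: zero is never a valid status code, so the compiler can use that bit pattern for `None`, and an optional status code stays two bytes. This is guaranteed for `Option<NonZeroU16>` itself; for a newtype like this one it holds in current rustc. The `sizes` function is a hypothetical helper for the sketch.

```rust
use std::mem::size_of;
use std::num::NonZeroU16;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct StatusCode(NonZeroU16);

/// Sizes in bytes of a status code and of an optional status code.
/// The second is 2, not 4: `None` reuses the forbidden all-zero pattern.
pub fn sizes() -> (usize, usize) {
    (size_of::<StatusCode>(), size_of::<Option<StatusCode>>())
}
```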
criterion bench `format_status_code` (average µs), fastest first:
mystery winner > itoa (stack) > itoa (heap) > std::fmt
// A string of packed 3-ASCII-digit status code
// values for the supported range of [100, 999]
// (900 codes, 2700 bytes).
const CODE_DIGITS: &str = "\
100101102103104105106107108109110\
✂ ✂ ✂
989990991992993994995996997998999";
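The packed table (presumably the "mystery winner" in the benchmark) turns formatting into a slice: code `c` lives at byte offset `(c - 100) * 3`. The sketch below assumes the same layout but generates the 2700 bytes with a `const fn` instead of spelling them out; `build_digits`, `from_u16` and `as_str` are illustrative names, not necessarily hyper's.

```rust
use std::num::NonZeroU16;

/// Same layout as CODE_DIGITS: "100101102...999", 3 ASCII digits per code.
static CODE_DIGITS: [u8; 2700] = build_digits();

/// Fills the table at compile time.
const fn build_digits() -> [u8; 2700] {
    let mut out = [0u8; 2700];
    let mut code = 100u16;
    while code <= 999 {
        let i = ((code - 100) * 3) as usize;
        out[i] = b'0' + (code / 100) as u8;
        out[i + 1] = b'0' + (code / 10 % 10) as u8;
        out[i + 2] = b'0' + (code % 10) as u8;
        code += 1;
    }
    out
}

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub struct StatusCode(NonZeroU16);

impl StatusCode {
    /// Accepts only the range the table covers, [100, 999].
    pub fn from_u16(code: u16) -> Option<StatusCode> {
        if (100..=999).contains(&code) {
            NonZeroU16::new(code).map(StatusCode)
        } else {
            None
        }
    }

    /// No division and no allocation at runtime: just an offset into the table.
    pub fn as_str(&self) -> &'static str {
        let offset = ((self.0.get() - 100) * 3) as usize;
        let bytes: &'static [u8] = &CODE_DIGITS[offset..offset + 3];
        // Every byte in the table is an ASCII digit, so this cannot fail.
        std::str::from_utf8(bytes).expect("table is ASCII")
    }
}
```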
We're not bickering over
assembly anymore
My hypothesis
● spectre, meltdown, etc => mitigations
● mitigations => more expensive syscalls
● more expensive syscalls => io_uring
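The last step of the hypothesis rests on simple arithmetic: classic `read`/`write` cost one kernel entry per operation, while io_uring lets many submission queue entries ride on a single `io_uring_enter` call. The model below is a deliberate simplification (it ignores that a batched entry does more work), and the per-entry cost used in the usage note is a made-up placeholder, not a measurement.

```rust
/// Kernel entries needed to issue `ops` I/O operations when each entry
/// can carry up to `per_entry` of them (1 for classic syscalls).
fn kernel_entries(ops: u64, per_entry: u64) -> u64 {
    assert!(per_entry > 0, "a kernel entry must carry at least one op");
    (ops + per_entry - 1) / per_entry // ceiling division
}

/// Time spent just crossing into the kernel, in nanoseconds,
/// for a hypothetical fixed cost per crossing.
fn crossing_cost_ns(ops: u64, per_entry: u64, cost_per_entry_ns: u64) -> u64 {
    kernel_entries(ops, per_entry) * cost_per_entry_ns
}
```

With a placeholder 500 ns per crossing, 10,000 reads cost 5 ms of crossings one at a time, but 156.5 µs in batches of 32. The more mitigations inflate the per-crossing cost, the more batching is worth.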
Type systems are hard
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
Lifetimes exist in every language
Rust merely makes them explicit
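For the blocking `read` signature, the elided lifetimes can be written out. A sketch with a hypothetical wrapper `read_explicit`: the buffer borrow `'b` only has to last for the duration of the call, so as soon as `read` returns, the caller has its buffer back and may reuse or free it. That short borrow is exactly what becomes hard to express once the I/O outlives the call.

```rust
use std::io::{self, Read};

/// Same shape as `Read::read`, with the elided lifetimes spelled out.
/// Neither borrow escapes: both end when this function returns.
fn read_explicit<'s, 'b, R: Read>(src: &'s mut R, buf: &'b mut [u8]) -> io::Result<usize> {
    src.read(buf)
}

/// Reuses one small buffer across calls, which is only allowed because
/// each call's borrow of `buf` has ended by the time the next one starts.
fn read_all_in_chunks(mut src: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 4];
    loop {
        let n = read_explicit(&mut src, &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        out.extend_from_slice(&buf[..n]);
    }
    Ok(out)
}
```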
blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
&mut self,
buf: &mut [u8],
) -> Poll<Result<usize>>
evented (O_NONBLOCK)
blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
&mut self,
cx: &mut Context<'_>,
buf: &mut [u8],
) -> Poll<Result<usize>>
evented (O_NONBLOCK)
blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
&mut self,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<usize>>
evented (O_NONBLOCK)
blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
&mut self,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<()>>
evented (O_NONBLOCK)
blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
fn poll_read(
self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &mut ReadBuf<'_>,
) -> Poll<Result<()>>
evented (O_NONBLOCK)
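The `cx: &mut Context<'_>` parameter is what makes the evented shape work: a reader that cannot make progress (the `EAGAIN` case on an `O_NONBLOCK` fd) stashes the waker and returns `Pending`, and whoever later notices readiness calls that waker. The std-only sketch below uses a plain `&mut [u8]` instead of tokio's `ReadBuf`; the trait `PollRead`, the mock `FlakySocket`, and its `become_readable` method are hypothetical.

```rust
use std::io;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical trait with the evented shape from the slides (minus ReadBuf).
trait PollRead {
    fn poll_read(self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &mut [u8])
        -> Poll<io::Result<usize>>;
}

/// Mock socket: not readable until `become_readable` is called.
struct FlakySocket { data: Vec<u8>, ready: bool, waker: Option<Waker> }

impl FlakySocket {
    /// Stand-in for the reactor (e.g. epoll) reporting readiness.
    fn become_readable(&mut self) {
        self.ready = true;
        if let Some(w) = self.waker.take() {
            w.wake();
        }
    }
}

impl PollRead for FlakySocket {
    fn poll_read(mut self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &mut [u8])
        -> Poll<io::Result<usize>> {
        if !self.ready {
            // Must register interest *before* returning Pending.
            self.waker = Some(cx.waker().clone());
            return Poll::Pending;
        }
        let n = buf.len().min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data.drain(..n);
        Poll::Ready(Ok(n))
    }
}

/// Waker that just counts how many times it was woken.
struct CountingWake(AtomicUsize);
impl Wake for CountingWake {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Returns (first poll was Pending, wake count, bytes from second poll).
fn demo() -> (bool, usize, Vec<u8>) {
    let counter = Arc::new(CountingWake(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut sock = FlakySocket { data: b"hello".to_vec(), ready: false, waker: None };
    let mut buf = [0u8; 8];

    let first_pending = Pin::new(&mut sock).poll_read(&mut cx, &mut buf).is_pending();
    sock.become_readable();
    let wakes = counter.0.load(Ordering::SeqCst);
    let n = match Pin::new(&mut sock).poll_read(&mut cx, &mut buf) {
        Poll::Ready(Ok(n)) => n,
        other => panic!("unexpected: {:?}", other),
    };
    (first_pending, wakes, buf[..n].to_vec())
}
```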
fn read<'a>(
&'a mut self,
buf: &'a mut [u8],
) -> Read<'a, Self>
where Self: Unpin
async
blocking
fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
async fn read(
&mut self,
buf: &mut [u8],
) -> Result<usize>
where Self: Unpin { ... }
async
use std::io;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;
async fn mhh(mut s: TcpStream) -> io::Result<Vec<u8>> {
let mut buf = vec![0u8; 4];
s.read_exact(&mut buf).await?;
Ok(buf)
}
async stack trace
read(&mut [u8])
read_exact(&mut [u8])
mhh()
// (not shown: tokio runtime internals)
real stack trace
Read::poll(Pin<&mut Read>, &mut Context<'_>)
ReadExact::poll(Pin<&mut ReadExact>, &mut Context<'_>)
Mhh::poll(Pin<&mut Mhh>, &mut Context<'_>)
// (not shown: tokio runtime internals)
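The "real stack trace" is what an executor sees: each wake-up re-enters from the outermost future, whose `poll` calls the next one's `poll`, down to the leaf doing the I/O. A std-only sketch, where `YieldOnce` (a leaf that is `Pending` once, standing in for a not-yet-readable socket), `read_exact_like`, and the busy-looping `block_on` are all hypothetical:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Leaf future: Pending on the first poll, Ready on the second.
struct YieldOnce { polled: bool }

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.polled {
            Poll::Ready(())
        } else {
            self.polled = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

async fn read_exact_like() -> u8 {
    YieldOnce { polled: false }.await;
    4
}

async fn mhh() -> u8 {
    read_exact_like().await
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor. Each loop iteration is one trip down the chain:
/// mhh's poll -> read_exact_like's poll -> YieldOnce::poll.
fn block_on<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```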
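use std::io;
use std::time::Duration;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;
use tokio::time::sleep;
// timeout_err(): helper not shown on the slide
async fn mhh(mut s: TcpStream) -> io::Result<Vec<u8>> {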
let mut buf = vec![0u8; 4];
tokio::select! {
result = s.read_exact(&mut buf) => {
result?;
Ok(buf)
}
_ = sleep(Duration::from_secs(1)) => {
Err(timeout_err())
}
}
}
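When the `sleep` branch wins, `select!` drops the `read_exact` future, and in async Rust, dropping is the whole cancellation mechanism. With readiness-based I/O that is fine: no operation is in flight in the kernel, so once the future is gone the buffer is immediately usable again. The sketch below simulates the losing branch by hand; `StuckRead` (a read that never completes) and `read_or_time_out` are hypothetical.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for `read_exact`: borrows the caller's buffer, never completes.
struct StuckRead<'a> {
    _buf: &'a mut [u8],
    dropped: &'a Cell<bool>,
}

impl Future for StuckRead<'_> {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

impl Drop for StuckRead<'_> {
    fn drop(&mut self) {
        // Readiness-based I/O: nothing in flight in the kernel, nothing to undo.
        self.dropped.set(true);
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the read once; if it is not ready, acts as if the timer fired
/// and drops the read, which is what `select!` does to the losing branch.
fn read_or_time_out(buf: &mut [u8], dropped: &Cell<bool>) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut read = StuckRead { _buf: buf, dropped };
    let completed = Pin::new(&mut read).poll(&mut cx).is_ready();
    drop(read); // cancellation is just this
    completed
}
```

After `read_or_time_out` returns, the borrow checker lets the caller write to `buf` again. With io_uring, that same permission is the problem: the kernel may still be writing into that memory.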
rio::Uring
pub fn recv<'a, Fd, Buf>(
&'a self,
stream: &'a Fd,
iov: &'a Buf
) -> Completion<'a, usize>
rio::Uring
impl<'a, C: FromCqe> Drop
for Completion<'a, C> {
fn drop(&mut self) {
self.wait_inner();
}
}
let mut buf = vec![0u8; 4];
let mut read_fut = Box::pin(s.read_exact(&mut buf));
tokio::select! {
_ = &mut read_fut => { todo!() }
_ = sleep(Duration::from_secs(1)) => {
std::mem::forget(read_fut);
Err(timeout_err())
}
}
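The snippet above is why a "wait in `Drop`" design like rio's `Completion` cannot carry a soundness argument on its own: `std::mem::forget` is a safe function, so safe code can skip `Drop` entirely while the borrow checker considers the buffer free again. A minimal sketch, with a hypothetical `Completion` whose `Drop` only counts how many times its stand-in for `wait_inner()` ran:

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard in the style of rio's `Completion`.
struct Completion<'a> {
    waits: &'a Cell<u32>,
}

impl Drop for Completion<'_> {
    fn drop(&mut self) {
        // Stand-in for `self.wait_inner()`: block until the kernel is done.
        self.waits.set(self.waits.get() + 1);
    }
}

/// Returns how many times "wait on drop" actually ran.
fn drop_then_forget() -> u32 {
    let waits = Cell::new(0);
    drop(Completion { waits: &waits }); // Drop runs
    mem::forget(Completion { waits: &waits }); // safe code, Drop never runs
    waits.get()
}
```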
tokio_uring::net::TcpStream
async fn read<T>(&self, buf: T) -> (Result<usize>, T)
where T: BoundedBufMut;
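The owned-buffer signature sidesteps the problem: the caller hands the buffer over by value and gets it back alongside the result, so there is no borrow to end early, whether the handle is dropped, forgotten, or awaited. A std-only sketch in which a thread plays the kernel; `InFlightRead` and `submit_read` are hypothetical, and the `(result, buffer)` order mirrors the signature above.

```rust
use std::io;
use std::sync::mpsc;
use std::thread;

/// Hypothetical handle for an in-flight read.
struct InFlightRead {
    done: mpsc::Receiver<(io::Result<usize>, Vec<u8>)>,
}

impl InFlightRead {
    /// Like awaiting the op: you get the result *and* your buffer back.
    fn wait(self) -> (io::Result<usize>, Vec<u8>) {
        self.done.recv().expect("simulated kernel thread panicked")
    }
}

/// Takes the buffer by value: while the op is in flight, the caller
/// has no way to touch or free it.
fn submit_read(mut buf: Vec<u8>, incoming: &'static [u8]) -> InFlightRead {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let n = incoming.len().min(buf.len());
        buf[..n].copy_from_slice(&incoming[..n]);
        // If the handle was dropped ("cancelled"), `send` fails and the
        // buffer is freed here, only after the "kernel" is done with it.
        let _ = tx.send((Ok(n), buf));
    });
    InFlightRead { done: rx }
}
```

The cost of this design is ergonomic: every call site must thread the buffer through, which is one reason existing `AsyncRead`-based code does not port over directly.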
Fine, I'll rewrite everything
on top of io-uring then.
docs.rs/loona
load testing is hard
■ macOS = nice for dev, useless for perf
■ P-states
■ love your noisy neighbors
■ stats are hard (coordinated omission etc.)
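Coordinated omission, concretely: a load generator that waits for each reply before sending the next request also stops sending during a server stall, so the stall shows up as one slow sample instead of many. Measuring from the *intended* send time recovers the missing latency. A sketch with made-up service times; `latencies` is a hypothetical helper, not how h2load computes its statistics.

```rust
/// Closed-loop client that intends to send one request every `interval_ms`,
/// but can only send after the previous reply. `service_ms[i]` is how long
/// request i took once actually sent. Returns (naive, corrected) latencies.
fn latencies(service_ms: &[u64], interval_ms: u64) -> (Vec<u64>, Vec<u64>) {
    let mut naive = Vec::new(); // measured from actual send time
    let mut corrected = Vec::new(); // measured from intended send time
    let mut free_at = 0u64; // when the client can send again
    for (i, &svc) in service_ms.iter().enumerate() {
        let intended = i as u64 * interval_ms;
        let sent = free_at.max(intended);
        let done = sent + svc;
        naive.push(done - sent);
        corrected.push(done - intended);
        free_at = done;
    }
    (naive, corrected)
}
```

With service times `[1, 100, 1, 1]` ms and a 10 ms interval, the naive view is `[1, 100, 1, 1]` while the corrected view is `[1, 100, 91, 82]`: the two requests queued behind the stall were slow from a user's point of view.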
the plan?
■ Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
■ h2load from another dedicated server
■ 16 virtual clients, max 100 streams per client
■ python for automation (running commands over
SSH, CSV => XLS etc.)
■ perf for counting cycles, instructions, branches
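The client setup above maps onto h2load's `-c` (number of clients) and `-m` (max concurrent streams per client) options; `-n` sets the total request count. The talk's automation was written in Python; to keep one language in this write-up, here is a Rust sketch that only builds the command line. The request count and target URL are placeholders, not the values used in the talk, and `h2load_command` is a hypothetical helper.

```rust
use std::process::Command;

/// Builds (but does not run) an h2load invocation:
/// 16 clients, at most 100 concurrent streams per client.
fn h2load_command(target: &str, total_requests: u64) -> Command {
    let mut cmd = Command::new("h2load");
    cmd.arg("-c").arg("16") // concurrent clients
        .arg("-m").arg("100") // max concurrent streams per client
        .arg("-n").arg(total_requests.to_string()) // total requests
        .arg(target);
    cmd
}
```

Calling `.output()` on the result would run it; in the setup described above that happens on a separate dedicated machine, over SSH.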
what's next?
● more/better benchmarks
● …on hardware from this decade
● proxying to HTTP/1.1, serving from disk
● messing with: allocators, buffer size
● io_uring: provided buffers, multishot accept/read
● move off of tokio entirely (no atomics needed, no "write to
unpark thread" needed)
how do we make that happen?
● money donations
● hardware donations
● expertise donations
● did I mention money
Thank you! Let’s connect.
Amos Wenger
amos@bearcove.net
@fasterthanlime
https://fasterthanli.me
bearcove