Beyond S3’s Basics
Bin Fan,
VP of Technology @Alluxio
binfan@alluxio.com
Aug 14 2025
Alluxio Confidential
Computer System 101: Two Common Metrics in I/O Systems
THROUGHPUT LATENCY
Simplified example: application throughput = 100 MB/s, T_setup = 200 ms
T_total ≈ T_setup (≈ TTFB) + data_size / application throughput
● Large dataset: 1 GB → ~10.2 s (throughput dominates)
● Small dataset: 128 KB → ~0.201 s (latency dominates)
Training & rollout love throughput; inference loves latency!
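The arithmetic above can be sketched in a few lines of Python (illustrative only, using the slide's simplified model):

```python
# Total transfer time under the simplified model:
# T_total ≈ T_setup (≈ TTFB) + data_size / throughput.
def total_time(data_size_bytes, throughput_bps=100e6, t_setup_s=0.2):
    """Return total I/O time in seconds (setup plus transfer)."""
    return t_setup_s + data_size_bytes / throughput_bps

large = total_time(1e9)    # 1 GB  -> ~10.2 s   (throughput dominates)
small = total_time(128e3)  # 128 KB -> ~0.201 s (latency dominates)
print(f"{large:.3f}s {small:.3f}s")
```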
Alluxio makes it easy to share and
manage data from
any storage
to any compute engine
in any environment
with high performance and low cost.
Alluxio in AI & Analytics Ecosystem
Various API Support
(S3, HDFS, POSIX, etc.)
Journey of Alluxio Since Inception
● 2014 (BIG DATA ANALYTICS): Alluxio open source project founded at UC Berkeley AMPLab; Baidu deploys a 1000+ node cluster; Meta accelerates Presto workloads; 7/10 top internet brands accelerated by Alluxio
● 2019 (CLOUD ADOPTION): Alluxio scales to 1 billion files; 1000+ OSS contributors; AliPay accelerates model training
● 2023 (GENERATIVE AI): 9/10 top internet brands accelerated by Alluxio; Zhihu accelerates LLM model training
● 2024: Alluxio scales to 10+ billion files; leading ecommerce brand accelerates model training; Fortune 5 brand accelerates model training
Core Feature 1: Distributed Caching
[Diagram: Big Data Query, Big Data ETL, and Model Training clients read s3://bucket/file1 and s3://bucket/file2; blocks A, B, C are cached across Alluxio Workers 1..n, with the worker for each block selected based on consistent hashing]
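Consistent-hashing worker selection can be sketched as follows (a toy illustration, not Alluxio's actual implementation; the hash scheme and worker names are made up):

```python
# Toy consistent-hash ring: each worker gets many virtual points on the
# ring, and a file path routes to the first worker at or after its hash.
import bisect, hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers, vnodes=100):
        # (hash, worker) points, sorted around the ring
        self.ring = sorted((_h(f"{w}#{i}"), w)
                           for w in workers for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def worker_for(self, path: str) -> str:
        # First point clockwise from the path's hash (wrap with modulo)
        i = bisect.bisect(self.keys, _h(path)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["worker-1", "worker-2", "worker-n"])
# The same path deterministically maps to the same worker:
assert ring.worker_for("s3://bucket/file1") == ring.worker_for("s3://bucket/file1")
```

Because only the keys adjacent to a removed worker move, adding or removing a worker invalidates only a small fraction of the cache.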
Core Feature 2: Filesystem Namespace Virtualization
● Alluxio can be viewed as a logical file system
○ Multiple storage services can be mounted into the same logical Alluxio namespace
● Each Alluxio path is backed by a persistent storage address
○ alluxio://ip:port/Data/Sales <-> hdfs://service/salesdata/Sales
[Diagram: Alluxio namespace rooted at / with /Data and /Users; /Users (Alice, Bob) is backed by s3://bucket/Users in AWS us-east-1, while /Data (Reports, Sales) is backed by hdfs://service/salesdata in an on-prem data warehouse]
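The mount-table lookup behind this mapping can be sketched as a longest-prefix resolver (a toy illustration mirroring the figure's mount points, not Alluxio's implementation):

```python
# Toy mount table: Alluxio logical path prefix -> backing storage URI.
MOUNTS = {
    "/Data": "hdfs://service/salesdata",
    "/Users": "s3://bucket/Users",
}

def resolve(alluxio_path: str) -> str:
    """Translate a logical Alluxio path to its under-storage address."""
    # Longest prefix wins, so nested mounts shadow their parents.
    for prefix, ufs in sorted(MOUNTS.items(),
                              key=lambda kv: len(kv[0]), reverse=True):
        if alluxio_path == prefix or alluxio_path.startswith(prefix + "/"):
            return ufs + alluxio_path[len(prefix):]
    raise KeyError(f"no mount covers {alluxio_path}")

print(resolve("/Data/Sales"))   # -> hdfs://service/salesdata/Sales
print(resolve("/Users/Alice"))  # -> s3://bucket/Users/Alice
```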
Throughput Microbenchmark: Reads from a Single Worker
● A single Alluxio worker achieves high throughput competitive with HPC storage solutions:
○ Up to 81.6 Gbps (9.5 GiB/s) with a 100 Gbps network: 2.5 GiB/s (@1 thread) to 9.5 GiB/s (@32 threads)
○ Up to 352.2 Gbps (41 GiB/s) with a 400 Gbps network
Setup
● Alluxio:
1 Alluxio worker (i3en.metal)
● FIO Benchmark:
Sequential Read
bs = 256KB
Note: an Alluxio FUSE client (c5n.metal) co-located with the training servers handles POSIX API access to the Alluxio workers, which actually cache the data.
MLPerf Storage Benchmark
● Alluxio enables >97% GPU utilization across 10–300 H100s, eliminating I/O stalls during ML training
● As the number of GPUs scales 30×, Alluxio scales linearly with workers—delivering consistent performance with
minimal overhead.
Setup
● Alluxio: 1 - 15 Alluxio workers
(i3en.metal)
● MLPerf Storage v1.0 - ResNet-50: reading many small (~100 KB) images
● 10 - 300 GPUs (H100)
Note: Alluxio FUSE clients co-located with the H100 servers handle POSIX API access to the Alluxio workers, which actually cache the data.
THROUGHPUT: Alluxio has always been a leader in high throughput, enabling customers to rapidly load massive quantities of data into GPU memory for AI training and model deployment/cold starts.
LATENCY: Alluxio also delivers ultra-low-latency caching for data stored on cloud storage (e.g., AWS S3).
Alluxio is the industry-leading
sub-ms time to first byte (TTFB) solution on S3-class storage
How much better is Alluxio? (Details next slide)
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
➔ Unlimited, linear scalability
Alluxio for Low Latency Caching
Test environment references
Alluxio EE
● Version/Spec: Alluxio Enterprise AI 3.6 (50TB
cache)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network) and 1 Worker (i3en.metal)
AWS S3
● Version/Spec: AWS S3 bucket (Standard Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network)
AWS S3 Express One Zone
● Version/Spec: AWS bucket (S3 Express One
Zone Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network)
Alluxio for Low Latency Caching
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
What is Alluxio, indeed?
A shim layer on S3 (or other cloud storage) that provides sub-ms read latency, single-digit-ms write latency, and enhanced semantics.
Driven by RAG, feature stores, and agentic AI.
Computer System 101: Two Common Metrics in I/O Systems
THROUGHPUT LATENCY
Simplified example: application throughput = 100 MB/s, T_setup = 200 ms
T_total ≈ T_setup (≈ TTFB) + data_size / application throughput
● Large dataset: 1 GB → ~10.2 s (throughput dominates)
● Small dataset: 128 KB → ~0.201 s (latency dominates)
Training & rollout love throughput; inference loves latency!
Let’s Look at S3 (and other Cloud Storage)
● Legacy S3 usage: ETL, OLAP, archival
○ Really good at scale & throughput: S3 stores over 400 trillion objects and handles up to
150 million requests per second.
● S3 is not designed for:
○ Low latency: read TTFB (e.g., GetObject) on S3 Standard buckets commonly lands in the 30–200 ms range, which is fine for batch but painful for inference and transactional access
○ POSIX-like write semantics: rename = copy + delete; append is not supported
● Emerging needs:
○ Sub-ms latency for inference, RAG, and feature stores
○ Advanced semantics (append, checkpointing) for OLTP and agentic memory
● Trend: infra teams want performance without giving up S3 scale & economics
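The "rename = copy + delete" limitation above can be illustrated with a toy in-memory object store (purely illustrative; a real S3 client would issue a server-side copy followed by a delete):

```python
# S3 has no native rename: clients emulate it as copy + delete, which is
# O(object size) rather than O(1), and is not atomic.
class ToyObjectStore:
    def __init__(self):
        self.objects = {}  # key -> bytes

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        del self.objects[key]

    def rename(self, src, dst):
        # Non-atomic: a concurrent reader can observe both keys, and a
        # failure between the two calls leaves a dangling copy behind.
        self.copy(src, dst)
        self.delete(src)

store = ToyObjectStore()
store.put("a/file1", b"payload")
store.rename("a/file1", "b/file1")
assert "a/file1" not in store.objects
assert store.objects["b/file1"] == b"payload"
```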
Mount an S3 bucket
Mounts s3://bucketA/data at the Alluxio path alluxio:///s3/:
$ bin/alluxio mount add \
    --path /s3/ \
    --ufs-uri s3://bucketA/data/
Common theme:
● Use the Apache Parquet format for fast point-query lookups into structured data
○ The industry standard today for data lakes
● Store PB-scale Parquet files on S3
● Reading directly from S3 suffers from poor tail latency
Low Latency Read Accelerator on S3
[Diagram: Hyper/Pandas/Polars clients read through a distributed cache (~1 ms) instead of directly from the AWS data lake (30 ms - 200 ms)]
Common theme:
● Writes can be overwrites or appends
● Either keep replicas within the Alluxio space or asynchronously upload to S3
Low-latency & “Reliable” Write Buffer on S3
[Diagram: RocksDB/S3 clients append through a distributed cache (~5 ms append), which uploads to the AWS data lake in the background]
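The write-buffer pattern can be sketched as follows (a toy illustration of "acknowledge locally, upload in the background"; the class and names are hypothetical, not Alluxio's API):

```python
# Toy async write buffer: append() returns immediately after enqueueing
# locally, while a background thread drains the queue to the object store.
import queue
import threading

class WriteBuffer:
    def __init__(self, upload_fn):
        self.q = queue.Queue()
        self._t = threading.Thread(target=self._drain,
                                   args=(upload_fn,), daemon=True)
        self._t.start()

    def append(self, key, data):
        self.q.put((key, data))  # returns immediately: low-latency ack

    def _drain(self, upload_fn):
        while True:
            item = self.q.get()
            if item is None:      # sentinel: shut down
                break
            upload_fn(*item)      # slow path, e.g. an S3 PUT

    def close(self):
        self.q.put(None)
        self._t.join()

s3_mock = {}  # stands in for the S3 data lake
buf = WriteBuffer(lambda k, d: s3_mock.__setitem__(k, d))
buf.append("wal/0001", b"record")  # acknowledged before the upload runs
buf.close()                        # drain remaining uploads
assert s3_mock["wal/0001"] == b"record"
```

The durability trade-off the slide hints at: an ack here only guarantees the write reached the (possibly replicated) cache, not S3, until the background upload completes.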
Alluxio: Bringing Performance and Semantics to S3
A software layer that transparently sits between applications and S3 (or any object store), offering both POSIX and S3-compatible APIs.
Benefit (on top of S3) | Capability
Zero-migration | Mount existing S3 buckets as-is; no data movement required
Low-latency accelerator | Achieves sub-ms latency for S3 objects
Semantic bridge | Enables append, async writes, and cache-only updates
Minimal hardware requirement | Pools local SSDs/NVMe drives for intelligent, cost-efficient caching
Kubernetes-native | Deploys via Operator; integrated metrics, tracing, and observability
Alternatives on AWS: Side-by-Side Comparison

Feature | S3 Standard | S3 Express One Zone | FSx Lustre + S3 | Alluxio + S3
Latency (TTFB) | 100+ ms | 1–10 ms | 1 ms | 1 ms
Multi-cloud | ❌ | ❌ | ❌ | ✅
POSIX API | ❌ | ❌ | ✅ | ✅
S3 API | ✅ | ✅ | ❌ | ✅
Supports WALs (append) | ❌ | ✅ | ✅ | ✅ (via POSIX)
Data migration required | No | High (creation-time choice) | No | No
Cost ($/TB/mo) | ~$23 [1] | ~$110 [2] | ~$143 [3] | ~$23 [4] to ~$41 [5]

[1] Assumes S3 Standard is the source of truth, holding the full data
[2] Assumes S3 Express One Zone holds the full data, as the storage class must be decided at bucket creation time
[3] Assumes the 1,000 MB/s/TiB class, with FSx Lustre holding 20% hot data while S3 keeps the full data
[4] Assumes Alluxio deployed on spare GPU-server disks holding 20% hot data (no additional hardware cost) while S3 keeps the full data
[5] Assumes a separate Alluxio cluster holding 20% hot data on i3en.6xlarge instances (1-yr reserved) while S3 keeps the full data
One More Thing
A joint engineering collaboration between Alluxio and Salesforce. For more details:
https://www.alluxio.io/whitepaper/meet-in-the-middle-for-a-1-000x-performance-boost-querying-parquet-files-on-petabyte-scale-data-lakes
Optimization to Achieve Sub-MS Point Query Latency: Query Offloading
[Diagram: agentic apps querying the data lake directly vs. through Alluxio with query offloading]
● Without query offloading, agentic apps may make multiple fine-grained requests to the data orchestration layer.
● With query offloading (e.g., a single RPC to a Parquet reader), the request is optimized and much of the filtering logic runs closer to the data, reducing chatter and latency.
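The two access patterns can be contrasted with a toy example (the data, predicate, and function names are made up; the point is one request versus many):

```python
# Toy dataset standing in for rows of a Parquet file on the data lake.
rows = [{"id": i, "score": i % 7} for i in range(1000)]

# Without offloading: one round-trip per row fetched, filtering client-side.
def client_side_filter(fetch_row, n):
    return [r for r in (fetch_row(i) for i in range(n)) if r["score"] == 0]

# With offloading: a single "RPC" ships the predicate to run next to the data.
def offloaded_query(predicate):
    return [r for r in rows if predicate(r)]

a = client_side_filter(lambda i: rows[i], len(rows))   # 1000 fetches
b = offloaded_query(lambda r: r["score"] == 0)          # 1 request
assert a == b  # same answer; the difference is round-trips, not results
```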
New Architecture: Benefits for Platform Engineers
● Reduced Storage Costs & Operational Overhead
○ Avoids full-capacity storage purchases by eliminating duplicate datasets from data lakes
● Data lake (many PBs) vs. cache capacity (TBs to PBs)
○ No coordination required for cache-space cleanup
● No Egress Costs on Repeated Data Access
○ Over 50% reduction in S3 API calls & egress costs through efficient caching, minimizing cloud API and data transfer expenses
● Effortless Expansion & Operation
○ Seamlessly scale the architecture to new GPU clusters without complex reconfiguration
○ Fully managed with Kubernetes (K8s) for simplified deployment, scaling, and maintenance across environments
Q&A

AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
