Beyond S3’s Basics
Bin Fan,
VP of Technology @Alluxio
binfan@alluxio.com
Aug 14 2025
Alluxio Confidential
Computer System 101: Two Common Metrics in I/O Systems
THROUGHPUT LATENCY
Simplified example: application throughput = 100 MB/s, T_setup = 200 ms
T_total ≈ T_setup (≈ TTFB) + data_size / application throughput
● Large dataset: 1 GB → ~10.2 s (throughput dominates)
● Small dataset: 128 KB → ~0.201 s (latency dominates)
Training & rollout love throughput; inference loves latency!
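The arithmetic above can be sketched in a few lines of Python (illustrative only, using the slide's simplified model):

```python
# Total transfer time under the simplified model:
# T_total ≈ T_setup (≈ TTFB) + data_size / throughput.
def total_time(data_size_bytes, throughput_bps=100e6, t_setup_s=0.2):
    """Return total I/O time in seconds (setup plus transfer)."""
    return t_setup_s + data_size_bytes / throughput_bps

large = total_time(1e9)    # 1 GB  -> ~10.2 s   (throughput dominates)
small = total_time(128e3)  # 128 KB -> ~0.201 s (latency dominates)
print(f"{large:.3f}s {small:.3f}s")
```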
Alluxio makes it easy to share and
manage data from
any storage
to any compute engine
in any environment
with high performance and low cost.
Alluxio in AI & Analytics Ecosystem
Various API Support
(S3, HDFS, POSIX, etc.)
Journey of Alluxio Since Inception
● 2014 (BIG DATA ANALYTICS): Alluxio open source project founded at UC Berkeley AMPLab; Baidu deploys a 1000+ node cluster; Meta accelerates Presto workloads; 7/10 top internet brands accelerated by Alluxio
● 2019 (CLOUD ADOPTION): Alluxio scales to 1 billion files; 1000+ OSS contributors; AliPay accelerates model training
● 2023 (GENERATIVE AI): 9/10 top internet brands accelerated by Alluxio; Zhihu accelerates LLM model training
● 2024: Alluxio scales to 10+ billion files; leading ecommerce brand accelerates model training; Fortune 5 brand accelerates model training
Core Feature 1: Distributed Caching
[Diagram: Big Data Query, Big Data ETL, and Model Training clients read s3://bucket/file1 and s3://bucket/file2; blocks A, B, C are cached across Alluxio Workers 1..n, with the worker for each block selected based on consistent hashing]
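Consistent-hashing worker selection can be sketched as follows (a toy illustration, not Alluxio's actual implementation; the hash scheme and worker names are made up):

```python
# Toy consistent-hash ring: each worker gets many virtual points on the
# ring, and a file path routes to the first worker at or after its hash.
import bisect, hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers, vnodes=100):
        # (hash, worker) points, sorted around the ring
        self.ring = sorted((_h(f"{w}#{i}"), w)
                           for w in workers for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def worker_for(self, path: str) -> str:
        # First point clockwise from the path's hash (wrap with modulo)
        i = bisect.bisect(self.keys, _h(path)) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["worker-1", "worker-2", "worker-n"])
# The same path deterministically maps to the same worker:
assert ring.worker_for("s3://bucket/file1") == ring.worker_for("s3://bucket/file1")
```

Because only the keys adjacent to a removed worker move, adding or removing a worker invalidates only a small fraction of the cache.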
Core Feature 2: Filesystem Namespace Virtualization
● Alluxio can be viewed as a logical file system
○ Multiple storage services can be mounted into the same logical Alluxio namespace
● Each Alluxio path is backed by a persistent storage address
○ alluxio://ip:port/Data/Sales <-> hdfs://service/salesdata/Sales
[Diagram: Alluxio namespace rooted at / with /Data and /Users; /Users (Alice, Bob) is backed by s3://bucket/Users in AWS us-east-1, while /Data (Reports, Sales) is backed by hdfs://service/salesdata in an on-prem data warehouse]
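The mount-table lookup behind this mapping can be sketched as a longest-prefix resolver (a toy illustration mirroring the figure's mount points, not Alluxio's implementation):

```python
# Toy mount table: Alluxio logical path prefix -> backing storage URI.
MOUNTS = {
    "/Data": "hdfs://service/salesdata",
    "/Users": "s3://bucket/Users",
}

def resolve(alluxio_path: str) -> str:
    """Translate a logical Alluxio path to its under-storage address."""
    # Longest prefix wins, so nested mounts shadow their parents.
    for prefix, ufs in sorted(MOUNTS.items(),
                              key=lambda kv: len(kv[0]), reverse=True):
        if alluxio_path == prefix or alluxio_path.startswith(prefix + "/"):
            return ufs + alluxio_path[len(prefix):]
    raise KeyError(f"no mount covers {alluxio_path}")

print(resolve("/Data/Sales"))   # -> hdfs://service/salesdata/Sales
print(resolve("/Users/Alice"))  # -> s3://bucket/Users/Alice
```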
Throughput Microbenchmark: Reads from a Single Worker
● A single Alluxio worker achieves high throughput competitive with HPC storage solutions:
○ Up to 81.6 Gbps (9.5 GiB/s) with a 100 Gbps network: 2.5 GiB/s (@1 thread) to 9.5 GiB/s (@32 threads)
○ Up to 352.2 Gbps (41 GiB/s) with a 400 Gbps network
Setup
● Alluxio:
1 Alluxio worker (i3en.metal)
● FIO Benchmark:
Sequential Read
bs = 256KB
Note: an Alluxio FUSE client (c5n.metal) co-located with the training servers handles POSIX API access to the Alluxio workers, which actually cache the data.
MLPerf Storage Benchmark
● Alluxio enables >97% GPU utilization across 10–300 H100s, eliminating I/O stalls during ML training
● As the number of GPUs scales 30×, Alluxio scales linearly with workers—delivering consistent performance with
minimal overhead.
Setup
● Alluxio: 1 - 15 Alluxio workers
(i3en.metal)
● MLPerf Storage v1.0 - ResNet-50: reading many small (~100 KB) images
● 10 - 300 GPUs (H100)
Note: Alluxio FUSE clients co-located with the H100 servers handle POSIX API access to the Alluxio workers, which actually cache the data.
THROUGHPUT: Alluxio has always been a leader in high throughput, enabling customers to rapidly load massive quantities of data into GPU memory for AI training and model deployment/cold starts.
LATENCY: Alluxio also delivers ultra-low-latency caching for data stored on cloud storage (e.g., AWS S3).
Alluxio is the industry-leading
sub-ms time to first byte (TTFB) solution on S3-class storage
How much better is Alluxio? (Details next slide)
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
➔ Unlimited, linear scalability
Alluxio for Low Latency Caching
Test environment references
Alluxio EE
● Version/Spec: Alluxio Enterprise AI 3.6 (50TB
cache)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network) and 1 Worker (i3en.metal)
AWS S3
● Version/Spec: AWS S3 bucket (Standard Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network)
AWS S3 Express One Zone
● Version/Spec: AWS bucket (S3 Express One
Zone Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps
network)
Alluxio for Low Latency Caching
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
What is Alluxio, indeed?
A shim layer on S3 (or other cloud storage) that provides sub-ms read latency, single-digit-ms write latency, and enhanced semantics.
Driven by RAG, feature stores, and agentic AI.
Computer System 101: Two Common Metrics in I/O Systems
THROUGHPUT LATENCY
Simplified example: application throughput = 100 MB/s, T_setup = 200 ms
T_total ≈ T_setup (≈ TTFB) + data_size / application throughput
● Large dataset: 1 GB → ~10.2 s (throughput dominates)
● Small dataset: 128 KB → ~0.201 s (latency dominates)
Training & rollout love throughput; inference loves latency!
Let’s Look at S3 (and other Cloud Storage)
● Legacy S3 usage: ETL, OLAP, archival
○ Really good at scale & throughput: S3 stores over 400 trillion objects and handles up to
150 million requests per second.
● S3 is not designed for:
○ Low latency: read TTFB (e.g., GetObject) on S3 Standard buckets commonly lands in the 30–200 ms range, which is fine for batch but painful for inference and transactional access
○ POSIX-like write semantics: rename = copy + delete; append is not supported
● Emerging needs:
○ Sub-ms latency for inference, RAG, and feature stores
○ Advanced semantics (append, checkpointing) for OLTP and agentic memory
● Trend: infra teams want performance without giving up S3 scale & economics
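The "rename = copy + delete" limitation above can be illustrated with a toy in-memory object store (purely illustrative; a real S3 client would issue a server-side copy followed by a delete):

```python
# S3 has no native rename: clients emulate it as copy + delete, which is
# O(object size) rather than O(1), and is not atomic.
class ToyObjectStore:
    def __init__(self):
        self.objects = {}  # key -> bytes

    def put(self, key, data):
        self.objects[key] = data

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        del self.objects[key]

    def rename(self, src, dst):
        # Non-atomic: a concurrent reader can observe both keys, and a
        # failure between the two calls leaves a dangling copy behind.
        self.copy(src, dst)
        self.delete(src)

store = ToyObjectStore()
store.put("a/file1", b"payload")
store.rename("a/file1", "b/file1")
assert "a/file1" not in store.objects
assert store.objects["b/file1"] == b"payload"
```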
Mount an S3 bucket
Mounts s3://bucketA/data at the Alluxio path alluxio:///s3/:
$ bin/alluxio mount add \
    --path /s3/ \
    --ufs-uri s3://bucketA/data/
Common theme:
● Use the Apache Parquet format for fast point-query lookups into structured data
○ The industry standard today for data lakes
● Store PB-scale Parquet files on S3
● Reading directly from S3 suffers from poor tail latency
Low Latency Read Accelerator on S3
[Diagram: Hyper/Pandas/Polars clients read through a distributed cache (~1 ms) instead of directly from the AWS data lake (30 ms - 200 ms)]
Common theme:
● Writes can be overwrites or appends
● Either keep replicas within the Alluxio space or asynchronously upload to S3
Low-latency & “Reliable” Write Buffer on S3
[Diagram: RocksDB/S3 clients append through a distributed cache (~5 ms append), which uploads to the AWS data lake in the background]
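The write-buffer pattern can be sketched as follows (a toy illustration of "acknowledge locally, upload in the background"; the class and names are hypothetical, not Alluxio's API):

```python
# Toy async write buffer: append() returns immediately after enqueueing
# locally, while a background thread drains the queue to the object store.
import queue
import threading

class WriteBuffer:
    def __init__(self, upload_fn):
        self.q = queue.Queue()
        self._t = threading.Thread(target=self._drain,
                                   args=(upload_fn,), daemon=True)
        self._t.start()

    def append(self, key, data):
        self.q.put((key, data))  # returns immediately: low-latency ack

    def _drain(self, upload_fn):
        while True:
            item = self.q.get()
            if item is None:      # sentinel: shut down
                break
            upload_fn(*item)      # slow path, e.g. an S3 PUT

    def close(self):
        self.q.put(None)
        self._t.join()

s3_mock = {}  # stands in for the S3 data lake
buf = WriteBuffer(lambda k, d: s3_mock.__setitem__(k, d))
buf.append("wal/0001", b"record")  # acknowledged before the upload runs
buf.close()                        # drain remaining uploads
assert s3_mock["wal/0001"] == b"record"
```

The durability trade-off the slide hints at: an ack here only guarantees the write reached the (possibly replicated) cache, not S3, until the background upload completes.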
Alluxio: Bringing Performance and Semantics to S3
A software layer that transparently sits between applications and S3 (or any object store), offering both POSIX and S3-compatible APIs.
Benefit (on top of S3) | Capability
Zero-migration | Mount existing S3 buckets as-is; no data movement required
Low-latency accelerator | Achieves sub-ms latency for S3 objects
Semantic bridge | Enables append, async writes, and cache-only updates
Minimal hardware requirement | Pools local SSDs/NVMe drives for intelligent, cost-efficient caching
Kubernetes-native | Deploys via Operator; integrated metrics, tracing, and observability
Alternatives on AWS: Side-by-Side Comparison

Feature | S3 Standard | S3 Express One Zone | FSx Lustre + S3 | Alluxio + S3
Latency (TTFB) | 100+ ms | 1–10 ms | 1 ms | 1 ms
Multi-cloud | ❌ | ❌ | ❌ | ✅
POSIX API | ❌ | ❌ | ✅ | ✅
S3 API | ✅ | ✅ | ❌ | ✅
Supports WALs (append) | ❌ | ✅ | ✅ | ✅ (via POSIX)
Data migration required | No | High (creation-time choice) | No | No
Cost ($/TB/mo) | ~$23 [1] | ~$110 [2] | ~$143 [3] | ~$23 [4] to ~$41 [5]

[1] Assumes S3 Standard is the source of truth, holding the full data
[2] Assumes S3 Express One Zone holds the full data, as the storage class must be decided at bucket creation time
[3] Assumes the 1,000 MB/s/TiB class, with FSx Lustre holding 20% hot data while S3 keeps the full data
[4] Assumes Alluxio deployed on spare GPU-server disks holding 20% hot data (no additional hardware cost) while S3 keeps the full data
[5] Assumes a separate Alluxio cluster holding 20% hot data on i3en.6xlarge instances (1-yr reserved) while S3 keeps the full data
One More Thing
A joint engineering collaboration between Alluxio and Salesforce. For more details:
https://www.alluxio.io/whitepaper/meet-in-the-middle-for-a-1-000x-performance-boost-querying-parquet-files-on-petabyte-scale-data-lakes
Optimization to Achieve Sub-MS Point Query Latency: Query Offloading
[Diagram: agentic apps querying the data lake directly vs. through Alluxio with query offloading]
● Without query offloading, agentic apps may make multiple fine-grained requests to the data orchestration layer.
● With query offloading (e.g., a single RPC to a Parquet reader), the request is optimized and much of the filtering logic runs closer to the data, reducing chatter and latency.
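The two access patterns can be contrasted with a toy example (the data, predicate, and function names are made up; the point is one request versus many):

```python
# Toy dataset standing in for rows of a Parquet file on the data lake.
rows = [{"id": i, "score": i % 7} for i in range(1000)]

# Without offloading: one round-trip per row fetched, filtering client-side.
def client_side_filter(fetch_row, n):
    return [r for r in (fetch_row(i) for i in range(n)) if r["score"] == 0]

# With offloading: a single "RPC" ships the predicate to run next to the data.
def offloaded_query(predicate):
    return [r for r in rows if predicate(r)]

a = client_side_filter(lambda i: rows[i], len(rows))   # 1000 fetches
b = offloaded_query(lambda r: r["score"] == 0)          # 1 request
assert a == b  # same answer; the difference is round-trips, not results
```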
New Architecture: Benefits for Platform Engineers
● Reduced Storage Costs & Operational Overhead
○ Avoids full-capacity storage purchases by eliminating duplicate datasets from data lakes
● Data lake (many PBs) vs. cache capacity (TBs to PBs)
○ No coordination required for cache-space cleanup
● No Egress Costs on Repeated Data Access
○ Over 50% reduction in S3 API calls & egress costs through efficient caching, minimizing cloud API and data transfer expenses
● Effortless Expansion & Operation
○ Seamlessly scale the architecture to new GPU clusters without complex reconfiguration
○ Fully managed with Kubernetes (K8s) for simplified deployment, scaling, and maintenance across environments
Q&A

AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access
