Alluxio Webinar | Optimize, Don't Overspend: Data Caching Strategy for AI Workloads

Optimize, Don't Overspend:
Data Caching Strategy for AI Workloads
Sep, 2024

Alluxio makes it easy to share and manage data from any storage to any compute engine in any environment, with high performance and low cost.

Open Source Started From UC Berkeley AMPLab in 2014
● 1,200+ contributors & growing
● 10,000+ Slack Community Members
● Top 10 Most Critical Java-Based Open Source Project
● Top 100 Most Valuable Repositories Out of 96 Million on GitHub
JOIN THE CONVERSATION ON SLACK: ALLUXIO.IO/SLACK

Case Studies
Zhihu
TELCO & MEDIA
E-COMMERCE
FINANCIAL SERVICES
TECH & INTERNET
OTHERS

Leverage GPUs Anywhere
Run AI workloads wherever GPUs are available without data locality concerns

Critical infrastructure barriers to effective AI/ML adoption
● LOW PERFORMANCE: High performance caching for model training & distribution
● COST MANAGEMENT: Avoid copying across data lakes; utilize NVMe directly on the GPU cluster
● GPU SCARCITY: Multi-region/cloud data serving capability
Results: shorten time-to-production, higher GPU utilization

I/O Performance for AI Training and GPU Utilization
1. HPC Performance on Existing Data Lakes
● Achieve up to 8 GB/s throughput & 200K IOPS for a single client
● Improvements compared to 2.x: 35% for hot sequential reads, 20x for hot random reads, 4x for cold reads
2. GPU Saturation
● Fully saturate 8 A100 GPUs, showing over 97% GPU utilization in MLPerf Storage language processing benchmarks
● Customer production data show GPU utilization improvement from 40% to 60% for search/recommendation models & from 50% to 95% for LLMs
3. Checkpoint Optimization
● New checkpoint read/write support optimizes training with write caching capabilities

Comparison against other vendors | FIO - Sequential Read
Setup
● Alluxio: 1 Alluxio worker (i3en.metal), 1 Alluxio FUSE client (c5n.metal)
● AWS FSx Lustre (12 TB capacity)
● JuiceFS (SaaS)
Note: the Alluxio FUSE client, co-located with the training servers, provides POSIX API access to the Alluxio workers, which actually cache the data.
Results
● Alluxio 3.2: bandwidth of 2081 MiB/s (1 thread) to 8183 MiB/s (32 threads) with a single client, the highest of the three systems tested.
● JuiceFS: bandwidth of 1886 MiB/s (1 thread) to 6207 MiB/s, 9.3% to 24.1% slower than Alluxio 3.2.
● FSx Lustre: bandwidth of 185 MiB/s (1 thread) to 3992 MiB/s, 91.1% to 51.2% slower than Alluxio 3.2.
● Observations: Alluxio 3.2 delivers the highest sequential read bandwidth in this test at both ends of the measured range.
Alluxio Proprietary and Confidential
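The "slower than Alluxio 3.2" percentages in the FIO sequential read comparison can be reproduced from the quoted bandwidths. The following is a minimal sketch, assuming "X% slower" means the bandwidth gap divided by the Alluxio 3.2 bandwidth (the slide does not state the formula). The function name `percent_slower` is ours, not part of any benchmark tool.

```python
def percent_slower(baseline_mib_s: float, other_mib_s: float) -> float:
    """Return how much lower `other` is than `baseline`, as a percentage of baseline."""
    if baseline_mib_s <= 0:
        raise ValueError("baseline bandwidth must be positive")
    return (baseline_mib_s - other_mib_s) / baseline_mib_s * 100.0


# Bandwidths in MiB/s as quoted on the slide: (low end, high end of the range).
alluxio = (2081, 8183)
others = {
    "JuiceFS": (1886, 6207),
    "FSx Lustre": (185, 3992),
}

for name, (low, high) in others.items():
    low_pct = percent_slower(alluxio[0], low)
    high_pct = percent_slower(alluxio[1], high)
    print(f"{name}: {low_pct:.2f}% to {high_pct:.2f}% slower than Alluxio 3.2")
```

Under this assumption the computed values agree with the slide to within the last digit; the JuiceFS low-end figure comes out near 9.37%, which matches the quoted 9.3% if the slide truncates rather than rounds.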

Comparison against other vendors | MLPerf Storage
Setup
● Alluxio: 1 FUSE client (c6in.metal), 2 workers (i3en.metal)
Note: DDN with 12 GPUs and Weka with 20 GPUs are the available data points published on the MLPerf website.

Decentralized Object Repository Architecture (DORA): Motivation & Benefits
Scalability
● Motivation: the master as the bottleneck
● Unlimited scalability
● Support tens of billions of small files with a single Alluxio cluster
Reliability
● Fault tolerance
● Automatic fallback to under file system
● More friendly to Kubernetes and cloud
Performance
● Zero-copy network transmission with Netty
● High concurrent read
Data Governance
● Multi-tenant & quota management
● Pluggable security management

Architecture
[Figure: architecture diagram. Components: AI/Analytics Applications (Training Node); Alluxio Client with Affinity Block Location Policy and Client Consistent Hash; Service Registry; Alluxio Workers (Alluxio Cluster); Under Storage. Labeled steps: Get Cluster Info, Find Worker(s), Get Task Info, Execute Task, Send Result, and, on a cache miss, an Under storage task.]
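The architecture slide names a client-side consistent hash for finding workers. The sketch below illustrates the general technique only: it is a toy hash ring with virtual nodes, not Alluxio's implementation. The class name `ConsistentHashRing`, the worker names, the file paths, and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring mapping file paths to worker names."""

    def __init__(self, workers, virtual_nodes=100):
        self._virtual_nodes = virtual_nodes
        self._ring = []  # sorted list of (hash value, worker name)
        for worker in workers:
            self.add_worker(worker)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_worker(self, worker: str) -> None:
        # Each worker owns several points on the ring to even out the load.
        for i in range(self._virtual_nodes):
            bisect.insort(self._ring, (self._hash(f"{worker}#{i}"), worker))

    def remove_worker(self, worker: str) -> None:
        self._ring = [(h, w) for h, w in self._ring if w != worker]

    def find_worker(self, path: str) -> str:
        if not self._ring:
            raise RuntimeError("no workers registered")
        # Pick the first ring point clockwise from the path's hash.
        idx = bisect.bisect_right(self._ring, (self._hash(path), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["worker-1", "worker-2", "worker-3"])
paths = [f"s3://bucket/train/part-{i:05d}.parquet" for i in range(1000)]
before = {p: ring.find_worker(p) for p in paths}

ring.remove_worker("worker-2")
after = {p: ring.find_worker(p) for p in paths}
moved = sum(before[p] != after[p] for p in paths)
print(f"{moved} of {len(paths)} paths changed worker after removing worker-2")
```

The property worth noticing is that removing one worker only remaps the paths that worker owned; every other path keeps its worker, so most cached data stays where it is, and each client can compute the mapping locally without asking a central master.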

Read Optimization
High Concurrent Position Read
● Solve up to 150X read amplification issue
● Improve unstructured file parallel read up to 9X
● Improve structured file position read by 2 - 15X
Zero-copy Data Transmission
● Improve memory efficiency
● Improve large file sequential streaming read performance by 30% - 50%
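Read amplification, mentioned in the Read Optimization slide, is the ratio of bytes fetched from storage to bytes the application actually requested. The sketch below shows one common way it arises: a small position read served by fetching whole fixed-size units. The function name `read_amplification` and all sizes are hypothetical and chosen for illustration; they are not taken from the slides or from Alluxio's configuration.

```python
KIB = 1024
MIB = 1024 * KIB


def read_amplification(offset: int, length: int, fetch_unit: int) -> float:
    """Bytes fetched from storage divided by bytes the application asked for.

    Assumes every read is served by fetching whole `fetch_unit`-sized units
    that cover the requested byte range [offset, offset + length).
    """
    if offset < 0 or length <= 0 or fetch_unit <= 0:
        raise ValueError("offset must be >= 0; length and fetch_unit must be > 0")
    first_unit = offset // fetch_unit
    last_unit = (offset + length - 1) // fetch_unit
    fetched_bytes = (last_unit - first_unit + 1) * fetch_unit
    return fetched_bytes / length


# Hypothetical example: a 64 KiB position read at offset 10 MiB.
coarse = read_amplification(offset=10 * MIB, length=64 * KIB, fetch_unit=4 * MIB)
fine = read_amplification(offset=10 * MIB, length=64 * KIB, fetch_unit=64 * KIB)
print(f"4 MiB fetch unit:  {coarse:.0f}x amplification")
print(f"64 KiB fetch unit: {fine:.0f}x amplification")
```

With these illustrative sizes the coarse fetch unit reads 64 times more data than requested, while a fetch unit matched to the request reads exactly what was asked for. Random position reads over structured files are the access pattern where this gap matters most.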

Zhihu CASE STUDY: High Performance AI Platform for LLM
TECH BENEFIT: Increase GPU utilization from 50% to 93%
BUSINESS BENEFIT: 2 - 4X faster time-to-market
[Figure: pipeline across Training Clouds, Offline Cloud, and Online Cloud. Labels: HDFS, Training Data, Models, Model Training, Model Deployment, Model Inference, Model Update, Downstream Applications.]

Try Alluxio For Free in 30 min!

Introducing Rapid Alluxio Deployer (RAD) in AWS!
Try the fully deployed Alluxio AI cluster for FREE!
● Explore the potential performance benefits of Alluxio by running FIO benchmarks
● Simplify the deployment process with preconfigured template clusters
● Maintain full control of your data with Alluxio deployed within your AWS account
● User-friendly web UI: deploy with just a few clicks in under 40 minutes
Blog with sign-up link and tutorial

Thank you!
Join the conversation on Slack: alluxio.io/slack
Sign up for RAD at https://signup.alluxio-rad.io/ and send us a screenshot of the cluster you created for a chance to win a $50 Amazon gift card!
