Speed up large-scale ML/DL offline inference jobs with Alluxio at Microsoft
Speaker introduction
Binyang Li - Software Engineer, Bing
Qianxi Zhang - Research Software Engineer, MSRA
Table of contents
Characteristics of offline inference jobs
Challenges of running large-scale inference jobs
Architecture & optimization
Performance comparison with and without Alluxio
Future work
Characteristics of offline inference jobs
• Scale
• Each job has more than 400 tasks
• Each task reads a different dataset and generates its own output (no interaction between tasks)
• Each task reads about 2~3 GB of data and outputs 7~8 GB (total input is ~1 TB, total output
is ~3.5 TB)
• Each task may take 2~4 hours to finish
• Data access pattern
• Input data is read only once, sequentially
• Output is written while the job is running
• Infra
• Storage: Azure Blob
• AI platform: OpenPAI (microsoft/pai on github.com) - resource scheduling and cluster management for AI
• Scheduler: Hived (microsoft/hivedscheduler on github.com) - a Kubernetes scheduler for deep learning
Challenges
• Large total ingress & egress volume, which easily causes IO failures
• Tools like blobfuse download data before tasks run and upload it after the job
finishes, which easily causes high IOPS and hits Azure Storage limits
• IO stalls take a long time: GPUs sit idle while data is uploaded/downloaded (wasting time and
money!)
Production environment
•About 200 Azure Low-Priority VMs, each with 4 GPUs (worker
nodes can be preempted at any time!)
•Alluxio 2.3.0
•Kubernetes 1.15.x
•Running for more than 6 months
Architecture with Alluxio
[Diagram] Training/inference jobs read and write through Alluxio, a data caching/prefetching system under policy management that loads, caches, moves, replicates, and evicts data. Alluxio in turn reads and writes the underlying data storage (Azure Blob Store, Cosmos Stream, HDFS). The job scheduler (OpenPAI) triggers data loads into Alluxio.
Optimization: CSI-based deployment
[Diagram] Each pod declares customized mount options and a path in Alluxio; see Alluxio/alluxio-csi (github.com).
Optimization: CSI-based deployment
•Separate read/write mount options
• Enable metadata cache for the input data folder (the model is shared across tasks)
• Disable metadata cache and set the write type to THROUGH for the output folder
•Each pod uses a different mount point (each job has its own mount
point)
•Each job can mount a different path (which enables access control)
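As a rough sketch of the separate read/write mounts (mount points, Alluxio paths, and timeout values here are illustrative examples, not our exact production configuration):

```shell
# Illustrative alluxio-fuse mounts; the paths are hypothetical.

# Input folder: read-only data shared across tasks, so kernel
# metadata/data caching is enabled with long timeouts.
alluxio-fuse mount /mnt/job-123/input /jobs/job-123/input \
  -o kernel_cache -o attr_timeout=600 -o entry_timeout=600

# Output folder: bypass caching so every write goes straight through.
# The THROUGH write type itself is an Alluxio client property, e.g. in
# alluxio-site.properties:
#   alluxio.user.file.writetype.default=THROUGH
alluxio-fuse mount /mnt/job-123/output /jobs/job-123/output \
  -o direct_io
```

With THROUGH writes, output lands in Azure Blob as it is produced, which matters on preemptible workers.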
Optimization: Fuse client improvements
•Flush enhancement - avoids data loss after the job finishes (important for
inference jobs!)
• PR: Implement fuse flush function, Pull Request #13103,
Alluxio/alluxio (github.com), by Binyang2014
•Release enhancement - the release function is async, so a file may not be closed
even after we call the "close" function. That can leave the file in an incomplete
state.
• PR: Wait for file to be closed before unmounting fuse, Pull Request
#13114, Alluxio/alluxio (github.com), by Binyang2014
Prefetch (ongoing)
[Diagram] OpenPAI submits jobs and schedules nodes based on data paths. The Alluxio master directs the workers on Node-1 through Node-4 (each node has GPUs) to load blocks from storage and prefetch them, so the training job on each node reads locally cached blocks. When another job is submitted, its blocks are prefetched the same way.
Benefits
•Stream input/output, smoothing out IO requests
•Handle read retries automatically, decreasing the failure rate
•Speed up the inference job: with less IO stall, performance improves by
around 18%
Inference job without Alluxio: 1h 57min (with long stretches of low GPU usage); with Alluxio: 1h 34min
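The read-retry point can be illustrated with a generic retry-with-backoff wrapper, a sketch of what each job would otherwise implement itself (the `retry` helper and the example path are hypothetical, not Alluxio client code):

```shell
#!/bin/sh
# Generic retry-with-backoff wrapper (illustrative only; the Alluxio client
# performs the equivalent internally for reads, so jobs do not have to).
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # command succeeded
    fi
    sleep "$i"          # simple linear backoff before the next attempt
    i=$((i + 1))
  done
  return 1              # every attempt failed
}

# Example (hypothetical path): retry a read from the mounted input folder.
# retry 3 cp /mnt/job-123/input/data.bin /tmp/data.bin
```

Pushing this into the data layer keeps the retry logic out of every task and lets transient blob failures heal without failing the whole 2~4 hour task.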
Future work
•Add write retry to decrease failures caused by workers going down
•Adopt Alluxio for training jobs. Training jobs have a special data access
pattern (each epoch reads the same data exactly once) and are more
performance-sensitive.
References
•OpenPAI: microsoft/pai - resource scheduling and cluster management for AI (github.com)
•Hived: microsoft/hivedscheduler - Kubernetes scheduler for deep learning (github.com)
•Alluxio-CSI: Alluxio/alluxio-csi (github.com)
