Deploying Alluxio in the Cloud for
Machine Learning
Lu Qiu @ Alluxio
06/24/2021
About Me – Lu Qiu
● Software Engineer @ Alluxio
● Email: lu@alluxio.com
● Master Data Science @ GWU
● Areas: Alluxio fault-tolerant system, journal system, metrics system, POSIX API, and Alluxio integration with Cloud
Agenda
● What is Alluxio POSIX API
● Deploy Alluxio in K8S for ML workloads
● Updates in Alluxio POSIX Developments
What is Alluxio POSIX API
Apps Connecting to Alluxio via POSIX API
Accessing Remote/Distributed Data as
Local Directories
[Figure: under storage systems shown as local directories; labels: HDFS #1, Obj Store, NFS, HDFS #2]
Connecting to
• HDFS
• Amazon S3
• Azure
• Google Cloud
• Ceph
• NFS
• Many more
Distributed Caching w/ Unified Namespace
[Figure: Alluxio servers and model training jobs; labels: A, B, C, /path1/file1, /path2/file2]
Deploy Alluxio in K8s for ML
Workloads
Use Alluxio POSIX API in ML/AI Training
[Figure: Alluxio servers and GPU instances]
Train in Cloud Kubernetes with Alluxio
- Provision the whole Kubernetes cluster on cloud
- Deploy Alluxio services via Docker and Kubernetes
- Launch distributed training job via Kubeflow and Arena in one command
- Accelerate training speed by caching data locally
- Focus on the training logic instead of worrying about data preparation (no need to modify the training script; use remote data like local data)
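As a rough illustration of "use remote data like local data", the sketch below counts the files a training job would see, using only ordinary shell tools. The mount path `/mnt/alluxio-fuse/training-data` and the helper name `count_samples` are hypothetical and not taken from the talk; substitute the directory where the Alluxio POSIX API is mounted in your environment.

```shell
#!/usr/bin/env bash

# Count the regular files under a dataset directory.
# Works on any directory, including one served through a FUSE mount.
count_samples() {
  local data_dir="$1"
  find "$data_dir" -type f | wc -l | tr -d ' '
}

# Hypothetical mount point (assumption); override with DATA_DIR=... if needed.
DATA_DIR="${DATA_DIR:-/mnt/alluxio-fuse/training-data}"

if [ -d "$DATA_DIR" ]; then
  echo "samples under $DATA_DIR: $(count_samples "$DATA_DIR")"
else
  echo "$DATA_DIR is not mounted on this machine"
fi
```

A training script would be pointed at the same directory, with no Alluxio-specific client code.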
Deployment Variant - Kubectl
- The default way of launching an Alluxio cluster on Kubernetes
- Use in testing environments, especially when your deployment is simple
- If you have special requirements that the other deployment methods cannot fulfill, kubectl is your choice.
$ kubectl create -f alluxio-configmap.yaml
$ kubectl create -f ./master/
$ kubectl create -f ./worker/
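The three commands above can be wrapped in a small helper that keeps their order and then waits for the pods. This is a sketch only: the function name `deploy_alluxio` and the 300-second timeout are illustrative choices, and it assumes the spec files named on the slide sit in the current directory.

```shell
#!/usr/bin/env bash

# Create the Alluxio resources in order, then wait for the pods.
# Stops at the first failing step.
deploy_alluxio() {
  # Create the ConfigMap first so it exists before the master and worker pods start.
  kubectl create -f alluxio-configmap.yaml || return 1
  kubectl create -f ./master/ || return 1
  kubectl create -f ./worker/ || return 1
  # Wait until every pod in the current namespace reports Ready.
  # The timeout value is an arbitrary choice for this sketch.
  kubectl wait --for=condition=Ready pod --all --timeout=300s
}
```

Calling `deploy_alluxio` from the directory that holds the specs runs the whole sequence and returns a non-zero status if any step fails.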
Deployment Variant - Helm
- Package Manager for Kubernetes, preferred over Kubectl
- Use Alluxio shared Helm charts, provide your desired configuration, and launch
the Alluxio services in one command.
- Define a common template for Alluxio services and deploy the same/similar
applications across different environments without copy-pasting.
$ helm repo add alluxio-charts https://alluxio-charts.storage.googleapis.com/openSource/2.6.0
$ helm install alluxio -f config.yaml alluxio-charts/alluxio
Deploy Alluxio on Kubernetes
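The `config.yaml` passed to `helm install` holds the desired configuration. A minimal sketch of generating one is shown below. It assumes the chart accepts a `properties` map of Alluxio settings and a `fuse.enabled` switch; both should be checked against the `values.yaml` of the chart version in use. The bucket address is a placeholder.

```shell
#!/usr/bin/env bash

# Placeholder under-storage address (assumption); replace with your own.
UFS_ADDRESS="${UFS_ADDRESS:-s3://my-bucket/training-data}"
CONFIG_FILE="${CONFIG_FILE:-config.yaml}"

# Write a minimal Helm values file.
# The key names are assumed chart values; verify them for your chart version.
cat > "$CONFIG_FILE" <<EOF
properties:
  alluxio.master.mount.table.root.ufs: "${UFS_ADDRESS}"
fuse:
  enabled: true
EOF

echo "wrote $CONFIG_FILE"
```

Keeping environment-specific values such as the under-storage address in this one file is what lets the same chart be reused across environments.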
Deployment Variant - Alluxio CSI
Container Storage Interface
- Expose Alluxio to container applications (e.g. Docker, Kubernetes) as persistent storage.
- Support customizing storage features without integrating the Alluxio driver into Kubernetes itself or waiting for Kubernetes releases.
Alluxio CSI
- Mount point level access control
- Separate mount points for different users or use cases
- Access control via POSIX filesystem user group
- Different configuration for different mount points
- Separate read/write path
- Enable metadata cache for input data folder
- Disable metadata cache and set write option to Through for output folder
[Figure callouts: Customized mount option; Path in Alluxio]
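The separate read/write configuration can be sketched as two sets of Alluxio client properties, one per mount. The helper name `mount_java_opts` and the `input`/`output` roles are hypothetical. The property keys `alluxio.user.metadata.cache.enabled` and `alluxio.user.file.writetype.default` are Alluxio client properties. How the resulting options are handed to a given mount depends on the CSI driver version and is not shown here.

```shell
#!/usr/bin/env bash

# Print the Alluxio client properties for a mount role as -D JVM options.
# Usage: mount_java_opts input|output
mount_java_opts() {
  case "$1" in
    input)
      # Read path: cache metadata on the client for the input data folder.
      echo "-Dalluxio.user.metadata.cache.enabled=true"
      ;;
    output)
      # Write path: no metadata cache, and write through to the under storage.
      echo "-Dalluxio.user.metadata.cache.enabled=false -Dalluxio.user.file.writetype.default=THROUGH"
      ;;
    *)
      echo "unknown mount role: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `mount_java_opts output` prints the options for an output folder, and an unrecognized role returns a non-zero status.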
Alluxio CSI
Reference
- Microsoft: Speed up large-scale ML/DL offline inference job with Alluxio
- Alluxio/alluxio-csi (github.com)
- Alluxio CSI integration and improvements (#13435)
Special thanks to Binyang from Microsoft and Baolong from Tencent
Deployment Variant - Fluid
Fluid is an open source Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications, such as big data and AI applications.
Apart from accelerating training, Fluid also supports
- Dataset-level access control for data scientists
- Automatic launching and scaling of the internal Alluxio services, so users do not need to worry about the deployment pain points of Alluxio services.
Fluid
Reference
- Fluid: When Alluxio Meets Kubernetes
- Speeding up Atlas deep learning platform with Alluxio + Fluid
- cheyang/fluid (github.com)
Special thanks to Cheyang from Alibaba
Updates in Alluxio POSIX
Developments
Community Collaboration
Community-driven collaboration
- Contributors from NJU, Alibaba, Tencent, Microsoft, Alluxio
Already in use by Microsoft, Analytics Aspects, BOSS
JNI-Fuse enhancements
(available in 2.6.0)
- Local RPC elimination by integrating Fuse into Worker process
- Modularized JNI-Fuse library (github.com/maobaolong/jnifuse)
- Implement Fuse flush() operation in write path (#13103)
- Implement Fuse utimens() operation to support the Linux touch command (#13218)
- Implement Fuse symlink() operation to avoid execution errors (#13429)
- Support open file for overwrite (#13236)
- Fix Fuse write then read problem in Fuse async release() operation (#13160)
- Add metrics for Fuse operations (#13201)
JNI-Fuse enhancements
(ongoing projects)
- Alluxio CSI improvements (#13435)
- Improve data loading speed by dynamically allocating loading jobs (#13485)
- Support updating configuration during runtime to better support Alluxio on
Kubernetes (#13643)
- Support libfuse 3 (#12758)
- Remote RPC optimization
Join Alluxio weekly community sync to create solutions together!
We Are Hiring! (contact careers@alluxio.com)
Join the Alluxio Community today
www.alluxio.io/slack | @alluxio