Alluxio – Virtual Unified File System
Li Haoyuan – Founder and CEO at Alluxio
haoyuan@alluxio.com
The Global Datasphere will grow from
33 ZB in 2018 to 175 ZB by 2025
China’s Datasphere is expected to grow 30% on average
over the next 7 years &
will be the largest Datasphere of all regions by 2025
Source: IDC White Paper – #US44413318
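As a quick sanity check on the figures above, the global numbers (33 ZB in 2018, 175 ZB by 2025) imply a compound annual growth rate of roughly 27%, slightly below the 30% quoted for China. The helper name `implied_cagr` is only illustrative, and the calculation assumes smooth compounding over the 7 years, which the source does not state.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures from the slide: 33 ZB in 2018, 175 ZB by 2025.
global_rate = implied_cagr(33, 175, 2025 - 2018)
print(f"Implied global growth: {global_rate:.1%} per year")
```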
We are in the era where
Data is your biggest asset
Extracting maximum value from your data
The Data Ecosystem Evolution
Data Ecosystem - Beta
Data Ecosystem 1.0
(Diagrams: each ecosystem shown as COMPUTE and STORAGE blocks)
Data Ecosystem 1.0 – The Challenges
STORAGE
COMPUTE
Complex
Low performance
Expensive
3 big trends driving the need for a new architecture
Separation of Compute & Storage
Hybrid – Multi cloud environments
Self-service data across the enterprise
The Data Architecture for the Digital Future
Core requirements of 2.0 data ecosystem
Unified
Memory-first
Native APIs
Multi-hybrid cloud
A Virtual Unified File System
Virtual Unified File System
Java File API | HDFS Interface | S3 Interface | REST API | FUSE Interface
HDFS Driver | Swift Driver | S3 Driver | NFS Driver
Key Innovations of the Virtual Unified File System
Unified Namespace: Bring all files into a single interface
API Translation: Interact with data using any API
Intelligent Multi-tiering: Accelerate & tier data transparently
Unified Namespace: Global Data Accessibility
FUSE Interface makes all enterprise data available locally
SUPPORTS
• HDFS
• NFS
• OpenStack
• Ceph
• Amazon S3
• Azure
• Google Cloud
IT OPS FRIENDLY
• Storage mounted into Alluxio by central IT
• Security in Alluxio mirrors source data
• Authentication through LDAP/AD
• Wireline encryption
HDFS #1
Object Store
NFS
HDFS #2
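The unified namespace idea can be pictured as a mount table that maps virtual path prefixes to under-storage locations. The following is a toy sketch only, not Alluxio's implementation: the `MountTable` class, its methods, and the example URIs are all hypothetical names chosen for illustration.

```python
class MountTable:
    """Toy mount table: maps virtual path prefixes to under-storage URIs."""

    def __init__(self):
        self._mounts = {}  # virtual prefix (no trailing slash) -> under-store URI

    def mount(self, virtual_prefix: str, under_store_uri: str) -> None:
        # Normalise so "/hdfs1/" and "/hdfs1" are the same mount point.
        self._mounts[virtual_prefix.rstrip("/")] = under_store_uri.rstrip("/")

    def resolve(self, virtual_path: str) -> str:
        """Translate a virtual path using the longest matching mount prefix."""
        best = None
        for prefix in self._mounts:
            if virtual_path == prefix or virtual_path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount covers {virtual_path}")
        return self._mounts[best] + virtual_path[len(best):]


# Hypothetical mounts mirroring the diagram labels (HDFS #1, Object Store, NFS).
table = MountTable()
table.mount("/hdfs1", "hdfs://namenode1:8020/data")
table.mount("/objects", "s3://example-bucket")
table.mount("/nfs", "file:///mnt/nfs")
print(table.resolve("/hdfs1/sales/2019.csv"))
print(table.resolve("/objects/logs/app.log"))
```

Applications only ever see the virtual paths, so central IT can change what is mounted where without touching client code.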
Server-side API Translation: From legacy to modern
Convert from Client-side Interface to native Storage Interface
Java File API | HDFS Interface | S3 Interface | REST API | FUSE Interface
HDFS Driver | Swift Driver | S3 Driver | NFS Driver
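One way to read the interface-to-driver picture above: every client-facing interface is translated into the same small set of driver operations, so any interface can reach any storage system. The sketch below is a simplified illustration under that assumption; `StorageDriver`, `DictObjectStoreDriver`, and `TranslationLayer` are hypothetical names, and the in-memory dictionary merely stands in for a real storage backend.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Minimal driver contract every backend implements (hypothetical)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class DictObjectStoreDriver(StorageDriver):
    """Stand-in for an object-store driver: flat keys held in memory."""

    def __init__(self):
        self._objects = {}

    def read(self, path: str) -> bytes:
        return self._objects[path.lstrip("/")]

    def write(self, path: str, data: bytes) -> None:
        self._objects[path.lstrip("/")] = data


class TranslationLayer:
    """Exposes two client-side styles that both map onto one driver."""

    def __init__(self, driver: StorageDriver):
        self._driver = driver

    # File-style front end.
    def read_file(self, path: str) -> bytes:
        return self._driver.read(path)

    def write_file(self, path: str, data: bytes) -> None:
        self._driver.write(path, data)

    # Object-style front end: bucket and key become a path.
    def get_object(self, bucket: str, key: str) -> bytes:
        return self._driver.read(f"/{bucket}/{key}")

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._driver.write(f"/{bucket}/{key}", body)


layer = TranslationLayer(DictObjectStoreDriver())
layer.put_object("sales", "2019/q1.csv", b"region,total\n")
print(layer.read_file("/sales/2019/q1.csv"))  # same bytes, read via the file-style API
```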
Intelligent Multi-tiering: Get high-value data faster
Local performance from remote data using multi-tier storage
Hot: RAM
Warm: SSD
Cold: HDD
Read & Write Buffering
Transparent to App
Policies for pinning, promotion/demotion, TTL
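The tiering policies named above (pinning, promotion/demotion, TTL) can be illustrated with a small in-memory model. This is a toy sketch under simplifying assumptions, not Alluxio's actual policy engine: capacities are counted in blocks rather than bytes, demotion is least-recently-used, a read always promotes to the hottest tier, and the `TieredCache` class and its method names are hypothetical.

```python
import time
from collections import OrderedDict

TIERS = ["RAM", "SSD", "HDD"]  # hot, warm, cold


class TieredCache:
    def __init__(self, capacities, clock=time.monotonic):
        self._cap = capacities  # max blocks per tier, e.g. {"RAM": 1, "SSD": 2, "HDD": 4}
        self._tiers = {t: OrderedDict() for t in TIERS}  # key -> (value, expires_at)
        self._pinned = set()
        self._clock = clock

    def pin(self, key):
        """Pinned keys are never chosen for demotion."""
        self._pinned.add(key)

    def tier_of(self, key):
        return next((t for t in TIERS if key in self._tiers[t]), None)

    def put(self, key, value, ttl=None):
        expires = None if ttl is None else self._clock() + ttl
        for t in TIERS:  # drop any stale copy first
            self._tiers[t].pop(key, None)
        self._insert(0, key, (value, expires))

    def get(self, key):
        tier = self.tier_of(key)
        if tier is None:
            return None
        value, expires = self._tiers[tier].pop(key)
        if expires is not None and self._clock() >= expires:
            return None  # TTL elapsed: the entry is dropped
        self._insert(0, key, (value, expires))  # promotion to the hottest tier
        return value

    def _insert(self, level, key, entry):
        if level >= len(TIERS):
            return  # fell off the coldest tier: evicted
        tier = self._tiers[TIERS[level]]
        tier[key] = entry  # newest entries sit at the end
        while len(tier) > self._cap[TIERS[level]]:
            victim = next((k for k in tier if k not in self._pinned), None)
            if victim is None:
                break  # everything here is pinned; this toy allows overflow
            self._insert(level + 1, victim, tier.pop(victim))  # demotion


cache = TieredCache({"RAM": 1, "SSD": 1, "HDD": 1})
for name in ("a", "b", "c"):
    cache.put(name, name.upper())
print({k: cache.tier_of(k) for k in "abc"})  # oldest block ends up coldest
cache.get("a")
print({k: cache.tier_of(k) for k in "abc"})  # reading "a" promotes it back to RAM
```

The application only calls `put` and `get`; which tier holds a block is decided entirely inside the cache, which is the sense in which tiering is transparent to the app.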
Real world Use cases
Popular Technical Use Cases
Virtual Data Lake
• Accelerate batch, micro-batch & streaming jobs
• Slowly transition to lower cost object stores
• Run in hybrid cloud environment with compute in the cloud
Machine Learning Productivity
• Accelerate ML jobs running on object stores or file systems
• Provide consistent performance to data scientists
Self-service data across hybrid cloud
• Provide unified interface to access all data
• Accelerate & tier data transparently across storage tiers
• Co-locate remote data with compute for performance
100+ Known Production Deployments
Massive clusters deployed, many with 500+ nodes
Financial Services Case Study
Machine Learning Use Case
Challenge –
Gain end to end view of business with large volume of data
Queries were slow / not interactive, resulting in operational inefficiency
Solution –
ETL Data from Teradata to Alluxio
Impact –
Faster Time to Market – “Now we don’t have to work Sundays”
(Diagram: two SPARK / TERADATA stacks)
Retail Case Study
Customer Analytics Use Case
Challenge –
Bottleneck in Trend Analysis of mission critical daily sales and inventory management
Queries were slow / not interactive, resulting in operational inefficiency
Solution –
With Alluxio, data queries are 10X faster
Impact –
Higher operational efficiency
(Diagram: two SPARK / HDFS stacks)
Telecom Case Study
Customer 360 Insights
Challenge –
Desired a central view of consumer information in near real time for proactive support.
Many HDFS, different distributions, many incompatible versions. On-prem & cloud. Integration through heavy ETL.
Solution –
Alluxio integrates data into central catalog for fast access to consumer interaction records.
Impact –
Reduced integration time
Faster data speed & freshness
(Diagram: HADOOP and ML clusters on HDFS linked by ETL; HDP, CDH, and MAPR distributions of HDFS)
Machine Learning / Deep Learning – Maximizes GPU investment:
• Self-serve data access for data scientists
• Rapid integration of new data sources
• Improved memory management & performance
Incredible Open Source Momentum with growing community
920+ contributors & growing
3760+ Git Stars
Apache 2.0 Licensed
Hundreds of thousands of downloads
Download Alluxio today @ www.alluxio.org
Thank You
Join the Alluxio Community
www.alluxio.org | www.alluxio.com | Twitter: @alluxio
