What’s New in Alluxio 2.3
7/14 Office Hour - Bin Fan, Calvin Jia
Please join https://alluxio.io/slack for Q&A
Alluxio 2.3 Focus Areas
● Hybrid Cloud
○ Significant number of new use cases
○ Large performance gains and/or cost savings
● Structured Data Services (SDS)
○ Presto continues to be a popular compute engine
○ More interest from cloud and hybrid use cases
● System Scalability & Operability
○ Large-scale deployments exposed limits in multi-tier storage management
○ Critical deployments require regular journal backups
Hybrid Cloud Features
● One Command Deployment on Amazon EMR and Dataproc
○ Try the demo hybrid use case - AWS or GCP
○ Use Terraform modules to customize your own cloud deployments
● Validate Env
○ Set of tools for connecting to remote storage and validating it works
○ New GUI for mounting remote storage which runs several diagnostic checks
○ New CLI commands for directly running the checks
■ runHdfsMountTests - checks HDFS connectivity and settings
■ runUfsIOTest - checks UFS I/O capabilities
■ runHmsTests - checks Hive Metastore connectivity and settings
Hybrid Cloud Features
● Concurrent Metadata Synchronization
○ Enable stricter metadata synchronization settings without performance loss
○ Improved synchronization algorithm to enable multiple paths in the same subtree to be
synced at once
○ Order of magnitude improvement, especially for larger namespaces that can utilize the full
parallelism
○ Improved performance of INotify based active sync, an important feature in hybrid
deployments
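
As a rough illustration of syncing multiple paths in the same subtree at once, the sketch below walks a namespace level by level and hands sibling paths to a thread pool. It is not Alluxio's implementation: `sync_subtree`, `list_children`, `sync_one`, and the in-memory `TREE` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory namespace standing in for an under-storage listing.
TREE = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}


def list_children(path):
    """Return the child paths of `path` (empty for leaves)."""
    return TREE.get(path, [])


def sync_one(path):
    """Placeholder for syncing one path's metadata; returns the path synced."""
    return path


def sync_subtree(root, max_workers=4):
    """Sync every path under `root`, handling paths on the same level in parallel."""
    synced = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        frontier = [root]
        while frontier:
            # All paths on the current level are synced concurrently.
            synced.extend(pool.map(sync_one, frontier))
            # Collect the next level of the subtree, also in parallel.
            frontier = [
                child
                for children in pool.map(list_children, frontier)
                for child in children
            ]
    return synced


result = sync_subtree("/")
```

The wider the subtree, the more paths each level offers to the pool, which is consistent with the slide's note that larger namespaces benefit most from the parallelism.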
SDS Features
● Support for Amazon Glue UDB
○ Use Amazon Glue to store table metadata instead of HMS
○ Many users in cloud requested Glue support
○ Will continue to provide compatibility in the cloud
● Full Transformation Support for ORC format
○ Benefit from transformation capabilities in SDS with ORC
○ Seeing a number of users on ORC and Parquet
○ ORC vs Parquet is mostly based on the compute framework and available optimizations
System Scalability
● Tiered Storage V2
○ Use multi-tiered storage in read-write heavy workloads without performance loss
○ Eviction becomes an O(1) operation, ensuring the write and cache paths are not impacted
○ Evictors are now termed Annotators because they simply pre-mark the blocks to evict
○ Trade-off: block placement is not 100% accurate when using multiple tiers
○ Background tasks are system resource aware and will operate only during quiet periods
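
The idea of pre-marking blocks so that choosing an eviction candidate is O(1) can be sketched as below. This is a minimal illustration assuming an LRU ordering, not Alluxio's actual Annotator code; the class `LRUAnnotator` and its methods are hypothetical names.

```python
from collections import OrderedDict


class LRUAnnotator:
    """Keeps blocks ordered by last access so the next victim is always known."""

    def __init__(self):
        # block_id -> size in bytes; least recently used block comes first.
        self._order = OrderedDict()

    def on_access(self, block_id, size):
        """Record an access: O(1) update of the block's position in the order."""
        self._order[block_id] = size
        self._order.move_to_end(block_id)

    def next_to_evict(self):
        """Pop the pre-marked eviction candidate in O(1); None if no blocks."""
        if not self._order:
            return None
        block_id, _size = self._order.popitem(last=False)
        return block_id


annotator = LRUAnnotator()
annotator.on_access("b1", 64)
annotator.on_access("b2", 64)
annotator.on_access("b1", 64)  # b1 becomes the most recently used block
victim = annotator.next_to_evict()
```

Because the ordering is maintained on every access, the write or cache path never has to scan blocks to find a victim; it only pops the head of the order.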
System Operability
● Delegated Journal Backups for HA Deployments
○ Backups no longer impact master response latency
○ Instead of using the primary master to back up, secondaries can run backups
○ Very useful for deployments running regular backups, since it avoids the performance impact
○ Window for performance hit is reduced to several seconds instead of minutes
Case Studies
Improving Presto Latency at Facebook
With Alluxio data caching, for one of
the Facebook internal Presto use
cases:
- query latencies improved by 33%
(P50), and 48% (P95).
- 57% improvement in IO for
remote data source scans.
https://prestodb.io/blog/2020/06/16/alluxio-datacaching
Presto + Alluxio + Facebook HDFS
Memory-speed I/O for Model Training in K8s
The Alibaba Cloud K8s team is leveraging
the Alluxio POSIX interface to serve deep
learning training in a Kubernetes
environment.
- 40% reduction in training time &
cost
https://www.alluxio.io/blog/efficient-model-training-in-the-cloud-with-kubernetes-tensorflow-and-alluxio/
Tensorflow + Alluxio + OSS + K8s
Data Proxy for Hybrid Cloud
WeRide, a self-driving-car company,
deployed Alluxio as a local cache
in each office, on
top of a centralized S3 data lake
- Saving $5 per issue ticket per
engineer
https://www.alluxio.io/blog/building-a-cross-region-hybrid-cloud-storage-gateway-for-engineers-at-weride/
DS tools + Alluxio + S3 (remote)
