Scalable Filesystem Metadata Services
06/19 Alluxio Meetup
Featuring gRPC, Raft, and RocksDB
About Me
Calvin Jia
● Founding Engineer @ Alluxio
● Contributor since Tachyon 0.4 (2012)
● Release Manager for Alluxio 2.0.0
Alluxio Overview
• Open source data orchestration
• Commonly used for data analytics such as OLAP on Hadoop
• Deployed at Huya, Two Sigma, Tencent, and many others
• Largest deployments of over 1000 nodes
Agenda
1. Architecture
2. Challenges
3. Solutions
Architecture
Alluxio Architecture
Alluxio Master
• Responsible for storing and serving metadata in Alluxio
• Alluxio Metadata consists of files and blocks
• Main data structure is the Filesystem Tree
• The namespace for files in Alluxio
• Can include mounts of other file system namespaces
• The size of the tree can be very large!
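The filesystem tree and its mounts can be pictured with a minimal sketch. It assumes a hypothetical `Inode` class and a hypothetical mount table that maps an Alluxio path prefix to an external namespace URI; none of these names or structures are taken from the Alluxio code base, and the `s3://bucket/data` URI in the usage note is made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inode:
    """Hypothetical tree node: a directory holds children, a file holds block IDs."""
    name: str
    is_dir: bool = True
    children: Dict[str, "Inode"] = field(default_factory=dict)
    block_ids: List[int] = field(default_factory=list)


class Namespace:
    """Hypothetical namespace: an inode tree plus a mount table."""

    def __init__(self) -> None:
        self.root = Inode("")
        # Assumed layout: Alluxio path prefix -> external namespace URI.
        self.mounts: Dict[str, str] = {}

    def create(self, path: str, is_dir: bool = False) -> Inode:
        """Create the inode at `path`, creating missing parent directories."""
        parts = [p for p in path.split("/") if p]
        node = self.root
        for i, part in enumerate(parts):
            last = i == len(parts) - 1
            if part not in node.children:
                node.children[part] = Inode(part, is_dir=is_dir if last else True)
            node = node.children[part]
        return node

    def get_status(self, path: str) -> Optional[Inode]:
        """Walk the tree one path component at a time; None if the path is absent."""
        node = self.root
        for part in (p for p in path.split("/") if p):
            node = node.children.get(part)
            if node is None:
                return None
        return node

    def mount(self, alluxio_path: str, external_uri: str) -> None:
        """Attach an external namespace under a directory of the tree."""
        self.create(alluxio_path, is_dir=True)
        self.mounts[alluxio_path.rstrip("/")] = external_uri

    def resolve_mount(self, path: str) -> Optional[str]:
        """Translate `path` to an external URI using the longest matching mount prefix."""
        best = None
        for prefix in self.mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            return None
        return self.mounts[best] + path[len(best):]
```

For example, after `ns.mount("/s3", "s3://bucket/data")`, the path `/s3/logs/a.txt` resolves to `s3://bucket/data/logs/a.txt`, while a path outside any mount resolves to `None`.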
Challenges
Metadata Storage Challenges
• Storing the raw metadata becomes a problem with a large number
of files
• On average, each file takes 1KB of on-heap storage
• 1 billion files would take 1 TB of heap space!
• A typical JVM runs with < 64GB of heap space
• GC becomes a big problem when using larger heaps
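The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 KB = 1,000 bytes) and, optimistically, that the whole heap is available for metadata; the variable names are illustrative only.

```python
BYTES_PER_FILE = 1_000         # ~1 KB of on-heap metadata per file (figure from the slide)
NUM_FILES = 1_000_000_000      # 1 billion files
HEAP_LIMIT_BYTES = 64 * 10**9  # 64 GB, the typical JVM heap ceiling cited above

# Total heap needed to hold every file's metadata in memory.
total_bytes = BYTES_PER_FILE * NUM_FILES

# How many files fit in a single 64 GB heap, and how many such heaps
# the full namespace would need.
files_per_heap = HEAP_LIMIT_BYTES // BYTES_PER_FILE
heaps_needed = total_bytes / HEAP_LIMIT_BYTES

print(f"total metadata: {total_bytes / 10**12:.1f} TB")
print(f"files per 64 GB heap: {files_per_heap:,}")
print(f"64 GB heaps needed: {heaps_needed}")
```

Under these assumptions a single 64 GB heap holds about 64 million files, so 1 billion files would need more than 15 such heaps.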
Metadata Storage Challenges
• Durability for the metadata is important
• Need to restore state after planned or unplanned restarts or machine loss
• The speed at which the system can recover determines the amount
of downtime suffered
• Restoring a 1TB sized snapshot takes a nontrivial amount of time!
Metadata Serving Challenges
• File operations (e.g., getStatus, create) need to be fast
• On-heap data structures excel in this case
• Operations need to be optimized for high concurrency
• Generally many readers and few writers
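A many-readers, few-writers workload is the classic case for a readers-writer lock. The sketch below is a minimal illustration, not Alluxio's locking scheme: `ReadWriteLock` and `MetadataStore` are hypothetical names, the lock is hand-built on `threading.Condition` because the Python standard library has no readers-writer lock, and it makes no attempt to prevent writer starvation.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Hypothetical readers-writer lock: many concurrent readers or one writer."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def read(self):
        # Readers only wait while a writer holds the lock.
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        # A writer waits until there is no other writer and no active reader.
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class MetadataStore:
    """Hypothetical store: getStatus takes the read lock, create takes the write lock."""

    def __init__(self) -> None:
        self._lock = ReadWriteLock()
        self._status = {}

    def create(self, path: str, size: int = 0) -> None:
        with self._lock.write():
            if path in self._status:
                raise FileExistsError(path)
            self._status[path] = {"path": path, "size": size}

    def get_status(self, path: str):
        with self._lock.read():
            return self._status.get(path)
```

Any number of `get_status` calls can proceed at the same time, while `create` briefly excludes everyone else. A production design would likely lock at a finer granularity than the whole store.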
Metadata Serving Challenges
• The metadata service also needs to sustain high load
• A cluster of 100 machines can easily house over 5k concurrent clients!
• Connection life cycles need to be managed well
• Connection handshake is expensive
• Holding an idle connection is also detrimental
Solutions
RocksDB
• https://rocksdb.org/
• RocksDB is an embeddable
persistent key-value store for
fast storage
Tiered Metadata Storage
• Uses an embedded RocksDB to store inode tree
• Solves the storage heap space problem
• Developed new data structures to optimize for storage in RocksDB
• Internal cache used to mitigate the latency of on-disk RocksDB access
• Solves the serving latency problem
• Performance is comparable to previous on-heap implementation
• [In-Progress] Use tiered recovery to incrementally make the
namespace available on cold start
• Solves the recovery problem
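The tiering idea (a bounded on-heap cache in front of an on-disk store) can be sketched as follows. This is an illustration under stated assumptions, not Alluxio's implementation: `TieredInodeStore` is a hypothetical name, a plain dict stands in for the embedded RocksDB instance, and the write-through, LRU-eviction policy is chosen for simplicity because the slides do not say which policy Alluxio uses.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredInodeStore:
    """Hypothetical two-tier store: a bounded LRU cache over a backing store."""

    def __init__(self, cache_capacity: int) -> None:
        self.capacity = cache_capacity
        self.cache: "OrderedDict[int, dict]" = OrderedDict()  # on-heap tier
        self.backing: Dict[int, dict] = {}  # stand-in for embedded RocksDB
        self.backing_reads = 0  # counts how often the slow tier is consulted

    def put(self, inode_id: int, inode: dict) -> None:
        # Write-through (an assumption): the backing store always has every inode,
        # so evicting from the cache never loses data.
        self.backing[inode_id] = inode
        self._cache_put(inode_id, inode)

    def get(self, inode_id: int) -> Optional[dict]:
        if inode_id in self.cache:
            # Cache hit: refresh recency and serve from the heap.
            self.cache.move_to_end(inode_id)
            return self.cache[inode_id]
        # Cache miss: fall back to the slow tier and populate the cache.
        self.backing_reads += 1
        inode = self.backing.get(inode_id)
        if inode is not None:
            self._cache_put(inode_id, inode)
        return inode

    def _cache_put(self, inode_id: int, inode: dict) -> None:
        self.cache[inode_id] = inode
        self.cache.move_to_end(inode_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```

Heap usage is bounded by `cache_capacity` regardless of how many inodes exist, and repeated reads of hot inodes never touch the backing store.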
Tiered Metadata Storage
Alluxio Master
Local Disk
RocksDB (Embedded)
● Inode Tree
● Block Map
● Worker Block Locations
On Heap
● Inode Cache
● Mount Table
● Locks
Raft
• https://raft.github.io/
• Raft is a consensus algorithm that is designed to be easy to understand.
It's equivalent to Paxos in fault-tolerance and performance.
• Implemented by https://github.com/atomix/copycat
Built-in Fault Tolerance
• Alluxio Masters are run as a quorum for journal fault tolerance
• Metadata can be recovered, solving the durability problem
• This was previously done using external fault-tolerant storage
• Alluxio Masters leverage the same quorum to elect a leader
• Enables hot standbys for rapid recovery in case of single node failure
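The quorum arithmetic that underlies this can be sketched briefly. In Raft, a journal entry is committed once a majority of the cluster has stored it, and a leader likewise needs votes from a majority. The function names below are illustrative and do not come from Alluxio or Copycat.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Masters that can fail while a majority is still reachable."""
    return num_masters - quorum_size(num_masters)


def is_committed(acks: int, num_masters: int) -> bool:
    """An entry is committed once a majority (leader included) has stored it."""
    return acks >= quorum_size(num_masters)


for n in (3, 4, 5):
    print(n, "masters: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

Three masters tolerate one failure and five tolerate two. Four masters still tolerate only one, which is why odd cluster sizes are the usual choice.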
gRPC
• https://grpc.io/
• gRPC is a modern open source
high performance RPC
framework that can run in any
environment
• Works well with Protobuf for
serialization
gRPC Transport Layer
• Connection multiplexing to reduce the number of connections from
# of application threads to # of applications
• Solves the connection life cycle management problem
• Threading model enables the master to serve concurrent requests at
scale
• Solves the high load problem
• High metadata throughput needs to be matched with efficient IO
• Consolidated Thrift (Metadata) and Netty (IO)
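The effect of connection multiplexing on the master's connection count can be estimated with a back-of-the-envelope sketch. The figures are hypothetical, chosen only to line up with the "100 machines, 5k concurrent clients" example earlier in the deck: one application per machine, 50 threads per application.

```python
def connections_without_multiplexing(num_apps: int, threads_per_app: int) -> int:
    """One connection per application thread."""
    return num_apps * threads_per_app


def connections_with_multiplexing(num_apps: int) -> int:
    """One shared connection per application; threads share it as concurrent streams."""
    return num_apps


NUM_APPS = 100        # hypothetical: one application per machine
THREADS_PER_APP = 50  # hypothetical: gives 5,000 concurrent client threads

before = connections_without_multiplexing(NUM_APPS, THREADS_PER_APP)
after = connections_with_multiplexing(NUM_APPS)
print(f"per-thread connections: {before}, multiplexed connections: {after}")
```

With these assumed numbers the master holds 100 connections instead of 5,000, so far fewer handshakes and idle connections need to be managed.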
gRPC Transport Layer
[Diagram: before, the Alluxio Client, Alluxio Master, and Alluxio Workers communicate over Thrift (Metadata) and Netty (IO); after, they communicate over gRPC (Metadata + IO).]
Questions?
Alluxio Website - https://www.alluxio.io
Alluxio Community Slack Channel - https://www.alluxio.io/slack
Alluxio Office Hours & Webinars - https://www.alluxio.io/events
