DATA ORCHESTRATION SUMMIT
2020
Alluxio Architecture and Scaling Performance
Gene Pang | Founding Engineer & Head Architect @ Alluxio, Inc.

Outline
▪ Alluxio high-level architecture
▪ Scaling performance for large deployments
Alluxio Architecture
High-Level Architecture and Components

Alluxio Overview
Data Orchestration for the Cloud
Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
Drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver

Alluxio Processes
[Diagram: an Alluxio Master and a Job Master, plus three groups, each with a Worker, a Job Worker, a Fuse Process, and a Proxy]

Alluxio Processes
[Diagram: Alluxio Master highlighted]
• Manages filesystem namespace
• Handles authentication and authorization
• Manages structured data catalog
• Maintains Alluxio worker membership

Alluxio Processes
[Diagram: Workers highlighted]
• Manages file/block data cache
• Writes and reads data to and from UFS
• Interacts with clients for data transfer

Alluxio Processes
[Diagram: Job Master highlighted]
• Jobs are asynchronous and distributed tasks (load, persist, policy-based movements)
• Manages execution and state of jobs
• Maintains Job worker membership
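
The job master's role of managing the execution and state of jobs can be pictured with a small Python sketch. This is a toy model under stated assumptions, not Alluxio's implementation: the `JobMaster` class, the `Status` values, and the round-robin assignment are all hypothetical names and choices made for illustration.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names for this sketch.
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class JobMaster:
    """Toy job master: splits a job into per-worker tasks and tracks job state."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.jobs = {}

    def submit(self, job_id, paths):
        # One task per path, assigned to job workers round-robin.
        tasks = {p: self.workers[i % len(self.workers)] for i, p in enumerate(paths)}
        self.jobs[job_id] = {
            "tasks": tasks,
            "task_status": {p: Status.CREATED for p in paths},
        }
        return tasks

    def report(self, job_id, path, status):
        # Called when a job worker reports the status of one task.
        self.jobs[job_id]["task_status"][path] = status

    def job_status(self, job_id):
        # The job's state is derived from the states of its tasks.
        statuses = set(self.jobs[job_id]["task_status"].values())
        if Status.FAILED in statuses:
            return Status.FAILED
        if statuses == {Status.COMPLETED}:
            return Status.COMPLETED
        if statuses == {Status.CREATED}:
            return Status.CREATED
        return Status.RUNNING


master = JobMaster(["job-worker-1", "job-worker-2"])
tasks = master.submit("load-1", ["/a", "/b", "/c"])
master.report("load-1", "/a", Status.COMPLETED)
state = master.job_status("load-1")
```

The caller submits the job and returns; progress is observed later through `job_status`, which is what makes the job asynchronous from the caller's point of view.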

Alluxio Processes
[Diagram: Job Workers highlighted]
• Executes distributed tasks for jobs
• Uses the Alluxio client to interact with Alluxio data

Alluxio Processes
[Diagram: Fuse Processes highlighted]
• Enables Alluxio Fuse (Filesystem in Userspace)
• Uses Alluxio client to interact with other components
• Enables mounting Alluxio namespace to local filesystem
• Any machine with this process can interact with Alluxio

Alluxio Processes
[Diagram: Proxies highlighted]
• Exposes REST and S3-compatible endpoints
• Uses Alluxio client to interact with other components
Scaling Alluxio Performance
Improving Alluxio for Large Scale Deployments

Large Scale Alluxio Deployments
Large # of Files, Large # of Users

To stay fast, avoid slow pauses!

Sources of Slow Pauses
▪ External Storage Systems (UFS)
  • HDFS
  • Object stores
  • Cloud blob stores
▪ External Catalogs (UDB)
  • Hive Metastore
  • AWS Glue
▪ Local Disks
  • SSDs
  • HDDs

Metadata Sync Lock Contention (Before)
[Diagram: Thread-1 through Thread-4 and the UFS]
• Metadata sync is single-threaded
• Requires write lock for entire sync
• Blocks other users/threads

Metadata Sync Lock Contention (After)
[Diagram: Thread-1 through Thread-4 and the UFS]
• Shorten critical section which requires write lock
• Parallelize UFS access and sync with multiple threads
• Enable more concurrent users, and faster syncs
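
The two ideas above (a shorter critical section and parallel UFS access) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Alluxio's code: `fetch_ufs_status`, `InodeTree`, and `sync_paths` are hypothetical names, and a plain `threading.Lock` stands in for the write lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fetch_ufs_status(path):
    """Hypothetical stand-in for a slow UFS metadata call."""
    return {"path": path, "length": len(path)}


class InodeTree:
    """Toy in-memory namespace; a plain Lock stands in for the write lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inodes = {}

    def apply(self, statuses):
        # Critical section: only the cheap in-memory update holds the lock.
        with self._lock:
            for status in statuses:
                self._inodes[status["path"]] = status

    def get(self, path):
        with self._lock:
            return self._inodes.get(path)


def sync_paths(tree, paths, max_workers=4):
    # Slow UFS calls run in parallel and without holding the lock.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(fetch_ufs_status, paths))
    tree.apply(statuses)
    return len(statuses)


tree = InodeTree()
synced = sync_paths(tree, ["/data/a", "/data/b", "/data/c"])
```

Because the lock is held only inside `apply`, other threads calling `get` are blocked for the duration of a dictionary update rather than for the duration of the UFS round trips.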

Slow UFS Data Reads (Before)
[Diagram: four Clients, an Alluxio Worker, and the UFS]
• High concurrency saturates network bandwidth and UFS
• Client times out even though the reads eventually complete
• Applications may fail unnecessarily

Slow UFS Data Reads (After)
[Diagram: four Clients, an Alluxio Worker, and the UFS]
• Expect slow IO, adjust and handle timeouts better
• Improve logging to observe slow IO
• Prevent unnecessary timeouts for users and applications during high concurrency
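
One way to picture "expect slow IO" together with "improve logging" is a read wrapper with two thresholds: a short one that only logs, and a longer one that actually fails. The Python sketch below is a hypothetical illustration, not Alluxio's client code; `read_with_slow_io_logging` and both threshold parameters are assumed names.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

logger = logging.getLogger("slow_io")


def read_with_slow_io_logging(read_fn, warn_after_s, timeout_s):
    """Run read_fn, log if it is slow, and fail only after timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(read_fn)
    try:
        try:
            # First wait: the latency we normally expect.
            return future.result(timeout=warn_after_s)
        except TimeoutError:
            # Slow, but not yet a failure: make it observable and keep waiting.
            logger.warning("UFS read still running after %.2fs", warn_after_s)
        remaining = timeout_s - (time.monotonic() - start)
        return future.result(timeout=max(remaining, 0))
    finally:
        pool.shutdown(wait=False)


def slow_read():
    time.sleep(0.05)  # simulated slow UFS read
    return b"block-data"


data = read_with_slow_io_logging(slow_read, warn_after_s=0.01, timeout_s=2.0)
```

A read that is merely slow produces a log line and still returns its data; only a read that exceeds the longer limit raises a timeout to the caller.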

Slow Catalog Sync (Before)
[Diagram: Client, Alluxio Catalog, UDB]
• Alluxio catalog syncs from the UDB with a single thread
• Syncing databases with many tables can take a long time

Slow Catalog Sync (After)
[Diagram: Client, Alluxio Catalog, UDB]
• Parallelize catalog syncing with multiple threads
• Accelerate syncing large databases for users
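
Parallel catalog syncing can be sketched as fetching each table's metadata on a thread pool instead of one table at a time. The Python below is a minimal sketch under assumptions: the UDB is modeled as a nested dictionary, and `fetch_table` and `sync_database` are hypothetical helpers rather than Alluxio APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_table(udb, database, table):
    """Hypothetical stand-in for one slow UDB call for a single table."""
    return table, dict(udb[database][table])


def sync_database(udb, database, max_workers=8):
    """Sync every table of one database, fetching table metadata in parallel."""
    tables = sorted(udb[database])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda t: fetch_table(udb, database, t), tables)
        return dict(results)


# Toy UDB: database -> table -> metadata.
udb = {"sales": {"orders": {"columns": 12}, "customers": {"columns": 7}}}
catalog = sync_database(udb, "sales")
```

With a single thread the total sync time grows with the sum of the per-table latencies; with a pool it is closer to the slowest batch of tables, which is where the benefit for databases with many tables comes from.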

Synchronous Disk IO in Worker Storage Tiers (Before)
[Diagram: storage tiers MEM, SSD, HDD]
• CACHE_PROMOTE requires moving block to top tier
• Finding free space may require cascading eviction
• User reads of cached data may block on many disk IOs

Synchronous Disk IO in Worker Storage Tiers (After)
[Diagram: storage tiers MEM, SSD, HDD]
• Make CACHE the default read type, to avoid synchronous disk IO
• Rearrange blocks asynchronously to match caching policy
• Avoid blocking on disk IO for user requests, for faster and more predictable Alluxio IO performance
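
The difference between promoting on the read path and rearranging asynchronously can be shown with a toy model. In this Python sketch, which is an assumption-laden illustration and not the worker's real tiering code, `TieredStore` is a hypothetical class, eviction is not modeled, and "promotion" is just relabeling a block's tier.

```python
import queue
import threading


class TieredStore:
    """Toy worker storage: maps block id -> (tier, data)."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        threading.Thread(target=self._rearrange_loop, daemon=True).start()

    def read(self, block_id):
        # Serve from whichever tier holds the block; never move it on the read path.
        with self._lock:
            _, data = self._blocks[block_id]
        self._pending.put(block_id)  # request promotion for later
        return data

    def tier_of(self, block_id):
        with self._lock:
            return self._blocks[block_id][0]

    def wait_for_rearrangement(self):
        # Blocks until all queued promotions are done (used by the demo only).
        self._pending.join()

    def _rearrange_loop(self):
        # Background thread: promotion happens here, off the read path.
        while True:
            block_id = self._pending.get()
            with self._lock:
                _, data = self._blocks[block_id]
                self._blocks[block_id] = ("MEM", data)
            self._pending.task_done()


store = TieredStore({"blk-1": ("HDD", b"payload")})
data = store.read("blk-1")      # returns without waiting for any block move
store.wait_for_rearrangement()  # only for the demo; readers never call this
```

The reader gets its bytes as soon as they are read from the current tier, while the cost of moving the block is paid by the background thread.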

Unlimited Waiting for Disk Operations (Before)
[Diagram: Client calls get() on the Data Cache, which uses the Local Disk]
• Client-side cache stores data on local disk
• Some disk operations can take an unlimited amount of time
• Applications can get stuck when accessing cache

Unlimited Waiting for Disk Operations (After)
[Diagram: Client calls get() on the Data Cache, which uses the Local Disk]
• Support configurable timeouts for client-side cache interactions
• Avoid unexpected hangs when accessing client-side cache
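
A bounded `get()` can be sketched by running the disk read on a helper thread and giving up after a configurable limit. The Python below is a hypothetical illustration: `cache_get` and its `timeout_s` parameter are assumed names, and treating a timeout as a cache miss is a design choice of this sketch, not something the slides specify.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_cache_pool = ThreadPoolExecutor(max_workers=4)


def cache_get(read_from_disk, page_id, timeout_s):
    """Return cached bytes, or None if the local disk does not answer in time."""
    future = _cache_pool.submit(read_from_disk, page_id)
    try:
        return future.result(timeout=timeout_s)
    except TimeoutError:
        # Treat a stuck disk like a cache miss so the caller can read elsewhere.
        return None


def fast_disk(page_id):
    return b"page-" + page_id.encode()


def stuck_disk(page_id):
    time.sleep(0.2)  # simulated hang
    return b"late"


hit = cache_get(fast_disk, "7", timeout_s=1.0)
miss = cache_get(stuck_disk, "7", timeout_s=0.02)
```

The application thread waits at most `timeout_s` for the cache, so a misbehaving local disk degrades into extra remote reads instead of a hang.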

Conclusion
Alluxio Data Orchestration is complex and interacts with external storage
Must expect slow IO to external storage
Do not force users to wait for slowness
Handle pauses appropriately
Benefits to Users: faster, predictable, more concurrency
