Alluxio (formerly Tachyon)
Open Source Memory Speed Virtual Distributed Storage
Haoyuan Li
CEO, Alluxio, Inc.
Rebranded from Tachyon to Alluxio!
http://www.alluxio.com/blog/
About Alluxio
• Team
– Alluxio creators and top developers/committers (all top 8 committers).
• Investors
Performance Trend: Memory is Fast
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
• Memory locality is key to interactive response times
Price Trend: Memory is Cheaper
Source: jcmit.com
The Big Data Ecosystem Today
What is Alluxio?
• Alluxio: Memory Speed Virtual Distributed Storage
• Enables Virtualized Data Across Multiple Types of Storage
Open Source Community Growth
[Chart: number of contributors (git commit history) by release, v0.2 through v1.1; y-axis from 0 to 350]
Open Source Alluxio System
• The fastest growing open source project in big data
• Over 250 contributors from over 100 organizations
Alluxio Benefits
• Flexibility
– Enable new workloads across any storage system
– Unified Namespace enables applications to access data in any storage system
• Agility
– Work with the framework of your choice
– Work with the storage of your choice
• Performance
– High performance data access
• Cost
– Grow Storage and Compute independently
• Any application accesses any data from any storage at memory speed.
New Features and Improvements in Alluxio 1.0 and 1.1
Gene Pang @ Alluxio, Inc.
June 15, 2016 @ Alluxio Meetup (hosted by Intel)
About Me
• Gene Pang - Software Engineer @ Alluxio, Inc.
• One of the core maintainers of Alluxio Open Source Project
• Ph.D. @ AMPLab, UC Berkeley
• Worked at Google before UC Berkeley
• Twitter: @unityxx
Outline
Alluxio Architecture Overview
New Developments in Alluxio
Performance Improvement Results in Alluxio 1.1
Alluxio Architecture Overview
Architecture Overview
• Alluxio Master (with Journal): manages metadata
• Alluxio Workers: serve data blocks
• Under File Systems: mount multiple storage systems
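The idea of mounting multiple storage systems under one namespace can be illustrated with a small sketch. This is not Alluxio's implementation or API: the `MountTable` class, its method names, and the example URIs are all hypothetical, and the only technique shown is longest-prefix matching from a namespace path to an under file system URI.

```python
from typing import Dict, Tuple


class MountTable:
    """Toy mount table (hypothetical, not Alluxio's API).

    Maps namespace prefixes to under file system URIs.
    """

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, namespace_path: str, ufs_uri: str) -> None:
        # Normalize so "/data/" and "/data" name the same mount point.
        key = namespace_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (mount_point, ufs_uri) using the longest matching mount prefix."""
        best = None
        for mount_point in self._mounts:
            prefix = mount_point if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount covers {path}")
        relative = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return best, (f"{base}/{relative}" if relative else base)


# Usage with made-up URIs: two storage systems behind one namespace.
table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")
table.mount("/s3", "s3a://bucket/data")
print(table.resolve("/s3/logs/a.txt"))
print(table.resolve("/tmp/x"))
```

An application only sees paths such as `/s3/logs/a.txt`; which storage system backs a path is decided by the table, so storage can be added or swapped without changing application code.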
Alluxio New Developments
Releases
Tachyon 0.8 – Oct 22, 2015
Alluxio 1.0 – Feb 23, 2016
Alluxio 1.1 – Jun 7, 2016
New Developments
New Integrations
Usability Improvements
Performance Improvements
Access Control (Alpha)
New Integrations
• Native OpenStack Swift Driver: improve performance, reduce complexity
• Alluxio to FUSE Connector: mount Alluxio to local file system
• Google Cloud Storage: manage data on Google Cloud Platform
• Aliyun Object Storage Service: manage data on Alibaba Cloud
• Google Compute Engine: deploy Alluxio on Google Cloud Platform
Access Control (Alpha)
• User/Group Support: similar to POSIX permission model
• File System Permissions: similar to POSIX permission model
• Command-line Permission Tools: chown, chgrp, chmod
• Configuration Parameter: alluxio.security.authorization.permission.enabled
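As a reminder of what a POSIX-style permission model checks, here is a minimal sketch. It is a generic illustration, not Alluxio's code: the `Inode` class and `is_allowed` function are hypothetical names, and special cases such as a superuser are deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Permission bits within one rwx triplet.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class Inode:
    """Hypothetical file metadata: owner, group, and a mode such as 0o640."""
    owner: str
    group: str
    mode: int


def is_allowed(inode: Inode, user: str, groups: FrozenSet[str], wanted: int) -> bool:
    """POSIX-style check: pick exactly one triplet (owner, group, or other)."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return (bits & wanted) == wanted


# Usage: a file owned by alice, group analysts, mode rw-r-----.
report = Inode(owner="alice", group="analysts", mode=0o640)
print(is_allowed(report, "alice", frozenset(), WRITE))
print(is_allowed(report, "bob", frozenset({"analysts"}), READ))
print(is_allowed(report, "carol", frozenset(), READ))
```

In this model, chown and chgrp change `owner` and `group`, and chmod changes `mode`; the check itself stays the same.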
Usability Improvements
• Write Location Policies: configure how to write data to Alluxio
• Simplified Configuration: customize with properties
• Automatic Metadata Loading: load metadata automatically
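A write location policy decides which worker receives the next block. The sketch below shows two plausible strategies under stated assumptions: the class names, the `WorkerInfo` record, and the `pick` method are illustrative only and are not taken from Alluxio's client API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerInfo:
    """Hypothetical view of a worker: host name and free capacity in bytes."""
    host: str
    free_bytes: int


class RoundRobinPolicy:
    """Rotate through workers, skipping any that cannot fit the block."""

    def __init__(self) -> None:
        self._next = 0

    def pick(self, workers: List[WorkerInfo], block_size: int) -> Optional[str]:
        for offset in range(len(workers)):
            index = (self._next + offset) % len(workers)
            if workers[index].free_bytes >= block_size:
                # Resume after the chosen worker on the next call.
                self._next = (index + 1) % len(workers)
                return workers[index].host
        return None


class MostAvailableFirstPolicy:
    """Choose the worker with the most free space that can fit the block."""

    def pick(self, workers: List[WorkerInfo], block_size: int) -> Optional[str]:
        fitting = [w for w in workers if w.free_bytes >= block_size]
        if not fitting:
            return None
        return max(fitting, key=lambda w: w.free_bytes).host


# Usage with made-up capacities.
workers = [WorkerInfo("w1", 100), WorkerInfo("w2", 500), WorkerInfo("w3", 50)]
policy = RoundRobinPolicy()
print([policy.pick(workers, 60) for _ in range(3)])
print(MostAvailableFirstPolicy().pick(workers, 60))
```

Keeping the policy behind a single `pick` call is what makes it configurable: the writer does not need to know which strategy is in use.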
Performance Improvements
• Improved Alluxio Master Scalability: fine-grained locking, efficient journaling
• Improved Alluxio Worker Scalability: improved data structures, improved locking
• Better Support for Random I/O Workloads: cache blocks during random I/O (e.g., parquet files)
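One way to read "cache blocks during random I/O" is that a positioned read pulls the whole containing block into memory, so later reads that land in the same block avoid the slower storage. The sketch below illustrates that reading only; `BlockCachingReader` and the `fetch_block` callback are hypothetical, the cache has no eviction, and nothing here is claimed to match Alluxio's actual implementation.

```python
from typing import Callable, Dict


class BlockCachingReader:
    """Toy positioned reader that caches whole blocks on first touch."""

    def __init__(self, fetch_block: Callable[[int], bytes], block_size: int) -> None:
        self._fetch_block = fetch_block  # stands in for a slow under-storage read
        self._block_size = block_size
        self._cache: Dict[int, bytes] = {}
        self.fetches = 0  # number of slow reads performed

    def _block(self, index: int) -> bytes:
        if index not in self._cache:
            self._cache[index] = self._fetch_block(index)
            self.fetches += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        """Read `length` bytes starting at `offset`, block by block."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self._block_size)
            chunk = self._block(index)[start:start + (end - offset)]
            if not chunk:
                break  # reached the end of the data
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Usage: 256 bytes of data split into 64-byte blocks.
data = bytes(range(256))
reader = BlockCachingReader(lambda i: data[i * 64:(i + 1) * 64], block_size=64)
reader.read(200, 10)   # first touch of block 3: one slow fetch
reader.read(210, 5)    # same block: served from the cache
print(reader.fetches)
```

Columnar formats such as parquet tend to issue many small reads at scattered offsets, which is why caching at block granularity can help this access pattern.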
Alluxio 1.1 Performance Improvement Results
Create File Throughput (Local Journal)
[Chart: throughput over test duration, 1.0.1 vs. 1.1.0]
1.8x improvement
Create File Throughput (Remote Journal)
[Chart: throughput over test duration, 1.0.1 vs. 1.1.0]
23x improvement
List Directory Throughput
[Chart: throughput over test duration, 1.0.1 vs. 1.1.0]
7x improvement
Worker Scalability
[Chart: write latency vs. number of blocks on worker, 1.0.1 vs. 1.1.0]
Try out Alluxio 1.1.0
http://www.alluxio.org/releases
