Haoyuan Li, Tachyon Nexus

haoyuan@tachyonnexus.com

September 22, 2015 @ SDC 2015
A Reliable Memory-Centric Distributed Storage System
•  Team consists of Tachyon creators, top contributors, people from UC Berkeley, Google, CMU, VMware, Stanford, Facebook, etc.
•  $7.5 million Series A from Andreessen Horowitz
•  Committed to Tachyon Open Source
Outline
•  Overview
– Motivation
– Tachyon Architecture
– Using Tachyon
•  Open Source
– Status
– Production Use Cases
•  Roadmap
Tachyon: Born in UC Berkeley AMPLab
Cluster manager
Parallel computation framework
Reliable, distributed memory-centric storage system
Why Tachyon?
Memory is Fast
•  RAM throughput increasing exponentially
•  Disk throughput increasing slowly
Memory-locality key to interactive response times
Memory is Cheaper
source: jcmit.com
Realized by many…
Is the Problem Solved?
Missing a Solution for the Storage Layer
An Example: Spark
•  Fast, in-memory data processing framework
– Keep one in-memory copy inside JVM
– Track lineage of operations used to derive data
– Upon failure, use lineage to recompute data
[Diagram: lineage tracking across a chain of operations: map, filter, map, join, reduce]
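The lineage idea above can be sketched in a few lines of Python. This is a toy model, not Spark's or Tachyon's implementation: the `Dataset` class, its `lose_cache` method, and the sample data are all hypothetical names introduced for illustration. Each dataset keeps one in-memory copy plus a pointer to its parent and the operation that derived it, so a lost copy can be rebuilt on demand.

```python
class Dataset:
    """Toy dataset that remembers how it was derived (its lineage)."""

    def __init__(self, name, parent=None, op=None, source=None):
        self.name = name
        self.parent = parent    # upstream Dataset, or None for a root
        self.op = op            # function applied to the parent's records
        self.source = source    # durable input, only used by a root dataset
        self.cache = None       # the single in-memory copy

    def map(self, fn, name):
        return Dataset(name, parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred, name):
        return Dataset(name, parent=self, op=lambda rows: [r for r in rows if pred(r)])

    def compute(self):
        """Return the records, recomputing from lineage if the cache is gone."""
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.op(self.parent.compute())
        return self.cache

    def lose_cache(self):
        """Simulate a failure that wipes the in-memory copy."""
        self.cache = None


root = Dataset("input", source=[1, 2, 3, 4, 5, 6])
squares = root.map(lambda x: x * x, "squares")
evens = squares.filter(lambda x: x % 2 == 0, "evens")

first = evens.compute()     # [4, 16, 36]

# Lose both derived datasets, then ask for the result again.
evens.lose_cache()
squares.lose_cache()
second = evens.compute()    # rebuilt by replaying map and filter
```

Nothing is replicated here: the only cost of a failure is the recomputation, which is the trade-off the slide describes.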
Issue 1
Data sharing is the bottleneck in analytics pipelines: slow writes to disk
[Diagram: Spark Job1 and Spark Job2 each hold blocks 1 and 3 in their own Spark memory block manager (storage engine and execution engine in the same process). They share data through HDFS / Amazon S3 (blocks 1 to 4), which means slow writes.]
[Diagram: the same bottleneck across frameworks. A Spark job (blocks 1 and 3 in its Spark memory block manager) and a Hadoop MR job (YARN) share data through HDFS / Amazon S3 (blocks 1 to 4), again with slow writes.]
Issue 2
Cache loss when the process crashes
[Diagram: a Spark task holds blocks 1 and 3 in its Spark memory block manager (execution engine and storage engine in the same process), backed by HDFS / Amazon S3 (blocks 1 to 4). When the task crashes, the in-memory blocks are lost and only the copies in HDFS / Amazon S3 remain.]
Issue 3
In-memory Data Duplication & Java Garbage Collection
[Diagram: Spark Task1 and Spark Task2 each hold their own copies of blocks 1 and 3 in separate Spark memory block managers (execution engine and storage engine in the same process), causing duplication and GC. The underlying storage holds blocks 1 to 4.]
Tachyon
Reliable data sharing at memory-speed within and across cluster frameworks/jobs
Technical Overview
Ideas
•  A memory-centric storage architecture
•  Push lineage down to storage layer
•  Manage tiered storage

Facts
•  One data copy in memory
•  Re-computation for fault-tolerance
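As a rough illustration of what "manage tiered storage" can mean, here is a minimal Python sketch. It is not Tachyon's implementation: the `TieredStore` class, the tier capacities, and the least-recently-used demotion policy are all assumptions chosen for the example. Blocks enter the fastest tier, overflow is demoted to the next tier down, and a block that is read again is promoted back to the top.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered block store; LRU demotion is an assumed policy."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), fastest tier first
        self.names = [name for name, _ in tiers]
        self.capacity = [cap for _, cap in tiers]
        self.levels = [OrderedDict() for _ in tiers]

    def _insert(self, level, block_id, data):
        if level >= len(self.levels):
            return  # dropped from the slowest tier; under storage is assumed to keep it
        tier = self.levels[level]
        tier[block_id] = data
        while len(tier) > self.capacity[level]:
            old_id, old_data = tier.popitem(last=False)  # least recently used
            self._insert(level + 1, old_id, old_data)    # demote one tier down

    def _remove(self, block_id):
        for tier in self.levels:
            if block_id in tier:
                return tier.pop(block_id)
        return None

    def put(self, block_id, data):
        self._remove(block_id)
        self._insert(0, block_id, data)

    def get(self, block_id):
        data = self._remove(block_id)
        if data is not None:
            self._insert(0, block_id, data)  # promote on access
        return data

    def location(self, block_id):
        for name, tier in zip(self.names, self.levels):
            if block_id in tier:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 4)])
for i in range(1, 5):
    store.put(f"block {i}", b"payload")

before = store.location("block 1")   # demoted out of MEM by blocks 3 and 4
store.get("block 1")                 # reading it promotes it again
after = store.location("block 1")
```

With a two-block memory tier, writing four blocks pushes the two oldest ones down to SSD, and reading one of them swaps it back into memory at the expense of the least recently used resident.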
Eco-System
Tachyon Memory-Centric Architecture
Lineage in Tachyon
Issue 1 revisited
Memory-speed data sharing among jobs in different frameworks
[Diagram: a Spark job (Spark mem) and a Hadoop MR job (YARN) read and write blocks 1, 3, and 4 in Tachyon's in-memory storage (fast writes), backed by HDFS / Amazon S3 on disk with blocks 1 to 4.]
Issue 2 revisited
Keep in-memory data safe, even when a job crashes.
[Diagram: when the Spark task crashes, blocks 1, 3, and 4 remain in Tachyon's in-memory storage; HDFS / Amazon S3 on disk holds blocks 1 to 4.]
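The difference between the two crash scenarios can be made concrete with a small Python sketch. All class and function names here (`UnderStorage`, `SharedMemoryStore`, `CachingJob`, and the two `reads_with_...` helpers) are hypothetical stand-ins, not Tachyon or Spark APIs. The sketch counts how many slow reads reach the under storage when a job crashes and restarts, first with a cache that lives inside the job and then with a memory store that outlives it.

```python
class UnderStorage:
    """Stand-in for slow durable storage such as HDFS or S3."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0          # number of slow reads served

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class SharedMemoryStore:
    """Stand-in for a memory-centric store that outlives any single job."""

    def __init__(self, under):
        self.under = under
        self.memory = {}

    def read(self, block_id):
        if block_id not in self.memory:
            self.memory[block_id] = self.under.read(block_id)
        return self.memory[block_id]


class CachingJob:
    """A job whose cache lives in its own process and dies with it."""

    def __init__(self, under):
        self.under = under
        self.cache = {}

    def read(self, block_id):
        if block_id not in self.cache:
            self.cache[block_id] = self.under.read(block_id)
        return self.cache[block_id]


def reads_with_in_process_cache(blocks, wanted):
    under = UnderStorage(blocks)
    job = CachingJob(under)
    for b in wanted:
        job.read(b)
    job = CachingJob(under)     # crash and restart: the old cache is gone
    for b in wanted:
        job.read(b)
    return under.reads


def reads_with_shared_store(blocks, wanted):
    under = UnderStorage(blocks)
    store = SharedMemoryStore(under)
    for b in wanted:
        store.read(b)           # first job
    for b in wanted:
        store.read(b)           # restarted job reads through the same store
    return under.reads


blocks = {f"block {i}": b"payload" for i in range(1, 5)}
wanted = ["block 1", "block 3"]
in_process = reads_with_in_process_cache(blocks, wanted)
shared = reads_with_shared_store(blocks, wanted)
```

In this toy run the restarted job repeats every slow read when the cache was in-process, while the shared store serves the restart entirely from memory.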
Issue 3 revisited
No in-memory data duplication, much less GC
[Diagram: two Spark tasks share a single in-memory copy of blocks 1, 3, and 4 in Tachyon instead of each caching its own copy; HDFS / Amazon S3 on disk holds blocks 1 to 4.]
Comparison with In-Memory HDFS 
Open Source Status
•  Started at UC Berkeley AMPLab in Summer 2012


•  Apache License 2.0, Version 0.7.1 (August 2015)
•  Deployed at > 50 companies (July 2014)
•  30+ Companies Contributing

Contributors Growth
[Chart: number of contributors at each release]
v0.1 (Dec '12): 1
v0.2 (Apr '13): 3
v0.3 (Oct '13): 15
v0.4 (Feb '14): 30
v0.5 (Jul '14): 46
v0.6 (Mar '15): 70
v0.7 (Jul '15): 111
Codebase Growth
[Chart: number of commits at each release]
v0.2 (Apr '13): 465 commits
v0.3 (Oct '13): 696 commits
v0.4 (Feb '14): 1080 commits
v0.5 (Jul '14): 1610 commits
v0.6 (Mar '15): 2884 commits
v0.7 (Jul '15): 5021 commits
Thanks to Our Contributors!
Reported Tachyon Usage
Under Filesystem Choices
(Big Data, Cloud, HPC, Enterprise)
Use Case: Baidu
•  Framework: SparkSQL
•  Under Storage: Baidu’s File System
•  Storage Media: MEM + HDD
•  100+ nodes deployment
•  1PB+ managed space
•  30x Performance Improvement
Use Case: a SaaS Company
•  Framework: Impala
•  Under Storage: S3
•  Storage Media: MEM + SSD
•  15x Performance Improvement
Use Case: an Oil Company
•  Framework: Spark
•  Under Storage: GlusterFS
•  Storage Media: MEM only
•  Analyzing data in traditional storage
Use Case: a SaaS Company
•  Framework: Spark
•  Under Storage: S3
•  Storage Media: SSD only
•  Elastic Tachyon deployment
New Features
•  Lineage in Storage (alpha)
•  Tiered Storage (alpha)
•  Data Serving
•  Support for New Hardware
•  …
•  Your New Feature!
Tachyon’s Goal?
Distributed Memory-Centric Storage: Better Assist Other Components
Welcome Collaboration!
JIRA New Contributor Tasks
•  Website: http://tachyon-project.org
•  GitHub: https://github.com/amplab/tachyon
•  Meetup: http://www.meetup.com/Tachyon
•  Newsletter Subscription: http://goo.gl/mwB2sX
•  Email: haoyuan@tachyonnexus.com
