Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017

UNIFY DATA AT MEMORY SPEED
Haoyuan (HY) Li, CEO @ Alluxio Inc.
VAULT Conference 2017
March 2017

HISTORY
• Started at UC Berkeley AMPLab in summer 2012
• Originally named Tachyon
• Open sourced in 2013
• Rebranded to Alluxio in early 2016
• Apache License 2.0
• Latest stable release: Alluxio 1.4.0
• Alluxio 1.5.0 planned for Q2 2017

BIG DATA ECOSYSTEM YESTERDAY

BIG DATA ECOSYSTEM TODAY

BIG DATA ECOSYSTEM ISSUES

BIG DATA ECOSYSTEM WITH ALLUXIO
Enabling applications to access data from any storage system at memory speed
Application-facing interfaces:
• Native File System
• Hadoop Compatible File System
• FUSE Compatible File System
• Native Key-Value Interface
Storage-facing interfaces:
• HDFS Interface
• Amazon S3 Interface
• Swift Interface
• GlusterFS Interface

FASTEST-GROWING BIG DATA PROJECT
• Formerly named Tachyon, born in the AMPLab
• 500+ contributors from 100+ organizations
• Running the world’s largest production clusters

WHY ALLUXIO
• Co-located compute and data with memory-speed access to data
• Virtualized across different storage systems under a unified namespace
• Scale-out architecture
• File system API, software only
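
The "unified namespace" idea above can be pictured as a mount table that maps logical paths to locations in different storage systems. The following is a minimal sketch under that assumption; the class name `UnifiedNamespace`, its methods, and the example URIs are hypothetical and are not Alluxio's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UnifiedNamespace:
    """Toy mount table (hypothetical): logical path prefix -> storage URI."""
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalize trailing slashes so prefixes compare consistently.
        self.mounts[logical_prefix.rstrip("/")] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest-prefix match, so a nested mount wins over its parent.
        candidates = [
            prefix for prefix in self.mounts
            if logical_path == prefix or logical_path.startswith(prefix + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {logical_path}")
        prefix = max(candidates, key=len)
        return self.mounts[prefix] + logical_path[len(prefix):]


# Example URIs are placeholders, not real endpoints.
ns = UnifiedNamespace()
ns.mount("/logs", "s3://example-bucket/logs")
ns.mount("/warehouse", "hdfs://namenode:8020/warehouse")
print(ns.resolve("/logs/2017/03/01.gz"))
print(ns.resolve("/warehouse/sales/part-0000"))
```

An application only sees paths such as `/logs/...` and `/warehouse/...`; which storage system serves each path is decided by the mount table.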

ALLUXIO BENEFITS
• Unification: New workflows across any data in any storage system
• Performance: Orders of magnitude improvement in run time
• Flexibility: Choice in compute and storage; grow each independently, buy only what is needed

ALLUXIO DEPLOYMENTS

ALLUXIO USE CASES
• On-demand analytics and accelerating I/O to and from remote storage
• Managing data across disparate storage systems
• Sharing data across workloads at memory speed

MANAGE DATA ACROSS STORAGE SYSTEMS
“We’ve been running in production for over 9 months. Alluxio’s enabled different applications & frameworks to easily interact with data from different storage systems.”
RESULTS
• Efficient data sharing among Spark Streaming, Spark batch, and Flink jobs
• Improved system performance, with 15x to 300x speedups
• Tiered storage feature manages storage resources including memory, SSD, and disk
Qunar uses real-time machine learning for their website ads
• 200+ node deployment
• 6 billion logs (4.5 TB) daily
• Mix of memory + HDD
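
The tiered storage result above (memory, SSD, and disk managed together) can be illustrated with a toy model in which new blocks land in the fastest tier and the least recently used block spills to the next tier when a tier is full. This is a sketch under stated assumptions only: the class `TieredStore`, the LRU policy, and the block-count capacities are hypothetical and do not describe Alluxio's actual eviction logic.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered store (hypothetical): fastest tier first, LRU spill downward."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), ordered fastest to slowest.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def put(self, block_id, data, level=0):
        if level >= len(self.tiers):
            return  # dropped from the slowest tier in this toy model
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        blocks.move_to_end(block_id)  # mark as most recently used
        if len(blocks) > cap:
            victim_id, victim = blocks.popitem(last=False)  # least recently used
            self.put(victim_id, victim, level + 1)

    def get(self, block_id):
        # Returns (tier the block was found in, data), or (None, None) on a miss.
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self.put(block_id, data, 0)  # promote to the fastest tier on access
                return name, data
        return None, None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ("a", "b", "c"):
    store.put(block, f"data-{block}")
print(store.get("a"))  # "a" was spilled out of MEM when "c" arrived
print(store.get("a"))  # now served from MEM after promotion
```

Promoting a block on read keeps hot data in memory, which is the intuition behind mixing memory with slower media in one deployment.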

ON-DEMAND ANALYTICS & ACCELERATE I/O TO/FROM REMOTE STORAGE
“The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.”
RESULTS
• Data queries are now 30x faster with Alluxio
• Alluxio cluster runs stably, providing over 50 TB of RAM space
• By using Alluxio, batch queries that usually lasted over 15 minutes became interactive queries taking less than 30 seconds
PMs run interactive queries to gain insights into their products & business
• 200+ node deployment
• 2+ petabytes of storage
• Mix of memory + HDD
Diagram labels: Alluxio, Baidu File System
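
The speedup figures on this slide follow from simple ratios of the quoted timings. The sketch below only restates that arithmetic; the helper name `speedup` is hypothetical, and the inputs are the numbers quoted above.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old run time to the new run time."""
    return before_seconds / after_seconds


# Batch query: "over 15 minutes" down to "less than 30 seconds".
batch = speedup(15 * 60, 30)
print(f"batch query speedup: at least {batch:.0f}x")  # 900 / 30 = 30

# Spark SQL quote: 100-150 seconds down to 10-15 seconds.
# Pairing the extremes gives the range of possible ratios.
low = speedup(100, 15)
high = speedup(150, 10)
print(f"Spark SQL speedup range: {low:.1f}x to {high:.1f}x")
```

The 30x headline matches the batch-query bound (15 minutes to 30 seconds), while the quoted Spark SQL timings on their own correspond to roughly an order of magnitude.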

SHARE DATA ACROSS JOBS @ MEMORY SPEED
“Thanks to Alluxio, we now have the raw data immediately available at every iteration & can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.”
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
Barclays uses queries & machine learning to train models for risk management
• 6-node deployment
• 1 TB of storage
• Memory only
Diagram labels: Alluxio; Relational Database: Teradata

Thank you!
Contact: haoyuan@alluxio.com or info@alluxio.com
