UNIFY DATA AT MEMORY SPEED
Haoyuan (HY) Li, CEO @ Alluxio Inc.
VAULT Conference 2017
March 2017
HISTORY
• Started at UC Berkeley AMPLab in summer 2012
• Originally named Tachyon
• Rebranded to Alluxio in early 2016
• Open Sourced in 2013
• Apache License 2.0
• Latest Stable Release: Alluxio 1.4.0
• Alluxio 1.5.0 planned for Q2 2017
© 2017 Alluxio Confidential
BIG DATA ECOSYSTEM YESTERDAY
BIG DATA ECOSYSTEM TODAY
BIG DATA ECOSYSTEM ISSUES
BIG DATA ECOSYSTEM WITH ALLUXIO
Enabling applications to access data from any storage system at memory speed
Interfaces offered to applications:
• FUSE Compatible File System
• Hadoop Compatible File System
• Native Key-Value Interface
• Native File System
Interfaces to storage systems:
• GlusterFS Interface
• Amazon S3 Interface
• Swift Interface
• HDFS Interface
FASTEST-GROWING BIG DATA PROJECT
• Formerly named Tachyon, born in the AMPLab
• 500+ contributors from 100+ organizations
• Running the world’s largest production clusters
WHY ALLUXIO
• Co-located compute and data with memory-speed access to data
• Virtualized across different storage systems under a unified namespace
• Scale-out architecture
• File system API, software only
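The "unified namespace" idea can be pictured with a toy mount table: one logical directory tree whose subtrees are backed by different storage systems. The sketch below is only an illustration under stated assumptions. The class `UnifiedNamespace`, its methods, and the example URIs are hypothetical and are not Alluxio's API.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedNamespace:
    """Toy mount table: maps paths in one logical tree to storage-system URIs."""

    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalise so "/logs/" and "/logs" name the same mount point.
        self.mounts[logical_prefix.rstrip("/") or "/"] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # The longest matching prefix wins, so nested mounts override parents.
        candidates = [
            p for p in self.mounts
            if logical_path == p or logical_path.startswith(p.rstrip("/") + "/")
        ]
        if not candidates:
            raise FileNotFoundError(f"no mount covers {logical_path}")
        prefix = max(candidates, key=len)
        suffix = logical_path[len(prefix.rstrip("/")):]
        return self.mounts[prefix] + suffix


ns = UnifiedNamespace()
ns.mount("/warehouse", "hdfs://namenode:8020/warehouse")  # hypothetical HDFS cluster
ns.mount("/logs", "s3://example-bucket/logs")             # hypothetical S3 bucket

print(ns.resolve("/warehouse/sales/part-0"))
print(ns.resolve("/logs/2017/03/app.log"))
```

An application written against the logical paths does not need to know which storage system holds each file; only the mount table changes when data moves.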
ALLUXIO BENEFITS
• Unification: new workflows across any data in any storage system
• Performance: orders of magnitude improvement in run time
• Flexibility: choice in compute and storage – grow each independently, buy only what is needed
ALLUXIO DEPLOYMENTS
ALLUXIO USE CASES
• On-demand analytics & accelerating I/O to and from remote storage
• Managing data across disparate storage systems
• Sharing data across workloads at memory speed
MANAGE DATA ACROSS STORAGE SYSTEMS
“We’ve been running in production for over 9 months. Alluxio has enabled different applications & frameworks to easily interact with data from different storage systems.”
RESULTS
• Spark Streaming, Spark batch, and Flink jobs share data efficiently
• Improved system performance with 15x – 300x speedups
• Tiered storage feature manages storage resources including memory, SSD, and disk
Qunar uses real-time machine learning for its website ads
• 200+ node deployment
• 6 billion logs (4.5 TB) daily
• Mix of memory + HDD
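Tiered storage across memory, SSD, and disk can be sketched as a cache in which new blocks enter the fastest tier and older blocks spill downward when a tier fills. This is a minimal illustration, not Alluxio's implementation: the class `TieredStore`, the least-recently-used eviction, the promote-on-read behaviour, and the tiny capacities are all assumptions made for the example.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered cache: blocks enter the fastest tier and spill downward."""

    def __init__(self, capacities):
        # capacities: tier name -> max number of blocks, fastest tier first.
        self.capacities = dict(capacities)
        self.order = list(self.capacities)
        self.tiers = {name: OrderedDict() for name in self.order}

    def _insert(self, block_id, data, level):
        if level >= len(self.order):
            return  # fell off the slowest tier; this toy simply drops the block
        name = self.order[level]
        tier = self.tiers[name]
        tier[block_id] = data
        if len(tier) > self.capacities[name]:
            # Assumed policy: evict the least recently used block one tier down.
            victim_id, victim = tier.popitem(last=False)
            self._insert(victim_id, victim, level + 1)

    def put(self, block_id, data):
        # Drop any stale copy so a block lives in exactly one tier.
        for tier in self.tiers.values():
            tier.pop(block_id, None)
        self._insert(block_id, data, 0)

    def get(self, block_id):
        for name in self.order:
            if block_id in self.tiers[name]:
                data = self.tiers[name][block_id]
                self.put(block_id, data)  # assumed: promote to fastest tier on read
                return data
        return None

    def where(self, block_id):
        for name in self.order:
            if block_id in self.tiers[name]:
                return name
        return None


store = TieredStore({"MEM": 2, "SSD": 2, "HDD": 4})
for i in (1, 2, 3):
    store.put(f"block-{i}", b"payload")

print(store.where("block-1"))  # pushed out of MEM by block-3
store.get("block-1")           # reading promotes it again
print(store.where("block-1"), store.where("block-2"))
```

The point of the sketch is that hot data stays in memory while colder data remains reachable from slower, larger tiers.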
ON-DEMAND ANALYTICS & ACCELERATE I/O TO/FROM REMOTE STORAGE
“The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.”
RESULTS
• Data queries are now 30x faster with Alluxio
• Alluxio cluster runs stably, providing over 50 TB of RAM space
• Batch queries that usually lasted over 15 minutes became interactive queries taking less than 30 seconds
PMs run interactive queries to gain insights into their products & business
• 200+ node deployment
• 2+ petabytes of storage
• Mix of memory + HDD
[Diagram labels: ALLUXIO, Baidu File System]
SHARE DATA ACROSS JOBS @ MEMORY SPEED
“Thanks to Alluxio, we now have the raw data immediately available at every iteration & can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.”
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
Barclays uses query & machine learning to train models for risk management
• 6 node deployment
• 1 TB of storage
• Memory only
[Diagram labels: ALLUXIO, Relational Database: Teradata]
Thank you!
Contact: {haoyuan}@alluxio.com or info@alluxio.com

