Alluxio (formerly Tachyon):
Accessing Data Anywhere with Unified Namespace
Jiri Simsa
June 15, 2016 @ Alluxio Meetup (hosted by Intel)
About Me
• Software Engineer @ Alluxio, Inc.
• PMC Member and Maintainer of Alluxio Open Source Project
• Ph.D. from Carnegie Mellon University (Parallel Data Lab)
• Worked at Google before joining Alluxio
• Twitter: @jsimsa, GitHub: jsimsa
Outline
• Motivation
• Unified Namespace
• Use Cases
• Demo
Big Data Ecosystem
Alluxio Benefits
• Future-proofing your applications
– applications can communicate with different storage systems, both existing and new, using the same namespace and interface
– seamless integration between applications and new storage systems enables faster innovation
• Enabling new workloads
– one-time effort to enable an application to access many different types of storage systems and a storage system to be accessed by many different types of applications
Unified Namespace
An abstraction that makes it possible for applications to access different storage systems through the same interface
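The idea of "the same interface" can be sketched in a few lines. This is only an illustration and not the Alluxio API: the `Storage` class, the `InMemoryStorage` stand-in backend, and the `copy_file` helper are all hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical common interface that every backend implements."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        ...


class InMemoryStorage(Storage):
    """Stand-in for a real backend (HDFS, S3, ...); keeps files in a dict."""

    def __init__(self) -> None:
        self._files = {}

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data


def copy_file(storage: Storage, src: str, dst: str) -> None:
    """Application code: written once against Storage, unaware of the backend."""
    storage.write(dst, storage.read(src))
```

Because `copy_file` depends only on `Storage`, swapping one backend for another would not require touching the application code.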
Transparent Naming
• Operations over persisted Alluxio objects mapped transparently to underlying storage
• Alluxio paths are preserved in storage layer

Alluxio: alluxio://host:port/
  Data/ (Reports, Sales)
  Users/ (Alice, Bob)
Storage System (HDFS, S3, …): hdfs://host:port/
  Data/ (Reports, Sales)
  Users/ (Alice, Bob)
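A minimal sketch of what "paths are preserved" means, assuming a single underlying storage system mounted at the Alluxio root. The function name `to_storage_uri` is hypothetical, and the two root URIs are the placeholders from the slide rather than real addresses.

```python
ALLUXIO_ROOT = "alluxio://host:port/"   # placeholder URI from the slide
STORAGE_ROOT = "hdfs://host:port/"      # placeholder URI from the slide


def to_storage_uri(alluxio_uri: str) -> str:
    """Map an Alluxio URI to the storage URI with the same relative path."""
    if not alluxio_uri.startswith(ALLUXIO_ROOT):
        raise ValueError(f"not an Alluxio URI under {ALLUXIO_ROOT}: {alluxio_uri}")
    relative_path = alluxio_uri[len(ALLUXIO_ROOT):]
    # The relative path is carried over unchanged; only the root differs.
    return STORAGE_ROOT + relative_path


print(to_storage_uri("alluxio://host:port/Data/Reports"))
# prints hdfs://host:port/Data/Reports
```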
Multiple Storage Systems
• Unified namespace for multiple data sources
• Sharing of data across storage systems
• API for on-the-fly mounting / unmounting

Alluxio: alluxio://host:port/
  Data/ (Reports, Sales)
  Users/ (Alice, Bob)
Storage System A: hdfs://host:port/
  Users/ (Alice, Bob)
Storage System B: s3://host/bucket
  Reports, Sales
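One way to picture mounting and unmounting is a table of mount points resolved by longest matching prefix. The sketch below is an assumption-laden illustration, not Alluxio's implementation or API: the `MountTable` class and its methods are hypothetical, and the mount layout (Storage System A at `/`, Storage System B at `/Data`) is read off the diagram above.

```python
class MountTable:
    """Hypothetical mount table: namespace path -> storage URI."""

    def __init__(self) -> None:
        self._mounts = {}

    @staticmethod
    def _normalize(path: str) -> str:
        # "/Data/" -> "/Data", "/" -> "/"
        return "/" + path.strip("/")

    def mount(self, path: str, storage_uri: str) -> None:
        key = self._normalize(path)
        if key in self._mounts:
            raise ValueError(f"already mounted: {key}")
        self._mounts[key] = storage_uri.rstrip("/")

    def unmount(self, path: str) -> None:
        del self._mounts[self._normalize(path)]

    def resolve(self, path: str) -> str:
        """Translate a namespace path using the longest matching mount point."""
        path = self._normalize(path)
        best = None
        for point in self._mounts:
            matches = point == "/" or path == point or path.startswith(point + "/")
            if matches and (best is None or len(point) > len(best)):
                best = point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        suffix = path if best == "/" else path[len(best):]
        return self._mounts[best] + suffix


table = MountTable()
table.mount("/", "hdfs://host:port/")      # Storage System A
table.mount("/Data", "s3://host/bucket")   # Storage System B
print(table.resolve("/Users/Alice"))   # prints hdfs://host:port/Users/Alice
print(table.resolve("/Data/Reports"))  # prints s3://host/bucket/Reports
```

After `table.unmount("/Data")`, the same path `/Data/Reports` would fall back to the root mount and resolve to `hdfs://host:port/Data/Reports`, which is the sense in which mounts can change on the fly without the application changing its paths.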
Multiple Storage / Compute
Changing Storage Backend
Resources
• Alluxio Project: http://www.alluxio.org
• Development: https://github.com/Alluxio/alluxio
• Meet Friends: http://www.meetup.com/Alluxio
• Alluxio, Inc.: http://www.alluxio.com
• Contact us: info@alluxio.com
Backup Slides
Architecture Overview