Optimizing Latency-Sensitive Queries for
Presto at Facebook
A Collaboration Between Presto & Alluxio
Rohit Jain (Facebook), James Sun (Facebook), Bin Fan (Alluxio)
05/07/2020
Q&A: www.alluxio.io/slack
• Overview
• Architecture and Problems
• Re-architecture and solution
• Performance
• Introduce Alluxio Local Cache
• Timeline
A Distributed SQL (Compute) Engine
[Diagram: the query below is sent to the engine, which reads from Storage and returns a Result.]
SELECT * FROM A, B
WHERE A.k = B.k
Presto is Open Source
Presto @ Facebook Scale
• 40K servers
• ~1 EB of data scanned per day
• > 80% of new ETL
Interactive Use Cases @ Facebook
[Diagram: data sources Hive, Raptor, MySQL, ...; joins allowed across heterogeneous data sources.]
• presto-hive (Presto)
  • General-purpose dashboarding and ad hoc queries
• presto-raptor (Raptor)
  • Low-latency dashboarding and A/B testing
  • Usually 10X faster than Presto
• The two use cases have similar fleet sizes
Difference between Presto and Raptor: How Presto Works
[Diagram. Components: Planner/Optimizer, Scheduler, Workers (each running Drivers), HDFS, Hive Metastore. Edge labels: SQL, getPartitions, getFiles, openFiles, read/writeBlock, workload balanced, result.]
Difference between Presto and Raptor: How Raptor Works
[Diagram. Components: Planner/Optimizer, Scheduler, Workers (each running Drivers), KV Store, Raptor Metastore, Local SSD, Background Job. Edge labels: SQL, getFiles, read/writeBlock, hard affinity, write-through, backup, compaction / cleaning, result.]
Pros and Cons of Presto and Raptor
Presto
• Pros: large-scale storage (EB); independent scaling of storage and compute
• Cons: high latency (sub-minute); coarse metastore (partition-level)
Raptor
• Pros: low latency (sub-second); refined metastore (file-level)
• Cons: mid-scale storage (PB); coupled storage and compute
New Architecture to Unify Presto and Raptor
[Diagram. Components: Planner/Optimizer, Scheduler, Workers (each running Drivers), HDFS, Hive Metastore, Local SSD, KV store. Edge labels: getFiles, openFile/footer cache, read/writeBlock, soft affinity, local data cache, caching, low-overhead coordinator, file location/stats.]
Affinity scheduling
• Random node scheduler
• Best effort to assign the same split to the same worker
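A minimal sketch of the "same split to the same worker, best effort" idea. This is an illustration only, not Presto's scheduler: the function name `pick_worker`, the `max_pending` threshold, and hashing the split path are assumptions made for the example.

```python
import hashlib


def pick_worker(split_path, workers, pending, max_pending=2):
    """Return a worker for a split, preferring a stable hashed choice.

    workers: ordered list of worker names.
    pending: dict mapping worker name -> number of queued splits.
    max_pending: hypothetical "busy" threshold, not a Presto setting.
    """
    # A stable hash maps the same split to the same preferred worker,
    # so that worker's local cache is likely to hold the data already.
    digest = hashlib.md5(split_path.encode("utf-8")).hexdigest()
    preferred = workers[int(digest, 16) % len(workers)]
    if pending.get(preferred, 0) < max_pending:
        return preferred
    # Soft affinity: when the preferred worker is busy, fall back to
    # the least-loaded worker instead of waiting.
    return min(workers, key=lambda w: pending.get(w, 0))
```

The fallback is what makes the affinity "soft": a cache hit is preferred but never required, so a busy worker does not become a bottleneck.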
Data Caching
• A common optimization technique is to cache the working dataset closer to the compute node.
• Fewer trips to remote storage should help with latency and IO.
Various Options
• Facebook internal caching libraries
• Open source solutions
• Build our own
File Merge Caching
• Naïve solution
• Copying files from remote storage to local storage
• Merging files in local storage to keep the file count low
File Merge Caching (cont.)
Learnings
• Java-based OSS library
• Segment-based data caching: reading, writing, and evicting in smaller units
• Asynchronous operations
• Pluggable eviction policies
• Semantics-aware caching
Collaboration with Alluxio
• A Java-based OSS library
• Segment-based data caching
• Pluggable eviction policies
• Configuration of various aspects: sizes, resource usage, eviction policies, etc.
• Detailed stats on cache usage
• Caching should not become a point of failure
• Asynchronous operations
• File management at the disk level
• Flash throughput limiter to avoid endurance issues
Benchmark
• Two full days' worth of queries from the production cluster were shadowed to the test cluster.
• Query count: 17,320
• 600-node cluster
• 460 GB per node configured for data caching
• LRU eviction policy
• 1 MB block size: data is read, stored, and evicted in 1 MB units
Benchmark result
Benchmark result (cont.)
• IO savings
  • Data read in the master branch run: 582 TB
  • Data read in the caching branch run: 251 TB
  • Savings in scans: 57%
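The 57% figure follows directly from the two totals reported above. A short check of the arithmetic (variable names are illustrative):

```python
master_tb = 582   # data read in the master branch run (TB)
caching_tb = 251  # data read in the caching branch run (TB)

saved_tb = master_tb - caching_tb
savings_pct = 100 * saved_tb / master_tb

# prints: 331 TB saved, 56.9% fewer bytes scanned
print(f"{saved_tb} TB saved, {savings_pct:.1f}% fewer bytes scanned")
```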
Benchmark result (cont.)
• Cache hit rate
Confidential Use Only – Do Not Share
Alluxio Overview
• Open source data orchestration
• Commonly used for data analytics such as OLAP on Hadoop
• Started as the research project "Tachyon" at UC Berkeley
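Fast Growing Open Source Community
[Chart: contributor count across releases. Values shown: 1, 3, 70, 210, 750, 1080. Releases shown: v0.1 (Dec '12), v0.2 (Apr '13), v0.6 (Mar '15), v1.0 (Feb '16), v1.8 (Jul '18), v2.1 (Nov '19).]
• Over 1000 GitHub contributors
• Latest release: 2.2.0 in March 2020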
Deployed in Hundreds of Companies
[Logo wall grouped by sector: Consumer, Travel & Transportation, Telco & Media, Technology, Financial Services, Retail & Entertainment, Data & Analytics Services.]
Alluxio Local Cache: Architecture
[Diagram. Components: Presto Server JVM, Presto Worker, Alluxio Caching File System, Alluxio Cache Manager, External File System, local cache storage, external storage. Edge labels: HDFS API calls, on cache hit, on cache miss.]
I/O Challenges & Implementation
• Read pattern is seek-heavy (non-sequential)
  • Segment-based caching (1 MB by default), as opposed to caching whole files
• Presto server is highly concurrent by design
  • Lightweight, fine-grained locking across segments
• Queries are bursty
  • Optional async write to cache
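Segment-based caching can be sketched as follows. This is a simplified, single-threaded, in-memory illustration rather than the Alluxio implementation: the class name `SegmentCache` and the `read_remote` callback are hypothetical, and a real cache would store segments on local SSD and add the locking and async writes mentioned above.

```python
from collections import OrderedDict

SEGMENT_SIZE = 1 << 20  # 1 MB, the default segment size named on the slide


class SegmentCache:
    """Tiny in-memory LRU cache keyed by (path, segment index)."""

    def __init__(self, capacity_segments, read_remote):
        self.capacity = capacity_segments
        # read_remote(path, offset, length) -> bytes; stands in for
        # a read against remote storage such as HDFS.
        self.read_remote = read_remote
        self.segments = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _segment(self, path, index):
        key = (path, index)
        if key in self.segments:
            self.hits += 1
            self.segments.move_to_end(key)  # mark as most recently used
            return self.segments[key]
        self.misses += 1
        data = self.read_remote(path, index * SEGMENT_SIZE, SEGMENT_SIZE)
        self.segments[key] = data
        if len(self.segments) > self.capacity:
            self.segments.popitem(last=False)  # evict least recently used
        return data

    def read(self, path, offset, length):
        """Serve an arbitrary byte range from whole cached segments."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, SEGMENT_SIZE)
            chunk = self._segment(path, index)[start:start + (end - offset)]
            if not chunk:  # read past the end of the file
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

Because a seek-heavy reader touches only a few ranges of each file, caching and evicting fixed-size segments avoids pulling whole files onto local storage for the sake of a small read.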
Cache Configuration
• Pluggable cache replacement policies:
  • LRU, LFU
• Pluggable cache storage options:
  • Local file system: each segment -> one file
  • RocksDB
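One way to picture "pluggable" replacement policies is a cache that delegates the eviction decision to a policy object. The sketch below is an assumption-laden illustration, not Alluxio's API: the class names `LRUPolicy`, `LFUPolicy`, and `Cache` and the `touch`/`evict` methods are invented for the example.

```python
from collections import Counter, OrderedDict


class LRUPolicy:
    """Evict the key that was touched least recently."""

    def __init__(self):
        self.order = OrderedDict()

    def touch(self, key):
        self.order[key] = None
        self.order.move_to_end(key)

    def evict(self):
        key, _ = self.order.popitem(last=False)
        return key


class LFUPolicy:
    """Evict the key that was touched least often."""

    def __init__(self):
        self.counts = Counter()

    def touch(self, key):
        self.counts[key] += 1

    def evict(self):
        key = min(self.counts, key=self.counts.get)
        del self.counts[key]
        return key


class Cache:
    """Fixed-capacity cache whose eviction choice is delegated to a policy."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy
        self.data = {}

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            del self.data[self.policy.evict()]
        self.data[key] = value
        self.policy.touch(key)

    def get(self, key):
        if key in self.data:
            self.policy.touch(key)
            return self.data[key]
        return None
```

With capacity 2 and the access sequence put a, put b, get a, get a, get b, put c, the LRU policy evicts `a` (least recently used) while the LFU policy evicts `b` (least frequently used), so the same workload can favor one policy or the other.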
Alluxio Local Cache vs. Alluxio System
• Alluxio Local Cache is an embedded library
  • Shipped with the Alluxio client jar since v2.2.0
  • No extra server daemon required
  • Can easily be used in other JVM applications
• Alluxio System supports the full feature set
  • Data policies: free, pin, TTL, etc.
  • Metadata caching and syncing
  • Familiar filesystem CLIs
  • Transformation service
Timeline & Future Work
• Enable Presto + Alluxio Local Cache:
  • Edit etc/catalog/hive.properties:
      cache.enabled=true
      cache.type=ALLUXIO
      cache.base-directory=/tmp/alluxio-cache
  • Available in the next Presto release
• Future work
  • Semantics-aware metadata caching
  • Performance tuning: CPU vs. memory
Q & A
Recording of this talk will be available soon

