Iceberg + Alluxio for Fast Data Analytics
Beinan Wang & Shouwei Chen @ Alluxio
2021/12/14
Introduction
Beinan Wang
● PrestoDB Committer
● PhD in CE @ Syracuse
● Email: beinan@alluxio.com
● Interactive Query / Compute Engine / Caching
Shouwei Chen
● Core Maintainer @ Alluxio
● PhD in ECE @ Rutgers
● Email: shouwei@alluxio.com
● Data lake / Structured data / Community
Find us on Alluxio community slack!
https://alluxio.io/slack
ALLUXIO 2
Outline
● Alluxio Overview
● Running Iceberg with Alluxio
● Querying your Iceberg Table with Presto
● Presto Iceberg connector updates
● Q & A
What is Alluxio?
Open source, started at UC Berkeley AMPLab in 2014
● 1,000+ contributors and growing
● 5,000+ Slack community members
● Top 10 most critical Java-based open source projects
● GitHub's top 100 most valuable repositories out of 96 million
Join the conversation on Slack: alluxio.io/slack
Data Orchestration for
Analytics & AI in the Cloud
DATA ACCESSIBILITY
Access any storage using any compute
BRING DATA CLOSER TO COMPUTE ACROSS SILOS
Access-based data movement for compute and storage spread across environments
[Diagram: Region A, Region B, and private data centers (Datacenter 1, Datacenter 2), with Amazon EMR, Cloud Dataproc, Kubernetes Engine, Compute Engine, and Hive]
COMMON USE CASES
CASE 01: CLOUD
Consistent SLAs, performance, and cost savings on cloud storage
[Diagram: TensorFlow, Alluxio; public cloud]
CASE 02: HYBRID
Hybrid cloud gateway to utilize on-prem compute for data in the cloud
[Diagram: Spark, Alluxio; on premise, public cloud]
CASE 03: MULTI-DATACENTER
Cross-datacenter access without changing the ingest pipeline across regions
[Diagram: Presto, Alluxio, ingestion; datacenter 1, datacenter 2]
Alluxio - Key Innovations
EFFICIENT ACCESS & EASY DATA MANAGEMENT
Acceleration, efficient representation and movement of data based on policies
ENVIRONMENT AGNOSTIC & MULTI-CLOUD READY
Orchestrate a data platform with agility across regions for private, hybrid or multi-cloud
UNIFY DATA LAKES
Support multiple APIs for analytics and AI with storage abstraction and streamlined data movement across the pipeline
EXAMPLE JOURNEY
On-premises storage as the source of truth
[Diagram: Region A, Region B, and private data centers (Datacenter 2), with Amazon EMR, Hive, and ingestion/ETL]
Why use Alluxio with Iceberg?
● Improve I/O performance and efficiency for data analytics through better data locality.
● Simplify the management of Iceberg files together with the compute engine.
● Avoid having Iceberg talk directly to an eventually consistent file system.
How to integrate Alluxio with Iceberg?
Alluxio Write Type

Write Type        Description
MUST_CACHE        Writes directly to Alluxio
*THROUGH          Writes directly to under storage
*CACHE_THROUGH    Writes to Alluxio and under storage synchronously
ASYNC_THROUGH     Writes to Alluxio first, then asynchronously writes to the under storage
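The write types can be read as a small decision model: which layer holds the data once the write call returns. The sketch below is illustrative only. `WriteType`, `write_targets`, and `durable_on_return` are hypothetical names, not part of any Alluxio API, and the behavior is taken from the descriptions in the table rather than from Alluxio internals.

```python
from enum import Enum


class WriteType(Enum):
    """The four write types listed in the table (names as in the slide)."""
    MUST_CACHE = "MUST_CACHE"
    THROUGH = "THROUGH"
    CACHE_THROUGH = "CACHE_THROUGH"
    ASYNC_THROUGH = "ASYNC_THROUGH"


def write_targets(write_type: WriteType) -> dict:
    """Return which layers hold the data when the write call returns.

    Hypothetical helper: it mirrors the table, nothing more.
    """
    in_alluxio = write_type is not WriteType.THROUGH
    in_under_storage = write_type in (WriteType.THROUGH, WriteType.CACHE_THROUGH)
    return {"alluxio": in_alluxio, "under_storage": in_under_storage}


def durable_on_return(write_type: WriteType) -> bool:
    """True when the under storage already has the data synchronously."""
    return write_targets(write_type)["under_storage"]


for wt in WriteType:
    print(wt.value, write_targets(wt), durable_on_return(wt))
```

Under this model only THROUGH and CACHE_THROUGH are durable on return, which matches the two types the deck marks with an asterisk.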
Alluxio + Iceberg Architecture: Option 1
When all accesses go through Alluxio (S3, where the Iceberg tables are stored, is mounted as under storage):
● Spark can write Iceberg tables to Alluxio.
● Spark can read the Iceberg table from Alluxio.
● Alluxio reads and writes Iceberg tables from/to S3.
[Diagram: Spark, Alluxio, data in S3]
Alluxio + Iceberg Architecture: Option 2
When Iceberg tables stored on under storage (e.g. S3 here) can be updated outside Alluxio, how do we avoid reading a broken table?
● On read: Spark queries the Iceberg table with "metadata sync interval = 0". Alluxio then always checks the metadata and gets the latest Iceberg files and data files from S3.
  ⇒ Spark retrieves the latest Iceberg table.
● On write: Spark writes the Iceberg files and data files through Alluxio to S3 with CACHE_THROUGH or THROUGH.
  ⇒ Strong consistency is achieved for the Iceberg table commit.
[Diagram: Spark, Alluxio, data in S3]
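One way to apply the Option 2 settings from Spark is to pass them to the Alluxio client as JVM system properties. The sketch below only builds the option strings and does not launch anything. It assumes the Alluxio 2.x client property names `alluxio.user.file.writetype.default` and `alluxio.user.file.metadata.sync.interval`, which should be checked against the documentation for the version in use. The helper names `alluxio_java_options` and `spark_submit_args` are hypothetical.

```python
# Write types the deck recommends for Iceberg commits in Option 2.
SYNCHRONOUS_WRITE_TYPES = {"CACHE_THROUGH", "THROUGH"}


def alluxio_java_options(write_type: str = "CACHE_THROUGH",
                         sync_interval: str = "0") -> str:
    """Build -D flags for the Alluxio client (assumed 2.x property names)."""
    if write_type not in SYNCHRONOUS_WRITE_TYPES:
        raise ValueError(f"{write_type} does not write to under storage synchronously")
    props = {
        "alluxio.user.file.writetype.default": write_type,
        # 0 means: check the under storage for changes on every access.
        "alluxio.user.file.metadata.sync.interval": sync_interval,
    }
    return " ".join(f"-D{key}={value}" for key, value in props.items())


def spark_submit_args(java_opts: str) -> list:
    """Wrap the flags for both the Spark driver and the executors."""
    return [
        "--conf", f"spark.driver.extraJavaOptions={java_opts}",
        "--conf", f"spark.executor.extraJavaOptions={java_opts}",
    ]


opts = alluxio_java_options()
print(opts)
print(spark_submit_args(opts))
```

Rejecting MUST_CACHE and ASYNC_THROUGH in the helper is a design choice of this sketch: it keeps a job from committing Iceberg metadata that has not yet reached S3.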
Query your Iceberg Table
Create Table
-- Create an Iceberg table partitioned by birth month from the TPC-DS customer table
CREATE TABLE iceberg.test.test1
WITH (format = 'PARQUET', partitioning = ARRAY['c_birth_month']) AS
SELECT
    c_customer_sk,
    c_birth_day,
    c_birth_month
FROM
    tpcds.sf100.customer;
Insert
-- Append one row (c_customer_sk, c_birth_day, c_birth_month)
INSERT INTO iceberg.test.test1
VALUES (1000, 40, 13);
Query
Screenshot from Chunxu’s talk earlier.
Schema Evolution
Screenshot from Chunxu’s talk earlier.
Iceberg Connector Updates
New Features
● Native folder for metadata storage (Jack Ye, AWS)
● Enable Iceberg Local Cache (Baolong, Tencent)
● Upgrade to Iceberg 0.12.0 and Parquet 1.12.0 (Xinli Shang, Uber and Beinan, Alluxio)
● Predicate pushdown to Iceberg (Beinan Wang, Alluxio)
Iceberg Native Catalog
Native folder for metadata storage (Jack Ye, AWS)
Iceberg Local Cache
Enable Iceberg Local Cache (Baolong, Tencent)
Diagram is from: https://prestodb.io/blog/2021/02/04/raptorx
Predicate Pushdown
Reduce the number of partitions scanned by Presto
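To make the effect concrete, here is a toy model of partition pruning. It is not Presto or Iceberg code. The rows are made-up sample data shaped like the test1 table (c_customer_sk, c_birth_day, c_birth_month), and the function names are hypothetical. Both scans return the same rows, but the pushdown version skips partitions whose partition value cannot match the predicate.

```python
from collections import defaultdict

# Made-up sample rows: (c_customer_sk, c_birth_day, c_birth_month).
ROWS = [
    (1, 12, 1),
    (2, 3, 1),
    (3, 25, 6),
    (4, 9, 12),
    (5, 30, 12),
]


def build_partitions(rows):
    """Group rows by c_birth_month, like partitioning = ARRAY['c_birth_month']."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[2]].append(row)
    return dict(partitions)


def scan_without_pushdown(partitions, month):
    """Read every partition, then filter the rows in the engine."""
    scanned = 0
    result = []
    for part_rows in partitions.values():
        scanned += 1
        result.extend(r for r in part_rows if r[2] == month)
    return result, scanned


def scan_with_pushdown(partitions, month):
    """Skip partitions whose partition value cannot match the predicate."""
    scanned = 0
    result = []
    for part_value, part_rows in partitions.items():
        if part_value != month:
            continue  # pruned using the partition value alone, no rows read
        scanned += 1
        result.extend(part_rows)
    return result, scanned


partitions = build_partitions(ROWS)
print(scan_without_pushdown(partitions, 12))  # ([(4, 9, 12), (5, 30, 12)], 3)
print(scan_with_pushdown(partitions, 12))     # ([(4, 9, 12), (5, 30, 12)], 1)
```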
Predicate Pushdown Resource Usage
Reduce the number of partitions scanned by Presto
Ongoing Work
● Native Iceberg IO (Jack Ye, AWS)
● Materialized view (Chunxu Tang, Twitter)
● Iceberg v2 support and row-level delete (Beinan Wang, Alluxio)
Q & A

