Query Optimizer: pursuit of performance
Martin Traverso, Facebook
Kamil Bajda-Pawlikowski, Starburst
@prestodb @starburstdata
DataWorks Summit
2018 @ San Jose, CA
Presto: SQL-on-Anything
Deploy Anywhere, Query Anything
Key Highlights
● ANSI SQL
● Interactive performance
● High concurrency
● Proven scalability
● Separation of compute and storage
● Query data where it lives (no ETL needed)
● Hadoop / cloud vendor agnostic
● Community-driven open source project
● Apache License, hosted on GitHub
Project Timeline
● FALL 2008: Facebook open sources Hive
● FALL 2012: 6 developers start Presto development
● FALL 2013: Facebook open sources Presto
● SPRING 2015: Teradata joins the community, begins investing heavily in the project, connects Teradata to Presto via QueryGrid
● SUMMER 2017: 180+ Releases, 50+ Contributors, 5000+ Commits
● WINTER 2017: Starburst is founded by a team of Presto committers, Teradata veterans
Presto Community
See more at https://github.com/prestodb/presto/wiki/Presto-Users
Presto in Production
Facebook: 1000s of nodes, HDFS (ORC, RCFile), sharded MySQL, 1000s of users
Uber: 800+ nodes (2 clusters on premises) with 200K+ queries daily over HDFS (Parquet/ORC)
Twitter: 800+ nodes (several clusters on premises) for HDFS (Parquet)
LinkedIn: 350+ nodes (2 clusters on premises), 40K+ queries daily over HDFS (ORC), 600+ users
Netflix: 250+ nodes in AWS, 40+ PB in S3 (Parquet)
Lyft: 200+ nodes in AWS, 20K+ queries daily, 20+ PB in Parquet
Yahoo! Japan: 200+ nodes (4 clusters on premises) for HDFS (ORC), ObjectStore, and Cassandra
FINRA: 120+ nodes in AWS, 4PB in S3 (ORC), 200+ users
Built for Performance
Query Execution Engine:
● MPP-style pipelined in-memory execution
● Columnar and vectorized data processing
● Runtime query bytecode compilation
● Memory efficient data structures
● Multi-threaded multi-core execution
● Optimized readers for columnar formats (ORC and Parquet)
● Now also Cost-Based Optimizer
Evolving the optimizer - Challenges
● Diverse and widespread production workloads
● Fast-changing codebase
● Many developers
● Large surface area and usage of plan IR
Before
● Monolithic visitor-based plan transformations
● Visitors responsible for walking and transforming plan tree
● Problems
○ Hard to add new operations (IR node types)
○ Hard to add new optimizations
○ Hard to test optimizers
class LimitPushdown {
    // One visitor both walks the plan tree and rewrites it
    Plan optimize(Plan plan) { return plan.root.accept(this); }
    Node visitLimit(LimitNode node) { ... }
    Node visitProject(ProjectNode node) { ... }
    Node visitFilter(FilterNode node) { ... }
    ...
}
Now
● Granular rule-based transformations
● Rules responsible for transforming localized subplan structure
● Optimizer loop responsible for walking plan and driving rule application
● Benefits
○ Decouples traversal from rule application
○ Decouples adding new optimizations from adding new operations (IR node types)
○ Easier to reason about and test individual rule behavior
class PushLimitThroughProjectRule {
    // Matches a Limit whose source is a Project
    Pattern getPattern() { return Patterns.limit().with(source().matching(project())); }
    // Rewrites only the matched subplan
    Node apply(Node node) { ... }
}
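The split between rules and the optimizer loop can be sketched roughly as follows. This is a minimal illustration, not Presto's actual classes: `PlanNode`, `Rule`, `RuleLoop`, and the single-source plan shape are all assumptions made for the example, and pattern matching is reduced to an `instanceof` check.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified plan IR: every node has at most one source.
abstract class PlanNode {
    final PlanNode source; // null for leaf nodes
    PlanNode(PlanNode source) { this.source = source; }
    abstract PlanNode withSource(PlanNode newSource);
}

final class ScanNode extends PlanNode {
    final String table;
    ScanNode(String table) { super(null); this.table = table; }
    @Override PlanNode withSource(PlanNode newSource) { return this; }
    @Override public String toString() { return "Scan(" + table + ")"; }
}

final class ProjectNode extends PlanNode {
    ProjectNode(PlanNode source) { super(source); }
    @Override PlanNode withSource(PlanNode newSource) { return new ProjectNode(newSource); }
    @Override public String toString() { return "Project(" + source + ")"; }
}

final class LimitNode extends PlanNode {
    final long count;
    LimitNode(long count, PlanNode source) { super(source); this.count = count; }
    @Override PlanNode withSource(PlanNode newSource) { return new LimitNode(count, newSource); }
    @Override public String toString() { return "Limit[" + count + "](" + source + ")"; }
}

// A rule inspects one local subplan and returns a replacement if it applies.
interface Rule {
    Optional<PlanNode> apply(PlanNode node);
}

// Limit(Project(x)) -> Project(Limit(x)): a projection does not change the row count.
final class PushLimitThroughProject implements Rule {
    @Override
    public Optional<PlanNode> apply(PlanNode node) {
        if (node instanceof LimitNode && node.source instanceof ProjectNode) {
            LimitNode limit = (LimitNode) node;
            PlanNode pushed = new LimitNode(limit.count, limit.source.source);
            return Optional.of(limit.source.withSource(pushed));
        }
        return Optional.empty();
    }
}

// The loop owns traversal; rules never walk the tree themselves.
final class RuleLoop {
    private final List<Rule> rules;
    RuleLoop(List<Rule> rules) { this.rules = rules; }

    PlanNode optimize(PlanNode node) {
        boolean changed = true;
        while (changed) { // re-apply rules at this node until none fires
            changed = false;
            for (Rule rule : rules) {
                Optional<PlanNode> result = rule.apply(node);
                if (result.isPresent()) {
                    node = result.get();
                    changed = true;
                }
            }
        }
        // Then descend into the (possibly rewritten) source.
        return node.source == null ? node : node.withSource(optimize(node.source));
    }
}

final class RuleLoopDemo {
    public static void main(String[] args) {
        PlanNode plan = new LimitNode(10, new ProjectNode(new ProjectNode(new ScanNode("t"))));
        RuleLoop loop = new RuleLoop(Collections.<Rule>singletonList(new PushLimitThroughProject()));
        System.out.println(loop.optimize(plan)); // prints Project(Project(Limit[10](Scan(t))))
    }
}
```

Because the rule only sees a `Limit` directly above a `Project`, it can be unit-tested with a two-node plan, while the loop is tested separately for traversal.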
Migrating from monolithic optimizers to rules
● Fallback behavior
● Controlled via config option or per-query session property
● Removed after a few releases
optimizers = [
    RuleBasedOptimizer(
        legacy = LimitPushdown,
        rules = [
            PushLimitThroughProject,
            PushLimitThroughUnion,
            PushLimitThroughJoin
        ]
    ),
    PredicatePushdown,
    PruneUnusedColumns,
    AddExchanges,
    EliminateCrossJoins,
    ...
]
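One way to picture the fallback is an optimizer that holds both the legacy implementation and its replacement rules, and picks between them per query. The sketch below is an assumption-laden illustration: `MigratingOptimizer` is a hypothetical name, a plan is modeled as a plain `String`, and the `useRules` flag stands in for whatever config option or session property gates the new behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a plan is modeled as a String to keep the example small.
final class MigratingOptimizer {
    private final UnaryOperator<String> legacy;       // the old monolithic optimizer
    private final List<UnaryOperator<String>> rules;  // the rules that replace it

    MigratingOptimizer(UnaryOperator<String> legacy, List<UnaryOperator<String>> rules) {
        this.legacy = legacy;
        this.rules = new ArrayList<>(rules);
    }

    // useRules stands in for a config option or per-query session property.
    String optimize(String plan, boolean useRules) {
        if (!useRules) {
            return legacy.apply(plan); // fallback to the old behavior
        }
        String result = plan;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }
}
```

Once the rule path has run in production for a few releases, the `legacy` field and the flag can be deleted without touching the rules themselves.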
Adding cost-aware optimizers
● Just another rule
● Can reason about cost
optimizers = [
    RuleBasedOptimizer(
        rules = [
            PushLimitThroughProject,
            PushLimitThroughUnion,
            PushLimitThroughJoin,
            ReorderJoins
        ]
    ),
    ...
]
class ReorderJoins {
    // The comparator is injected, so the rule can rank candidate plans by cost
    ReorderJoins(CostComparator costComparator) { ... }
    Pattern getPattern() { ... }
    Node apply(Node node) { ... }
}
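A cost comparator of the kind passed to `ReorderJoins` might look like the sketch below. It is only an illustration: `PlanCost`, `WeightedCostComparator`, `CheapestPlan`, and the idea of collapsing CPU, memory, and network cost into one weighted sum are assumptions for this example, and the weights are arbitrary.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical cost estimate for one candidate plan.
final class PlanCost {
    final String description;
    final double cpu;
    final double memory;
    final double network;

    PlanCost(String description, double cpu, double memory, double network) {
        this.description = description;
        this.cpu = cpu;
        this.memory = memory;
        this.network = network;
    }
}

// Collapses the three components into a single number using configurable weights.
final class WeightedCostComparator implements Comparator<PlanCost> {
    private final double cpuWeight;
    private final double memoryWeight;
    private final double networkWeight;

    WeightedCostComparator(double cpuWeight, double memoryWeight, double networkWeight) {
        this.cpuWeight = cpuWeight;
        this.memoryWeight = memoryWeight;
        this.networkWeight = networkWeight;
    }

    double total(PlanCost cost) {
        return cpuWeight * cost.cpu + memoryWeight * cost.memory + networkWeight * cost.network;
    }

    @Override
    public int compare(PlanCost a, PlanCost b) {
        return Double.compare(total(a), total(b));
    }
}

final class CheapestPlan {
    // Returns the candidate with the lowest weighted cost.
    // Throws NoSuchElementException if the candidate list is empty.
    static PlanCost choose(List<PlanCost> candidates, Comparator<PlanCost> comparator) {
        return Collections.min(candidates, comparator);
    }
}
```

Keeping the comparison outside the rule means the same join-enumeration logic can be reused with different weightings, for example on a cluster where network is the scarce resource.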
CBO in a nutshell
Cost-Based Optimizer v1 includes:
● support for statistics stored in Hive Metastore
● join reordering based on selectivity estimates and cost
● automatic join type selection (repartitioned vs broadcast)
● automatic left/right side selection for joined tables
https://www.starburstdata.com/technical-blog/
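To make the join type and side selection concrete, here is a deliberately simplified model that compares only network traffic. It is a sketch under stated assumptions, not the optimizer's real cost model: a broadcast join is assumed to copy the build side to every worker, a repartitioned join is assumed to shuffle both inputs once, and the smaller input is always chosen as the build side. All class and method names are hypothetical.

```java
// How the join inputs are distributed across workers.
enum Distribution { BROADCAST, REPARTITIONED }

final class JoinDecision {
    final String buildSide;
    final Distribution distribution;

    JoinDecision(String buildSide, Distribution distribution) {
        this.buildSide = buildSide;
        this.distribution = distribution;
    }
}

final class JoinDistributionChooser {
    // Sizes are estimated input sizes in bytes; workers is the number of worker nodes.
    static JoinDecision decide(String left, double leftBytes,
                               String right, double rightBytes,
                               int workers) {
        // Side selection: build the hash table from the smaller input.
        String buildSide = leftBytes < rightBytes ? left : right;
        double buildBytes = Math.min(leftBytes, rightBytes);

        // Assumed network model: broadcast copies the build side to every worker,
        // repartitioning moves each input across the network once.
        double broadcastCost = buildBytes * workers;
        double repartitionCost = leftBytes + rightBytes;

        Distribution distribution = broadcastCost < repartitionCost
                ? Distribution.BROADCAST
                : Distribution.REPARTITIONED;
        return new JoinDecision(buildSide, distribution);
    }
}
```

Under this model a tiny dimension table joined to a large fact table is broadcast, while two inputs of similar size are repartitioned, because copying either one to every worker would cost more than shuffling both.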
Statistics & Cost
Hive Metastore statistics:
● number of rows in a table
● number of distinct values in a column
● fraction of NULL values in a column
● minimum/maximum value in a column
● average data size for a column
Cost calculation includes:
● CPU
● Memory
● Network I/O
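The statistics listed above are enough to estimate how many rows survive a simple filter. The sketch below uses the textbook uniform-distribution assumptions (values spread evenly over the distinct values and over the min/max range); it is not taken from the Presto code base, and `ColumnStats` and `FilterEstimator` are hypothetical names.

```java
// Hypothetical per-column statistics, mirroring the list above.
final class ColumnStats {
    final double distinctValues;
    final double nullFraction;
    final double min;
    final double max;

    ColumnStats(double distinctValues, double nullFraction, double min, double max) {
        this.distinctValues = distinctValues;
        this.nullFraction = nullFraction;
        this.min = min;
        this.max = max;
    }
}

final class FilterEstimator {
    // Estimated rows matching "col = constant", assuming every distinct
    // value is equally frequent among the non-NULL rows.
    static double equalityRows(double tableRows, ColumnStats stats) {
        if (stats.distinctValues <= 0) {
            return 0;
        }
        return tableRows * (1 - stats.nullFraction) / stats.distinctValues;
    }

    // Estimated rows matching "col < bound", assuming non-NULL values are
    // spread uniformly between min and max.
    static double lessThanRows(double tableRows, ColumnStats stats, double bound) {
        if (bound <= stats.min) {
            return 0;
        }
        double nonNullRows = tableRows * (1 - stats.nullFraction);
        if (bound >= stats.max) {
            return nonNullRows; // also covers min == max, avoiding division by zero
        }
        return nonNullRows * (bound - stats.min) / (stats.max - stats.min);
    }
}
```

For a 1,000-row table whose column has 50 distinct values, 10% NULLs, and a range of 0 to 100, this gives about 18 rows for an equality filter and about 225 rows for `col < 25`. Estimates like these feed the row counts that the cost calculation then turns into CPU, memory, and network figures.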
Join type selection
Join left/right side decision
Join reordering
Join reordering with filter
Filter estimation
Join tree shapes
Benchmark results (on prem)
[Chart: CBO off vs. CBO on]
https://www.starburstdata.com/technical-blog/presto-cost-based-optimizer-rocks-the-tpc-benchmarks/
Benchmark results (cloud)
https://www.starburstdata.com/aws
Benchmark results (Facebook)
Roadmap
● CBO enhancements:
○ Additional rewrites
○ Costing for more operators
○ Built-in statistics collection
○ Exposing statistics for additional connectors
○ Additional types of statistics (e.g., histograms)
● General functionality:
○ Spill to disk enhancements
○ Geospatial functions performance
○ New connectors (ElasticSearch, Kudu)
○ Resource-aware query submission
○ Misc performance improvements
Further reading
www.prestodb.io
www.starburstdata.com
https://eng.uber.com/presto/
https://www.kdnuggets.com/2018/04/presto-data-scientists-sql.html
https://www.oreilly.com/ideas/query-the-planet-geospatial-big-data-analytics-at-uber
https://allegro.tech/2017/06/presto-small-step-for-devops-engineer-big-step-for-big-data-analyst.html
https://www.slideshare.net/MartinTraverso/presto-at-facebook-presto-meetup-boston-1062015
http://engineering.grab.com/scaling-like-a-boss-with-presto
https://www.techatbloomberg.com/blog/reducing-application-development-time-connecting-apache-presto-accumulo/
Thank You!
@prestodb @starburstdata
www.starburstdata.com
www.prestodb.io

Editor's Notes

  • #8: Presto is known for raw performance, but there is only so much we can gain without reducing the algorithmic complexity of a query plan. Cost-aware optimizations are a step towards handling a broader range of queries efficiently.
  • #9: New behavior needs to be gated, and there has to be a fallback to old behaviors. Changes have to be made as incrementally as possible. Big-bang changes cause developer churn (resolving conflicts, redoing work) and increase risk for production systems.
  • #10: Quick look at the history of the Presto optimizer.
  • #29: Results from evaluating CBO on an internal FB query corpus drawn from production workloads.