SlideShare a Scribd company logo
Beyond EXPLAIN
Yuto Hayamizu
Ryoji Kawamichi
Query Optimization
From Theory To Code
2016/5/20
PGCon 2016 @ Ottawa
Before Relational …
• Querying was physical
• Need to understand
physical organization
• Navigate query execution
by yourself
2016/5/20 2
DEPT
EMP
PROJ
“Which file is this table stored in?”
“How are records linked?”
“Which access path is fast for this table?”
“What is the best order of joining tables”
…
Historically …
Before Relational …
• Querying was physical
• Need to understand
physical organization
• Navigate query execution
by yourself
2016/5/20 3
After Relational …
DEPT
EMP
PROJ
“Which file is this table stored in?”
“How are records linked?”
“Which access path is fast for this table?”
“What is the best order of joining tables”
…
• Querying is logical
• Physical organization is
black-boxed
• Just declare what you want
Historically …
Fill the Gap: Physical and Logical
2016/5/20 4
SELECT * FROM DEPT D, EMP E
WHERE E.D_ID = D.ID AND ...
• Storage I/O strategy
• Access path selection
• Join method selection
• Aggregation, sorting
• Resource allocation
• ...
Query Optimizer
If optimizer perfectly fills the gap...
2016/5/20 5
We don’t need EXPLAIN
Reality Is Tough
• Optimizer is NOT PERFECT
• Generated plans are not always optimal, sometimes
far from optimal
• We have to take care of physical behavior
• That’s why EXPLAIN is so much explained
2016/5/20 6
Go Beyond EXPLAIN
• Deeper understanding of optimization, better
control of your databases
• Theoretical fundamentals of query optimization
• From basic framework to cutting-edge technologies
• PostgreSQL Optimizer implementation
• Focusing on basic scan and join methods
• Behavior observation with TPC-H benchmark
2016/5/20 7
Outline
• Introduction
• Theory: Query Optimization Framework
• Code: PostgreSQL Optimizer
• Theory: Cutting-Edge Technologies Overview
• Summary
2016/5/20 8
Query Optimization Framework
• Cost-based optimization
• Plan selection with estimated execution cost
• Most of modern optimizers, including PostgreSQL,
are cost-based
• Rule-based optimization
• Plan selection with heuristically ranked rules
• Easy to produce the same result
• Hard to evaluate wide variety of plans
• Ex) Oracle (~10g), Hive (~0.13)
2016/5/20 9
Main Challenges in Cost-based Optimization
• Cost modeling is HARD
• Overhead of CPU, I/O, memory access, network, …
• Cardinality estimation is HARD
• Output size of scans, joins, aggregations, …
• Join ordering search is HARD
• Combinatorial explosion of join ordering and access path
• Exhaustive search is NP-hard
2016/5/20 10
System-R optimizer (1979)
• “The standard”
• Cost estimation with I/O and CPU
• Cardinality estimation with table statistics
• Bottom-up plan search
• Many of modern optimizers are “System-R style”
• PostgreSQL, MySQL, DB2, Oracle, ...
2016/5/20 11
Cost/Cardinality Estimation
• [#page fetched],[#storage API calls]
are estimated with cost formula and following
statistics
2016/5/20 12
CPU costI/O cost
COST = [#page fetched] + W * [#storage API calls]
weight parameter
• NCARD(T) ... the cardinality of relation T
• TCARD(T) ... the number of pages in relation T
• ICARD(I) ... the number of distinct keys in index I
• NINDX(I) ... the number of pages in index I
Bottom-up Plan Search
• Candidate plans for single relation
• The cheapest access path
• N-relation join ordering search
• Select the cheapest plans for each relation
• Then, find optimal join orderings of every 2-relation join
• Then, find optimal join orderings of every 3-relation join
• ... until N-relation
2016/5/20 13
Ex) A ⨝ B ⨝ C ⨝ D
Ex) A ⨝ B ⨝ C ⨝ D
Ex) A ⨝ B ⨝ C ⨝ D
Ex) A ⨝ B ⨝ C ⨝ D
Ex) A ⨝ B ⨝ C ⨝ D
Volcano/Cascades (1993)
• Top-down transformational plan search
• Yet another optimization approach
• Not well known as “System-R style”, but widely used in
practice
Ex) SQL Server, Apache Hive (Apache Calcite), Greenplum
Orca
• Extensible optimization framework
2016/5/20 19
Extensible Optimization Framework
Query Optimizer Generator
• Generalized expression of query plan not limited
to relational data model
• Users (optimizer developers) defines actual
implementations:
• Logical operator ... corresponds to relational algebra
• Physical algorithm ... corresponds to scan & join
methods such as sequential scan, index scan, hash
join, nested loop join
2016/5/20 20
Top-down Transformational Search
• Starts from an initial “logical plan”
• Generate alternative plans with:
A) Logical operator transformation
B) Physical algorithm selection
C) Enforcing sorting order
2016/5/20 21
Join Select T
Join
Select R Select S
Proj
Join
Select T
JoinSelect R
Select S
Proj
Change join ordering
Join Select T
Join
Select R
Select SProj
Projection push down
Example: 3-way join with projection
Top-down Transformational Search
• Starts from an initial “logical plan”
• Generate alternative plans with:
A) Logical operator transformation
B) Physical algorithm selection
C) Enforcing sorting order
2016/5/20 22
Join Select T
Join
Select R Select S
ProjExample: 3-way join with projection
HashJoin Select T
Join
SeqScan R SeqScan S
Proj
Join IdxScan T
Join
Select R Select S
Proj
…
Top-down Transformational Search
• Starts from an initial “logical plan”
• Generate alternative plans with:
A) Logical operator transformation
B) Physical algorithm selection
C) Enforcing sorting order
2016/5/20 23
Join Select T
Join
Select R Select S
Proj
Example: 3-way join with projection
Join Select T
Join
Sort Sort
Proj
Select R Select S
Enforce sorting order
merge join of R and S is possible now
Benefits of Top-down approach
• Possible to intentionally limit search space
• Effective pruning with branch-and-bound
• Limit search space with search time deadline
2016/5/20 24
Cost-based Optimization Basics
Two major cost-based optimization style
• System-R
• Cost modeling with statistics
• Bottom-up search
• Volcano/Cascades
• Extensible optimizer generator
• Cost estimation is user’s responsibility
• Top-down transformational search
2016/5/20 25
Outline
• Introduction
• Theory: Query Optimization Framework
• Code: PostgreSQL Optimizer
• Theory: Cutting-Edge Technologies Overview
• Summary
2016/5/20 26
PostgreSQL Optimizer
“System-R style” optimization
• Bottom-up plan search with dynamic programming
• CPU and I/O operation based cost modeling
2016/5/20 27
Seq. I/O Random I/O CPU cost per tuple
Cost of single operation
• seq_page_cost
• random_page_cost
• cpu_tuple_cost
• cpu_index_tuple_cost
• cpu_operator_cost
• (parallel_tuple_cost)
Estimated number of each operation
• Cardinality estimation with
statistics
• Cost formula for each plan type
• SeqScan, IndexScan
• NestLoopJoin, HashJoin,
MergeJoin, ...
Detailed Look At Basic Scan Types
• Sequential scan
• Efficient for accessing large potion of tables
• Index scan
• Efficient for accessing a fraction of data
2016/5/20 28
Execution cost
Query selectivity
Sequential scan
of SeqScan
29
= (# pages in a table)
= (# tuples in a table)
= #qual_operator
= (#tuples) × (weight factor of A)
+ (#tuples) × (weight factor of B)
+ ・・・
・・・ WHERE AND AND ・・・A B
cost_seqscan()
@optimizer/path/costsize.c
of IndexScan
Consists of:
(A) CPU cost of searching B+-tree
(B) CPU cost of scanning index tuples in leaf pages
(C) I/O cost of leaf pages
(D) I/O cost of heap pages
(E) CPU cost of scanning heap tuples
2014/12/04 30
of IndexScan
(A) B+-tree search
(B) Scanning index tuples in leaf pages
2014/12/04 31
+= log2(#index_tuples)
I/O cost of internal pages
Assumed to be always cached in the buffer
+= #qual_operator
× #leaf_pages ✕ #ituple_per_page × σ
Selectivity σ
Comes from statistics
Mackert and Lohman function(Yao function)
I/O count estimation with consideration of buffer caching
of IndexScan
(C) I/O cost of index leaf pages
2014/12/04 32
+= Y(effective_cache_size, #leaf_pages)
Selectivity σ
I/Ocount
of IndexScan
(D) I/O cost of heap pages
(E) CPU cost of scanning heap tuples
・ Estimate the number of scanned tuples from σ
2014/12/04 33
+= α2 × #match_pages
Correlation between index and heap ordering: α
α = 0 : I/O pattern is random α = 1 : I/O pattern is sequential
+= (1-α2) × #match_tuples
Detailed Look At Join Methods
• Hash join
• Efficient for joining large number of records
• Usually combined with sequential scans
• Nested Loop Join
• Efficient for joining small number of records
• Usually combined with index scans or small table
sequential scans
2016/5/20 34
of HashJoin
2014/12/04 35
of HashJoin
Build phase
• Cost += Cost(inner)
2014/12/04 36
+= #qual_op × #inner_tuples
+= #inner_tuples
Hashing cost
of HashJoin
2014/12/04 37
Build phase
• Cost += Cost(inner)
+= #qual_op × #inner_tuples
+= #inner_tuples
of HashJoin
2014/12/04 38
Build phase
• Cost += Cost(inner)+
Probe phase
• Cost += Cost(outer)+
+= #qual_op × #inner_tuples
+= #inner_tuples
+= #qual_op × (1 + #bucket_size × 0.5)
× #outer_tuples
+= #match_tuples
Hashing & table lookup (bucket search) cost
recordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordtuple
of HashJoin
2014/12/04 39
#buckets: 2 #buckets : 4
build
4 tuples are compared for
lookup in average 2 tuples are compared for lookup
in average
0.E+00
5.E+06
1.E+07
2.E+07
2.E+07
3.E+07
3.E+07
4.E+07
4.E+07
10000 100000 1000000 10000000 100000000
Estimated cost of 2-way HashJoin
# of records
16 records
recordrecordrecordrecordrecordrecordrecordtuple
recordrecordrecordrecordrecordrecordrecordtuple
recordrecordrecordtuple
recordrecordrecordtuple
recordrecordrecordtuple
recordrecordrecordtuple
of NestLoopJoin
2014/12/04 40
R ⨝ S Scan R r2
r1
r3
r4
Scan S with r1
s1
s2
s3
ReScan S with r2
s1
s2
s3
outer inner
of NestLoopJoin
• When #outer_tuples = 1
2014/12/04 41
R ⨝ S Scan R
r1
Scan S with r1
s1
s2
s3
outer inner
Cost = Cost(outer) + Cost(inner) +
+= #inner_tuples
+= #qual_operator × #inner_tuples
of NestLoopJoin
• When #outer_tuples > 1
R ⨝ S Scan R r2
r1
r3
r4
Scan S with r1
s1
s2
s3
ReScan S with r2
s1
s2
s3
outer inner
Cost = Cost(outer) + Cost(inner) +
+ (#outer_tuples - 1) × Cost(ReScan inner)
Higher buffer hit ratio in ReScan
→ Cost of ReScan is lower than cost of IndexScan
+= #inner_tuples × #outer_tuples
+= #qual_operator × #inner_tuples × #outer_tuples
See How It Works
• TPC-H Benchmark
• Specification and tools for benchmarking data
warehouse workload
• Open source implementation: DBT-3, pg_tpch
• Schema, data generation rules and queries
• Experiments with 100GB
• Scale Factor = 100
2016/5/20 43
Experimental Setup
• Dell R720xd
• Xeon (2sockets, 16cores)
• x24 NL-SAS HDD
• With PostgreSQL 9.5
• Default cost parameter settings
• SeqScan & HashJoin
• enable_seqscan = on, enable_hashjoin = on
and disables other methods
• IndexScan & NestLoopJoin
• enable_indexscan = on, enable_nestloop = on
and disables other methods
2016/5/20 44
TPC-H Q.1: The Simplest Case
2016/5/20 45
5.E+05
5.E+06
5.E+07
5.E+08
1 10 100 1000
Estimated cost
Selectivity (l_shipdate)
IndexScan
SeqScan
10
100
1000
10000
1 10 100 1000
Execution time (sec)
Selectivity(l_shipdate)
IndexScan
SeqScan
• Good trend estimation for each
method
• Estimated break-event point is
errorneus
• IndexScan should be more
expensive (need parameter
calibration)
SELECT count(*), ... FROM lineitem
WHERE l_shipdate BETWEEN [X] AND [Y]
TPC-H Q.3
2016/5/20 46
Estimated cost
SeqScan
customer
SeqScan
orders
SeqScan
lineitem
Hash
HashJoin
HashJoin
IndexScan
orders
IndexScan
lineitem
NestLoop
NestLoop
IndexScan
customer
Execution time (sec)
Selectivity
1.E+00
1.E+02
1.E+04
1.E+06
1.E+08
1 10 100 1000 10000 100000 1000000
1
10
100
1000
10000
1 10 100 1000 10000 100000 1000000
Selectivity
NestLoop+IndexScan
HashJoin+SeqScan
NestLoop+IndexScan
HashJoin+SeqScan Similar result as in Q.1
• Good trend estimation for each
• Erroneous break-event point
without parameter calibration
SELECT count(*), ...
FROM customer, orders, lineitem
WHERE c_custkey = o_custkey AND
o_orderkey = l_orderkey AND
c_custkey < [X] AND
c_mktsegment = ‘MACHINERY’;
100
1000
10000
100000
1 10 100 1000 10000
More Complex Case
TPC-H Q.4: Semi-Join Query
• Plan selection for semi-
join tend to be unstable
2016/5/20 47Selectivity
1.E+05
1.E+06
1.E+07
1.E+08
1.E+09
1 10 100 1000 10000
HashJoin+SeqScan
NestLoop+IndexScan
Estimated cost
Execution time (sec)
HashJoin+SeqScan
NestLoop+IndexScan SELECT count(*), ...
FROM orders
WHERE
o_orderdate >= ‘1995-01-01’ AND
o_orderdate < ‘1995-01-01’
+ interval ‘3 month’ AND
EXISTS(
SELECT * FROM lineitem
WHERE l_orderkey = o_orderkey
AND l_commitdate < l_receiptdate)
Selectivity
1.E+06
1.E+07
1.E+08
1.E+09
1.E+10
1.E+11
1.E+00 1.E+02 1.E+04 1.E+06 1.E+08
More Complex Case
TPC-H Q.22: Anti-Join Query
• Difficulties in overall cost
trend estimation
2016/5/20 48
Selectivity
1
10
100
1000
10000
1.E+00 1.E+02 1.E+04 1.E+06 1.E+08
Estimated cost
Execution time (sec)
Selectivity
HashJoin+SeqScan
NestLoop+IndexScan
HashJoin+SeqScan
NestLoop+IndexScan
SELECT count(*), ...
FROM supplier, lineitem l1, orders, nation
WHERE s_suppkey = l1.l_suppkey AND
o_orderkey = l1.l_orderkey AND
o_orderstatus = 'F' AND
l1.l_receiptdate > l1.l_commitdate AND
EXISTS (
SELECT * FROM lineitem l2
WHERE l2.l_orderkey = l1.l_orderkey
AND l2.l_suppkey <> l1.l_suppkey)
AND NOT EXIST (
SELECT * FROM lineitem l3
WHERE l3.l_orderkey = l1.l_orderkey
AND l3.l_suppkey <> l1.l_suppkey
AND l3.l_receiptdate > l3.l_commitdate)
AND s_nationkey = n_nationkey
AND n_name = ‘JAPAN'
Summary: PostgreSQL
Optimizer
• Detailed look at cost modeling of basic methods
• SeqScan, IndexScan
• HashJoin, NestedLoopJoin
• Observation with TPC-H benchmark
• Good cost trend estimation for simple join queries
• Erroneous cheapest plan selection without parameter tuning
• Difficulties with semi-join and anti-join queries
2016/5/20 49
Outline
• Introduction
• Theory: Query Optimization Framework
• Code: PostgreSQL Optimizer
• Theory: Cutting-Edge Technologies Overview
• Summary
2016/5/20 50
Cutting-Edge Technologies
• Traditional optimization was a “closed” problem
• “Rethink the contract” ー Surajit Chaudhuri
• Feedback from previous execution
• Dynamic integration with execution
2016/5/20 51
cardinality
estimation
cost model
plan space
enumeration
(SQL) query plan
Mid-query Re-optimization
• Detects sub-optimality of executing query plan
• Query plans are annotated for later estimation
improvement
• Runtime statistics collection
• Statistics collector probes are inserted into operators of
executing query plan
• Plan modification strategy
• Discard current execution and re-optimize whole plan
• Re-optimizer only subtree of the plan that are not
started yet
• Save partial execution result and generate new SQL
using the result
2016/5/20 52
[N. Kabra et.al., SIGMOD’98]
Plan Bouquet
• Generate a set of plans for each selectivity range
• Estimation improvement with runtime statistics
collection
• Evaluation with PostgreSQL
2016/5/20 53
[A. Dutt et.al., SIGMOD’14]
Bounding Impact of Estimation Error
• “Uncertainty” analysis of cost estimation
• Optimality sensitivity to estimation error
• Execute partially to reduce uncertainty
2016/5/20 54
[T. Neumann et.al., BTW Conf ‘13]
Outline
• Introduction
• Theory: Query Optimization Framework
• Code: PostgreSQL Optimizer
• Theory: Cutting-Edge Technologies Overview
• Summary
2016/5/20 55
Summary
• Cost-based optimization framework
• System-R style bottom-up optimization
• Volcano style top-down optimization
• Detailed look at PostgreSQL optimizer
• Cost modeling of basic scan and join method
• Experiment with TPC-H benchmark
• Brief overview of cutting-edge technologies
2016/5/20 56

More Related Content

PDF
性能測定道 事始め編
PDF
性能測定道 実践編
PDF
Vacuum徹底解説
PDF
より深く知るオプティマイザとそのチューニング
PDF
PostgreSQLアーキテクチャ入門
PDF
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
PDF
PostgreSQL: XID周回問題に潜む別の問題
PPTX
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
性能測定道 事始め編
性能測定道 実践編
Vacuum徹底解説
より深く知るオプティマイザとそのチューニング
PostgreSQLアーキテクチャ入門
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
PostgreSQL: XID周回問題に潜む別の問題
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~

What's hot (20)

PPTX
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
PDF
Parquet performance tuning: the missing guide
PPTX
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
PPTX
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
PDF
pg_hint_planを知る(第37回PostgreSQLアンカンファレンス@オンライン 発表資料)
PDF
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
PDF
PostgreSQLアンチパターン
PDF
PostgreSQL:行数推定を読み解く
PDF
PostgreSQLの運用・監視にまつわるエトセトラ
PDF
Solving PostgreSQL wicked problems
PDF
PostgreSQLバックアップの基本
PDF
Designing Structured Streaming Pipelines—How to Architect Things Right
PDF
MySQLで論理削除と正しく付き合う方法
PDF
5 Steps to PostgreSQL Performance
PDF
Parquet - Data I/O - Philadelphia 2013
PDF
Building a SIMD Supported Vectorized Native Engine for Spark SQL
PDF
InnoDB Locking Explained with Stick Figures
PDF
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PDF
Apache Spark Overview
PDF
Delta Lake: Optimizing Merge
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai
Parquet performance tuning: the missing guide
PostgreSQLの統計情報について(第26回PostgreSQLアンカンファレンス@オンライン 発表資料)
PostgreSQL 12は ここがスゴイ! ~性能改善やpluggable storage engineなどの新機能を徹底解説~ (NTTデータ テクノ...
pg_hint_planを知る(第37回PostgreSQLアンカンファレンス@オンライン 発表資料)
データインターフェースとしてのHadoop ~HDFSとクラウドストレージと私~ (NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
PostgreSQLアンチパターン
PostgreSQL:行数推定を読み解く
PostgreSQLの運用・監視にまつわるエトセトラ
Solving PostgreSQL wicked problems
PostgreSQLバックアップの基本
Designing Structured Streaming Pipelines—How to Architect Things Right
MySQLで論理削除と正しく付き合う方法
5 Steps to PostgreSQL Performance
Parquet - Data I/O - Philadelphia 2013
Building a SIMD Supported Vectorized Native Engine for Spark SQL
InnoDB Locking Explained with Stick Figures
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
Apache Spark Overview
Delta Lake: Optimizing Merge
Ad

Similar to Beyond EXPLAIN: Query Optimization From Theory To Code (20)

PDF
unit 3 DBMS.docx.pdf geometric transformer in query processing
PDF
unit 3 DBMS.docx.pdf geometry in query p
PDF
CH5_Query Processing and Optimization.pdf
PDF
Issues in Query Processing and Optimization
PPTX
CS 542 -- Query Optimization
PDF
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
PDF
dd presentation.pdf
PDF
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
PDF
01Query Processing and Optimization-SUM25.pdf
PPTX
Query optimization
PDF
Cost-Based Optimizer in Apache Spark 2.2
PDF
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
PPT
Query optimization
PDF
Query optimization in oodbms identifying subquery for query management
PPTX
CS 542 -- Query Execution
PDF
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
DOCX
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
PDF
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
PDF
Cost-Based query optimization
PDF
Cost-based Query Optimization
unit 3 DBMS.docx.pdf geometric transformer in query processing
unit 3 DBMS.docx.pdf geometry in query p
CH5_Query Processing and Optimization.pdf
Issues in Query Processing and Optimization
CS 542 -- Query Optimization
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
dd presentation.pdf
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
01Query Processing and Optimization-SUM25.pdf
Query optimization
Cost-Based Optimizer in Apache Spark 2.2
QUERY OPTIMIZATION IN OODBMS: IDENTIFYING SUBQUERY FOR COMPLEX QUERY MANAGEMENT
Query optimization
Query optimization in oodbms identifying subquery for query management
CS 542 -- Query Execution
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-Based query optimization
Cost-based Query Optimization
Ad

More from Yuto Hayamizu (7)

PDF
VLDB2015 会議報告
PDF
The Art of Performance Evaluation
PDF
Fine-grained Measurement of PostgreSQL
PDF
指の話
PDF
Emacs Lispのある生活
PDF
いいかげんな人のためのTransactional Memory Primer
PDF
C-x C-f のなかのひと
VLDB2015 会議報告
The Art of Performance Evaluation
Fine-grained Measurement of PostgreSQL
指の話
Emacs Lispのある生活
いいかげんな人のためのTransactional Memory Primer
C-x C-f のなかのひと

Recently uploaded (20)

PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Electronic commerce courselecture one. Pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
Cloud computing and distributed systems.
PDF
KodekX | Application Modernization Development
PDF
Empathic Computing: Creating Shared Understanding
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Machine learning based COVID-19 study performance prediction
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Encapsulation theory and applications.pdf
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Chapter 3 Spatial Domain Image Processing.pdf
“AI and Expert System Decision Support & Business Intelligence Systems”
Electronic commerce courselecture one. Pdf
Digital-Transformation-Roadmap-for-Companies.pptx
Cloud computing and distributed systems.
KodekX | Application Modernization Development
Empathic Computing: Creating Shared Understanding
Understanding_Digital_Forensics_Presentation.pptx
Machine learning based COVID-19 study performance prediction
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Encapsulation theory and applications.pdf
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Advanced methodologies resolving dimensionality complications for autism neur...
Unlocking AI with Model Context Protocol (MCP)
NewMind AI Weekly Chronicles - August'25 Week I
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Per capita expenditure prediction using model stacking based on satellite ima...
MYSQL Presentation for SQL database connectivity
Mobile App Security Testing_ A Comprehensive Guide.pdf
Chapter 3 Spatial Domain Image Processing.pdf

Beyond EXPLAIN: Query Optimization From Theory To Code

  • 1. Beyond EXPLAIN Yuto Hayamizu Ryoji Kawamichi Query Optimization From Theory To Code 2016/5/20 PGCon 2016 @ Ottawa
  • 2. Before Relational … • Querying was physical • Need to understand physical organization • Navigate query execution by yourself 2016/5/20 2 DEPT EMP PROJ “Which file is this table stored in?” “How are records linked?” “Which access path is fast for this table?” “What is the best order of joining tables” … Historically …
  • 3. Before Relational … • Querying was physical • Need to understand physical organization • Navigate query execution by yourself 2016/5/20 3 After Relational … DEPT EMP PROJ “Which file is this table stored in?” “How are records linked?” “Which access path is fast for this table?” “What is the best order of joining tables” … • Querying is logical • Physical organization is black-boxed • Just declare what you want Historically …
  • 4. Fill the Gap: Physical and Logical 2016/5/20 4 SELECT * FROM DEPT D, EMP E WHERE E.D_ID = D.ID AND ... • Storage I/O strategy • Access path selection • Join method selection • Aggregation, sorting • Resource allocation • ... Query Optimizer
  • 5. If optimizer perfectly fills the gap... 2016/5/20 5 We don’t need EXPLAIN
  • 6. Reality Is Tough • Optimizer is NOT PERFECT • Generated plans are not always optimal, sometimes far from optimal • We have to take care of physical behavior • That’s why EXPLAIN is so much explained 2016/5/20 6
  • 7. Go Beyond EXPLAIN • Deeper understanding of optimization, better control of your databases • Theoretical fundamentals of query optimization • From basic framework to cutting-edge technologies • PostgreSQL Optimizer implementation • Focusing on basic scan and join methods • Behavior observation with TPC-H benchmark 2016/5/20 7
  • 8. Outline • Introduction • Theory: Query Optimization Framework • Code: PostgreSQL Optimizer • Theory: Cutting-Edge Technologies Overview • Summary 2016/5/20 8
  • 9. Query Optimization Framework • Cost-based optimization • Plan selection with estimated execution cost • Most of modern optimizers, including PostgreSQL, are cost-based • Rule-based optimization • Plan selection with heuristically ranked rules • Easy to produce the same result • Hard to evaluate wide variety of plans • Ex) Oracle (~10g), Hive (~0.13) 2016/5/20 9
  • 10. Main Challenges in Cost-based Optimization • Cost modeling is HARD • Overhead of CPU, I/O, memory access, network, … • Cardinality estimation is HARD • Output size of scans, joins, aggregations, … • Join ordering search is HARD • Combinatorial explosion of join ordering and access path • Exhaustive search is NP-hard 2016/5/20 10
  • 11. System-R optimizer (1979) • “The standard” • Cost estimation with I/O and CPU • Cardinality estimation with table statistics • Bottom-up plan search • Many of modern optimizers are “System-R style” • PostgreSQL, MySQL, DB2, Oracle, ... 2016/5/20 11
  • 12. Cost/Cardinality Estimation • [#page fetched],[#storage API calls] are estimated with cost formula and following statistics 2016/5/20 12 CPU costI/O cost COST = [#page fetched] + W * [#storage API calls] weight parameter • NCARD(T) ... the cardinality of relation T • TCARD(T) ... the number of pages in relation T • ICARD(I) ... the number of distinct keys in index I • NINDX(I) ... the number of pages in index I
  • 13. Bottom-up Plan Search • Candidate plans for single relation • The cheapest access path • N-relation join ordering search • Select the cheapest plans for each relation • Then, find optimal join orderings of every 2-relation join • Then, find optimal join orderings of every 3-relation join • ... until N-relation 2016/5/20 13
  • 14. Ex) A ⨝ B ⨝ C ⨝ D
  • 15. Ex) A ⨝ B ⨝ C ⨝ D
  • 16. Ex) A ⨝ B ⨝ C ⨝ D
  • 17. Ex) A ⨝ B ⨝ C ⨝ D
  • 18. Ex) A ⨝ B ⨝ C ⨝ D
  • 19. Volcano/Cascades (1993) • Top-down transformational plan search • Yet another optimization approach • Not well known as “System-R style”, but widely used in practice Ex) SQL Server, Apache Hive (Apache Calcite), Greenplum Orca • Extensible optimization framework 2016/5/20 19
  • 20. Extensible Optimization Framework Query Optimizer Generator • Generalized expression of query plan not limited to relational data model • Users (optimizer developers) defines actual implementations: • Logical operator ... corresponds to relational algebra • Physical algorithm ... corresponds to scan & join methods such as sequential scan, index scan, hash join, nested loop join 2016/5/20 20
  • 21. Top-down Transformational Search • Starts from an initial “logical plan” • Generate alternative plans with: A) Logical operator transformation B) Physical algorithm selection C) Enforcing sorting order 2016/5/20 21 Join Select T Join Select R Select S Proj Join Select T JoinSelect R Select S Proj Change join ordering Join Select T Join Select R Select SProj Projection push down Example: 3-way join with projection
  • 22. Top-down Transformational Search • Starts from an initial “logical plan” • Generate alternative plans with: A) Logical operator transformation B) Physical algorithm selection C) Enforcing sorting order 2016/5/20 22 Join Select T Join Select R Select S ProjExample: 3-way join with projection HashJoin Select T Join SeqScan R SeqScan S Proj Join IdxScan T Join Select R Select S Proj …
  • 23. Top-down Transformational Search • Starts from an initial “logical plan” • Generate alternative plans with: A) Logical operator transformation B) Physical algorithm selection C) Enforcing sorting order 2016/5/20 23 Join Select T Join Select R Select S Proj Example: 3-way join with projection Join Select T Join Sort Sort Proj Select R Select S Enforce sorting order merge join of R and S is possible now
  • 24. Benefits of Top-down approach • Possible to intentionally limit search space • Effective pruning with branch-and-bound • Limit search space with search time deadline 2016/5/20 24
  • 25. Cost-based Optimization Basics Two major cost-based optimization style • System-R • Cost modeling with statistics • Bottom-up search • Volcano/Cascades • Extensible optimizer generator • Cost estimation is user’s responsibility • Top-down transformational search 2016/5/20 25
  • 26. Outline • Introduction • Theory: Query Optimization Framework • Code: PostgreSQL Optimizer • Theory: Cutting-Edge Technologies Overview • Summary 2016/5/20 26
  • 27. PostgreSQL Optimizer “System-R style” optimization • Bottom-up plan search with dynamic programming • CPU and I/O operation based cost modeling 2016/5/20 27 Seq. I/O Random I/O CPU cost per tuple Cost of single operation • seq_page_cost • random_page_cost • cpu_tuple_cost • cpu_index_tuple_cost • cpu_operator_cost • (parallel_tuple_cost) Estimated number of each operation • Cardinality estimation with statistics • Cost formula for each plan type • SeqScan, IndexScan • NestLoopJoin, HashJoin, MergeJoin, ...
  • 28. Detailed Look At Basic Scan Types • Sequential scan • Efficient for accessing large potion of tables • Index scan • Efficient for accessing a fraction of data 2016/5/20 28 Execution cost Query selectivity Sequential scan
  • 29. of SeqScan 29 = (# pages in a table) = (# tuples in a table) = #qual_operator = (#tuples) × (weight factor of A) + (#tuples) × (weight factor of B) + ・・・ ・・・ WHERE AND AND ・・・A B cost_seqscan() @optimizer/path/costsize.c
  • 30. of IndexScan Consists of: (A) CPU cost of searching B+-tree (B) CPU cost of scanning index tuples in leaf pages (C) I/O cost of leaf pages (D) I/O cost of heap pages (E) CPU cost of scanning heap tuples 2014/12/04 30
  • 31. of IndexScan (A) B+-tree search (B) Scanning index tuples in leaf pages 2014/12/04 31 += log2(#index_tuples) I/O cost of internal pages Assumed to be always cached in the buffer += #qual_operator × #leaf_pages ✕ #ituple_per_page × σ Selectivity σ Comes from statistics
  • 32. Mackert and Lohman function(Yao function) I/O count estimation with consideration of buffer caching of IndexScan (C) I/O cost of index leaf pages 2014/12/04 32 += Y(effective_cache_size, #leaf_pages) Selectivity σ I/Ocount
  • 33. of IndexScan (D) I/O cost of heap pages (E) CPU cost of scanning heap tuples ・ Estimate the number of scanned tuples from σ 2014/12/04 33 += α2 × #match_pages Correlation between index and heap ordering: α α = 0 : I/O pattern is random α = 1 : I/O pattern is sequential += (1-α2) × #match_tuples
  • 34. Detailed Look At Join Methods • Hash join • Efficient for joining large number of records • Usually combined with sequential scans • Nested Loop Join • Efficient for joining small number of records • Usually combined with index scans or small table sequential scans 2016/5/20 34
  • 36. of HashJoin Build phase • Cost += Cost(inner) 2014/12/04 36 += #qual_op × #inner_tuples += #inner_tuples Hashing cost
  • 37. of HashJoin 2014/12/04 37 Build phase • Cost += Cost(inner) += #qual_op × #inner_tuples += #inner_tuples
  • 38. of HashJoin 2014/12/04 38 Build phase • Cost += Cost(inner)+ Probe phase • Cost += Cost(outer)+ += #qual_op × #inner_tuples += #inner_tuples += #qual_op × (1 + #bucket_size × 0.5) × #outer_tuples += #match_tuples Hashing & table lookup (bucket search) cost
  • 39. recordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordrecordtuple of HashJoin 2014/12/04 39 #buckets: 2 #buckets : 4 build 4 tuples are compared for lookup in average 2 tuples are compared for lookup in average 0.E+00 5.E+06 1.E+07 2.E+07 2.E+07 3.E+07 3.E+07 4.E+07 4.E+07 10000 100000 1000000 10000000 100000000 Estimated cost of 2-way HashJoin # of records 16 records recordrecordrecordrecordrecordrecordrecordtuple recordrecordrecordrecordrecordrecordrecordtuple recordrecordrecordtuple recordrecordrecordtuple recordrecordrecordtuple recordrecordrecordtuple
  • 40. of NestLoopJoin 2014/12/04 40 R ⨝ S Scan R r2 r1 r3 r4 Scan S with r1 s1 s2 s3 ReScan S with r2 s1 s2 s3 outer inner
  • 41. of NestLoopJoin • When #outer_tuples = 1 2014/12/04 41 R ⨝ S Scan R r1 Scan S with r1 s1 s2 s3 outer inner Cost = Cost(outer) + Cost(inner) + += #inner_tuples += #qual_operator × #inner_tuples
  • 42. of NestLoopJoin • When #outer_tuples > 1 R ⨝ S Scan R r2 r1 r3 r4 Scan S with r1 s1 s2 s3 ReScan S with r2 s1 s2 s3 outer inner Cost = Cost(outer) + Cost(inner) + + (#outer_tuples - 1) × Cost(ReScan inner) Higher buffer hit ratio in ReScan → Cost of ReScan is lower than cost of IndexScan += #inner_tuples × #outer_tuples += #qual_operator × #inner_tuples × #outer_tuples
  • 43. See How It Works • TPC-H Benchmark • Specification and tools for benchmarking data warehouse workload • Open source implementation: DBT-3, pg_tpch • Schema, data generation rules and queries • Experiments with 100GB • Scale Factor = 100 2016/5/20 43
  • 44. Experimental Setup • Dell R720xd • Xeon (2sockets, 16cores) • x24 NL-SAS HDD • With PostgreSQL 9.5 • Default cost parameter settings • SeqScan & HashJoin • enable_seqscan = on, enable_hashjoin = on and disables other methods • IndexScan & NestLoopJoin • enable_indexscan = on, enable_nestloop = on and disables other methods 2016/5/20 44
  • 45. TPC-H Q.1: The Simplest Case 2016/5/20 45 5.E+05 5.E+06 5.E+07 5.E+08 1 10 100 1000 Estimated cost Selectivity (l_shipdate) IndexScan SeqScan 10 100 1000 10000 1 10 100 1000 Execution time (sec) Selectivity(l_shipdate) IndexScan SeqScan • Good trend estimation for each method • Estimated break-event point is errorneus • IndexScan should be more expensive (need parameter calibration) SELECT count(*), ... FROM lineitem WHERE l_shipdate BETWEEN [X] AND [Y]
  • 46. TPC-H Q.3 2016/5/20 46 Estimated cost SeqScan customer SeqScan orders SeqScan lineitem Hash HashJoin HashJoin IndexScan orders IndexScan lineitem NestLoop NestLoop IndexScan customer Execution time (sec) Selectivity 1.E+00 1.E+02 1.E+04 1.E+06 1.E+08 1 10 100 1000 10000 100000 1000000 1 10 100 1000 10000 1 10 100 1000 10000 100000 1000000 Selectivity NestLoop+IndexScan HashJoin+SeqScan NestLoop+IndexScan HashJoin+SeqScan Similar result as in Q.1 • Good trend estimation for each • Erroneous break-event point without parameter calibration SELECT count(*), ... FROM customer, orders, lineitem WHERE c_custkey = o_custkey AND o_orderkey = l_orderkey AND c_custkey < [X] AND c_mktsegment = ‘MACHINERY’;
  • 47. 100 1000 10000 100000 1 10 100 1000 10000 More Complex Case TPC-H Q.4: Semi-Join Query • Plan selection for semi- join tend to be unstable 2016/5/20 47Selectivity 1.E+05 1.E+06 1.E+07 1.E+08 1.E+09 1 10 100 1000 10000 HashJoin+SeqScan NestLoop+IndexScan Estimated cost Execution time (sec) HashJoin+SeqScan NestLoop+IndexScan SELECT count(*), ... FROM orders WHERE o_orderdate >= ‘1995-01-01’ AND o_orderdate < ‘1995-01-01’ + interval ‘3 month’ AND EXISTS( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey AND l_commitdate < l_receiptdate) Selectivity
  • 48. 1.E+06 1.E+07 1.E+08 1.E+09 1.E+10 1.E+11 1.E+00 1.E+02 1.E+04 1.E+06 1.E+08 More Complex Case TPC-H Q.22: Anti-Join Query • Difficulties in overall cost trend estimation 2016/5/20 48 Selectivity 1 10 100 1000 10000 1.E+00 1.E+02 1.E+04 1.E+06 1.E+08 Estimated cost Execution time (sec) Selectivity HashJoin+SeqScan NestLoop+IndexScan HashJoin+SeqScan NestLoop+IndexScan SELECT count(*), ... FROM supplier, lineitem l1, orders, nation WHERE s_suppkey = l1.l_suppkey AND o_orderkey = l1.l_orderkey AND o_orderstatus = 'F' AND l1.l_receiptdate > l1.l_commitdate AND EXISTS ( SELECT * FROM lineitem l2 WHERE l2.l_orderkey = l1.l_orderkey AND l2.l_suppkey <> l1.l_suppkey) AND NOT EXIST ( SELECT * FROM lineitem l3 WHERE l3.l_orderkey = l1.l_orderkey AND l3.l_suppkey <> l1.l_suppkey AND l3.l_receiptdate > l3.l_commitdate) AND s_nationkey = n_nationkey AND n_name = ‘JAPAN'
  • 49. Summary: PostgreSQL Optimizer • Detailed look at cost modeling of basic methods • SeqScan, IndexScan • HashJoin, NestedLoopJoin • Observation with TPC-H benchmark • Good cost trend estimation for simple join queries • Erroneous cheapest plan selection without parameter tuning • Difficulties with semi-join and anti-join queries 2016/5/20 49
  • 50. Outline • Introduction • Theory: Query Optimization Framework • Code: PostgreSQL Optimizer • Theory: Cutting-Edge Technologies Overview • Summary 2016/5/20 50
  • 51. Cutting-Edge Technologies • Traditional optimization was a “closed” problem • “Rethink the contract” ー Surajit Chaudhuri • Feedback from previous execution • Dynamic integration with execution 2016/5/20 51 cardinality estimation cost model plan space enumeration (SQL) query plan
  • 52. Mid-query Re-optimization • Detects sub-optimality of executing query plan • Query plans are annotated for later estimation improvement • Runtime statistics collection • Statistics collector probes are inserted into operators of executing query plan • Plan modification strategy • Discard current execution and re-optimize whole plan • Re-optimizer only subtree of the plan that are not started yet • Save partial execution result and generate new SQL using the result 2016/5/20 52 [N. Kabra et.al., SIGMOD’98]
  • 53. Plan Bouquet • Generate a set of plans for each selectivity range • Estimation improvement with runtime statistics collection • Evaluation with PostgreSQL 2016/5/20 53 [A. Dutt et.al., SIGMOD’14]
  • 54. Bounding Impact of Estimation Error • “Uncertainty” analysis of cost estimation • Optimality sensitivity to estimation error • Execute partially to reduce uncertainty 2016/5/20 54 [T. Neumann et.al., BTW Conf ‘13]
  • 55. Outline • Introduction • Theory: Query Optimization Framework • Code: PostgreSQL Optimizer • Theory: Cutting-Edge Technologies Overview • Summary 2016/5/20 55
  • 56. Summary • Cost-based optimization framework • System-R style bottom-up optimization • Volcano style top-down optimization • Detailed look at PostgreSQL optimizer • Cost modeling of basic scan and join method • Experiment with TPC-H benchmark • Brief overview of cutting-edge technologies 2016/5/20 56