Sergei Petrunia <sergey@mariadb.com>
Igor Babaev <igor@mariadb.com>
M|18, February 2018
Understanding
the Query Optimizer
Optimizations for VIEWs,
derived tables, and CTEs
3
Plan
● Earlier versions: derived table merging
● MariaDB 10.2: Condition pushdown
● MariaDB 10.3: Condition pushdown through window functions
● MariaDB 10.3: GROUP BY splitting
4
Background – derived table merge
● “Customers and their big orders from October”
select *
from
customers,
(select *
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1000000 and
OCT_ORDERS.customer_id = customers.customer_id
5
Naive execution
select *
from
customers,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1000000 and
OCT_ORDERS.customer_id =
customers.customer_id
(Diagram: step 1 computes OCT_ORDERS from orders; step 2 joins OCT_ORDERS with customers and applies amount > 1000000.)
6
Derived table merge
select *
from
customers,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1000000 and
OCT_ORDERS.customer_id =
customers.customer_id
● After the merge, the optimizer works with the equivalent query:
select *
from
customers,
orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
and
orders.amount > 1000000 and
orders.customer_id =
customers.customer_id
7
Execution after merge
(Diagram: customers is joined directly with orders; the conditions "made in October" and amount > 1000000 are applied to orders.)
select *
from
customers,
orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
and
orders.amount > 1000000 and
orders.customer_id =
customers.customer_id
● Allows the optimizer to join customers→orders or orders→customers
● Good for optimization
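One way to observe the merge is to compare EXPLAIN output with merging switched off and on. This is a sketch only: it assumes the customers and orders tables from the example exist, and that the server exposes the `derived_merge` flag of @@optimizer_switch.

```sql
-- Sketch only: assumes the customers/orders tables from the slides.
-- With merging disabled, EXPLAIN is expected to show a DERIVED select
-- and a <derivedN> table standing in for OCT_ORDERS.
set optimizer_switch='derived_merge=off';
explain
select *
from
  customers,
  (select *
   from orders
   where order_date BETWEEN '2017-10-01' and '2017-10-31'
  ) as OCT_ORDERS
where
  OCT_ORDERS.amount > 1000000 and
  OCT_ORDERS.customer_id = customers.customer_id;

-- With merging enabled (the default), the plan is expected to list
-- only customers and orders, in whichever join order is cheaper.
set optimizer_switch='derived_merge=on';
explain
select *
from
  customers,
  (select *
   from orders
   where order_date BETWEEN '2017-10-01' and '2017-10-31'
  ) as OCT_ORDERS
where
  OCT_ORDERS.amount > 1000000 and
  OCT_ORDERS.customer_id = customers.customer_id;
```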
8
Another use case - grouping
● Can’t merge due to GROUP BY in the child.
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
select * from OCT_TOTALS where customer_id=1
9
Execution is inefficient
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
select * from OCT_TOTALS where customer_id=1
(Diagram: step 1 computes all totals from orders (Sum) into OCT_TOTALS; step 2 gets the row with customer_id=1 from OCT_TOTALS. For step 2, “derived_with_keys” will build/use an index here.)
10
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
● Can push down conditions on GROUP BY columns
● … to filter out rows that go into groups we don't care about
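What the pushdown amounts to can be written out by hand. The sketch below assumes the OCT_TOTALS view and the orders table from the example. The flag name `condition_pushdown_for_derived` is the @@optimizer_switch setting that the MariaDB documentation lists for this optimization.

```sql
-- The query as written by the user:
select * from OCT_TOTALS where customer_id=1;

-- Roughly what is executed after pushdown: customer_id=1 is checked
-- before grouping, so rows of other customers never reach SUM().
select
  customer_id,
  SUM(amount) as TOTAL_AMT
from orders
where
  order_date BETWEEN '2017-10-01' and '2017-10-31' and
  customer_id=1
group by
  customer_id;

-- To compare plans, the optimization can be switched off for the session:
set optimizer_switch='condition_pushdown_for_derived=off';
explain select * from OCT_TOTALS where customer_id=1;
set optimizer_switch='condition_pushdown_for_derived=on';
```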
11
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
(Diagram: step 1 finds the orders rows with customer_id=1; only those rows are summed into OCT_TOTALS.)
● Looking only at rows you’re interested in is much more efficient
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
12
Condition Pushdown into HAVING
select *
from OCT_TOTALS
where TOTAL_AMT > 1000000
● Conditions that cannot be pushed through GROUP BY will be pushed into the HAVING clause
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
having
TOTAL_AMT > 1000000
13
Condition pushdown will also push inferred conditions
select
customer.customer_name,
TOTAL_AMT
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
customer.customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
● Inferred condition, pushed into the view: OCT_TOTALS.customer_id=1
14
“Split grouping for derived”
select *
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1', 'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
15
Execution, the old way
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1',
'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
(Diagram: Sum over orders fills OCT_TOTALS with a row per customer (Customer 1, Customer 2, Customer 3, Customer 100 are pictured); the join with customer then keeps only Customer 1 and Customer 2.)
● Inefficient: OCT_TOTALS is computed for *all* customers.
16
Split grouping execution
(Diagram: for each matching row in customer, only that customer's rows in orders are read and summed, one Sum per group.)
● Can be used when doing a join from customer to orders
● Must have equalities for the GROUP BY columns: OCT_TOTALS.customer_id=customer.customer_id
– This allows selecting one group
● The underlying table (orders) must have an index on the GROUP BY column (customer_id)
– This allows using ref access
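The two prerequisites can be sketched as follows. The index name `customer_id` is taken from the EXPLAIN output in this deck. The hand-written correlated query only illustrates the "one group per outer row" idea and is not the plan the optimizer builds; unlike the join, it also returns selected customers that have no October orders (with a NULL total).

```sql
-- Prerequisite: an index on the GROUP BY column of the underlying table.
alter table orders add index customer_id (customer_id);

-- Illustration: compute the total per selected customer, one group at a
-- time, instead of grouping the whole orders table first.
select
  customer.customer_id,
  customer.customer_name,
  (select SUM(orders.amount)
   from orders
   where
     orders.customer_id=customer.customer_id and
     orders.order_date BETWEEN '2017-10-01' and '2017-10-31'
  ) as TOTAL_AMT
from customer
where
  customer.customer_name IN ('Customer 1', 'Customer 2');
```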
17
Split grouping execution
● EXPLAIN shows “LATERAL DERIVED”
● @@optimizer_switch flag: split_materialized (ON by default)
● Cost-based choice of whether to use lateralization
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1',
'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | |
| 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
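To see the effect of the switch, the plan can be compared with the optimization off and on. This sketch assumes the customer table and OCT_TOTALS view from the example, and uses the flag name `split_materialized` as spelled in the MariaDB documentation of @@optimizer_switch. The expectations in the comments are not captured output.

```sql
-- Assumes the customer table and OCT_TOTALS view from the slides.
set optimizer_switch='split_materialized=off';
explain
select *
from
  customer, OCT_TOTALS
where
  customer.customer_id=OCT_TOTALS.customer_id and
  customer.customer_name IN ('Customer 1', 'Customer 2');
-- Expected: select_type DERIVED rather than LATERAL DERIVED for orders.

set optimizer_switch='split_materialized=on';
explain
select *
from
  customer, OCT_TOTALS
where
  customer.customer_id=OCT_TOTALS.customer_id and
  customer.customer_name IN ('Customer 1', 'Customer 2');
-- LATERAL DERIVED appears only if the cost estimate favors splitting.
```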
18
Summary
● MariaDB 10.2: Condition pushdown for derived tables
– Pushes a condition into a derived table
– Used when the derived table cannot be merged
– Biggest effect is for subqueries with GROUP BY
● MariaDB 10.3: Condition pushdown through window functions
● MariaDB 10.3: Lateral derived optimization
– When doing a join, condition pushdown cannot be used
– So lateral derived is used: it examines only the GROUP BY groups that match rows from the other tables. The GROUP BY columns must be indexed
Window Functions Optimizations
20
Plan
● What are window functions
● Using window functions is an optimization by itself
● Condition pushdown through PARTITION BY
● Doing fewer sorts
21
Window functions basics
● Window functions are like aggregate functions,
– Each with its own GROUP BY clause
● Except that
– The groups are ordered
– The groups are not collapsed
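The difference can be seen by running both forms against the Cities table used in this deck's examples. The columns name, country and population come from those examples; the alias country_total is made up here.

```sql
-- Aggregate: one row per country, the individual cities are collapsed.
select
  country,
  sum(population) as total
from Cities
group by country;

-- Window function: same grouping via PARTITION BY, but every city row
-- is kept and carries the total of its country.
select
  name,
  country,
  population,
  sum(population) over (partition by country) as country_total
from Cities;
```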
22
Aggregate function example
select
country, sum(Population) as total
from Cities
group by country
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+---------+----------+
| country | total |
+---------+----------+
| DEU | 4030488 |
| RUS | 8389200 |
| USA | 11467668 |
+---------+----------+
23
Window function example
select
name,
rank() over (partition by country
order by population desc)
from cities
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+-----------+------+
| name | rank |
+-----------+------+
| Berlin | 1 |
| Frankfurt | 2 |
| Moscow | 1 |
| New York | 1 |
| Chicago | 2 |
| Seattle | 3 |
+-----------+------+
25
Window function computation
● Can look at
– Current row
– Rows in the partition, ordered
● Can compute the window function
● Computing values individually would
be expensive
– O(#rows_in_partition ^ 2)
26
Window function computation
● Many functions can be computed “on
the fly”
– RANK, ROW_NUMBER
– SUM, AVG
– ...
27
Window function computation
● Example: a running $total is kept; when the next row has amount 10, the new value is $total+10
SELECT
SUM(amount) OVER (ORDER BY date
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW)
AS cur_balance
FROM
transactions
28
Compare to non-window function
● Typically uses a correlated subquery
SELECT
(SELECT SUM(amount)
FROM
transactions t
WHERE
t.date <= cur.date AND
t.account_id = 12345
) AS cur_balance
FROM
transactions cur
● N^2 complexity
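For comparison, here is a window-function form of the same running balance. It assumes the intent is the balance of account 12345 only; the WHERE clause on the outer query is an assumption, since the slide filters only inside the subquery. With ROWS framing, rows that share the same date get distinct running values, whereas the `<=` comparison in the subquery gives them the same value.

```sql
SELECT
  date,
  amount,
  SUM(amount) OVER (ORDER BY date
                    ROWS BETWEEN
                      UNBOUNDED PRECEDING
                      AND CURRENT ROW)
    AS cur_balance
FROM
  transactions
WHERE
  account_id = 12345;
```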
29
Performance comparison
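+--------+---------------+-----------------+
| # Rows | Regular SQL   | Window Function |
+--------+---------------+-----------------+
| 100    | 3.72 sec      | 0.01 sec        |
| 500    | 30.04 sec     | 0.01 sec        |
| 1000   | 59.6 sec      | 0.02 sec        |
| 2000   | 1 min 59 sec  | 0.03 sec        |
| 4000   | 4 min 1 sec   | 0.04 sec        |
| 16000  | 18 min 26 sec | 0.18 sec        |
+--------+---------------+-----------------+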
30
MariaDB 10.3: Pushdown through Window Functions
● “Customer’s biggest orders”
create view top_three_orders as
select *
from
(
select
customer_id,
amount,
rank() over (partition by customer_id
order by amount desc
) as order_rank
from orders
) as ordered_orders
where order_rank<=3
select * from top_three_orders where customer_id=1
+-------------+--------+------------+
| customer_id | amount | order_rank |
+-------------+--------+------------+
| 1 | 10000 | 1 |
| 1 | 9500 | 2 |
| 1 | 400 | 3 |
| 2 | 3200 | 1 |
| 2 | 1000 | 2 |
| 2 | 400 | 3 |
...
31
MariaDB 10.3: Pushdown through Window Functions
select * from top_three_orders where customer_id=1
MariaDB 10.2, MySQL 8.0
● Compute top_three_orders for all customers
● Select rows with customer_id=1
MariaDB 10.3 (and e.g. PostgreSQL)
● Only compute top_three_orders for customer_id=1
– This can be much faster!
– Can make use of index(customer_id)
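Written out by hand, the pushed-down form looks roughly like the sketch below. It is valid because customer_id is the PARTITION BY column, so removing other customers' rows cannot change the ranks within the partition for customer 1. The cutoff is written as order_rank<=3 on the assumption that three orders per customer are wanted, as the view name suggests.

```sql
select *
from
  (
    select
      customer_id,
      amount,
      rank() over (partition by customer_id
                   order by amount desc
                  ) as order_rank
    from orders
    where customer_id=1   -- condition moved below the window function
  ) as ordered_orders
where order_rank<=3;
```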
32
Doing fewer sorts
(Diagram: three tables are joined, and the join result is then sorted for the window functions.)
select
rank() over (order by incidents),
ntile(4) over (order by incidents),
rank() over (order by ...)
from
support_staff
● Each window function requires a sort
● Could avoid sorting if using an index (not supported yet)
● Identical PARTITION/ORDER BY must share the sort step
– Compatible may share the sort step (supported)
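A query in which two window functions have an identical specification, and so need only one sort, might look like this. The named-window form is just one way to make the shared specification explicit, and it assumes the WINDOW clause is available in the server version at hand. The column aliases are invented for the example.

```sql
-- Both functions use the same window w, i.e. the same ORDER BY,
-- so a single sort on incidents can serve both.
select
  rank()   over w as incident_rank,
  ntile(4) over w as incident_quartile
from
  support_staff
window w as (order by incidents);
```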
33
Window function optimization conclusions
● Using window functions is an optimization by itself
● Condition pushdown through PARTITION BY
– This is the most important
● Fewer sorts are done.
34
Thanks!
Q & A
