Histogram Support in MySQL 8.0
Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
Øystein Grøvlen
Senior Principal Software Engineer
MySQL Optimizer Team, Oracle
February 2018
Program Agenda
1. Motivating example
2. Quick start guide
3. How are histograms used?
4. Query example
5. Some advice
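Motivating Example
JOIN Query
EXPLAIN SELECT *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
+----+-------------+----------+--------+---------------------------+---------+---------+-----------------------+----------+----------+-------------+
| id | select_type | table    | type   | possible_keys             | key     | key_len | ref                   | rows     | filtered | Extra       |
+----+-------------+----------+--------+---------------------------+---------+---------+-----------------------+----------+----------+-------------+
|  1 | SIMPLE      | orders   | ALL    | i_o_orderdate,i_o_custkey | NULL    | NULL    | NULL                  | 15000000 |    31.19 | Using where |
|  1 | SIMPLE      | customer | eq_ref | PRIMARY                   | PRIMARY | 4       | dbt3.orders.o_custkey |        1 |    33.33 | Using where |
+----+-------------+----------+--------+---------------------------+---------+---------+-----------------------+----------+----------+-------------+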
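Motivating Example
Reverse join order
EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
| id | select_type | table    | type | possible_keys             | key         | key_len | ref                     | rows    | filtered | Extra       |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY                   | NULL        | NULL    | NULL                    | 1500000 |    33.33 | Using where |
|  1 | SIMPLE      | orders   | ref  | i_o_orderdate,i_o_custkey | i_o_custkey | 5       | dbt3.customer.c_custkey |      15 |    31.19 | Using where |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+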
Comparing Join Order
Performance
[Bar chart: query execution time in seconds (axis 0 to 16) for the join orders orders → customer and customer → orders]
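Histograms
Create histogram to get a better plan
ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS;
EXPLAIN SELECT *
FROM orders JOIN customer ON o_custkey = c_custkey
WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
| id | select_type | table    | type | possible_keys             | key         | key_len | ref                     | rows    | filtered | Extra       |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY                   | NULL        | NULL    | NULL                    | 1500000 |     0.00 | Using where |
|  1 | SIMPLE      | orders   | ref  | i_o_orderdate,i_o_custkey | i_o_custkey | 5       | dbt3.customer.c_custkey |      15 |    31.19 | Using where |
+----+-------------+----------+------+---------------------------+-------------+---------+-------------------------+---------+----------+-------------+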
Histograms
Column statistics
• Information about value distribution for a column
• Data values are grouped into buckets
– Frequency calculated for each bucket
– Maximum 1024 buckets
• May use sampling to build histogram
– Sample rate depends on available memory
• Automatically chooses between two histogram types:
– Singleton: One value per bucket
– Equi-height: Multiple values per bucket
Singleton Histogram
[Bar chart: frequency (axis 0 to 0.25) for each of the values 0, 1, 2, 3, 5, 6, 7, 8, 9, 10]
• One value per bucket
• Each bucket stores:
– Value
– Cumulative frequency
• Well suited to estimate both equality and range predicates
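A minimal Python sketch of how a singleton histogram with this bucket layout (value, cumulative frequency) could be built and used for equality and range estimates. The function names and the sample column data are illustrative assumptions, not MySQL internals.

```python
from collections import Counter

def build_singleton(values):
    """Return [(value, cumulative_frequency), ...] sorted by value."""
    counts = Counter(values)
    total = len(values)
    buckets, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        buckets.append((value, running / total))
    return buckets

def selectivity_le(buckets, x):
    """Estimated fraction of rows with column <= x."""
    sel = 0.0
    for value, cumfreq in buckets:
        if value > x:
            break
        sel = cumfreq
    return sel

def selectivity_eq(buckets, x):
    """Estimated fraction of rows with column = x."""
    previous = 0.0
    for value, cumfreq in buckets:
        if value == x:
            return cumfreq - previous
        previous = cumfreq
    return 0.0  # value does not occur in the histogram

# Hypothetical column data: 10 rows, 4 distinct values.
hist = build_singleton([1, 1, 1, 2, 2, 3, 3, 3, 3, 7])
print(hist)
print(selectivity_eq(hist, 3))  # equality: difference of two cumulative values
print(selectivity_le(hist, 2))  # range: cumulative value of the last bucket <= 2
```

Because every distinct value has its own bucket, both estimates are read directly from the stored cumulative frequencies without interpolation.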
Equi-Height Histogram
[Bar chart: frequency (axis 0 to 0.35) per bucket for the buckets 0-0, 1-1, 2-3, 5-6, 7-10]
• Multiple values per bucket
• Not quite equi-height
– Values are not split across buckets
⇒ Frequent values in separate buckets
• Each bucket stores:
– Minimum value
– Maximum value
– Cumulative frequency
– Number of distinct values
• Best suited for range predicates
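The "not quite equi-height" idea can be illustrated with a simple greedy sketch in Python: fill a bucket until it holds at least its share of the rows, and never split one value across two buckets. This is an assumed simplification for illustration, not the algorithm MySQL actually uses; the function name and the sample data are hypothetical.

```python
from collections import Counter

def build_equi_height(values, num_buckets):
    """Greedy sketch: returns [(min, max, cumulative_freq, distinct), ...]."""
    counts = Counter(values)
    total = len(values)
    target = total / num_buckets   # ideal number of rows per bucket
    buckets = []
    running = 0                    # rows placed so far, over all buckets
    in_bucket = 0                  # rows in the bucket being filled
    distinct = 0                   # distinct values in the bucket being filled
    low = None                     # minimum value of the bucket being filled
    for value in sorted(counts):
        if low is None:
            low = value
        in_bucket += counts[value]
        running += counts[value]
        distinct += 1
        if in_bucket >= target:    # bucket is full; all rows of a value stay together
            buckets.append((low, value, running / total, distinct))
            low, in_bucket, distinct = None, 0, 0
    if low is not None:            # flush a partially filled last bucket
        buckets.append((low, max(counts), running / total, distinct))
    return buckets

# Hypothetical data: 20 rows, values 0 and 1 are frequent.
data = [0] * 5 + [1] * 4 + [2] * 2 + [3] * 2 + [5] + [6] * 3 + [7, 8, 10]
for bucket in build_equi_height(data, 5):
    print(bucket)
```

With this data the frequent values 0 and 1 each end up alone in a bucket, while the rarer values share buckets, which is the effect the slide describes.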
Usage
• Create or refresh histogram(s) for column(s):
ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS;
– Note: Will only update histogram, not other statistics
• Drop histogram:
ANALYZE TABLE table DROP HISTOGRAM ON column [, column];
• Based on entire table or sampling:
– Depends on available memory: histogram_generation_max_mem_size (default: 20 MB)
• New storage engine API for sampling
– Default implementation: Full table scan even when sampling
– Storage engines may implement more efficient sampling
Storage
• Stored in a JSON column in data dictionary
• Can be inspected in Information Schema table:
SELECT JSON_PRETTY(histogram)
FROM information_schema.column_statistics
WHERE schema_name = 'dbt3_sf1'
  AND table_name = 'lineitem'
  AND column_name = 'l_linenumber';
Histogram content
{
"buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523],
[3, 0.6427401784471978], [4, 0.7855470933802572],
[5, 0.8927398868395817], [6, 0.96423707532558], [7, 1] ],
"data-type": "int",
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2018-02-03 21:05:21.690872",
"sampling-rate": 0.20829115437457252,
"histogram-type": "singleton",
"number-of-buckets-specified": 1024
}
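As a rough illustration of how this JSON document can be read on the client side, the following Python sketch parses the content shown above and derives the per-value frequencies from the cumulative ones. The variable names are arbitrary; the JSON is copied from the slide.

```python
import json

histogram_json = """
{
  "buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523],
              [3, 0.6427401784471978], [4, 0.7855470933802572],
              [5, 0.8927398868395817], [6, 0.96423707532558], [7, 1]],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2018-02-03 21:05:21.690872",
  "sampling-rate": 0.20829115437457252,
  "histogram-type": "singleton",
  "number-of-buckets-specified": 1024
}
"""

hist = json.loads(histogram_json)

# Per-value frequency = cumulative frequency minus the previous cumulative one.
frequencies = {}
previous = 0.0
for value, cumulative in hist["buckets"]:
    frequencies[value] = cumulative - previous
    previous = cumulative

sampled_pct = hist["sampling-rate"] * 100
print(f'{hist["histogram-type"]} histogram, {len(hist["buckets"])} buckets, '
      f'built from about {sampled_pct:.0f}% of the rows')
for value, freq in frequencies.items():
    print(f"l_linenumber = {value}: {freq:.3f}")
```

Although 1024 buckets were requested, the column has only 7 distinct values, so a singleton histogram with 7 buckets was created.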
Strings
• Max. 42 characters considered
• Base64 encoded
SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq
FROM information_schema.column_statistics,
JSON_TABLE(histogram->'$.buckets', '$[*]'
COLUMNS(v VARCHAR(60) PATH '$[0]',
c double PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';
+-------+--------------------+
| value | cumulfreq |
+-------+--------------------+
| F | 0.4862529264385756 |
| O | 0.974029654577566 |
| P | 0.9999999999999999 |
+-------+--------------------+
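The decoding step in the query above can also be sketched in Python. The raw bucket strings below are hypothetical: the prefix layout (`base64:type254:`) is an assumption inferred from the `LOCATE(':', v, 10)` call, and only the part after the last colon is treated as the Base64 payload.

```python
import base64

def decode_bucket_value(encoded):
    """Decode a string bucket value such as 'base64:type254:Rg=='."""
    # Everything after the last ':' is taken as the Base64 payload.
    # The Base64 alphabet contains no ':', so splitting from the right is safe.
    payload = encoded.rsplit(":", 1)[1]
    return base64.b64decode(payload).decode("utf-8")

# Hypothetical raw bucket values for the three order statuses.
raw_values = [
    "base64:type254:Rg==",
    "base64:type254:Tw==",
    "base64:type254:UA==",
]
decoded = [decode_bucket_value(v) for v in raw_values]
print(decoded)
```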
Calculate Bucket Frequency
SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq,
c - LAG(c, 1, 0) over () freq
FROM information_schema.column_statistics,
JSON_TABLE(histogram->'$.buckets', '$[*]'
COLUMNS(v VARCHAR(60) PATH '$[0]',
c double PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';
+-------+--------------------+----------------------+
| value | cumulfreq | freq |
+-------+--------------------+----------------------+
| F | 0.4862529264385756 | 0.4862529264385756 |
| O | 0.974029654577566 | 0.48777672813899037 |
| P | 0.9999999999999999 | 0.025970345422433927 |
+-------+--------------------+----------------------+
Use window function
When are Histograms useful?
Estimate cost of join
• t_x JOIN t_{x+1} (ref access)
• records(t_{x+1}) = records(t_x) * condition_filter_effect * records_per_key
– records(t_x): number of records read from t_x
– condition_filter_effect: fraction of records passing the table conditions on t_x
– records_per_key: cardinality statistics for the index used for the ref access
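To make the join estimate concrete, here is a small Python sketch that plugs in the `rows` and `filtered` values from the EXPLAIN output of the motivating example. The function name is hypothetical, and the sketch only covers the row-count formula, not the full cost model.

```python
def rows_after_join(records_tx, condition_filter_effect, records_per_key):
    """Estimated rows produced by joining table x with table x+1."""
    return records_tx * condition_filter_effect * records_per_key

# orders -> customer: 15,000,000 rows, filtered 31.19 %, 1 row per key (eq_ref)
orders_first = rows_after_join(15_000_000, 0.3119, 1)

# customer -> orders without histogram: 1,500,000 rows, filtered 33.33 %
# (guesstimate for c_acctbal < -1000), 15 rows per key
customer_first_guess = rows_after_join(1_500_000, 0.3333, 15)

# customer -> orders with the c_acctbal histogram: filtered is shown as 0.00 %
# (a rounded value; the real estimate is a small positive number)
customer_first_hist = rows_after_join(1_500_000, 0.0, 15)

print(round(orders_first), round(customer_first_guess), round(customer_first_hist))
```

With the default guess, starting with customer looks more expensive than starting with orders, so the optimizer picks orders first. Once the histogram shows that almost no customer rows qualify, starting with customer becomes the cheaper plan.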
How to Calculate Condition Filter Effect, MySQL 5.7
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
      employee.name = 'John' AND age > 21 AND
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Filter estimate based on what is available:
1. Range estimate
2. Index statistics
3. Guesstimate
   =               0.1
   <=, <, >, >=    1/3
   BETWEEN         1/9
   NOT <op>        1 – SEL(<op>)
   AND             P(A and B) = P(A) * P(B)
   OR              P(A or B) = P(A) + P(B) – P(A and B)
   …               …
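The guesstimate rules listed on this slide can be written down as a short Python sketch. It uses exact fractions so that 1/3 and 1/9 are not rounded; the dictionary and function names are illustrative assumptions, and only the rules spelled out on the slide are covered.

```python
from fractions import Fraction

# Default selectivities used when no better source is available.
GUESSTIMATE = {
    "=": Fraction(1, 10),
    "<": Fraction(1, 3), "<=": Fraction(1, 3),
    ">": Fraction(1, 3), ">=": Fraction(1, 3),
    "BETWEEN": Fraction(1, 9),
}

def sel_not(sel):
    """NOT <op>: 1 - SEL(<op>)."""
    return 1 - sel

def sel_and(sel_a, sel_b):
    """P(A and B) = P(A) * P(B); treats the predicates as independent."""
    return sel_a * sel_b

def sel_or(sel_a, sel_b):
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return sel_a + sel_b - sel_and(sel_a, sel_b)

# employee.name = 'John' AND age > 21, both guesstimated
both = sel_and(GUESSTIMATE["="], GUESSTIMATE[">"])
# employee.name = 'John' OR age > 21
either = sel_or(GUESSTIMATE["="], GUESSTIMATE[">"])
print(both, either)
```

The AND rule multiplies selectivities as if the predicates were independent, which is why correlated columns can lead to estimates that are far too low.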
How to Calculate Condition Filter Effect, MySQL 8.0
Filter estimate based on what is available:
1. Range estimate
2. Index statistics
3. Histograms
4. Guesstimate
Calculating Condition Filter Effect for Tables
Example without histograms
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
      employee.name = 'John' AND age > 21 AND
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Filter estimate per predicate:
– office_name = 'San Francisco': 0.03 (index)
– employee.name = 'John': 0.1 (guesstimate)
– age > 21: 0.33 (guesstimate)
– hire_date BETWEEN '2014-01-01' AND '2014-06-01': 0.29 (range)
Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.33 ≈ 0.01
Calculating Condition Filter Effect for Tables
Example with histogram
SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
      employee.name = 'John' AND age > 21 AND
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';
Filter estimate per predicate:
– office_name = 'San Francisco': 0.03 (index)
– employee.name = 'John': 0.1 (guesstimate)
– age > 21: 0.95 (histogram)
– hire_date BETWEEN '2014-01-01' AND '2014-06-01': 0.29 (range)
Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.95 ≈ 0.03
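The arithmetic behind the two employee estimates can be checked with a few lines of Python. The function name is hypothetical; the selectivities are the ones given in the examples with and without a histogram on age.

```python
from math import prod

def condition_filter_effect(selectivities):
    """Combine per-predicate selectivities of AND-ed conditions by multiplication."""
    return prod(selectivities)

# employee predicates: hire_date (range), name (guesstimate), age
without_histogram = condition_filter_effect([0.29, 0.1, 0.33])  # age guesstimated
with_histogram = condition_filter_effect([0.29, 0.1, 0.95])     # age from histogram

print(round(without_histogram, 2), round(with_histogram, 2))
print(round(with_histogram / without_histogram, 1))
```

Replacing the guesstimate for age > 21 with the histogram value makes the estimated number of qualifying employee rows almost three times larger, which can change the join order the optimizer chooses.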
Computing Selectivity From Histogram
Example
[Chart: frequency and cumulative frequency (axis 0 to 1) of age for the buckets 0-7, 8-16, 17-24, 25-31, 32-38, 39-46, 47-53, 54-61, 62-70, 71-104; the cumulative frequency is 0.203 at the end of bucket 8-16 and 0.306 at the end of bucket 17-24]
age <= 21: Selectivity = 0.203 + (0.306 – 0.203) * 5/8 = 0.267
age > 21: Selectivity = 1 – 0.267 = 0.733
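The interpolation in this example can be sketched in Python as follows. The function assumes integer values spread evenly over the bucket range, which is how the 5/8 factor arises (ages 17 to 21 out of 17 to 24); the function and parameter names are illustrative, and MySQL's internal computation may differ in detail.

```python
def selectivity_le(value, bucket_min, bucket_max, cum_before, cum_at_end):
    """Estimate P(col <= value) for an integer value inside one bucket.

    cum_before is the cumulative frequency at the end of the previous bucket,
    cum_at_end the cumulative frequency at the end of this bucket.
    """
    values_in_bucket = bucket_max - bucket_min + 1   # 17..24 -> 8 values
    values_covered = value - bucket_min + 1          # 17..21 -> 5 values
    fraction = values_covered / values_in_bucket
    return cum_before + (cum_at_end - cum_before) * fraction

# Bucket 17-24: cumulative frequency 0.203 before it, 0.306 at its end.
age_le_21 = selectivity_le(21, 17, 24, 0.203, 0.306)
age_gt_21 = 1 - age_le_21
print(round(age_le_21, 3), round(age_gt_21, 3))
```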
DBT-3 Query 7
Volume Shipping Query
SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue
FROM (SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation,
             EXTRACT(YEAR FROM l_shipdate) AS l_year,
             l_extendedprice * (1 - l_discount) AS volume
      FROM supplier, lineitem, orders, customer, nation n1, nation n2
      WHERE s_suppkey = l_suppkey AND o_orderkey = l_orderkey
        AND c_custkey = o_custkey AND s_nationkey = n1.n_nationkey
        AND c_nationkey = n2.n_nationkey
        AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE')
          OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA'))
        AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31') AS shipping
GROUP BY supp_nation, cust_nation, l_year
ORDER BY supp_nation, cust_nation, l_year;
DBT-3 Query 7
[Figure: query plan without histogram]
[Figure: query plan with histogram]
Performance
[Bar chart: query execution time in seconds (axis 0.0 to 1.8), without histogram vs. with histogram]
Some advice
Which columns to create histograms for?
• Histograms are useful for columns that are
– not the first column of any index, and
– used in WHERE conditions of
  • JOIN queries
  • Queries with IN-subqueries
  • ORDER BY ... LIMIT queries
• Best fit
– Low cardinality columns (e.g., gender, orderStatus, dayOfWeek, enums)
– Columns with uneven distribution (skew)
– Stable distribution (does not change much over time)
Some more advice
• When not to create histograms:
– First column of an index
– Never used in WHERE clause
– Monotonically increasing column values (e.g. date columns)
  • Histogram will need frequent updates to be accurate
  • Consider creating an index
• How many buckets?
– If possible, enough to get a singleton histogram
– For equi-height, 100 buckets should be enough
More information
• MySQL Server Team blog
– http://mysqlserverteam.com/
– https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth)
• My blog:
– http://oysteing.blogspot.com/
• MySQL forums:
– Optimizer & Parser: http://forums.mysql.com/list.php?115
– Performance: http://forums.mysql.com/list.php?24
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.