Robust Access Path Selection without
Cardinality Estimation
Presented by Kenan Yao
PingCAP.com
Problem to Solve
● access path selection
● classical approach
○ predicate push down
○ ranger extracts access conditions
○ selectivity estimation / statistics
○ cost model
○ independence / uniformity assumptions
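To make the independence assumption concrete, here is a minimal sketch. The table, column values, and the `selectivity` helper are invented for illustration and do not come from the talk. It compares the estimate obtained by multiplying per-column selectivities with the true selectivity of the conjunction on two perfectly correlated columns.

```python
# Hypothetical table: two perfectly correlated columns (city determines country).
rows = [("Paris", "FR")] * 50 + [("Berlin", "DE")] * 50


def selectivity(rows, pred):
    """Fraction of rows that satisfy pred."""
    return sum(1 for r in rows if pred(r)) / len(rows)


sel_city = selectivity(rows, lambda r: r[0] == "Paris")
sel_country = selectivity(rows, lambda r: r[1] == "FR")

# Classical estimate under the independence assumption: multiply.
estimated = sel_city * sel_country

# True selectivity of the conjunction, measured on the data.
actual = selectivity(rows, lambda r: r[0] == "Paris" and r[1] == "FR")

print(estimated, actual)
```

On this data the estimate is half the true value, and the gap grows with every additional correlated predicate.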
Drawbacks
● drawbacks of classical approach
○ heavily depends on statistics
○ outdated statistics / expensive to maintain
○ naive cost model
○ assumptions do not hold
○ wrong choices / not robust
Proposal
● a proposal to dodge those drawbacks
○ new scan operator in executor
○ observe and behave accordingly at runtime
○ relieve planner from access path selection
○ robust by eschewing statistics / cardinality estimation
Modeling the Problem
● page caches not considered
● double read
● sensitive / not robust
Alternative Approaches
● optimizer level: not robust enough
● runtime reoptimization
○ start from index scan
○ switch to table scan if scanned rows exceed the tipping point
○ binary switch
○ bound worst case
○ risk area
○ not robust
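The binary switch can be sketched with a toy cost function. Everything here is an assumption made for illustration: the function name, the per-access cost constants, and the simplification that one matching row costs one random heap access. The point is only the shape of the curve: the worst case is bounded, but the cost jumps as soon as the row count crosses the tipping point.

```python
def switch_scan_cost(matching_rows, tipping_point, table_pages,
                     random_io=4.0, seq_io=1.0):
    """Toy cost of 'index scan, then fall back to a full table scan'.

    random_io and seq_io are assumed relative costs per page access.
    """
    if matching_rows <= tipping_point:
        # Pure index scan: one random heap access per matching row.
        return matching_rows * random_io
    # The index work up to the tipping point is already paid,
    # and the whole table is then read sequentially on top of it.
    return tipping_point * random_io + table_pages * seq_io


just_below = switch_scan_cost(1000, tipping_point=1000, table_pages=10_000)
just_above = switch_scan_cost(1001, tipping_point=1000, table_pages=10_000)
print(just_below, just_above)
```

One extra matching row more than triples the cost in this example, which is the kind of cliff the risk area bullet refers to.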
Smooth Scan
● rough idea
○ executor operator / runtime
○ start from index scan
○ morph between index scan and table scan
○ morph continuously and adaptively
○ trade off CPU and memory for I/O reduction
Storage Model
● PostgreSQL style: heap table / B+-tree index
● access path types
○ table scan / index scan
○ bitmap scan
■ reduce random access / cache miss
■ execution model: pipeline breaker
■ order property lost
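The three bitmap scan properties can be seen in a small sketch. It is a simplification under stated assumptions: a sorted list of tuple IDs stands in for the real bitmap, the heap is a list of pages, and the names `bitmap_scan`, `heap`, and `index` are hypothetical.

```python
def bitmap_scan(heap, index, lo, hi):
    """Return (keys, pages_read) for lo <= key <= hi.

    heap:  list of pages; each page is a list of keys
    index: list of (key, page_id, slot), sorted by key
    """
    # Phase 1 (pipeline breaker): the whole index range is consumed
    # before the first heap page is touched.
    tids = [(pid, slot) for key, pid, slot in index if lo <= key <= hi]

    # Phase 2: visit pages in physical order, each page at most once.
    out, pages_read, last_page = [], 0, None
    for pid, slot in sorted(tids):
        if pid != last_page:
            pages_read += 1
            last_page = pid
        out.append(heap[pid][slot])
    return out, pages_read


heap = [[5, 1], [3, 5], [1, 4]]
index = sorted((k, p, s) for p, page in enumerate(heap) for s, k in enumerate(page))
keys, pages_read = bitmap_scan(heap, index, lo=1, hi=4)
print(keys, pages_read)   # keys come back in page order, not key order
```

Each page is read once, but the output follows the physical layout of the heap, so a parent operator that needs key order has to sort again.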
Smooth Scan Details
● targeted behavior
○ near-optimal
○ bulletproof against estimation errors
○ no performance cliff / robust
Morphing Mechanism
● start from simple index scan
○ monitor retrieved row count
● start morphing if selectivity exceeds threshold
○ probe entire heap page
○ fetch and probe adjacent heap pages if selectivity keeps increasing
■ start with one extra page
■ morphing region size increases exponentially
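The morphing steps above can be sketched as a toy in-memory scan. This is not the PostgreSQL implementation: the heap is a list of pages, the trigger is a fixed row count (`trigger_rows`, a hypothetical parameter), and the region doubles on every morphing step without checking whether selectivity keeps increasing. Results come back in discovery order rather than key order.

```python
def smooth_scan(heap, index, lo, hi, trigger_rows=2):
    """Return (results, pages_read) for lo <= key <= hi.

    heap:  list of pages; each page is a list of keys
    index: list of (key, page_id, slot), sorted by key
    """
    probed_pages = set()   # pages whose every tuple was already checked
    returned = set()       # tuple IDs already produced
    results, pages_read = [], 0
    region = 0             # extra adjacent pages to probe per morphing step

    for key, pid, slot in index:
        if not lo <= key <= hi:
            continue
        if pid in probed_pages or (pid, slot) in returned:
            continue       # already produced by an earlier page probe
        if len(results) < trigger_rows:
            # Plain index scan: one heap access per index entry.
            pages_read += 1
            returned.add((pid, slot))
            results.append((pid, slot, key))
            continue
        # Morphing: probe the whole page plus `region` following pages.
        for p in range(pid, min(pid + region + 1, len(heap))):
            if p in probed_pages:
                continue
            pages_read += 1
            probed_pages.add(p)
            for s, k in enumerate(heap[p]):
                if lo <= k <= hi and (p, s) not in returned:
                    returned.add((p, s))
                    results.append((p, s, k))
        region = 1 if region == 0 else region * 2   # 0, 1, 2, 4, ...
    return results, pages_read


# Hypothetical 8-page heap with 4 tuples per page; keys cycle through 0..6.
heap = [[(p * 4 + s) % 7 for s in range(4)] for p in range(8)]
index = sorted((k, p, s) for p, page in enumerate(heap) for s, k in enumerate(page))
results, pages_read = smooth_scan(heap, index, lo=2, hi=4)
print(len(results), pages_read)
```

Because the index keeps driving the scan, every qualifying entry is either returned directly or found on a probed page, and the two sets prevent a tuple from being produced twice.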
Correctness
● no tuple missed
○ driven by index scan
● no duplicate tuple
○ bookkeeping
○ CPU / memory cost
Morphing Policy
● greedy policy
○ fast convergence for high selectivity
○ unnecessary overhead for low selectivity
● selectivity driven policy
○ monitor local and global selectivity
○ selectivity computed at page level
○ keep or increase morphing region size
● elastic policy
○ skewed data distribution
○ double morphing region size for dense region
○ halve morphing region size for sparse region
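The three policies differ only in how the next morphing region size is chosen. The sketch below makes that explicit. The function names are hypothetical, and treating "local selectivity greater than global selectivity" as the test for a dense region is an assumption, since the slides do not spell out the comparison. Selectivity here means the fraction of qualifying tuples among the tuples on the pages probed, local for the last region and global for the scan so far.

```python
def greedy(region):
    """Always double, regardless of what the last region contained."""
    return max(1, region * 2)


def selectivity_driven(region, local_sel, global_sel):
    """Double only when the last region was denser than the scan so far;
    otherwise keep the current size (never shrink)."""
    return max(1, region * 2) if local_sel > global_sel else region


def elastic(region, local_sel, global_sel):
    """Double in dense regions, halve in sparse ones (minimum one page)."""
    if local_sel > global_sel:
        return max(1, region * 2)
    return max(1, region // 2)


# A sparse region (local 0.1 vs. global 0.5) after a region of 4 pages:
print(greedy(4), selectivity_driven(4, 0.1, 0.5), elastic(4, 0.1, 0.5))
```

Only the elastic policy can shrink again, which is why it suits skewed data where dense and sparse regions alternate.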
Threshold
● optimizer driven
○ retrieved row count exceeds optimizer’s estimate
● eager approach
○ start morphing from first tuple
● SLA driven
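The first two triggers reduce to simple predicates, sketched below with hypothetical function names. The SLA-driven trigger is left out because the slides give no detail on how it is computed.

```python
def optimizer_driven_trigger(rows_retrieved, optimizer_estimate):
    """Start morphing once the scan has produced more rows than the planner predicted."""
    return rows_retrieved > optimizer_estimate


def eager_trigger(rows_retrieved):
    """Start morphing from the very first tuple, ignoring any estimate."""
    return True


print(optimizer_driven_trigger(101, 100), eager_trigger(0))
```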
Implementation
● integrated with PostgreSQL
○ page ID cache
○ tuple ID cache
○ result cache to respect order property
■ hash-based data structure
■ store additional tuples found
■ check result cache before index probe
■ pipeline breaker to some extent
■ spill to disk when short of memory
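How a result cache preserves index order can be sketched as follows. This is a simplified in-memory model, not the actual PostgreSQL integration: it morphs eagerly, probes only the current page, never spills, and consults the cache before fetching a heap page. The names `ordered_smooth_scan` and `result_cache` are hypothetical.

```python
def ordered_smooth_scan(heap, index, lo, hi):
    """Return (rows, heap_reads) in index (key) order for lo <= key <= hi.

    heap:  list of pages; each page is a list of keys
    index: list of (key, page_id, slot), sorted by key
    """
    result_cache = {}          # tuple ID -> key, for extra tuples found on a page
    out, heap_reads = [], 0
    for key, pid, slot in index:            # index order drives the output order
        if not lo <= key <= hi:
            continue
        tid = (pid, slot)
        if tid in result_cache:
            # Hit: the tuple was found during an earlier page probe.
            out.append((result_cache.pop(tid), tid))
            continue
        # Miss: read the page once and park every other qualifying tuple.
        heap_reads += 1
        for s, k in enumerate(heap[pid]):
            if lo <= k <= hi and (pid, s) != tid:
                result_cache[(pid, s)] = k
        out.append((key, tid))
    return out, heap_reads


# Hypothetical 8-page heap with 4 tuples per page; keys cycle through 0..6.
heap = [[(p * 4 + s) % 7 for s in range(4)] for p in range(8)]
index = sorted((k, p, s) for p, page in enumerate(heap) for s, k in enumerate(page))
rows, heap_reads = ordered_smooth_scan(heap, index, lo=2, hi=4)
print(len(rows), heap_reads)
```

Tuples wait in the cache until the index reaches them, which keeps the output sorted at the price of memory and explains why the operator behaves partly like a pipeline breaker.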
Evaluation
● TPC-H and synthetic datasets
● clear database and OS caches
Any Questions?
Thank You!

Paper Reading: Smooth Scan

Editor's Notes

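  • #3: That is, the choice we usually describe as table scan versus index scan.
  • #9: Unlike MySQL, which stores the table in a clustered index.
  • #13: The policy applied after morphing has been triggered.
  • #15: The authors argue that spilling is sequential read/write, so it costs less than random access.
  • #16: The evaluation section is too long to cover in full. Overall, the improvement is larger when selectivity is low. Q6 gains because of a dense region, Q7 because the optimizer underestimates the selectivity, and Q14 probably for both reasons.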