© Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0
The Internals of GPORCA Optimizer
Xin Zhang (xzhang@pivotal.io)
PGConf Seattle 2017
Nov 2017
Disclaimer
This presentation contains statements relating to Pivotal’s expectations, projections, beliefs and prospects which are "forward-looking statements"
about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance, and you are
cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in the
forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii)
delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating
costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi)
Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance
of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability
to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information
security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are
based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed
previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange
Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of
this presentation.
The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be
incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to
ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty of any kind, express or implied, and is not a
commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings.
These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or
functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update
forward-looking information in this presentation.
Open Source Software (OSS)
Oct 2015, Greenplum Database Open Sourced
(based on PostgreSQL 8.2)
Sep 7th 2017, Greenplum Database 5.0 w/ GPORCA
(based on PostgreSQL 8.3)
Oct 23rd 2017, Greenplum Database 6.0 Alpha
(based on PostgreSQL 8.4)
...
...
Sep 2015, Shin joined Pivotal
Jan 2016, GPORCA Open Source V1.636
http://engineering.pivotal.io/post/gporca-open-source/
Jan 2017, GPORCA V2.0 (merge GPOS)
Sep 6th 2017, GPORCA V2.42.0
Nov 8th 2017, GPORCA V2.48.6
Greenplum (GPDB): MPP + PostgreSQL
● Shared Nothing Architecture
● Data Distributed on Cluster
● Query Processed Locally in Parallel
● Each Segment is a PostgreSQL Instance
http://greenplum.org/
MPP Query Processing Example 1/3
SELECT s.beer, s.price
FROM Bars b, Sells s
WHERE b.name = s.bar
AND b.city = 'San Francisco'
• Bars is distributed randomly → any bar on any host
• Sells is distributed by bar → sells of same bar on same host
MPP Query Processing Example 2/3
SELECT s.beer, s.price
FROM Bars b, Sells s
WHERE b.name = s.bar
AND b.city = 'San Francisco'
• Bars is distributed randomly
• Sells is distributed by bar
[Diagram: the plan is cut into Slice 1, Slice 2, and Slice 3]
MPP Query Processing Example 3/3
[Diagram: Slice 1 runs on the master; Slice 2 and Slice 3 run on every segment, Segment 1 … Segment N]
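To make the slice picture concrete, here is a toy, single-process simulation of the example query. It assumes the filtered Bars rows are redistributed by hash(name) so that each one lands on the segment holding the Sells rows of the same bar (broadcasting Bars to every segment would be the other option). `SegmentFor` and `RunBeerQuery` are hypothetical names, and `std::hash` merely stands in for GPDB's real distribution hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy rows with only the columns the example query touches.
struct Bar  { std::string name; std::string city; };
struct Sell { std::string bar; std::string beer; double price; };

// Assumed hash distribution (not GPDB's hash): a key lives on segment hash(key) % nsegs.
std::size_t SegmentFor(const std::string &key, std::size_t nsegs) {
    return std::hash<std::string>{}(key) % nsegs;
}

// Simulates the three slices and returns the (beer, price) rows gathered on the master.
// Precondition: sellsPerSeg.size() == nsegs and every Sells row sits on SegmentFor(bar).
std::vector<std::pair<std::string, double>>
RunBeerQuery(const std::vector<std::vector<Bar>> &barsPerSeg,   // Bars: random distribution
             const std::vector<std::vector<Sell>> &sellsPerSeg,  // Sells: distributed by bar
             std::size_t nsegs) {
    // Slice 3, on every segment: scan Bars, apply the city filter,
    // then redistribute the surviving rows by hash(name).
    std::vector<std::vector<Bar>> moved(nsegs);
    for (const auto &seg : barsPerSeg)
        for (const auto &b : seg)
            if (b.city == "San Francisco")
                moved[SegmentFor(b.name, nsegs)].push_back(b);

    // Slice 2, on every segment: a purely local join, because each Bars row now
    // shares a segment with the Sells rows of that bar.
    // Slice 1, on the master: gather the join output of all segments.
    std::vector<std::pair<std::string, double>> gathered;
    for (std::size_t seg = 0; seg < nsegs; ++seg)
        for (const auto &s : sellsPerSeg[seg])
            for (const auto &b : moved[seg])
                if (b.name == s.bar)
                    gathered.emplace_back(s.beer, s.price);
    return gathered;
}
```

The join needs no further data movement only because both sides are placed with the same function; Sells never moves, which is why its distribution key matters.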
GPORCA finds the best MPP plan
Outline
Context
Architecture
Develop Walkthrough
Build & Test
Contribute
CONTEXT
GPORCA
Jul 2015, PQO: Pivotal Query Optimizer (Released in GPDB 4.3.5.0)
Jun 2014, ORCA: SIGMOD 2014 Paper
In source code GPOPT
src/backend/gpopt
psql: select gp_opt_version();
GPORCA = Greenplum ORCA = ...
GPORCA can optimize all
99 TPC-DS Queries
(and variations)
Reference: Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014
Great SQL Surface
~60 Logical Operators
~50 Physical Operators
~40 Scalar Operators
~110 Transformation Rules
Great SQL Surface of GPORCA
Great Query Performance vs. Planner
TPC-DS 10TB, 16 nodes, 48 GB/node
Key Features vs. Planner
• Smarter partition elimination
• Multi-level partitioning support (4.3.6)
• Subquery unnesting
• Common table expressions (CTE)
• Join on castable data types
• Improved join ordering (better join order)
• Join-Aggregate reordering (push/pull AGG over join)
• Sort order optimization (avoid sorting multiple times)
• Skew awareness (redistribution vs. broadcast)
ARCHITECTURE
GPORCA
GPDB Using Planner (`set optimizer=off`)
[Diagram: SQL → Parser → Planner → Query Plan → Executor → Results; GPORCA is not used]
GPDB Using GPORCA (`set optimizer=on`)
[Diagram: SQL → Parser → GPORCA (instead of Planner) → Query Plan → Executor → Results]
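Multi-Host with DXL (Data eXchange Language) API
[Diagram: in the Host System, SQL → Parser → Q2DXL → DXL Query → GPORCA. GPORCA sends metadata requests (MD req.) to the MD Provider, which reads the Catalog and returns metadata as DXL (DXL MD). GPORCA returns a DXL Plan → DXL2Plan → Executor → Results.]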
GPORCA Architecture
[Diagram: the GPORCA KERNEL]
Five Optimization Steps in Orca
[Step 0] Pre-Process: e.g., predicate pushdown
[Step 1] Exploration: all equivalent logical plans
[Step 2] Statistics Derivation: histograms
[Step 3] Implementation: logical to physical
[Step 4] MPP Optimization: enforcing and costing
Step 0 Pre-Process with 25 steps (TL;DR)
(1) Remove unused CTE anchors
(2) Remove intermediate superfluous limits
(3) Trim unnecessary existential subqueries
(4) Remove superfluous outer references from the outer spec in limits, grouping columns, and partition/order columns in window operators
(5) Remove superfluous equalities
(6) Simplify quantified subqueries
(7) Preliminary unnesting of scalar subqueries
(8) Unnest AND/OR/NOT predicates and ensure predicates are array IN or NOT IN where applicable
(9) Infer predicates from constraints
(10) Eliminate self comparisons
(11) Remove duplicate AND/OR children
(12) Factorize common expressions
(13) Infer filters out of components of disjunctive filters
(14) Pre-process window functions
(15) Collapse cascaded union/union all
(16) Eliminate unused computed columns
(17) Normalize expressions
(18) Transform outer joins into inner joins
(19) Collapse cascaded inner joins
(20) Generate more predicates from constraints
(21) Eliminate empty subtrees
(22) Collapse cascades of projects
(23) Insert a project for scalar subqueries that return outer references
(24) Reorder children of scalar comparison operators
(25) Rewrite IN subqueries to EXISTS subqueries with a predicate
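Two of the pre-processing steps above, (8) unnesting nested AND/OR and (11) removing duplicate AND/OR children, are easy to show on a toy expression tree. This is a minimal sketch under our own assumptions: `Expr` and `Normalize` are hypothetical and are not GPORCA's CExpressionPreprocessor code, which works on CExpression trees.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy scalar expression: op is "AND", "OR", or a leaf predicate like "a = 1".
struct Expr {
    std::string op;
    std::vector<Expr> kids;
    bool operator==(const Expr &o) const { return op == o.op && kids == o.kids; }
};

bool IsBoolOp(const std::string &op) { return op == "AND" || op == "OR"; }

// Flattens AND-under-AND (and OR-under-OR) and drops structurally equal children.
Expr Normalize(const Expr &e) {
    if (!IsBoolOp(e.op)) return e;
    Expr out{e.op, {}};
    auto addUnique = [&out](const Expr &k) {
        for (const auto &seen : out.kids)
            if (seen == k) return;          // duplicate child: skip it
        out.kids.push_back(k);
    };
    for (const auto &kid : e.kids) {
        Expr k = Normalize(kid);
        if (k.op == e.op)                   // same connective: splice grandchildren in
            for (const auto &g : k.kids) addUnique(g);
        else
            addUnique(k);
    }
    if (out.kids.size() == 1) return out.kids[0];  // AND(x) is just x
    return out;
}
```

For example, `a = 1 AND (b = 2 AND a = 1) AND b = 2` normalizes to `AND(a = 1, b = 2)`.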
Step 1 Exploration 1/2: Memoization
Compact in-memory data structure capturing the plan space:
Group: container of equivalent expressions
Group Expression: operator that has other groups as its children
Logical Expression:
Inner Join (T1.a = T2.b)
  Get(T1)  Get(T2)
Memo:
GROUP 0
  0: Inner Join [1,2]
GROUP 1
  0: Get (T1) [ ]
GROUP 2
  0: Get (T2) [ ]
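A minimal sketch of the memo shape described above: group expressions point at child *groups* by id rather than at concrete subtrees, so one group can be shared by many alternatives. `GroupExpr`, `Group`, `Memo`, and `BuildJoinMemo` are hypothetical names; GPORCA's real classes (in libgpopt/src/search) also hash and deduplicate expressions, which this toy omits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// An operator whose children are group ids, e.g. Inner Join [1,2].
struct GroupExpr {
    std::string op;
    std::vector<std::size_t> children;
};

// A container of equivalent group expressions.
struct Group {
    std::vector<GroupExpr> exprs;
};

struct Memo {
    std::vector<Group> groups;

    // Creates a new group seeded with one expression and returns its id.
    std::size_t NewGroup(GroupExpr e) {
        groups.push_back(Group{{std::move(e)}});
        return groups.size() - 1;
    }
};

// Copies the slide's tree Inner Join(Get(T1), Get(T2)) into a memo.
// The root is inserted first so that it gets id 0, as GROUP 0 does on the slide.
Memo BuildJoinMemo() {
    Memo m;
    m.NewGroup({"Inner Join", {}});                 // GROUP 0, children filled below
    std::size_t t1 = m.NewGroup({"Get(T1)", {}});   // GROUP 1
    std::size_t t2 = m.NewGroup({"Get(T2)", {}});   // GROUP 2
    m.groups[0].exprs[0].children = {t1, t2};       // 0: Inner Join [1,2]
    return m;
}
```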
Step 1 Exploration 2/2: Transformation
Logical Expression:
Inner Join (T1.a = T2.b)
  Get(T1)  Get(T2)
Memo before the transformation:
GROUP 0
  0: Inner Join [1,2]
GROUP 1
  0: Get (T1) [ ]
GROUP 2
  0: Get (T2) [ ]
Memo after the transformation (children swapped, result added to the same group):
GROUP 0
  0: Inner Join [1,2]
  1: Inner Join [2,1]
GROUP 1
  0: Get (T1) [ ]
GROUP 2
  0: Get (T2) [ ]
Equivalent Logical Expression:
Inner Join (T1.a = T2.b)
  Get(T2)  Get(T1)
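The transformation above can be sketched as a toy join-commutativity rule that adds its output to the same group. The duplicate check is what makes exploration terminate: without it, swapping [2,1] back to [1,2] would add expressions forever. `Insert` and `ExploreCommutativity` are hypothetical helpers, not GPORCA's CXform classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct GroupExpr {
    std::string op;
    std::vector<std::size_t> children;  // child group ids
    bool operator==(const GroupExpr &o) const { return op == o.op && children == o.children; }
};

struct Group { std::vector<GroupExpr> exprs; };

// Adds e to g unless an identical group expression already exists; true if g grew.
bool Insert(Group &g, const GroupExpr &e) {
    for (const auto &x : g.exprs)
        if (x == e) return false;
    g.exprs.push_back(e);
    return true;
}

// Toy commutativity: Inner Join [l,r] yields Inner Join [r,l] in the SAME group,
// because both produce the same rows.
void ExploreCommutativity(Group &g) {
    // Index loop, because Insert may append while we iterate.
    for (std::size_t i = 0; i < g.exprs.size(); ++i) {
        const GroupExpr cur = g.exprs[i];  // copy: push_back may reallocate
        if (cur.op == "Inner Join" && cur.children.size() == 2)
            Insert(g, GroupExpr{cur.op, {cur.children[1], cur.children[0]}});
    }
}
```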
Step 2 Statistics Derivation 1/2
Memo with required statistics per group:
GROUP 0   Reqd Stats = { }
  0: Inner Join(a=b) [1,2]
  1: Inner Join(a=b) [2,1]
GROUP 1   Reqd Stats = {a}
  0: Get(T1) [ ]
GROUP 2   Reqd Stats = {b}
  0: Get(T2) [ ]
[Diagram: Orca and the Back-end System]
Step 2 Statistics Derivation 2/2
[Diagram: same memo; the requests for {a} and {b} each go through an MD Accessor and the MDCache in Orca to the MD Provider, which reads the Catalog in the Back-end System]
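Once GROUP 1 and GROUP 2 have histograms on a and b, the join group can estimate its row count from them. The sketch below uses one textbook estimate, not GPORCA's CHistogram code: it assumes both histograms share bucket boundaries and, per bucket, estimates rows_R * rows_S / max(ndv_R, ndv_S). `Bucket` and `EstimateEquiJoinRows` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One histogram bucket over [lo, hi): row count and number of distinct values.
struct Bucket { double lo, hi, rows, ndv; };

// Estimates |R JOIN S| on R.a = S.b.
// Simplifying assumptions: identical bucket boundaries on both sides, and within a
// bucket the side with fewer distinct values has all of them present on the other side.
double EstimateEquiJoinRows(const std::vector<Bucket> &ha, const std::vector<Bucket> &hb) {
    double total = 0.0;
    for (std::size_t i = 0; i < ha.size() && i < hb.size(); ++i) {
        double maxNdv = std::max(ha[i].ndv, hb[i].ndv);
        if (maxNdv > 0)
            total += ha[i].rows * hb[i].rows / maxNdv;
    }
    return total;
}
```

This is also why statistics are requested per column ({a}, {b}): the join only needs histograms on its join keys.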
Step 3 Implementation
Logical Expression: Inner Join (T1.a=T2.b) [1,2], children GROUP 1 and GROUP 2
Physical Expression: Inner Hash Join (T1.a=T2.b) [1,2], children GROUP 1 and GROUP 2
Drvd Props: {Hashed(T1.a), < >}
Step 4 MPP Optimization 1/3 Requirement
Optimization Request on Inner Hash Join (T1.a=T2.b) [1,2], children GROUP 1 and GROUP 2:
Reqd Props: {Singleton, <T1.a>} (Distribution, Order)
[Diagram: child plans Scan(T1) and Redistribute(T2.b)]
Drvd Props: {Hashed(T1.a), < >}
Step 4 MPP Optimization 2/3 Costing
Optimization Request on Inner Hash Join (T1.a=T2.b) [1,2]:
Reqd Props: {Singleton, <T1.a>} (Distribution, Order)
Requests to the children: {Hashed(T1.a), Any} and {Hashed(T2.b), Any}
Children: Scan(T1) and Redistribute(T2.b) over Scan(T2)
Drvd Props: {Hashed(T1.a), < >}
Step 4 MPP Optimization 3/3 Enforcement
Optimization Request on Inner Hash Join (T1.a=T2.b) [1,2]:
Reqd Props: {Singleton, <T1.a>} (Distribution, Order)
Requests to the children: {Hashed(T1.a), Any} and {Hashed(T2.b), Any}
Enforcing the required properties yields two alternatives:
Alternative 1:
GatherMerge(T1.a)
  Sort(T1.a)
    Inner Hash Join
      Scan(T1)
      Redistribute(T2.b)
        Scan(T2)
Alternative 2:
Sort(T1.a)
  Gather
    Inner Hash Join
      Scan(T1)
      Redistribute(T2.b)
        Scan(T2)
[Legend: Optimization Request, Property, Enforcing]
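The enforcement step can be mimicked with a toy: record which properties each enforcer adds or destroys, try a few enforcer chains on top of the hash join, discard chains that do not deliver {Singleton, <T1.a>}, and keep the cheapest. The names (`Props`, `Apply`, `Enforce`) and the cost formulas are invented for illustration; GPORCA's actual cost model lives in libgpdbcost.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// The two physical properties this toy tracks.
struct Props { bool singleton; bool sortedOnKey; };

// A chain of enforcers stacked on the hash join, listed bottom-up, with its cost.
struct Alternative { std::vector<std::string> enforcers; double cost; };

// How each enforcer changes the properties derived from its child.
Props Apply(const std::string &op, Props p) {
    if (op == "Sort") p.sortedOnKey = true;                             // adds order
    if (op == "Gather") { p.singleton = true; p.sortedOnKey = false; }  // order is lost
    if (op == "GatherMerge") p.singleton = true;                        // order is kept
    return p;
}

// Picks the cheapest chain turning the join's {Hashed(T1.a), no order} into
// {Singleton, <T1.a>}. Hypothetical costs: sorting n rows = n*log2(n); moving rows = rows.
Alternative Enforce(double rows, double nsegs) {
    const std::vector<std::vector<std::string>> candidates = {
        {"Sort", "GatherMerge"},  // sort on every segment, merge on the master
        {"Gather", "Sort"},       // gather unsorted rows, sort on the master
        {"Sort", "Gather"},       // cheap but invalid: Gather destroys the order
    };
    Alternative best{{}, std::numeric_limits<double>::infinity()};
    for (const auto &chain : candidates) {
        Props p{false, false};
        double cost = 0.0;
        for (const auto &op : chain) {
            double n = p.singleton ? rows : rows / nsegs;  // rows seen by one process
            if (op == "Sort") cost += n * std::log2(n);
            else if (op == "Gather") cost += rows;
            else if (op == "GatherMerge") cost += rows * std::log2(nsegs);
            p = Apply(op, p);
        }
        // Only chains that satisfy the required properties compete on cost.
        if (p.singleton && p.sortedOnKey && cost < best.cost) best = {chain, cost};
    }
    return best;
}
```

With these made-up numbers, sorting in parallel on the segments and merging on the master (Alternative 1 on the slide) wins; a different cost model could pick Alternative 2.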
DEVELOP
GPORCA
Walkthrough
Split an aggregate into a pair of local and global aggregates.
CREATE TABLE foo (a int, b int, c int)
DISTRIBUTED BY (a);
SELECT sum(c) FROM foo GROUP BY b;
Do the local aggregation on the segments
and the global aggregation on the master
Split Groupby Aggregate
Before:
GpAgg (b), Sum(c)
  Get(foo)
After:
GpAgg (b), Sum(c)        (global)
  GpAgg (b), Sum(c)      (local)
    Get(foo)
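Why the split is correct: sum() is decomposable, so adding up per-segment partial sums gives the same answer as one global sum. A minimal sketch with hypothetical helpers (`LocalAgg`, `GlobalAgg`, `SplitSum`); it models only the data flow over in-memory rows, not the CXformSplitGbAgg code.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Row { int a, b, c; };  // table foo(a, b, c)

// Local GpAgg: each segment computes partial sum(c) per b over its own rows.
std::map<int, long long> LocalAgg(const std::vector<Row> &segmentRows) {
    std::map<int, long long> partial;
    for (const auto &r : segmentRows) partial[r.b] += r.c;
    return partial;
}

// Global GpAgg: adds up the partial sums of the same group key.
std::map<int, long long> GlobalAgg(const std::vector<std::map<int, long long>> &partials) {
    std::map<int, long long> result;
    for (const auto &p : partials)
        for (const auto &[b, s] : p) result[b] += s;
    return result;
}

// SELECT sum(c) FROM foo GROUP BY b, evaluated as local aggregation then global aggregation.
std::map<int, long long> SplitSum(const std::vector<std::vector<Row>> &segments) {
    std::vector<std::map<int, long long>> partials;
    for (const auto &seg : segments) partials.push_back(LocalAgg(seg));
    return GlobalAgg(partials);
}
```

The win is that each segment ships one row per distinct b instead of every row of foo. An aggregate like avg() would need the local step to emit (sum, count) pairs instead.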
// HEADER FILES
~/orca/libgpopt/include/gpopt/xforms
// SOURCE FILES
~/orca/libgpopt/src/xforms
Transformation Trigger
CXformSplitGbAgg
• Pattern
• Pre-Condition Check
Pattern
GPOS_NEW(pmp) CExpression
    (
    pmp,
    // logical aggregate operator
    GPOS_NEW(pmp) CLogicalGbAgg(pmp),
    // relational child
    GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternLeaf(pmp)),
    // scalar project list
    GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternTree(pmp))
    )); // the second ')' closes the enclosing constructor call, which the slide omits
[Diagram: the pattern matches GpAgg (b) with project list Sum(c) over Get(foo)]
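A toy binder for patterns like the one above. Our reading, which we have not verified against the GPORCA sources, is that a leaf wildcard binds a child without extracting that child's own children, while a tree wildcard binds the whole subtree. `Node`, `Pattern`, and `Bind` are hypothetical names; the real CPatternLeaf/CPatternTree bind against memo groups, not plain trees.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Node {
    std::string op;            // operator name, e.g. "GbAgg", "Get(foo)"
    std::vector<Node> kids;
};

// A concrete operator name to match exactly, or one of two wildcards: "LEAF", "TREE".
struct Pattern {
    std::string op;
    std::vector<Pattern> kids;
};

// Returns the bound expression if n matches p, else std::nullopt.
std::optional<Node> Bind(const Pattern &p, const Node &n) {
    if (p.op == "TREE") return n;                  // bind the whole subtree
    if (p.op == "LEAF") return Node{n.op, {}};     // bind the node, not its children
    if (p.op != n.op || p.kids.size() != n.kids.size()) return std::nullopt;
    Node bound{n.op, {}};
    for (std::size_t i = 0; i < p.kids.size(); ++i) {
        auto k = Bind(p.kids[i], n.kids[i]);
        if (!k) return std::nullopt;
        bound.kids.push_back(*k);
    }
    return bound;
}
```

Note that the rule's own output, a GbAgg over another GbAgg, also matches `GbAgg(LEAF, TREE)`; that is the problem the next slide addresses.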
What's WRONG with this pattern?
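[Diagram: splitting produces a global GpAgg (b), Sum(c) on top of a local GpAgg (b), Sum(c) over Get(foo); the new global GpAgg matches the same pattern again]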
Pre-Condition Check
Do not fire this rule on a logical operator produced by the same rule
(avoids infinite recursion).
// Compatibility function for splitting aggregates
virtual
BOOL FCompatible(CXform::EXformId exfid)
{
    return (CXform::ExfSplitGbAgg != exfid);
}
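A toy model of why the pre-condition check matters: each aggregate remembers which rule produced it, and the split rule refuses to fire on its own output. `XformId`, `AggExpr`, and `Explore` are hypothetical; only the comparison in `FCompatible` mirrors the slide. The `maxSteps` cap exists only so the unchecked run terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rule ids; ExfNone marks expressions that came from the input query.
enum XformId { ExfNone, ExfSplitGbAgg };

// Toy aggregate expression that remembers which rule produced it.
struct AggExpr { XformId origin; };

// Same test as the slide: do not fire on an expression this rule produced.
bool FCompatible(XformId producedBy) { return producedBy != ExfSplitGbAgg; }

// Fires the split rule on a worklist of aggregates, at most maxSteps times.
// Every firing yields a new global GbAgg tagged with the rule's id.
std::size_t Explore(std::vector<AggExpr> work, bool checkCompat, std::size_t maxSteps) {
    std::size_t fired = 0;
    while (!work.empty() && fired < maxSteps) {
        AggExpr e = work.back();
        work.pop_back();
        if (checkCompat && !FCompatible(e.origin)) continue;  // skip our own output
        ++fired;
        work.push_back(AggExpr{ExfSplitGbAgg});  // the new global GbAgg
    }
    return fired;
}
```

With the check the rule fires exactly once; without it, it keeps firing until the artificial cap.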
The Actual Transformation
void Transform
    (
    CXformContext *pxfctxt, // update
    CXformResult *pxfres,   // output
    CExpression *pexpr      // input
    )
    const;
Details: libgpopt/src/xforms/CXformSplitGbAgg.cpp
Register Transformation Rule
void CXformFactory::Instantiate()
{
    …
    Add(GPOS_NEW(m_pmp) CXformSplitGbAgg(m_pmp));
    …
}
BUILD
GPORCA
Dependencies
GP-XERCES:
https://github.com/greenplum-db/gp-xerces
CMake 3.0+
CMake Build
mkdir build
cd build
cmake ../
make && make install
CI: Concourse Pipeline
https://ci.orca.pivotalci.info/teams/main/pipelines/gporca
TEST
GPORCA
Test GPORCA (<1min)
# run all unit tests
ctest
# run all unit tests with 7 parallel jobs
ctest -j7
# run only one unit test called CAggTest
./server/gporca_test -U CAggTest
Test with GPDB OSS in Docker
Follow the instructions at:
https://github.com/d/bug-free-fortnight
It's very useful for verifying installcheck-good locally with the
latest GPORCA changes.
CONTRIBUTE
GPORCA
1-2-3
1. Fork GPORCA at
https://github.com/greenplum-db/gporca
2. Pick an issue
https://github.com/greenplum-db/gporca/issues
3. Send a Pull Request (PR)
Thanks to all these contributors
http://www.ceotodaymagazine.com/issue/issue-09-2017/#32
You are invited!
THANK YOU
GPDB:
http://greenplum.org/
https://github.com/greenplum-db/gpdb
GPORCA GitHub:
https://github.com/greenplum-db/gporca
Mailing list (400+ members):
gpdb-users@greenplum.org
Greenplum YouTube:
https://www.youtube.com/GreenplumDatabase
Transforming How The World Builds Software
FAQ
Q: What happens to the SORT generated during Enforcement?
A: There is only ONE implementation of SORT, so it is a physical operator and is not pushed anywhere after enforcement.
Q: Why can a CXformResult hold more than one alternative?
A: Some rules (e.g. CXformExpandNAryJoinDP) can produce multiple alternatives from a single transformation.
Q: How hard is it to adapt GPORCA to a new host?
A: The hard part is the metadata (MD) translation. So far, GPORCA is very PostgreSQL-friendly.
Q: Why is there no separate SQL parser included in GPORCA?
A: GPORCA focuses on relational algebra and lets the host handle binding, view expansion, and permissions.
Q: How would one add a new property like 'reliability' or 'dollar cost' to this optimizer?
A: It can be added to Property Enforcement alongside Order/Distribution/Partition/Rewindability. For example, to favor a more
'reliable' data source, one could add a CEnfdReliability class to cost that choice. Combining 'reliability' and 'dollar cost' is interesting
because the more reliable option is usually more expensive, so there is a balance to strike.
Q: How long does `ctest` take to run?
A: Around 5 min on a 2.8 GHz Intel i7. Running with `ctest -j7` finishes in under 2 min.
Q: Is GPORCA multi-threaded?
A: It's multi-thread READY, but currently we run with a single thread. There are still a few caveats (thread-safety issues) to iron out before we can fully
turn it on.
Publications
Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014
Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin
Optimization of Common Table Expressions in MPP Database Systems, VLDB 2015
Amr El-Helw, Venkatesh Raghavan, Mohamed A. Soliman, George C. Caragea, Zhongxian Gu, Michalis Petropoulos.
Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD 2014
Lyublena Antova, Amr El-Helw, Mohamed Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas
Reversing Statistics for Scalable Test Databases Generation, DBTest 2013
Entong Shen, Lyublena Antova
Total Operator State Recall - Cost-Effective Reuse of Results in Greenplum Database, ICDE Workshops 2013
George C. Caragea, Carlos Garcia-Alvarado, Michalis Petropoulos, Florian M. Waas
Testing the Accuracy of Query Optimizers, DBTest 2012
Zhongxian Gu, Mohamed A. Soliman, Florian M. Waas
Automatic Capture of Minimal, Portable, and Executable Bug Repros using AMPERe, DBTest 2012
Lyublena Antova, Konstantinos Krikellas, Florian M. Waas
Automatic Data Placement in MPP Databases, ICDE Workshops 2012
Carlos Garcia-Alvarado, Venkatesh Raghavan, Sivaramakrishnan Narayanan, Florian M. Waas
DEBUG
GPORCA
GPORCA Traceflags and GPDB GUC
GPORCA relies on traceflags to change its runtime behavior (see Traceflag.h).
They are exposed in GPDB as GUCs (Grand Unified Configuration parameters) in guc_gp.c.
-- turn on GPORCA
set optimizer=on;
-- print input query (GPORCA TF 101000)
set optimizer_print_query=on;
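A sketch of the idea behind the traceflag/GUC split: the host keeps user-facing boolean settings, turns the enabled ones into numeric traceflags for one optimization, and optimizer code tests flags by number. Only the 101000 value for optimizer_print_query comes from the slide; `kGucToTraceflag`, `TraceflagsFromGucs`, and `IsTraceflagSet` are hypothetical, and the real wiring in the GPDB and GPORCA sources is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical GUC -> traceflag table; only this one entry is taken from the slide.
const std::map<std::string, int> kGucToTraceflag = {
    {"optimizer_print_query", 101000},
};

// Collects the traceflags to enable for one optimization from session GUC values.
std::set<int> TraceflagsFromGucs(const std::map<std::string, bool> &gucs) {
    std::set<int> flags;
    for (const auto &[name, on] : gucs) {
        auto it = kGucToTraceflag.find(name);
        if (on && it != kGucToTraceflag.end()) flags.insert(it->second);
    }
    return flags;
}

// What optimizer code would check before, e.g., printing the input query.
bool IsTraceflagSet(const std::set<int> &flags, int tf) { return flags.count(tf) > 0; }
```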
Turn on the minidump
set client_min_messages='log';
set optimizer=on;
set optimizer_enable_constant_expression_evaluation=off;
set optimizer_enumerate_plans=on;
set optimizer_minidump=always;
Run a query
GPORCA creates a *.mdp file in the $MASTER_DATA_DIRECTORY/minidump directory
# run only one minidump directly
./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp
Minidump: DXL document
Input Query, Output Plan
Debug the plans
set client_min_messages='log';
set optimizer=on;
set optimizer_print_query=on; -- input query and preprocessed query
set optimizer_print_plan=on; -- output final physical plan
Plan Enumeration
Turn on the plan enumerations
set client_min_messages='log';
set optimizer=on;
set optimizer_enumerate_plans=on;
Pick a plan out of search space
set optimizer=on;
set client_min_messages='log';
set optimizer_enumerate_plans=on;
set optimizer_plan_id=1;
Optimization Stats and Xform Rules
Debug optimizer stages
set client_min_messages='log';
set optimizer=on;
set optimizer_print_optimization_stats=on;
Debug transformation rule details
set client_min_messages='log';
set optimizer=on;
set optimizer_print_xform=on;
MEMO Groups
set optimizer_print_memo_after_exploration=on;
set optimizer_print_memo_after_implementation=on;
set optimizer_print_memo_after_optimization=on;
The root group is marked as `ROOT`
Ways to Disable and Enable Xform Rules
select disable_xform('CXformJoinAssociativity');
select enable_xform('CXformJoinAssociativity');
All xform rules can be found by their class names under
libgpopt/include/gpopt/xforms
CXformFactory::Instantiate lists all the activated xform rules (~130 rules)
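disable_xform() and enable_xform() toggle rules by class name. A minimal sketch of a registry with that shape, assuming rules are registered once at startup (as Instantiate does) and then switched on or off by name; `XformRegistry` is hypothetical and is not CXformFactory.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class XformRegistry {
public:
    // Registers a rule once; repeated Add calls with the same name are ignored.
    void Add(const std::string &name) {
        if (enabled_.emplace(name, true).second) order_.push_back(name);
    }
    // Both return false for an unknown rule name.
    bool Disable(const std::string &name) { return Set(name, false); }
    bool Enable(const std::string &name) { return Set(name, true); }

    // Rules the optimizer would try, in registration order.
    std::vector<std::string> ActiveRules() const {
        std::vector<std::string> out;
        for (const auto &n : order_)
            if (enabled_.at(n)) out.push_back(n);
        return out;
    }

private:
    bool Set(const std::string &name, bool on) {
        auto it = enabled_.find(name);
        if (it == enabled_.end()) return false;
        it->second = on;
        return true;
    }
    std::map<std::string, bool> enabled_;  // rule name -> enabled?
    std::vector<std::string> order_;       // registration order
};
```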
Useful Breakpoints
# Entry point of optimizer
COptimizer::PdxlnOptimize
# DXL: Translate the DXL query into an expression
CTranslatorDXLToExpr::PexprTranslateQuery
# Step 0: Pre-processor
CExpressionPreprocessor::PexprPreprocess
# Steps 1-4: Optimization
COptimizer::PexprOptimize
# Individual rule transformation, all CXform* classes
CXformSplitGbAgg::Transform
# Enforceable Property
CEngine::FCheckEnfdProps
CPartitionPropagationSpec::AppendEnforcers
# DXL: Translate the plan back into DXL
CTranslatorExprToDXL::PdxlnTranslate
CODE BASE
GPORCA
Top Level
.
├── cmake
├── concourse
├── data
├── libgpdbcost
├── libgpopt
├── libgpos
├── libnaucrates
├── patches
├── scripts
└── server
libgpos: memory management, task scheduler, exception handling, unit-test framework
libgpos
├── include
└── src
├── common
├── error
├── io
├── memory
├── net
├── string
├── sync
├── task
└── test
├── server
│ ├── include
│ └── src
│ ├── startup
│ └── unittest
│ └── gpos
│ ├── common
│ ├── error
│ ├── io
│ ├── memory
│ ├── string
│ ├── sync
│ ├── task
│ └── test
libnaucrates: DXL, metadata, statistics, traceflags
libnaucrates
├── include
│ └── naucrates
│ ├── base
│ ├── dxl
│ │ ├── operators
│ │ ├── parser
│ │ └── xml
│ ├── md
│ ├── statistics
│ └── traceflags
└── src
├── base
├── md
├── operators
├── parser
├── statistics
└── xml
libgpopt: engine, metadata cache, minidump, operators, memo, xform rules
libgpopt
├── include
│ └── gpopt
└── src
├── base
├── engine
├── eval
├── mdcache
├── metadata
├── minidump
├── operators
├── optimizer
├── search
├── translate
└── xforms
libgpdbcost: cost model
libgpdbcost
├── CMakeLists.txt
├── include
│ └── gpdbcost
└── src
├── CCostModelGPDB.cpp
├── CCostModelGPDBLegacy.cpp
├── CCostModelParamsGPDB.cpp
├── CCostModelParamsGPDBLegacy.cpp
└── ICostModel.cpp
server: unit-test server
├── include
└── src
├── startup
└── unittest
├── dxl
│ ├── base
│ └── statistics
└── gpopt
├── base
├── cost
├── csq
├── engine
├── eval
├── mdcache
├── metadata
├── minidump
├── operators
├── search
├── translate
└── xforms
data: all the test data
├── dxl
│ ├── cost
│ ├── csq_tests
│ ├── expressiontests
│ ├── indexjoin
│ ├── metadata
│ ├── minidump
│ │ ├── CArrayExpansionTest
│ │ ├── CJoinOrderDPTest
│ │ ├── CPhysicalParallelUnionAllTest
│ │ ├── CPruneColumnsTest
│ │ └── sql
│ ├── multilevel-partitioning
│ ├── parse_tests
│ ├── plstmt
│ ├── query
│ ├── search
│ ├── statistics
│ ├── tpcds
│ ├── tpcds-partitioned
│ ├── tpch
│ └── tpch-partitioned

More Related Content

PDF
Incremental View Maintenance with Coral, DBT, and Iceberg
PDF
Will Oracle 23ai make you a better DBA or Developer?
PDF
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
PDF
The Google Bigtable
PPTX
re:Invent 2022 DAT326 Deep dive into Amazon Aurora and its innovations
PPTX
Using LLVM to accelerate processing of data in Apache Arrow
PPTX
Apache Kudu: Technical Deep Dive


PDF
Parquet Hadoop Summit 2013
Incremental View Maintenance with Coral, DBT, and Iceberg
Will Oracle 23ai make you a better DBA or Developer?
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
The Google Bigtable
re:Invent 2022 DAT326 Deep dive into Amazon Aurora and its innovations
Using LLVM to accelerate processing of data in Apache Arrow
Apache Kudu: Technical Deep Dive


Parquet Hadoop Summit 2013

What's hot (20)

PDF
GPDB Meetup GPORCA OSS 101
PPTX
GPORCA: Query Optimization as a Service
PPTX
The Volcano/Cascades Optimizer
PPTX
High Performance, High Reliability Data Loading on ClickHouse
PDF
Solving PostgreSQL wicked problems
PDF
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
PDF
MyRocks Deep Dive
PDF
MySQL 8.0 Optimizer Guide
PPTX
Stability Patterns for Microservices
PDF
Iceberg: a fast table format for S3
PDF
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
PPTX
Tuning and Debugging in Apache Spark
PPTX
Tuning Apache Kafka Connectors for Flink.pptx
PDF
Transparent sharding with Spider: what's new and getting started
PDF
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...
PPTX
Scylla Summit 2022: Making Schema Changes Safe with Raft
PPTX
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
PDF
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
PPTX
Flink vs. Spark
PDF
Introduction to Apache Calcite
GPDB Meetup GPORCA OSS 101
GPORCA: Query Optimization as a Service
The Volcano/Cascades Optimizer
High Performance, High Reliability Data Loading on ClickHouse
Solving PostgreSQL wicked problems
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
MyRocks Deep Dive
MySQL 8.0 Optimizer Guide
Stability Patterns for Microservices
Iceberg: a fast table format for S3
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Tuning and Debugging in Apache Spark
Tuning Apache Kafka Connectors for Flink.pptx
Transparent sharding with Spider: what's new and getting started
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...
Scylla Summit 2022: Making Schema Changes Safe with Raft
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO
Flink vs. Spark
Introduction to Apache Calcite
Ad

Similar to The internals of gporca optimizer (20)

PDF
Orca: A Modular Query Optimizer Architecture for Big Data
 
PDF
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
PDF
Greenplum Roadmap
PDF
orca_fosdem_FINAL
PPTX
Greenplum Database Open Source December 2015
PDF
Introduction to Greenplum
PDF
Greenplum Architecture
PPTX
Presto query optimizer: pursuit of performance
PDF
Interactive SQL-on-Hadoop and JethroData
PDF
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
PDF
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
PDF
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
PDF
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
PPTX
SPL_ALL_EN.pptx
PPTX
DataMass Summit - Machine Learning for Big Data in SQL Server
PDF
Greenplum User Case
PDF
Federated Queries with HAWQ - SQL on Hadoop and Beyond
PDF
Spark + AI Summit recap jul16 2020
PPT
Palo Webinar
PDF
Pivotal OSS meetup - MADlib and PivotalR
Orca: A Modular Query Optimizer Architecture for Big Data
 
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
Greenplum Roadmap
orca_fosdem_FINAL
Greenplum Database Open Source December 2015
Introduction to Greenplum
Greenplum Architecture
Presto query optimizer: pursuit of performance
Interactive SQL-on-Hadoop and JethroData
Learn How Dell Improved Postgres/Greenplum Performance 20x with a Database Pr...
Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI - Gr...
Massively Parallel Processing with Procedural Python by Ronert Obst PyData Be...
Tajolabigdatacamp2014 140618135810-phpapp01 hyunsik-choi
SPL_ALL_EN.pptx
DataMass Summit - Machine Learning for Big Data in SQL Server
Greenplum User Case
Federated Queries with HAWQ - SQL on Hadoop and Beyond
Spark + AI Summit recap jul16 2020
Palo Webinar
Pivotal OSS meetup - MADlib and PivotalR
Ad

Recently uploaded (20)

DOCX
How to Use SharePoint as an ISO-Compliant Document Management System
PPTX
Weekly report ppt - harsh dattuprasad patel.pptx
PDF
MCP Security Tutorial - Beginner to Advanced
PDF
Wondershare Recoverit Full Crack New Version (Latest 2025)
PDF
iTop VPN Crack Latest Version Full Key 2025
PPTX
Log360_SIEM_Solutions Overview PPT_Feb 2020.pptx
PPTX
GSA Content Generator Crack (2025 Latest)
PPTX
Introduction to Windows Operating System
PPTX
AMADEUS TRAVEL AGENT SOFTWARE | AMADEUS TICKETING SYSTEM
PDF
Top 10 Software Development Trends to Watch in 2025 🚀.pdf
PPTX
Why Generative AI is the Future of Content, Code & Creativity?
PPTX
Cybersecurity: Protecting the Digital World
PDF
AI-Powered Threat Modeling: The Future of Cybersecurity by Arun Kumar Elengov...
PPTX
Computer Software and OS of computer science of grade 11.pptx
PDF
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
PDF
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf
PDF
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
PDF
Ableton Live Suite for MacOS Crack Full Download (Latest 2025)
PPTX
Trending Python Topics for Data Visualization in 2025
PDF
CCleaner 6.39.11548 Crack 2025 License Key
How to Use SharePoint as an ISO-Compliant Document Management System
Weekly report ppt - harsh dattuprasad patel.pptx
MCP Security Tutorial - Beginner to Advanced
Wondershare Recoverit Full Crack New Version (Latest 2025)
iTop VPN Crack Latest Version Full Key 2025
Log360_SIEM_Solutions Overview PPT_Feb 2020.pptx
GSA Content Generator Crack (2025 Latest)
Introduction to Windows Operating System
AMADEUS TRAVEL AGENT SOFTWARE | AMADEUS TICKETING SYSTEM
Top 10 Software Development Trends to Watch in 2025 🚀.pdf
Why Generative AI is the Future of Content, Code & Creativity?
Cybersecurity: Protecting the Digital World
AI-Powered Threat Modeling: The Future of Cybersecurity by Arun Kumar Elengov...
Computer Software and OS of computer science of grade 11.pptx
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
How to Make Money in the Metaverse_ Top Strategies for Beginners.pdf
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
Ableton Live Suite for MacOS Crack Full Download (Latest 2025)
Trending Python Topics for Data Visualization in 2025
CCleaner 6.39.11548 Crack 2025 License Key

The internals of gporca optimizer

  • 1. © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0 The Internals of GPORCA Optimizer Xin Zhang (xzhang@pivotal.io) PGConf Seattle 2017 Nov 2017
  • 2. Disclaimer This presentation contains statements relating to Pivotal’s expectations, projections, beliefs and prospects which are "forward-looking statements” about Pivotal’s future which by their nature are uncertain. Such forward-looking statements are not guarantees of future performance, and you are cautioned not to place undue reliance on these forward-looking statements. Actual results could differ materially from those projected in the forward-looking statements as a result of many factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) risks associated with managing the growth of Pivotal’s business, including operating costs; (iv) changes to Pivotal’s software business model; (v) competitive factors, including pricing pressures and new product introductions; (vi) Pivotal’s customers' ability to transition to new products and computing strategies such as cloud computing, the uncertainty of customer acceptance of emerging technologies, and rapid technological and market changes; (vii) Pivotal's ability to protect its proprietary technology; (viii) Pivotal’s ability to attract and retain highly qualified employees; (ix) Pivotal’s ability to execute on its plans and strategy; and (x) risks related to data and information security vulnerabilities. All information set forth in this presentation is current as of the date of this presentation. These forward-looking statements are based on current expectations and are subject to uncertainties and changes in condition, significance, value and effect as well as other risks disclosed previously and from time to time in documents filed by Dell Technologies Inc., the parent company of Pivotal, with the U.S. Securities and Exchange Commission. Dell and Pivotal assume no obligation to, and do not currently intend to, update any such forward-looking statements after the date of this presentation. 
The following is intended to outline the general direction of Pivotal's offerings. It is intended for information purposes only and may not be incorporated into any contract. Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and is subject to change. This information is provided without warranty or any kind, express or implied, and is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions regarding Pivotal's offerings. These purchasing decisions should only be based on features currently available. The development, release, and timing of any features or functionality described for Pivotal's offerings in this presentation remain at the sole discretion of Pivotal. Pivotal has no obligation to update forward-looking information in this presentation.
  • 4. Open Source Software (OSS) Oct 2015, Greenplum Database Open Sourced (based on PostgreSQL 8.2) Sep 7th 2017, Greenplum Database 5.0 w/ GPORCA (based on PostgreSQL 8.3) Oct 23rd 2017, Greenplum Database 6.0 Alpha (based on PostgreSQL 8.4) ... ... Sep 2015, Shin joined Pivotal Jan 2016, GPORCA Open Source V1.636 http://engineering.pivotal.io/post/gporca-open-source/ Jan 2017, GPORCA V2.0 (merge GPOS) Sep 6th 2017, GPORCA V2.42.0 Nov 8th 2017, GPORCA V2.48.6
  • 5. Greenplum (GPDB): MPP + PostgreSQL ● Shared Nothing Architecture ● Data Distributed on Cluster ● Query Processed Locally in Parallel ● Each Segment is a PostgreSQL Instance http://greenplum.org/
  • 6. MPP Query Processing Example 1/3 SELECT s.beer, s.price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco' • Bars is distributed randomly → any bar on any host • Sells is distributed by bar → sells of same bar on same host
  • 7. MPP Query Processing Example 2/3 SELECT s.beer, s.price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco' • Bars is distributed randomly • Sells is distributed by bar Slice 2 Slice 1 Slice 3
  • 8. Segment N MPP Query Processing Example 3/3 Slice 3 Slice 2 Master Slice 1 Slice 3 Slice 2 … Segment 1
  • 9. GPORCA find the best MPP plan
  • 12. Jul 2015, PQO: Pivotal Query Optimizer (Released in GPDB 4.3.5.0) Jun 2014, ORCA: SIGMOD 2014 Paper In source code GPOPT src/backend/gpopt psql: select gp_opt_version(); GPORCA = Greenplum ORCA = ...
  • 13. GPORCA can optimize all 99 TPC-DS Queries (and variations) Reference: Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014 Great SQL Surface
  • 14. ~60 Logical Operators ~50 Physical Operators ~40 Scalar Operators ~110 Transformation Rules Great SQL Surface of GPORCA
  • 15. Great Query Performance vs. Planner TPC-DS 10TB, 16 nodes, 48 GB/node
  • 16. Key Features vs. Planner • Smarter partition elimination • Multi-level partitioning support (4.3.6) • Subquery unnesting • Common table expressions (CTE) • Joins on castable data types • Improved join ordering • Join-aggregate reordering (push/pull AGG over join) • Sort order optimization (avoid sorting multiple times) • Skew awareness (redistribution vs. broadcast)
  • 18. GPDB Using Planner (`set optimizer=off`) [Figure: SQL → Parser → Planner → Query Plan → Executor → Results; GPORCA is bypassed]
  • 19. GPDB Using GPORCA (`set optimizer=on`) [Figure: SQL → Parser → GPORCA (instead of the Planner) → Query Plan → Executor → Results]
  • 20. Multi-Host with DXL (Data eXchange Language) API [Figure: in the host system, SQL → Parser → Q2DXL produces a DXL Query for GPORCA; GPORCA sends MD requests to the MD Provider, which reads the Catalog and returns DXL MD; GPORCA returns a DXL Plan → DXL2Plan → Executor → Results]
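A minimal sketch of the decoupling this slide describes, assuming a simplified interface: the optimizer side depends only on an abstract metadata provider that returns serialized metadata by id, so a live catalog or a replayed capture can sit behind it. `MetadataProvider`, `InMemoryProvider`, `GetMDObject` and `DescribeRelation` are hypothetical names; they do not reproduce GPORCA's actual provider classes or the DXL format.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface: the optimizer sees metadata only as serialized
// documents fetched by object id, never as host-specific catalog structs.
class MetadataProvider {
public:
    virtual ~MetadataProvider() = default;
    virtual std::string GetMDObject(const std::string &mdid) const = 0;
};

// Hypothetical host-independent provider backed by a map, e.g. metadata
// replayed from a capture instead of read from a live catalog.
class InMemoryProvider : public MetadataProvider {
public:
    explicit InMemoryProvider(std::map<std::string, std::string> objects)
        : objects_(std::move(objects)) {}

    std::string GetMDObject(const std::string &mdid) const override {
        auto it = objects_.find(mdid);
        if (it == objects_.end()) {
            throw std::out_of_range("unknown mdid: " + mdid);
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> objects_;
};

// Optimizer-side code depends only on the abstract interface, so any host
// (or a test harness) can plug in its own provider.
std::string DescribeRelation(const MetadataProvider &provider, const std::string &mdid) {
    return "relation " + mdid + ": " + provider.GetMDObject(mdid);
}
```

The design choice being illustrated is that swapping the host means swapping the provider (and the Q2DXL/DXL2Plan translators), not touching the optimizer.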
  • 22. Five Optimization Steps in Orca: [Step 0] Pre-process, e.g., predicate pushdown; [Step 1] Exploration: all equivalent logical plans; [Step 2] Statistics derivation: histograms; [Step 3] Implementation: logical to physical; [Step 4] MPP optimization: enforcing and costing
  • 23. Step 0 Pre-Process in 25 steps (TL;DR): (1) Remove unused CTE anchors (2) Remove intermediate superfluous limits (3) Trim unnecessary existential subqueries (4) Remove superfluous outer references from the order spec in limits, grouping columns, and partition/order columns in window operators (5) Remove superfluous equality (6) Simplify quantified subqueries (7) Preliminary unnesting of scalar subqueries (8) Unnest AND/OR/NOT predicates and convert predicates to array IN or NOT IN where applicable (9) Infer predicates from constraints (10) Eliminate self comparisons (11) Remove duplicate AND/OR children (12) Factorize common expressions (13) Infer filters out of components of disjunctive filters (14) Pre-process window functions (15) Collapse cascaded union/union all (16) Eliminate unused computed columns (17) Normalize expressions (18) Transform outer joins into inner joins (19) Collapse cascaded inner joins (20) Generate more predicates from constraints (21) Eliminate empty subtrees (22) Collapse cascades of projects (23) Insert a project for a scalar subquery that returns an outer reference (24) Reorder children of scalar comparison operators (25) Rewrite IN subqueries to EXISTS subqueries with a predicate
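As an illustration of steps (8) and (11), the sketch below flattens nested ANDs and drops duplicate conjuncts. It is a toy under stated assumptions: predicates are compared by their text, only AND is handled, and `Pred`, `Leaf`, `And` and `FlattenAndDedup` are hypothetical names; GPORCA's preprocessor operates on `CExpression` trees.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical predicate tree: a node is either a leaf comparison (non-empty
// `text`) or an AND over `children`.
struct Pred {
    std::string text;
    std::vector<std::shared_ptr<Pred>> children;
    bool IsAnd() const { return text.empty(); }
};
using PredPtr = std::shared_ptr<Pred>;

PredPtr Leaf(const std::string &text) {
    auto p = std::make_shared<Pred>();
    p->text = text;
    return p;
}

PredPtr And(std::vector<PredPtr> children) {
    auto p = std::make_shared<Pred>();
    p->children = std::move(children);
    return p;
}

// Collect the conjuncts of nested ANDs in left-to-right order, skipping
// conjuncts already seen (duplicates are detected by text only).
void CollectConjuncts(const PredPtr &p, std::vector<std::string> &out) {
    if (!p->IsAnd()) {
        for (const std::string &seen : out) {
            if (seen == p->text) return;  // AND(x, x) is just x
        }
        out.push_back(p->text);
        return;
    }
    for (const PredPtr &child : p->children) CollectConjuncts(child, out);
}

// AND(a, AND(b, a)) becomes AND(a, b)
std::vector<std::string> FlattenAndDedup(const PredPtr &p) {
    std::vector<std::string> conjuncts;
    CollectConjuncts(p, conjuncts);
    return conjuncts;
}
```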
  • 24. Step 1 Exploration 1/2: Memoization. The Memo is a compact in-memory data structure capturing the plan space. Group: a container of equivalent expressions. Group Expression: an operator that has other groups as its children. [Figure: the logical expression Inner Join (T1.a = T2.b) over Get(T1) and Get(T2) is stored in the Memo as GROUP 0: 0: Inner Join [1,2]; GROUP 1: 0: Get (T1) [ ]; GROUP 2: 0: Get (T2) [ ]]
  • 25. Step 1 Exploration 2/2: Transformation [Figure: a transformation turns the logical expression Inner Join (T1.a = T2.b) over Get(T1), Get(T2) into Inner Join (T1.a = T2.b) over Get(T2), Get(T1). The Memo grows from GROUP 0: 0: Inner Join [1,2] to GROUP 0: 0: Inner Join [1,2], 1: Inner Join [2,1]; GROUP 1 (0: Get (T1) [ ]) and GROUP 2 (0: Get (T2) [ ]) are unchanged]
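The Memo on these two slides can be sketched in a few lines of C++, assuming a drastically simplified model in which a group expression is just an operator name plus child group ids. `GroupExpr`, `Memo`, `MakeJoinMemo` and `ExploreJoinCommutativity` are hypothetical; GPORCA's real classes (CMemo, CGroup, CGroupExpression) carry far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified memo. A group expression is an operator
// plus the ids of its child *groups*, not of concrete child expressions.
struct GroupExpr {
    std::string op;
    std::vector<std::size_t> children;
    bool operator==(const GroupExpr &other) const {
        return op == other.op && children == other.children;
    }
};

struct Memo {
    std::vector<std::vector<GroupExpr>> groups;  // group id -> equivalent exprs

    // Add `e` to `group` unless an identical expression is already there;
    // this duplicate check keeps exploration from looping forever.
    bool Insert(std::size_t group, const GroupExpr &e) {
        for (const GroupExpr &existing : groups[group]) {
            if (existing == e) return false;
        }
        groups[group].push_back(e);
        return true;
    }
};

// The Memo from the slides: GROUP 0 = Inner Join [1,2], GROUP 1 = Get(T1),
// GROUP 2 = Get(T2).
Memo MakeJoinMemo() {
    Memo memo;
    memo.groups.push_back({GroupExpr{"Inner Join", {1, 2}}});
    memo.groups.push_back({GroupExpr{"Get(T1)", {}}});
    memo.groups.push_back({GroupExpr{"Get(T2)", {}}});
    return memo;
}

// Join commutativity: Inner Join [x,y] => Inner Join [y,x] in the same group.
void ExploreJoinCommutativity(Memo &memo) {
    for (std::size_t g = 0; g < memo.groups.size(); ++g) {
        // Index-based loop plus a copy: Insert may grow groups[g] while we scan it.
        for (std::size_t i = 0; i < memo.groups[g].size(); ++i) {
            GroupExpr e = memo.groups[g][i];
            if (e.op == "Inner Join" && e.children.size() == 2) {
                memo.Insert(g, GroupExpr{e.op, {e.children[1], e.children[0]}});
            }
        }
    }
}
```

Running the exploration twice still leaves GROUP 0 with exactly two expressions: commuting [2,1] yields [1,2], which is already in the group, so nothing new is added.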
  • 26. Step 2 Statistics Derivation 1/2 [Figure: in Orca's Memo, GROUP 0 (0: Inner Join(a=b) [1,2], 1: Inner Join(a=b) [2,1]) has Reqd Stats = { }; GROUP 1 (0: Get(T1) [ ]) has Reqd Stats = {a}; GROUP 2 (0: Get(T2) [ ]) has Reqd Stats = {b}; the statistics come from the back-end system]
  • 27. Step 2 Statistics Derivation 2/2 [Figure: the same Memo; the requests for statistics on {a} (GROUP 1) and {b} (GROUP 2) go through the MD Accessor and the MDCache in Orca to the MD Provider, which reads the Catalog in the back-end system]
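To show what the histograms buy us, here is a hedged sketch of estimating the size of `T1 JOIN T2 ON T1.a = T2.b` from per-bucket statistics. It assumes both histograms share the same bucket boundaries, values are uniform inside a bucket, and the smaller set of distinct values is contained in the larger one; `Bucket` and `EstimateEquiJoinRows` are hypothetical names, and this is a textbook estimate, not GPORCA's actual CHistogram code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical histogram bucket; both sides are assumed to be built on the
// same [lo, hi) boundaries, bucket i matching bucket i.
struct Bucket {
    double lo, hi;    // value range
    double rows;      // number of rows in the bucket
    double distinct;  // number of distinct values in the bucket
};

// Per bucket: rows1 * rows2 / max(ndv1, ndv2), i.e. each value of the side
// with fewer distinct values finds its matches among the other side's values.
double EstimateEquiJoinRows(const std::vector<Bucket> &left,
                            const std::vector<Bucket> &right) {
    double total = 0.0;
    for (std::size_t i = 0; i < left.size() && i < right.size(); ++i) {
        double ndv = std::max(left[i].distinct, right[i].distinct);
        if (ndv > 0.0) total += left[i].rows * right[i].rows / ndv;
    }
    return total;
}
```

For example, with 100 and 20 rows (10 distinct values each) in the first bucket and 50 and 10 rows (5 and 10 distinct values) in the second, the estimate is 100 × 20 / 10 + 50 × 10 / 10 = 250 rows.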
  • 28. Step 3 Implementation [Figure: the logical expression Inner Join (T1.a=T2.b) [1,2] over GROUP 1 and GROUP 2 is implemented as the physical expression Inner Hash Join (T1.a=T2.b) [1,2] over the same groups]
  • 29. Step 4 MPP Optimization 1/3: Requirement [Figure: an optimization request with Reqd Props {Singleton, <T1.a>} (distribution, order) arrives at Inner Hash Join (T1.a=T2.b) [1,2] over GROUP 1 and GROUP 2, whose Drvd Props are {Hashed(T1.a), < >}]
  • 30. Step 4 MPP Optimization 2/3: Costing [Figure: for the request {Singleton, <T1.a>}, Inner Hash Join (T1.a=T2.b) [1,2] requests {Hashed(T1.a), Any} from its outer child, satisfied by Scan(T1), and {Hashed(T2.b), Any} from its inner child, satisfied by Redistribute(T2.b) over Scan(T2); the join's Drvd Props are {Hashed(T1.a), < >}]
  • 31. Step 4 MPP Optimization 3/3: Enforcement. The derived properties {Hashed(T1.a), < >} do not satisfy the required {Singleton, <T1.a>}, so property enforcers are added on top of Inner Hash Join (Scan(T1), Redistribute(T2.b) over Scan(T2)). Two alternatives: (a) Sort(T1.a) on the segments, then GatherMerge(T1.a); (b) Gather, then Sort(T1.a) on the master. [Figure: both enforced plans for the optimization request {Singleton, <T1.a>}, with child requests {Hashed(T1.a), Any} and {Hashed(T2.b), Any}]
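The enforcement decision can be sketched as follows, under simplifying assumptions: properties are plain strings, the required distribution is either Singleton or Any, and Redistribute is not modeled. `Props`, `Satisfies` and `Enforcers` are hypothetical names, not GPORCA's CEnfdProp machinery.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified physical properties (GPORCA's real ones are
// CDistributionSpec subclasses and COrderSpec).
struct Props {
    std::string dist;   // "Singleton", "Hashed(T1.a)", or "Any"
    std::string order;  // sort key such as "T1.a"; empty means "no order"
};

using EnforcerStack = std::vector<std::string>;  // enforcers listed bottom-up

bool Satisfies(const Props &derived, const Props &required) {
    bool distOk = required.dist == "Any" || derived.dist == required.dist;
    bool orderOk = required.order.empty() || derived.order == required.order;
    return distOk && orderOk;
}

// Enumerate enforcer stacks that turn `derived` into `required`.
// Assumes required.dist is "Singleton" or "Any"; Redistribute is not modeled.
std::vector<EnforcerStack> Enforcers(const Props &derived, const Props &required) {
    if (Satisfies(derived, required)) return {EnforcerStack{}};  // nothing to add
    bool needGather = required.dist == "Singleton" && derived.dist != "Singleton";
    bool needSort = !required.order.empty() && derived.order != required.order;
    std::string sort = "Sort(" + required.order + ")";
    if (needGather && needSort) {
        return {
            // sort on each segment, then merge the sorted streams on the master
            EnforcerStack{sort, "GatherMerge(" + required.order + ")"},
            // gather unsorted rows to the master, then sort once there
            EnforcerStack{"Gather", sort},
        };
    }
    if (needGather) return {EnforcerStack{"Gather"}};
    return {EnforcerStack{sort}};
}
```

Each alternative would then be costed and the cheaper one kept; which one wins depends on the data volume and the cost model.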
  • 33. Walkthrough: split an aggregate into a pair of local and global aggregates. CREATE TABLE foo (a int, b int, c int) DISTRIBUTED BY (a); SELECT sum(c) FROM foo GROUP BY b; do the local aggregation on the segments and the global aggregation on the master.
  • 34. Split Groupby Aggregate [Figure: GpAgg (b) Sum(c) over Get(foo) becomes a global GpAgg (b) Sum(c) over a local GpAgg (b) Sum(c) over Get(foo)]
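The split is valid for `sum` because partial sums can be added up again. A minimal sketch, assuming two segments and one `std::map` per aggregation stage; `Row`, `PartialSums`, `LocalSum` and `GlobalSum` are hypothetical names, not GPORCA or GPDB code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical row type for: CREATE TABLE foo (a int, b int, c int) DISTRIBUTED BY (a);
struct Row {
    int a, b, c;
};

using PartialSums = std::map<int, long long>;  // group key b -> sum(c)

// Local aggregate: each segment sums c per b over only its own rows.
PartialSums LocalSum(const std::vector<Row> &segmentRows) {
    PartialSums partial;
    for (const Row &r : segmentRows) partial[r.b] += r.c;
    return partial;
}

// Global aggregate: combine the per-segment partial sums for each b.
// For sum() the combine step is another sum; avg() would need (sum, count) pairs.
PartialSums GlobalSum(const std::vector<PartialSums> &partials) {
    PartialSums result;
    for (const PartialSums &partial : partials) {
        for (const auto &kv : partial) result[kv.first] += kv.second;
    }
    return result;
}
```

Each segment ships at most one row per distinct b instead of all of its rows, which is the point of aggregating locally first.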
  • 35. CXformSplitGbAgg
      // HEADER FILES
      ~/orca/libgpopt/include/gpopt/xforms
      // SOURCE FILES
      ~/orca/libgpopt/src/xforms
  • 36. Transformation Trigger • Pattern • Pre-Condition Check
  • 37. Pattern (matches GpAgg (b) Sum(c) over Get(foo)):
      GPOS_NEW(pmp) CExpression
          (
          pmp,
          // logical aggregate operator
          GPOS_NEW(pmp) CLogicalGbAgg(pmp),
          // relational child
          GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternLeaf(pmp)),
          // scalar project list
          GPOS_NEW(pmp) CExpression(pmp, GPOS_NEW(pmp) CPatternTree(pmp))
          )
  • 38. What's WRONG with this pattern? (The same CLogicalGbAgg pattern as on slide 37: a CPatternLeaf relational child and a CPatternTree scalar project list.) [Figure: it matches GpAgg (b) Sum(c) over Get(foo), but also the global GpAgg (b) Sum(c) whose child is the local GpAgg (b) Sum(c) produced by the split]
  • 39. Pre-Condition Check: do not fire this rule on a logical operator produced by the same rule (avoid infinite recursion).
      // Compatibility function for splitting aggregates
      virtual BOOL FCompatible(CXform::EXformId exfid)
      {
          return (CXform::ExfSplitGbAgg != exfid);
      }
    [Figure: GpAgg (b) Sum(c) over Get(foo), and its split form GpAgg (b) Sum(c) over GpAgg (b) Sum(c)]
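A sketch of how an engine can apply this pre-condition, assuming each memo expression remembers which rule produced it. `XformId`, `GroupExpression`, `SplitGbAggRule` and `ShouldFire` are hypothetical stand-ins for `CXform::EXformId` and the engine's real matching code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for CXform::EXformId and a memo expression that
// records which rule created it.
enum class XformId { None, SplitGbAgg, JoinCommutativity };

struct GroupExpression {
    std::string op;
    XformId createdBy;  // XformId::None for expressions from the original query
};

struct SplitGbAggRule {
    // Mirrors the slide: compatible with everything except its own output.
    bool FCompatible(XformId exfid) const { return exfid != XformId::SplitGbAgg; }
    // Toy pattern check: any aggregate matches (like CPatternLeaf, it ignores the child).
    bool Matches(const GroupExpression &e) const { return e.op == "GbAgg"; }
};

// Fire only if the pattern matches AND the rule is compatible with the rule
// that produced the expression; this is what stops the split from re-splitting.
bool ShouldFire(const SplitGbAggRule &rule, const GroupExpression &e) {
    return rule.Matches(e) && rule.FCompatible(e.createdBy);
}
```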
  • 40. The Actual Transformation (details: libgpopt/src/xforms/CXformSplitGbAgg.cpp)
      void Transform
          (
          CXformContext *pxfctxt,  // update
          CXformResult *pxfres,    // output
          CExpression *pexpr       // input
          ) const;
  • 41. Register Transformation Rule
      void CXformFactory::Instantiate()
      {
          …
          Add(GPOS_NEW(m_pmp) CXformSplitGbAgg(m_pmp));
          …
      }
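A minimal registry in the spirit of `CXformFactory::Instantiate`, including switching a rule off by name as `disable_xform()` does from SQL. `Xform`, `XformRegistry`, `SetEnabled` and `ActiveRules` are hypothetical; the real factory stores `CXform` objects allocated with `GPOS_NEW`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal rule interface (GPORCA's real base class is CXform).
struct Xform {
    virtual ~Xform() = default;
    virtual std::string Name() const = 0;
};
struct SplitGbAgg : Xform {
    std::string Name() const override { return "CXformSplitGbAgg"; }
};
struct JoinAssociativity : Xform {
    std::string Name() const override { return "CXformJoinAssociativity"; }
};

// Instantiate() registers every rule once; rules can be toggled by name.
class XformRegistry {
public:
    void Instantiate() {
        Add(std::make_unique<SplitGbAgg>());
        Add(std::make_unique<JoinAssociativity>());
    }

    // Returns false for an unknown rule name.
    bool SetEnabled(const std::string &name, bool enabled) {
        auto it = enabled_.find(name);
        if (it == enabled_.end()) return false;
        it->second = enabled;
        return true;
    }

    // Names of the rules the search would currently consider, in registration order.
    std::vector<std::string> ActiveRules() const {
        std::vector<std::string> names;
        for (const auto &x : xforms_) {
            if (enabled_.at(x->Name())) names.push_back(x->Name());
        }
        return names;
    }

private:
    void Add(std::unique_ptr<Xform> x) {
        enabled_[x->Name()] = true;
        xforms_.push_back(std::move(x));
    }

    std::vector<std::unique_ptr<Xform>> xforms_;
    std::map<std::string, bool> enabled_;
};
```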
  • 44. CMake Build
      mkdir build
      cd build
      cmake ../
      make && make install
  • 47. Test GPORCA (<1min)
      # run all unit tests
      ctest
      # run all unit tests with 7 parallel jobs
      ctest -j7
      # run only one unit test called CAggTest
      ./server/gporca_test -U CAggTest
  • 48. Test with GPDB OSS in Docker: follow the instructions at https://github.com/d/bug-free-fortnight. It's very useful for verifying installcheck-good locally with the latest GPORCA changes.
  • 50. 1-2-3: (1) Fork GPORCA at https://github.com/greenplum-db/gporca (2) Pick an issue at https://github.com/greenplum-db/gporca/issues (3) Send a Pull Request (PR)
  • 51. Thanks to all these contributors
  • 57. THANK YOU ● GPDB: http://greenplum.org/, https://github.com/greenplum-db/gpdb ● GPORCA GitHub: https://github.com/greenplum-db/gporca ● Mailing list (400+ members): gpdb-users@greenplum.org ● Greenplum YouTube: https://www.youtube.com/GreenplumDatabase
  • 59. FAQ
      Q: What happens to the SORT generated during enforcement?
      A: There is only ONE implementation of SORT, so it is a physical operator and won't be pushed anywhere after enforcement.
      Q: Why can a CXformResult hold more than one alternative?
      A: Some rules (e.g. CXformExpandNAryJoinDP) produce multiple choices from one transformation.
      Q: How hard is it to make GPORCA adapt to a new host?
      A: The hard part is the MD (metadata) translation. So far, GPORCA is very PostgreSQL-friendly.
      Q: Why is there no separate SQL parser included in GPORCA?
      A: GPORCA focuses on relational algebra and lets the host handle binding, view expansion, and permissions.
      Q: How would one add a new property like 'reliability' or 'dollar cost' to this optimizer?
      A: It can be added as an enforced property, like Order/Distribution/Partition/Rewindability. For example, to favor a more 'reliable' data source, one could add a CEnfdReliability class to cost that choice. Combining 'reliability' and 'dollar cost' is interesting, since the more reliable choice is usually more expensive; it's a balance to strike.
      Q: How long does `ctest` take?
      A: Around 5 min on a 2.8 GHz Intel i7; `ctest -j7` finishes in < 2 min.
      Q: Is GPORCA multi-threaded?
      A: It's multi-thread READY, but we currently run it single-threaded. There are still a few caveats (thread-safety issues) to iron out before we can fully turn it on.
  • 60. Publications
    ● Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014. Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin.
    ● Optimization of Common Table Expressions in MPP Database Systems, VLDB 2015. Amr El-Helw, Venkatesh Raghavan, Mohamed A. Soliman, George C. Caragea, Zhongxian Gu, Michalis Petropoulos.
    ● Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD 2014. Lyublena Antova, Amr El-Helw, Mohamed Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas.
    ● Reversing Statistics for Scalable Test Databases Generation, DBTest 2013. Entong Shen, Lyublena Antova.
    ● Total Operator State Recall - Cost-Effective Reuse of Results in Greenplum Database, ICDE Workshops 2013. George C. Caragea, Carlos Garcia-Alvarado, Michalis Petropoulos, Florian M. Waas.
    ● Testing the Accuracy of Query Optimizers, DBTest 2012. Zhongxian Gu, Mohamed A. Soliman, Florian M. Waas.
    ● Automatic Capture of Minimal, Portable, and Executable Bug Repros using AMPERe, DBTest 2012. Lyublena Antova, Konstantinos Krikellas, Florian M. Waas.
    ● Automatic Data Placement in MPP Databases, ICDE Workshops 2012. Carlos Garcia-Alvarado, Venkatesh Raghavan, Sivaramakrishnan Narayanan, Florian M. Waas.
  • 62. GPORCA Traceflags and GPDB GUCs: GPORCA relies on traceflags (Traceflag.h) to change runtime behavior; they are exposed in GPDB as GUCs (Grand Unified Configuration, guc_gp.c).
      -- turn on GPORCA
      set optimizer=on;
      -- print input query (GPORCA TF 101000)
      set optimizer_print_query=on;
  • 63. Minidump: DXL document. Turn on the minidump:
      set client_min_messages='log';
      set optimizer=on;
      set optimizer_enable_constant_expression_evaluation=off;
      set optimizer_enumerate_plans=on;
      set optimizer_minidump=always;
    Run a query; GPORCA creates a *.mdp file in $MASTER_DATA_DIRECTORY/minidump.
      # run only one minidump directly
      ./server/gporca_test -d ../data/dxl/minidump/TVFRandom.mdp
  • 64. Input Query, Output Plan: debug the plans
      set client_min_messages='log';
      set optimizer=on;
      set optimizer_print_query=on; -- input query and preprocessed query
      set optimizer_print_plan=on;  -- output final physical plan
  • 65. Plan Enumeration. Turn on plan enumeration:
      set client_min_messages='log';
      set optimizer=on;
      set optimizer_enumerate_plans=on;
    Pick a plan out of the search space:
      set optimizer=on;
      set client_min_messages='log';
      set optimizer_enumerate_plans=on;
      set optimizer_plan_id=1;
  • 66. Optimization Stats and Xform Rules. Debug the optimizer stages:
      set client_min_messages='log';
      set optimizer=on;
      set optimizer_print_optimization_stats=on;
    Debug the details of the transformation rules:
      set client_min_messages='log';
      set optimizer=on;
      set optimizer_print_xform=on;
  • 67. MEMO Groups
      set optimizer_print_memo_after_exploration=on;
      set optimizer_print_memo_after_implementation=on;
      set optimizer_print_memo_after_optimization=on;
    The root group is marked `ROOT`.
  • 68. Ways to Disable (and Re-enable) Xform Rules
      select disable_xform('CXformJoinAssociativity');
      select enable_xform('CXformJoinAssociativity');
    The xform rules are the classes under libgpopt/include/gpopt/xforms; CXformFactory::Instantiate lists all the activated xform rules (~130 rules).
  • 69. Useful Breakpoints
      # Entry point of optimizer
      COptimizer::PdxlnOptimize
      # DXL: translate the DXL query into an expression
      CTranslatorDXLToExpr::PexprTranslateQuery
      # Step 0: Pre-processor
      CExpressionPreprocessor::PexprPreprocess
      # Steps 1-4: Optimization
      COptimizer::PexprOptimize
      # Individual rule transformation, all CXform* classes
      CXformSplitGbAgg::Transform
      # Enforceable properties
      CEngine::FCheckEnfdProps
      CPartitionPropagationSpec::AppendEnforcers
      # DXL: translate the plan back into DXL
      CTranslatorExprToDXL::PdxlnTranslate
  • 71. Top Level
      .
      ├── cmake
      ├── concourse
      ├── data
      ├── libgpdbcost
      ├── libgpopt
      ├── libgpos
      ├── libnaucrates
      ├── patches
      ├── scripts
      └── server
  • 72. libgpos: memory management, task scheduler, exception handling, unit-test framework
      libgpos
      ├── include
      ├── server
      │   ├── include
      │   └── src
      │       ├── startup
      │       └── unittest
      │           └── gpos
      │               ├── common
      │               ├── error
      │               ├── io
      │               ├── memory
      │               ├── string
      │               ├── sync
      │               ├── task
      │               └── test
      └── src
          ├── common
          ├── error
          ├── io
          ├── memory
          ├── net
          ├── string
          ├── sync
          ├── task
          └── test
  • 73. libnaucrates: DXL, metadata, statistics, traceflags
      libnaucrates
      ├── include
      │   └── naucrates
      │       ├── base
      │       ├── dxl
      │       │   ├── operators
      │       │   ├── parser
      │       │   └── xml
      │       ├── md
      │       ├── statistics
      │       └── traceflags
      └── src
          ├── base
          ├── md
          ├── operators
          ├── parser
          ├── statistics
          └── xml
  • 74. libgpopt: engine, metadata cache, minidump, operators, memo, xform rules
      libgpopt
      ├── include
      │   └── gpopt
      └── src
          ├── base
          ├── engine
          ├── eval
          ├── mdcache
          ├── metadata
          ├── minidump
          ├── operators
          ├── optimizer
          ├── search
          ├── translate
          └── xforms
  • 75. libgpdbcost: cost model
      libgpdbcost
      ├── CMakeLists.txt
      ├── include
      │   └── gpdbcost
      └── src
          ├── CCostModelGPDB.cpp
          ├── CCostModelGPDBLegacy.cpp
          ├── CCostModelParamsGPDB.cpp
          ├── CCostModelParamsGPDBLegacy.cpp
          └── ICostModel.cpp
  • 76. server: unit tests
      server
      ├── include
      └── src
          ├── startup
          └── unittest
              ├── dxl
              │   ├── base
              │   └── statistics
              └── gpopt
                  ├── base
                  ├── cost
                  ├── csq
                  ├── engine
                  ├── eval
                  ├── mdcache
                  ├── metadata
                  ├── minidump
                  ├── operators
                  ├── search
                  ├── translate
                  └── xforms
  • 77. Data: all the test data
      data
      ├── dxl
      │   ├── cost
      │   ├── csq_tests
      │   ├── expressiontests
      │   ├── indexjoin
      │   ├── metadata
      │   ├── minidump
      │   │   ├── CArrayExpansionTest
      │   │   ├── CJoinOrderDPTest
      │   │   ├── CPhysicalParallelUnionAllTest
      │   │   ├── CPruneColumnsTest
      │   │   └── sql
      │   ├── multilevel-partitioning
      │   ├── parse_tests
      │   ├── plstmt
      │   ├── query
      │   ├── search
      │   ├── statistics
      │   ├── tpcds
      │   ├── tpcds-partitioned
      │   ├── tpch
      │   └── tpch-partitioned