Page1 © Hortonworks Inc. 2014
Cost-based query optimization in
Apache Hive
Julian Hyde
June 4th, 2014
Page2 © Hortonworks Inc. 2014
About me
Julian Hyde
Architect at Hortonworks
Open source:
• Founder & lead, Apache Optiq (query optimization framework)
• Founder & lead, Pentaho Mondrian (analysis engine)
• Committer, Apache Drill
• Contributor, Apache Hive
• Contributor, Cascading Lingual (SQL interface to Cascading)
Past:
• SQLstream (streaming SQL)
• Broadbase (data warehouse)
• Oracle (SQL kernel development)
Page3 © Hortonworks Inc. 2014
(Thanks to John Pullokkaran and Harish Butani for presentation content and actually doing the work.)
Page4 © Hortonworks Inc. 2014
Apache Hive
The original “SQL on Hadoop”
Undergoing extensive renovation
• Tez execution engine
• YARN execution environment
• Vectorized data representation
• Column-oriented data storage (ORC)
• ACID transactions
• SQL standards compliance
• SQL authorization model
• Cost-based query optimization (CBO): What? Why? How? When?
(All part of the “Stinger Initiative”)
Page5 © Hortonworks Inc. 2014
Incremental cutover to cost-based optimization
Release / Date / Remarks
Apache Hive 0.12 (October 2013)
• Rule-based optimizations
• No join reordering
• Main optimizations: predicate push-down & partition pruning
• Semantic info and operator tree tightly coupled
Apache Hive 0.13 (April 2014)
• “Old-style” JOIN & push-down conditions: … FROM t1, t2 WHERE …
• CBO just missed the deadline
HDP 2.1 (April 2014)
• Cost-based ordering of joins
• HIVE-6439 “Introduce CBO step in Semantic Analyzer”
• HIVE-5775 “Introduce Cost Based Optimizer in Hive”
Apache Hive 0.14 (date: ?)
• CBO patches
• More rework of internals
• More cost-based features…
Page6 © Hortonworks Inc. 2014
Apache Optiq
(incubating)
Page7 © Hortonworks Inc. 2014
Apache Optiq
Apache incubator project since May, 2014
Query planning framework
• Extensible
• Usable standalone (JDBC) or embedded
Adoption
Lingual – SQL interface to Cascading
Apache Drill
Apache Hive
Adapters: Splunk, Spark, MongoDB, JDBC, CSV, Web tables, In-memory
data
Page8 © Hortonworks Inc. 2014
Conventional DB architecture
Page9 © Hortonworks Inc. 2014
Optiq architecture
Page10 © Hortonworks Inc. 2014
Optiq – APIs and SPIs
Cost, statistics
RelOptCost
RelOptCostFactory
RelMetadataProvider
• RelMdColumnUniqueness
• RelMdDistinctRowCount
• RelMdSelectivity
SQL parser
SqlNode
SqlParser
SqlValidator
Transformation rules
RelOptRule
• MergeFilterRule
• PushAggregateThroughUnionRule
• RemoveCorrelationForScalarProjectRule
• 100+ more
Unification (materialized view)
Column trimming
Relational algebra
RelNode (operator)
• TableScan
• Filter
• Project
• Union
• Aggregate
• …
RelDataType (type)
RexNode (expression)
RelTrait (physical property)
• RelConvention (calling-convention)
• RelCollation (sortedness)
• TBD (bucketedness/distribution)
JDBC driver
Metadata
Schema
Table
Function
• TableFunction
• TableMacro
Page11 © Hortonworks Inc. 2014
Now… back to Hive
Page12 © Hortonworks Inc. 2014
CBO in Hive
Why cost-based optimization?
Ease of Use – Join Reordering
View Chaining
Ad hoc queries involving multiple views
Enables BI Tools as front ends to Hive
First version
Modest goal
Concrete results
Join re-ordering
Page13 © Hortonworks Inc. 2014
Query preparation – Hive 0.13
[Diagram: Hive SQL → SQL parser → Abstract Syntax Tree (AST) → Semantic analyzer → Annotated AST → Logical Optimizer → Plan → Physical Optimizer → Tuned Plan → Tez]
Page14 © Hortonworks Inc. 2014
Query preparation – full CBO
[Diagram: Hive SQL → SQL parser → Abstract Syntax Tree (AST) → Semantic analyzer → Annotated AST → Translate to algebra → RelNode → Optiq optimizer → Physical Optimizer → Tuned Plan → Tez]
Page15 © Hortonworks Inc. 2014
Query preparation – initial CBO
[Diagram: Hive SQL → SQL parser → Semantic analyzer → Translate to algebra → Optiq optimizer → AST with optimized join-ordering → Logical Optimizer → Physical Optimizer → Tuned Plan → Tez]
Page16 © Hortonworks Inc. 2014
Query Execution – The basics
SELECT R1.x
FROM R1
JOIN R2 ON R1.x = R2.x
JOIN R3 on R1.x = R3.x AND R2.x = R3.x
WHERE R1.z > 10;
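[Diagram: relational algebra tree for the query: π over σ over two joins (⋈) of R1, R2 and R3]
[Diagram: Hive operator tree: TS [R1] and TS [R2], each followed by RS, feed a Shuffle Join; TS [R3] feeds a Map Join; then Filter and FS]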
Page17 © Hortonworks Inc. 2014
Query Optimization – Rule Based vs. Cost Based
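[Diagram: four plan trees, each π over σ over two joins (⋈), showing the query as written (R1, R2, R3) alongside alternative join orders of the same tables: (R1, R2, R3), (R1, R3, R2) and (R2, R3, R1)]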
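The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not Hive's or Optiq's implementation: the row counts in `ROWS` and the single `SELECTIVITY` value are made-up assumptions, and `left_deep_cost` and `best_order` are hypothetical helper names. A rule-based planner keeps the order the query was written in; a cost-based planner estimates the intermediate result sizes of each candidate order and keeps the cheapest.

```python
from itertools import permutations

# Hypothetical row counts, chosen only for illustration.
ROWS = {"R1": 1_000_000, "R2": 10_000, "R3": 100}
# Assumed fraction of the cross product that survives each equi-join.
SELECTIVITY = 1e-4


def left_deep_cost(order):
    """Sum of the estimated intermediate join cardinalities for a left-deep order."""
    current = ROWS[order[0]]
    total = 0.0
    for table in order[1:]:
        current = current * ROWS[table] * SELECTIVITY  # estimated join output rows
        total += current
    return total


def best_order(tables):
    """Cost-based choice: evaluate every left-deep order and keep the cheapest."""
    return min(permutations(tables), key=left_deep_cost)


as_written = ("R1", "R2", "R3")       # what a rule-based planner would execute
chosen = best_order(as_written)       # what a cost-based planner would pick
print(as_written, left_deep_cost(as_written))
print(chosen, left_deep_cost(chosen))
```

With these assumed numbers the cheapest orders join the two small tables first and bring in the large table last, which is the kind of reordering the slide illustrates.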
Page18 © Hortonworks Inc. 2014
Introduction of CBO into Hive Planning
Series of gating factors to get a CBO plan:
- CBO enabled? If no, fall back.
- Generate plan without multi-way joins
- Can CBO handle the plan? If no, fall back. Requirements:
  - < 10 total join ops
  - No outer joins
  - No windowing, lateral views, script op.
- Pre-CBO optimizer:
  - Predicate pushdown
  - Partition pruning
  - Column pruning
  - Stats annotation
- Column stats available? If no, fall back.
- Optiq-based planner: Hive plan → revised AST
- Regular planning route on the new AST, with CBO turned off
Fallback to regular planning: as though CBO is disabled.
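The gating logic on this slide can be read as a short decision function. The sketch below is an assumption-laden paraphrase, not Hive code: `QueryInfo`, its field names, and `choose_planner` are all hypothetical, and only the conditions themselves (CBO enabled, fewer than 10 joins, no outer joins, no windowing, lateral views or script operators, column statistics available) come from the slide.

```python
from dataclasses import dataclass


@dataclass
class QueryInfo:
    """Hypothetical summary of a query; the field names are illustrative only."""
    join_count: int
    has_outer_join: bool = False
    has_windowing: bool = False
    has_lateral_view: bool = False
    has_script_op: bool = False
    column_stats_available: bool = True


def choose_planner(cbo_enabled: bool, query: QueryInfo) -> str:
    """Return "cbo" only if every gate passes; otherwise fall back to "regular"."""
    if not cbo_enabled:
        return "regular"
    can_handle = (
        query.join_count < 10
        and not query.has_outer_join
        and not query.has_windowing
        and not query.has_lateral_view
        and not query.has_script_op
    )
    if not can_handle:
        return "regular"  # as though CBO were disabled
    if not query.column_stats_available:
        return "regular"  # the cost model needs column statistics
    return "cbo"


print(choose_planner(True, QueryInfo(join_count=4)))
print(choose_planner(True, QueryInfo(join_count=4, has_outer_join=True)))
```

Every failed gate leads to the same outcome, regular planning, so a query is never rejected; it simply does not get a cost-based join order.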
Page19 © Hortonworks Inc. 2014
Optiq Planner Process
Flow: Hive Plan → Converters → Logical Plan → Planner (RelNode Graph) → Best RelNode Graph → AST Converter → Revised AST
Converters
• RelNode Converter: Hive Op → RelNode
• RexNode Converter: Hive Expr → RexNode
• Logical Plan: physical traits (table part./buckets) and RedSink Ops removed
1. Plan Graph
• Node for each node in the input plan
• Each node is a Set of alternate sub-plans
• Each Set is further divided into Subsets, based on traits like sortedness
2. Rules
• Rule: specifies an operator sub-graph to match and logic to generate an equivalent ‘better’ sub-graph.
• We only have join reordering rules.
3. Cost Model
• RelNodes have Cost (& Cumulative Cost)
• We only use cardinality for cost.
4. Metadata Providers
- Used to plug in schema and cost formulas: selectivity, NDV calculations etc.
- We only added selectivity and NDV formulas; schema is only available at the node level
Rule Match Queue
- Add rule matches to the queue
- Apply rule match transformations to the Plan Graph
- Iterate for a fixed number of iterations or until cost doesn’t change.
- Match importance is based on the cost of the RelNode and its height.
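The planner loop described above can be sketched as a toy search over join trees. This is only an illustration under stated assumptions, not the Optiq planner: plans are nested tuples, `ROWS` and `SELECTIVITY` are invented numbers, all function names are hypothetical, and the queue is plain first-in-first-out rather than ordered by match importance. It does keep the three ideas from the slide: rules produce equivalent plans, cost is based on cardinality only, and the loop ends after a fixed number of iterations or when no new plans appear.

```python
from collections import deque

# Hypothetical row counts and a single assumed join selectivity.
ROWS = {"a": 1000, "b": 10, "c": 100}
SELECTIVITY = 0.01


def cardinality(plan):
    """Estimated rows: a leaf is a table name, a join is a (left, right) tuple."""
    if isinstance(plan, str):
        return ROWS[plan]
    left, right = plan
    return cardinality(left) * cardinality(right) * SELECTIVITY


def cost(plan):
    """Cumulative cost: the sum of the cardinalities of every join in the plan."""
    if isinstance(plan, str):
        return 0.0
    left, right = plan
    return cost(left) + cost(right) + cardinality(plan)


def swap_join(plan):
    """a JOIN b  ->  b JOIN a."""
    if isinstance(plan, tuple):
        left, right = plan
        yield (right, left)


def push_join_through_join(plan):
    """(a JOIN b) JOIN c  ->  (a JOIN c) JOIN b."""
    if isinstance(plan, tuple) and isinstance(plan[0], tuple):
        (a, b), c = plan
        yield ((a, c), b)


def rewrites(plan, rules):
    """Yield every plan obtained by applying one rule at one node of the plan."""
    for rule in rules:
        yield from rule(plan)
    if isinstance(plan, tuple):
        left, right = plan
        for new_left in rewrites(left, rules):
            yield (new_left, right)
        for new_right in rewrites(right, rules):
            yield (left, new_right)


def optimize(plan, rules, max_iterations=1000):
    """Explore equivalent plans breadth-first and remember the cheapest one."""
    best = plan
    seen = {plan}            # the set of equivalent plans found so far
    queue = deque([plan])    # plans whose rule matches are still pending
    iterations = 0
    while queue and iterations < max_iterations:
        current = queue.popleft()
        iterations += 1
        for alternative in rewrites(current, rules):
            if alternative in seen:
                continue
            seen.add(alternative)
            queue.append(alternative)
            if cost(alternative) < cost(best):
                best = alternative
    return best


start = (("a", "b"), "c")
best = optimize(start, [swap_join, push_join_through_join])
print(start, cost(start))
print(best, cost(best))
```

With the assumed numbers, the search moves the join of the two small tables `b` and `c` to the bottom of the tree, because that keeps the intermediate result smallest.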
Page20 © Hortonworks Inc. 2014
Join Reordering Rules
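1. Swap Join Rule: a ⋈ b = b ⋈ a
2. Push Join Through Join Rule: (a ⋈ b) ⋈ c = (a ⋈ c) ⋈ b, but is really: [diagram over a, b, c not recoverable]. Optiq schema is position based.
3. So: [diagram: join tree over a, b, c, d] ≠ [diagram: join tree over a, c, d, b]
4. Pull Up Project above Join: [diagram: join tree over b, a, c, d] = [diagram: join tree over a, c, b, d]
5. Merge Projects
Added bonus: join permutations across sub-query blocks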
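The remark that the Optiq schema is position based is what makes the project-related rules necessary: a join's output row is the left input's fields followed by the right input's fields, so swapping inputs changes every field position, and a projection is needed to restore the original order. The sketch below illustrates that idea with plain lists. It is an assumption-based illustration rather than Optiq code: the helper names are hypothetical, and field names are assumed to be unique so that positions can be looked up.

```python
def join_fields(left, right):
    """Output row type of a join: the left fields followed by the right fields."""
    return left + right


def swap_with_project(left, right):
    """Swap the join inputs and compute the positional projection that
    restores the original field order on top of the swapped join."""
    original = join_fields(left, right)
    swapped = join_fields(right, left)
    projection = [swapped.index(field) for field in original]
    return swapped, projection


def apply_project(fields, projection):
    """Apply a positional projection: output i takes input field projection[i]."""
    return [fields[i] for i in projection]


def merge_projects(outer, inner):
    """Compose two positional projections into one (inner is applied first)."""
    return [inner[i] for i in outer]


a = ["a.x", "a.y"]
b = ["b.x"]
swapped, projection = swap_with_project(a, b)
print(swapped)                             # fields of b JOIN a
print(projection)                          # positions that restore a JOIN b
print(apply_project(swapped, projection))  # same fields as a JOIN b
```

Pulling such projections above the joins and merging adjacent ones, as in rules 4 and 5, leaves the joins directly on top of each other, so the reordering rules can match them again.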
Page21 © Hortonworks Inc. 2014
Summary
Join re-ordering
Join cardinality is used for cost
All other operators are assumed to have tiny cost
Cardinality of filter, join, group-by is based on selectivity
Selectivity is computed based on number-of-distinct-values (NDV)
Table Stats and Column stats are required
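To make the chain from NDV to selectivity to cardinality concrete, here is a small sketch using the standard textbook estimates. These formulas are an assumption for illustration; the slides do not state the exact formulas Hive uses, and the function names and statistics below are hypothetical.

```python
def equality_filter_cardinality(rows, ndv):
    """Textbook estimate for `col = constant`: selectivity is 1 / NDV."""
    return rows / ndv


def join_cardinality(rows_left, rows_right, ndv_left, ndv_right):
    """Textbook equi-join estimate: |L| * |R| / max(NDV(L.key), NDV(R.key))."""
    return rows_left * rows_right / max(ndv_left, ndv_right)


def group_by_cardinality(rows, ndv):
    """A group-by cannot return more rows than its input or than the key's NDV."""
    return min(rows, ndv)


# Hypothetical statistics: a fact table joined to a dimension on a key column.
fact_rows, fact_key_ndv = 1_000_000, 1_000
dim_rows, dim_key_ndv = 1_000, 1_000

print(join_cardinality(fact_rows, dim_rows, fact_key_ndv, dim_key_ndv))
print(equality_filter_cardinality(dim_rows, ndv=50))
print(group_by_cardinality(fact_rows, ndv=1_000))
```

Each estimate needs a table row count and a per-column distinct-value count, which is why both table statistics and column statistics are required before the optimizer can cost a join order.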
Current limitations
Only supports: filter, inner join, group-by, project, order-by, limit
Not all UDFs
Does not attempt all join permutations (e.g. bushy trees; 10-way joins or more)
May not work well for Bucket, SMB & Skew Joins
Page22 © Hortonworks Inc. 2014
TPC-DS Query 50
Joins the Store Sales and Store Returns fact tables.
Each fact table is independently restricted by date.
Analysis is at Store grain, so this dimension is also joined in.
As specified, the query starts by joining the two fact tables.
select
s_store_name , .. other store details
,sum(case when (sr_returned_date_sk - ss_sold_date_sk <= 30 ) then 1 else 0 end) as `30 days`, …
from
store_sales ss,store_returns sr,store s ,date_dim d1 ,date_dim d2
where
d2.d_year = 2000 and d2.d_moy = 9
and ss.ss_ticket_number = sr.sr_ticket_number and ss.ss_item_sk = sr.sr_item_sk
and ss.ss_sold_date_sk = d1.d_date_sk and sr.sr_returned_date_sk = d2.d_date_sk
and ss.ss_customer_sk = sr.sr_customer_sk and ss.ss_store_sk = s.s_store_sk
group by store details
order by store details limit 100;
Join Graph
Page23 © Hortonworks Inc. 2014
TPC-DS Query 50
Specified
Join Tree
Non CBO Plan
CBO Plan
Page24 © Hortonworks Inc. 2014
TPC-DS Query 50
Facts restricted to 3 months:
          Run 1   Run 2
Non CBO   53.1    53.4
CBO       22.5    21.9
• 1 year test
  - > 10 mins for Non CBO
  - CBO time was about the same
• Fact tables
  - partitioned by Day,
  - bucketed by Item
• Bucketing off
  - Bucketing should help CBO plan.
  - SR table much smaller. Better chance of Bucket Join in place of Shuffle Join.
Orderings considered by Planner:
Join Ordering / Cost Estimate
['d2', [[['store_sales', 'd1'], 'store_returns'], 'store']]   515074768.659
['d1', [[['store_sales', 'store'], 'store_returns'], 'd2']]   448155.355
…
['store_returns', 'd2']   9938.93
['store_sales', 'store_returns']   156727295.634
['d1', 'store_sales']   123675664.449
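A quick calculation puts the numbers on this slide in proportion. The values below are copied from the slide; the variable names are illustrative, and the slide does not state the unit of the run times, so only ratios are computed.

```python
# Run times from the slide (unit not stated on the slide).
runs = {"Non CBO": (53.1, 53.4), "CBO": (22.5, 21.9)}

# Speedup of the CBO plan over the non-CBO plan, per run.
speedups = [non_cbo / cbo for non_cbo, cbo in zip(runs["Non CBO"], runs["CBO"])]

# Cost estimates of the two complete join orderings listed on the slide.
cost_d2_first = 515074768.659
cost_d1_first = 448155.355
cost_ratio = cost_d2_first / cost_d1_first

# Cost estimates of two of the listed two-table joins.
cost_returns_d2 = 9938.93
cost_sales_returns = 156727295.634
pair_ratio = cost_sales_returns / cost_returns_d2

print([round(s, 2) for s in speedups])
print(round(cost_ratio), round(pair_ratio))
```

The CBO plan ran a little over twice as fast in both runs, while the estimated costs of the two complete orderings shown differ by a factor of more than a thousand. Joining store_returns to its date dimension first is estimated to be several orders of magnitude cheaper than joining the two fact tables directly, which is how the query is written.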
Page25 © Hortonworks Inc. 2014
TPC-DS Query 17
Joins the Store Sales, Store Returns and Catalog Sales fact tables.
Each fact table is independently restricted by time.
Analysis is at Item and Store grain, so these dimensions are also joined in.
As specified, the query starts by joining the three fact tables.
select i_item_id
,i_item_desc
,s_state
,count(ss_quantity) as store_sales_quantitycount
,….
from store_sales ss ,store_returns sr, catalog_sales cs,
date_dim d1, date_dim d2, date_dim d3, store s, item i
where d1.d_quarter_name = '2000Q1'
and d1.d_date_sk = ss.ss_sold_date_sk
and i.i_item_sk = ss.ss_item_sk and …
group by i_item_id ,i_item_desc ,s_state
order by i_item_id ,i_item_desc, s_state
limit 100;
Page26 © Hortonworks Inc. 2014
TPC-DS Query 17
Specified
Join Tree
Non CBO Plan
CBO Plan
Page27 © Hortonworks Inc. 2014
TPC-DS Query 17
Facts restricted to 3 months:
          Run 1    Run 2
Non CBO   100.71   127.53
CBO       50.9     44.52
• 1 year test
  - > 10 mins for Non CBO
  - CBO time was about the same
• Fact tables
  - partitioned by Day,
  - bucketed by Item
• Bucketing off
  - Bucketing should help CBO plan.
  - SR table much smaller. Better chance of Bucket Join in place of Shuffle Join.
Orderings considered by Planner:
Join Ordering / Cost Estimate
['item', [[[[[['d2', 'store_returns'], 'store_sales'], 'catalog_sales'], 'd1'], 'd3'], 'store']]   3547898.061
…
['store_returns', 'd2']   19224.71
['store_sales', 'store_returns']   23057497.991
['d1', 'store_sales']   26142.943
Page28 © Hortonworks Inc. 2014
Next?
Outer joins
Scale to larger numbers of joins
Support all expressions (UDFs)
Join algorithm selection
Sortedness & distribution as a trait
Trait propagation
Better cost model
More statistics
Move all pre-planning and logical planning to Optiq
Use Optiq costs/statistics to help physical planning
Constant reduction & tree pruning
Rewrite query to use materialized view
Page29 © Hortonworks Inc. 2014
Thank you!
@julianhyde
http://hive.apache.org/
http://incubator.apache.org/projects/optiq.html

Editor's Notes

  • #5: Hive CBO didn’t quite make it into Apache Hive 0.13. This talk: What is CBO? Why are we putting it in Hive? How did we do it? When is it released? And what next?
  • #20: 0. Converters convert a Hive operator graph to an Optiq representation. In Optiq we have RelNodes and RexNodes in place of Operators and ExprNodes. The conversion creates a ‘Logical’ plan: RedSinks are dropped, and physical traits like partitioning and bucketing are lost.
    The Plan Graph is the central data structure of the Planner. There is a Node for each node in the input plan. A Node represents a Set of equivalent sub-graphs (plans). Each Set is further divided into Subsets based on traits; traits capture physical attributes like sortedness and bucketing.
    Rules consist of a match graph template and an onMatch action. The action generates a ‘better’ equivalent plan, so rule match actions populate the Plan Graph Sets.
    Metadata Providers provide all metadata to the Planner: schema, but also cost formulas like selectivity and NDV calculations.
    RelNodes have Cost. The Cost Model encapsulates cost calculations.
    The Rule Match Queue is a queue of rule matches. The Planner runs until the queue is empty or for a fixed number of iterations. Applying a rule match adds to the Plan Graph and also adds new rule matches to the queue. Rule matches are ordered by importance, which is based on RelNode cost and the distance of the node from the root of the plan.