Reducing Cache Misses in Hash Join Probing Phase by Pre-sorting Strategy
Gi-Hwan Oh, Jae-Myung Kim, Woon-Hak Kang, and Sang-Won Lee


Background and Motivation
• Evolution of in-memory hash join, |R| < |S|
  ‐ Partitioning (Shatdal et al., VLDB '94)
    ‐ |R| >> cache size => partition R so that |R.partition| < cache size
  ‐ Radix-clustering (Manegold et al., VLDB '00)
    ‐ # partitions >> # TLB entries => multi-pass partitioning
  ‐ Multi-core radix-clustering (Kim et al., VLDB '09)
    ‐ Hash join on multicore
  ‐ No partitioning, revisited (Blanas et al., SIGMOD '11)
    ‐ Multicore and skewed data
    ‐ Cons: high cache-miss rate (99% of misses occur in the probe phase)
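The cache-sized partitioning idea above (split R so each partition's hash table fits in cache) can be sketched as follows. This is an illustrative sketch, not code from any of the cited papers; `RADIX_BITS` and the `(key, rid)` pair layout are assumptions.

```python
# Sketch of radix partitioning: scatter tuples into 2^RADIX_BITS
# partitions on the low bits of the join key, so that each
# partition's hash table can fit in the CPU cache.

RADIX_BITS = 2  # stand-in; real systems pick this so partitions fit in cache

def radix_partition(keys, bits=RADIX_BITS):
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for rid, key in enumerate(keys):
        # Route each (key, rid) pair to the partition given by its low bits.
        parts[key & mask].append((key, rid))
    return parts
```

A multi-pass variant (Manegold et al.) would apply the same scatter step repeatedly on a few bits at a time, so each pass touches no more partitions than there are TLB entries.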

Proposed Scheme: Pre-sorting
 Cache misses in the probe phase
  ‐ The reference pattern on the hash table is random
  ‐ This causes cache misses
 [Figure: S is sorted in cache and then probes the hash table built from R (in-cache sort -> probe; build -> hash table)]
 Strategy: sort the 'S' relation by the join attribute
  ‐ Changes the reference pattern on the hash table of relation 'R'
    ‐ Globally random -> clustered in a local scope (better temporal and spatial locality in cache accesses)
  ‐ Sorting all records of 'S' is unrealistic
    ‐ In-cache sorting: sorting buffer size = the largest private cache
  ‐ Maximize the number of records to sort
    ‐ Extract (key, rid) pairs to reduce record size (as in AlphaSort)
 Benefits of this strategy
  ‐ Reduces cache misses
  ‐ Can be applied to any hash join algorithm
 Cache-miss reduction >> sorting overhead

                        Cycles            Cache Misses
  Normal data           183,850,677,579   105,341,310
  Fully sorted data     27,370,971,180    389,938
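The strategy above can be sketched as a chunk-wise sorted probe: build a hash table on R, then walk S in cache-sized chunks, sorting each chunk's (key, rid) pairs by join key before probing so consecutive probes land near each other in the hash table. This is a minimal illustrative sketch, not the paper's implementation; the chunk size, function names, and the Python dict standing in for the hash table are assumptions.

```python
CHUNK = 4  # stand-in for "as many (key, rid) pairs as fit in the private cache"

def hash_join_presorted(R, S, chunk=CHUNK):
    # Build phase: hash table on the smaller relation R (key -> rids).
    table = {}
    for rid, key in enumerate(R):
        table.setdefault(key, []).append(rid)
    # Probe phase: process S chunk by chunk, sorting each chunk's
    # (key, rid) pairs by join key before probing (the SP step).
    result = []
    for start in range(0, len(S), chunk):
        pairs = [(key, start + i) for i, key in enumerate(S[start:start + chunk])]
        pairs.sort()  # in-cache sort on the join key
        for key, s_rid in pairs:
            for r_rid in table.get(key, []):
                result.append((r_rid, s_rid))
    return result
```

Because only the probe order within each chunk changes, the join output is the same multiset of matches as an unsorted probe, which is why the technique composes with any hash join variant.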




Performance Evaluation
 [Figure: execution time in billions of cycles for IP, IP+SP, NP, and NP+SP, broken down into PART, BUILD, PROBE, and SORT phases]
  • IP = independent partitioning
  • NP = no partitioning
  • SP = sorting and probing
 Environmental setting
  ‐ HW: Intel Core i7-860, 2.8 GHz (256 KB private cache, 8 MB shared cache), 12 GB RAM
  ‐ OS: Linux 2.6.32-220.7 (CentOS)
  ‐ Dataset: |R| = 16M, |S| = 256M, hash entries = 1M, schema of R and S = (long, long), uniform distribution
 Evaluation result
  ‐ Pre-sorting outperforms all other algorithms
  ‐ IP+SP is 30% faster than IP
  ‐ NP+SP is 30% faster than NP

