Three-way join in one round on Hadoop
COMP 6231 
GROUP 7 
IRAJ HEDAYATISOMARIN, ZAKARIA NASERELDINE, JINYANG DU
Problem statement 
$R \bowtie S \bowtie T$
In this part of the second project, we aimed to compute a three-way join
in a single MapReduce round.
[Figure: relations R, S, and T combined as R join S join T]
Algorithm Overview 
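First relation: R(a, b)
Second relation: S(b, c)
Third relation: T(c, d)
Reducers are arranged in an imagined matrix (a 4 × 4 example):
 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15
Mapper: for each input tuple R,(a,b), S,(b,c), or T,(c,d), compute h(b)=x and/or h(c)=y from the join attributes it contains, then emit
<KEY, VALUE>=<(x,y), (relation_name, tuple)>
where (x,y) is the coordinate of a reducer in the imagined matrix of reducers.
Reducer: in-memory join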
Mapping and Hashing 
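<KEY, VALUE>=<(x,y), (relation_name, tuple)>
relation_name: fetched from the input file name
tuple: the input tuple, exactly the same as in the input
Keys emitted for each input tuple, with h(b)=x and h(c)=y:
First relation: R: (h(b),1), (h(b),2), …, (h(b),11)
Second relation: S: (h(b),h(c))
Third relation: T: (1,h(c)), (2,h(c)), …, (11,h(c))
$\text{Reducer \#} = (x - 1) \times (\text{\# of reducers per dimension}) + y$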
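The key assignment above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the hash function `h` (CRC32 modulo the grid side), the names `map_tuple` and `reducer_number`, and tuples represented as Python pairs are all hypothetical. The 11 × 11 grid and the key patterns come from the slide.

```python
import zlib

GRID_SIDE = 11  # reducers per dimension (11 x 11 = 121), as on the slide


def h(value, side=GRID_SIDE):
    """Hypothetical hash: map a join-attribute value to a coordinate in 1..side."""
    return zlib.crc32(str(value).encode("utf-8")) % side + 1


def map_tuple(relation, tup, side=GRID_SIDE):
    """Return the <(x, y), (relation_name, tuple)> pairs emitted for one input tuple."""
    if relation == "R":    # R(a, b): x = h(b) is known, so replicate over every y
        x = h(tup[1], side)
        keys = [(x, y) for y in range(1, side + 1)]
    elif relation == "S":  # S(b, c): both coordinates are known, one copy only
        keys = [(h(tup[0], side), h(tup[1], side))]
    elif relation == "T":  # T(c, d): y = h(c) is known, so replicate over every x
        y = h(tup[0], side)
        keys = [(x, y) for x in range(1, side + 1)]
    else:
        raise ValueError(f"unknown relation: {relation}")
    return [(key, (relation, tup)) for key in keys]


def reducer_number(x, y, side=GRID_SIDE):
    """Linearize a 1-based (x, y) coordinate into a reducer number in 1..side*side."""
    return (x - 1) * side + y


if __name__ == "__main__":
    for key, value in map_tuple("S", (2, 3)):
        print(key, reducer_number(*key), value)
```

An R tuple and a T tuple that both join with a given S tuple share exactly one reducer coordinate, namely the one the S tuple is sent to, which is why the join can finish in one round.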
In-memory join algorithm 
NESTED LOOP JOIN
For each tuple r in R
    For each tuple s in S
        If r.b == s.b then
            For each tuple t in T
                If s.c == t.c then
                    Print (r.a, s.b, s.c, t.d)
Complexity: $O(n^3)$
SORT-BASED JOIN ALGORITHM
1. Divide the input list into three sorted lists using a binary search algorithm: $O(n \log n)$
2. Execute the in-memory join algorithm:
• WHILE R and S are both non-empty DO
    • IF the first items in both lists are equal THEN
        • join all the tuples that have the same value, then remove them from the lists
    • ELSE
        • take the list with the smaller front item and remove items from it until reaching an item equal to or greater than the front item of the other list
Complexity:
1. Divide list: $O(n \log n)$
2. In-memory join:
    1. $R \bowtie S = O(n)$
    2. $RS \bowtie T = O(n)$
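A minimal Python sketch of the sort-based join inside one reducer follows. It is not the authors' implementation: the function names `merge_join` and `reduce_values` are hypothetical, tuples are assumed to be Python tuples, and the split step uses `sorted()` rather than the binary-search insertion mentioned on the slide.

```python
def merge_join(left, right, left_key):
    """Sort-merge join on left[left_key] == right[0]; drops the duplicate join column."""
    left = sorted(left, key=lambda t: t[left_key])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][left_key], right[j][0]
        if lv < rv:      # left front is smaller: advance the left list
            i += 1
        elif lv > rv:    # right front is smaller: advance the right list
            j += 1
        else:
            # Find the run of equal values on each side and join the two runs.
            i_end = i
            while i_end < len(left) and left[i_end][left_key] == lv:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lv:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append(l + r[1:])
            i, j = i_end, j_end
    return out


def reduce_values(values):
    """Join the (relation_name, tuple) values that arrived at one reducer."""
    lists = {"R": [], "S": [], "T": []}
    for relation, tup in values:
        lists[relation].append(tup)
    rs = merge_join(lists["R"], lists["S"], left_key=1)  # (a, b, c), joined on b
    return merge_join(rs, lists["T"], left_key=2)        # (a, b, c, d), joined on c


if __name__ == "__main__":
    values = [("R", (1, 2)), ("S", (2, 3)), ("T", (3, 4)), ("T", (3, 9))]
    print(reduce_values(values))
```

Runs of equal join values are joined pairwise, so the merge step stays linear only when few tuples share a value.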
Number of reducers 
We decided to arrange the reducers in a square matrix. This choice constrains the number of reducers that can be used: in our case, 128 reducers were available, but we actually used only 121 of them (11 × 11).
On the other hand, choosing a different number of reducers in each dimension leads to extra data replication and inefficiency.
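The 121-out-of-128 figure is simply the largest square that fits in the available reducers. A small sketch of that arithmetic, with the hypothetical helper name `square_grid`:

```python
import math


def square_grid(available):
    """Return (side, used, idle) for the largest square grid within `available` reducers."""
    side = math.isqrt(available)  # integer square root, rounded down
    used = side * side
    return side, used, available - used


if __name__ == "__main__":
    print(square_grid(128))  # (11, 121, 7)
```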
Number of reducers (example 1, replication problem)
[Figure: non-square reducer matrix, columns numbered 1 to 16, rows numbered 1 to 4]
# of reducers = 128
Assumption: R >> T, and both are uniformly distributed
T(R) = 1,000,000
T(T) = 1,000
For the square matrix:
Replicated data = 1,000,000 × 11 + 1,000 × 11 = 11,011,000
For the non-square matrix above:
Replicated data = 1,000,000 × 16 + 1,000 × 16 = 16,016,000
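The two totals can be checked with a short calculation. The function name `replicated_tuples` is hypothetical, and the replication factors (11 and 16 for both relations) are taken directly from the slide rather than derived here.

```python
def replicated_tuples(r_size, t_size, r_copies, t_copies):
    """Total tuples shipped to reducers when R and T are each replicated."""
    return r_size * r_copies + t_size * t_copies


if __name__ == "__main__":
    square = replicated_tuples(1_000_000, 1_000, 11, 11)
    wide = replicated_tuples(1_000_000, 1_000, 16, 16)
    print(square, wide, wide - square)  # 11011000 16016000 5005000
```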
Number of reducers (example 1, inefficiency problem)
[Figure: the same matrix, columns numbered 1 to 16 and rows numbered 1 to 4, with some reducers marked IDLE and others marked FULL]
# of reducers = 128
Assumption: T >> R, and T is not uniformly distributed
T(R) = 1,000
T(T) = 1,000,000
When the hash range is reduced, it is more likely that two values hash to the same location.
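The effect of a smaller hash range can be illustrated by counting colliding pairs. This sketch assumes a simple modulo hash and 44 consecutive integer values, neither of which comes from the slides; `colliding_pairs` is a hypothetical helper.

```python
from collections import Counter


def colliding_pairs(values, buckets):
    """Count pairs of values that land in the same bucket under h(v) = v % buckets."""
    loads = Counter(v % buckets for v in values)
    return sum(n * (n - 1) // 2 for n in loads.values())


if __name__ == "__main__":
    values = range(44)
    print(colliding_pairs(values, 11))  # 11 buckets of 4 values each: 66
    print(colliding_pairs(values, 4))   # 4 buckets of 11 values each: 220
```

With the same values, shrinking the range from 11 to 4 buckets more than triples the number of colliding pairs, so each reducer along that dimension receives a larger share of the data.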
Experimental results 
37 seconds
Any questions?
