Joining Data with dplyr
Rsquared Academy
Agenda
• inner join
• left join
• right join
• semi join
• anti join
• full join
Case Study
• details of customers who have placed orders and their order details
• details of customers and their orders, irrespective of whether a customer has placed orders or not
• get customer details for all orders
• get customer data, if available, for all orders
• details of customers who have not placed orders
• details of all customers and all orders
Libraries
library(dplyr)
library(readr)
Data: Orders
order <- read_delim('https://raw.githubusercontent.com/rsquaredacademy/d
## # A tibble: 300 x 3
## id order_date amount
## <dbl> <chr> <dbl>
## 1 368 7/2/2016 365.
## 2 286 11/2/2016 2064.
## 3 28 2/22/2017 432.
## 4 309 3/5/2017 480.
## 5 2 12/28/2016 235.
## 6 31 12/30/2016 2745.
## 7 179 12/21/2016 2358.
## 8 484 11/24/2016 1031.
## 9 115 9/9/2016 1218.
## 10 340 5/6/2017 1184.
## # ... with 290 more rows
Data: Customers
customer <- read_delim('https://raw.githubusercontent.com/rsquaredacadem
## # A tibble: 91 x 3
## id first_name city
## <dbl> <chr> <chr>
## 1 1 Elbertine California
## 2 2 Marcella Colorado
## 3 3 Daria Florida
## 4 4 Sherilyn Distric...
## 5 5 Ketty Texas
## 6 6 Jethro California
## 7 7 Jeremiah California
## 8 8 Constancia Texas
## 9 9 Muire Idaho
## 10 10 Abigail Texas
## # ... with 81 more rows
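The `read_delim()` URLs on the two data slides are cut off, so the files cannot be reloaded from what is shown here. As a stand-in for experimenting, here is a tiny hypothetical pair of tibbles with the same column names as the slides; every value in it is invented for illustration. It also checks the property the joins below rely on: `id` is unique in `customer`, so each order matches at most one customer, while one customer may match several orders.

```r
library(dplyr)

# Hypothetical toy data mirroring the slides' column layout;
# none of these values come from the original files.
customer <- tibble(
  id         = c(1, 2, 3, 4),
  first_name = c("Ann", "Bob", "Cid", "Dee"),
  city       = c("Texas", "Idaho", "Iowa", "Ohio")
)

# order$id refers to customer$id; id 9 deliberately has no matching customer
order <- tibble(
  id         = c(2, 4, 2, 9),
  order_date = c("1/5/2017", "2/1/2017", "3/9/2017", "4/2/2017"),
  amount     = c(100, 250, 75, 40)
)

# The key is unique on the customer side (a one-to-many relationship)
key_is_unique <- anyDuplicated(customer$id) == 0

# Number of distinct customers with at least one order
customers_with_orders <- sum(customer$id %in% order$id)

key_is_unique          # TRUE
customers_with_orders  # 2
```

On the real data, the same `%in%` count should agree with the number of rows `semi_join()` returns, provided the customer ids are unique.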
Case Study: inner join
inner_join(customer, order, by = "id")
## # A tibble: 55 x 5
## id first_name city order_date amount
## <dbl> <chr> <chr> <chr> <dbl>
## 1 2 Marcella Colorado 12/28/2016 235.
## 2 2 Marcella Colorado 8/31/2016 1150.
## 3 5 Ketty Texas 1/17/2017 346.
## 4 6 Jethro California 1/27/2017 2317.
## 5 7 Jeremiah California 6/21/2016 136.
## 6 7 Jeremiah California 2/13/2017 1407.
## 7 7 Jeremiah California 7/8/2016 1914.
## 8 8 Constancia Texas 11/5/2016 2461.
## 9 8 Constancia Texas 5/19/2017 2714.
## 10 9 Muire Idaho 12/28/2016 187.
## # ... with 45 more rows
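An inner join keeps only rows whose `id` appears in both tables, and a customer with several orders appears once per order; that is how 300 orders reduce to 55 rows above, since only orders whose `id` matches a customer survive. A minimal sketch of the same behaviour on hypothetical toy data (names and values invented):

```r
library(dplyr)

# Hypothetical toy tables: customer ids are unique, order ids may repeat
customer <- tibble(id = c(1, 2, 3, 4),
                   first_name = c("Ann", "Bob", "Cid", "Dee"))
order <- tibble(id = c(2, 4, 2, 9),
                amount = c(100, 250, 75, 40))

inner <- inner_join(customer, order, by = "id")

# Customer 2 has two orders and customer 4 has one; customers 1 and 3
# and the unmatched order (id 9) are dropped
inner
```

Because `id` is unique in `customer`, the inner join's row count equals the number of orders that have a matching customer.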
Case Study: left join
left_join(customer, order, by = "id")
## # A tibble: 104 x 5
## id first_name city order_date amount
## <dbl> <chr> <chr> <chr> <dbl>
## 1 1 Elbertine California <NA> NA
## 2 2 Marcella Colorado 12/28/2016 235.
## 3 2 Marcella Colorado 8/31/2016 1150.
## 4 3 Daria Florida <NA> NA
## 5 4 Sherilyn Distric... <NA> NA
## 6 5 Ketty Texas 1/17/2017 346.
## 7 6 Jethro California 1/27/2017 2317.
## 8 7 Jeremiah California 6/21/2016 136.
## 9 7 Jeremiah California 2/13/2017 1407.
## 10 7 Jeremiah California 7/8/2016 1914.
## # ... with 94 more rows
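A left join keeps every row of the first table (`customer`) and fills the order columns with `NA` where there is no match. With a unique key in `customer`, its row count is the inner-join count plus the number of customers without orders: 55 + 49 = 104 in the slides, 49 being the `anti_join()` count. A toy check of that identity on hypothetical data:

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
customer <- tibble(id = c(1, 2, 3, 4),
                   first_name = c("Ann", "Bob", "Cid", "Dee"))
order <- tibble(id = c(2, 4, 2, 9),
                amount = c(100, 250, 75, 40))

left <- left_join(customer, order, by = "id")

# Customers 1 and 3 have no orders, so their amount is NA
unmatched <- left$id[is.na(left$amount)]

# Row-count identity: left = inner + customers without any order
n_inner <- nrow(inner_join(customer, order, by = "id"))
n_no_orders <- nrow(anti_join(customer, order, by = "id"))
nrow(left) == n_inner + n_no_orders  # TRUE (5 == 3 + 2)
```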
Case Study: right join
right_join(customer, order, by = "id")
## # A tibble: 300 x 5
## id first_name city order_date amount
## <dbl> <chr> <chr> <chr> <dbl>
## 1 368 <NA> <NA> 7/2/2016 365.
## 2 286 <NA> <NA> 11/2/2016 2064.
## 3 28 Avrit Texas 2/22/2017 432.
## 4 309 <NA> <NA> 3/5/2017 480.
## 5 2 Marcella Colorado 12/28/2016 235.
## 6 31 Clemmie Tennessee 12/30/2016 2745.
## 7 179 <NA> <NA> 12/21/2016 2358.
## 8 484 <NA> <NA> 11/24/2016 1031.
## 9 115 <NA> <NA> 9/9/2016 1218.
## 10 340 <NA> <NA> 5/6/2017 1184.
## # ... with 290 more rows
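A right join keeps every row of the second table (`order`), so the result has exactly as many rows as there are orders (300) whenever each order matches at most one customer; orders whose `id` has no customer, such as 368 above, get `NA` for `first_name` and `city`. The row order of `right_join()` output has changed between dplyr versions, so the slide's ordering may not be reproduced exactly; the hypothetical toy check below therefore compares counts and values rather than positions.

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
customer <- tibble(id = c(1, 2, 3, 4),
                   first_name = c("Ann", "Bob", "Cid", "Dee"))
order <- tibble(id = c(2, 4, 2, 9),
                amount = c(100, 250, 75, 40))

right <- right_join(customer, order, by = "id")

# Every order is kept; order id 9 has no customer, so first_name is NA
orphan_ids <- right$id[is.na(right$first_name)]

# Same rows as a left join with the tables swapped (column order differs)
swapped <- left_join(order, customer, by = "id")
same_rows <- nrow(right) == nrow(swapped) &&
  identical(sort(right$amount), sort(swapped$amount))
```

The swapped `left_join()` is often easier to read when the orders are the table of interest.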
Case Study: semi join
semi_join(customer, order, by = "id")
## # A tibble: 42 x 3
## id first_name city
## <dbl> <chr> <chr>
## 1 2 Marcella Colorado
## 2 5 Ketty Texas
## 3 6 Jethro California
## 4 7 Jeremiah California
## 5 8 Constancia Texas
## 6 9 Muire Idaho
## 7 15 Valentijn California
## 8 16 Monique Missouri
## 9 20 Colette Texas
## 10 28 Avrit Texas
## # ... with 32 more rows
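A semi join is a filtering join: it returns the rows of `customer` that have at least one match in `order`, keeps only `customer`'s columns, and never duplicates a customer. That is why it returns 42 rows while the inner join returns 55. For a single key column this behaves like filtering with `%in%`; a toy sketch with hypothetical data:

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
customer <- tibble(id = c(1, 2, 3, 4),
                   first_name = c("Ann", "Bob", "Cid", "Dee"))
order <- tibble(id = c(2, 4, 2, 9),
                amount = c(100, 250, 75, 40))

semi <- semi_join(customer, order, by = "id")

# Equivalent filter for a single key column
filtered <- filter(customer, id %in% order$id)

# Customer 2 has two orders but appears only once
semi$id  # c(2, 4)
```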
Case Study: anti join
anti_join(customer, order, by = "id")
## # A tibble: 49 x 3
## id first_name city
## <dbl> <chr> <chr>
## 1 1 Elbertine California
## 2 3 Daria Florida
## 3 4 Sherilyn Distric...
## 4 10 Abigail Texas
## 5 11 Wynne Georgia
## 6 12 Pietra Minnesota
## 7 13 Bram Iowa
## 8 14 Rees New York
## 9 17 Orazio Louisiana
## 10 18 Mason Texas
## # ... with 39 more rows
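An anti join is the complement of a semi join: it keeps the rows of `customer` with no match in `order`. Every customer therefore lands in exactly one of the two results, which matches the slides' 42 + 49 = 91 customers. A toy check with hypothetical data:

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
customer <- tibble(id = c(1, 2, 3, 4),
                   first_name = c("Ann", "Bob", "Cid", "Dee"))
order <- tibble(id = c(2, 4, 2, 9),
                amount = c(100, 250, 75, 40))

anti <- anti_join(customer, order, by = "id")
semi <- semi_join(customer, order, by = "id")

# Customers 1 and 3 have not placed orders
anti$id  # c(1, 3)

# semi and anti partition the customers
nrow(semi) + nrow(anti) == nrow(customer)  # TRUE
```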
Case Study: full join
full_join(customer, order, by = "id")
## # A tibble: 349 x 5
## id first_name city order_date amount
## <dbl> <chr> <chr> <chr> <dbl>
## 1 1 Elbertine California <NA> NA
## 2 2 Marcella Colorado 12/28/2016 235.
## 3 2 Marcella Colorado 8/31/2016 1150.
## 4 3 Daria Florida <NA> NA
## 5 4 Sherilyn Distric... <NA> NA
## 6 5 Ketty Texas 1/17/2017 346.
## 7 6 Jethro California 1/27/2017 2317.
## 8 7 Jeremiah California 6/21/2016 136.
## 9 7 Jeremiah California 2/13/2017 1407.
## 10 7 Jeremiah California 7/8/2016 1914.
## # ... with 339 more rows
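A full join keeps all customers and all orders. With a unique customer key, its row count is the left-join count plus the orders that match no customer: 104 + (300 - 55) = 349 in the slides. A toy check of that arithmetic on hypothetical data:

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
customer <- tibble(id = c(1, 2, 3, 4),
                   first_name = c("Ann", "Bob", "Cid", "Dee"))
order <- tibble(id = c(2, 4, 2, 9),
                amount = c(100, 250, 75, 40))

full <- full_join(customer, order, by = "id")
n_left <- nrow(left_join(customer, order, by = "id"))

# Orders whose id matches no customer (here only id 9)
n_orphan_orders <- sum(!order$id %in% customer$id)

nrow(full) == n_left + n_orphan_orders  # TRUE (6 == 5 + 1)
```

In the result, customers without orders have `NA` in the order columns and orders without a customer have `NA` in the customer columns.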