Manipulating Data
Using dplyr
Rupak Roy
dplyr: a grammar of data manipulation
dplyr provides a flexible grammar of data manipulation. It is the next iteration
of plyr, focused on tools for working with data frames (hence the d in the
name).
It has three main goals:
• Identify the most important data manipulation verbs and make them easy to
use from R.
• Provide fast performance for in-memory data, including large data sets, by
writing key pieces in C++ (using Rcpp).
• Use the same interface to work with data no matter where it is stored,
whether in a data frame, a data table, or a database.
Subsetting: rows
>install.packages("dplyr") #the package name must be quoted
>library(dplyr)
#converting the variables to factors
>mtcars$cyl<-as.factor(mtcars$cyl)
>mtcars$am<-as.factor(mtcars$am)
>str(mtcars)
#using OR with dplyr (mtcars has 4-, 6- and 8-cylinder cars)
>dmtcars<-filter(mtcars, cyl==6 | cyl==8)
#using OR with base R
>dmtcars<-mtcars[mtcars$cyl==6 | mtcars$cyl==8, ]
>View(dmtcars)
#using AND with dplyr: 6-cylinder cars with manual transmission (am==1)
>dmtcars<-filter(mtcars, cyl==6 & am==1)
#using AND with base R
>dmtcars<-mtcars[mtcars$cyl==6 & mtcars$am==1, ]
>View(dmtcars)
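As a small sketch of the row filters above, the snippet below works on a fresh copy of mtcars, so cyl and am are still numeric. The names cars, six_or_eight and six_manual are illustrative and not from the slides. It uses %in% as a compact form of OR on one column, and it relies on filter() combining comma-separated conditions with AND.

```r
library(dplyr)

# Work on a copy so the built-in dataset stays untouched
cars <- mtcars

# OR on a single column: %in% is shorthand for cyl == 6 | cyl == 8
six_or_eight <- filter(cars, cyl %in% c(6, 8))

# AND across two columns: comma-separated conditions are combined with &
six_manual <- filter(cars, cyl == 6, am == 1)

# Base R equivalents for comparison
six_or_eight_base <- cars[cars$cyl == 6 | cars$cyl == 8, ]
six_manual_base <- cars[cars$cyl == 6 & cars$am == 1, ]

nrow(six_or_eight)
nrow(six_manual)
```

The dplyr and base R versions should select the same rows; the dplyr form avoids repeating the data frame name in every condition.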
Subsetting: columns
#using dplyr
>mtcars_col1<-select(mtcars, mpg, cyl, disp)
>View(mtcars_col1)
#using base R
>mtcars_col1<-mtcars[ , c("mpg", "cyl", "disp")]
#adding a new column using dplyr::mutate(); columns can be referenced by name
>mtcars<-mutate(mtcars, newcol1=ifelse(mpg<=15, "luxury",
ifelse(mpg<=20, "sports", "economy")))
#using base R, where "newcol1" is the new column
>mtcars$newcol1<-ifelse(mtcars$mpg<=15, "luxury", ifelse(mtcars$mpg<=20,
"sports", "economy"))
>View(mtcars)
>mtcars<-select(mtcars, -newcol1) #to delete a column
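Nested ifelse() calls become hard to read as categories are added. One possible alternative, not used in the slides, is dplyr's case_when(), which checks conditions in order. The sketch below assumes the same mpg thresholds as above; the names cars, segment and cars_small are illustrative.

```r
library(dplyr)

# Work on a copy so the built-in dataset stays untouched
cars <- mtcars

cars <- mutate(
  cars,
  segment = case_when(
    mpg <= 15 ~ "luxury",   # checked first
    mpg <= 20 ~ "sports",   # only reached when mpg > 15
    TRUE      ~ "economy"   # everything else
  )
)

# Keep only the columns of interest
cars_small <- select(cars, mpg, cyl, segment)

# Count the cars in each category
table(cars_small$segment)
```

The result should match the nested ifelse() version, since both apply the same thresholds in the same order.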
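Ordering and grouping: arrange() and group_by()
#arrange using dplyr
>mtcars<-arrange(mtcars, cyl) #ascending order
>mtcars<-arrange(mtcars, desc(cyl)) #descending order
#arrange using base R (order() sorts ascending by default)
>mtcars<-mtcars[order(mtcars$cyl), ]
#group: marks the groups; the effect shows in later verbs such as summarize()
>group_by(mtcars, cyl)
#summarize: mean and standard deviation of mpg over the whole data set
>summarize(mtcars, mean(mpg), sd(mpg))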
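On its own, group_by() only records the grouping; the per-group numbers appear once the grouped data is passed to summarize(). The following is a minimal sketch of that combination on a fresh copy of mtcars. The names cars, by_cyl, mpg_by_cyl and the summary column names are illustrative.

```r
library(dplyr)

# Work on a copy so the built-in dataset stays untouched
cars <- mtcars

# group_by() only records the grouping; the data itself is unchanged
by_cyl <- group_by(cars, cyl)

# summarize() then returns one row per group
mpg_by_cyl <- summarize(
  by_cyl,
  mean_mpg = mean(mpg),
  sd_mpg   = sd(mpg),
  n_cars   = n()
)

# Sort the summary from highest to lowest average mpg
mpg_by_cyl <- arrange(mpg_by_cyl, desc(mean_mpg))

mpg_by_cyl
```

Naming the summary columns (mean_mpg rather than the default `mean(mpg)`) makes the result easier to use in later steps.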
Pipelines: %>% (pipe operator)
• A pipeline built with %>% structures a sequence of data operations
left to right, which is much easier to read, write, and maintain than
nested function calls. The operator comes from the magrittr R package.
• Early versions of dplyr provided a %.% operator, which is similar to %>%;
however, it has been deprecated, and dplyr now recommends %>%, which it
imports from magrittr.
Differences between %.% (dplyr) and %>% (magrittr):
> magrittr is a much more lightweight package that exists mainly to define
the pipe operator.
> Pipes minimize the need for local variables and function definitions.
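The rule behind the pipe can be sketched with plain functions: the value on the left becomes the first argument of the call on the right. The example below is illustrative only; the variable names are not from the slides.

```r
library(magrittr)  # provides %>%; dplyr re-exports the same operator

x <- c(1, 4, 9)

# x %>% f() is evaluated as f(x)
piped_sum  <- x %>% sqrt() %>% sum()
nested_sum <- sum(sqrt(x))

# Extra arguments follow the piped value: x %>% f(y) is f(x, y)
piped_round  <- 3.14159 %>% round(2)
nested_round <- round(3.14159, 2)

# Each piped form gives the same value as its nested counterpart
identical(piped_sum, nested_sum)
identical(piped_round, nested_round)
```

The piped version reads in the order the operations happen, while the nested version has to be read from the inside out.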
magrittr
#using base R to find the average mpg of cars with 4 cylinders
>mean(mtcars[mtcars$cyl=="4", "mpg"])
Note: we compare against "4" because cyl was converted to a factor earlier; on the original numeric column you would write cyl==4.
#using dplyr
>summarize(filter(mtcars, cyl=="4"), mean(mpg))
#using the pipe
>mtcars %>% filter(cyl=="4") %>% summarize(mean(mpg))
#categorize mtcars based on mpg in a new column
>mtcars %>% mutate(newcol2=ifelse(mpg<=15, "luxury", ifelse(mpg<=20,
"sports", "economy")))
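The verbs from the earlier slides can be chained into a single pipeline. The sketch below is one possible combination, assuming the same mpg thresholds as above; the names mpg_summary, segment, avg_mpg and n_cars are illustrative.

```r
library(dplyr)

mpg_summary <- mtcars %>%
  # add the category column
  mutate(segment = ifelse(mpg <= 15, "luxury",
                   ifelse(mpg <= 20, "sports", "economy"))) %>%
  # one group per category
  group_by(segment) %>%
  # average mpg and number of cars in each group
  summarize(avg_mpg = mean(mpg), n_cars = n()) %>%
  # highest average first
  arrange(desc(avg_mpg))

mpg_summary
```

No intermediate data frames are created, and each line of the pipeline corresponds to one step of the analysis.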
Next:
We will see how to manipulate data using dates
