Data manipulation:
Data wrangling, aggregation, and
group operations.
AAA-Python Edition
Plan
● 1- Hierarchical indexing
● 2- Combining and merging Data Sets
● 3- Reshaping and pivoting
● 4- Group by Mechanics
● 5- Data aggregation
● 6- Other aggregation operations
1- Hierarchical indexing
[By Amina Delali]
Data Wrangling and hierarchical indexing
● Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. (from: https://www.datawatch.com/what-is-data-wrangling/)
● Hierarchical indexing is the use of multiple indexes at different levels.
[Figure: a Series ser1 and its ser1.index, showing the indices of level 1 and the indices of level 2. Annotation: i1, i2, i3 will be in the level A.]
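As a minimal sketch of the idea above, the following builds a Series with a two-level index. The values and the labels "A", "B", "i1" to "i3" are illustrative assumptions; only the name ser1 comes from the slide.

```python
import pandas as pd

# Hypothetical data: a Series indexed by two levels.
ser1 = pd.Series(
    [10, 20, 30, 40, 50],
    index=[
        ["A", "A", "A", "B", "B"],        # labels of level 1
        ["i1", "i2", "i3", "i1", "i2"],   # labels of level 2
    ],
)

print(ser1.index)                 # a MultiIndex with two levels
print(ser1.loc["A"])              # all rows under the level-1 label "A"
print(ser1.loc[("A", "i2")])      # one value selected by both levels
```

Selecting with a single label returns the sub-series for that level-1 label, while a tuple selects through both levels.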
Reordering and sorting
● Reordering enables interchanging the index levels using the swaplevel method.
● Sorting enables sorting the data by the values of one level, using the sort_index method.
● After swaplevel: the hierarchy of the indexes changed; the order of the data (and so of the indexes) remains the same.
● After sort_index: the hierarchy of the indexes didn’t change; the order of the data (and so of the indexes) changed: the level 1 index was sorted.
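The two methods can be compared on a small example. This is a sketch on made-up data; the level names "level1" and "level2" are assumptions chosen for readability.

```python
import pandas as pd

# Hypothetical Series with two named index levels, deliberately unsorted.
ser1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.MultiIndex.from_arrays(
        [["B", "B", "A", "A"], ["i2", "i1", "i2", "i1"]],
        names=["level1", "level2"],
    ),
)

# swaplevel: the level hierarchy changes, the row order does not.
swapped = ser1.swaplevel("level1", "level2")

# sort_index: the level hierarchy is kept, the rows are reordered.
sorted_ser = ser1.sort_index(level="level1")

print(swapped)
print(sorted_ser)
```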
Operations by level
[Figure: df1 before and after the operation. Annotations: a new column added; sorted.]
● If sum were applied to numbers, it would perform an addition instead of this concatenation.
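The difference between adding numbers and concatenating strings can be sketched as follows. The data is hypothetical, and grouping with groupby(level=...) is an assumption about how the per-level sum is computed; the slide's exact call is not recoverable from the transcript.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], ["i1", "i2", "i1", "i2"]],
    names=["level1", "level2"],
)

# Hypothetical data: one numeric Series and one Series of strings
# (stored with object dtype so that "sum" means Python string addition).
numbers = pd.Series([1, 2, 3, 4], index=index)
colors = pd.Series(["red", "blue", "green", "red"], index=index, dtype=object)

# Sum within each label of the first index level.
number_totals = numbers.groupby(level="level1").sum()   # numeric addition
color_totals = colors.groupby(level="level1").sum()     # string concatenation

print(number_totals)
print(color_totals)
```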
indexing
● df2: the previous df1 columns are now indexes.
● The indexes are converted into columns.
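One way to move columns into the index and back is sketched below. The transcript does not name the methods used on the slide, so set_index and reset_index are an assumption, as are the column names and values.

```python
import pandas as pd

# Hypothetical frame standing in for df1.
df1 = pd.DataFrame({
    "colors": ["green", "red", "green"],
    "number": [1, 2, 3],
    "value": [0.5, 1.5, 2.5],
})

# Two columns become the levels of a hierarchical index.
df2 = df1.set_index(["colors", "number"])

# The index levels are converted back into ordinary columns.
restored = df2.reset_index()

print(df2)
print(restored)
```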
2- Combining and merging Data Sets
merge
[Figure: df1, df3, and the merged result.]
● Each row from df1 with the “green” value in the “colors” column will be combined with each row from df3 with the “green” value in the “mycolors” column.
● Both df1 and df3 have a “codes” column, so the “_x” and “_y” suffixes were added.
● “red” rows weren’t included.
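The behavior described above can be reproduced with a small sketch. The column names "colors", "mycolors" and "codes" come from the slide; the values are made up.

```python
import pandas as pd

# Hypothetical data: "red" appears only in df1.
df1 = pd.DataFrame({
    "colors": ["green", "red", "green", "blue"],
    "codes": [1, 2, 3, 4],
})
df3 = pd.DataFrame({
    "mycolors": ["green", "blue", "green"],
    "codes": [10, 20, 30],
})

# Default merge is an inner join: only keys present on both sides are kept.
merged = pd.merge(df1, df3, left_on="colors", right_on="mycolors")

print(merged)
```

Each of the two "green" rows of df1 is paired with each of the two "green" rows of df3, and the shared "codes" column appears as "codes_x" and "codes_y".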
merge
[Figure: df1, df3, and the merged result.]
● By specifying the argument “how” as “left”, all rows from df1 were included (even if no matching value exists in df3).
● The suffixes argument is used to customize the suffixes added to columns with the same name.
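A sketch of the same merge as a left join with custom suffixes follows. The data and the suffix strings "_df1" and "_df3" are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: "red" has no match in df3.
df1 = pd.DataFrame({
    "colors": ["green", "red", "green", "blue"],
    "codes": [1, 2, 3, 4],
})
df3 = pd.DataFrame({
    "mycolors": ["green", "blue", "green"],
    "codes": [10, 20, 30],
})

merged_left = pd.merge(
    df1, df3,
    how="left",                    # keep every row of df1
    left_on="colors",
    right_on="mycolors",
    suffixes=("_df1", "_df3"),     # replace the default "_x" / "_y"
)

print(merged_left)
```

The "red" row is kept, with missing values in the columns that come from df3.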
merge
[Figure: df11, df3, and the merged result.]
● df11 and df3 are combined by matching values from the “number” column of df11 with the index values of df3.
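Matching a column on one side against the index on the other can be sketched like this. The frames are hypothetical stand-ins for df11 and df3.

```python
import pandas as pd

# Hypothetical data: df3 is indexed by the values found in df11["number"].
df11 = pd.DataFrame({
    "number": [1, 2, 1, 3],
    "colors": ["green", "red", "green", "blue"],
})
df3 = pd.DataFrame({"mycolors": ["green", "blue"]}, index=[1, 2])

# left_on names a column of df11; right_index=True uses the index of df3.
merged = pd.merge(df11, df3, left_on="number", right_index=True)

print(merged)
```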
join
[Figure: df11, df3, and the joined result.]
● You have to specify the suffixes if the dataframes have columns with the same names.
● By default, all the values of df11 were added.
● The dataframes are combined by matching index values.
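The three points above can be checked with a short sketch. The data and the suffix strings are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frames sharing a column name and part of their index.
df11 = pd.DataFrame({"codes": [1, 2, 3]}, index=["a", "b", "c"])
df3 = pd.DataFrame({"codes": [10, 20]}, index=["a", "b"])

# Without suffixes, overlapping column names make join raise a ValueError.
raised = False
try:
    df11.join(df3)
except ValueError:
    raised = True

# join matches on the index and keeps all rows of the calling frame by default.
joined = df11.join(df3, lsuffix="_df11", rsuffix="_df3")

print(joined)
```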
3- Reshaping and pivoting
concat
combine_first
[Figure: the series ser11, ser2, ser3 and ser4, with the results of concat and combine_first.]
● ser3 values were chosen over ser4 values.
● In ser3, the value corresponding to “a” is NaN, and in ser4 the value corresponding to “a” is not null, so the ser4 value was chosen.
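A minimal sketch of both operations follows. The series names come from the slide, but which series were passed to concat is not recoverable from the transcript, so pairing ser11 with ser2 is an assumption, as are all values.

```python
import numpy as np
import pandas as pd

# concat: glue two Series end to end (hypothetical values).
ser11 = pd.Series([1, 2], index=["a", "b"])
ser2 = pd.Series([3, 4], index=["c", "d"])
stacked = pd.concat([ser11, ser2])

# combine_first: keep ser3 values, and fill its missing ones from ser4.
ser3 = pd.Series([np.nan, 2.0, 3.0], index=["a", "b", "c"])
ser4 = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
patched = ser3.combine_first(ser4)

print(stacked)
print(patched)
```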
stack & unstack
● stack: pivots the column labels into the row index.
● unstack: pivots the row index into column labels.
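The round trip between the two shapes can be sketched on a tiny frame with made-up labels:

```python
import pandas as pd

# Hypothetical frame with two rows and two columns.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

# stack: column labels become the innermost level of the row index.
stacked = df.stack()

# unstack: the innermost index level goes back to column labels.
unstacked = stacked.unstack()

print(stacked)
print(unstacked)
```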
melt
pivot
[Figure: df3 and the results of melt and pivot.]
● Only 2 unique values for Code_type.
● 3 unique values: 3 columns.
● The values are obtained from “value” by matching “Code_type” and “number” values.
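The following sketch goes from a wide frame to a long one with melt, then back to a table with pivot. The column names "Code_type", "number" and "value" come from the slide; the wide frame, its "c1" and "c2" columns, and which field plays the role of index or columns in pivot are assumptions.

```python
import pandas as pd

# Hypothetical wide frame: one column per code type.
wide = pd.DataFrame({
    "number": [1, 2, 3],
    "c1": [10, 20, 30],
    "c2": [40, 50, 60],
})

# melt: the column labels "c1" and "c2" become values of "Code_type".
long = wide.melt(id_vars="number", var_name="Code_type", value_name="value")

# pivot: 2 unique Code_type values give 2 rows,
# 3 unique number values give 3 columns.
table = long.pivot(index="Code_type", columns="number", values="value")

print(long)
print(table)
```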
4- Group by Mechanics
groupby
[Figure: df11 and the result of groupby. Annotations: i, i, j, j; 5 + 7 + 8 = 20.]
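A sketch of a grouped sum that reproduces the 5 + 7 + 8 = 20 arithmetic follows. The column names "key" and "value" and the row layout are assumptions; only the labels i and j and the sum come from the slide.

```python
import pandas as pd

# Hypothetical frame standing in for df11.
df11 = pd.DataFrame({
    "key": ["i", "i", "j", "j", "i"],
    "value": [5, 7, 1, 2, 8],
})

# Split the rows by "key", then sum "value" inside each group.
totals = df11.groupby("key")["value"].sum()

print(totals)
```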
groupby
[Figure: grouping with the mapping myDict.]
● Gr1 values are summed together, and Gr2 values are also summed together.
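Grouping through a dictionary can be sketched as below. The names myDict, Gr1 and Gr2 come from the slide; mapping row labels (rather than column labels) to groups is an assumption, and the data is made up.

```python
import pandas as pd

# Hypothetical frame with labeled rows.
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

# The dictionary maps each index label to the group it belongs to.
myDict = {"a": "Gr1", "b": "Gr2", "c": "Gr1", "d": "Gr2"}

# Rows mapped to the same group name are summed together.
totals = df.groupby(myDict).sum()

print(totals)
```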
5- Data aggregation
agg
[Figure: results of agg on grouped data. Annotations: became an index; remains a column; columns kept columns.]
● We could just write: groupby(“Code_type”)
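One plausible reading of the annotations is that the grouping key either becomes the index of the result or stays a column. The sketch below shows both outcomes; using as_index=False for the second case is an assumption, as is the data.

```python
import pandas as pd

# Hypothetical frame with a "Code_type" key column.
df = pd.DataFrame({
    "Code_type": ["c1", "c1", "c2", "c2"],
    "value": [1, 3, 10, 20],
})

# Default: the grouping key becomes the index of the aggregated result.
by_index = df.groupby("Code_type")["value"].agg(["min", "max"])

# With as_index=False the grouping key remains an ordinary column.
by_column = df.groupby("Code_type", as_index=False).agg({"value": "sum"})

print(by_index)
print(by_column)
```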
6- Other aggregation operations
cut & qcut
● The data values in the “value3” column are grouped by the intervals created by cut (same length) and qcut (same size).
● With cut, the number of values in each interval is different from the others.
● With qcut, the number of values in each interval is the same: 2.
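The contrast between equal-length and equal-size intervals can be sketched on skewed data. The column name "value3" comes from the slide; the eight values and the choice of four intervals are assumptions.

```python
import pandas as pd

# Hypothetical skewed data.
df = pd.DataFrame({"value3": [1, 2, 3, 4, 10, 20, 50, 100]})

# cut: 4 intervals of equal length over the range of the data.
width_bins = pd.cut(df["value3"], bins=4)
width_counts = df["value3"].groupby(width_bins, observed=False).count()

# qcut: 4 intervals chosen so that each holds the same number of values.
quantile_bins = pd.qcut(df["value3"], q=4)
quantile_counts = df["value3"].groupby(quantile_bins, observed=False).count()

print(width_counts)      # uneven counts per interval
print(quantile_counts)   # 2 values in every interval
```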
crosstab
[Figure: df11 and its cross-tabulation.]
● For the “number” value == 1, “colors” contains 2 “green” values and 0 “blue” or “red” values.
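A frequency table with that property can be sketched as follows. The column names "number" and "colors" come from the slide; the rows are made up so that number 1 occurs twice, both times with "green".

```python
import pandas as pd

# Hypothetical frame standing in for df11.
df11 = pd.DataFrame({
    "number": [1, 1, 2, 2, 3],
    "colors": ["green", "green", "blue", "red", "red"],
})

# Count how often each (number, colors) pair occurs.
table = pd.crosstab(df11["number"], df11["colors"])

print(table)
```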
References
● Datawatch. What is data wrangling? On-line at https://www.datawatch.com/what-is-data-wrangling/. Accessed on 31-10-2018.
● Wes McKinney. Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Inc, 2018.
● pydata.org. Pandas documentation. On-line at https://pandas.pydata.org/. Accessed on 19-10-2018.
Thank
you!
FOR ALL YOUR TIME
