Python For Data Science Cheat Sheet
Pandas
Learn Python for Data Science Interactively at www.DataCamp.com
Reshaping Data
Advanced Indexing (also see NumPy Arrays)
Reindexing
>>> s2 = s.reindex(['a','c','d','e','b'])
Forward Filling
>>> df.reindex(range(4),
               method='ffill')
   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
3   Brazil   Brasília   207847528
Backward Filling
>>> s3 = s.reindex(range(5),
                   method='bfill')
0    3
1    3
2    3
3    3
4    3
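A minimal sketch of how reindexing and the two fill methods behave. The Series `s` and `t` below are hypothetical (the cheat sheet does not show how `s` is built); the point is only that unmatched labels become NaN and that `ffill`/`bfill` fill gaps from opposite directions.

```python
import pandas as pd

# Hypothetical data: the cheat sheet does not define `s`.
s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])

# Labels are reordered; 'e' has no match and becomes NaN.
s2 = s.reindex(['a', 'c', 'd', 'e', 'b'])

# Hypothetical Series with gaps in a sorted integer index.
t = pd.Series([10, 20, 30], index=[0, 2, 4])
forward = t.reindex(range(6), method='ffill')   # carry the previous value forward
backward = t.reindex(range(6), method='bfill')  # pull the next value backward

print(s2)
print(forward.tolist())
print(backward.tolist())
```

With `bfill`, the last label (5) has no later value to borrow from, so it stays NaN.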
Pivot
>>> df3 = df2.pivot(index='Date',     # Spread rows into columns
                    columns='Type',
                    values='Value')
Stack / Unstack
>>> stacked = df5.stack()             # Pivot a level of column labels
>>> stacked.unstack()                 # Pivot a level of index labels
Melt
>>> pd.melt(df2,                      # Gather columns into rows
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")
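The pivot and melt calls can be tried on the six-row `df2` shown in the cheat sheet's figures. This is a sketch under the assumption that `df2` has plain string dates; the variable names `long_form` is introduced here for illustration.

```python
import pandas as pd

# df2 as shown in the cheat sheet figures (dates kept as strings here).
df2 = pd.DataFrame({
    'Date': ['2016-03-01', '2016-03-02', '2016-03-01',
             '2016-03-03', '2016-03-02', '2016-03-03'],
    'Type': ['a', 'b', 'c', 'a', 'a', 'c'],
    'Value': [11.432, 13.031, 20.784, 99.906, 1.303, 20.784],
})

# Long -> wide: one row per Date, one column per Type.
df3 = df2.pivot(index='Date', columns='Type', values='Value')

# Wide -> long: 'Type' and 'Value' are gathered into variable/Observations pairs.
long_form = pd.melt(df2, id_vars=['Date'],
                    value_vars=['Type', 'Value'],
                    value_name='Observations')

print(df3)
print(long_form)
```

Combinations that never occur in `df2`, such as Type `b` on 2016-03-01, show up as NaN in `df3`.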
MultiIndexing
>>> arrays = [np.array([1,2,3]),
              np.array([5,4,3])]
>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples,
                                      names=['first', 'second'])
>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)
>>> df2.set_index(["Date", "Type"])
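A short sketch of what the MultiIndex built above makes possible. Fixed values and the column names `x` and `y` are assumptions chosen so the output is reproducible; the cheat sheet itself uses random data.

```python
import numpy as np
import pandas as pd

arrays = [np.array([1, 2, 3]), np.array([5, 4, 3])]
tuples = list(zip(*arrays))  # pairs (1, 5), (2, 4), (3, 3)
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

# Deterministic stand-in for np.random.rand(3, 2); column names are hypothetical.
df6 = pd.DataFrame(np.arange(6).reshape(3, 2), index=index, columns=['x', 'y'])

row = df6.loc[(2, 4)]                 # select one row by its full key
by_level = df6.xs(3, level='second')  # cross-section on a single level

stacked = df6.stack()     # column labels become an inner index level
wide = stacked.unstack()  # and back again

print(row)
print(by_level)
print(stacked)
```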
Missing Data
>>> df.dropna()                       # Drop NaN values
>>> df3.fillna(df3.mean())            # Fill NaN values with a predetermined value (here, the column means)
>>> df2.replace("a", "f")             # Replace values with others
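To see what these calls do, the sketch below rebuilds the pivoted `df3` from the cheat sheet's figure by hand. The `types` Series is a hypothetical stand-in for the `Type` column of `df2`.

```python
import numpy as np
import pandas as pd

# df3 as shown in the pivot figure.
df3 = pd.DataFrame(
    {'a': [11.432, 1.303, 99.906],
     'b': [np.nan, 13.031, np.nan],
     'c': [20.784, np.nan, 20.784]},
    index=['2016-03-01', '2016-03-02', '2016-03-03'])

rows_kept = df3.dropna()          # drops every row that contains a NaN
cols_kept = df3.dropna(axis=1)    # drops every column that contains a NaN
filled = df3.fillna(df3.mean())   # each NaN takes its column's mean

types = pd.Series(['a', 'b', 'c', 'a'])  # hypothetical stand-in for df2['Type']
renamed = types.replace('a', 'f')

print(rows_kept)
print(cols_kept)
print(filled)
```

Every row of this `df3` has at least one NaN, so the row-wise `dropna()` returns an empty frame; dropping by column keeps only `a`.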
df2 (input to pivot)
         Date Type   Value
0  2016-03-01    a  11.432
1  2016-03-02    b  13.031
2  2016-03-01    c  20.784
3  2016-03-03    a  99.906
4  2016-03-02    a   1.303
5  2016-03-03    c  20.784

df3 (result of df2.pivot)
Type             a       b       c
Date
2016-03-01  11.432     NaN  20.784
2016-03-02   1.303  13.031     NaN
2016-03-03  99.906     NaN  20.784
Selecting
>>> df3.loc[:,(df3>1).any()]          # Select cols with any vals > 1
>>> df3.loc[:,(df3>1).all()]          # Select cols where all vals > 1
>>> df3.loc[:,df3.isnull().any()]     # Select cols with NaN
>>> df3.loc[:,df3.notnull().all()]    # Select cols without NaN
Indexing With isin
>>> df[(df.Country.isin(df2.Type))]   # Find same elements
>>> df3.filter(items=["a","b"])       # Filter on column labels
>>> df[df.index % 5 == 0]             # Select rows whose index label is a multiple of 5 (df.select was removed in pandas 1.0)
Where
>>> s.where(s > 0)                    # Subset the data
Query
>>> df6.query('second > first')       # Query DataFrame
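The selection idioms above can be checked on small frames. In this sketch `df3` is copied from the pivot figure, while `s`, `numbered`, and the single-column `df6` are hypothetical stand-ins for objects the cheat sheet does not define.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    {'a': [11.432, 1.303, 99.906],
     'b': [np.nan, 13.031, np.nan],
     'c': [20.784, np.nan, 20.784]},
    index=['2016-03-01', '2016-03-02', '2016-03-03'])

# Boolean column masks: any() / all() reduce each column to one flag.
any_gt1 = df3.loc[:, (df3 > 1).any()].columns.tolist()
all_gt1 = df3.loc[:, (df3 > 1).all()].columns.tolist()  # NaN > 1 is False
with_nan = df3.loc[:, df3.isnull().any()].columns.tolist()
labels = df3.filter(items=['a', 'b']).columns.tolist()

# where() keeps the shape and masks failing entries with NaN.
s = pd.Series([3, -5, 7, 4])
positive = s.where(s > 0)

# Row selection on the index, a replacement for the removed df.select.
numbered = pd.DataFrame({'v': range(12)})
every_fifth = numbered[numbered.index % 5 == 0]

# query() can refer to named index levels.
index = pd.MultiIndex.from_tuples([(1, 5), (2, 4), (3, 3)],
                                  names=['first', 'second'])
df6 = pd.DataFrame({'x': [0, 1, 2]}, index=index)
ascending = df6.query('second > first')

print(any_gt1, all_gt1, with_nan, labels)
print(positive)
print(every_fifth)
print(ascending)
```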
Pivot Table
>>> df4 = pd.pivot_table(df2,         # Spread rows into columns
                         values='Value',
                         index='Date',
                         columns='Type')
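One practical difference from `pivot` is that `pivot_table` aggregates duplicate index/column pairs (by mean unless told otherwise), where `pivot` refuses to reshape them. A minimal sketch with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical data with a repeated (Date, Type) pair.
df = pd.DataFrame({
    'Date': ['2016-03-01', '2016-03-01', '2016-03-02'],
    'Type': ['a', 'a', 'b'],
    'Value': [1.0, 3.0, 5.0],
})

# pivot_table averages the two 'a' values on 2016-03-01.
table = pd.pivot_table(df, values='Value', index='Date', columns='Type')

# pivot cannot place two values in one cell and raises ValueError.
try:
    df.pivot(index='Date', columns='Type', values='Value')
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(table)
print(pivot_failed)
```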
Combining Data
Merge
>>> pd.merge(data1,
             data2,
             how='left',
             on='X1')
Join
>>> data1.join(data2, how='right')    # Overlapping column names also need lsuffix/rsuffix
Concatenate
Vertical
>>> pd.concat([s, s2])                # Replaces s.append(s2), removed in pandas 2.0
Horizontal/Vertical
>>> pd.concat([s,s2],axis=1, keys=['One','Two'])
>>> pd.concat([data1, data2], axis=1, join='inner')
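A small sketch of the three concatenation patterns, using two hypothetical Series whose indexes overlap only on `'b'`:

```python
import pandas as pd

# Hypothetical Series; only the label 'b' is shared.
s = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])

vertical = pd.concat([s, s2])                           # rows appended end to end
side = pd.concat([s, s2], axis=1, keys=['One', 'Two'])  # aligned on the index union
inner = pd.concat([s, s2], axis=1, join='inner')        # only labels present in both

print(vertical)
print(side)
print(inner)
```

Vertical concatenation keeps duplicate labels, so `'b'` appears twice in `vertical`.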
Unstacked
            0         1
1 5  0.233482  0.390959
2 4  0.184713  0.237102
3 3  0.433522  0.429401

Stacked
1 5 0  0.233482
    1  0.390959
2 4 0  0.184713
    1  0.237102
3 3 0  0.433522
    1  0.429401
Result of pd.melt(df2, id_vars=["Date"], value_vars=["Type", "Value"], value_name="Observations")
          Date variable Observations
0   2016-03-01     Type            a
1   2016-03-02     Type            b
2   2016-03-01     Type            c
3   2016-03-03     Type            a
4   2016-03-02     Type            a
5   2016-03-03     Type            c
6   2016-03-01    Value       11.432
7   2016-03-02    Value       13.031
8   2016-03-01    Value       20.784
9   2016-03-03    Value       99.906
10  2016-03-02    Value        1.303
11  2016-03-03    Value       20.784
Iteration
>>> df.items()                        # (Column-index, Series) pairs; replaces df.iteritems(), removed in pandas 2.0
>>> df.iterrows()                     # (Row-index, Series) pairs
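Both iterators yield `(label, Series)` pairs, by column and by row respectively. The sketch below assumes `df` is the Country/Capital/Population frame shown in the reindexing output.

```python
import pandas as pd

# Assumed contents of `df`, taken from the reindexing example output.
df = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# items(): one (column label, column Series) pair per column.
column_lengths = {name: len(column) for name, column in df.items()}

# iterrows(): one (row label, row Series) pair per row.
capitals = {row['Country']: row['Capital'] for _, row in df.iterrows()}

print(column_lengths)
print(capitals)
```

Row-wise iteration is convenient for small frames but is generally much slower than vectorised operations.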
data1
X1      X2
a   11.432
b    1.303
c   99.906

data2
X1      X3
a   20.784
b      NaN
d   20.784

Result of how='left'
X1      X2      X3
a   11.432  20.784
b    1.303     NaN
c   99.906     NaN

>>> pd.merge(data1,
             data2,
             how='right',
             on='X1')
X1      X2      X3
a   11.432  20.784
b    1.303     NaN
d      NaN  20.784

>>> pd.merge(data1,
             data2,
             how='inner',
             on='X1')
X1      X2      X3
a   11.432  20.784
b    1.303     NaN

>>> pd.merge(data1,
             data2,
             how='outer',
             on='X1')
X1      X2      X3
a   11.432  20.784
b    1.303     NaN
c   99.906     NaN
d      NaN  20.784
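The four merge variants can be reproduced from the `data1` and `data2` frames in the figure. The `joined` line is an added illustration of an index-based join, which assumes `X1` is first moved into the index so the column names no longer overlap.

```python
import numpy as np
import pandas as pd

# data1 and data2 as shown in the merge figure.
data1 = pd.DataFrame({'X1': ['a', 'b', 'c'], 'X2': [11.432, 1.303, 99.906]})
data2 = pd.DataFrame({'X1': ['a', 'b', 'd'], 'X3': [20.784, np.nan, 20.784]})

# Run all four join types on the shared key column.
results = {how: pd.merge(data1, data2, how=how, on='X1')
           for how in ('left', 'right', 'inner', 'outer')}

# Index-based equivalent of the right merge.
joined = data1.set_index('X1').join(data2.set_index('X1'), how='right')

for how, frame in results.items():
    print(how)
    print(frame)
print(joined)
```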
Setting/Resetting Index
>>> df.set_index('Country')           # Set the index
>>> df4 = df.reset_index()            # Reset the index
>>> df = df.rename(index=str,         # Rename index and column labels
                   columns={"Country":"cntry",
                            "Capital":"cptl",
                            "Population":"ppltn"})
Duplicate Data
>>> s3.unique()                               # Return unique values
>>> df2.duplicated('Type')                    # Check duplicates
>>> df2.drop_duplicates('Type', keep='last')  # Drop duplicates
>>> df.index.duplicated()                     # Check index duplicates
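A quick check of the duplicate helpers on the `Type` and `Value` columns of `df2` from the figures. The `Date` column is left out for brevity, and `labelled` is a hypothetical Series with a repeated index label.

```python
import pandas as pd

# Type and Value columns of df2 as shown in the figures.
df2 = pd.DataFrame({
    'Type': ['a', 'b', 'c', 'a', 'a', 'c'],
    'Value': [11.432, 13.031, 20.784, 99.906, 1.303, 20.784],
})

unique_types = df2['Type'].unique().tolist()
flags = df2.duplicated(subset='Type').tolist()            # True for repeats after the first
last_rows = df2.drop_duplicates(subset='Type', keep='last')

labelled = pd.Series([1, 2, 3], index=['x', 'y', 'x'])    # hypothetical data
index_flags = labelled.index.duplicated().tolist()

print(unique_types)
print(flags)
print(last_rows)
print(index_flags)
```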
Grouping Data
Aggregation
>>> df2.groupby(by=['Date','Type']).mean()
>>> df4.groupby(level=0).sum()
>>> df4.groupby(level=0).agg({'a':lambda x:sum(x)/len(x),
                              'b': np.sum})
Transformation
>>> customSum = lambda x: (x+x%2)
>>> df4.groupby(level=0).transform(customSum)
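Aggregation collapses each group to one row, while transformation returns a result with the same shape as the input. The sketch below uses a hypothetical frame with a repeated index so the groups contain more than one row; `custom_sum` mirrors the cheat sheet's `customSum`.

```python
import pandas as pd

# Hypothetical frame: index labels 'x' and 'y' each occur twice.
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [10, 20, 30, 40]},
                  index=['x', 'x', 'y', 'y'])

groups = df.groupby(level=0)

totals = groups.sum()                         # one row per group
mixed = groups.agg({'a': 'mean', 'b': 'sum'})  # a different function per column

def custom_sum(x):
    # Adds 1 to odd values and leaves even values unchanged.
    return x + x % 2

transformed = groups.transform(custom_sum)    # same index and shape as df

print(totals)
print(mixed)
print(transformed)
```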
Dates
>>> from datetime import datetime
>>> df2['Date']= pd.to_datetime(df2['Date'])
>>> df2['Date']= pd.date_range('2000-1-1',
                               periods=6,
                               freq='M')
>>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
>>> index = pd.DatetimeIndex(dates)
>>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')   # end: a datetime supplied by the caller
Visualization
Also see Matplotlib
>>> import matplotlib.pyplot as plt
>>> s.plot()
>>> plt.show()
>>> df2.plot()
>>> plt.show()
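A runnable sketch of the date helpers. Two things here are assumptions: the `end` value, which the cheat sheet leaves undefined, and the use of offset objects (`MonthEnd`, `BMonthEnd`) in place of the `'M'` and `'BM'` string aliases, since recent pandas releases have been renaming those aliases to `'ME'` and `'BME'`.

```python
from datetime import datetime

import pandas as pd

# Parse strings into datetimes.
parsed = pd.to_datetime(pd.Series(['2016-03-01', '2016-03-02']))

# Build an index directly from datetime objects.
dates = [datetime(2012, 5, 1), datetime(2012, 5, 2)]
index = pd.DatetimeIndex(dates)

# Six calendar month ends starting in January 2000.
month_ends = pd.date_range('2000-1-1', periods=6, freq=pd.offsets.MonthEnd())

# Business month ends between two dates; `end` is a hypothetical value.
end = datetime(2012, 5, 31)
business_ends = pd.date_range(datetime(2012, 2, 1), end,
                              freq=pd.offsets.BMonthEnd())

print(parsed)
print(index)
print(month_ends)
print(business_ends)
```

The business month end for March 2012 is the 30th because the 31st fell on a Saturday.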

3 pandasadvanced
