PANDAS
A POWERFUL DATA MANIPULATION TOOL
Emma / Jason A Myers / @jasonamyers
WHAT'S SO SPECIAL ABOUT PANDAS?
1. Tabular/Matrix
2. Data Flexibility
3. Data Manipulation
4. Time Series
Introduction to Pandas
INSTALLATION
pip install pandas
import pandas as pd
PANDAS DATA STRUCTURES
Series - basically an ordered dict that can be named
DataFrame - a labeled two-dimensional data type
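A minimal sketch of how the two structures relate, using made-up data (the labels `a`, `b`, `c` and the column names are illustrative only): a Series behaves like a dict with an ordered, labeled index, and a DataFrame is a set of Series sharing that index.

```python
import pandas as pd

# A Series built from a dict keeps the keys as its index (hypothetical data).
counts = pd.Series({'a': 1, 'b': 2, 'c': 3}, name='counts')

# Label-based lookup works like a dict; position-based lookup uses .iloc.
first_by_label = counts['a']
first_by_position = counts.iloc[0]

# A DataFrame is a collection of Series that share one index.
frame = pd.DataFrame({'counts': counts, 'doubled': counts * 2})
print(frame)
```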
SERIES
import pandas as pd

# Each cookie name is its own string; the index defaults to 0..5.
cookies = pd.Series(
    [
        'ChocolateChip',
        'PeanutButter',
        'GingerMolasses',
        'OatmealRaisin',
        'Sugar',
        'Oreo',
    ]
)
WHAT DOES IT LOOK LIKE?
0     ChocolateChip
1      PeanutButter
2    GingerMolasses
3     OatmealRaisin
4             Sugar
5              Oreo
dtype: object
PROPERTIES
>>> cookies.values
array(['ChocolateChip', 'PeanutButter', 'GingerMolasses',
       'OatmealRaisin', 'Sugar', 'Oreo'], dtype=object)
>>> cookies.index
Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')
SPECIFYING THE INDEX
cookies = pd.Series(
    [12, 10, 8, 6, 4, 2],
    index=[
        'ChocolateChip',
        'PeanutButter',
        'GingerMolasses',
        'OatmealRaisin',
        'Sugar',
        'PowderSugar',
    ],
)
INDEXED SERIES
ChocolateChip     12
PeanutButter      10
GingerMolasses     8
OatmealRaisin      6
Sugar              4
PowderSugar        2
dtype: int64
NAMING THE VALUES AND INDEXES
>>> cookies.name = 'counts'
>>> cookies.index.name = 'type'
type
ChocolateChip     12
PeanutButter      10
GingerMolasses     8
OatmealRaisin      6
Sugar              4
PowderSugar        2
Name: counts, dtype: int64
ACCESSING ELEMENTS
>>> cookies[[name.endswith('Sugar') for name in cookies.index]]
Sugar          4
PowderSugar    2
dtype: int64
>>> cookies[cookies > 10]
ChocolateChip    12
Name: counts, dtype: int64
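Both lookups above are boolean masks. As a sketch of the same idea without the list comprehension, the index's vectorized `.str` accessor can build the mask, and masks can be combined; the `mid_range` filter is an extra illustration, not part of the original slides.

```python
import pandas as pd

cookies = pd.Series(
    [12, 10, 8, 6, 4, 2],
    index=['ChocolateChip', 'PeanutButter', 'GingerMolasses',
           'OatmealRaisin', 'Sugar', 'PowderSugar'],
)

# Boolean mask built from the index labels, without a Python loop.
sugar_mask = cookies.index.str.endswith('Sugar')
sugar_cookies = cookies[sugar_mask]

# Masks combine with & (and) and | (or); each comparison needs parentheses.
mid_range = cookies[(cookies >= 6) & (cookies <= 10)]

print(sugar_cookies)
print(mid_range)
```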
DATAFRAMES
df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'type': ['ChocolateChip', 'PeanutButter', 'GingerMolasses', 'OatmealRaisin', 'Sug
    'owner': ['Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Marvin']
})
   count   owner            type
0     12   Jason   ChocolateChip
1     10   Jason    PeanutButter
2      8   Jason  GingerMolasses
3      6   Jason   OatmealRaisin
4      2   Jason           Sugar
5      2   Jason     PowderSugar
6      2  Marvin           Sugar
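The `type` list is cut off in the slide source. A runnable reconstruction, assuming the missing values are the ones shown in the printed table above, might look like this:

```python
import pandas as pd

# 'count' and 'owner' are copied from the slide; the tail of 'type' is
# reconstructed from the printed table, so treat it as an assumption.
df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'type': ['ChocolateChip', 'PeanutButter', 'GingerMolasses',
             'OatmealRaisin', 'Sugar', 'PowderSugar', 'Sugar'],
    'owner': ['Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Marvin'],
})

print(df.shape)   # rows, columns
print(df.dtypes)  # one dtype per column
```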
ACCESSING COLUMNS
>>> df['type']
0     ChocolateChip
1      PeanutButter
2    GingerMolasses
3     OatmealRaisin
4             Sugar
5       PowderSugar
6             Sugar
Name: type, dtype: object
ACCESSING ROWS
>>> df.loc[2]
count                 8
owner             Jason
type     GingerMolasses
Name: 2, dtype: object
SLICING ROWS
>>> df.loc[2:5]
   count  owner            type
2      8  Jason  GingerMolasses
3      6  Jason   OatmealRaisin
4      2  Jason           Sugar
5      2  Jason     PowderSugar
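Note that the `.loc` slice above returns four rows: label slices include the end label. A short sketch contrasting this with position-based `.iloc`, which follows the usual half-open Python convention (only the `count` column is used here to keep the example small):

```python
import pandas as pd

df = pd.DataFrame({'count': [12, 10, 8, 6, 2, 2, 2]})

# .loc slices by label and includes both endpoints: labels 2, 3, 4, 5.
by_label = df.loc[2:5]

# .iloc slices by position and excludes the stop: positions 2, 3, 4.
by_position = df.iloc[2:5]

print(len(by_label), len(by_position))
```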
PIVOTING
>>> df.loc[3:4].T
                   3      4
count              6      2
owner          Jason  Jason
type   OatmealRaisin  Sugar
GROUPING
>>> df.groupby('owner').sum()
        count
owner
Jason      40
Marvin      2
>>> df.groupby(['type', 'owner']).sum()
                       count
type           owner
ChocolateChip  Jason      12
GingerMolasses Jason       8
OatmealRaisin  Jason       6
PeanutButter   Jason      10
PowderSugar    Jason       2
Sugar          Jason       2
               Marvin      2
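Depending on the pandas version, summing a whole grouped frame may also try to aggregate the string columns. A hedged sketch that selects the numeric column explicitly and computes several statistics at once (the `per_owner` and `per_type` names are illustrative, and the frame is the reconstruction from the printed table):

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'type': ['ChocolateChip', 'PeanutButter', 'GingerMolasses',
             'OatmealRaisin', 'Sugar', 'PowderSugar', 'Sugar'],
    'owner': ['Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Marvin'],
})

# Selecting the numeric column first keeps string columns out of the sum.
per_owner = df.groupby('owner')['count'].sum()

# agg computes several statistics per group in one call.
per_type = df.groupby('type')['count'].agg(['sum', 'mean', 'count'])

print(per_owner)
print(per_type)
```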
RENAMING COLUMNS
>>> g_sum = df.groupby(['type']).sum()
>>> g_sum.columns = ['Total']
                Total
sum
ChocolateChip      12
GingerMolasses      8
OatmealRaisin       6
PeanutButter       10
PowderSugar         2
Sugar               4
PIVOT TABLES
>>> pivot_t = pd.pivot_table(df, values='count', index=['type'], columns=['owner'])
owner           Jason  Marvin
type
ChocolateChip      12     NaN
GingerMolasses      8     NaN
OatmealRaisin       6     NaN
PeanutButter       10     NaN
PowderSugar         2     NaN
Sugar               2       2
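`pivot_table` averages values by default when several rows land in the same cell. As a sketch, `aggfunc` and `fill_value` can make the aggregation explicit and replace the missing cells up front; the frame is the reconstruction from the printed table.

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'type': ['ChocolateChip', 'PeanutButter', 'GingerMolasses',
             'OatmealRaisin', 'Sugar', 'PowderSugar', 'Sugar'],
    'owner': ['Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Marvin'],
})

# Sum counts per (type, owner) cell and use 0 where an owner has no cookies.
pivot_sum = pd.pivot_table(
    df, values='count', index='type', columns='owner',
    aggfunc='sum', fill_value=0,
)

print(pivot_sum)
```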
JOINING
>>> df = pivot_t.join(g_sum)
>>> df.fillna(0, inplace=True)
                Jason  Marvin  Total
type
ChocolateChip      12       0     12
GingerMolasses      8       0      8
OatmealRaisin       6       0      6
PeanutButter       10       0     10
PowderSugar         2       0      2
Sugar               2       2      4
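The grouping, pivot, and join steps can be run end to end as below. This is a sketch under the same reconstructed data; `joined` is a name chosen here so the original `df` is not overwritten.

```python
import pandas as pd

df = pd.DataFrame({
    'count': [12, 10, 8, 6, 2, 2, 2],
    'type': ['ChocolateChip', 'PeanutButter', 'GingerMolasses',
             'OatmealRaisin', 'Sugar', 'PowderSugar', 'Sugar'],
    'owner': ['Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Jason', 'Marvin'],
})

# Per-type totals as a one-column frame named 'Total'.
g_sum = df.groupby('type')[['count']].sum()
g_sum.columns = ['Total']

# One column per owner, indexed by cookie type.
pivot_t = pd.pivot_table(df, values='count', index='type', columns='owner')

# join aligns the two frames on their shared 'type' index.
joined = pivot_t.join(g_sum).fillna(0)

print(joined)
```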
REAL WORLD PROBLEM
OUR DATASOURCE
2014-06-24 17:20:23.014642,0,34,102,0,0,0,60
2014-06-24 17:25:01.176772,0,32,174,0,0,0,133
2014-06-24 17:30:01.370235,0,28,57,0,0,0,75
2014-07-21 14:35:01.797838,0,39,74,0,0,0,30,0,262,2,3,3,0
2014-07-21 14:40:02.000434,0,54,143,0,0,0,44,0,499,3,9,9,0
READING FROM A CSV
df = pd.read_csv('results.csv', header=0, quotechar="'")
                     datetime  abuse_passthrough  any_abuse_handled ...
0  2014-06-24 17:20:23.014642                  0                 34 ...
SETTING THE DATETIME AS THE INDEX
>>> df['datetime'] = pd.to_datetime(df.datetime)
>>> df.index = df.datetime
>>> del df['datetime']
                            abuse_passthrough  any_abuse_handled ...
datetime                                                         ...
2014-06-24 17:20:23.014642                  0                 34 ...
2014-06-24 17:25:01.176772                  0                 32 ...
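The three steps above can also be folded into the read itself. A sketch using an in-memory excerpt in place of `results.csv` (only the first two value columns are kept, so the sample is hypothetical):

```python
import io

import pandas as pd

# Hypothetical three-column excerpt standing in for results.csv.
sample = io.StringIO(
    "datetime,abuse_passthrough,any_abuse_handled\n"
    "2014-06-24 17:20:23.014642,0,34\n"
    "2014-06-24 17:25:01.176772,0,32\n"
    "2014-06-24 17:30:01.370235,0,28\n"
)

# parse_dates converts the column; index_col makes it the index.
df = pd.read_csv(sample, parse_dates=['datetime'], index_col='datetime')

print(df.index)
```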
TIME SLICING
>>> df['2014-07-21 13:55:00':'2014-07-21 14:10:00']
                            abuse_passthrough  any_abuse_handled ...
datetime                                                         ...
2014-07-21 13:55:01.153706                  0                 24 ...
2014-07-21 14:00:01.372624                  0                 24 ...
2014-07-21 14:05:01.910827                  0                 32 ...
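Time slicing relies on a sorted `DatetimeIndex`. A small sketch with `.loc`, which also accepts a partial date string to select a whole day; the rows outside the 13:55 to 14:10 window are invented for illustration.

```python
import pandas as pd

index = pd.to_datetime([
    '2014-07-21 13:50:01',  # invented row, before the window
    '2014-07-21 13:55:01',
    '2014-07-21 14:00:01',
    '2014-07-21 14:05:01',
    '2014-07-21 14:15:01',  # invented row, after the window
    '2014-07-22 09:00:01',  # invented row, next day
])
df = pd.DataFrame({'any_abuse_handled': [10, 24, 24, 32, 5, 7]}, index=index)

# Both endpoints are timestamps; rows between them (inclusive) are returned.
window = df.loc['2014-07-21 13:55:00':'2014-07-21 14:10:00']

# A date-only string selects every row on that day.
one_day = df.loc['2014-07-21']

print(window)
print(len(one_day))
```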
HANDLING MISSING DATA POINTS
>>> df.fillna(0, inplace=True)
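Missing points arise here because some rows in the data source have fewer fields than others. A sketch of that situation with hypothetical column names (`handled`, `dropped`), showing that `read_csv` pads short rows with NaN and that `fillna` without `inplace` returns a new frame:

```python
import io

import pandas as pd

# The first data row is one field short of the header (hypothetical columns).
sample = io.StringIO(
    "datetime,handled,dropped\n"
    "2014-06-24 17:20:23,34\n"
    "2014-07-21 14:35:01,39,262\n"
)
df = pd.read_csv(sample)

# The short row gets NaN in the trailing column.
missing_before = df['dropped'].isna().sum()

# fillna returns a filled copy and leaves df untouched.
filled = df.fillna(0)

print(filled)
```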
FUNCTIONS
>>> df.sum()
abuse_passthrough                         39
any_abuse_handled                      81537
handle_bp_message_handled             271689
handle_bp_message_corrupt_handled          0
error                                      0
forward_all_unhandled                      0
original_message_handled              136116
list_unsubscribe_optout                   71
default_handler_dropped              1342285
default_unhandled                       2978
default_opt_out_bounce                 22044
default_opt_out                        23132
default_handler_pattern_dropped            0
dtype: float64
>>> df.sum().sum()
1879891.0
>>> df.mean()
abuse_passthrough                      0.009673
any_abuse_handled                     20.222470
handle_bp_message_handled             67.383185
handle_bp_message_corrupt_handled      0.000000
error                                  0.000000
forward_all_unhandled                  0.000000
original_message_handled              33.758929
list_unsubscribe_optout                0.017609
default_handler_dropped              332.907986
default_unhandled                      0.738591
default_opt_out_bounce                 5.467262
default_opt_out                        5.737103
default_handler_pattern_dropped        0.000000
dtype: float64
>>> import numpy as np
>>> df['2014-07-21 13:55:00':'2014-07-21 14:10:00'].apply(np.cumsum)
                            abuse_passthrough  any_abuse_handled ...
datetime                                                         ...
2014-07-21 13:55:01.153706                  0                 24 ...
2014-07-21 14:00:01.372624                  0                 48 ...
2014-07-21 14:05:01.910827                  0                 80 ...
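The running total above can be reproduced on the three `any_abuse_handled` values from the slide. This sketch also shows the built-in `DataFrame.cumsum`, which gives the same result without going through `apply`:

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    '2014-07-21 13:55:01.153706',
    '2014-07-21 14:00:01.372624',
    '2014-07-21 14:05:01.910827',
])
df = pd.DataFrame({'any_abuse_handled': [24, 24, 32]}, index=index)

# Column-wise running total via apply, as on the slide.
via_apply = df.apply(np.cumsum)

# The built-in method computes the same thing directly.
via_method = df.cumsum()

print(via_method)
```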
RESAMPLING
>>> d_df = df.resample('1D').sum()
            abuse_passthrough  any_abuse_handled ...
datetime                                         ...
2014-07-07                  0               3178 ...
2014-07-08                  1               6536 ...
2014-07-09                  2               6857 ...
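Resampling groups rows into fixed time buckets and then aggregates each bucket. A minimal sketch with invented readings (the values 1 through 5 are not from the dataset), using the method-chaining form `resample(...).sum()`:

```python
import pandas as pd

index = pd.to_datetime([
    '2014-07-07 10:00', '2014-07-07 18:00',
    '2014-07-08 09:00',
    '2014-07-09 12:00', '2014-07-09 12:05',
])
# Invented readings, chosen so the daily sums are easy to check by hand.
df = pd.DataFrame({'any_abuse_handled': [1, 2, 3, 4, 5]}, index=index)

# One row per calendar day, summing the readings that fall in that day.
daily = df.resample('1D').sum()

# Several aggregates per bucket are also possible.
daily_stats = df.resample('1D').agg(['sum', 'max'])

print(daily)
```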
SORTING
>>> d_df.sort_values('any_abuse_handled', ascending=False)
            abuse_passthrough  any_abuse_handled ...
datetime                                         ...
2014-07-15                 21               7664 ...
2014-07-17                  5               7548 ...
2014-07-10                  0               7106 ...
2014-07-11                 10               6942 ...
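A sketch of the sorting options on the four rows shown above: `sort_values` orders by a column, `sort_index` restores date order, and `nlargest` is a shortcut when only the top rows matter.

```python
import pandas as pd

# The four daily rows printed on the slide, first two columns only.
d_df = pd.DataFrame(
    {
        'abuse_passthrough': [21, 5, 0, 10],
        'any_abuse_handled': [7664, 7548, 7106, 6942],
    },
    index=pd.to_datetime(['2014-07-15', '2014-07-17', '2014-07-10', '2014-07-11']),
)

# Largest values first.
by_handled = d_df.sort_values('any_abuse_handled', ascending=False)

# Back to chronological order.
by_date = d_df.sort_index()

# Top two days without sorting the whole frame.
top2 = d_df.nlargest(2, 'any_abuse_handled')

print(by_handled)
```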
DESCRIBE
>>> d_df.describe()
       abuse_passthrough  any_abuse_handled ...
count           15.00000          15.000000 ...
mean             2.60000        5435.800000 ...
std              5.79162        1848.716358 ...
min              0.00000        2174.000000 ...
25%              0.00000        3810.000000 ...
50%              0.00000        6191.000000 ...
75%              1.50000        6899.500000 ...
max             21.00000        7664.000000 ...
OUTPUT TO CSV
>>> d_df.to_csv(path_or_buf='output.csv')
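A quick way to check the export is a round trip. This sketch writes to an in-memory buffer instead of `output.csv` so it has no side effects; the three rows are the daily totals shown in the resampling slide.

```python
import io

import pandas as pd

d_df = pd.DataFrame(
    {
        'abuse_passthrough': [0, 1, 2],
        'any_abuse_handled': [3178, 6536, 6857],
    },
    index=pd.DatetimeIndex(
        ['2014-07-07', '2014-07-08', '2014-07-09'], name='datetime'
    ),
)

# to_csv accepts any file-like object as well as a path.
buffer = io.StringIO()
d_df.to_csv(buffer)

# Read it back, restoring the datetime index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col='datetime', parse_dates=['datetime'])

print(restored)
```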
ONE MORE THING...
Vincent
import vincent
CHARTS
chart = vincent.StackedArea(d_df)
chart.legend(title='Legend')
chart.colors(brew='Set3')
EXAMPLE STACKED AREA
EXAMPLE LINE
EXAMPLE PIE
QUESTIONS
JASON A MYERS / @JASONAMYERS
