#ISSlearn
PANDAS ATE MY DATA
13 July 2018 / Mr. Kenneth Phang
© 2018 National University of Singapore. All Rights Reserved
Agenda
Visualization: Bar chart, Histogram and Pie Chart
Comparison with SQL
Introduction to Pandas Library: Data Structure - Series and DataFrame
Data Analysis & Manipulation: MultiIndex, GroupBy, Merging, Joining, Concatenation
INTRODUCTION TO PANDAS LIBRARY
Series
A Series is a one-dimensional labeled array and the data structure for a
single column of a DataFrame. A DataFrame can be treated as a collection
of Series that share one index: selecting a single column returns a Series.
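A minimal sketch of the Series/DataFrame relationship described above. The values and the labels t1 to t3 are made up for illustration; only the pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical data: three restaurant bills, labeled by table.
labels = ["t1", "t2", "t3"]
bills = pd.Series([16.99, 10.34, 21.01], index=labels, name="total_bill")
tip_values = pd.Series([1.01, 1.66, 3.50], index=labels, name="tip")

# A DataFrame built from Series objects: each column is one Series.
frame = pd.DataFrame({"total_bill": bills, "tip": tip_values})

# Selecting a single column gives back a Series that shares the frame's index.
column = frame["total_bill"]
print(type(column).__name__)
print(column.index.tolist())
```

The column comes back with the same row labels as the frame, which is what lets pandas line up values from different columns.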
DataFrame
pandas.DataFrame(data=None, index=None, columns=None,
dtype=None, copy=False)
Two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can
be thought of as a dict-like container for Series objects. The
primary pandas data structure.
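The properties listed above can be sketched as follows. The constructor arguments match the signature quoted on the slide; the values and the row labels r1 and r2 are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical values; data, index and columns are the constructor arguments.
df = pd.DataFrame(
    data=[[16.99, 1.01], [10.34, 1.66]],
    index=["r1", "r2"],
    columns=["total_bill", "tip"],
)

# Size-mutable and heterogeneous: add a string column next to the floats.
df["smoker"] = ["No", "Yes"]

# Arithmetic aligns on row and column labels, not on position.
# "other" only has row r2, and its columns are in a different order.
other = pd.DataFrame({"tip": [1.0], "total_bill": [10.0]}, index=["r2"])
total = df[["total_bill", "tip"]] + other
print(total)
```

Row r1 has no partner in `other`, so its sums are NaN; row r2 is added label by label even though the column order differs.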
SQL Comparison - SELECT
SQL
SELECT total_bill, tip, smoker, time
FROM tips
LIMIT 5;
PD
tips[['total_bill', 'tip', 'smoker', 'time']].head(5)
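To try the SELECT comparison without the real dataset, here is a small sketch. The `tips` frame below is a made-up stand-in with the same column names as the slide, not the actual tips data.

```python
import pandas as pd

# Hypothetical stand-in for the tips table (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "No", "No", "No", "Yes"],
    "time": ["Dinner"] * 6,
    "size": [2, 3, 3, 2, 4, 4],
})

# SELECT total_bill, tip, smoker, time FROM tips LIMIT 5;
# A list of names inside the brackets returns a DataFrame with those columns.
selected = tips[["total_bill", "tip", "smoker", "time"]].head(5)

# A single name (no inner list) returns a Series instead.
one_column = tips["tip"]
print(selected)
```

Note the double brackets: the outer pair is the indexing operator, the inner pair is a Python list of column names.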
SQL Comparison - WHERE
SQL
SELECT *
FROM tips
WHERE time = 'Dinner'
LIMIT 5;
PD
tips[tips['time'] == 'Dinner'].head(5)
SQL Comparison - WHERE
SQL
SELECT *
FROM tips
WHERE size >= 5 OR total_bill > 45;
PD
tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]
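The WHERE filters above work through boolean masks. The sketch below makes the intermediate mask visible; the four-row `tips` frame is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the tips table (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 48.27, 29.80, 10.34],
    "size": [2, 4, 6, 3],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
})

# Each comparison yields a boolean Series; | combines them element-wise (OR).
# The parentheses are required because | binds tighter than >= and >.
mask = (tips["size"] >= 5) | (tips["total_bill"] > 45)

# Indexing with the mask keeps only the rows where it is True.
matches = tips[mask]
print(mask.tolist())
print(matches)
```

For an AND condition the same pattern applies with `&` in place of `|`; Python's `and` / `or` keywords do not work on Series.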
SQL Comparison - WHERE NULL
SQL
SELECT *
FROM frame
WHERE col2 IS NULL;
PD
frame[frame['col2'].isna()]
SQL Comparison - WHERE NULL
SQL
SELECT *
FROM frame
WHERE col1 IS NOT NULL;
PD
frame[frame['col1'].notna()]
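A small sketch of the two NULL checks above. The contents of `frame` are assumed for illustration; the slides do not show them.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
frame = pd.DataFrame({
    "col1": ["A", "B", np.nan, "C", "D"],
    "col2": ["F", np.nan, "G", "H", "I"],
})

# WHERE col2 IS NULL
null_col2 = frame[frame["col2"].isna()]

# WHERE col1 IS NOT NULL
not_null_col1 = frame[frame["col1"].notna()]

# Comparing with == finds nothing, because NaN is not equal to itself.
naive = frame[frame["col2"] == np.nan]

print(null_col2)
print(not_null_col1)
print(len(naive))
```

This mirrors SQL, where `col2 = NULL` also matches no rows and `IS NULL` is needed instead.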
SQL Comparison - GROUP BY
SQL
SELECT sex, count(*)
FROM tips
GROUP BY sex;
/*
Female 87
Male 157
*/
PD
tips.groupby('sex').size()
SQL Comparison - GROUP BY
PD
In [19]: tips.groupby('sex')['total_bill'].count()
Out[19]:
sex
Female 87
Male 157
Name: total_bill, dtype: int64
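On the slides, `size()` and `count()` give the same numbers. The sketch below, on a made-up frame with one missing bill, shows where they differ: `size()` counts rows per group like `COUNT(*)`, while `count()` counts non-null values in the chosen column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing total_bill in the Male group.
tips = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "total_bill": [16.99, 10.34, np.nan, 23.68, 24.59],
})

# Rows per group, missing values included (like COUNT(*)).
rows_per_group = tips.groupby("sex").size()

# Non-null total_bill values per group (like COUNT(total_bill)).
non_null_bills = tips.groupby("sex")["total_bill"].count()

print(rows_per_group)
print(non_null_bills)
```

Here the Male group has three rows but only two non-null bills, so the two results disagree for that group.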
SQL Comparison - INNER JOIN
SQL
SELECT *
FROM df1
INNER JOIN df2
ON df1.key = df2.key;
PD
pd.merge(df1, df2, on='key')
SQL Comparison - LEFT OUTER JOIN
SQL
-- show all records from df1
SELECT *
FROM df1
LEFT OUTER JOIN df2
ON df1.key = df2.key;
PD
pd.merge(df1, df2, on='key', how='left')
SQL Comparison - RIGHT JOIN
SQL
SELECT *
FROM df1
RIGHT OUTER JOIN df2
ON df1.key = df2.key;
PD
pd.merge(df1, df2, on='key', how='right')
SQL Comparison - FULL JOIN
SQL
SELECT *
FROM df1
FULL OUTER JOIN df2
ON df1.key = df2.key;
PD
pd.merge(df1, df2, on='key', how='outer')
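The four join types above differ only in the `how` argument (`inner` is the default). The sketch below runs all four on two made-up frames; the keys and the `value` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frames: keys B and D appear in both, and D appears twice in df2.
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

# Run the same merge with each join type and collect the results.
results = {
    how: pd.merge(df1, df2, on="key", how=how)
    for how in ["inner", "left", "right", "outer"]
}

for how, joined in results.items():
    # inner: 3 rows, left: 5 rows, right: 4 rows, outer: 6 rows
    print(how, len(joined), sorted(joined["key"].unique()))
```

Because both frames have a non-key column named `value`, pandas renames them `value_x` and `value_y` in the result. Rows without a partner get NaN on the missing side, as in a SQL outer join.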
SQL Comparison - UNION
SQL
SELECT city, rank
FROM df1
UNION
SELECT city, rank
FROM df2;
PD
pd.concat([df1, df2]).drop_duplicates()
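A sketch of the UNION equivalent on two made-up frames (the city names and ranks are assumptions). It separates the two steps: `pd.concat` alone behaves like `UNION ALL`, and `drop_duplicates()` turns that into `UNION`.

```python
import pandas as pd

# Hypothetical frames; the Chicago row appears in both.
df1 = pd.DataFrame({"city": ["Chicago", "San Francisco", "New York City"],
                    "rank": [1, 2, 3]})
df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"],
                    "rank": [1, 4, 5]})

# Like UNION ALL: rows are stacked and the repeated Chicago row is kept.
stacked = pd.concat([df1, df2])

# Like UNION: fully identical rows are removed, keeping the first occurrence.
union = stacked.drop_duplicates()

# Optional: concat keeps each frame's original index, so renumber the rows.
union = union.reset_index(drop=True)
print(union)
```

Without the final `reset_index`, the result would carry repeated index labels (0, 1, 2, 1, 2) inherited from the two inputs.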
Pandas Cheat Sheet
Pandas Workshop
Workshop: https://github.com/kenken64/PandasAteMyData
THANK YOU ☺
kenneth.phang@nus.edu.sg

NUS-ISS Learning Day 2018 - Pandas Ate My Data
