PyData meetup #1
Andrey Vykhodtsev <andrej.vihodcev@gmail.com>
Welcome to the first PyData meetup!
• My name is Andrey Vykhodtsev
• I plan to hold one meetup every two months
• Meetups can be more frequent and more interesting if you get involved too
• The meetup group fee is paid by NumFOCUS (a nonprofit organization)
• Attendees get stickers (while supplies last)
• The first meetup is pretty basic (sorry, hardcore people)
Agenda
• Short intro to Pandas / NumPy
• SQL vs Pandas
• Going through Pandas features in an interactive demo mode
• data loading
• Indexing, manipulating columns and rows
• Grouping, stacking, reshaping, pivoting
• Plotting directly from pandas
• Time series, categoricals
• Gotchas
Why are we talking about this?
• Pandas and NumPy are at the core of the Python data science toolchain
• PyData is another name for this toolchain
• Numerous Python data science libraries work with or on top of Pandas / NumPy
• scikit-learn
• scikit-image
• matplotlib
• networkx
• bokeh
• Jupyter (these notebooks are made with Jupyter)
Intro to NumPy
• NumPy has been around for a long time
• Part of the SciPy ecosystem
• Library for working with multidimensional arrays
• Very fast
• Vectorization
• Integration with BLAS/LAPACK and Intel MKL
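As a rough illustration of what vectorization means here, the sketch below computes the same sum of squares twice: once with a plain Python loop and once with a single NumPy call. The array size and the function names are made up for the example. np.dot typically hands the work to the BLAS library NumPy was built against, though that depends on the installation.

```python
import numpy as np

# Hypothetical data: one million floats.
values = np.arange(1_000_000, dtype=np.float64)


def sum_of_squares_loop(arr):
    """Plain Python loop: one interpreter step per element."""
    total = 0.0
    for x in arr:
        total += x * x
    return total


def sum_of_squares_vectorized(arr):
    """Vectorized: the loop runs inside NumPy's compiled code."""
    return float(np.dot(arr, arr))


# Compare on a small slice so the slow loop finishes quickly.
small = values[:1000]
loop_result = sum_of_squares_loop(small)
vector_result = sum_of_squares_vectorized(small)
```

Timing both functions on the full array (for example with %timeit in a Jupyter notebook) should show the vectorized version winning by a wide margin.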
Pandas
• Built on top of NumPy
• Inspired by R's data.frame and data.table
• A lot of flexibility
• Sometimes at the cost of performance
• Drawbacks:
• Mostly single-threaded
• Memory-bound: the data has to fit in RAM
Pandas vs. SQL
• SQL syntax can be easier to read
• SQL has mature optimizers
• SQL works on top of databases
• Windowed aggregates are easier to express in SQL
• Pandas is more flexible
• Pandas is geared towards data science
• Some things are much easier to express in Pandas than in SQL
• How about "give me all columns that start with 'mn_' and aggregate them using mean"?
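The 'mn_' question from the last bullet can be sketched in a couple of lines of Pandas. The table below is invented for illustration; only the 'mn_' prefix comes from the slide.

```python
import pandas as pd

# Hypothetical table; the column names other than the 'mn_' prefix are made up.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "mn_height": [1.0, 2.0, 3.0],
        "mn_weight": [10.0, 20.0, 30.0],
        "other": [7, 8, 9],
    }
)

# Select every column whose name starts with 'mn_' ...
mn_columns = df.filter(regex=r"^mn_")

# ... and aggregate each selected column with the mean.
mn_means = mn_columns.mean()
```

In plain SQL the column list usually has to be typed out by hand or generated from the database catalog, which is what makes this kind of query awkward there.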
Loading data - Available methods
• IO documentation
• from CSV
• from TXT
• from SQL database
• from Excel
• from a URL, from zipped files
• clipboard
• JSON, XML, HTML
• Python pickle files
• HDF5 (.h5)
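A minimal sketch of two of these readers, using in-memory text in place of real files so it runs anywhere. The data is made up. Other sources follow the same pattern through functions such as pd.read_excel, pd.read_sql, pd.read_clipboard, pd.read_html and pd.read_hdf, some of which need optional dependencies installed.

```python
import io

import pandas as pd

# In-memory stand-ins for files; the contents are invented for illustration.
csv_text = "name,value\na,1\nb,2\n"
json_text = '[{"name": "a", "value": 1}, {"name": "b", "value": 2}]'

# read_csv accepts a path, a URL, or any file-like object.
from_csv = pd.read_csv(io.StringIO(csv_text))

# read_json works the same way for a list of JSON records.
from_json = pd.read_json(io.StringIO(json_text))
```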
Reading and writing files
• head() displays the first 5 rows by default, or the first N
• delimiter parameter
• usecols parameter
• setting pandas display options
• setting types while reading (dtype parameter)
• nrows parameter
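The parameters listed above can be combined in one read_csv call. This is a sketch on made-up, semicolon-delimited data; the column names and types are assumptions chosen for the example.

```python
import io

import pandas as pd

# Made-up semicolon-delimited data standing in for a file on disk.
raw = "id;name;score;comment\n1;a;0.5;x\n2;b;1.5;y\n3;c;2.5;z\n"

df = pd.read_csv(
    io.StringIO(raw),
    delimiter=";",                               # field separator (alias of sep)
    usecols=["id", "score"],                     # load only these columns
    dtype={"id": "int32", "score": "float64"},   # set types while reading
    nrows=2,                                     # read only the first two data rows
)

# Display options change how frames are printed, not what they contain.
pd.set_option("display.max_columns", 20)

# head() returns 5 rows by default; here we ask for just one.
first_row = df.head(1)
```

Restricting columns and rows at read time is a cheap way to explore a large file before committing memory to the whole thing.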
Main data structures
• Doc bookmark about data structures
• pd.Series - like a column, indexed by some values
• for ints and floats the values are stored in a NumPy ndarray
• pd.DataFrame - a bunch of pd.Series sharing the same index
• pd.Index, pd.MultiIndex
• GroupBy objects (returned by df.groupby)
• pd.Panel (deprecated and since removed from pandas)
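The structures above can be seen side by side in a short sketch. All labels and values are arbitrary, and pd.Panel is left out because it is no longer available in current pandas releases.

```python
import numpy as np
import pandas as pd

# A Series: values plus an index (the labels here are arbitrary).
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])
values = s.to_numpy()  # numeric data comes back as a NumPy ndarray

# A DataFrame: several columns that share one index.
df = pd.DataFrame(
    {"group": ["x", "x", "y"], "value": [1, 2, 3]},
    index=["a", "b", "c"],
)
column = df["value"]  # each column is itself a Series

# A MultiIndex labels rows with more than one level.
multi = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["group", "n"]
)

# groupby returns a GroupBy object; the aggregation runs only when requested.
grouped = df.groupby("group")
totals = grouped["value"].sum()
```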
Indexing and selecting
• by using df.col
• by list of names
• by list or range of numbers
• by boolean vector
• advanced slicing
• query and np.where
• df.loc, df.iloc, df.at, df.iat (df.ix has since been removed from pandas)
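Each selection style in the list can be tried on a small frame. The frame, its labels and the thresholds below are invented for the example; df.ix is omitted because current pandas no longer provides it.

```python
import numpy as np
import pandas as pd

# Small made-up frame with string row labels.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": list("wxyz")},
    index=["r1", "r2", "r3", "r4"],
)

col_attr = df.a                            # attribute access (df.col)
cols = df[["a", "b"]]                      # list of column names
rows_by_pos = df.iloc[0:2]                 # range of integer positions
mask_rows = df[df["a"] > 2]                # boolean vector
label_slice = df.loc["r2":"r3", "a":"b"]   # label slices include both ends
queried = df.query("a > 1 and b < 40")     # expression string
flag = np.where(df["a"] > 2, "big", "small")  # elementwise if/else
cell_by_label = df.at["r3", "b"]           # single value by label
cell_by_pos = df.iat[0, 1]                 # single value by position
```

A rule of thumb: use loc for labels and iloc for positions, and reach for at / iat only when reading or writing one cell.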
Further reading
• Modern Pandas series
• Pandas docs

PyData Ljubljana meetup #1
