Python for Statistical Analysis
AND ITS DIFFERENT PACKAGES
18SE02CE011 : URJA DIYORA
SUBMITTED TO:
DR. JASLEEN KAUR
OUTLINE
• Introduction to Pandas
• Data Wrangling with Pandas
• Plotting and Visualization
• NumPy Basics: Arrays and Vectorized Computation
• Statistical Data Modeling
• Data Loading, Storage, and File Formats
• Packages For Statistical Analysis
Introduction to Pandas
• Importing data
• Series and DataFrame objects
• Indexing, data selection and subsetting
• Hierarchical indexing
• Reading and writing files
• Sorting and ranking
• Missing data
• Data summarization
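The topics above can be sketched in a few lines. This is a minimal illustration only: the city names, years and sales figures are made-up data, not taken from the slides.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a Series and a small DataFrame with one missing value.
s = pd.Series([4.0, 7.0, -5.0], index=["a", "b", "c"])
df = pd.DataFrame(
    {
        "city": ["Pune", "Surat", "Delhi", "Surat"],
        "year": [2019, 2019, 2020, 2020],
        "sales": [10.0, np.nan, 7.5, 12.0],
    }
)

# Indexing and subsetting: boolean row selection plus column labels via .loc.
surat = df.loc[df["city"] == "Surat", ["year", "sales"]]

# Hierarchical indexing: build a two-level index from existing columns.
by_city_year = df.set_index(["city", "year"]).sort_index()

# Sorting and ranking (rank 1 = largest sales value).
ranked = df.sort_values("sales", ascending=False)
ranks = df["sales"].rank(ascending=False)

# Missing data: fill the gap with the column mean.
filled = df["sales"].fillna(df["sales"].mean())

# Data summarization: count, mean, std, quartiles.
summary = df["sales"].describe()
print(summary)
```

Filling with the mean is only one possible choice; dropping the row with dropna() is equally common.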
Data Wrangling with Pandas
• Date/time types
• Merging and joining DataFrame objects
• Concatenation
• Reshaping DataFrame objects
• Pivoting
• Data transformation
• Permutation and sampling
• Data aggregation and GroupBy operation
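A short sketch of these wrangling steps, assuming two hypothetical tables (orders and customers) invented for illustration:

```python
import pandas as pd

# Hypothetical tables: orders with dates, and a customer lookup table.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer": ["ann", "bob", "ann", "cy"],
        "date": pd.to_datetime(
            ["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-10"]
        ),
        "amount": [20.0, 35.0, 15.0, 50.0],
    }
)
customers = pd.DataFrame(
    {"customer": ["ann", "bob", "cy"], "region": ["east", "west", "east"]}
)

# Merging and joining: attach each customer's region to the orders.
merged = orders.merge(customers, on="customer", how="left")

# Concatenation: stack two pieces of a table back together.
stacked = pd.concat([orders.iloc[:2], orders.iloc[2:]], ignore_index=True)

# Date/time types: derive the month from the datetime column.
merged["month"] = merged["date"].dt.month

# Data transformation: bin amounts into labelled bands.
merged["band"] = pd.cut(merged["amount"], bins=[0, 25, 100], labels=["low", "high"])

# Pivoting: regions as rows, months as columns, summed amounts as values.
pivot = merged.pivot_table(
    index="region", columns="month", values="amount", aggfunc="sum"
)

# Reshaping: turn the wide pivot back into long format.
long = pivot.reset_index().melt(
    id_vars="region", var_name="month", value_name="amount"
)

# Permutation and sampling: shuffle all rows, or draw two at random.
shuffled = merged.sample(frac=1, random_state=0)
subset = merged.sample(n=2, random_state=0)

# Aggregation with GroupBy: total and mean amount per region.
totals = merged.groupby("region")["amount"].agg(["sum", "mean"])
print(totals)
```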
Plotting and Visualization
• Plotting in Pandas vs Matplotlib
• Bar plots
• Histograms
• Box plots
• Grouped plots
• Scatterplots
• Trellis plots
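As a rough illustration of the plot types listed above, the sketch below draws four of them on randomly generated data, using the pandas plotting interface for the first three and Matplotlib directly for the last. The column names and the Agg backend are assumptions made so the example runs without a display; trellis plots are not shown.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two groups, one variable x and a noisy linear response y.
rng = np.random.default_rng(0)
df = pd.DataFrame({"group": np.repeat(["a", "b"], 50), "x": rng.normal(size=100)})
df["y"] = 2.0 * df["x"] + rng.normal(scale=0.5, size=100)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Bar plot of group means (pandas wrapper around Matplotlib).
df.groupby("group")["x"].mean().plot.bar(ax=axes[0, 0], title="Bar plot")

# Histogram of x.
df["x"].plot.hist(bins=20, ax=axes[0, 1], title="Histogram")

# Grouped box plot: one box per group.
df.boxplot(column="x", by="group", ax=axes[1, 0])

# Scatterplot drawn with Matplotlib directly.
axes[1, 1].scatter(df["x"], df["y"], s=10)
axes[1, 1].set_title("Scatterplot")

fig.tight_layout()
```

The pandas methods return ordinary Matplotlib axes, so the two interfaces can be mixed on one figure as done here.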
Statistical Data Modeling
• Statistical modeling
• Fitting data to probability distributions
• Fitting regression models
• Model selection
• Bootstrapping
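The four modeling tasks above can be illustrated with NumPy and scipy.stats alone. This is a sketch on simulated data; the true parameters, the AIC helper and the choice of polynomial candidates are assumptions made for the example, not methods prescribed by the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Fitting data to a probability distribution: maximum-likelihood normal fit.
sample = rng.normal(loc=5.0, scale=2.0, size=500)
mu_hat, sigma_hat = stats.norm.fit(sample)

# Fitting a regression model: simple linear regression y = a*x + b.
x = np.linspace(0, 10, 200)
y = 1.5 * x + 3.0 + rng.normal(scale=1.0, size=x.size)
fit = stats.linregress(x, y)


def aic(deg):
    """AIC of a degree-`deg` polynomial fit, assuming Gaussian errors."""
    coeffs = np.polyfit(x, y, deg)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = deg + 2  # polynomial coefficients plus the error variance
    return x.size * np.log(rss / x.size) + 2 * k


# Model selection: compare candidate models by AIC (lower is better).
aic_linear, aic_quadratic = aic(1), aic(2)

# Bootstrapping: resample with replacement to get a 95% interval for the mean.
boot_means = np.array(
    [rng.choice(sample, size=sample.size, replace=True).mean() for _ in range(1000)]
)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"normal fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print(f"regression: slope={fit.slope:.2f}, intercept={fit.intercept:.2f}")
print(f"AIC linear={aic_linear:.1f}, quadratic={aic_quadratic:.1f}")
print(f"bootstrap 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```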
Data Loading, Storage, and File Formats
• Indexing: Can treat one or more columns as the index of the returned
DataFrame, and can take column names from the file, from the user, or not at all.
• Type inference and data conversion: Includes user-defined value
conversions and custom lists of missing-value markers.
• Datetime parsing: Includes the ability to combine date and time
information spread over multiple columns into a single column in the
result.
• Iterating: Support for iterating over chunks of very large files.
• Unclean data issues: Skipping rows, a footer, or comments, and handling
minor things such as numeric data with thousands separated by commas.
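These options map onto arguments of pandas.read_csv. The sketch below parses a small in-memory file; the file contents, column names and the "missing" marker are hypothetical, and the date and time columns are combined after reading rather than inside the parser.

```python
import io

import pandas as pd

# Hypothetical messy export: a junk first line, quoted numbers with
# thousands separators, two kinds of missing-value marker, and a footer.
raw = "\n".join(
    [
        "exported by a hypothetical logger",
        "id,date,time,reading",
        '1,2021-01-01,00:00,"1,200"',
        "2,2021-01-01,12:00,NA",
        "3,2021-01-02,00:00,missing",
        '4,2021-01-02,12:00,"3,450"',
        "end of file",
    ]
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,              # unclean data: skip the junk first line
    skipfooter=1,            # unclean data: skip the footer line
    engine="python",         # skipfooter needs the Python parsing engine
    index_col="id",          # indexing: use the id column as the index
    thousands=",",           # numeric data with thousands separators
    na_values=["missing"],   # custom missing-value marker (added to defaults)
)

# Datetime parsing: combine the separate date and time columns into one.
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])

# Iterating: read a (clean) file in chunks of two rows at a time.
clean = df.to_csv()
chunk_sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(clean), chunksize=2)]

print(df)
print(chunk_sizes)
```

For a file on disk, the io.StringIO object would simply be replaced by a path.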
Packages For Statistical Analysis
• pandas >= 0.11.1 and its dependencies
• NumPy >= 1.6.1
• matplotlib >= 1.0.0
• pytz
• IPython >= 0.12
• pyzmq
• Tornado
• Optional: statsmodels, xlrd and openpyxl
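One way to check which of these packages are present is to query the installed distribution metadata. This is a small sketch using the standard library; the helper names installed_version and report are hypothetical, and the code only reports versions without comparing them against the minimums above.

```python
from importlib import metadata

# Distribution names taken from the list above.
REQUIRED = ["pandas", "numpy", "matplotlib", "pytz", "ipython", "pyzmq", "tornado"]
OPTIONAL = ["statsmodels", "xlrd", "openpyxl"]


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report(names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


versions = report(REQUIRED + OPTIONAL)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```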
NumPy Basics: Arrays and Vectorized Computation
• Fast vectorized array operations for data munging and cleaning,
subsetting and filtering, transformation, and any other kinds of
computations
• Common array algorithms like sorting, unique, and set operations
• Efficient descriptive statistics and aggregating/summarizing data
• Data alignment and relational data manipulations for merging and
joining together heterogeneous data sets
• Expressing conditional logic as array expressions instead of loops with
if-elif-else branches
• Group-wise data manipulations (aggregation, transformation, function
application).
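A few of the array-level operations above, sketched on a small made-up array (merging, alignment and group-wise operations are left to pandas and not shown here):

```python
import numpy as np

arr = np.array([3.0, -1.0, 4.0, -1.0, 5.0, -9.0])

# Vectorized arithmetic and boolean filtering, with no explicit loop.
doubled = arr * 2
positives = arr[arr > 0]

# Common array algorithms: sorting, unique values, set operations.
sorted_arr = np.sort(arr)
uniques = np.unique(arr)
common = np.intersect1d(arr, [4.0, 5.0, 6.0])

# Descriptive statistics.
mean, std = arr.mean(), arr.std()

# Conditional logic as array expressions instead of if-elif-else loops.
signs = np.where(arr > 0, 1, -1)
labels = np.select([arr < 0, arr < 4], ["neg", "small"], default="large")

print(positives, uniques, common)
print(signs, labels)
```

np.select evaluates its conditions in order, so it plays the role of an if-elif-else chain applied to every element at once.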
SciPy
SciPy is a collection of packages addressing a number of different standard
problem domains in scientific computing.
• scipy.integrate: numerical integration routines and differential equation
solvers
• scipy.linalg: linear algebra routines and matrix decompositions extending
beyond those provided in numpy.linalg
• scipy.optimize: function optimizers (minimizers) and root finding
algorithms
• scipy.signal: signal processing tools
• scipy.sparse: sparse matrices and sparse linear system solvers
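One call from each of the subpackages listed above, as a minimal sketch; the integrand, matrix and signal are arbitrary examples chosen so the results are easy to verify by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize, signal, sparse
from scipy.sparse.linalg import spsolve

# scipy.integrate: integral of sin(t) from 0 to pi (exact value 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# scipy.linalg: solve the dense linear system A x = b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = linalg.solve(A, b)

# scipy.optimize: minimize (t - 2)^2 and find the root of t^2 - 2 on [0, 2].
res = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)
root = optimize.brentq(lambda t: t**2 - 2.0, 0.0, 2.0)

# scipy.signal: a median filter removes the single spike.
smoothed = signal.medfilt([1.0, 1.0, 9.0, 1.0, 1.0], kernel_size=3)

# scipy.sparse: the same linear system, stored and solved in sparse form.
S = sparse.csc_matrix(A)
xs = spsolve(S, b)

print(area, x, res.x, root, smoothed, xs)
```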