DATA ANALYSIS USING PYTHON
Nagendra
Asstt. Professor
B. N. College (University of Delhi)
LEARNING OUTCOME
• What is Data Science?
• Data Analysis Methodology
• Python Basics
  ◦ Variables and Data Types
  ◦ Reading Data
  ◦ Selecting and Filtering the Data
  ◦ Data Manipulation: Sorting, Grouping
• Python Libraries for Data Science
  ◦ NumPy (Numerical Computation)
  ◦ Pandas (Data Analysis)
  ◦ Matplotlib (Data Visualization)
  ◦ scikit-learn (Machine Learning Algorithms)
WHAT IS DATA SCIENCE
The process of finding insights, trends and intelligence in data
A relatively new field
Deeply rooted in statistics and decision support systems
A multidisciplinary field (domain knowledge, tools and technology, mathematics and statistics, problem-solving skills)
DATA ANALYSIS METHODOLOGY
• Statement of the Problem / Objective of the Study
• Data Preparation
• Feature selection
• Exploratory Data Analysis
PYTHON BASICS
What is Python
• A high-level, general-purpose programming language
• A very popular data science tool for data analysis, data visualization and machine learning tasks
• It is open source and free to use
PYTHON BASICS
How to Download Python
Download Python from the following link:
https://www.python.org/downloads
You can also download Python together with Jupyter Notebook (as part of the Anaconda distribution) from the following link:
https://www.anaconda.com/why-anaconda/
PYTHON BASICS
Common Tools in Python Environment
The Python interactive console: Also called the Python interpreter or Python shell, it gives programmers a quick way to execute commands and try out or test code without creating a file. (https://www.python.org/shell/)
Spyder: A powerful scientific environment written in Python, for Python, designed by and for scientists, engineers and data analysts. It combines advanced editing, analysis, debugging and profiling. (https://pypi.org/project/spyder/)
Jupyter Notebook: An open-source web application that you can use to create and share work (code, equations, visualizations, machine learning models and text). (https://jupyter.org)
PYTHON BASICS
Most Popular Python Libraries for Data Science
PYTHON BASICS
Variables and Types
• A variable is a name that refers to a value stored in memory
• Most common Python data types: float, int, str, list, tuple, dict
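The types listed above can be illustrated with a short sketch. The variable names and values here are made up for illustration only and are not taken from the slides.

```python
# Assign values of the common built-in types to variables.
price = 19.99                        # float
quantity = 3                         # int
name = "notebook"                    # str
scores = [72, 85, 90]                # list
point = (2, 5)                       # tuple
stock = {"pen": 10, "notebook": 4}   # dict

# type() reports the type of the object a variable currently refers to.
values = (price, quantity, name, scores, point, stock)
type_names = [type(v).__name__ for v in values]
print(type_names)

# A variable is only a name: it can be rebound to a value of another type.
quantity = "three"
print(type(quantity).__name__)
```

Python has no separate declaration step; the type belongs to the value, not to the variable name.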
PYTHON BASICS
Basic Operations in Python
Arithmetic Operations
Addition (+)
Subtraction (-)
Multiplication (*)
Division (/)
Modulo (%)
Relational Operations
Equal (==)
Greater than (>)
Less than (<)
Logical Operations
True/False
and
or
in (membership test)
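A minimal sketch of the operations listed above, using two arbitrary example numbers (17 and 5) chosen only for illustration:

```python
a, b = 17, 5

# Arithmetic operations
total = a + b         # addition
difference = a - b    # subtraction
product = a * b       # multiplication
quotient = a / b      # division (always returns a float in Python 3)
remainder = a % b     # modulo

# Relational operations produce True or False
is_equal = a == b
is_greater = a > b
is_less = a < b

# Logical operators combine boolean values; "in" tests membership
both = (a > 10) and (b > 10)
either = (a > 10) or (b > 10)
found = b in [1, 3, 5]

print(total, difference, product, quotient, remainder)
print(is_equal, is_greater, is_less, both, either, found)
```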
PYTHON BASICS
List
A common data type in Python
A collection of comma-separated values (items) between square brackets
Items can be of the same or different types
Mutable: items can be added, removed, updated or replaced, and the list can be sliced
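The list behaviour described above might look like this in practice. The item values are arbitrary examples.

```python
items = [10, "apple", 3.5]   # items of different types are allowed

items.append("pear")         # add an item at the end
items.remove("apple")        # remove an item by value
items[0] = 20                # update (replace) the first item
first_two = items[:2]        # slicing returns a new list

print(items)
print(first_two)
```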
PYTHON BASICS
Tuple
A common type in Python
A tuple is very similar to a list: a collection of items inside parentheses ()
A tuple is immutable (its items cannot be changed in place)
It can be sliced, concatenated with another tuple to build a new tuple, and deleted as a whole
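A short sketch of what immutability means for a tuple, with arbitrary example values. Note that "adding" elements creates a new tuple and leaves the original unchanged.

```python
point = (3, 4, 5)

head = point[:2]          # slicing works as with lists
extended = point + (6,)   # concatenation builds a NEW tuple

# Assigning to an item raises TypeError because tuples are immutable.
try:
    point[0] = 99
    mutated = True
except TypeError:
    mutated = False

scratch = (1, 2)
del scratch               # deletes the whole tuple (the name is removed)

print(head, extended, point, mutated)
```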
PYTHON BASICS
Dictionary
Another common and popular type in Python
A collection of key-value pairs (since Python 3.7, a dictionary preserves insertion order)
The items are separated by commas, and the whole thing is enclosed in curly braces
Keys must be immutable (hashable), but values are mutable: entries can be added, modified and deleted
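The dictionary operations above can be sketched as follows. The names and marks are invented for illustration.

```python
marks = {"asha": 81, "ravi": 67}

marks["meena"] = 90    # add a new key-value pair
marks["ravi"] = 70     # modify the value of an existing key
del marks["asha"]      # delete an entry

# Keys must be hashable: a tuple works as a key, a list does not.
grid = {(0, 0): "origin"}
try:
    grid[[1, 1]] = "not allowed"
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(marks)
print(list_key_ok)
```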
PYTHON BASICS
Function
A function is a reusable block of code
We write the function once and call it whenever we need to solve that particular task
Two types of function:
Built-in functions: max(), min(), len()
User-defined functions: created by the programmer/developer
Main components of a function: input, computation, output
Global and local functions
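As a small illustration of input, computation and output, here is a user-defined function next to the built-in ones. The function name average and the sample marks are hypothetical, chosen only for this sketch.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)          # computation on the input (total is local)
    return total / len(values)   # output

marks = [60, 75, 90]

# Built-in functions are available without defining them.
print(max(marks), min(marks), len(marks))

# A user-defined function is written once and called as often as needed.
mean_mark = average(marks)
print(mean_mark)
```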
PYTHON BASICS
Looping - For Loop
The for loop is used to iterate over the elements of a sequence
It is often used when we have a piece of code that we want to repeat n times.
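Both uses mentioned above, iterating over a sequence and repeating a block a fixed number of times, can be sketched like this (example values are arbitrary):

```python
# Iterate over the elements of a sequence.
squares = []
for n in [1, 2, 3, 4]:
    squares.append(n * n)

# Repeat a block of code a fixed number of times with range().
count = 0
for _ in range(3):
    count += 1

print(squares, count)
```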
PYTHON BASICS
Looping - While Loop
The while loop tells the computer to keep doing something as long as a condition is met
Its construct consists of a condition and a block of code.
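A minimal while-loop sketch, using a made-up account balance as the condition. The loop stops as soon as the condition becomes false.

```python
balance = 100
withdrawals = 0

# Keep withdrawing 30 as long as the balance can cover it.
while balance >= 30:
    balance -= 30
    withdrawals += 1

print(balance, withdrawals)
```

Unlike a for loop, the number of repetitions is not fixed in advance; it depends on when the condition stops holding.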
PYTHON LIBRARY
NumPy
• It provides multidimensional arrays and matrices, along with functions to compute on them
• It allows advanced mathematical and statistical operations on these objects
• It provides vectorization of mathematical operations on arrays and matrices
• Many other Python libraries are built on top of NumPy
• It includes linear algebra operations, Fourier transforms and random number generation
https://numpy.org
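A brief sketch of the NumPy features listed above, assuming NumPy (version 1.17 or later, for default_rng) is installed. The array values and the seed are arbitrary.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # a 2-D array

# Vectorization: the operation applies to every element, no explicit loop.
doubled = a * 2

# Statistical operation along an axis (axis=0 gives column means).
col_means = a.mean(axis=0)

# Linear algebra: a matrix multiplied by its inverse gives the identity.
identity_check = a @ np.linalg.inv(a)

# Random number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(doubled)
print(col_means)
print(samples.shape)
```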
PYTHON LIBRARY
Pandas
• It is a data analysis tool in Python
• It adds data structures and tools (Series and DataFrame) designed to work with table-like data (similar to a table in a SQL Server environment)
• It provides tools for data manipulation: selecting, reshaping, merging, sorting, slicing, aggregation, etc.
• It integrates time series functionality
• It also handles missing data
https://pandas.pydata.org
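The selecting, filtering, sorting and grouping steps named in the learning outcomes could be sketched as below, assuming pandas is installed. The small city/sales table is invented for illustration; in practice the data would usually be read from a file, for example with pd.read_csv.

```python
import pandas as pd

# A small table-like dataset with one missing value.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "sales": [250.0, 300.0, None, 120.0],
})

# Handle missing data: replace the missing sales figure with 0.
df["sales"] = df["sales"].fillna(0.0)

# Filtering: keep only rows with sales above 200.
high = df[df["sales"] > 200]

# Sorting: order rows by sales, largest first.
ranked = df.sort_values("sales", ascending=False)

# Grouping and aggregation: total sales per city.
by_city = df.groupby("city")["sales"].sum()

print(high)
print(ranked)
print(by_city)
```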
PYTHON LIBRARY
Matplotlib
• It is a two-dimensional data plotting and data visualization library in Python
• We can create line plots, scatter plots, bar charts, histograms, pie charts, etc.
https://matplotlib.org
PYTHON LIBRARY
Seaborn
• Seaborn is a Python data visualization library based on Matplotlib.
• It provides a high-level interface for drawing attractive and informative statistical graphics
http://seaborn.pydata.org
PYTHON LIBRARY
IPython and Jupyter
• IPython is used for interactive computing and software development.
• IPython provides easy access to the operating system’s shell and file system.
• The IPython web notebook became Jupyter Notebook, which supports over 40 programming languages. IPython can now be used as the kernel for running Python in Jupyter.
• Jupyter provides a productive environment for interactive and exploratory computing.
PYTHON LIBRARY
Scikit-learn
• It is a general-purpose machine learning toolkit for Python programmers.
• It includes submodules for classification (SVM, nearest neighbors, random forest, logistic regression, etc.), regression, clustering (k-means, etc.), dimensionality reduction (PCA, feature selection, etc.), model selection (grid search, metrics, etc.) and preprocessing (feature extraction, normalization).
• It is built on top of NumPy, SciPy and Matplotlib.
https://scikit-learn.org/stable
PYTHON LIBRARY
statsmodels
• It is a statistical analysis package covering statistics and econometrics.
• It has submodules for regression models, analysis of variance (ANOVA), time series analysis, nonparametric methods (kernel density estimation, kernel regression) and visualization of statistical model results.
QUESTION & ANSWER
What Feedback do you have for me?
Questions:
nagendra.bnc@bn.du.ac.in (Nagendra)
USEFUL LINKS
https://www.python.org/downloads/
https://www.python.org/doc/
https://www.datasciencecentral.com/
https://www.kaggle.com/

Data analysis using python in Jupyter notebook.pptx
