INTRODUCTION OF DATA SCIENCE
A LOOK BACK AT DATA SCIENCE
Introduction Of Data Science
• Data science is the field of study that combines domain expertise, programming
skills, and knowledge of mathematics and statistics to extract meaningful insights
from data. Data science combines multiple fields, including statistics, scientific
methods, artificial intelligence (AI), and data analysis, to extract value from data.
Those who practice data science are called data scientists, and they combine a
range of skills to analyze data collected from the web, smartphones, customers,
sensors, and other sources to derive actionable insights.
• Data science encompasses preparing data for analysis, including cleansing,
aggregating, and manipulating the data to perform advanced
data analysis.
Features of Data Science Tools (e.g., TensorFlow):
• Responsive Construct
• Flexible
• Easily Trainable
• Feature Columns
• Open Source
• Parallel Network Training
• Visualizer
• Availability of Statistical Distributions
• Layered Components
• Event Logger
Sectors where data science is used:
• Financial industry
• Travel industry
• Manufacturing
• Banking sector
• Education
• Gaming
Purpose of Python in data science
∙ It has an elegant syntax, so programs are easier to read.
∙ It is an easy language to get started with, which makes it easier to get a
program working.
∙ It has a large standard library and strong community support.
∙ Python's interactive mode makes it simple to test code.
∙ Python is an expressive language.
Components of Python in data science
Data Analysis
• Data Analysis is a process of collecting, transforming, cleaning, and
modeling data with the goal of discovering the required information.
• A simple example of data analysis is an everyday decision: we think about
what happened last time, or about what is likely to happen if we choose a
particular option. In effect, we analyze the past or anticipate the future and
decide on that basis.
The data analysis process consists of the following phases, which
are iterative in nature:
Data Requirements Specification
❖ The data required for analysis depends on the question or experiment. Based on the requirements
of those directing the analysis, the data needed as input to the analysis is identified (e.g.,
a population of people).
Data Collection
❖ Data Collection is the process of gathering information on targeted variables identified as
data requirements.
Data Processing
❖ The data that is collected must be processed or organized for analysis.
Data Cleaning
❖ The processed and organized data may be incomplete, contain
duplicates, or contain errors. Data Cleaning is the process of
preventing and correcting these errors.
Data Analysis
❖ Data that has been processed, organized, and cleaned is ready for
analysis. Various data analysis techniques are available to understand,
interpret, and derive conclusions based on the requirements.
Communication
❖ The results of the data analysis are to be reported in a format as
required by the users to support their decisions and further action.
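The Data Cleaning phase described above can be illustrated with a minimal sketch using pandas. The column names and records are hypothetical, and filling a missing value with the median is only one possible choice, not a method prescribed by these slides.

```python
import pandas as pd

# Hypothetical raw records: one exact duplicate row and one missing age.
raw = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Ben", "Cara"],
        "age": [34.0, 29.0, 29.0, None],
    }
)

# Remove exact duplicate rows and renumber the remaining rows.
cleaned = raw.drop_duplicates().reset_index(drop=True)

# Fill the missing age with the median of the known ages (an assumption;
# dropping the row or another imputation rule may suit other data better).
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(cleaned)
```

Whether to fill, drop, or flag incomplete records depends on the data requirements set in the first phase.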
EDA (Exploratory Data Analysis)
• Exploratory data analysis (EDA) is a method of analyzing and investigating
data sets to summarize their main characteristics.
• EDA focuses on checking the assumptions required for model fitting
and hypothesis testing. It also guides the handling of missing values and the
transformation of variables as needed.
• EDA builds a robust understanding of the data and of issues associated with either the
data or the process that produced it. It is a scientific approach to getting the story of the data.
EDA Process
Step 1: Import the Python libraries.
Step 2: Read the data from a CSV file.
Step 3: head() - By default, returns the first 5 rows of the DataFrame.
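Steps 1 to 3 might look like the following sketch, assuming pandas is the library in use. The CSV content is hypothetical and is read from an in-memory string so the example is self-contained; in practice a file path would be passed to read_csv.

```python
import io

import pandas as pd  # Step 1: import the library

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,age,city
Ana,34,Lisbon
Ben,29,Oslo
Cara,41,Rome
Dev,25,Pune
Eli,38,Lima
Fay,30,Kyiv
"""

# Step 2: read the data (pass a file path instead of StringIO for a real file).
df = pd.read_csv(io.StringIO(csv_text))

# Step 3: head() returns the first 5 rows by default.
first_rows = df.head()
print(first_rows)
```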
Step 4: tail() - By default, returns the last 5 rows of the DataFrame. Pass n to get the last n rows,
selected by position.
Step 5: describe() - Returns a statistical summary of the numerical columns in the dataset.
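A short sketch of Steps 4 and 5, assuming a pandas DataFrame. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay"],
        "age": [34, 29, 41, 25, 38, 30],
    }
)

last_rows = df.tail()   # Step 4: last 5 rows by default
last_two = df.tail(2)   # last n rows, here n = 2

# Step 5: count, mean, std, min, quartiles, and max for numeric columns only.
summary = df.describe()

print(last_two)
print(summary)
```

Because "name" is not numeric, describe() leaves it out of the summary unless include="all" is passed.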
Step 6: shape - Returns a tuple with the size of each dimension; for a DataFrame, (number of rows, number of columns).
Step 7: columns - Returns the column labels of the DataFrame.
Step 8: nunique() - Returns the number of unique values, counted per column by default or per row
with axis=1.
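Steps 6 to 8 can be tried on a small hypothetical DataFrame, again assuming pandas:

```python
import pandas as pd

# Hypothetical data: 4 rows, 2 columns, with one repeated row.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Rome", "Oslo", "Lima"],
        "age": [29, 41, 29, 38],
    }
)

dims = df.shape                 # Step 6: (rows, columns)
labels = list(df.columns)       # Step 7: column labels
unique_counts = df.nunique()    # Step 8: unique values per column

print(dims)
print(labels)
print(unique_counts)
```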
Step 9: isnull().sum() - Returns the number of missing values in each column.
Step 10: drop() - Removes columns (or rows) from the DataFrame.
Step 11: Correlation is a measure of the strength and direction of the relationship between two variables.
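Steps 9 to 11 are sketched below with pandas. The column names and values are hypothetical, and corr() is used with its default Pearson method, which the slides do not specify.

```python
import pandas as pd

# Hypothetical measurements with one missing weight and a free-text column.
df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, 170.0, 180.0],
        "weight_kg": [50.0, 58.0, None, 80.0],
        "notes": ["a", "b", "c", "d"],
    }
)

# Step 9: number of missing values in each column.
missing = df.isnull().sum()

# Step 10: remove a column that is not needed for the analysis.
numeric = df.drop(columns=["notes"])

# Step 11: pairwise correlation (Pearson by default); rows with a missing
# value are skipped for the pair of columns involved.
corr = numeric.corr()

print(missing)
print(corr)
```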
Step 12: A correlation heatmap shows the correlation matrix as a 2D grid of colored cells, usually on a
monochromatic or diverging color scale. Each variable appears once as a row and once as a column, and the
color of each cell encodes the correlation between the corresponding pair of variables.
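A correlation heatmap can be drawn in several ways. The sketch below uses plain matplotlib (imshow) rather than any particular helper, which is an assumption on our part; seaborn's heatmap function is a common alternative. The data and the output file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric data: y rises with x, z falls as x rises.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 5, 4, 6],
        "z": [9, 7, 6, 4, 1],
    }
)
corr = df.corr()

fig, ax = plt.subplots()
# Fix the color range to [-1, 1] so colors are comparable across datasets.
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.savefig("correlation_heatmap.png")  # hypothetical output file name
```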
Step 13: pairplot() is a function in the seaborn library that plots multiple pairwise bivariate distributions in a
dataset. It shows the relationship for each pair of variables in a DataFrame as a matrix of plots; the
diagonal plots are univariate.
TYPES OF EXPLORATORY DATA ANALYSIS (EDA)
❖ There are four types of EDA:
1. Univariate Non-graphical
2. Univariate graphical
3. Multivariate Non-graphical
4. Multivariate graphical
Univariate non-graphical:
❖ This is the simplest form of data analysis among the four options.
In this type of analysis, the data that is being analysed consists of
just a single variable.
Univariate graphical:
❖ Unlike the non-graphical method, the graphical method provides
a fuller picture of the data. The three main methods of analysis
under this type are histograms, stem-and-leaf plots, and box plots.
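For a single variable, the non-graphical summary and a rough graphical view can be sketched with the Python standard library alone. The ages are hypothetical, and the text histogram with bins of width 10 is only an illustration of the idea.

```python
import statistics
from collections import Counter

# Hypothetical single variable: ages in years.
ages = [25, 29, 30, 34, 38, 41]

# Univariate non-graphical: summary statistics.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
stdev_age = statistics.stdev(ages)
q1, q2, q3 = statistics.quantiles(ages, n=4)  # quartiles

print(f"mean={mean_age:.2f} median={median_age} stdev={stdev_age:.2f}")
print(f"quartiles: {q1}, {q2}, {q3}")

# Univariate graphical (text version): count values per bin of width 10.
bins = Counter((age // 10) * 10 for age in ages)
for start in sorted(bins):
    print(f"{start}-{start + 9}: {'#' * bins[start]}")
```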
Multivariate non-graphical:
❖ Multivariate non-graphical EDA techniques are usually used to show the relationship
between two or more variables in the form of either cross-tabulation or
statistics.
Multivariate graphical:
❖ This type of EDA displays the relationship between two or more sets of data. A common
example is a grouped bar chart, where each group represents a level of one of the variables
and each bar within the group represents a level of another variable.
Other common types of multivariate graphics are:
• Scatterplot
• Run chart
• Heat map
• Multivariate chart
• Bubble chart
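The cross-tabulation mentioned under multivariate non-graphical EDA can be sketched with pandas.crosstab. The two categorical variables and their values are hypothetical.

```python
import pandas as pd

# Hypothetical survey: sector of each respondent and whether they use Python.
df = pd.DataFrame(
    {
        "sector": ["Banking", "Gaming", "Banking", "Travel", "Gaming", "Banking"],
        "uses_python": ["yes", "yes", "no", "yes", "yes", "yes"],
    }
)

# Count how often each combination of the two variables occurs.
table = pd.crosstab(df["sector"], df["uses_python"])
print(table)
```

Each cell counts the rows that share a given pair of values, so combinations that never occur appear as 0.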
EXPLORATORY DATA ANALYSIS (EDA) TOOLS
Python :
• EDA can be done in Python to identify missing values in a data set.
Other tasks include describing the data, handling outliers, and
getting insights through plots. Python's high-level built-in data
structures, dynamic typing, and dynamic binding make it an attractive
tool for EDA.
• Analyzing a dataset is a time-consuming task. Python provides
open-source modules that can automate much of the EDA process
and help save time.
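As a small illustration of the tasks listed above, the sketch below counts missing values and flags outliers with pandas. The readings are hypothetical, and the 1.5 x IQR rule is one common convention chosen here as an assumption; the slides do not name a specific outlier method.

```python
import pandas as pd

# Hypothetical sensor readings with one missing value and one extreme value.
values = pd.Series([10.0, 12.0, 11.0, None, 13.0, 95.0], name="reading")

# Identify missing values.
n_missing = int(values.isnull().sum())

# Flag outliers using the interquartile range (IQR) rule.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(f"missing: {n_missing}")
print(outliers)
```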
R:
• The R language is widely used by data scientists and statisticians for
statistical analysis and data analysis.
• R is an open-source programming language that provides a free software
environment for statistical computing and graphics, supported by the R
Foundation for Statistical Computing.
THANK YOU!!!
