CS3352 - Foundations of Data Science
III Semester CSE
© Vignesh Saravanan K, AP/CSE
Lecture-2
Data Science Process: Overview
UNIT I – INTRODUCTION
Foundations of Data Science
RAMCO INSTITUTE OF TECHNOLOGY
Data Science Process – 6 Steps (RRPEMP)
1. Setting the research goal
2. Retrieving data
3. Data preparation
4. Data exploration
5. Data modeling
6. Presentation and automation
1. Setting the research goal
➢ When the business asks you to carry out a data science project, you first prepare a project charter.
➢ The project charter contains information such as:
   ✓ what you are going to research,
   ✓ how the company benefits from it,
   ✓ what data and resources you need,
   ✓ a timetable, and
   ✓ the expected deliverables.
2. Retrieving data
➢ The second step is to collect the data.
➢ In this step you make sure that you can use the data in your program, which means checking:
   ✓ that the data exists, and
   ✓ the quality of the data and your access to it.
➢ Data can also be delivered by third-party companies, and it takes many forms, ranging from Excel spreadsheets to different types of databases.
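
The checks above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the CSV text stands in for a spreadsheet export, the in-memory SQLite database stands in for "a database", and every table, column, and function name here is an assumption made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export, standing in for a spreadsheet from a third party.
CSV_TEXT = "customer_id,city\n1,Chennai\n2,\n3,Madurai\n"


def load_csv(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def load_table(conn, table):
    """Read all rows of a database table into a list of dict rows."""
    conn.row_factory = sqlite3.Row
    # Existence check: is there a table with this name?
    found = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
        (table,),
    ).fetchone()
    if found is None:
        raise LookupError(f"table {table!r} does not exist")
    return [dict(row) for row in conn.execute(f'SELECT * FROM "{table}"')]


def missing_counts(rows):
    """Quality check: count empty or NULL values per column."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None or val == "")
    return counts


# Hypothetical database with one small table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 2, None)],
)

customers = load_csv(CSV_TEXT)
orders = load_table(conn, "orders")
print(missing_counts(customers))
print(missing_counts(orders))
```

Both sources end up as the same structure (a list of dict rows), so the same quality check can be applied regardless of where the data came from.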
3. Data preparation
➢ Data collection is an error-prone process.
➢ In this phase you enhance the quality of the data and prepare it for use in subsequent steps.
➢ It consists of three sub-phases:
   ✓ Data Cleansing - removes false values from a data source.
   ✓ Data Integration - enriches data sources by combining information from multiple data sources.
   ✓ Data Transformation - ensures that the data is in a suitable format for use in your models.
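
The three sub-phases can be sketched as three small functions applied in sequence. This is only an illustration under stated assumptions: the records, the 0 to 120 age range used as the cleansing rule, and all function names are invented for the example.

```python
# Hypothetical raw records; names and values are invented for illustration.
customers = [
    {"customer_id": "1", "age": "34"},
    {"customer_id": "2", "age": "-5"},  # impossible value
    {"customer_id": "3", "age": ""},    # missing value
]
# A second, hypothetical data source keyed by customer id.
cities = {"1": "Chennai", "2": "Madurai", "3": "Salem"}


def parse_age(text):
    """Return the age as an int, or None if the text is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return None


def cleanse(rows):
    """Data cleansing: drop rows whose age is missing or outside 0..120."""
    kept = []
    for row in rows:
        age = parse_age(row["age"])
        if age is not None and 0 <= age <= 120:
            kept.append(row)
    return kept


def integrate(rows, city_by_id):
    """Data integration: enrich each row with the city from the second source."""
    return [{**row, "city": city_by_id.get(row["customer_id"])} for row in rows]


def transform(rows):
    """Data transformation: convert text fields to the types a model expects."""
    return [
        {**row, "customer_id": int(row["customer_id"]), "age": int(row["age"])}
        for row in rows
    ]


prepared = transform(integrate(cleanse(customers), cities))
print(prepared)
```

Dropping bad rows is only one possible cleansing choice; replacing the value or flagging it for review may suit a given project better.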
4. Data exploration
➢ The goal is a deeper understanding of your data.
➢ Try to understand:
   ✓ how variables interact with each other,
   ✓ the distribution of the data, and
   ✓ whether there are outliers.
➢ To achieve this you mainly use descriptive statistics, visual techniques, and simple modeling. This step is also known as Exploratory Data Analysis (EDA).
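
The three questions above map onto simple descriptive statistics. The sketch below uses only Python's standard library; the data is invented, and the 1.5 × IQR rule for flagging outliers is one common convention chosen for this example, not something the lecture prescribes.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev

# Hypothetical data: hours studied and exam score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 58, 61, 64, 68, 70, 150]


def pearson(x, y):
    """Pearson correlation: how strongly two variables move together."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


# Distribution of the data: centre and spread.
print("mean:", mean(scores), "median:", median(scores), "stdev:", stdev(scores))
# Interaction between variables.
print("correlation(hours, scores):", pearson(hours, scores))
# Outliers.
print("outliers:", iqr_outliers(scores))
```

The gap between the mean and the median already hints that one score is unusually large, which the outlier check then confirms.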
5. Data modeling or model building
➢ In this phase you use models, domain knowledge, and insights about the data.
➢ You select a technique from fields such as statistics, machine learning, and operations research.
➢ Building a model is an iterative process that involves selecting the variables for the model, executing the model, and running model diagnostics.
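
The iterative loop (select a variable, execute the model, check diagnostics) can be sketched with a one-variable least-squares line. This is an illustration under stated assumptions: the data is invented, simple linear regression is just one possible technique, and R² is used here as the only diagnostic.

```python
from statistics import mean

# Hypothetical data: two candidate predictor variables and one target.
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "temperature": [30, 22, 27, 35, 25],
}
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(x, y):
    """Execute the model: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Model diagnostic: share of the variance in y explained by the line."""
    my = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Iterate over candidate variables and keep the one with the best diagnostic.
results = {}
for name, x in candidates.items():
    slope, intercept = fit_line(x, sales)
    results[name] = (r_squared(x, sales, slope, intercept), slope, intercept)

best_name = max(results, key=lambda name: results[name][0])
best_r2, best_slope, best_intercept = results[best_name]
print(best_name, best_r2, best_slope, best_intercept)
```

In a real project each pass through the loop would usually also be checked on data the model has not seen, which this sketch leaves out.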
6. Presentation and automation
➢ Finally, you present the results to the business.
➢ Results can take many forms, ranging from:
   ✓ presentations,
   ✓ graphs, pictures, and analysis diagrams, to
   ✓ research reports.
➢ Sometimes you will need to automate the execution of the process, because the business will want to use the insights you gained in another project.
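
One simple way to make a process repeatable is to wrap its steps in a single function that can be rerun on new data and that returns both the numbers and a readable report. The sketch below is a toy illustration; the function name, the 0 to 100 validity rule, and the sample scores are assumptions made for the example.

```python
from statistics import mean


def run_pipeline(raw_scores):
    """Cleanse, summarise, and report in one call so the run can be repeated."""
    # Preparation: drop missing values and scores outside 0..100.
    clean = [s for s in raw_scores if s is not None and 0 <= s <= 100]
    # Analysis: a small summary (mean() raises StatisticsError if nothing is left).
    summary = {
        "count": len(clean),
        "dropped": len(raw_scores) - len(clean),
        "mean": mean(clean),
    }
    # Presentation: a plain-text report built from the summary.
    lines = ["Score report", "------------"]
    lines += [f"{key}: {value}" for key, value in summary.items()]
    return summary, "\n".join(lines)


summary, report = run_pipeline([72, 88, None, 140, 65])
print(report)
```

Because the whole run is one function call, it can be scheduled or reused in another project without repeating the manual steps.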
End of Lecture
• Data Science Process – 6 Steps
