Data Science Life Cycle
Presented By:
Praanav Bhowmik
Durgesh Gupta
Agenda:
● What is Data Science?
● Qualities of a Data Scientist
● Job Responsibilities & Skill Set
● Data Science Life Cycle
○ Business Understanding
○ Data Acquisition and Understanding
○ Modeling
○ Deployment
○ Customer Acceptance
○ Monitoring & Maintenance
● Contribution of Data Scientists to the Organization
Introduction
● Data Science is a combination of multiple disciplines that uses statistics, data
analysis, and machine learning to analyze data, extract information and knowledge,
and gain insights from it.
● Data Science is about data gathering, analysis, and decision-making.
● Its key activities are finding patterns in data through analysis and visualization,
and using those patterns to make predictions.
● It applies theory and methods to provide concrete, actionable solutions to
complex problems.
Qualities of a Data Scientist
● Discover valuable insights in large amounts of data that can be used to shape
company strategies and achieve business objectives.
● Empower management to make smarter decisions.
● Make it easier to achieve business goals.
● Challenge the workforce to embrace data.
● Refine target audiences.
● Identify new revenue opportunities.
● Combine an analytical mind with business acumen.
Job Responsibilities
● Fetch information from various sources and analyze it to get a clear
understanding of how the organization performs.
● Use statistical and analytical methods plus AI tools to automate specific
processes within the organization and develop smart solutions to business
challenges.
● Build predictive models and apply machine learning algorithms.
● Present information using data visualization tools.
● Propose solutions and strategies to tackle business challenges.
Skill Set Required
● Mathematics: Data scientists use mathematics to process and structure the data
they work with.
● Probability & Statistics: Statistics allows data scientists to slice and dice data,
extracting the insights needed to draw reasonable conclusions.
● Programming: A data scientist typically needs several programming languages,
such as Python for scripts that manipulate, analyze, and visualize data, and R, Java,
or C for more specific goals.
● Data Management: The ability to extract data from relational databases,
non-relational stores, and unstructured sources.
● Machine Learning / Deep Learning: Knowledge of ML and DL algorithms to build models.
● Cloud Computing: The ability to use the data and machine learning services and
frameworks offered by cloud service providers.
Data Science Life Cycle
DS Life Cycle
Business Understanding
● The complete cycle revolves around the enterprise goal.
● Identify the key business variables that the analysis needs to predict.
● Define the project goals by asking and refining "sharp" questions that are relevant,
specific, and unambiguous.
● Find the relevant data that helps you answer the questions that define the
objectives of the project.
Data Acquisition and Understanding
● Real-world data sets are often noisy, have missing values, or contain a host of other
discrepancies.
● The aim is to produce a clean, high-quality data set whose relationship to the target
variables is understood.
● Develop a solution architecture for the data pipeline that refreshes and scores the
data regularly.
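As an illustration of what "cleaning" can mean in practice, here is a minimal sketch in plain Python. It assumes rows arrive as dictionaries and applies three common fixes: dropping exact duplicates, dropping rows whose target is missing, and imputing missing numeric features with the column median. The function name, field names, and the choice of median imputation are hypothetical; real projects pick cleaning rules based on what the data exploration reveals.

```python
from statistics import median

def clean_records(records, target, numeric_fields):
    """Hypothetical cleaning pass over a list of dict rows."""
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    seen, deduped = set(), []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(row)

    # 2. Rows without a target value cannot be used for supervised training.
    labeled = [r for r in deduped if r.get(target) is not None]

    # 3. Impute missing numeric features with the median of the observed values.
    medians = {}
    for field in numeric_fields:
        observed = [r[field] for r in labeled if r.get(field) is not None]
        medians[field] = median(observed) if observed else None

    cleaned = []
    for r in labeled:
        fixed = dict(r)
        for field in numeric_fields:
            if fixed.get(field) is None:
                fixed[field] = medians[field]
        cleaned.append(fixed)
    return cleaned

# Example with made-up customer rows.
rows = [
    {"age": 34, "income": 52000, "churned": 0},
    {"age": None, "income": 61000, "churned": 1},
    {"age": 34, "income": 52000, "churned": 0},    # duplicate
    {"age": 45, "income": None, "churned": None},  # unlabeled
    {"age": 29, "income": 48000, "churned": 1},
]
clean = clean_records(rows, target="churned", numeric_fields=["age", "income"])
```

Here the duplicate and the unlabeled row are dropped, and the missing age is filled with the median of the remaining ages (31.5).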
Modeling
● Determine the optimal data features for the machine-learning model.
● Create an informative machine-learning model that predicts the target as
accurately as possible.
● The process for model training includes the following steps:
○ Randomly split the input data into a training data set and a test
data set.
○ Build the models using the training data set.
○ Evaluate the models on both the training and the test data sets. Use a series of
competing machine-learning algorithms along with their associated tuning
parameters (known as a parameter sweep), geared toward
answering the question of interest with the current data.
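The training steps above can be sketched with the standard library alone. This example assumes a toy one-feature data set and uses a k-nearest-neighbors classifier so that the number of neighbors `k` can serve as the tuning parameter in a small parameter sweep; both the data and the choice of algorithm are illustrative assumptions. Each candidate is scored on the training set and the test set, as the slide describes. Many practitioners tune on a separate validation set instead and keep the test set for a final check; libraries such as scikit-learn provide `train_test_split` and `GridSearchCV` for this.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then split off a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    neighbors = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return int(votes * 2 > k)

def accuracy(train, data, k):
    """Fraction of rows in `data` that the k-NN model built on `train` gets right."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)

# Toy data set (hypothetical): the label is 1 when the feature exceeds 5.
data = [(i / 2, int(i / 2 > 5)) for i in range(40)]
train, test = train_test_split(data)

# Parameter sweep: (training accuracy, test accuracy) for each candidate k.
sweep = {k: (accuracy(train, train, k), accuracy(train, test, k)) for k in (1, 3, 5, 7)}
best_k = max(sweep, key=lambda k: sweep[k][1])
```

Comparing the two accuracies per candidate is the point of evaluating both sets: a large gap between training and test accuracy suggests overfitting.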
Deployment
● Deploy models with a data pipeline to a production or production-like
environment for final user acceptance.
● Once you have a set of models that perform well, you can operationalize them for
other applications to consume. Depending on the business requirements,
predictions are made either in real time or in batches.
● To deploy models, expose them through an open API.
● The API lets various applications consume the model easily.
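To make "expose the model through an API" concrete, here is a minimal sketch using only Python's standard `http.server` module. It assumes a JSON-over-HTTP interface with a single hypothetical `/predict` endpoint and a stand-in linear model; the slides do not specify a protocol, and production deployments usually rely on a web framework or a dedicated model-serving platform instead. The same `predict` function backs both the real-time endpoint and a batch helper, reflecting the real-time versus batch choice described above.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in model (hypothetical weights): a linear score with a 0.5 threshold."""
    score = 0.4 * features["age"] / 100 + 0.6 * features["usage"]
    return {"score": round(score, 4), "label": int(score > 0.5)}

def predict_batch(rows):
    """Batch mode: score many records at once, e.g. from a scheduled job."""
    return [predict(r) for r in rows]

class PredictHandler(BaseHTTPRequestHandler):
    """Real-time mode: POST a JSON object to /predict, receive a JSON prediction."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps(predict(features)).encode()
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "invalid input")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example

def serve(port=8000):
    """Start the API on localhost in a background thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling `serve()`, a client could request a prediction with `curl -X POST -H "Content-Type: application/json" -d '{"age": 50, "usage": 0.7}' http://127.0.0.1:8000/predict`. Keeping the model behind a small, stable interface like this lets it be retrained or replaced without changing the applications that consume it.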
Customer Acceptance
● Confirm that the pipeline, the model, and their deployment in a production
environment satisfy the customer's objectives.
● The customer should validate that the system meets their business needs and
answers the questions with acceptable accuracy before it is deployed to
production for use by their client's application.
● The project is then handed off to the entity responsible for operations.
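One way to make "acceptable accuracy" checkable is to agree on explicit thresholds with the customer and compare measured results against them. The sketch below assumes that approach; the metric names, thresholds, and measured values are all hypothetical.

```python
def acceptance_report(metrics, criteria):
    """Compare measured metrics with thresholds agreed with the customer.

    `criteria` maps a metric name to (comparison, threshold), where ">=" is for
    metrics that must be high enough and "<=" for metrics that must stay low.
    """
    report = {}
    for name, (op, threshold) in criteria.items():
        value = metrics.get(name)
        if value is None:
            report[name] = False  # a metric that was never measured cannot pass
        elif op == ">=":
            report[name] = value >= threshold
        elif op == "<=":
            report[name] = value <= threshold
        else:
            raise ValueError(f"unknown comparison {op!r}")
    return report

# Hypothetical measurements from a production-like environment.
measured = {"accuracy": 0.91, "p95_latency_ms": 180, "recall": 0.78}
agreed = {
    "accuracy": (">=", 0.90),
    "p95_latency_ms": ("<=", 200),
    "recall": (">=", 0.80),
}
report = acceptance_report(measured, agreed)
accepted = all(report.values())
```

In this example accuracy and latency pass but recall falls short, so the system would not yet be accepted. A per-metric report gives the customer and the team a concrete list of what still needs work.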
Monitoring & Maintenance
● The final but continuous phase of ML development is model monitoring and
maintenance.
● After deployment, monitor the model to ensure it continues to perform as
expected.
● ML models require regular tuning and updating to meet performance
expectations.
● Skipping this essential step can let model accuracy degrade over time.
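A simple form of monitoring is to track accuracy over a sliding window of recent predictions whose true outcomes have become known, and raise a flag when it drops noticeably below the accuracy measured at deployment. The sketch below assumes that ground-truth labels eventually arrive; the class name, window size, and tolerance are hypothetical choices, and other signals (such as shifts in input distributions) are also commonly monitored.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions and flag
    the model for retraining when it falls below `baseline - tolerance`."""

    def __init__(self, baseline, tolerance=0.05, window=100):
        self.baseline = baseline
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        acc = self.rolling_accuracy()
        # Only alert once the window is full, to avoid noise from few samples.
        if acc is None or len(self.outcomes) < self.outcomes.maxlen:
            return False
        return acc < self.baseline - self.tolerance

# Example: the model was 90% accurate at deployment.
monitor = AccuracyMonitor(baseline=0.90, tolerance=0.05, window=10)
for _ in range(10):
    monitor.record(prediction=1, actual=1)
healthy = monitor.needs_retraining()   # rolling accuracy 1.0

for _ in range(3):
    monitor.record(prediction=1, actual=0)
degraded = monitor.needs_retraining()  # rolling accuracy 0.7
```

The bounded deque keeps memory constant and makes the metric reflect recent behavior rather than the model's whole history.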
Contribution to the Organization
Contribution
● Data Science helps businesses collect, monitor, and manage performance
measures to improve decision-making across the organization.
● Companies may use trend analysis to make critical decisions that improve consumer
engagement and corporate performance and boost revenue.
● Data Science models use current data and can simulate a variety of
operations. As a result, businesses may look for candidates with professional
certification in data analytics.
● Data Science helps firms identify and refine target audiences by combining
existing data with additional data points to provide meaningful insights.
References
● Microsoft Data Science Life Cycle Overview
● W3Schools Data Science Introduction
Thank You
