Data Science Project Lifecycle
Jason Geng @Data Application Lab
Miya Du @Data Science Association
Business Requirement
Data Acquisition
Data Preparation
Hypothesis & Modeling
Evaluation & Interpretation
Deployment
Operations
Optimization
Business Requirements
• Data scientists need to work with business stakeholders and domain experts to understand both the data and the business
• Specify the business requirements
• For instance, with healthcare data:
Understand the data:
e.g. ‘DISCWT’: ‘This is the discharge-level weight on the HCUP nationwide data to produce national estimates’
Understand the business:
Goal: Predict Readmission Rate
Database: Healthcare: Readmissions Database
Modeling
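A minimal sketch of why a field like DISCWT matters: each sampled discharge stands for several discharges nationally, so estimates should be weighted. Only the name DISCWT comes from the slide; the `readmitted` flag and all values below are hypothetical.

```python
# Hypothetical sample of discharge records. DISCWT is the discharge-level
# weight named on the slide; "readmitted" is an assumed 0/1 flag.
discharges = [
    {"DISCWT": 2.0, "readmitted": 1},
    {"DISCWT": 1.5, "readmitted": 0},
    {"DISCWT": 2.5, "readmitted": 0},
    {"DISCWT": 2.0, "readmitted": 1},
]


def weighted_readmission_rate(records):
    """Weighted share of discharges that were followed by a readmission."""
    total_weight = sum(r["DISCWT"] for r in records)
    readmitted_weight = sum(r["DISCWT"] * r["readmitted"] for r in records)
    return readmitted_weight / total_weight


# The sum of the weights estimates the national number of discharges.
national_discharges = sum(r["DISCWT"] for r in discharges)
rate = weighted_readmission_rate(discharges)
print(national_discharges, rate)
```

Ignoring the weights would treat every sampled record as equally representative, which is generally not what a national estimate calls for.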
Data Collection
• Data from the product line
• Purchased third-party data
• Social media (Facebook, LinkedIn)
• Web crawling
• Open source data (Opendata, U.S. Census Data)
Challenge: Data Storage, Data Management
[Diagram: Legacy data, OLTP, Web Log, Web Crawler, Open Source, Third Party Data, Social Media Data | XML, CSV, LOG, SQL, … | Product Line, Business Intelligence, Data Science App]
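The data management challenge can be illustrated with a small sketch that reads two of the formats listed above (CSV and XML) into one common record shape. The field names and sample payloads are hypothetical, not taken from the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical payloads standing in for a product-line export and a
# third-party feed.
CSV_TEXT = "id,source,value\n1,product_line,10\n2,product_line,20\n"
XML_TEXT = "<records><record id='3' source='third_party' value='30'/></records>"


def read_csv_records(text):
    """Parse CSV text into a list of plain dicts, one per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_xml_records(text):
    """Parse <record .../> elements into dicts built from their attributes."""
    root = ET.fromstring(text)
    return [dict(elem.attrib) for elem in root.iter("record")]


# Both readers return the same shape, so downstream code sees one list.
records = read_csv_records(CSV_TEXT) + read_xml_records(XML_TEXT)
print(records)
```

All values arrive as strings here; type conversion is left to the preparation step.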
Data Preparation (Data Wrangling)
• Cleaning data (semantic errors, missing entries, or inconsistent formatting)
• Challenge: data integration
• Takes 80% of the time in the project workflow
[Diagram: Data Source A, Data Source B, Data Source B → ETL → Data Warehouse]
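The cleaning and integration steps above can be sketched as follows. This is only an illustration under assumed inputs: the two sources, their field names, and the cleaning rules are all hypothetical.

```python
# Two hypothetical sources with inconsistent formatting and a missing entry.
source_a = [
    {"patient_id": " 001", "age": "34", "state": "tx"},
    {"patient_id": "002", "age": "", "state": "CA "},
]
source_b = [
    {"patient_id": "001", "visits": "3"},
    {"patient_id": "003", "visits": "1"},
]


def clean_record(rec):
    """Trim whitespace, convert age to int (None if missing), upper-case state."""
    cleaned = {key: value.strip() for key, value in rec.items()}
    cleaned["age"] = int(cleaned["age"]) if cleaned["age"] else None
    cleaned["state"] = cleaned["state"].upper()
    return cleaned


def integrate(a_records, b_records):
    """Left-join source B's visit counts onto cleaned source A records."""
    visits_by_id = {r["patient_id"].strip(): int(r["visits"]) for r in b_records}
    merged = []
    for rec in map(clean_record, a_records):
        # Patients absent from source B get visits=None rather than a guess.
        rec["visits"] = visits_by_id.get(rec["patient_id"])
        merged.append(rec)
    return merged


warehouse = integrate(source_a, source_b)
print(warehouse)
```

Keeping missing values as `None` instead of filling them in leaves the imputation decision to a later, explicit step.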
Feature Engineering
Select or create features
Research feature relevance
Experiment and validate
Change the feature set
Go back to the feature selection step
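One pass through this loop might look like the sketch below, which scores feature relevance by absolute Pearson correlation with the target and keeps features above a threshold. The slides do not name a relevance measure; correlation, the threshold of 0.5, and the feature names are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(features, target, threshold=0.5):
    """Score each feature against the target and keep those at or above threshold."""
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = sorted(name for name, score in scores.items() if score >= threshold)
    return selected, scores


# Hypothetical candidate features and a 0/1 target.
features = {
    "prior_visits": [0, 1, 2, 3, 4, 5],
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "length_of_stay": [2, 1, 4, 3, 6, 5],
}
target = [0, 0, 0, 1, 1, 1]

selected, scores = select_features(features, target)
print(selected)
```

If validation results are poor, the feature set or the threshold is changed and the selection step is run again, which is the loop the slide describes.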
Modeling
Reference Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/
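As a rough illustration, the top-level branches of the referenced scikit-learn map can be paraphrased as a small decision function. The function name, parameters, and return strings are hypothetical; the map itself goes on to recommend specific estimators within each family, which this sketch does not attempt.

```python
def choose_task_family(n_samples, predicting_category=False, has_labels=False,
                       predicting_quantity=False, just_looking=False):
    """Paraphrase of the map's first few questions; not a scikit-learn API."""
    if n_samples <= 50:
        return "get more data"
    if predicting_category:
        # Labeled categories point to classification, unlabeled to clustering.
        return "classification" if has_labels else "clustering"
    if predicting_quantity:
        return "regression"
    if just_looking:
        return "dimensionality reduction"
    return "outside the four main families"


# Predicting readmission (a yes/no label) from labeled records.
print(choose_task_family(10_000, predicting_category=True, has_labels=True))
```

Under these assumptions, the readmission goal from the business requirements slide would start in the classification branch.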
Deploy to Product Line
Thank you!
https://www.DataAppLab.com
Feb 2017
PPT: Xiaolu Zhao @ Feb 16, 2017
