Data Science Project Lifecycle and Data Scientist Skill Set
Jason Geng @Data Application Lab
Miya Du @Data Science Association
Business Requirement
Data Acquisition
Data Preparation
Hypothesis & Modeling
Evaluation & Interpretation
Deployment
Operations
Optimization
Business Requirements
• Data scientists need to work with business people and domain experts to understand both the data and the business
• Specify the business requirements
• For instance, with healthcare data:
Understand the business:
Healthcare
Goal: Predict Readmission Rate
Understand the data:
Database: Readmissions Database
e.g. ‘DISCWT’: ‘This is the discharge-level weight on the HCUP nationwide data to produce national estimates’
Modeling
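The role of a discharge-level weight such as DISCWT can be illustrated with a weighted rate. This is a minimal sketch, not the documented HCUP estimation procedure: it assumes each sampled discharge carries a weight and a 0/1 readmission flag, and it estimates the rate as sum(weight × flag) / sum(weight). Only the name DISCWT comes from the slide; the `readmitted` field and the sample records are hypothetical.

```python
# Sketch: weighted readmission rate using a discharge-level weight.
# DISCWT is the weight named on the slide; the "readmitted" flag and the
# sample records below are hypothetical illustration data.

def weighted_rate(records, flag_key="readmitted", weight_key="DISCWT"):
    """Return sum(weight * flag) / sum(weight) over the records."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight == 0:
        raise ValueError("total weight is zero; cannot compute a rate")
    weighted_events = sum(r[weight_key] * r[flag_key] for r in records)
    return weighted_events / total_weight


sample = [
    {"DISCWT": 2.0, "readmitted": 1},  # this discharge counts double
    {"DISCWT": 1.0, "readmitted": 0},
    {"DISCWT": 1.0, "readmitted": 0},
]

unweighted = sum(r["readmitted"] for r in sample) / len(sample)
weighted = weighted_rate(sample)

print(f"unweighted rate: {unweighted:.3f}")
print(f"weighted rate:   {weighted:.3f}")
```

The two numbers differ because the readmitted discharge carries twice the weight of the others, which is why the meaning of a column like DISCWT has to be understood before modeling.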
Data Collection
• Data from the product line
• Purchased third-party data
• Social media (Facebook, LinkedIn)
• Web crawling
• Open source data (Opendata, U.S. Census Data)
Challenge: Data Storage, Data Management
[Diagram: data sources (Legacy data, OLTP, Web Log, Web Crawler, Open Source, Third Party Data, Social Media Data, Product Line); formats (XML, CSV, LOG, SQL, …); consumers (Business Intelligence, Data Science, App)]
Data preparation (data wrangling)
• Cleaning data (semantic errors, missing entries, or inconsistent formatting)
• Challenge: data integration
• Commonly estimated to take about 80% of the time in the project workflow
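The three cleaning problems named above (semantic errors, missing entries, inconsistent formatting) can be sketched in a few lines. This is an illustrative example only: the field names, the accepted date formats, and the 0 to 120 age range are assumptions chosen for the demonstration, not rules from the slides.

```python
from datetime import datetime

# Hypothetical raw rows showing the three problems named above.
RAW_ROWS = [
    {"id": "1", "state": " ca ", "admit_date": "2017-02-16", "age": "54"},
    {"id": "2", "state": "CA", "admit_date": "02/17/2017", "age": ""},
    {"id": "3", "state": "Ny", "admit_date": "2017-02-18", "age": "-7"},
]

# Assumed set of date layouts that appear in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Normalize formatting and turn missing or implausible values into None."""
    age_text = row["age"].strip()
    age = int(age_text) if age_text else None      # missing entry
    if age is not None and not 0 <= age <= 120:    # semantic error
        age = None
    return {
        "id": int(row["id"]),
        "state": row["state"].strip().upper(),     # inconsistent formatting
        "admit_date": parse_date(row["admit_date"].strip()),
        "age": age,
    }


cleaned = [clean_row(r) for r in RAW_ROWS]
for row in cleaned:
    print(row)
```

Mapping bad values to None rather than dropping rows keeps the decision about imputation or exclusion for a later step.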
[Diagram: Data Source A, Data Source B, Data Source B → ETL → Data Warehouse]
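The ETL step that feeds the data warehouse can be sketched with an in-memory SQLite database standing in for the warehouse. Everything here is hypothetical: the two sources, their field names, and the table layout are assumptions made to show one way of reconciling differently shaped inputs.

```python
import sqlite3

# Hypothetical extracts from two sources that describe the same patients
# with different field names and value types.
SOURCE_A = [{"patient_id": 1, "state": "CA"}, {"patient_id": 2, "state": "NY"}]
SOURCE_B = [{"pid": "2", "visits": "3"}, {"pid": "3", "visits": "1"}]


def transform_b(row):
    """Map source B's field names and string values onto the shared schema."""
    return {"patient_id": int(row["pid"]), "visits": int(row["visits"])}


def load(conn):
    """Create the target tables and load both (transformed) sources."""
    conn.execute(
        "CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, state TEXT)"
    )
    conn.execute(
        "CREATE TABLE visits (patient_id INTEGER PRIMARY KEY, visits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO patients VALUES (:patient_id, :state)", SOURCE_A
    )
    conn.executemany(
        "INSERT INTO visits VALUES (:patient_id, :visits)",
        [transform_b(r) for r in SOURCE_B],
    )


conn = sqlite3.connect(":memory:")  # stands in for the warehouse
load(conn)

# LEFT JOIN keeps patients from source A that have no match in source B.
integrated = conn.execute(
    """
    SELECT p.patient_id, p.state, v.visits
    FROM patients AS p
    LEFT JOIN visits AS v ON v.patient_id = p.patient_id
    ORDER BY p.patient_id
    """
).fetchall()
print(integrated)
```

The unmatched rows (patient 1 has no visit record, patient 3 has no patient record) are the kind of mismatch that makes integration the hard part of preparation.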
Feature engineering
1. Select or create features
2. Research feature relevance
3. Experiment and validate
4. Change the feature set
5. Go back to the feature selection step
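One pass through the feature loop can be sketched as: score each candidate feature for relevance, keep those above a threshold, then adjust the threshold or the candidates and repeat. The sketch below uses absolute Pearson correlation as the relevance measure, which is a stand-in chosen for simplicity rather than a method named in the slides; the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical candidate features and a binary target.
FEATURES = {
    "prior_visits": [0, 1, 2, 3, 4, 5],
    "length_of_stay": [2, 1, 4, 3, 6, 5],
    "record_batch": [7, 7, 7, 7, 7, 7],
}
TARGET = [0, 0, 0, 1, 1, 1]


def rank_features(features, target):
    """Return (name, |correlation|) pairs, most relevant first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_features(features, target, threshold):
    """Keep the features whose relevance score reaches the threshold."""
    ranked = rank_features(features, target)
    return [name for name, score in ranked if score >= threshold]


ranking = rank_features(FEATURES, TARGET)
selected = select_features(FEATURES, TARGET, threshold=0.5)
print(ranking)
print(selected)
```

Changing `threshold` or adding a derived column to `FEATURES` and re-running corresponds to the "change the feature set" and "go back" steps.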
Modeling
Reference Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/
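The referenced scikit-learn map is a flowchart that narrows the choice of estimator by asking questions about the data. The function below is a rough paraphrase of only its first-level questions, written as plain Python; the function name, parameter names, and return strings are hypothetical, and the linked map remains the authority for the actual branches and estimator recommendations.

```python
def suggest_task_family(*, n_samples, predicting_category=False,
                        has_labels=False, predicting_quantity=False,
                        just_looking=False):
    """Paraphrase of the flowchart's top-level branching (an approximation)."""
    if n_samples <= 50:
        return "get more data"
    if predicting_category:
        # Labeled categories suggest classification; unlabeled suggest clustering.
        return "classification" if has_labels else "clustering"
    if predicting_quantity:
        return "regression"
    if just_looking:
        return "dimensionality reduction"
    return "no suggestion"


# Example: the readmission goal is a labeled yes/no outcome.
readmission_family = suggest_task_family(
    n_samples=10_000, predicting_category=True, has_labels=True
)
print(readmission_family)
```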
Deploy to product line
Data Science Project Lifecycle and Skill Set
Data Science Skill Tree
• Machine Learning
• Data Collection
• Communication & Storytelling
• Data Wrangling
• Product Development & Feedback Analysis
• Data Visualization
• Statistics
• Domain Knowledge & Business Mindset
Required Knowledge
Category | Skillset | Knowledge
Domain Knowledge and Business Mindset | Programming | R, Python, NLP, Java, Distributed System
Domain Knowledge and Business Mindset | Industry | Various Concentrations (Finance, E-Commerce, Geo, Biology, Medicine)
Data Collection & Wrangling | Database | Database Systems and Management
Data Collection & Wrangling | Big Data | Big Data Processing and Analytics
Statistics | Statistics | Modeling, Inference and Optimization
Machine learning | Machine learning | Data Mining and Machine Learning
Data Visualization | Data Visualization | Data Visualization and Exploratory Analytics
Communication and Storytelling | Communication and Storytelling | Professional Speaking and Writing
Program Comparison
Universities compared (8): Northwestern, CMU, Johns Hopkins, Columbia University, Stanford, Berkeley, UW, USC
Number of the 8 programs covering each skillset:
Domain Knowledge & Business Mindset
• Programming: 7
• Industry: 5
Data Collection & Wrangling
• Database: 4
• Big Data: 6
Statistics: 8
Machine learning: 7
Data Visualization: 4
Communication and Storytelling: 3
Thank you!
https://www.DataAppLab.com
Feb 2017
PPT: Xiaolu Zhao @ Feb 16, 2017


Editor's Notes

  • #4: Add health care readmission (Niu Ying)
  • #5: A coefficient computed so that all the data can be compared across nationwide healthcare data
  • #6: Data source + add picture => bring challenge