Before we begin…
• Please mute your phone and turn off your video. There are over eighty people who don’t want to see or hear you chewing.
• If you have any suggestions for future topics that you would like this group to cover, please send them to Scott Shaw using Webex’s chat feature.
• We will send out the presentation deck after the meeting. Look for an announcement in the Meetup link for this meeting.
• If you have questions during the presentation, also send them to Scott Shaw using Webex’s chat feature. We will get to as many questions as we can.
Operationalizing Data Science
Adam Doyle
Daugherty Business Solutions
April 1, 2020
Thanks for coming out everyone!
April Fools!
Begin with the end in mind.
Pause in the middle to make sure that you can get to where you are going.
• What is the business intention that you are trying to achieve?
• Minimize Cost
• Maximize Return
• Minimize Risk
• Realize Opportunity
• Engage Stakeholders
• POC vs. production-ready and valued product
Identify your thesis.
Goal vs. intention
SMART goal
Refine to a question that can be answered with data science
Data science – predict, explain, evaluate
Decision science – combination of data science and data engineering
Acquire data.
Sources: Third-Party Data, Internal, API, Streaming
General – Amount, Access, Quality, Labeled?
Third Party
o Assess Data Quality (Value Range, Adherence, Representative)
o Data Format (Automatic vs. hand-generated; similar data from different partners can be vastly different)
o Governed (Appropriate use – avoid reidentification, TTL, Contractuals, Track access, renewals)
Internal
API (Data size limits, unreliability, costs)
Streaming (CDC, Device Data, Standardized?)
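The data quality checks named above (value range, adherence to an expected format) can be sketched in a few lines. This is a minimal illustration, not the presenter's method: the `rules` layout, the field names, and the sample records are all assumptions made up for the example.

```python
def assess_quality(records, rules):
    """Count missing, wrong-type, and out-of-range values per field.

    `rules` is a hypothetical layout: field name -> (expected_type, low, high).
    Use None for low and high when no range check applies.
    """
    report = {
        name: {"missing": 0, "wrong_type": 0, "out_of_range": 0}
        for name in rules
    }
    for record in records:
        for name, (expected_type, low, high) in rules.items():
            value = record.get(name)
            if value is None:
                report[name]["missing"] += 1
            elif not isinstance(value, expected_type):
                report[name]["wrong_type"] += 1
            elif low is not None and not (low <= value <= high):
                report[name]["out_of_range"] += 1
    return report


# Hypothetical third-party feed with one problem of each kind in "age".
rules = {"age": (int, 0, 120), "state": (str, None, None)}
records = [
    {"age": 34, "state": "MO"},
    {"age": 212, "state": "IL"},   # outside the expected value range
    {"age": "41", "state": "MO"},  # hand-generated: number sent as text
    {"state": "KS"},               # field missing entirely
]
report = assess_quality(records, rules)
print(report)
```

A report like this is a reasonable thing to run per partner, since similar data from different partners can differ widely.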
Explore the data.
Data Exploration
Statistical
Relationships and Correlations
Profiling
Textual – Word, Stop Words, Bigram, Trigram
Clustering
Check in with SME
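The textual exploration step (words, stop words, bigrams, trigrams) might look like the sketch below. The stop-word list and the sample documents are assumptions chosen for illustration; a real project would likely use a fuller list.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "it"}


def tokenize(text):
    """Lowercase the text, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def ngrams(tokens, n):
    """Return consecutive n-token tuples (n=1 words, n=2 bigrams, n=3 trigrams)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def profile_text(docs, n):
    """Count n-grams across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return counts


docs = ["The model drift is the risk", "Model drift and data drift"]
words = profile_text(docs, 1)
bigrams = profile_text(docs, 2)
trigrams = profile_text(docs, 3)
print(bigrams.most_common(3))
```

Note that stop words are removed before the n-grams are built here, so a bigram can join two words that were not adjacent in the original text. Whether that is desirable depends on the question being asked.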
Every block of stone has a statue inside it, and it is the task of the sculptor to discover it.
Cleanse data.
Data profiling
Deduplication
Outliers
Filter
Imputation
Source Corrections
Data shaping
Sort
Project
Enrichment
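Three of the cleansing steps listed above (deduplication, outlier filtering, imputation) can be sketched with the standard library alone. The field names, the allowed range, and the choice of median imputation are assumptions for the example, not recommendations from the talk.

```python
from statistics import median


def deduplicate(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def filter_outliers(rows, field, low, high):
    """Drop rows whose value is present but outside [low, high]."""
    return [r for r in rows if r[field] is None or low <= r[field] <= high]


def impute_median(rows, field):
    """Fill missing values with the median of the observed values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


# Hypothetical rows: one duplicate, one missing value, one outlier.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 9999.0},
    {"id": 4, "amount": 14.0},
]
clean = impute_median(filter_outliers(deduplicate(rows, "id"), "amount", 0, 1000), "amount")
print(clean)
```

The order matters: filtering outliers before imputing keeps the extreme value from skewing the fill value.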
Create the model and features.
Type of Models (Supervised, Unsupervised, Reinforcement Learning, Neural Networks)
Feature Engineering (Transformations and Aggregations)
Encode Indicator Variables
Binning/Bucketing
Sparse Classes
Interaction Features
Extract Elements (e.g., Time)
Normalization
Feature Selection
Testing your features
Testing your model
Check in with SME
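Several of the feature engineering items above (indicator variables, binning, extracting time elements, normalization) are simple transformations. The sketch below shows one possible form of each; the function names, bin edges, and labels are hypothetical, and min-max scaling is only one of several normalization choices.

```python
from datetime import datetime


def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator variables."""
    return {f"is_{c}": int(value == c) for c in categories}


def bin_value(value, edges, labels):
    """Bucket a number; labels[i] covers values below edges[i], the last label the rest."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def time_elements(ts):
    """Extract elements of a timestamp that a model can use directly."""
    return {"hour": ts.hour, "weekday": ts.weekday(), "month": ts.month}


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


indicators = one_hot("MO", ["MO", "IL", "KS"])
age_bucket = bin_value(40, edges=[18, 65], labels=["minor", "adult", "senior"])
when = time_elements(datetime(2020, 4, 1, 18, 30))
scaled = min_max([10, 20, 30])
print(indicators, age_bucket, when, scaled)
```

Keeping each transformation as a small pure function makes it easier to test features on their own and to reuse the same encoding at training time and at prediction time.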
Check in with Business
Does what you’ve created address the concerns of the business?
Batch vs. Real-time?
Batch vs. real-time for
- Training
- Evaluation
Evaluate the model.
Accuracy
Precision
Recall
MSE
Alignment to Business
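The four metrics named on this slide have standard definitions, shown below as a minimal sketch. The labels and predictions are invented sample data; returning 0.0 when a denominator is zero is a convention assumed here, not a universal rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Assumed convention: report 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
        "precision": safe_ratio(tp, tp + fp),  # of predicted positives, how many were right
        "recall": safe_ratio(tp, tp + fn),     # of actual positives, how many were found
    }


def mean_squared_error(y_true, y_pred):
    """Average squared difference, used for regression models."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


metrics = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(metrics, mse)
```

Which metric matters most is a business question: precision is the natural focus when false alarms are costly, recall when misses are costly.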
Deploy the model.
Automation
Scaling
SLAs
Versioning
Data Pipelines
Ongoing Data Acquisition
Ongoing Data Cleaning
Ongoing Feature Encoding
Integration in application
Monitor the model.
Drift
Degrading the model
Predictions and their effects
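One common way to put a number on drift is the Population Stability Index (PSI), which compares how a feature is distributed at training time against how it is distributed in production. The talk does not name a specific technique, so PSI is swapped in here purely as an illustration; the bin edges, the sample values, and the small `eps` floor are assumptions.

```python
import math


def bin_fractions(values, edges):
    """Fraction of values in each bin defined by ascending `edges` (assumes values is non-empty)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v >= e for e in edges)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected).

    `eps` floors empty bins so the logarithm stays defined.
    """
    total = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical feature values at training time and in production.
baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
live = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
edges = [4, 7]

psi_same = population_stability_index(baseline, baseline, edges)
psi_shifted = population_stability_index(baseline, live, edges)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero and the value grows as the distributions diverge. The level that should trigger retraining is a judgment call for each model.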
Optimize the model.
Feature Optimization
Retraining
Remodeling
Conclusion
What does it mean to be done?
Explanation as a Result
Questions?
Links
• https://www.dataengineeringpodcast.com/
• https://dataengweekly.com/
• https://www.logicalclocks.com/blog/feature-store-the-missing-data-layer-in-ml-pipelines
• https://www.imperva.com/blog/deployment-isnt-the-final-step-monitoring-machine-learning-models-in-production/

Operationalizing Data Science St. Louis Big Data IDEA


Editor's Notes

  • #3: Welcome. Introduction.
  • #6: Welcome. Introduction.
  • #14: Batch training vs. real-time training; batch evaluation vs. real-time evaluation
  • #15: Truth matrix; mean square error; evaluation time