MACHINE LEARNING MODELS: FROM RESEARCH TO PRODUCTION
Tristan Zajonc | Head of Machine Learning Engineering
2 © Cloudera, Inc. All rights reserved.
WHY ARE MACHINE LEARNING MODELS IMPORTANT?
PROTECT business
CONNECT products & services (IoT)
DRIVE customer insights
The world is filled with prediction problems that require a new approach to software
● Predictive maintenance
● Logistics optimization
● Self-driving cars
● Medical diagnostics
● Marketing effectiveness
● Next best action
● Insider threat prevention
● Fraud prevention
● Payment integrity
SOFTWARE 1.0
Traditional development relies on hand-coded programs that map inputs to outputs
INPUT x → FUNCTION f(x) → OUTPUT y
This approach is intractable or performs poorly for many problems that are easy for humans.
SOFTWARE 2.0 - MACHINE LEARNING
INPUT x → FUNCTION f(θ,x) (machine learning model) → OUTPUT y
Machine learning searches for programs that map inputs to outputs effectively
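The contrast between the two slides can be sketched in a few lines of Python. This is only an illustration: the linear rule, the function names, and the gradient-descent settings are assumptions chosen for brevity, not material from the talk.

```python
# Software 1.0: a person writes the mapping f(x) by hand.
def f_handcoded(x):
    return 2.0 * x + 1.0


# Software 2.0: only the form f(theta, x) is fixed; theta is found by search.
def f_learned(theta, x):
    w, b = theta
    return w * x + b


def fit(examples, steps=5000, lr=0.01):
    """Search for theta by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Input/output examples stand in for the rule nobody wants to hand-code.
examples = [(x, f_handcoded(x)) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
theta = fit(examples)
print(theta, f_learned(theta, 10.0))
```

Here the search recovers parameters close to the hand-written rule. In practice the value of the approach comes from tasks where no such rule can be written down.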
THE POWER OF MACHINE LEARNING
PIXELS → FUNCTION → TEXT
Machine learning enables software to address entirely new applications
Source: https://cs.stanford.edu/people/karpathy/deepimagesent/
MACHINE LEARNING AT FACEBOOK, UBER, AND GOOGLE
Efficient, reliable, large-scale machine learning requires new supporting tools
Facebook: FBLearner
Uber: Michelangelo
Google: TFX
ACCELERATING THREE STAGES OF MACHINE LEARNING
DEVELOP
• Explore data
• Develop models
• Share results
TRAIN
• Optimize parameters
• Track experiments
• Compare performance
DEPLOY
• Manage models
• Deploy models
• Monitor performance
Machine learning platforms can accelerate model development, training, and deployment
CLOUDERA DATA SCIENCE WORKBENCH
Enables self-service data science at scale in secure environments
For data scientists
• Open data science
Use R, Python, or Scala with your favorite libraries and on-demand compute
• No need to sample
Directly access secure data via Spark, Impala, or HDFS
• Reproducible, collaborative research
Share with your whole team
For IT professionals
• Bring analysis to the data
Give the data science team the freedom to work how they want, when they want
• Secure by default
Stay compliant with out-of-the-box Hadoop security
• Flexible deployment
On-premises or in the cloud
NEW IN CLOUDERA DATA SCIENCE WORKBENCH 1.4
Accelerate machine learning from research to production
DEVELOP MODELS
• Explore data securely and develop models as a team
TRAIN MODELS (NEW!)
• Train, track, and compare reproducible experiments
DEPLOY MODELS (NEW!)
• Deploy and monitor models as APIs to serve predictions
RUN AND TRACK EXPERIMENTS
CHALLENGE: REPRODUCIBLE RESEARCH
How do you know which model is better? How can you repeat a result?
• Model development is iterative
  • Try different data, features, libraries, algorithms, hyperparameters, etc.
• Reproducing a model means you need (at least)...
  • Training data
  • Data/feature pipeline code
  • Model training code + dependencies
  • Runtime environment (CPU, GPU, memory, …)
  • Any results or performance metrics
• This is a lot to keep track of!
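One way to picture the checklist above is as a small manifest recorded with every training run. The sketch below is an assumption-laden illustration, not a Cloudera format: the field names, the dependency pins, and the metric value are all made up.

```python
import hashlib
import json
import platform
import sys


def sha256_of(data: bytes) -> str:
    """Fingerprint an input so a later run can prove it used the same bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(training_data, pipeline_code, training_code, dependencies, metrics):
    """Collect everything the slide lists as needed to reproduce a model."""
    return {
        "training_data_sha256": sha256_of(training_data),
        "pipeline_code_sha256": sha256_of(pipeline_code.encode("utf-8")),
        "training_code_sha256": sha256_of(training_code.encode("utf-8")),
        "dependencies": sorted(dependencies),
        "runtime": {
            "python": sys.version.split()[0],
            "system": platform.system(),
            "machine": platform.machine(),
        },
        "metrics": metrics,
    }


manifest = build_manifest(
    training_data=b"x,y\n1,3\n2,5\n",
    pipeline_code="def features(row): return [row['x']]",
    training_code="model.fit(X, y)",
    dependencies=["scikit-learn==0.19.1", "pandas==0.23.0"],  # hypothetical pins
    metrics={"rmse": 0.12},  # made-up value for illustration
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Two runs with identical manifests used the same data, code, and dependencies, so any difference in results has to come from somewhere else, such as random seeds or hardware.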
HOW THIS WORKS TODAY
• Source control?
  • Unless you forget
  • And maybe the library changes
• Or this…
  • mymodel.py
  • mymodel.final.py
  • mymodel.final.final2.py
  • mymodel.final.final2.noreallyfinal.py
• Post-it notes or notebook
  • To keep track of performance metrics
WHY THIS IS A PROBLEM
• Wasted time and effort keeping track of model/environment changes
• Wasted time and effort trying to recreate a result, especially for junior data scientists / new team members
• Compliance risk due to inability to explain the modeling process
INTRODUCING EXPERIMENTS
Versioned model training runs for evaluation and reproducibility
Data scientists can now...
• Create a snapshot of model code, dependencies, and configuration necessary to train the model
• Build and execute the training run in an isolated container
• Track specified model metrics, performance, and model artifacts
• Inspect, compare, or deploy prior models
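The idea of tracked, comparable runs can be sketched without any platform at all. In the example below, the `Run` class, its `track_metric` method, and the toy training function are hypothetical stand-ins written for this illustration; they are not the Cloudera Data Science Workbench API.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One versioned training run: its configuration plus tracked metrics."""
    run_id: int
    params: dict
    metrics: dict = field(default_factory=dict)

    def track_metric(self, name: str, value: float) -> None:
        self.metrics[name] = value


def train_and_evaluate(slope: float) -> float:
    """Stand-in for real training: mean absolute error of y = slope * x on toy data."""
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    return sum(abs(slope * x - y) for x, y in data) / len(data)


# Each hyperparameter setting becomes its own recorded run.
runs = []
for run_id, slope in enumerate([1.0, 1.5, 2.0, 2.5], start=1):
    run = Run(run_id=run_id, params={"slope": slope})
    run.track_metric("mae", train_and_evaluate(slope))
    runs.append(run)

# Comparing runs is then a lookup rather than a hunt through notes.
best = min(runs, key=lambda r: r.metrics["mae"])
print(best.run_id, best.params, best.metrics)
```

Because every run keeps its parameters next to its metrics, picking a model to inspect or deploy reduces to selecting a run by id.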
DEMO
DEPLOY AND MANAGE MODELS
CHALLENGE: GETTING TO PRODUCTION
So you’ve got a trained model. Now what?
• Data scientists want to rapidly expose candidate models to serve predictions
  • REST APIs are the most requested approach
• Development and production are very different
  • Owners: Data Scientists vs. Data Engineers
  • Languages: Python/R vs. Java/Scala/C++
  • Policy Controls: Approved code, packages, etc.
  • Vocabulary: Data Science vs. DevOps
• Data scientists often lack the skills (or entitlements) to deploy models
HOW THIS WORKS TODAY
• Models get handed off to an engineering team for re-coding
  • Can you reproduce the model?
  • Can you prove it’s the same?
• Or data science teams try to hack it
  • Requires an entirely new set of tools (e.g. Flask, Docker, Kubernetes, …)
  • Getting an API up is different from maintaining it (e.g. managing, monitoring, testing, updating, scaling, …)
WHY THIS IS A PROBLEM
• Can take days to months, raising the bar for what’s worth deploying, leading to stale models and reduced innovation
• Many opportunities to introduce errors and compliance risk
• Workarounds are difficult to maintain while increasing the skills gap
INTRODUCING MODELS
Machine learning models as one-click microservices (REST APIs)
Model APIs made easy!
1. Choose Python/R file, e.g. score.py
2. Choose function, e.g. forecast
import pickle

# Load the trained model once, when the service starts
with open('model.pk', 'rb') as f:
    model = pickle.load(f)

def forecast(data):
    return model.predict(data)
3. Choose resources
4. Deploy!
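To make the "function as REST API" idea concrete, the sketch below shows roughly what a serving layer has to do around a scoring function: decode a JSON request, call the function, and encode a JSON reply. Everything here is an assumption for illustration: `MeanModel` replaces the unpickled model so the example runs on its own, and the `rows`, `success`, and `response` field names are invented rather than taken from the product.

```python
import json


class MeanModel:
    """Stand-in for the unpickled model: predicts the mean of each input row."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


model = MeanModel()


def forecast(data):
    # Same shape as the slide's function: take request data, return predictions.
    return model.predict(data["rows"])


def handle_request(body: str) -> str:
    """Decode a JSON request, call the scoring function, encode a JSON response."""
    try:
        args = json.loads(body)
        return json.dumps({"success": True, "response": forecast(args)})
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
        # Malformed input becomes an error reply instead of a crashed service.
        return json.dumps({"success": False, "error": str(exc)})


reply = handle_request('{"rows": [[1, 2, 3], [10, 20]]}')
print(reply)
```

The data scientist only writes `forecast`; the wrapping, routing, and error handling are the parts a deployment platform is expected to supply.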
DEMO
HOW IT WORKS
Experiments and Models leverage a new way of building images from source
• Provides a declarative pathway from version control to experiment or model
• When running an experiment or deploying a model:
SOURCE
• Stage 1: Git snapshot of source, respecting .gitignore (be sure to ignore any local Python/R environment)
IMAGE
• Stage 2: Docker build from source; cdsw-build.sh defines build steps, e.g.:
  #!/bin/bash
  pip3 install -r requirements.txt
RUN
• Stage 3: Run versioned image as Experiment (batch) or Model (online) in Kubernetes
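The reason Stage 1 asks you to ignore local environments can be shown with a toy snapshot. The sketch below is a simplification under stated assumptions: files are held in a dictionary, ignore rules are matched with `fnmatch` (real .gitignore matching has more rules), and the `snapshot` and `version_id` helpers are hypothetical names, not part of Git or Cloudera Data Science Workbench.

```python
import fnmatch
import hashlib


def snapshot(files: dict, ignore_patterns: list) -> dict:
    """Keep only the files that no ignore pattern matches."""
    return {
        path: content
        for path, content in files.items()
        if not any(fnmatch.fnmatch(path, pattern) for pattern in ignore_patterns)
    }


def version_id(snap: dict) -> str:
    """Derive a short, deterministic id from the snapshot's paths and contents."""
    digest = hashlib.sha256()
    for path in sorted(snap):
        digest.update(path.encode("utf-8") + b"\0" + snap[path] + b"\0")
    return digest.hexdigest()[:12]


files = {
    "score.py": b"def forecast(data): return model.predict(data)",
    "cdsw-build.sh": b"#!/bin/bash\npip3 install -r requirements.txt\n",
    "requirements.txt": b"scikit-learn\n",
    "venv/lib/site.py": b"local environment file",
}
ignore = ["venv/*", "*.pyc"]

snap = snapshot(files, ignore)
v1 = version_id(snap)
print(sorted(snap), v1)
```

With the local environment excluded, the version depends only on source files and the declared build steps, so the image is rebuilt from requirements.txt rather than from whatever happened to be installed on one laptop.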
AS ALWAYS, USE THE RIGHT TOOL FOR THE JOB
Not all machine learning applications have the same requirements
• Not all models need online scoring; batch scoring is simple and reliable
• Latency-sensitive models are often better deployed to the edge
  • mobile applications
  • autonomous vehicles
  • high-frequency trading
• Self-service deployment is not always appropriate
  • Facebook scale (200 trillion predictions per day)
  • Prediction services with strict SLAs
But self-service model deployment enables rapid delivery for many ML use cases and should be an available tool in every agile enterprise.
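As a point of comparison with online scoring, batch scoring can be as small as a loop over a file. The sketch below is illustrative only: the `score` rule, the column names, and the in-memory CSV are assumptions standing in for a real model and a real data source.

```python
import csv
import io


def score(row: dict) -> float:
    """Stand-in model: a fixed linear rule over two hypothetical features."""
    return 0.5 * float(row["amount"]) + 2.0 * float(row["num_items"])


def batch_score(input_csv: str) -> str:
    """Read every row, append a prediction column, and return the scored CSV."""
    reader = csv.DictReader(io.StringIO(input_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + ["prediction"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["prediction"] = score(row)
        writer.writerow(row)
    return out.getvalue()


scored = batch_score("id,amount,num_items\n1,10.0,2\n2,4.0,1\n")
print(scored)
```

There is no service to keep running and no latency budget to meet, which is why a scheduled job like this is often the simpler choice when predictions are not needed on demand.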
BENEFITS OF A UNIFIED MACHINE LEARNING PLATFORM
RESEARCH EXPERIENCE
✓ Faster iteration, with confidence
✓ Easier to identify best models
✓ Easier to reproduce/explain work
✓ Easier to onboard team members
DEPLOYMENT EXPERIENCE
✓ Faster business impact
✓ Easier to deploy more models
✓ Easier to reproduce/explain work
✓ Easier to access CDH data/compute
OPERATIONAL EXPERIENCE
✓ Lower cost and risk for managing models
✓ Easier to support through self-service
✓ Easier to scale a shared environment
✓ Easier to control access
THANK YOU

Machine Learning Models: From Research to Production 6.13.18
