Deploying scikit-learn Models in Production
Rajat Arya (@rajatarya)
Product Manager, Dato Inc.
Dato provides a platform for building intelligent apps
Data Engineering
• Fast & scalable
• Rich data type support
• Visualization
Data Intelligence
• App-oriented ML
• Supporting utils
• Extensibility
Deployment
• Batch & always-on
• RESTful interface
• Elastic & robust
Build, deploy, & manage your intelligent apps with Dato.
How Everyone Starts with ML
[Diagram: DATA → ML Algorithm]
• Running experiments
• Plots are the results
• Not clear how to get this deployed
Deployment?
[Diagram: DATA → ML Algorithm]
• Write a spec for another team to implement in a ‘production’ language
• Translating the code takes 6-12 months
• A stale or irrelevant model ends up implemented
• Two teams maintain two systems
[Diagram labels: Data Scientist; Custom Model; Data Engineers, Data Architects, DevOps, App Developers; App; API]
Current Challenges
• Machine learning models are opaque objects
• Export formats like PMML don’t support many models
• Focus is on training, not prediction
Starting from the Beginning
GOAL: Serve live production traffic directly from the trained machine learning model
What would the requirements be if we wanted to build a similar architecture for ML models?
One: Easy to Integrate
• REST APIs for both querying
and management
• Have client libraries in other
languages (no Python lock-in)
[Diagram: App → API]
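As a rough sketch of what querying a deployed model over REST could look like from a client, the snippet below builds a JSON POST request with Python's standard library. The endpoint URL, the `api_key` field, and the payload layout are hypothetical placeholders chosen for illustration, not the Dato Predictive Services API. Any language with an HTTP client could build the same request, which is the point of avoiding Python lock-in.

```python
import json
import urllib.request


def build_query_request(base_url, model_name, features, api_key):
    """Build (but do not send) a JSON POST request for one prediction.

    The URL scheme and payload fields are assumptions for illustration.
    """
    payload = {"api_key": api_key, "data": {"features": features}}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query/{model_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "http://localhost:9005", "iris_classifier", [5.1, 3.5, 1.4, 0.2], "secret"
)
# Sending it is one more call: urllib.request.urlopen(request)
```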
Two: High Performance
• Use a load balancer to distribute request load
• Integrated distributed cache so repeated queries are computed only once
[Diagram: App → LB → three API + CACHE nodes; Engine]
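The caching idea can be sketched in a few lines. This is a minimal in-process illustration, assuming a plain dictionary in place of the distributed cache the slide describes. `CachedPredictor` and `slow_model` are hypothetical names, and `slow_model` stands in for a real model's predict call.

```python
import json


class CachedPredictor:
    """Wrap a predict function so identical queries are computed once."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._cache = {}
        self.misses = 0  # how many times the model was actually called

    def predict(self, features):
        # Canonical JSON gives a stable cache key for equal inputs.
        key = json.dumps(features, sort_keys=True)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._predict_fn(features)
        return self._cache[key]


def slow_model(features):
    # Stand-in for an expensive model call (hypothetical).
    return sum(features["x"]) > 1.0


predictor = CachedPredictor(slow_model)
first = predictor.predict({"x": [0.4, 0.9]})
second = predictor.predict({"x": [0.4, 0.9]})  # served from the cache
```

A production cache would also need eviction and invalidation when a new model version is deployed, which this sketch leaves out.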
Three: Fault Tolerant
• Model running on many
machines
• System operational during
node failure
[Diagram: App → LB → three nodes, each with API, CACHE, Engine, and GLC Model]
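One way to picture staying operational during node failure is a balancer that rotates through nodes and skips any that fail. The sketch below is a simplified illustration, not how Dato's load balancer works. `FailoverBalancer`, `NodeDown`, and the node functions are hypothetical, and each "node" is just a local callable.

```python
import itertools


class NodeDown(Exception):
    """Raised by a node that cannot serve the request."""


class FailoverBalancer:
    """Round-robin over nodes, skipping any that fail."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._cycle = itertools.cycle(range(len(self._nodes)))

    def query(self, features):
        # Try each node at most once per request.
        for _ in range(len(self._nodes)):
            node = self._nodes[next(self._cycle)]
            try:
                return node(features)
            except NodeDown:
                continue
        raise NodeDown("all nodes failed")


def healthy_node(features):
    return {"prediction": sum(features)}


def failed_node(features):
    raise NodeDown("node unreachable")


balancer = FailoverBalancer([failed_node, healthy_node, healthy_node])
results = [balancer.query([1, 2]) for _ in range(4)]
```

Every request still gets an answer because the same model runs on the remaining healthy nodes.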
Four: Scalable
• Elastically scale cluster nodes up and down
• Easy to configure; the cache updates automatically with cluster changes
[Diagram: App → LB → two nodes with API, CACHE, Engine, and GLC Model, plus two added nodes with API, CACHE, and Engine]
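The slides do not say how the cache adapts to cluster changes. One common technique for this kind of problem is consistent hashing, sketched below as an assumption rather than a description of the actual system. `HashRing` and the node names are hypothetical. The property shown is that adding a node only moves cache keys onto the new node and leaves the rest where they were.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Map cache keys to nodes on a consistent-hash ring."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node owns several points on the ring to even out load.
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b"])
keys = [f"query-{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add_node("node-c")  # scale up by one node
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

Keys that did not move keep their cached results, so scaling the cluster does not empty the whole cache.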
Five: Maintainable
• Zero downtime during model
deployment
• Metrics & logs
• Model management
[Diagram: App → LB → three nodes, each with API, CACHE, Engine, and GLC Model]
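Zero-downtime deployment can be illustrated by swapping the live model behind a stable interface. The following is a minimal single-process sketch under that assumption. `ModelRegistry` is a hypothetical name, the lambdas stand in for trained models, and the request counter is only a token stand-in for real metrics.

```python
import threading


class ModelRegistry:
    """Hold the live model and swap in new versions atomically."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._current = (version, model)
        self.request_count = 0  # a very small stand-in for metrics

    def deploy(self, model, version):
        # Publish the new (version, model) pair in one assignment.
        with self._lock:
            self._current = (version, model)

    def predict(self, features):
        with self._lock:
            version, model = self._current
            self.request_count += 1
        # The model call runs outside the lock, so a deploy never
        # waits on, or interrupts, an in-flight prediction.
        return {"version": version, "prediction": model(features)}


registry = ModelRegistry(lambda x: x * 2, version=1)
old = registry.predict(10)
registry.deploy(lambda x: x * 3, version=2)
new = registry.predict(10)
```

Returning the version with each prediction makes it possible to tell from logs which model answered a given request.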
Six: Extensible
• Arbitrary Python
• Use any set of Python
packages
• Model ensembling
[Diagram: App → LB → three nodes, each with API, CACHE, GLC Model, and Python]
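Model ensembling with arbitrary Python could be as simple as a function that combines several models' outputs. The sketch below assumes each model is a callable returning a numeric score. `make_ensemble`, `linear_model`, and `threshold_model` are hypothetical stand-ins; with scikit-learn, the callables would wrap fitted estimators' `predict` methods.

```python
def make_ensemble(models, weights=None):
    """Return a predict function that takes a weighted average of models."""
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)

    def predict(features):
        scores = [w * m(features) for m, w in zip(models, weights)]
        return sum(scores) / total

    return predict


def linear_model(features):
    # Stand-in for one trained model (hypothetical coefficients).
    return 0.5 * features[0] + 0.25 * features[1]


def threshold_model(features):
    # Stand-in for a second trained model.
    return 1.0 if features[0] > 1.0 else 0.0


ensemble = make_ensemble([linear_model, threshold_model])
score = ensemble([2.0, 4.0])

weighted = make_ensemble([linear_model, threshold_model], weights=[3.0, 1.0])
weighted_score = weighted([2.0, 4.0])
```

Because the ensemble is itself just a callable, it can be deployed the same way as a single model.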
Requirements Recap
1. Easy to Integrate
2. High Performance
3. Fault Tolerant
4. Scalable
5. Maintainable
6. Extensible
[Diagram: App → LB → three nodes, each with API, CACHE, GLC Model, and Python]
Do-It-Yourself
• Web service layer:
- Tornado, Flask, Keen, Django, etc.
• Caching layer:
- Redis, Cassandra, Memcached, DynamoDB, Berkeley DB, MySQL, etc.
• Logs:
- Logback, Logstash, Splunk, Loggly
• Metrics:
- AWS CloudWatch, Mixpanel, Librato, etc.
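To give a feel for the do-it-yourself web service layer, here is a minimal sketch of a prediction endpoint written as a plain WSGI app using only the standard library, rather than any of the frameworks listed. The `model` function, the JSON payload shape, and the `/predict` path are assumptions for illustration. Caching, metrics, and log shipping would all still have to be added around it.

```python
import io
import json
import logging

log = logging.getLogger("model_service")  # handlers are left to the deployer


def model(features):
    # Stand-in for a trained estimator's predict call (hypothetical).
    return sum(features)


def app(environ, start_response):
    """WSGI app: POST a JSON body {"features": [...]} to get a prediction."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        result = {"prediction": model(payload["features"])}
        status = "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        result = {"error": str(exc)}
        status = "400 Bad Request"
    log.info("%s %s", environ.get("PATH_INFO"), status)
    body = json.dumps(result).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(raw_body):
    """Exercise the app in-process, without opening a socket."""
    captured = {}
    environ = {"REQUEST_METHOD": "POST", "PATH_INFO": "/predict",
               "CONTENT_LENGTH": str(len(raw_body)),
               "wsgi.input": io.BytesIO(raw_body)}
    chunks = app(environ, lambda status, headers: captured.update(status=status))
    return captured["status"], json.loads(b"".join(chunks))


ok_status, ok_body = call(b'{"features": [1, 2, 3]}')
bad_status, _ = call(b"not json")
# To serve locally:
# wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()
```

Even this small service covers only the first item on the list, which hints at how much work the full do-it-yourself stack involves.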
… or use Dato Predictive Services
We set out with this goal, and used these requirements
… and now I'd like to show it to you.
DEMO: Deploying a scikit-learn model using
Dato Predictive Services
Models as Services
• Deploy models as low-latency REST services
• Elastically scale up or out with one command
• Monitoring & Model Management
• Deploy existing Python models
• Run on AWS EC2 or Hadoop YARN
[Diagram labels: Dato Predictive Services; Predictive Engine; REST Client Direct; Model Mgmt]

Editor's Notes

  • #4: I got started with ML by taking a class: feed data to an ML algorithm, then generate a plot. Of course this isn’t how actual applications are written, but it is often where customers start when they approach taking ML to production.