Bucharest Big Data Meetup
Tech talks, use cases, all big data related topics
June 5th meetup
6:30 PM - 7:00 PM getting together
7:00 - 7:40 Productionizing Machine Learning,
Cosmin Pintoiu & Costina Batica @ Lentiq
7:40 - 8:15 Technology showdown: database vs blockchain,
Felix Crisan @ Blockchain Romania
8:15 - 8:45 Pizza and drinks sponsored by Netopia.
Sponsored by
Organizer
Valentina Crisan
Productionizing
Machine Learning
Cosmin Pintoiu – Solution Architect
cosmin.pintoiu@lentiq.com
Costina Batica – Software Developer
Agenda
1. Machine Learning made easy
2. From prototype to production (motivation for workflow and RCB)
3. Reusable code blocks (implementation)
4. Workflow Manager
5. Model Server
6. Demo time
7. Roadmap
8. Conclusions and Q&A
https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a
Machine Learning
made easy
✓ Gathering data
✓ Preparing that data
✓ Choosing a model
✓ Training
✓ Evaluation
✓ Hyperparameter tuning
✓ Prediction
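A minimal sketch of these seven steps with scikit-learn. The dataset, model choice and parameter grid below are illustrative stand-ins, not examples from the talk:

```python
# Hypothetical end-to-end walk through the seven steps with scikit-learn.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# 1. Gathering data
X, y = fetch_california_housing(return_X_y=True)

# 2. Preparing that data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Choosing a model: a scaled ridge regression, purely for illustration
pipeline = Pipeline([("scaler", StandardScaler()), ("model", Ridge())])

# 4 + 6. Training with hyperparameter tuning (grid search over the regularization strength)
search = GridSearchCV(pipeline, {"model__alpha": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# 5. Evaluation on held-out data
print("R2:", r2_score(y_test, search.predict(X_test)))

# 7. Prediction for one unseen row
print(search.predict(X_test[:1]))
```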
The truth is
✓ Productionizing ML is hard.
✓ Only about 5% of machine learning models / data science projects will ever be used in a production environment.
✓ Most of the models developed will be “deployed” on PowerPoint slides.
✓ If we want to be among that 5%, we have to understand the problem of ML deployment and how to solve it.
Challenges of making the DS process successful
Small to medium companies
✓ Lack of resources
✓ Lack of skills and knowledge
✓ Difficult to set up their own environment
✓ Need additional developers for moving models into production
✓ Need DevOps for maintaining the environment
✓ Struggle to define the most valuable business problem
Enterprises
✓ Over-centralized -> smaller and localized teams cannot be agile
✓ Lack of collaboration
✓ Difficult to integrate new technologies in the enterprise stack
✓ Lack of visibility
✓ Difficult to scale and put models in production
✓ Centralized data ownership
A complete cycle
✓ Gathering data
✓ Preparing that data
✓ Choosing a model
✓ Training
✓ Evaluation
✓ Hyperparameter tuning
✓ Prediction
We now have a model. What’s next?!
✓ Serialize
✓ Deploy
✓ Serving / Scoring
✓ Scaling
✓ Update
✓ Monitor
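A hedged sketch of the first three of these follow-up steps (serialize, deploy, serve) using joblib and FastAPI; the endpoint name and input schema are invented for illustration and are not Lentiq's API:

```python
# serve.py: load a previously serialized estimator and expose it over REST.
# Assumes a fitted scikit-learn estimator was saved with joblib.dump(model, "model.joblib").
from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")          # deserialize once, at startup

class Features(BaseModel):
    values: List[float]                      # one row of numeric features (made-up schema)

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])[0]
    return {"prediction": float(prediction)}

# Run with: uvicorn serve:app --host 0.0.0.0 --port 8000
```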
From prototype to production
[Figure: continuous features pass through a Vector Assembler and a Scaler to form a scaled continuous feature vector; categorical features pass through a String Indexer and a one-hot encoder to form a categorical feature vector; a final Vector Assembler combines both into the feature vector fed to a Linear Regression model. The trained model is written to object storage and served by multiple Model Servers behind a Load Balancer that exposes an Inference API to incoming requests.]
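The pictured feature pipeline maps onto Spark ML almost one to one; a sketch assuming the Spark 3.x API, with placeholder column names:

```python
# Rough PySpark equivalent of the pictured pipeline; column names are placeholders.
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler, StandardScaler
from pyspark.ml.regression import LinearRegression

continuous_cols = ["age", "income"]           # assumed continuous inputs
categorical_cols = ["country", "device"]      # assumed categorical inputs

# Continuous features -> VectorAssembler -> Scaler -> scaled continuous feature vector
cont_assembler = VectorAssembler(inputCols=continuous_cols, outputCol="cont_vec")
scaler = StandardScaler(inputCol="cont_vec", outputCol="scaled_cont_vec")

# Categorical features -> StringIndexer -> one-hot encoder -> categorical feature vector
indexers = [StringIndexer(inputCol=c, outputCol=c + "_idx") for c in categorical_cols]
encoder = OneHotEncoder(inputCols=[c + "_idx" for c in categorical_cols],
                        outputCols=[c + "_ohe" for c in categorical_cols])

# Both branches -> final VectorAssembler -> feature vector -> Linear Regression
final_assembler = VectorAssembler(
    inputCols=["scaled_cont_vec"] + [c + "_ohe" for c in categorical_cols],
    outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=indexers + [cont_assembler, scaler, encoder, final_assembler, lr])
# model = pipeline.fit(training_df)   # training_df: a DataFrame with the columns above
```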
From prototype to production
[Figure: the same pipeline, annotated with the two platform components discussed next (RCB & Workflow Manager and the Model Server), and with an end-user application sending a request to the Inference API and receiving a prediction.]
Reusable code blocks
What is a Reusable Code Block?
You can think of a Reusable Code Block as a template for creating tasks. Users typically package frequent tasks such as cleaning data, anonymizing data, training models, etc. Reusable Code Blocks are shared with the entire data lake and are stored in Lentiq's global registry, meaning that any user with access to it can reuse the code block to perform similar tasks.
There are two possible sources for an RCB:
✓ Custom Docker image: the image needs to be uploaded to a public repository
✓ A Jupyter Notebook (using Kaniko): one can choose the notebook from a list of available published notebooks, which are shared with all the users with access to the notebook’s data lake. This makes cooperation and sharing knowledge between departments easy.
Dockerless containers. RCB and Kaniko
Kaniko is an open source tool created by Google for building container images from a Dockerfile and pushing them to a remote registry, without having root access to a Docker daemon.
Kaniko enables building container images in environments that cannot easily or securely run a Docker daemon, like a Kubernetes cluster or a container. It executes each command within a Dockerfile completely in user space, so the build does not require privileges. Privileged mode should be avoided at all costs to ensure a secure environment.
How does it work?
Kaniko builds as a root user within a container in an unprivileged environment. The Kaniko executor first fetches and extracts the base-image file system to root (the base image is the image in the FROM line of the Dockerfile).
It then executes each command in order and takes a snapshot of the file system after each command. This snapshot is created in user space by walking the filesystem and comparing it to the prior state that was stored in memory. It appends any modifications to the filesystem as a new layer to the base image, and makes any relevant changes to image metadata. After executing every command in the Dockerfile, the executor pushes the newly built image to the desired registry.
Running Kaniko in a Kubernetes cluster
Kaniko is run as a container in the cluster.
The Job spec needs three arguments:
✓ --dockerfile: the path to the Dockerfile within the build context
✓ --context: the build context. This can be:
  • a GitHub repository (cloned using an init container)
  • a place Kaniko has access to, like a GCS or S3 storage bucket (a compressed tar file)
  • a local directory (specified with an emptyDir volume)
✓ --destination: the repository where Kaniko pushes the image (any registry supported by Docker credential helpers works)
Building inside a Kubernetes cluster
Besides the Kubernetes Job definition, we have some additional requirements:
✓ A Kubernetes cluster …
✓ A Kubernetes secret mounted as a data volume under /kaniko/.docker/config.json, containing the registry credentials required for pushing the final image. For a Docker Hub repository this is a docker-registry secret with your Docker Hub credentials; for any other registry, you create a Kubernetes secret with that registry's credentials (see the sketch after this list).
✓ A ConfigMap to store the Jupyter Notebook
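For illustration, such a Job can be assembled with the official Kubernetes Python client. Everything specific below (names, the build context repository, the destination image, the `regcred` secret) is an assumption, not a value from the talk:

```python
# Sketch: submit a Kaniko build as a Kubernetes Job using the official Python client.
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() when running in-cluster

kaniko = client.V1Container(
    name="kaniko",
    image="gcr.io/kaniko-project/executor:latest",
    args=[
        "--dockerfile=Dockerfile",                       # path inside the build context
        "--context=git://github.com/example/repo.git",   # hypothetical Git build context
        "--destination=docker.io/example/my-image:v1",   # hypothetical target repository
    ],
    volume_mounts=[client.V1VolumeMount(
        name="docker-config",
        mount_path="/kaniko/.docker",    # Kaniko reads registry credentials from config.json here
    )],
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="kaniko-build"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[kaniko],
            volumes=[client.V1Volume(
                name="docker-config",
                secret=client.V1SecretVolumeSource(
                    secret_name="regcred",      # docker-registry secret with push credentials
                    items=[client.V1KeyToPath(key=".dockerconfigjson", path="config.json")],
                ),
            )],
        )),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```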
From prototype to production (recap)
[Figure: the same annotated pipeline as before, shown again before introducing the Workflow Manager.]
Why the need for workflows?!
How it used to be:
✓ Train a model
✓ Run a pipeline using scripts
✓ Set manual triggers
✓ Wait for jobs, ETL
✓ Monitor
What happened:
✓ The need for more iterations
✓ More experimentation
✓ More work for ops
✓ Tedious and repetitive tasks
✓ Reduced productivity
We needed a tool to automate, schedule, and share machine learning pipelines
Introducing the Workflow Manager
✓ Develop - create the model in any framework (scikit-learn, SparkML, TensorFlow, etc.)
✓ Serialize - save it in a format that can be stored and transmitted over the network
✓ Serving - use the model for online / batch inference (prediction or scoring)
Model serialization / persistence
[Figure: models (regression, clustering, random forest, K-means, XGBoost, neural networks) built with SparkMLlib, scikit-learn or TensorFlow are serialized to object storage and served through one runtime: Model Servers behind a Load Balancer exposing an Inference API, with an end-user application sending requests and receiving predictions.]
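From the application's point of view, serving is just an HTTP call to the Inference API; a minimal client sketch with `requests`, where the URL and payload shape are hypothetical:

```python
# Hypothetical client call to an Inference API exposed behind the load balancer.
import requests

payload = {"values": [41.0, 52000.0, 1.0, 0.0]}   # one row of features, made-up schema
response = requests.post("http://model-serving.example.com/predict", json=payload, timeout=5)
response.raise_for_status()
print(response.json())                            # e.g. {"prediction": 123.4}
```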
Why MLeap?
✓ MLeap is a common serialization format and execution engine for machine learning pipelines.
✓ Minimizes the effort to serve models within a production environment
✓ MLeap provides simple interfaces to execute entire ML pipelines, from feature transformers to classifiers, regressions, clustering algorithms, and neural networks
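A hedged sketch of exporting a fitted Spark pipeline as an MLeap bundle, assuming the MLeap PySpark bindings are installed; the path and the reuse of `model` / `training_df` from the earlier pipeline sketch are illustrative:

```python
# Export a fitted Spark PipelineModel as an MLeap bundle (a zip on the local filesystem).
import mleap.pyspark                                     # registers serializeToBundle()
from mleap.pyspark.spark_support import SimpleSparkSerializer

# 'model' is a fitted PipelineModel and 'training_df' a sample DataFrame (assumed here).
model.serializeToBundle("jar:file:/tmp/lr_pipeline.zip", model.transform(training_df))

# The bundle can later be loaded by an MLeap runtime (e.g. mleap-serving) without Spark.
```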
https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark
What else is there?
✓ PMML (XML) - Predictive Model Markup Language
✓ ONNX (protobuf, DL, tensors) - Open Neural Network Exchange
✓ NNEF (DL, tensors) - Neural Network Exchange Format
✓ PFA (JSON) - Portable Format for Analytics
[Figure: the model-serving landscape, from hard-coded models (SQL, Java, Ruby) and PMML to emerging solutions (yHat, DataRobot) and enterprise solutions (Microsoft, IBM, SAS), compared on how quick they are to implement, whether they are open source and committed to Spark/Hadoop, and what they need in terms of API server and infrastructure.]
https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark
What do we want from a model server?
✓ Low latency
✓ Scale fast (horizontally)
✓ Reliable and robust
✓ Model versioning and in-place updates
✓ Monitoring and management
✓ Both online and batch mode
✓ Auto scaling
Demo time
Auto Scaling ML serving
Roadmap
✓ One of the hardest problems to solve is scaling in a cost-effective way
✓ You might have hundreds of prediction API calls at 01:00 PM but 100,000 calls at 07:00 PM (the time most users are in the app)
✓ We need:
  • a target metric
  • min-max capacity
  • a cool-down period
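These three knobs map quite directly onto a Kubernetes HorizontalPodAutoscaler. A hedged sketch with the Python client, assuming the model servers run as a Deployment named `model-server`; names and numbers are illustrative, and the cool-down behaviour is governed by the autoscaler's stabilization settings rather than this spec:

```python
# Hypothetical HPA for a fleet of model servers: target metric = CPU utilization,
# min-max capacity = 2..20 replicas.
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="model-server-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="model-server"),
        min_replicas=2,                           # min capacity
        max_replicas=20,                          # max capacity
        target_cpu_utilization_percentage=70,     # target metric
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```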
Monitoring and troubleshooting
Roadmap
✓ Monitoring and profiling of production traffic
✓ Monitoring of model performance
✓ Model interpretation (interpretability is as important as creating a model)
Hyperparameter tuning
Roadmap
✓ Multiple trials
✓ Bayesian methods
✓ Hyper-parameter tuning using ParamGrid
✓ Ideally, learn from previous runs
✓ Run multiple experiments in parallel or sequentially
✓ Ideally, learn from previous experiments (to guide future experiments)
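The ParamGrid bullet corresponds to Spark ML's grid search; a small sketch reusing `lr` and `pipeline` from the earlier pipeline example (grid values are arbitrary):

```python
# Grid search over the regression hyperparameters with cross-validation.
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

grid = (ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 1.0])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
        .build())

cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=grid,
                    evaluator=RegressionEvaluator(labelCol="label"),
                    numFolds=3,
                    parallelism=4)    # evaluate several parameter combinations in parallel

# best_model = cv.fit(training_df).bestModel
```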
Distributed training (TensorFlow & SparkML)
Roadmap
✓ It can take a long time to train
✓ 1,000 CPUs, GPUs, TPUs
✓ HorovodRunner: distributed deep learning
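HorovodRunner, as documented for Databricks ML runtimes, distributes a single-node training function over several workers; a heavily hedged skeleton, since the exact environment is not specified in the talk:

```python
# Skeleton: fan a Keras training function out over 4 Horovod worker processes.
from sparkdl import HorovodRunner        # available on Databricks ML runtimes

def train():
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... build a tf.keras model, wrap its optimizer with hvd.DistributedOptimizer,
    # scale the learning rate by hvd.size(), then call model.fit(...) ...

hr = HorovodRunner(np=4)                 # np = number of parallel worker processes
hr.run(train)
```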
Conclusions
✓ Jupyter notebooks / code can be encapsulated inside Docker containers to be shared and reused
✓ A workflow engine automates and schedules machine learning pipelines
✓ Machine learning models are queried via REST APIs
✓ Scalable model serving / inference using the Model Server
References:
✓ https://github.com/mlflow/mlflow-example/
✓ https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e
✓ https://www.anaconda.com/productionizing-and-deploying-data-science-projects/
✓ https://events.linuxfoundation.org/wp-content/uploads/2017/12/Productionizing-ML-Pipelines-with-the-Portable-Format-for-Analytics-Nick-Pentreath-IBM.pdf
✓ https://hackernoon.com/a-guide-to-scaling-machine-learning-models-in-production-aa8831163846
✓ https://blog.algorithmia.com/deploying-machine-learning-at-scale/
✓ https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a
✓ https://towardsdatascience.com/how-to-train-your-neural-networks-in-parallel-with-keras-and-apache-spark-ea8a3f48cae6
✓ https://hydrosphere.io/serving-docs/latest/components/runtimes.html