HIGH PERFORMANCE DISTRIBUTED TENSORFLOW
IN PRODUCTION WITH GPUS AND KUBERNETES
HPC ADVISORY COUNCIL, FEB 2018
CHRIS FREGLY
FOUNDER @ PIPELINE.AI
KEY TAKE-AWAYS
Optimize Your Models After Training
Validate Models Online in Live Production (Safely!)
Evaluate Model Performance Offline *and* Online
Monitor and Tune Your Model Serving Runtime
INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Advanced Spark and TensorFlow Meetup
§ Please Join Our 60,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Düsseldorf
* London
INTRODUCTIONS: YOU
§ Data Scientist, Data Engineer, Data Analyst, Data Curious
§ Want to Deploy ML/AI Models Rapidly and Safely
§ Need to Trace or Explain Model Predictions
§ Have a Decent Grasp of Computer Science Fundamentals
PIPELINE.AI IS 100% OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star this GitHub Repo!
§ VCs Value GitHub Stars @ $1,500 Each (?!)
GitHub Repo Geo Heat Map: http://jrvis.com/red-dwarf/
PIPELINE.AI OVERVIEW
500,000 Docker Downloads
60,000 Registered Users
60,000 Meetup Members
30,000 LinkedIn Followers
2,400 GitHub Stars
15 Enterprise Beta Users
RECENT PIPELINE.AI NEWS
Sept 2017 / Dec 2017 / Jan 2018
PipelineAI Becomes Google ML/AI Expert
Register to Install PipelineAI in Your Own Environment (Starting March 2018)
http://pipeline.ai
Try GPU Community Edition Today!
http://community.pipeline.ai
WHY HEAVY FOCUS ON MODEL SERVING?
Model Training:
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ 100's of Training Jobs per Day
§ Optimizations Very Well-Known
Model Serving:
§ Real-Time & Exciting!!
§ Online in Live Production
§ Pipeline Extends into Production
§ Continuous Insight into Live Production
§ Huuuuuuge Number of Application Users
§ 1,000,000's of Predictions per Sec
§ Runtime Optimizations Not Yet Explored
CLOUD-BASED MODEL SERVING OPTIONS
§ AWS SageMaker
§ Released Nov 2017 @ re:Invent
§ Custom Docker Images for Training/Serving (e.g. PipelineAI Images)
§ Distributed TensorFlow Training through Estimator API (sketch below)
§ Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
§ Mostly Command-Line Based
§ Driving TensorFlow Open Source API (e.g. Estimator API)
§ Azure ML
PipelineAI Supports Hybrid-Cloud Deployments
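Both SageMaker and Cloud ML Engine drive distributed training through the Estimator API mentioned above. A minimal TF 1.x sketch, assuming a toy model and random data (distribution itself comes from the TF_CONFIG environment variable set on each node):

import numpy as np
import tensorflow as tf

def model_fn(features, labels, mode):
    # Toy linear classifier standing in for a real model.
    logits = tf.layers.dense(features["x"], 10)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=tf.argmax(logits, axis=1))
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.random.rand(1000, 784).astype(np.float32)},
    y=np.random.randint(0, 10, size=1000),
    batch_size=128, num_epochs=None, shuffle=True)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/mnist")
# Each node reads its role (chief/worker/ps) from TF_CONFIG;
# the same code runs single-node when TF_CONFIG is unset.
tf.estimator.train_and_evaluate(
    estimator,
    tf.estimator.TrainSpec(input_fn=input_fn, max_steps=1000),
    tf.estimator.EvalSpec(input_fn=input_fn, steps=10))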
BUILD MODEL WITH THE RUNTIME
§ Package Model + Runtime into 1 Docker Image
§ Emphasizes Immutable Deployment and Infrastructure
§ Same Image Across All Environments
§ No Library or Dependency Surprises from Laptop to Production
§ Allows Tuning Model + Runtime Together
Build Local Model Server A:
pipeline predict-server-build --model-name=mnist \
    --model-tag=A \
    --model-type=tensorflow \
    --model-runtime=tfserving \
    --model-chip=gpu \
    --model-path=./tensorflow/mnist/
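The --model-path above points at a standard TensorFlow SavedModel export. A minimal TF 1.x export sketch for the MNIST example (the tiny linear model is illustrative; note the 'x' input and 'add' output tensor names, which reappear in the optimize command later):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784], name="x")
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.add(tf.matmul(x, w), b, name="add")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder = tf.saved_model.builder.SavedModelBuilder("./tensorflow/mnist/model")
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={"x": x}, outputs={"add": logits})
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={"serving_default": signature})
    builder.save()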
TUNE MODEL + RUNTIME TOGETHER
§ Model Training Optimizations
§ Model Hyper-Parameters (e.g. Learning Rate)
§ Reduced Precision (e.g. FP16 Half Precision)
§ Model Serving (Post-Train) Optimizations
§ Quantize Model Weights + Activations From 32-bit to 8-bit
§ Fuse Neural Network Layers Together
§ Model Runtime Optimizations
§ Runtime Config: Request Batch Size, etc
§ Different Runtime: TensorFlow Serving CPU/GPU, Nvidia TensorRT
SERVING (POST-TRAIN) OPTIMIZATIONS
§ Prepare Model for Serving
§ Simplify Network, Reduce Size
§ Reduce Precision -> Fast Math
§ Some Tools
§ Graph Transform Tool (GTT)
§ tfcompile
(Graph visualizations shown: After Training vs. After Optimizing!)
pipeline optimize --optimization-list=['quantize_weights','tfcompile'] \
    --model-name=mnist \
    --model-tag=A \
    --model-path=./tensorflow/mnist/model \
    --model-inputs=['x'] \
    --model-outputs=['add'] \
    --output-path=./tensorflow/mnist/optimized_model
Linear Regression Model Size: 70MB -> 70KB (!)
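The quantize_weights transform above maps directly onto the Graph Transform Tool (GTT) listed earlier. A minimal GTT sketch, assuming a frozen GraphDef file (the filename is an assumption; 'x' and 'add' match the MNIST export):

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_mnist.pb", "rb") as f:   # assumed frozen graph file
    graph_def.ParseFromString(f.read())

transforms = ["strip_unused_nodes", "fold_constants(ignore_errors=true)",
              "fold_batch_norms", "quantize_weights"]
optimized = TransformGraph(graph_def, ["x"], ["add"], transforms)

with tf.gfile.GFile("optimized_mnist.pb", "wb") as f:
    f.write(optimized.SerializeToString())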
NVIDIA TENSORRT RUNTIME
§ Post-Training Model Optimizations
§ Specific to Nvidia GPUs
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
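TensorFlow's contrib-level TensorRT integration from roughly this period can rewrite a frozen graph so supported subgraphs run as TensorRT engines. A sketch, assuming tf.contrib.tensorrt is available and reusing the assumed frozen-graph file from the GTT example (batch size and workspace values are illustrative):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_mnist.pb", "rb") as f:   # assumed frozen graph file
    graph_def.ParseFromString(f.read())

# Replaces supported subgraphs with TensorRT-optimized ops (Nvidia GPUs only).
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=["add"],
    max_batch_size=128,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16")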
TENSORFLOW LITE RUNTIME
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
§ Low-Latency, Fast Startup
§ Selective Operator Loading
§ 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
§ Falls Back to CPU without Accelerator
§ Java and C++ APIs
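Converting the earlier SavedModel into a .tflite flatbuffer for the on-device runtime; this sketch uses the tf.lite.TFLiteConverter API from later TensorFlow releases (the early-2018 equivalent lived under tf.contrib.lite):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("./tensorflow/mnist/model")
tflite_model = converter.convert()   # returns the flatbuffer as bytes

with open("mnist.tflite", "wb") as f:
    f.write(tflite_model)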
3 DIFFERENT RUNTIMES, SAME MODEL
Build Local Model Server A:
pipeline predict-server-build --model-name=mnist \
    --model-tag=A \
    --model-type=tensorflow \
    --model-runtime=tfserving \
    --model-chip=cpu \
    --model-path=./tensorflow/mnist/

Build Local Model Server B:
pipeline predict-server-build --model-name=mnist \
    --model-tag=B \
    --model-type=tensorflow \
    --model-runtime=tfserving \
    --model-chip=gpu \
    --model-path=./tensorflow/mnist/

Build Local Model Server C:
pipeline predict-server-build --model-name=mnist \
    --model-tag=C \
    --model-type=tensorflow \
    --model-runtime=tensorrt \
    --model-chip=gpu \
    --model-path=./tensorflow/mnist/

Same Model, Different Runtime
RUN A LOADTEST LOCALLY!
§ Perform Mini-Load Test on Local Model Server
§ Immediate, Local Prediction Performance Metrics
§ Compare to Previous Model + Runtime Variations
§ Gain Intuition Before Push to Prod
Start Local Model Servers:
pipeline predict-server-start --model-name=mnist \
    --model-tag=A \
    --memory-limit=2G

Start Local LoadTest:
pipeline predict-http-test --model-endpoint-url=http://localhost:8080 \
    --test-request-path=test_request.json \
    --test-request-concurrency=1000
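For intuition about what the load test measures, a hand-rolled mini load test in the same spirit as predict-http-test (endpoint URL, payload file, and request count mirror the flags above; a sketch, not the PipelineAI implementation):

import json
from concurrent.futures import ThreadPoolExecutor
import requests   # assumption: the requests library is installed

ENDPOINT = "http://localhost:8080"
with open("test_request.json") as f:
    payload = json.load(f)

def predict(_):
    # One prediction request; returns wall-clock latency in seconds.
    return requests.post(ENDPOINT, json=payload).elapsed.total_seconds()

with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = sorted(pool.map(predict, range(1000)))

print("p50: %.3fs  p99: %.3fs" % (latencies[499], latencies[989]))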
PUSH IMAGE TO DOCKER REGISTRY
§ Supports All Public + Private Docker Registries
§ DockerHub, Artifactory, Quay, AWS, Google, …
§ Or Self-Hosted, Private Docker Registry
Push Images to Docker Registry:
pipeline predict-server-push --model-name=mnist \
    --model-tag=A \
    --image-registry-url=<your-registry> \
    --image-registry-repo=<your-repo>
DEPLOY MODELS SAFELY TO PROD
§ Deploy from CLI or Jupyter Notebook
§ Tear-Down and Rollback Models Quickly
§ Shadow Canary: Deploy to 20% Live Traffic
§ Split Canary: Deploy to 97-2-1% Live Traffic
Start Cluster A:
pipeline predict-kube-start --model-name=mnist \
    --model-tag=A

Start Cluster B:
pipeline predict-kube-start --model-name=mnist \
    --model-tag=B

Start Cluster C:
pipeline predict-kube-start --model-name=mnist \
    --model-tag=C

Route Live Traffic:
pipeline predict-kube-route --model-name=mnist \
    --model-split-tag-and-weight-dict='{"A":97, "B":2, "C":1}' \
    --model-shadow-tag-list='[]'
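The 97-2-1 split canary behaves like a weighted random route decided per request. In miniature (illustrative routing logic, not PipelineAI's actual router):

import random
from collections import Counter

split = {"A": 97, "B": 2, "C": 1}   # mirrors --model-split-tag-and-weight-dict

def route():
    tags = list(split)
    return random.choices(tags, weights=[split[t] for t in tags], k=1)[0]

# Roughly 97% of requests land on A, 2% on B, 1% on C.
print(Counter(route() for _ in range(100000)))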
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation + Training Accuracy
§ CPU + GPU Utilization
§ Online, Live Prediction Values
§ Compare Relative Precision
§ Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
§ Response Time, Throughput
§ Cost ($) Per Prediction
ENSEMBLE PREDICTION AUDIT TRAIL
§ Necessary for Model Explainability
§ Fine-Grained Request Tracing
§ Used for Model Ensembles
REAL-TIME PREDICTION STREAMS
§ Visually Compare Real-time Predictions
(Dashboard shown: Features and Inputs alongside Predictions and Confidences, side by side for Model A, Model B, and Model C)
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ 3 Steps in Real-Time Prediction
1. transform_request()
2. predict()
3. transform_response()
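A sketch of per-step timing around those three stages (the handler bodies are placeholders for your own transforms and model call):

import time

def transform_request(raw):       # placeholder: e.g. JSON -> tensor
    return raw

def predict(features):            # placeholder: model inference
    return features

def transform_response(pred):     # placeholder: tensor -> JSON
    return pred

def timed(name, fn, arg):
    start = time.time()
    result = fn(arg)
    print("%s: %.1f ms" % (name, (time.time() - start) * 1000.0))
    return result

def handle(raw_request):
    features = timed("transform_request", transform_request, raw_request)
    prediction = timed("predict", predict, features)
    return timed("transform_response", transform_response, prediction)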
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model with Multi-armed Bandits
LIVE, ADAPTIVE TRAFFIC ROUTING
§ A/B Tests
§ Inflexible and Boring
§ Multi-Armed Bandits
§ Adaptive and Exciting!
Route Traffic Dynamically:
pipeline predict-kube-route --model-name=mnist \
    --model-split-tag-and-weight-dict='{"A":1, "B":2, "C":97}' \
    --model-shadow-tag-list='[]'
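A multi-armed bandit shifts weight toward the winning model as evidence accumulates instead of waiting for a fixed A/B test to finish. An epsilon-greedy sketch (reward is a stand-in for revenue per prediction; the bookkeeping is illustrative):

import random

EPSILON = 0.05                       # fraction of traffic spent exploring
rewards = {"A": [], "B": [], "C": []}

def choose_model():
    unexplored = [t for t, r in rewards.items() if not r]
    if unexplored or random.random() < EPSILON:
        return random.choice(unexplored or list(rewards))        # explore
    return max(rewards,
               key=lambda t: sum(rewards[t]) / len(rewards[t]))  # exploit

def record_reward(tag, reward):      # e.g. revenue attributed to a prediction
    rewards[tag].append(reward)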
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Based on Cost ($) Per Prediction
§ Cost Changes Throughout Day
§ Lose AWS Spot Instances
§ Google Cloud Becomes Cheaper
§ Shift Across Clouds & On-Prem
PSEUDO-CONTINUOUS TRAINING
§ Identify and Fix Borderline (Unconfident) Predictions (sketch below)
§ Fix Predictions Along Class Boundaries
§ Facilitate "Human in the Loop"
§ Retrain with Newly-Labeled Data
§ Game-ify the Labeling Process
§ Path to Crowd-Sourced Labeling
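The sketch referenced above: flag borderline predictions for the human-in-the-loop labeling queue (threshold and shapes are illustrative):

import numpy as np

def borderline(softmax_batch, threshold=0.6):
    # softmax_batch: (batch, num_classes) class probabilities per prediction
    top = softmax_batch.max(axis=1)
    return np.where(top < threshold)[0]   # indices to send for relabeling

batch = np.array([[0.90, 0.05, 0.05],    # confident -> keep
                  [0.40, 0.35, 0.25]])   # borderline -> relabel
print(borderline(batch))                 # -> [1]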
CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning!
§ PipelineAI Supports Continuous Model Training!
§ Kafka, Kinesis
§ Spark Streaming, Flink
§ Storm, Heron
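For the ingest half of continuous training, a Spark Structured Streaming sketch that reads newly labeled examples off Kafka (topic name, servers, and sink paths are assumptions; the retraining trigger itself is not shown):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("continuous-training").getOrCreate()

labeled = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "newly-labeled-examples")
           .load()
           .select(col("value").cast("string").alias("example")))

(labeled.writeStream
        .format("parquet")
        .option("path", "/data/retrain")
        .option("checkpointLocation", "/data/retrain-checkpoint")
        .start()
        .awaitTermination())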
THANK YOU!!
§ Please Star this GitHub Repo!
§ All slides, code, notebooks, and Docker images here:
https://github.com/PipelineAI/pipeline
Contact Me
chris@pipeline.ai
@cfregly
