PIPELINE.AI: HIGH PERFORMANCE MODEL TRAINING & SERVING WITH GPUS…
…AND AWS SAGEMAKER, GOOGLE CLOUD ML, AZURE ML & KUBERNETES!
CHRIS FREGLY
FOUNDER @ PIPELINE.AI
RECENT PIPELINE.AI NEWS
[Figure: news items dated Sept 2017 and Dec 2017]
INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Advanced Spark and TensorFlow Meetup
§ Please Join Our 60,000+ Global Members!!
Contact Me
chris@pipeline.ai
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Dusseldorf
* London
INTRODUCTIONS: YOU
§ Software Engineer, Data Scientist, Data Engineer, Data Analyst
§ Interested in Optimizing and Deploying TF Models to Production
§ Nice to Have a Working Knowledge of TensorFlow (Not Required)
PIPELINE.AI IS 100% OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ Some VCs Value GitHub Stars @ $15,000 Each (?!)
PIPELINE.AI OVERVIEW
450,000 Docker Downloads
60,000 Users Registered for GA
60,000 Meetup Members
40,000 LinkedIn Followers
2,200 GitHub Stars
12 Enterprise Beta Users
WHY HEAVY FOCUS ON MODEL SERVING?
Model Training <<< Model Serving
Model Training
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ Optimizations Very Well-Known
§ 100’s Training Jobs per Day
Model Serving
§ Real-Time & Exciting!!
§ Online in Live Production
§ Pipeline Extends into Production
§ Continuous Insight into Live Production
§ Huuuuuuge Number of Application Users
§ **Many Optimizations Not Yet Utilized
§ 1,000,000’s Predictions per Sec
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
PACKAGE MODEL + RUNTIME AS ONE
§ Build Model with Runtime into Immutable Docker Image
§ Emphasize Immutable Deployment and Infrastructure
§ Same Runtime Dependencies in All Environments
§ Local, Development, Staging, Production
§ No Library or Dependency Surprises
§ Deploy and Tune Model + Runtime Together
pipeline predict-server-build --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --model-path=./models/tensorflow/mnist/
Build Local Model Server A
LOAD TEST LOCAL MODEL + RUNTIME
§ Perform Mini-Load Test on Local Model Server
§ Immediate, Local Prediction Performance Metrics
§ Compare to Previous Model + Runtime Variations
pipeline predict-server-start --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A
Start Local Model Server A
pipeline predict --model-endpoint-url=http://localhost:8080 \
    --test-request-path=test_request.json \
    --test-request-concurrency=1000
Load Test Local Model Server A
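A mini-load test of this kind can be sketched in a few lines of Python. This is an illustration only, not how the `pipeline predict` command works internally: `fake_predict` is a hypothetical stand-in for an HTTP call to the local model server, and the request count, concurrency, and reported metrics are assumptions chosen for the example.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_predict(request):
    """Hypothetical stand-in for one HTTP call to the local model server."""
    time.sleep(0.001)  # simulate a small amount of server-side work
    return {"outputs": [0.1] * 10}


def timed_call(predict_fn, request):
    """Return the wall-clock latency of one prediction, in seconds."""
    start = time.perf_counter()
    predict_fn(request)
    return time.perf_counter() - start


def mini_load_test(predict_fn, request, total_requests=200, concurrency=20):
    """Send requests from a thread pool and summarize the observed latencies."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(
            pool.map(lambda _: timed_call(predict_fn, request),
                     range(total_requests))
        )
    elapsed = time.perf_counter() - started
    latencies.sort()
    return {
        "requests": total_requests,
        "throughput_per_sec": total_requests / elapsed,
        "p50_ms": 1000 * statistics.median(latencies),
        "p95_ms": 1000 * latencies[int(0.95 * (len(latencies) - 1))],
    }


report = mini_load_test(fake_predict, {"image": [0.0] * 784})
print(report)
```

Saving one such report per model + runtime variation gives a simple baseline to compare against.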
PUSH IMAGE TO DOCKER REGISTRY
§ Supports All Public + Private Docker Registries
§ DockerHub, Artifactory, Quay, AWS, Google, …
§ Or Self-Hosted, Private Docker Registry
pipeline predict-server-push --image-registry-url=<your-registry> \
    --image-registry-repo=<your-repo> \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A
Push Image To Docker Registry
CLOUD-BASED OPTIONS
§ AWS SageMaker
§ Released Nov 2017 @ re:Invent
§ Custom Docker Images for Training & Serving (e.g., PipelineAI Images)
§ Distributed TensorFlow Training through Estimator API
§ Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
§ Mostly Command-Line Based
§ Driving TensorFlow Open Source API (e.g., Experiment API)
§ Azure ML
TUNE MODEL + RUNTIME AS SINGLE UNIT
§ Model Training Optimizations
§ Model Hyper-Parameters (e.g., Learning Rate)
§ Reduced Precision (e.g., FP16 Half Precision)
§ Post-Training Model Optimizations
§ Quantize Model Weights + Activations From 32-bit to 8-bit
§ Fuse Neural Network Layers Together
§ Model Runtime Optimizations
§ Runtime Configs (e.g., Request Batch Size)
§ Different Runtimes (e.g., TensorFlow Lite, Nvidia TensorRT)
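To make "Fuse Neural Network Layers Together" concrete, here is a minimal sketch of one common fusion: folding a batch-normalization layer into the dense layer before it, so serving needs one multiply-add instead of two passes. The function names, the `eps` value, and the tiny example layer are assumptions made for illustration; this is not the code of any specific optimization tool.

```python
import math


def dense(x, weights, bias):
    """y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    """Inference-time batch normalization with fixed statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fuse_dense_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold the batch-norm scale and shift into the dense weights and bias."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    fused_w = [[w * scale[j] for j, w in enumerate(row)] for row in weights]
    fused_b = [(b - m) * scale[j] + bt
               for j, (b, m, bt) in enumerate(zip(bias, mean, beta))]
    return fused_w, fused_b


# Hypothetical 2-input, 2-output layer.
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, 3.0]

unfused = batch_norm(dense(x, W, b), gamma, beta, mean, var)
fused_W, fused_b = fuse_dense_batch_norm(W, b, gamma, beta, mean, var)
fused = dense(x, fused_W, fused_b)
print(unfused, fused)
```

The fused layer produces the same outputs up to floating-point rounding, which is why this kind of fusion can be applied after training without retraining.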
POST-TRAINING OPTIMIZATIONS
§ Prepare Model for Serving
§ Simplify Network
§ Reduce Model Size
§ Quantize for Fast Matrix Math
§ Some Tools
§ Graph Transform Tool (GTT)
§ tfcompile
pipeline optimize --optimization-list=[quantize_weights,tfcompile] \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --model-path=./tensorflow/mnist/model \
    --output-path=./tensorflow/mnist/optimized_model
[Figure: Linear Regression graph, After Training vs. After Optimizing!]
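The idea behind quantizing 32-bit weights to 8 bits can be sketched as simple linear (min/max) quantization. This is an illustrative sketch in plain Python under that assumption, not the implementation used by the Graph Transform Tool; `quantize_weights` and `dequantize_weights` here are hypothetical helper names.

```python
def quantize_weights(weights):
    """Map float weights onto 256 evenly spaced levels between min and max."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [round((w - lo) / scale) for w in weights]  # integers in 0..255
    return codes, scale, lo


def dequantize_weights(codes, scale, lo):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-0.73, -0.12, 0.0, 0.05, 0.41, 0.98]  # hypothetical layer weights
codes, scale, lo = quantize_weights(weights)
restored = dequantize_weights(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

Each weight now needs one byte plus a shared `scale` and `lo` per tensor, at the cost of a rounding error of at most half a quantization step.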
RUNTIME OPTION: TENSORFLOW LITE
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
§ Low-Latency, Fast Startup
§ Selective Operator Loading
§ 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
§ Falls Back to CPU without Accelerator
§ Java and C++ APIs
RUNTIME OPTION: NVIDIA TENSORRT
§ Post-Training Model Optimizations
§ Specific to Nvidia GPU
§ GPU-Optimized Prediction Runtime
§ Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
DEPLOY MODELS SAFELY TO PROD
§ Deploy from CLI or Jupyter Notebook
§ Tear-Down or Rollback Models Quickly
§ Shadow Canary Deploy: e.g., 20% Live Traffic
§ Split Canary Deploy: e.g., 97-2-1% Live Traffic
pipeline predict-cluster-start --model-runtime=tflite \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=B \
    --traffic-split=2
Start Production Model Cluster B
pipeline predict-cluster-start --model-runtime=tensorrt \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=C \
    --traffic-split=1
Start Production Model Cluster C
pipeline predict-cluster-start --model-runtime=tfserving_gpu \
    --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --traffic-split=97
Start Production Model Cluster A
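A 97-2-1 split canary amounts to weighted random routing of each request to one of the three model clusters. The sketch below illustrates that idea with the standard library only; the `route` function and the fixed seed are assumptions for the example and say nothing about how the actual traffic router is implemented.

```python
import random
from collections import Counter

# Traffic split from the three commands above: tag -> percent of live traffic.
TRAFFIC_SPLIT = {"A": 97, "B": 2, "C": 1}


def route(split, rng):
    """Pick the model tag that should serve one request."""
    tags = list(split)
    return rng.choices(tags, weights=[split[t] for t in tags], k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
counts = Counter(route(TRAFFIC_SPLIT, rng) for _ in range(10_000))
print(counts)
```

Over many requests the observed counts approach the configured percentages, so a misbehaving canary only ever touches a small share of traffic.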
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
§ Validation + Training Accuracy
§ CPU + GPU Utilization
§ Live Prediction Values
§ Compare Relative Precision
§ Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
§ Response Time, Throughput
§ Cost ($) Per Prediction
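Cost per prediction follows directly from two of the online metrics: what a cluster costs per hour and how many predictions it sustains per second. The figures below are hypothetical numbers chosen for illustration, not measurements.

```python
def cost_per_prediction(hourly_cost_usd, predictions_per_sec):
    """Dollars spent per prediction at a sustained throughput."""
    if predictions_per_sec <= 0:
        raise ValueError("predictions_per_sec must be positive")
    return hourly_cost_usd / (predictions_per_sec * 3600)


# Hypothetical per-cluster metrics for three model variants.
clusters = {
    "A": {"hourly_cost_usd": 3.60, "predictions_per_sec": 1000},
    "B": {"hourly_cost_usd": 0.90, "predictions_per_sec": 200},
    "C": {"hourly_cost_usd": 1.80, "predictions_per_sec": 750},
}

costs = {tag: cost_per_prediction(**metrics) for tag, metrics in clusters.items()}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

Note that the cluster with the lowest hourly price (B here) is not the cheapest per prediction, which is why throughput has to be part of the comparison.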
VIEW REAL-TIME PREDICTION STREAM
§ Visually Compare Real-Time Predictions
Prediction Inputs
Prediction Results & Confidences
Model B   Model C   Model A
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ 3 Steps in Real-Time Prediction
1. transform_request()
2. predict()
3. transform_response()
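Timing each of the three steps separately is what makes a bottleneck visible. The sketch below uses the step names from the slide, but the function bodies are hypothetical stand-ins (a JSON decode, a dummy model that doubles its inputs, a JSON encode), and `profiled_predict` is an assumed helper name.

```python
import json
import time


def transform_request(raw_request):
    """Step 1: decode the raw request bytes into model inputs."""
    return json.loads(raw_request)["inputs"]


def predict(inputs):
    """Step 2: hypothetical stand-in for the model; doubles each input."""
    return [2 * x for x in inputs]


def transform_response(outputs):
    """Step 3: encode the model outputs as a JSON response body."""
    return json.dumps({"outputs": outputs})


def profiled_predict(raw_request):
    """Run the three steps in order and record the seconds spent in each."""
    timings = {}
    value = raw_request
    for step in (transform_request, predict, transform_response):
        start = time.perf_counter()
        value = step(value)
        timings[step.__name__] = time.perf_counter() - start
    return value, timings


response, timings = profiled_predict(b'{"inputs": [1, 2, 3]}')
slowest = max(timings, key=timings.get)
print(response, timings, slowest)
```

In a real server the per-step timings would be exported as metrics rather than printed, so that request parsing, model execution, and response encoding can be tuned independently.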
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
LIVE, ADAPTIVE TRAFFIC ROUTING
§ A/B Tests
§ Inflexible and Boring
§ Multi-Armed Bandits
§ Adaptive and Exciting!
pipeline traffic-router-split --model-type=tensorflow \
    --model-name=mnist \
    --model-tag-list=[A,B,C] \
    --model-weight-list=[1,2,97]
Adjust Traffic Routing Dynamically
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model using AI Bandit Algos
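One of the simplest bandit algorithms is epsilon-greedy: send most traffic to the model with the best observed reward so far, and a small random share to the others to keep learning. The sketch below is an illustration of that algorithm only; the deck does not say which bandit algorithm is used, and the class name, reward rates, and seeds are all assumptions.

```python
import random


class EpsilonGreedyRouter:
    """Route to the best-known model, exploring at random with probability epsilon."""

    def __init__(self, tags, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {tag: 0 for tag in tags}
        self.mean_reward = {tag: 0.0 for tag in tags}

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))  # explore
        return max(self.mean_reward, key=self.mean_reward.get)  # exploit

    def update(self, tag, reward):
        """Fold one observed reward into the running mean for that model."""
        self.pulls[tag] += 1
        n = self.pulls[tag]
        self.mean_reward[tag] += (reward - self.mean_reward[tag]) / n


# Hypothetical hidden conversion rates per model; the router never sees these.
TRUE_RATES = {"A": 0.05, "B": 0.04, "C": 0.10}

router = EpsilonGreedyRouter(TRUE_RATES)
reward_rng = random.Random(1)
for _ in range(5000):
    tag = router.choose()
    reward = 1.0 if reward_rng.random() < TRUE_RATES[tag] else 0.0
    router.update(tag, reward)

print(router.pulls, router.mean_reward)
```

Unlike a fixed A/B split, the share of traffic each model receives changes as evidence accumulates, which is the "adaptive" part of the comparison.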
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Based on Cost ($) Per Prediction
§ Cost Changes Throughout Day
§ Lose AWS Spot Instances
§ Google Cloud Becomes Cheaper
§ Shift Across Clouds & On-Prem
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
LIVE, CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning
§ Q1 2018: PipelineAI Supports Continuous Model Training!
§ Kafka, Kinesis
§ Spark Streaming
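Continuous training means updating the model one example (or mini-batch) at a time as data arrives, rather than retraining from scratch in batch. The sketch below shows that pattern with online stochastic gradient descent on a one-variable linear model. It is an assumption-laden illustration: `example_stream` is a plain Python generator standing in for a Kafka or Kinesis consumer, and no real streaming client API is used.

```python
import random


def example_stream(n, seed=0):
    """Hypothetical stand-in for a message stream of (feature, label) pairs."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0  # labels follow y = 2x + 1


class OnlineLinearModel:
    """y = w * x + b, updated with one SGD step per incoming example."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Take one gradient step on the squared error for this example."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


model = OnlineLinearModel()
for x, y in example_stream(5000):
    model.update(x, y)  # the model stays usable for predictions between updates

# The learned parameters should approach w = 2 and b = 1.
print(model.w, model.b)
```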
PSEUDO-CONTINUOUS TRAINING
§ Identify and Fix Borderline Predictions (~50-50% Confidence)
§ Fix Along Class Boundaries
§ Retrain on Newly-Labeled Data
§ Gamify Labeling Process
§ Enable Crowd Sourcing
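Finding borderline predictions can be as simple as filtering on how close a binary classifier's confidence is to 50%. The sketch below assumes a hypothetical record layout (`id`, `confidence`) and a hypothetical `margin` of 0.1; the selected records would then go to human labelers before retraining.

```python
def borderline(predictions, margin=0.1):
    """Return predictions whose confidence is within `margin` of 0.5,
    most uncertain first."""
    selected = [p for p in predictions if abs(p["confidence"] - 0.5) <= margin]
    return sorted(selected, key=lambda p: abs(p["confidence"] - 0.5))


# Hypothetical live predictions from a binary classifier.
live_predictions = [
    {"id": 1, "confidence": 0.98},
    {"id": 2, "confidence": 0.52},
    {"id": 3, "confidence": 0.45},
    {"id": 4, "confidence": 0.07},
    {"id": 5, "confidence": 0.65},
]

labeling_queue = borderline(live_predictions)
print([p["id"] for p in labeling_queue])
```

Labeling effort is then spent along the class boundary, where a corrected label is most likely to change the retrained model.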
DEMO: TRAIN, DEPLOY, TEST MODEL
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
pipeline predict-server-build --model-type=tensorflow \
    --model-name=mnist \
    --model-tag=A \
    --model-path=./models/tensorflow/mnist/
THANK YOU!!
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ Reminder: VCs Value GitHub Stars @ $15,000 Each (!!)
Contact Me
chris@pipeline.ai
@cfregly



Optimizing, Profiling, and Deploying High Performance Spark ML and TensorFlow AI
