Azure Machine Learning
A Technical Overview
Viktor Tsykunov
Principal Software Engineering Lead
Microsoft
Table of Contents
Machine Learning Requirements
Applications, Characteristics and Requirements
End-to-End lifecycle and processes
Deep Learning: Additional Requirements
Azure Machine Learning service
Artifacts: Workspace, Experiments, Compute, Models,
Images, Deployment and Datastore
Concepts: Model Management, Pipelines
Code Sample: Step-by-Step Workflow
Azure Automated Machine Learning
Requirements of an
advanced ML Platform
Machine Learning
Typical E2E Process
…
Prepare Experiment Deploy
Orchestrate
DevOps loop for data science
Prepare Data → Build model (your favorite IDE) → Train & Test Model → Register and Manage Model → Build Image → Deploy Service → Monitor Model
Deep Learning places
additional requirements
Traditional ML versus DL
[Figure: comparison of traditional ML and DL. Bottom figure from NVIDIA]
Some deep learning applications
• Forecasting
• Machine Translation
• Predictive analytics
• Autonomous vehicles
• Speech recognition
• Image recognition
Characteristics of Deep Learning
Deep Learning
Three Additional Requirements
1.
2.
3.
Azure offers a comprehensive
AI/ML platform that meets—and
exceeds—requirements
Machine Learning on Azure
• Domain specific pretrained models (to reduce time to market): Vision, Speech, Language, Search, …
• Familiar Data Science tools (to simplify model development): PyCharm, Jupyter, Visual Studio Code, Command line
• Popular frameworks (to build advanced deep learning solutions): TensorFlow, PyTorch, ONNX, Scikit-Learn
• Productive services (to empower data science and development teams): Azure Databricks, Azure Machine Learning, Machine Learning VMs
• Powerful infrastructure (to accelerate deep learning): CPU, GPU, FPGA
From the Intelligent Cloud to the Intelligent Edge
What is Azure Machine Learning service?
A set of Azure Cloud Services and a Python SDK that enables you to:
✓ Prepare Data
✓ Build Models
✓ Train Models
✓ Manage Models
✓ Track Experiments
✓ Deploy Models
Azure Machine Learning:
Technical Details
Azure ML service
Key Artifacts
Workspace
How to use the Azure Machine
Learning service:
E2E coding example using the SDK
Azure ML Steps
[Figure: My Computer connected to the Azure ML Workspace and its Experiment, Docker Image, Data Store and Compute Target]
Setup for Code Example
This tutorial trains a simple logistic regression
using the MNIST dataset and scikit-learn with
Azure Machine Learning service.
MNIST is a dataset consisting of 70,000 grayscale images.
Each image is a handwritten digit of 28x28 pixels,
representing a number from 0 to 9.
The goal is to create a multi-class classifier to
identify the digit a given image represents.
Step 1 – Create a workspace
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2'  # or other supported Azure region
                      )
# see workspace details
ws.get_details()
Step 2 – Create an Experiment
from azureml.core import Experiment
experiment_name = 'my-experiment-1'
exp = Experiment(workspace=ws, name=experiment_name)
Step 3 – Create remote compute target
import os
from azureml.core.compute import AmlCompute, ComputeTarget
# choose a name for your cluster, specify min and max nodes
compute_name = os.environ.get("BATCHAI_CLUSTER_NAME", "cpucluster")
# environment variables are read as strings, so cast the node counts to int
compute_min_nodes = int(os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 4))
# This example uses a CPU VM. To use a GPU VM, set the SKU to STANDARD_NC6
vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_D2_V2")
provisioning_config = AmlCompute.provisioning_configuration(
    vm_size=vm_size,
    min_nodes=compute_min_nodes,
    max_nodes=compute_max_nodes)
# create the cluster
print('creating a new compute target...')
compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
# You can poll for a minimum number of nodes and for a specific timeout.
# If no min node count is provided, it will use the scale settings for the cluster
compute_target.wait_for_completion(show_output=True,
                                   min_node_count=None, timeout_in_minutes=20)
Step 4 – Upload data to the cloud
First load the compressed files into numpy arrays. Note that load_data is a custom function that simply parses the compressed files into numpy arrays.
# note that while loading, we scale the intensity values (X) from 0-255 down to 0-1 so that the
# model converges faster.
X_train = load_data('./data/train-images.gz', False) / 255.0
y_train = load_data('./data/train-labels.gz', True).reshape(-1)
X_test = load_data('./data/test-images.gz', False) / 255.0
y_test = load_data('./data/test-labels.gz', True).reshape(-1)
Now make the data accessible remotely by uploading it from your local machine into Azure so it can be
accessed for remote training. The files are uploaded into a directory named mnist at the root of the datastore.
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)
ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)
You now have everything you need to start training a model.
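The slides describe load_data only as a custom function that parses the compressed files into numpy arrays and do not show its body. The following is a minimal sketch of what such a helper might look like, assuming the files are gzip-compressed MNIST files in the standard IDX layout (a big-endian header followed by raw unsigned bytes). The exact helper used in the talk may differ.

```python
import gzip
import struct

import numpy as np


def load_data(filename, label=False):
    """Parse a gzip-compressed MNIST IDX file into a numpy array.

    Images come back as (n_items, n_rows * n_cols); labels as (n_items, 1).
    This is an assumed implementation, not the one from the slides.
    """
    with gzip.open(filename, "rb") as gz:
        # Header: magic number and item count, both big-endian uint32.
        _magic, n_items = struct.unpack(">II", gz.read(8))
        if label:
            data = np.frombuffer(gz.read(n_items), dtype=np.uint8)
            return data.reshape(n_items, 1)
        # Image files carry two extra header fields: rows and columns.
        n_rows, n_cols = struct.unpack(">II", gz.read(8))
        data = np.frombuffer(gz.read(n_items * n_rows * n_cols), dtype=np.uint8)
        return data.reshape(n_items, n_rows * n_cols)
```

With this layout each 28x28 image becomes one row of 784 values, which is the flat feature vector that the logistic regression in the next step consumes.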
Step 5 – Train a local model
Train a simple logistic regression model using scikit-learn locally. This should take a minute or two.
%%time
import numpy as np
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Next, make predictions using the test set and calculate the accuracy
y_hat = clf.predict(X_test)
print(np.average(y_hat == y_test))
You should see the local model accuracy displayed. It should be a number like 0.915.
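The accuracy printed in Step 5 is the fraction of test images whose predicted digit matches the true label, which is what np.average(y_hat == y_test) computes. A minimal plain-Python sketch of the same arithmetic follows. The labels are made up for illustration and are not real MNIST predictions.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where the prediction equals the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical labels for five test images (illustrative values only).
y_test_demo = [7, 2, 1, 0, 4]
y_hat_demo = [7, 2, 1, 0, 9]
print(accuracy(y_hat_demo, y_test_demo))  # 4 of 5 match -> 0.8
```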
Step 6 – Train model on remote cluster
To submit a training job to a remote cluster, perform the following tasks:
• 6.1: Create a directory
• 6.2: Create a training script
• 6.3: Create an estimator object
• 6.4: Submit the job
Step 6.1 – Create a directory
Create a directory to deliver the required code from your computer to the remote resource.
import os
script_folder = './sklearn-mnist'
os.makedirs(script_folder, exist_ok=True)
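Step 6.2 – Create a Training Script (1/2)
%%writefile $script_folder/train.py
import argparse
import os
import numpy as np
import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core import Run
from utils import load_data  # custom helper; assumed to be a utils.py copied into the script folder
# read the two parameters passed in by the estimator: the data location and the regularization rate
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
parser.add_argument('--regularization', type=float, dest='reg', default=0.8)
args = parser.parse_args()
# 'data_folder' holds the location of the data files (the mnist directory in the datastore)
data_folder = os.path.join(args.data_folder, 'mnist')
# load train and test set into numpy arrays
# Note: we scale the pixel intensity values to 0-1 (by dividing by 255.0) so the model can
# converge faster.
X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0
X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)
y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n')
# get hold of the current run
run = Run.get_context()
# train a logistic regression model with a regularization rate of args.reg
clf = LogisticRegression(C=1.0/args.reg, random_state=42)
clf.fit(X_train, y_train)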
Step 6.2 – Create a Training Script (2/2)
print('Predict the test set')
y_hat = clf.predict(X_test)
# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print('Accuracy is', acc)
run.log('regularization rate', float(args.reg))
run.log('accuracy', float(acc))
os.makedirs('outputs', exist_ok=True)
# The training script saves the model into a directory named 'outputs'. Files saved in the
# outputs folder are automatically uploaded into the experiment record in the workspace.
joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
Step 6.3 – Create an Estimator
An estimator object is used to submit the run.
from azureml.train.estimator import Estimator
script_params = {'--data-folder': ds.as_mount(), '--regularization': 0.8}
est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                entry_script='train.py',
                conda_packages=['scikit-learn'])
Step 6.4 – Submit the job to the cluster for training
run = exp.submit(config=est)
run
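The script_params dictionary is the link between the estimator and train.py: as I understand it, each key/value pair is handed to the entry script as a command-line argument, where it can be read with argparse. The sketch below imitates that hand-off locally with the standard library only. The helper names to_argv and parse_training_args are hypothetical, and "/mnt/datastore" is a made-up stand-in for the path that ds.as_mount() would resolve to on the cluster.

```python
import argparse


def to_argv(script_params):
    """Flatten a {flag: value} dict into a command-line argument list."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


def parse_training_args(argv):
    """Parse the two flags that the training script expects."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", type=str, dest="data_folder")
    parser.add_argument("--regularization", type=float, dest="reg", default=0.8)
    return parser.parse_args(argv)


# Hypothetical mount point; on the cluster this comes from the datastore mount.
params = {"--data-folder": "/mnt/datastore", "--regularization": 0.8}
args = parse_training_args(to_argv(params))

# scikit-learn's C is the inverse of the regularization strength.
print(args.data_folder, args.reg, 1.0 / args.reg)
```

A regularization rate of 0.8 therefore corresponds to C = 1.25 in LogisticRegression.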
What happens after you submit the job?
Image creation
A Docker image is created matching the Python
environment specified by the estimator. The image
is uploaded to the workspace. Image creation and
uploading takes about 5 minutes.
This happens once for each Python environment
since the container is cached for subsequent runs.
During image creation, logs are streamed to the
run history. You can monitor the image creation
progress using these logs.
Scaling
If the remote cluster requires more nodes
to execute the run than currently available,
additional nodes are added automatically.
Scaling typically takes about 5 minutes.
Running
In this stage, the necessary scripts and files are
sent to the compute target, then data stores are
mounted/copied, then the entry_script is run.
While the job is running, stdout and the ./logs
directory are streamed to the run history. You can
monitor the run's progress using these logs.
Post-Processing
The ./outputs directory of the run is copied
over to the run history in your workspace
so you can access these results.
Step 7 – Monitor a run
You can watch the progress of the run with a Jupyter widget. The widget is asynchronous and provides live
updates every 10-15 seconds until the job completes.
from azureml.widgets import RunDetails
RunDetails(run).show()
Here is a still snapshot of the widget shown at the end of training:
Step 8 – See the results
Model training and monitoring happen in the background. Wait until the model has completed training before
running more code. Use wait_for_completion to show when the model training is complete.
run.wait_for_completion(show_output=False)
# now there is a trained model on the remote cluster
print(run.get_metrics())
{'regularization rate': 0.8, 'accuracy': 0.9204}
Step 9 – Register the model
Recall that the last step in the training script is:
joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
This wrote the file sklearn_mnist_model.pkl into a directory named outputs on the VM of the cluster where
the job was executed.
• outputs is a special directory: all content in it is automatically uploaded to your workspace.
• This content appears in the run record in the experiment under your workspace.
• Hence, the model file is now also available in your workspace.
# register the model in the workspace
model = run.register_model(
    model_name='sklearn_mnist',
    model_path='outputs/sklearn_mnist_model.pkl')
The model is now available to query, examine, or deploy.
Step 9 – Deploy the Model
Step 9.1 – Create the scoring script
Create the scoring script, called score.py, which the web service calls to run the model.
It requires two functions: init() and run(input data).
import json
import numpy as np
import joblib
from azureml.core.model import Model
def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('sklearn_mnist')
    model = joblib.load(model_path)
def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
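Before building an image it can help to exercise the init()/run() contract locally. The sketch below does that with the standard library only, replacing the registered scikit-learn model with a hypothetical StubModel so that no workspace or model file is needed. It illustrates the calling convention and is not the deployed script.

```python
import json


class StubModel:
    """Hypothetical stand-in for the unpickled scikit-learn classifier."""

    def predict(self, rows):
        # Pretend the predicted class is the index of the largest value in each row.
        return [row.index(max(row)) for row in rows]


model = None


def init():
    """Called once when the service starts: load the model into a global."""
    global model
    model = StubModel()


def run(raw_data):
    """Called per request: parse the JSON body, predict, return JSON text."""
    data = json.loads(raw_data)["data"]
    y_hat = model.predict(data)
    return json.dumps(y_hat)


init()
result = run(json.dumps({"data": [[0.0, 0.9, 0.1], [0.7, 0.2, 0.1]]}))
print(result)  # [1, 0]
```

The request body is a JSON object with a "data" key holding a list of rows, which matches the payload sent in the final test step.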
Step 9.2 – Create environment file
Create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is
used to ensure that all of those dependencies are installed in the Docker image. This example needs scikit-learn
and azureml-sdk.
from azureml.core.conda_dependencies import CondaDependencies
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())
Step 9.3 – Create configuration file
Create a deployment configuration file and specify the number of CPUs and gigabytes of RAM needed for the
ACI container. Here we will use the defaults (1 core and 1 gigabyte of RAM).
from azureml.core.webservice import AciWebservice
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1,
tags={"data": "MNIST", "method" : "sklearn"},
description='Predict MNIST with sklearn')
Step 9.4 – Deploy the model to ACI
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
# configure the image
image_config = ContainerImage.image_configuration(
execution_script ="score.py",
runtime ="python",
conda_file ="myenv.yml")
service = Webservice.deploy_from_model(workspace=ws, name='sklearn-mnist-svc',
                                       deployment_config=aciconfig, models=[model],
                                       image_config=image_config)
service.wait_for_deployment(show_output=True)
Step 10 – Test the deployed model using the HTTP end point
Test the deployed model by sending images to be classified to the HTTP endpoint
import requests
import json
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test)-1)
input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"
headers = {'Content-Type': 'application/json'}
resp = requests.post(service.scoring_uri, input_data, headers=headers)
print("POST to url", service.scoring_uri)
#print("input data:", input_data)
print("label:", y_test[random_index])
print("prediction:", resp.text)
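Building the request body by string concatenation is easy to get wrong because of the nested quotes. As an alternative sketch, json.dumps can produce the same {"data": [...]} structure from a single row of pixel values. The helper name build_payload is hypothetical, and the three-value row is made up for illustration where a real row would hold 784 scaled pixel values.

```python
import json


def build_payload(row):
    """Serialize one feature row as the JSON body expected by the scoring script."""
    # float() also converts numpy scalar types into plain JSON-serializable floats.
    return json.dumps({"data": [[float(value) for value in row]]})


# Hypothetical 3-pixel row; a real MNIST row would come from X_test[random_index].
payload = build_payload([0, 0.5, 1])
print(payload)
```

The resulting string can be passed to requests.post in place of input_data.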
Azure Automated Machine Learning
‘simplifies’ the creation and selection
of the optimal model
Typical 'manual' approach to hyperparameter tuning
Dataset
Training Algorithm 1
• Hyperparameter Values – config 1 → Model 1
• Hyperparameter Values – config 2 → Model 2
• Hyperparameter Values – config 3 → Model 3
Training Algorithm 2
• Hyperparameter Values – config 4 → Model 4
Model Training Infrastructure
Complex, Tedious, Repetitive, Time consuming, Expensive
What are Hyperparameters?
Adjustable parameters that govern model training
Chosen prior to training, stay constant during training
Model performance heavily depends on hyperparameters
Challenges with Hyperparameter Selection
The search space to explore (i.e. evaluating all possible combinations) is huge.
Sparsity of good configurations: very few of all possible configurations are optimal.
Evaluating each configuration is resource and time consuming.
Time and resources are limited.
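To make the size of the search space concrete, the sketch below enumerates every combination of a small, hypothetical grid and estimates the total training time. The parameter names, the candidate values, and the figure of two minutes per run are illustrative assumptions, not taken from the slides.

```python
from itertools import product

# Hypothetical grid for a single algorithm (values chosen for illustration).
search_space = {
    "C": [0.01, 0.1, 1.0, 10.0, 100.0],
    "penalty": ["l1", "l2"],
    "max_iter": [100, 200, 500],
    "tol": [1e-4, 1e-3],
}


def enumerate_configs(space):
    """Return every combination of hyperparameter values as a list of dicts."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


configs = enumerate_configs(search_space)
minutes_per_run = 2  # assumed cost of training one configuration

print(len(configs))                         # 5 * 2 * 3 * 2 = 60 configurations
print(len(configs) * minutes_per_run / 60)  # 2.0 hours for one algorithm
```

Each extra hyperparameter multiplies the count, and the whole grid repeats for every candidate algorithm, so exhaustive manual search becomes impractical quickly.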
Machine Learning Complexity
Complexity of Machine Learning
Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Automated ML
Current Capabilities
Category Value
Compute
Target
Automated ML
Use via the Python SDK
© Copyright Microsoft Corporation. All rights reserved.
Appendix
Customer story
Drone-based electric
grid inspector powered
by deep learning
Challenge
Traditional power line inspection services are costly
Demand for low cost image scoring and support for
multiple concurrent customers
Needed powerful AI to execute on a drone solution
Solution
Deep learning to analyze multiple streaming data feeds
Azure GPUs support Single Shot multibox detectors
Reliable, consistent, and highly elastic scalability with Azure
Batch Shipyards
eSmart architecture
Data Sources Ingest Prepare Analyze Publish Consume
Drone collected
images
Batch upload of
drone images
On-prem
command
center
Cosmos DB
Contain inventory
results and state
changes
Azure ML
compute
Docker Image
DNN contained in
a Docker image
Azure ML service
Azure Blob
Cosmos DB
TensorFlow
Jupyter
Intelligent Edge
Models deployed to
drones for accelerated
inferencing
Try it for free
http://aka.ms/amlfree
Learn more: http://aka.ms/azureml-docs
Visit the Getting started guide
Million dollar loss prevention with real-time fraud detection
Challenge
• Fraud losses of USD 2.8M in internet banking in a single channel for one region alone
• Need for real-time fraud detection in less than 15 minutes
• Demand to reduce account false positive rate (AFPR) to no more than 5 to 1
Solution
• Deployed Lambda-based solution architecture using Azure data PaaS
• Fraud account detection rate (ADR) of 94%
• Value detection rate (VDR) of 76% with a 1 minute delay post identification
Increased match accuracy with image analysis
Challenge
• Assist buyer search by providing accurate options of similar clothing items from catalogue
• Need for improved smart-image matching capability based on color, pattern, neck style, etc.
Solution
• Training data created using Bing and domain-specific images
• Used transfer learning to leverage pre-trained ImageNet deep neural network
• Accurate list of most similar clothing items using similarity metrics for apparel
• Match accuracy of 74%
Deploy to FPGA
Azure ML Silicon Alternatives
[Figure: silicon alternatives placed between flexibility and efficiency: CPUs, GPUs, FPGAs, ASICs]
Cloud: training on CPUs and GPUs; inferencing on CPUs, GPUs, FPGAs
Edge: training (heavy edge) on CPUs and GPUs; inferencing on CPUs, GPUs, FPGAs
FPGAs vs. CPU, GPU, and ASIC
[Figure: flexibility versus efficiency]
What is currently supported on Azure?
Today, Project Brainwave supports
Image classification and recognition scenarios
TensorFlow deployment
DNNs: ResNet 50, ResNet 152, VGG-16, SSD-VGG,
and DenseNet-121
Intel FPGA hardware
Using this FPGA-enabled hardware architecture,
trained neural networks run quickly and with lower
latency.
Project Brainwave can parallelize pre-trained deep
neural networks (DNN) across FPGAs to scale out
your service.
The DNNs can be pre-trained, as a deep featurizer
for transfer learning, or fine-tuned with
updated weights
Here is the workflow for creating an image
recognition service in Azure using supported
DNNs as a featurizer for deployment on Azure
FPGAs:
1. Use the Azure Machine Learning SDK for Python to
create a service definition, which is a file describing a
pipeline of graphs (input, featurizer, and classifier) based
on TensorFlow. The deployment command will
automatically compress the definition and graphs into a
ZIP file and upload the ZIP to Azure Blob storage. The
DNN is already deployed on Project Brainwave to run on
the FPGA.
2. Register the model using the SDK with the ZIP file in
Azure Blob storage.
3. Deploy the service with the registered model using the SDK.
Distributed Training with
Azure ML Compute
Distributed Training with Azure ML Compute
distributed training with Horovod
Deploy to IoT Edge
IoT Edge Modules
An Azure IoT Edge device is a Linux or
Windows-based device that runs the
Azure IoT Edge runtime.
Machine learning models can be
deployed to these devices as IoT Edge
Modules.
Benefits: Deploying a model to an IoT
Edge device allows the device to use
the model directly, instead of having to
send data to the cloud for processing.
You get faster response times and less
data transfer
IoT Edge Modules
Azure IoT Edge modules are the smallest
unit of computation managed by IoT Edge,
and can contain Azure services or your own
solution-specific code.
IoT Edge module images contain
applications that take advantage of the
management, security, and communication
features of the IoT Edge runtime.
In implementation, module images exist as
container images in a repository, and
module instances are containers on devices.
Edge Deployment
Light and Heavy
Cloud: Azure Heavy Edge Light Edge
Description
An Azure host that
spans from CPU to
GPU and FPGA VMs
A server with slots to insert CPUs, GPUs, and FPGAs or a X64 or ARM system that needs
to be plugged in to work
A Sensor with a SOC (ARM CPU, NNA, MCU) and memory
that can operate on batteries
Example
DSVM / ACI / AKS /
Batch AI
- DataBox Edge
- HPE
- Azure Stack
- DataBox Edge - Industrial PC
-Video Gateway
-DVR
-Mobile Phones
-VAIDK
-Mobile Phones
-IP Cameras
-Azure Sphere
- Appliances
What runs
the model
CPU,GPU or FPGA
CPU,GPU or
FPGA
CPU, GPU x64 CPU Multi-ARM CPU
Hw accelerated
NNA
CPU/GPU MCU
Azure Family of GPUs
Purpose | Graphics | Compute | Compute | Compute | Deep Learning
VM Family | NV v1 | NC v1 | NC v2 | NC v3 | ND v1
GPU | NVIDIA M60 | NVIDIA K80 | NVIDIA P100 | NVIDIA V100 | NVIDIA P40
Sizes | 1, 2 or 4 GPU | 1, 2 or 4 GPU | 1, 2 or 4 GPU | 1, 2 or 4 GPU | 1, 2 or 4 GPU
Interconnect | PCIe (dual root) | PCIe (dual root) | PCIe (dual root) | PCIe (dual root) | PCIe (dual root)
2nd Network | - | FDR InfiniBand | FDR InfiniBand | FDR InfiniBand | FDR InfiniBand
VM CPU | Haswell | Haswell | Broadwell | Broadwell | Broadwell
VM RAM | 56-224 GB | 56-224 GB | 112-448 GB | 112-448 GB | 112-448 GB
Local SSD | ~380-1500 GB | ~380-1500 GB | ~700-3000 GB | ~700-3000 GB | ~700-3000 GB
Storage | Std Storage | Std Storage | Prem Storage | Prem Storage | Prem Storage
Driver | Quadro/Grid PC | Tesla | Tesla | Tesla | Tesla
New GPU Families (Announced this week!)
Purpose | Graphics | Deep Learning
VM Family | NV v2 | ND v2
GPU | NVIDIA M60 | NVIDIA V100
Sizes | 1, 2 or 4 GPU | 8 GPU
Interconnect | PCIe (dual root) | NVLink
2nd Network | - | -
VM CPU | Broadwell | Skylake
VM RAM | 112-448 GB | 672 GB
Local SSD | ~700-3000 GB | ~1300 GB
Storage | Prem Storage | Prem Storage
Driver | Quadro/Grid PC | -





Viktor Tsykunov: Azure Machine Learning Service

  • 1. Azure Machine Learning A Technical Overview Viktor Tsykunov Principal Software Engineering Lead Microsoft
  • 2. Table of Contents Machine Learning Requirements Applications, Characteristics and Requirements End-to-End lifecycle and processes Deep Learning: Additional Requirements Azure Machine Learning service Artifacts: Workspace, Experiments, Compute, Models, Images, Deployment and Datastore Concepts: Model Management, Pipelines Code Sample: Step-by-Step Workflow Azure Automated Machine Learning
  • 4. Machine Learning Typical E2E Process … Prepare Experiment Deploy Orchestrate
  • 5. DevOps loop for data science Prepare Data Prepare Register and Manage Model Build Image … Build model (your favorite IDE) Deploy Service Monitor Model Train & Test Model
  • 7. Traditional ML versus DL Top figure source; Bottom figure from NVIDIA Trad. ML DL
  • 8. Some deep learning applications Forecasting A / अ Machine Translation Predictive analytics Autonomous vehicles Speech recognition Image recognition
  • 10. Deep Learning Three Additional Requirements 1. 2. 3.
  • 11. Azure offers a comprehensive AI/ML platform that meets—and exceeds—requirements
  • 12. Machine Learning on Azure Domain specific pretrained models To reduce time to market Azure Databricks Machine Learning VMs Popular frameworks To build advanced deep learning solutions TensorFlow Pytorch Onnx Azure Machine Learning Language Speech … Search Vision Productive services To empower data science and development teams Powerful infrastructure To accelerate deep learning Scikit-Learn PyCharm Jupyter Familiar Data Science tools To simplify model development Visual Studio Code Command line CPU GPU FPGA From the Intelligent Cloud to the Intelligent Edge
  • 13. What is Azure Machine Learning service? Set of Azure Cloud Services Python SDK ✓ Prepare Data ✓ Build Models ✓ Train Models ✓ Manage Models ✓ Track Experiments ✓ Deploy Models That enables you to:
  • 15. Azure ML service Key Artifacts Workspace
  • 16. How to use the Azure Machine Learning service: E2E coding example using the SDK
  • 17. My Computer Experiment Docker Image Data Store Compute Target Azure ML Workspace Azure ML Steps
  • 18. Setup for Code Example This tutorial trains a simple logistic regression using the MNIST dataset and scikit-learn with Azure Machine Learning service. MNIST is a dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents.
  • 19. Step 1 – Create a workspace
        from azureml.core import Workspace
        ws = Workspace.create(name='myworkspace',
                              subscription_id='<azure-subscription-id>',
                              resource_group='myresourcegroup',
                              create_resource_group=True,
                              location='eastus2'  # or other supported Azure region
                              )
        # see workspace details
        ws.get_details()
    Step 2 – Create an Experiment
        from azureml.core import Experiment
        experiment_name = 'my-experiment-1'
        exp = Experiment(workspace=ws, name=experiment_name)
  • 20. Step 3 – Create remote compute target
        import os
        from azureml.core.compute import AmlCompute, ComputeTarget
        # choose a name for your cluster, specify min and max nodes
        compute_name = os.environ.get("BATCHAI_CLUSTER_NAME", "cpucluster")
        compute_min_nodes = os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 0)
        compute_max_nodes = os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 4)
        # This example uses a CPU VM. To use a GPU VM, set the SKU to STANDARD_NC6
        vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_D2_V2")
        provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                    min_nodes=compute_min_nodes,
                                                                    max_nodes=compute_max_nodes)
        # create the cluster
        print('creating a new compute target...')
        compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
        # You can poll for a minimum number of nodes and for a specific timeout.
        # If no min node count is provided, the scale settings of the cluster are used.
        compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
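One detail worth checking in the compute configuration: `os.environ.get` returns a string whenever the variable is set, while the fallback defaults on the slide are integers. The following is a minimal sketch of normalizing the node counts, assuming a hypothetical helper `env_int` that is not part of the Azure ML SDK:

```python
import os


def env_int(name, default):
    """Read an integer setting from the environment, falling back to default.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


# Same variable names as the slide; the values are always ints after this.
compute_min_nodes = env_int("BATCHAI_CLUSTER_MIN_NODES", 0)
compute_max_nodes = env_int("BATCHAI_CLUSTER_MAX_NODES", 4)

if compute_min_nodes > compute_max_nodes:
    raise ValueError("min_nodes must not exceed max_nodes")
```

With this in place, the same integer types reach the provisioning configuration whether or not the environment variables are defined.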
  • 21. Step 4 – Upload data to the cloud
    First load the compressed files into numpy arrays. Note that 'load_data' is a custom function that simply parses the compressed files into numpy arrays.
        # note that while loading, we scale the intensity values (X) from 0-255 to 0-1 so that the model converges faster.
        X_train = load_data('./data/train-images.gz', False) / 255.0
        y_train = load_data('./data/train-labels.gz', True).reshape(-1)
        X_test = load_data('./data/test-images.gz', False) / 255.0
        y_test = load_data('./data/test-labels.gz', True).reshape(-1)
    Now make the data accessible remotely by uploading it from your local machine into Azure so it can be accessed for remote training. The files are uploaded into a directory named mnist at the root of the datastore.
        ds = ws.get_default_datastore()
        print(ds.datastore_type, ds.account_name, ds.container_name)
        ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=True)
    You now have everything you need to start training a model.
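The slide does not show `load_data` itself. As a rough illustration of what such a parser involves, here is a standard-library-only sketch that reads the MNIST IDX layout (a big-endian header followed by unsigned bytes). The function name `read_idx_gz` and the tiny synthetic file are assumptions made for this example; the tutorial's own helper returns numpy arrays rather than lists.

```python
import gzip
import os
import struct
import tempfile


def read_idx_gz(path, is_label):
    """Parse a gzip-compressed MNIST IDX file into a list of rows.

    Images become rows of n_rows * n_cols pixel values (0-255).
    Labels become single-element rows (the slide flattens these with reshape(-1)).
    """
    with gzip.open(path, "rb") as gz:
        _magic, n_items = struct.unpack(">II", gz.read(8))
        if is_label:
            width = 1
        else:
            n_rows, n_cols = struct.unpack(">II", gz.read(8))
            width = n_rows * n_cols
        payload = gz.read(n_items * width)
    return [list(payload[i * width:(i + 1) * width]) for i in range(n_items)]


# Build a tiny synthetic image file (2 images of 2x2 pixels) to exercise the parser.
demo_path = os.path.join(tempfile.mkdtemp(), "demo-images.gz")
with gzip.open(demo_path, "wb") as gz:
    gz.write(struct.pack(">IIII", 2051, 2, 2, 2))  # magic, count, rows, cols
    gz.write(bytes([0, 51, 102, 153, 204, 255, 0, 255]))

rows = read_idx_gz(demo_path, is_label=False)
scaled = [[pixel / 255.0 for pixel in row] for row in rows]  # 0-255 -> 0-1
print(len(rows), len(rows[0]), scaled[0][1])
```

Dividing by 255.0 afterwards is the same scaling step the slide applies to the arrays.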
  • 22. Step 5 – Train a local model
    Train a simple logistic regression model using scikit-learn locally. This should take a minute or two.
        %%time
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        clf = LogisticRegression()
        clf.fit(X_train, y_train)
        # Next, make predictions using the test set and calculate the accuracy
        y_hat = clf.predict(X_test)
        print(np.average(y_hat == y_test))
    You should see the local model accuracy displayed. It should be a number like 0.915.
  • 23. Step 6 – Train model on remote cluster
    To submit a training job to a remote cluster, perform the following tasks:
      • 6.1: Create a directory
      • 6.2: Create a training script
      • 6.3: Create an estimator object
      • 6.4: Submit the job
    Step 6.1 – Create a directory
    Create a directory to deliver the required code from your computer to the remote resource.
        import os
        script_folder = './sklearn-mnist'
        os.makedirs(script_folder, exist_ok=True)
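  • 24. Step 6.2 – Create a Training Script (1/2)
        %%writefile $script_folder/train.py
        import argparse
        import os
        import joblib
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from azureml.core import Run
        # load_data is the custom helper from Step 4; it is assumed here to live in a utils.py copied into script_folder
        from utils import load_data
        # read the values passed in through the estimator's script_params
        parser = argparse.ArgumentParser()
        parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
        parser.add_argument('--regularization', type=float, dest='reg', default=0.8, help='regularization rate')
        args = parser.parse_args()
        # 'data_folder' holds the location of the data files (the mnist directory in the datastore)
        data_folder = os.path.join(args.data_folder, 'mnist')
        reg = args.reg  # regularization rate of the logistic regression model
        # load train and test set into numpy arrays
        # Note: we scale the pixel intensity values to 0-1 (by dividing by 255.0) so the model can converge faster.
        X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0
        X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
        y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)
        y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
        print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n')
        # get hold of the current run
        run = Run.get_context()
        # train a logistic regression model with a regularization rate of 'reg'
        clf = LogisticRegression(C=1.0/reg, random_state=42)
        clf.fit(X_train, y_train)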
  • 25. Step 6.2 – Create a Training Script (2/2)
        print('Predict the test set')
        y_hat = clf.predict(X_test)
        # calculate accuracy on the prediction
        acc = np.average(y_hat == y_test)
        print('Accuracy is', acc)
        # log plain Python floats (np.float was removed from recent NumPy releases)
        run.log('regularization rate', float(args.reg))
        run.log('accuracy', float(acc))
        os.makedirs('outputs', exist_ok=True)
        # The training script saves the model into a directory named 'outputs'. Files saved in the
        # outputs folder are automatically uploaded into the experiment record in the workspace.
        joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
  • 26. Step 6.3 – Create an Estimator
    An estimator object is used to submit the run.
        from azureml.train.estimator import Estimator
        script_params = {
            '--data-folder': ds.as_mount(),
            '--regularization': 0.8
        }
        est = Estimator(source_directory=script_folder,
                        script_params=script_params,
                        compute_target=compute_target,
                        entry_script='train.py',
                        conda_packages=['scikit-learn'])
    Step 6.4 – Submit the job to the cluster for training
        run = exp.submit(config=est)
        run
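The keys of `script_params` act as command-line flags for the entry script. The sketch below illustrates that hand-off under stated assumptions: the helper `params_to_argv` is hypothetical, the flags are assumed to be flattened into an argument list in order, and the mount path is a placeholder for whatever `ds.as_mount()` resolves to on the cluster.

```python
import argparse


def params_to_argv(script_params):
    """Flatten {'--flag': value} into ['--flag', 'value', ...] (hypothetical helper)."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


def parse_training_args(argv):
    """Parse the two flags that train.py is assumed to accept."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-folder", dest="data_folder", type=str, required=True)
    parser.add_argument("--regularization", dest="reg", type=float, default=0.8)
    return parser.parse_args(argv)


# Placeholder path; on the cluster the real mount point is filled in at run time.
script_params = {"--data-folder": "/tmp/mounted-datastore", "--regularization": 0.8}
args = parse_training_args(params_to_argv(script_params))
c_value = 1.0 / args.reg  # scikit-learn's C is the inverse of the regularization rate
print(args.data_folder, args.reg, c_value)
```

Parsing the flags inside the script keeps the regularization rate configurable per run instead of hard-coding it.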
  • 27. What happens after you submit the job?
      • Image creation: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes about 5 minutes. This happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.
      • Scaling: If the remote cluster requires more nodes to execute the run than are currently available, additional nodes are added automatically. Scaling typically takes about 5 minutes.
      • Running: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.
      • Post-Processing: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.
  • 28. Step 7 – Monitor a run
    You can watch the progress of the run with a Jupyter widget. The widget is asynchronous and provides live updates every 10-15 seconds until the job completes.
        from azureml.widgets import RunDetails
        RunDetails(run).show()
    Here is a still snapshot of the widget shown at the end of training: [screenshot not included in the transcript]
  • 29. Step 8 – See the results
    Model training and monitoring happen in the background, so wait until the model has completed training before running more code. Use wait_for_completion to block until model training is complete.
        run.wait_for_completion(show_output=False)
        # now there is a trained model on the remote cluster
        print(run.get_metrics())
    Output:
        {'regularization rate': 0.8, 'accuracy': 0.9204}
  • 30. Step 9 – Register the model
    Recall that the last step in the training script is:
        joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
    This wrote the file 'sklearn_mnist_model.pkl' into a directory named 'outputs' on the VM of the cluster where the job was executed.
      • outputs is a special directory in that all content in this directory is automatically uploaded to your workspace.
      • This content appears in the run record in the experiment under your workspace.
      • Hence, the model file is now also available in your workspace.
        # register the model in the workspace
        model = run.register_model(model_name='sklearn_mnist',
                                   model_path='outputs/sklearn_mnist_model.pkl')
    The model is now available to query, examine, or deploy.
  • 31. Step 9 – Deploy the Model
  • 32. Step 9.1 – Create the scoring script
    Create the scoring script, called score.py, which the web service calls to run the model. It requires two functions: init() and run(raw_data).
        import json
        import joblib
        import numpy as np
        from azureml.core.model import Model
        def init():
            global model
            # retrieve the path to the model file using the model name
            model_path = Model.get_model_path('sklearn_mnist')
            model = joblib.load(model_path)
        def run(raw_data):
            data = np.array(json.loads(raw_data)['data'])
            # make prediction
            y_hat = model.predict(data)
            return json.dumps(y_hat.tolist())
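The init()/run() contract can be exercised locally before any deployment. The following is a minimal sketch under clear assumptions: `StubModel` is a made-up stand-in for the unpickled classifier, and the error handling is an illustrative addition that the slide's script does not include.

```python
import json


class StubModel:
    """Stand-in for the unpickled classifier; predicts the row length mod 10."""

    def predict(self, rows):
        return [len(row) % 10 for row in rows]


model = None


def init():
    # Called once when the container starts: load the model into a global.
    global model
    model = StubModel()  # the real script loads the registered model file here


def run(raw_data):
    # Called per request: parse the JSON body, predict, return a JSON string.
    try:
        rows = json.loads(raw_data)["data"]
        return json.dumps(model.predict(rows))
    except (KeyError, ValueError, TypeError) as err:
        return json.dumps({"error": str(err)})


init()
print(run(json.dumps({"data": [[0.0] * 784]})))
print(run("not json"))
```

Returning a JSON error object instead of raising is one possible design choice; it keeps malformed requests from surfacing as unhandled exceptions in the service.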
  • 33. Step 9.2 – Create environment file
    Create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This example needs scikit-learn and azureml-sdk.
        from azureml.core.conda_dependencies import CondaDependencies
        myenv = CondaDependencies()
        myenv.add_conda_package("scikit-learn")
        with open("myenv.yml", "w") as f:
            f.write(myenv.serialize_to_string())
    Step 9.3 – Create configuration file
    Create a deployment configuration file and specify the number of CPUs and gigabytes of RAM needed for the ACI container. Here we will use the defaults (1 core and 1 gigabyte of RAM).
        from azureml.core.webservice import AciWebservice
        aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                                       memory_gb=1,
                                                       tags={"data": "MNIST", "method": "sklearn"},
                                                       description='Predict MNIST with sklearn')
  • 34. Step 9.4 – Deploy the model to ACI
        %%time
        from azureml.core.webservice import Webservice
        from azureml.core.image import ContainerImage
        # configure the image
        image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                          runtime="python",
                                                          conda_file="myenv.yml")
        service = Webservice.deploy_from_model(workspace=ws,
                                               name='sklearn-mnist-svc',
                                               deployment_config=aciconfig,
                                               models=[model],
                                               image_config=image_config)
        service.wait_for_deployment(show_output=True)
  • 35. Step 10 – Test the deployed model using the HTTP endpoint
    Test the deployed model by sending images to be classified to the HTTP endpoint.
        import json
        import numpy as np
        import requests
        # send a random row from the test set to score (randint excludes the upper bound)
        random_index = np.random.randint(0, len(X_test))
        input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"
        headers = {'Content-Type': 'application/json'}
        resp = requests.post(service.scoring_uri, input_data, headers=headers)
        print("POST to url", service.scoring_uri)
        # print("input data:", input_data)
        print("label:", y_test[random_index])
        print("prediction:", resp.text)
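Building the request body by string concatenation depends on how each array element happens to print, which can vary across NumPy versions. A hedged alternative is to let `json.dumps` produce the payload. In this sketch the sample row and the example response text are made up for illustration, and no HTTP call is made.

```python
import json

# Hypothetical stand-in for one flattened 28x28 test image (784 floats in 0-1).
# With a numpy array, X_test[random_index].tolist() yields the same kind of list.
sample_row = [0.0] * 783 + [1.0]

# json.dumps handles quoting and nesting, so no manual string assembly is needed.
input_data = json.dumps({"data": [sample_row]})
headers = {"Content-Type": "application/json"}

# The service is assumed to answer with a JSON list of predicted digits.
example_response_text = "[7]"
predictions = json.loads(example_response_text)
print(len(json.loads(input_data)["data"][0]), predictions[0])
```

The resulting `input_data` string can be passed to `requests.post` exactly as on the slide.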
  • 36. Azure Automated Machine Learning ‘simplifies’ the creation and selection of the optimal model
  • 37. Typical ‘manual’ approach to hyperparameter tuning Dataset Training Algorithm 1 Hyperparameter Values – config 1 Model 1 Hyperparameter Values – config 2 Model 2 Hyperparameter Values – config 3 Model 3 Model Training Infrastructure Training Algorithm 2 Hyperparameter Values – config 4 Model 4 Complex Tedious Repetitive Time consuming Expensive
  • 38. What are Hyperparameters?
      • Adjustable parameters that govern model training
      • Chosen prior to training, stay constant during training
      • Model performance depends heavily on hyperparameters
  • 39. Challenges with Hyperparameter Selection
      • The search space to explore (i.e., evaluating all possible combinations) is huge.
      • Sparsity of good configurations: very few of all possible configurations are optimal.
      • Evaluating each configuration is resource and time consuming.
      • Time and resources are limited.
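To make the size of the search space concrete, here is a small sketch that counts an exhaustive grid and contrasts it with sampling a fixed budget of configurations. The hyperparameter names and values are assumptions chosen for illustration, and random sampling is shown only as one common way to cope with a limited budget, not as the method Automated ML uses.

```python
import itertools
import random

# Hypothetical search space for illustration; names and values are assumptions.
search_space = {
    "tol": [1e-5, 1e-4, 1e-3, 1e-2],
    "regularization": [0.01, 0.1, 0.5, 0.8, 1.0],
    "max_iter": [50, 100, 200, 400],
    "solver": ["lbfgs", "saga", "liblinear"],
}

# Exhaustive grid: the number of configurations is the product of the list sizes.
grid = list(itertools.product(*search_space.values()))
print("grid size:", len(grid))

# Random search: evaluate only a fixed budget of sampled configurations.
rng = random.Random(0)
budget = 10
sampled = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(budget)
]
print("evaluated:", len(sampled), "of", len(grid))
```

Adding one more hyperparameter with five candidate values would multiply the grid size by five, while the sampling budget stays fixed.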
  • 40. Machine Learning Complexity: Complexity of Machine Learning. Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  • 42. Automated ML Use via the Python SDK
  • 43. © Copyright Microsoft Corporation. All rights reserved. Appendix
  • 45. Drone-based electric grid inspector powered by deep learning
    Challenge
      • Traditional power line inspection services are costly
      • Demand for low cost image scoring and support for multiple concurrent customers
      • Needed powerful AI to execute on a drone solution
    Solution
      • Deep learning to analyze multiple streaming data feeds
      • Azure GPUs support Single Shot multibox detectors
      • Reliable, consistent, and highly elastic scalability with Azure Batch Shipyard
  • 46. eSmart architecture (diagram)
    Pipeline stages: Data Sources, Ingest, Prepare, Analyze, Publish, Consume
    Diagram labels: Drone collected images; Batch upload of drone images; On-prem command center; Cosmos DB (contains inventory results and state changes); Azure ML compute; Docker Image (DNN contained in a Docker image); Azure ML service; Azure Blob; Cosmos DB; TensorFlow; Jupyter; Intelligent Edge (models deployed to drones for accelerated inferencing)
  • 47. Try it for free: http://aka.ms/amlfree Learn more: http://aka.ms/azureml-docs Visit the Getting started guide
  • 48. Million dollar loss prevention with real-time fraud detection
    Challenge
      • Fraud losses of USD 2.8M in internet banking in a single channel for one region alone
      • Need for real-time fraud detection in less than 15 minutes
      • Demand to reduce account false positive rate (AFPR) to no more than 5 to 1
    Solution
      • Deployed Lambda-based solution architecture using Azure data PaaS
      • Fraud account detection rate (ADR) of 94%
      • Value detection rate (VDR) of 76% with a 1 minute delay post identification
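The slide quotes three account-level metrics without defining them. The sketch below assumes commonly used definitions: ADR as detected fraud accounts over all fraud accounts, AFPR as legitimate accounts flagged per fraud account detected, and VDR as fraud value saved over total fraud value. The counts are made up solely to reproduce the quoted rates and are not data from the case study.

```python
def account_level_metrics(flagged_fraud, flagged_legit, total_fraud,
                          value_saved, total_fraud_value):
    """Return (ADR, AFPR, VDR) under the assumed definitions described above."""
    adr = flagged_fraud / total_fraud        # share of fraud accounts caught
    afpr = flagged_legit / flagged_fraud     # false alarms per true detection
    vdr = value_saved / total_fraud_value    # share of fraud value prevented
    return adr, afpr, vdr


# Made-up counts chosen only to reproduce the rates quoted on the slide.
adr, afpr, vdr = account_level_metrics(
    flagged_fraud=94, flagged_legit=470, total_fraud=100,
    value_saved=2_128_000, total_fraud_value=2_800_000,
)
print(f"ADR={adr:.0%}  AFPR={afpr:.0f}:1  VDR={vdr:.0%}")
```

Under these definitions, an AFPR of 5 to 1 means five legitimate accounts are flagged for every fraudulent account that is caught.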
  • 49. Increased match accuracy with image analysis
    Challenge
      • Assist buyer search by providing accurate options of similar clothing items from catalogue
      • Need for improved smart-image matching capability based on color, pattern, neck style, etc.
    Solution
      • Training data created using Bing and domain-specific images
      • Used transfer learning to leverage pre-trained ImageNet deep neural network
      • Accurate list of most similar clothing items using similarity metrics for apparel
      • Match accuracy of 74%
  • 51. FPGAs vs. CPU, GPU, and ASIC
    Azure ML Silicon Alternatives (diagram): CPUs, GPUs, FPGAs, and ASICs; axis labels: Flexibility, Efficiency
      • Cloud: training on CPUs and GPUs; inferencing on CPUs, GPUs, FPGAs
      • Edge: training (heavy edge) on CPUs and GPUs; inferencing on CPUs, GPUs, FPGAs
  • 52. What is currently supported on Azure?
    Today, Project Brainwave supports:
      • Image classification and recognition scenarios
      • TensorFlow deployment
      • DNNs: ResNet 50, ResNet 152, VGG-16, SSD-VGG, and DenseNet-121
      • Intel FPGA hardware
    Using this FPGA-enabled hardware architecture, trained neural networks run quickly and with lower latency. Project Brainwave can parallelize pre-trained deep neural networks (DNN) across FPGAs to scale out your service. The DNNs can be pre-trained, as a deep featurizer for transfer learning, or fine-tuned with updated weights.
    Here is the workflow for creating an image recognition service in Azure using supported DNNs as a featurizer for deployment on Azure FPGAs:
      1. Use the Azure Machine Learning SDK for Python to create a service definition, which is a file describing a pipeline of graphs (input, featurizer, and classifier) based on TensorFlow. The deployment command will automatically compress the definition and graphs into a ZIP file and upload the ZIP to Azure Blob storage. The DNN is already deployed on Project Brainwave to run on the FPGA.
      2. Register the model using the SDK with the ZIP file in Azure Blob storage.
      3. Deploy the service with the registered model using the SDK.
  • 54. Distributed Training with Azure ML Compute distributed training with Horovod
  • 56. IoT Edge Modules
    An Azure IoT Edge device is a Linux or Windows-based device that runs the Azure IoT Edge runtime. Machine learning models can be deployed to these devices as IoT Edge Modules.
    Benefits: Deploying a model to an IoT Edge device allows the device to use the model directly, instead of having to send data to the cloud for processing. You get faster response times and less data transfer.
    Azure IoT Edge modules are the smallest unit of computation managed by IoT Edge, and can contain Azure services or your own solution-specific code. IoT Edge module images contain applications that take advantage of the management, security, and communication features of the IoT Edge runtime. In implementation, module images exist as container images in a repository, and module instances are containers on devices.
  • 57. Edge Deployment Light and Heavy Cloud: Azure Heavy Edge Light Edge Description An Azure host that spans from CPU to GPU and FPGA VMs A server with slots to insert CPUs, GPUs, and FPGAs or a X64 or ARM system that needs to be plugged in to work A Sensor with a SOC (ARM CPU, NNA, MCU) and memory that can operate on batteries Example DSVM / ACI / AKS / Batch AI - DataBox Edge - HPE - Azure Stack - DataBox Edge - Industrial PC -Video Gateway -DVR -Mobile Phones -VAIDK -Mobile Phones -IP Cameras -Azure Sphere - Appliances What runs the model CPU,GPU or FPGA CPU,GPU or FPGA CPU, GPU x64 CPU Multi-ARM CPU Hw accelerated NNA CPU/GPU MCU
  • 58. Azure Family of GPUs
    VM Family     NV v1             NC v1             NC v2             NC v3             ND v1
    Purpose       Graphics          Compute           Compute           Compute           Deep Learning
    GPU           NVIDIA M60        NVIDIA K80        NVIDIA P100       NVIDIA V100       NVIDIA P40
    Sizes         1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU     1, 2 or 4 GPU
    Interconnect  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)  PCIe (dual root)
    2nd Network   -                 FDR InfiniBand    FDR InfiniBand    FDR InfiniBand    FDR InfiniBand
    VM CPU        Haswell           Haswell           Broadwell         Broadwell         Broadwell
    VM RAM        56-224 GB         56-224 GB         112-448 GB        112-448 GB        112-448 GB
    Local SSD     ~380-1500 GB      ~380-1500 GB      ~700-3000 GB      ~700-3000 GB      ~700-3000 GB
    Storage       Std Storage       Std Storage       Prem Storage      Prem Storage      Prem Storage
    Driver        Quadro/Grid PC    Tesla             Tesla             Tesla             Tesla
  • 59. New GPU Families (Announced this week!)
    VM Family     NV v2             ND v2
    Purpose       Graphics          Deep Learning
    GPU           NVIDIA M60        NVIDIA V100
    Sizes         1, 2 or 4 GPU     8 GPU
    Interconnect  PCIe (dual root)  NVLink
    2nd Network   -                 -
    VM CPU        Broadwell         Skylake
    VM RAM        112-448 GB        672 GB
    Local SSD     ~700-3000 GB      ~1300 GB
    Storage       Prem Storage      Prem Storage
    Driver        Quadro/Grid PC    -