Je veux qu’on déploie mon modèle de ML !
(“I want my ML model to be deployed!”)
Global Azure 2020 – Saturday, April 25
A conference co-organized by AZUG FR
Another story of MLOps…
“It works on my machine”
says the Data Scientist
…while training the model on a laptop
• with a (not so significant ?) part of the data
• a whole night long
• with some unknown packages
• with dependencies from other projects
• without a distributed compute context
Paul Péton
• Microsoft Artificial Intelligence MVP since 2018
• Meetup organizer : Club Power BI @Nantes, France (YouTube)
• Meetup speaker : Azure Nantes, France (YouTube)
• Used to be Data Miner (an old buzzword…)
• Senior Data Consultant @AZEO
Twitter : @paulpeton
https://www.linkedin.com/in/paul-peton-datascience
https://github.com/methodidacte/
Data Scientist or Data Engineer ?
How to merge ?
Agenda
• Quickly : what is Machine Learning ?
• Azure Machine Learning “new” Studio
• The MLOps approach
• Your new best friend : the Python SDK
• 3 notebooks are available on my GitHub
• Examples of Azure Architecture
• Conclusion and go further
Supervised Machine Learning
• Field of study that gives computers the ability to learn without
being explicitly programmed (Arthur Samuel, 1959)
Data project : 4 common steps
Without the « consume » step (reporting, predictive models,
apps…), your data project is only a cost.
INGEST STORE COMPUTE CONSUME
Focus on the « compute » step in ML
• Iterations on data preparation and training steps
• We evaluate the model with metrics.
• The « Predict » step is supported by a deployment.
Feature engineering & selection → Train model → Test / evaluate model → Predict
Parameters, hyperparameters, pickle…
• The simplest model you have ever seen :
• y = f(X) = aX + b
• [a,b] are the parameters of the model.
• Example of hyperparameters : learning rate
• We will save the parameter values in a « pickle » file (a minimal sketch follows below).
• Other formats : ONNX, H5, RDS...
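A minimal sketch of that serialization step (the model and file names are illustrative, and X and y are assumed to already exist):

import pickle
from sklearn.linear_model import LinearRegression

# Illustrative only: the "simplest model", y = aX + b, fitted on some existing X and y
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # the learned parameters a and b

# Save the fitted parameters to a « pickle » file...
with open("linear_model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and reload them later, possibly on another machine
with open("linear_model.pkl", "rb") as f:
    restored = pickle.load(f)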
Why industrialize ?
• Check versioning
• Have backups for backtracking
• Schedule the execution of processing jobs
• Monitor services
• Integrating organizational security
• Deploy on light terminals (edge)
What do we need for production ?
• A scheduler to plan
• Data cleansing
• Training / re-training of the model
• The forecast calculation (in batch mode)
• A storage system to archive models
• Per algorithm, per version, per training dataset
• In a serialized (not proprietary) binary format
• A tool for exposing the model
• Via REST API (language-agnostic)
• Secure access
• Resources that can be deployed at scale
• With the help of containers
• In the Cloud
Serving
The “new” Studio
Azure Machine Learning service
A set of Azure Cloud Services, with a Python SDK & R, that enables you to:
✓ Prepare Data
✓ Build Models
✓ Train Models
✓ Manage Models
✓ Track Experiments
✓ Deploy Models
• Cloud : CPU, GPU, FPGA
• Edge : CPU, GPU, NPU (Azure IoT Edge – Security, Mgmt., Deployment)
• Compute : Jobs, Clusters, Instances
• Datasets : Profiling, Drift, Labeling
• Model Registry : Models, Images
• Training : Experiments, Runs
• Inferencing : Batch, Realtime
• MLOps : Reproducible, Automatable, GitHub, CLI, REST
• Experience : SDK, Notebooks, Drag-n-drop, Wizard
Use case : regression
• Diabetes
• Author: Bradley Efron, Trevor Hastie, Iain Johnstone and Robert
Tibshirani
• Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of
n = 442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
• Model the Y value from the other variables (regression)
How to find the best model ?
• What about Data Scientist’s job ?
• 80% of the job is preparing data
• 20% is complaining about the first 80%
• Best model with Automated ML !
• As script (included in the Python SDK)
• With a UI (enterprise feature)
Automated ML with the Python SDK
import logging
from azureml.train.automl import AutoMLConfig
# Define settings
automl_settings = {
    "iteration_timeout_minutes": 10,
    "iterations": 2,
    "primary_metric": 'spearman_correlation',
    "preprocess": True,
    "verbosity": logging.INFO,
    "n_cross_validations": 5
}
Automated ML with the Python SDK
automl_config = AutoMLConfig(task='regression',
    # could be 'classification', or 'forecasting' for time series
    debug_log='automated_ml_errors.log',
    path=train_model_folder,
    compute_target=aml_compute,
    run_configuration=aml_run_config,
    data_script=train_model_folder + "/get_data.py",
    **automl_settings)
print("AutoML config created.")
Automated ML with the Python SDK
import os
from azureml.train.automl.runtime import AutoMLStep

trainWithAutomlStep = AutoMLStep(
    name='AutoML_Regression',
    automl_config=automl_config,
    inputs=[output_split_train_x, output_split_train_y],
    allow_reuse=True,
    hash_paths=[os.path.realpath(train_model_folder)])
from azureml.core.experiment import Experiment
experiment = Experiment(ws, "taxi-experiment")
local_run = experiment.submit(automl_config, show_output=True)
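Once the submitted AutoML run completes, the best child run and its fitted model can be retrieved (a hedged sketch; X_test is assumed to come from your own train/test split):

# Retrieve the best run and the corresponding fitted model
best_run, fitted_model = local_run.get_output()
print(best_run.get_metrics())

# The fitted model behaves like a regular scikit-learn estimator
predictions = fitted_model.predict(X_test)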
MLOps
DevOps : code testing, code reproducibility, app deployment
MLOps : model retraining, model validation, model reproducibility, model deployment
Python SDK
Azure Python SDK
• Set of libraries that facilitate access to :
• Management components (Virtual Machine, Cluster, Image…)
• Runtime components (ServiceBus using HTTP, Batch, Monitor…)
• Official GitHub repository :
• https://github.com/Azure/azure-sdk-for-python
• The full list of available packages and their latest version :
• https://docs.microsoft.com/fr-fr/python/api/overview/azure/?view=azure-python
• Installation :
!pip install --upgrade azureml-sdk
• Or clone the GitHub repository :
git clone git://github.com/Azure/azure-sdk-for-python.git
cd azure-sdk-for-python
python setup.py install
Quick start tip (without Azure ML)
The Data Science VM is
preconfigured with azureml and a
Spark context.
Main Objects
• Workspace
• Inside the Workspace
• Datastore & Dataset
• Compute target
• Experiment
• Pipeline
• Run
• Model
• Environment
• Estimator
• Inference
• Endpoint
Workspace → Experiment → Datastore / Dataset → Train the Model → Register the Model → Deploy the Model
ML most simple workflow with Python SDK
• Initialize the workspace : to interact with the Azure ML service
• Create an experiment : which will « run »
• Create a datastore / upload a dataset : or use an Open Dataset
• Train the model : the best model can be found with Automated ML
• Register the model : with the SCIKITLEARN, ONNX or TensorFlow framework
• Deploy the model : create a standard service
Initialize the workspace
import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

# load workspace configuration from the config.json file in the current folder = interactive authentication
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')
Authentication with service principal
Create the service principal with the Azure CLI :
az extension add -n azure-cli-ml
az ad sp create-for-rbac --sdk-auth --name ml-auth
az ad sp show --id your-client-id
az ml workspace share -w your-workspace-name -g your-resource-group-name --user your-sp-object-id --role owner
Authentication with service principal
from azureml.core.authentication import ServicePrincipalAuthentication

sp = ServicePrincipalAuthentication(tenant_id="your-tenant-id",                       # tenantId
                                    service_principal_id="your-client-id",            # clientId
                                    service_principal_password="your-client-secret")  # clientSecret

import os
sp = ServicePrincipalAuthentication(tenant_id=os.environ['AML_TENANT_ID'],
                                    service_principal_id=os.environ['AML_PRINCIPAL_ID'],
                                    service_principal_password=os.environ['AML_PRINCIPAL_PASS'])

from azureml.core import Workspace
ws = Workspace.get(name="sandboxaml",
                   resource_group="rg-sandbox",
                   subscription_id=os.environ['SUBSCRIPTION_ID'],
                   auth=sp)
ws.get_details()
Create an experiment
experiment_name = 'my_first_experiment'
from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)
exp
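An experiment only becomes useful once runs are logged against it. A small illustrative sketch (metric names and values are made up):

# Start an interactive run, log a few metrics and close the run
run = exp.start_logging()
run.log("alpha", 0.5)                              # scalar metric, visible in the studio
run.log_list("rmse_per_fold", [54.2, 57.1, 52.8])  # list metric
run.complete()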
Create a datastore
from azureml.core import Datastore

blob_datastore_name = 'MyBlobDatastore'
# Storage account name
account_name = os.getenv("BLOB_ACCOUNTNAME_62", "<my-account-name>")
# Name of Azure blob container
container_name = os.getenv("BLOB_CONTAINER_62", "<my-container-name>")
# Storage account key
account_key = os.getenv("BLOB_ACCOUNT_KEY_62", "<my-account-key>")

blob_datastore = Datastore.get(ws, blob_datastore_name)
print("Found Blob Datastore with name: %s" % blob_datastore_name)
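The account variables above are only needed when the datastore is not registered yet. A hedged sketch of that one-time registration, reusing the same variables:

from azureml.core import Datastore

# Register the blob container as a datastore of the workspace
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name=blob_datastore_name,
    container_name=container_name,
    account_name=account_name,
    account_key=account_key)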
Supported Azure Storage services
• Azure Blob Container
• Azure File Share
• Azure Data Lake
• Azure Data Lake Gen2
• Azure SQL Database
• Azure Database for PostgreSQL
• Azure Database for MySQL
• Databricks File System
We need an ODBC connector !
Upload data from datastore
from azureml.core import Dataset

dataset = Dataset.Tabular.from_delimited_files(
    path=[(datastore, 'train-dataset/tabular/iris.csv')])

# Convert the Dataset object to an (in-memory) pandas dataframe
df = dataset.to_pandas_dataframe()
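Optionally, the dataset can be registered in the workspace so it is versioned and retrievable by name (the name 'iris-tabular' is illustrative):

# Register the dataset for later reuse
dataset = dataset.register(workspace=ws,
                           name='iris-tabular',
                           description='Iris CSV loaded from the blob datastore',
                           create_new_version=True)

# Anyone with access to the workspace can then retrieve it by name
same_dataset = Dataset.get_by_name(ws, name='iris-tabular')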
We will speak about parallel compute later
Train the model (nothing from azureml !)
# Separate train and test data
from sklearn.model_selection import train_test_split

X = diabetes.drop("Y", axis=1)
y = diabetes["Y"].values.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.linear_model import LinearRegression

# Set up the model
model = LinearRegression()
# Use fit
model.fit(X_train, y_train)
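A plausible bridge to the next slide, evaluating the model and serializing it under the file name used at registration time (a sketch, not part of the original notebook):

from sklearn.metrics import mean_squared_error, r2_score
import joblib

# Evaluate on the held-out test set before registering anything
y_pred = model.predict(X_test)
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
print("R2  :", r2_score(y_test, y_pred))

# Serialize the fitted model to the file registered on the next slide
joblib.dump(value=model, filename="diabetes_regression_model.pkl")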
We train on the compute instance !
Register the model
from azureml.core.model import Model
model = Model.register(model_path="diabetes_regression_model.pkl",
model_name="diabetes_regression_model",
tags={'area': "diabetes", 'type': "regression"},
description="Ridge reg to predict diabetes",
workspace=ws)
We can find the model in the studio
Deploy the model to the cloud (ACI)
from azureml.core import Webservice
from azureml.core.model import Model
from azureml.exceptions import WebserviceException

service_name = 'diabetes-service'

# Remove any existing service under the same name.
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    pass

service = Model.deploy(ws, service_name, [model])
service.wait_for_deployment(show_output=True)
We can see the endpoint in the studio
We can find the container on the Azure portal
Test the model
import json

input_payload = json.dumps({
    'data': [
        [59, 2, 32.1, 101.0, 157, 93.2, 38.0, 4.0, 4.8598, 87]
    ],
    'method': 'predict'
    # If you have a classification model, you can get probabilities
    # by changing this to 'predict_proba'.
})
output = service.run(input_payload)
print(output)
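The same payload can also be sent to the web service's REST endpoint directly, which is what a client application would do (a sketch; if authentication is enabled on the service, an Authorization header with the service key is also required):

import requests

headers = {'Content-Type': 'application/json'}
response = requests.post(service.scoring_uri, data=input_payload, headers=headers)
print(response.status_code, response.json())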
Where to deploy ?
• In a local Docker environment
• To the “cloud” : Azure Container Instance
• For developer scenario
• To Azure Kubernetes Service
• For production scenarios (see the sketch after this list)
• To Azure Function
• With the Docker Image
• Serverless
• To Azure Webapp
• With the Docker Image
• Cheaper than AKS
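For the AKS production scenario, the deployment call is the same as for ACI, only the configuration and target change. A hedged sketch, assuming an AKS cluster already attached to the workspace and an inference_config like the one defined a few slides later (cluster and service names are illustrative):

from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice

# Existing AKS cluster attached to the workspace
aks_target = AksCompute(ws, "my-aks-cluster")
aks_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=2,
                                                autoscale_enabled=True)

service = Model.deploy(workspace=ws,
                       name="diabetes-aks-service",
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aks_config,
                       deployment_target=aks_target)
service.wait_for_deployment(show_output=True)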
ML custom workflow with Python SDK
…
• Train the model : the best model can be found with Automated ML
• Register the model : with a CUSTOM framework
• Define a scoring script : called score.py
• Define the environment : packages needed
• Create an inference configuration : profile the model if needed
• Deploy the model : locally, to ACI or to AKS
Scoring script
Workspace → Experiment → Datastore / Dataset → Train the Model → Register the Model → Scoring script → Environment → Inference configuration → Deploy the Model
Environment definition
# Set up the (compute target) environment and save it in a YAML file
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

env = Environment("diabetes_env")
env.docker.enabled = True

# Two methods : Conda or pip
env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn',
                                                                         'pandas',
                                                                         'numpy',
                                                                         'matplotlib'])
env.python.conda_dependencies.add_pip_package("inference-schema[numpy-support]")
env.python.conda_dependencies.save_to_file(".", "diabetes_env.yml")
Create the inference configuration
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='score.py',
                                   environment=env)
Deploy the model (1/2)
from azureml.core import Webservice
from azureml.core.webservice import AciWebservice
from azureml.exceptions import WebserviceException

service_name = 'diabetes-custom-service'

# Remove any existing service under the same name.
try:
    Webservice(ws, service_name).delete()
except WebserviceException:
    pass

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
Find the best deploy configuration with the profiling object
Deploy the model (2/2)
service = Model.deploy(workspace=ws,
name=service_name,
models=[model],
inference_config=inference_config,
deployment_config=aci_config)
service.wait_for_deployment(show_output=True)
Local scenario (Docker Web Service)
Local scenario (Docker Web Service)
I want my model to be deployed ! (another story of MLOps)
ML « big data » workflow with Python SDK
• Initialize the workspace : to interact with the Azure ML service
• Create an experiment : which will « run »
• Create a datastore / upload a dataset : or use an Open Dataset
• Create a compute target : a cluster of VMs
• Train the model : the best model can be found with Automated ML
• Define the compute target environment : create a YAML file
• Create an estimator : configuration of the run
• Monitor the run …
Create a remote compute target
# Compute cluster creation.
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cpu_cluster_name = "myComputeCluster"

# Verify that the cluster does not already exist
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Cluster already exists")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           min_nodes=0, max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True, min_node_count=0, timeout_in_minutes=10)
min_nodes=0 → the cluster scales down to zero nodes when no jobs are running on it.
Train on a remote target cluster
import argparse
import joblib
from sklearn.linear_model import Ridge

parser = argparse.ArgumentParser()
parser.add_argument('--regularization',
                    type=float, dest='reg', default=0.5,
                    help='regularization strength')
args = parser.parse_args()

model = Ridge(alpha=args.reg, solver="auto", random_state=42)

# note: a file saved in the outputs folder is automatically uploaded into the experiment record
joblib.dump(value=model,
            filename='outputs/diabetes_reg_remote_model.pkl')
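Inside train.py, the submitted run can also be retrieved to log the hyperparameter and any evaluation metric back to the workspace (a hedged addition to the script above):

from azureml.core.run import Run

# Get a handle on the current run (works both locally and on the remote cluster)
run = Run.get_context()
run.log("regularization strength", float(args.reg))
# after fitting and evaluating the model, log the tracked metric, e.g. run.log("rmse", rmse)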
Create an estimator
from azureml.train.estimator import Estimator

script_params = {
    '--regularization': 0.5
}

est = Estimator(source_directory=script_folder,
                script_params=script_params,
                compute_target=compute_target,
                environment_definition=env,
                entry_script='train.py')

run = exp.submit(config=est)
Monitor a run
from azureml.widgets import RunDetails
RunDetails(run).show()
# specify show_output to True for a verbose log
run.wait_for_completion(show_output=True)
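Once the run has completed, the logged metrics and any file written under ./outputs on the remote target can be pulled back locally (file name taken from the training script above):

# Metrics logged by the remote script show up on the run object
print(run.get_metrics())

# Download an output file produced on the compute cluster
run.download_file(name='outputs/diabetes_reg_remote_model.pkl',
                  output_file_path='diabetes_reg_remote_model.pkl')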
Jupyter widget
Register the (remote) model
print(run.get_file_names())

# register model
model = run.register_model(model_name='sklearn_mnist',
                           model_path='outputs/sklearn_mnist_model.pkl')
print(model.name, model.id, model.version, sep='\t')
Define a ML Pipeline
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
from azureml.widgets import RunDetails
pipeline_steps = [PythonScriptStep(
    script_name="train.py",
    arguments=["--input", input_data, "--output", output_data],
    inputs=[input_data],
    outputs=[output_data],
    compute_target=compute_target,
    source_directory="dataprep"
)]
pipeline = Pipeline(workspace = ws, steps=pipeline_steps)
pipeline_run = exp.submit(pipeline, regenerate_outputs=False)
RunDetails(pipeline_run).show()
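Publishing the pipeline gives it a REST endpoint and the pipeline_id used by the scheduler on the next slide (a hedged sketch; name and version are illustrative):

# Publish the pipeline run as a reusable, callable pipeline
published_pipeline = pipeline_run.publish_pipeline(
    name="diabetes-training-pipeline",
    description="Train and register the diabetes model",
    version="1.0")
pipeline_id = published_pipeline.id
print(published_pipeline.endpoint)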
Define a Scheduler
from azureml.pipeline.core import Pipeline, PublishedPipeline
published_pipelines = PublishedPipeline.list(ws)
for published_pipeline in published_pipelines:
print(f"{published_pipeline.name},'{published_pipeline.id}'")
from azureml.pipeline.core import Schedule, ScheduleRecurrence
recurrence = ScheduleRecurrence(frequency="Minute", interval=15)
recurring_schedule = Schedule.create(ws, name="MyRecurringSchedule",
description="Based on time",
pipeline_id=pipeline_id,
experiment_name=experiment_name,
recurrence=recurrence)
pipeline = PublishedPipeline.get(ws, id=pipeline_id)
pipeline.disable()
Launch pipeline from Azure Data Factory
Industry leading MLOps
What I do for my client (architecture diagram) : raw data copy and clean data archiving in Data Lake Storage, model training and deployment with the ML Service (azureml SDK), serving through a SQL DB / DWH, and Power BI for the Data Analyst and the Data Scientist.
What I will do soon (architecture diagram) : raw data copy in Data Lake Storage, deployment with the ML Service, and Power BI for the Data Analyst and the Data Scientist.
My opinion about deploying with
Azure ML
• Strengths to build on :
• Notebooks are everywhere (except in the heart of code purists),
locally and in the cloud.
• Services communicate (increasingly) with each other and are easily scheduled.
• The configuration and deployment of the containers is possible
through the interface but can also be automated through the code.
• The story ends with visualizations in Power BI (JDBC connector to
Databricks tables)
My opinion about deploying with
Azure ML
• Which would make it easier for us:
• A Data Preparation tool with predefined connectors
• Where is Azure Machine Learning Workbench ?
• A better interface for designing ML pipelines, including code
• Could be the Studio
• A simple DEV / UAT / PROD separation
• Have a look at Azure Devops (release pipeline)
• A better mastery of Software Engineering by Data Scientists
• Learn Scala !
• Or a better scalability of Python
• Try koalas, dask, modin…
Summary
• Data Scientists can spend time on
• exploring data
• interpreting models
• Data Engineers have a full toolbox to industrialize ML
• Are they really two distinct people ?
Now, only sky cloud is the limit !
• Fully automated deployments
• A/B testing scenario
• Integrate feedback and loop ?
And now, go further with…
• AutomatedML
• azureml-core on Databricks
• azureml for R ?
• control Azure ML with API
• InterpretML Community