AzureML – Zero to Hero
Govind Kanshi
MTC Bangalore
2nd August 2014
What we will cover
• AzureML:
• What it enables
• Examples
• Upload data/understand explore it
• Develop model/evaluate it/deploy it
What this discussion is not about
• Data Science/Big Data definitions, uses, etc.
• ML Advanced topics
• Feature Engineering – which features are useful/cleaning/dropping
• For PCA kind of work – use R today
• Individual algorithm discussion/deep dive
• Model tuning (parameter sweep) or other techniques – boosting/bagging
• Overcoming Data vagaries
What you should walk out with
• Excitement and confidence that ML with AzureML is doable by all of
us, as long as we are curious and patient.
• AzureML is a democratized platform for learning from data, leading to
better-informed decisions. It brings sophisticated algorithms and
mechanisms, in an easy-to-use form, to both the masses and high-end
researchers today.
What are we trying to do
• Learn from existing data to make predictions on new data
• Classification – put labels
• Regression – predict a number, e.g. a price
• Recommendation – rank choices
• Examples – classify different behavior, predict a price, recommend, find an anomaly
• Explore data: form natural groupings based on some distance formula
• Clustering
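To make "natural groupings based on some distance formula" concrete, here is a minimal k-means sketch in plain Python. It is an illustration only and not how AzureML implements clustering. The toy points, the choice of Euclidean distance, and seeding the centroids with the first k points are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by Euclidean distance (Lloyd's algorithm)."""
    # Assumption: the first k points seed the centroids, which keeps the run deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two visibly separate groups in two dimensions.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.3), (0.9, 1.1), (8.2, 7.7)]
labels, centroids = kmeans(points, k=2)
```

No labels are supplied here, unlike classification and regression. The groups come only from how close the points are to each other.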
Demo
• Deployed model for a public dataset to classify whether a person has diabetes
• Deployed model to predict the decibel level of noise
• How old is this stuff? The term “regression” first appears in the biological works of Galton (1822–1911).
• Y = a_1 * X_1 + ... + a_n * X_n...
• Solve for ...
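As a rough sketch of what fitting the linear form above can look like, the following plain-Python function estimates the coefficients a_1 ... a_n by ordinary least squares. The slide leaves the "solve for" step open. Reading it as a least-squares fit of the coefficients is an assumption, as are the function name and the toy data.

```python
def fit_linear(X, y):
    """Least-squares fit of y ~ a_1*x_1 + ... + a_n*x_n (no intercept term)."""
    n = len(X[0])
    # Normal equations: (X^T X) a = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    a = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (b[i] - tail) / A[i][i]
    return a

# Hypothetical data generated from Y = 2*X_1 + 3*X_2, so the fit should recover about (2, 3).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [8.0, 7.0, 18.0, 17.0]
coefficients = fit_linear(X, y)
```

The sketch assumes the features are not perfectly correlated. If they were, the pivot would be zero and the division would fail.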
What did we see
• A web service exposed in raw format, serving predictions in request-response mode
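The request-response pattern can be sketched with the Python standard library. Everything specific below is a placeholder and not the actual AzureML contract: the URL, the API key, the bearer-token header, the JSON field name `input`, and the feature names `age` and `bmi` are assumptions for illustration. A deployed service documents its own endpoint and schema.

```python
import json
import urllib.request

# Placeholders, not real values: substitute the URL and key of a deployed service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "<your-api-key>"

def build_scoring_request(features):
    """Wrap one row of feature values in a JSON POST request."""
    # Assumed payload shape; a real service defines its own schema.
    body = json.dumps({"input": features}).encode("utf-8")
    return urllib.request.Request(
        SCORING_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,  # assumed authentication scheme
        },
        method="POST",
    )

def predict(features):
    """Send one request and return the decoded JSON response."""
    with urllib.request.urlopen(build_scoring_request(features)) as response:
        return json.loads(response.read().decode("utf-8"))

# Building the request does not touch the network; calling predict() would.
request = build_scoring_request({"age": 50, "bmi": 33.6})
```

One request carries one row and gets one prediction back. Batch mode would instead submit a whole dataset and collect the results later.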
Demo
• Walkthrough of the model creation for Classification
• Possibly choose another algorithm to compare/evaluate
What did we see
AzureML studio – Experiments/Datasets/Web services
Web Services – request-response (RR) or batch mode
Algorithms – Classification, Regression, Recommendation, Ranking
Data – Ingestion, cleansing, massaging
R Integration
Dataset/Experiments are immutable – new versions can be deployed
What did we do (typical AzureML path)
• Define the goal – regression, classification, or recommendation
• Create a model and train it using a dataset
• Get the data
• Clean up the data or replace missing values if required
• Use the appropriate algorithm and train it
• Score the model with test data
• Look at the algorithm parameters
• Evaluate the model using metrics
• Add more algorithms to compare
• Deploy the model as a web service for the request-response mechanism
• What about batch? Yes, you can.
• Data exploration – visualization of data/results
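The path above (get data, clean, train, score, evaluate) can be sketched end to end in a few lines of plain Python. This stands in for the drag-and-drop modules and is not AzureML code. The nearest-centroid classifier, the mean imputation, the 6/2 split, and the toy data are all assumptions chosen to keep the example small.

```python
import math
import random

def impute_mean(rows):
    """Replace missing values (None) in each column with that column's mean."""
    columns = list(zip(*rows))
    means = [
        sum(v for v in col if v is not None) / sum(v is not None for v in col)
        for col in columns
    ]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def train_centroids(X, y):
    """'Train' by storing the mean feature vector of each class."""
    model = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(members) for col in zip(*members)]
    return model

def score(model, X):
    """Predict, for each row, the class whose centroid is nearest."""
    return [min(model, key=lambda label: math.dist(x, model[label])) for x in X]

# Hypothetical toy data: two features, a binary label, one missing value.
rows = [[1.0, 1.2], [0.8, None], [1.1, 0.9], [0.9, 1.0],
        [5.0, 5.2], [4.8, 5.1], [5.3, 4.9], [5.1, 5.0]]
labels = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

X = impute_mean(rows)                      # clean up / replace missing data
indices = list(range(len(X)))
random.Random(0).shuffle(indices)          # fixed seed so the split is repeatable
train_idx, test_idx = indices[:6], indices[6:]

model = train_centroids([X[i] for i in train_idx], [labels[i] for i in train_idx])
predicted = score(model, [X[i] for i in test_idx])   # score on held-out rows
actual = [labels[i] for i in test_idx]
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)  # evaluate
```

Swapping `train_centroids` for another algorithm while keeping the same split and metric is the code equivalent of "add more algorithms to compare".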
Evaluate Models – summary(classification)
• Confusion Matrix
• Precision - (TP / (TP+FP) )
• Recall - (TP / (TP + FN))
• F1-score - 2 * Precision * Recall / (Precision + Recall)
• ROC curve + AUC - Area under ROC curve
              Predicted yes          Predicted no
Actual yes    True positive (TP)     False negative (FN)
Actual no     False positive (FP)    True negative (TN)
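The metrics on this slide follow directly from the four confusion-matrix counts. The sketch below computes them. The counts passed in at the end are made-up numbers for illustration, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the summary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)            # of everything predicted "yes", how much was right
    recall = tp / (tp + fn)               # of everything actually "yes", how much was found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts for 100 scored instances.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

The sketch does not guard against empty denominators. A model that never predicts "yes" has TP + FP = 0 and would need a special case.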
Issues to think about
• Cleaning/choosing right data points
• Missing data/transforming data/dropping data/relationships between features
• Evaluating the algorithm, comparing, tuning the parameters, relearning
• Which algorithm to choose (Boolean classification vs 10-class vs ranking); data with many attributes (thousands up to five-digit counts) vs very little data or very sparse/noisy data
• Which loss function and hyperparameters to aim for
• Explain the output – black box vs decision trees
• Online/Active Learning
Machine Learning Resources
• Coursera Machine Learning class: https://www.coursera.org/course/ml
• Access to AzureML – it is in preview
• http://www.youtube.com/watch?v=wjTJVhmu1JM
• Draft of Alex Smola and Vishy book on ML: http://alex.smola.org/drafts/thebook.pdf
• Elements of Statistical Learning – Hastie, Tibshirani et al: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
• Information Theory, Inference, and Learning Algorithms – David MacKay: http://www.inference.phy.cam.ac.uk/mackay/itila/
• Datasets - http://archive.ics.uci.edu/ml/datasets.html
• Official AzureML – tutorials/video walkthroughs - https://azure.microsoft.com/en-us/documentation/services/machine-learning/
Advanced topics
• Other topics
• How to use various input data cleanup procedures (dropping/adding/correlated features)
• How to publish a web service to the Azure Marketplace ($) - https://azure.microsoft.com/en-us/documentation/articles/machine-learning-publish-web-service-to-azure-marketplace/
• How do you version assets/“DAG”
• Techniques to overcome vagaries of data
• Stratification - sampling for training and testing within classes, so that each class is properly represented in both samples
• k-fold CV - data is split randomly into k subsets; each subset in turn is used for testing and the remainder for training. The k results are averaged. CV uses sampling without replacement.
• Bootstrapping - uses sampling with replacement to form the training set.
• Increasing performance of Model
• Bagging - Combining predictions by voting or averaging (for numeric prediction).
• Boosting - Uses voting/averaging but models are weighted according to their performance.
• Parameter sweeping
• Regularization parameter handling – Penalty for overfitting
• Understanding the algorithm performance/visualization of the algorithm path when possible.
• Associated statistics(confidence/distributions)
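The resampling ideas in this list (stratification, k-fold CV, bootstrapping) and the voting step of bagging can be sketched as small index-level helpers in plain Python. The function names, default fractions, and seeds are assumptions for illustration. Boosting is left out because its weighting scheme depends on the specific algorithm.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets, no replacement
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(set(labels)):
        members = [i for i, t in enumerate(labels) if t == label]
        rng.shuffle(members)
        cut = max(1, round(len(members) * test_fraction))
        test += members[:cut]
        train += members[cut:]
    return train, test

def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement; some repeat, some are left out."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

def bagged_vote(predictions):
    """Combine the class predictions of several models by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

folds = list(k_fold_indices(n=10, k=5))
train_idx, test_idx = stratified_split(["a"] * 8 + ["b"] * 4)
sample = bootstrap_indices(10)
winner = bagged_vote(["yes", "no", "yes"])
```

In a bagging setup, each model would be trained on its own `bootstrap_indices` sample, and `bagged_vote` would merge their predictions for a given instance.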


Editor's Notes

  • #10: AzureML is where experiments are done and deployed as web services. AzureML Studio has a “toolbar” with modules for data ingestion/transformation, statistics, and machine learning. Some of them have properties which can be set. AzureML has datasets which can be brought in at runtime or persisted inside. It has public datasets too.
  • #12: Classification algorithms can be measured by these metrics. Regression has essentially just RMSE, which many people currently question: the square root of the mean, over all instances, of (actual value - predicted value) squared. Clustering uses a different mechanism and requires tests/re-runs to ensure that grouped/clustered points have cohesion of some kind.
    Types of classification errors often incur different costs. Total error = (FP+FN)/(TP+FP+TN+FN).
    Lift charts: sort instances by their predicted probability of being a true positive (TP). X axis is sample size and Y axis is number of true positives (TP).
    ROC curves (ROC means receiver operating characteristic, a term from signal processing): X axis shows % of false positives (FP), Y axis shows % of true positives (TP).
    Recall and precision (the IR/search world has these terms too): Precision (retrieved relevant / total retrieved) = TP / (TP+FP). Recall (retrieved relevant / total relevant) = TP / (TP + FN).
  • #15: Desirables: model interpretation; more visualization; HMM; native time series?; text analysis – IR integration
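The ROC curve and RMSE mentioned in note #12 can be computed by hand, which helps when reading the evaluation output. This is a minimal sketch assuming 0/1 labels, at least one instance of each class, and no tied scores. The function names and the toy scores are made up for illustration.

```python
import math

def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    # Lower the threshold one instance at a time, highest score first.
    ranked = sorted(zip(scores, actual), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

def rmse(actual, predicted):
    """Root of the mean squared difference between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical classifier output: 1 = positive class, score = predicted probability.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(actual, scores)
area = auc(curve)
error = rmse([3.0, 5.0], [1.0, 5.0])
```

An area of 1.0 means every positive is ranked above every negative. An area of 0.5 is what random scoring would give.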