Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Erin LeDell
Machine Learning Scientist
@ledell / erin@h2o.ai
Automatic Machine Learning
AutoML
MEET
THE MAKERS
ERIN LEDELL
Machine Learning Scientist
NAVDEEP GILL
Software Engineer & Data Scientist
RAY PECK
Director of Product Engineering
• Intro to Automatic Machine Learning (AutoML)
• Random Grid Search & Stacked Ensembles
• H2O’s AutoML (R, Python, GUI)
• H2O-3 Roadmap
• Hands-on Tutorial
Agenda
Intro to AutoML
Automatic Machine Learning
Aspects of Automatic ML
• Data Prep
• Model Generation
• Ensembles
Data Prep
• Imputation of missing data
• Standardization of numeric features
• One-hot encoding of categorical features
• Count/Label/Target encoding of categorical features
• Feature selection and/or feature extraction (e.g. PCA)
• Feature engineering
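The first three data prep steps listed above can be sketched in a few lines of plain Python. This is only an illustration of the ideas, not how H2O implements them; the helper names (impute_mean, standardize, one_hot) and the toy columns are assumptions made for this example.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Map each category to a 0/1 indicator row, one column per level."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Toy columns (hypothetical data): one numeric with a gap, one categorical.
age = [20.0, None, 40.0]
color = ["red", "blue", "red"]

age_imputed = impute_mean(age)           # the gap is filled with 30.0
age_scaled = standardize(age_imputed)    # centered on zero
levels, color_encoded = one_hot(color)   # columns ordered as ["blue", "red"]
```

In practice the fill value, mean, and standard deviation would be computed on the training data only and then reused on new data.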
Model Generation
• Cartesian grid search
• Random grid search
• Tune individual models via Early Stopping
• Bayesian Hyperparameter Optimization
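The difference between Cartesian and random grid search is easiest to see in code. The sketch below uses a made-up hyperparameter space (the parameter names and values are illustrative assumptions, not a recommended search space) and only generates the candidate settings; training a model for each one is left out.

```python
import itertools
import random

# Hypothetical hyperparameter space; names and values are illustrative only.
space = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}

def cartesian_grid(space):
    """Every combination of values: exhaustive, grows multiplicatively."""
    names = list(space)
    combos = itertools.product(*space.values())
    return [dict(zip(names, combo)) for combo in combos]

def random_grid(space, max_models, seed):
    """A random subset of the Cartesian grid, sampled without replacement."""
    rng = random.Random(seed)
    full = cartesian_grid(space)
    return rng.sample(full, min(max_models, len(full)))

full = cartesian_grid(space)                        # 4 * 3 * 3 = 36 settings
subset = random_grid(space, max_models=10, seed=1)  # only 10 of them
```

The random grid caps the number of models regardless of how large the space is, which is why it pairs well with a time or model budget.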
Ensembles
• Bagging / Averaging
• Stacking / Super Learning
• Ensemble Selection
Random Stacking
Random Grids + Stacked Ensembles
Stacked Ensembles
• Specify L base learners (with model params).
• Specify a metalearner (just another algo).
• Perform k-fold cross-validation on the base learners.
Stacked Ensembles
• Collect cross-validated predicted values from base
learners.
• Train a second-level metalearning algorithm to find
the optimal combination of base learners.
• Metalearner requires only a small amount of compute
on top of the cross-validation process (it’s cheap).
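The stacking steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions: two toy base learners (a mean predictor and a one-variable line), interleaved folds, and an unconstrained least-squares metalearner with no intercept. None of these choices are claimed to match H2O's Stacked Ensemble implementation, and the data are invented.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Base learner 1: always predicts the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base learner 2: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cv_predictions(fit, xs, ys, k):
    """Out-of-fold predictions: each point is scored by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = model(xs[i])
    return preds

def fit_metalearner(z1, z2, ys):
    """Least-squares weights for y ~ w1*z1 + w2*z2 (2x2 normal equations)."""
    a = sum(p * p for p in z1)
    b = sum(p * q for p, q in zip(z1, z2))
    d = sum(q * q for q in z2)
    r1 = sum(p * y for p, y in zip(z1, ys))
    r2 = sum(q * y for q, y in zip(z2, ys))
    det = a * d - b * b
    return (r1 * d - b * r2) / det, (a * r2 - b * r1) / det

def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))

# Invented data, roughly y = 2x + 1 with small noise.
xs = [float(i) for i in range(10)]
ys = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2, 19.0]

# Level-one data: cross-validated predictions from each base learner.
z_mean = cv_predictions(fit_mean, xs, ys, k=5)
z_line = cv_predictions(fit_line, xs, ys, k=5)

# The metalearner finds the combination of base learners.
w_mean, w_line = fit_metalearner(z_mean, z_line, ys)
stacked = [w_mean * p + w_line * q for p, q in zip(z_mean, z_line)]
```

The metalearner step is a single small linear solve, which shows why it adds little compute on top of the cross-validation. To score new data, the base learners would be refit on all rows and combined with the same weights.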
Random Grid Search + Stacking
• Random Grid Search combined with Stacked
Ensembles is a powerful combination.
• Stacked Ensembles perform particularly well if the
models they are based on (1) are individually
strong, and (2) make uncorrelated errors.
• Random Grid Search is an excellent way to create
a diverse set of models for the ensemble.
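A small worked example shows why uncorrelated errors matter. The error vectors below are invented for illustration: models A and B are equally accurate but err on different rows, while model C makes exactly the same errors as A. Because error is prediction minus truth, averaging predictions averages the errors.

```python
def mse(errors):
    """Mean squared error from a list of per-row errors."""
    return sum(e * e for e in errors) / len(errors)

def average(*error_vectors):
    """Per-row error of an ensemble that averages its members' predictions."""
    return [sum(es) / len(es) for es in zip(*error_vectors)]

# Hypothetical per-row errors; each model alone has an MSE of 1.0.
model_a = [1.0, -1.0, 1.0, -1.0]
model_b = [1.0, 1.0, -1.0, -1.0]   # uncorrelated with model_a
model_c = [1.0, -1.0, 1.0, -1.0]   # identical errors to model_a

mse_uncorrelated = mse(average(model_a, model_b))  # errors partly cancel
mse_correlated = mse(average(model_a, model_c))    # nothing cancels
```

Here the A+B ensemble halves the error (0.5) while the A+C ensemble is no better than either member (1.0), even though all three models are equally strong on their own.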
H2O AutoML
Automatic Machine Learning in H2O
H2O Machine Learning Platform
• Distributed (multi-core + multi-node) implementations of
cutting edge ML algorithms.
• Core algorithms written in high performance Java.
• APIs available in R, Python, Scala & web GUI.
• Works on Hadoop, Spark, EC2, your laptop, etc.
• Easily deploy models to production as pure Java code.
H2O AutoML (first cut)
• Imputation, one-hot encoding, standardization.
• Random Grid Search over a custom hyperparameter
space, defined by expert data scientists.
• Early stopping of individual models and random grids.
• GBMs, Random Forests, Deep Neural Nets, GLMs
• Multiple Stacked Ensembles of models.
• Leaderboard for ranking.
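Early stopping, mentioned above, can be illustrated with a generic patience rule: stop once the validation loss has failed to improve by more than a tolerance for a fixed number of consecutive rounds. The function name, parameters, and loss values below are assumptions for this sketch, and the rule is a simplified stand-in rather than H2O's exact stopping criterion.

```python
def early_stopping_round(scores, patience, tolerance):
    """Return the index of the round at which training would stop.

    scores: validation loss after each round (lower is better).
    Stops once `patience` consecutive rounds fail to improve the best
    loss by more than `tolerance`; otherwise returns the last index.
    """
    best = float("inf")
    rounds_without_gain = 0
    for i, score in enumerate(scores):
        if best - score > tolerance:
            best = score
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                return i
    return len(scores) - 1

# Hypothetical validation losses: quick gains, then a plateau.
losses = [0.70, 0.55, 0.48, 0.47, 0.471, 0.469, 0.472, 0.40]
stop_at = early_stopping_round(losses, patience=3, tolerance=0.005)
```

With these numbers training stops at round index 6, so the later improvement at index 7 is never reached. That is the trade-off of early stopping: compute is saved on plateaus at the risk of missing a late gain.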
H2O AutoML in R
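library(h2o)
h2o.init()

# Load the training data into the H2O cluster
train <- h2o.importFile("train.csv")

# Run AutoML for up to 10 minutes (600 seconds)
aml <- h2o.automl(y = "response_colname",
                  training_frame = train,
                  max_runtime_secs = 600)

# Leaderboard of the trained models
lb <- aml@leaderboard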

H2O AutoML in Python
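import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Load the training data into the H2O cluster
train = h2o.import_file("train.csv")

# Run AutoML for up to 10 minutes (600 seconds)
aml = H2OAutoML(max_runtime_secs=600)
aml.train(y="response_colname", training_frame=train)

# Leaderboard of the trained models
lb = aml.leaderboard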
H2O AutoML in Flow
Example Leaderboard for binary classification
H2O AutoML Leaderboard
H2O-3 Roadmap
Coming Soon to H2O
Feature Q1 Q2
New Algorithm: Cox-Proportional Hazards
GLM: Ordinal Regression
GBM: Quasibinomial
NLP Improvements, TF-IDF
Stacked Ensemble: Custom Metalearner
AutoML: New Ensembles
AutoML: Add XGBoost
Distributed XGBoost
New Algorithm: Factorization Machines
H2O-3 Roadmap
https://tinyurl.com/h2o-automl-jira
• Documentation: http://docs.h2o.ai
• Tutorials: https://github.com/h2oai/h2o-tutorials
• Slidedecks: https://github.com/h2oai/h2o-meetups
• Videos: https://www.youtube.com/user/0xdata
• Events & Meetups: http://h2o.ai/events
• Stack Overflow: https://stackoverflow.com/tags/h2o
• Google Group: https://tinyurl.com/h2ostream
• Gitter: http://gitter.im/h2oai/h2o-3
Hands-on Tutorial
DEMO
First-time Qwiklab Account Setup
• Go to http://h2oai.qwiklab.com
• Click on “JOIN”
• Create a new account with a valid email address
• You will receive a confirmation email
• Click on the link in the confirmation email
• Go back to http://h2oai.qwiklab.com and log in
• Go to the Catalog on the left bar
• Choose “Introduction to AutoML in H2O”
• Wait for instructions
https://tinyurl.com/automl-h2oworld17
Code and data available here
H2O AutoML Tutorial

