Scalable Automatic Machine Learning with H2O
Parul Pandey
Data Science Evangelist, H2O.ai
What is H2O?
H2O.ai, the company
• Founded in 2012
• Advised by Stanford Professors Hastie, Tibshirani & Boyd
• Headquarters: Mountain View, California, USA
H2O, the platform
• Open Source Software (Apache 2.0 Licensed)
• R, Python, Scala, Java and Web Interfaces
• Distributed Machine Learning Algorithms for Big Data
H2O Tools
H2O in Industry
Agenda
• H2O Platform
• Automatic Machine Learning (AutoML)
• H2O AutoML Overview
• Demo
H2O Platform
H2O Machine Learning Platform
• Open source, distributed (multi-core + multi-node)
implementations of cutting-edge ML algorithms.
• Core algorithms written in high-performance Java.
• APIs available in R, Python, Scala; web GUI.
• Easily deploy models to production as pure Java code.
• Works on Hadoop, Spark, AWS, your laptop, etc.
H2O Machine Learning Features
• Supervised & unsupervised machine learning algos
(GBM, RF, DNN, GLM, Stacked Ensembles, etc.)
• Imputation, normalization & auto one-hot-encoding
• Automatic early stopping
• Cross-validation, grid search & random search
• Variable importance, model evaluation metrics, plots
Intro to Automatic Machine Learning
Aspects of Automatic Machine Learning
• Data Prep
• Model Generation
• Ensembles
H2O’s AutoML
Scalable Automatic Machine Learning with H2O
H2O AutoML
• Basic data pre-processing (as in all H2O algos).
• Trains a random grid of algorithms (GBMs, DNNs, GLMs, etc.)
using a carefully chosen hyper-parameter space.
• Individual models are tuned using cross-validation.
• Two Stacked Ensembles are trained (“All Models” ensemble
& a lightweight “Best of Family” ensemble).
• Returns a sorted “Leaderboard” of all models.
• All models can be easily exported to production.
https://www.h2o.ai/blog/a-deep-dive-into-h2os-automl/
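The random grid step described above can be sketched in plain Python. This is a minimal illustration and not H2O's implementation: the search space, its parameter names and values, and the helper `sample_random_grid` are all hypothetical. It only shows the idea of drawing distinct hyper-parameter combinations at random instead of enumerating the full grid.

```python
import random

# Hypothetical search space; names and values are illustrative only
# and are not the space H2O AutoML actually uses.
SEARCH_SPACE = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}


def sample_random_grid(space, n_models, seed=1):
    """Draw up to n_models distinct hyper-parameter combinations at random."""
    rng = random.Random(seed)

    # Never ask for more combinations than the full grid contains.
    total = 1
    for values in space.values():
        total *= len(values)
    n_models = min(n_models, total)

    seen = set()
    combos = []
    while len(combos) < n_models:
        combo = tuple((name, rng.choice(values)) for name, values in space.items())
        if combo not in seen:  # skip combinations that were already drawn
            seen.add(combo)
            combos.append(dict(combo))
    return combos


grid = sample_random_grid(SEARCH_SPACE, n_models=5)
for params in grid:
    print(params)
```

Each sampled dictionary would then be handed to a model trainer and scored with cross-validation, which is the part this sketch leaves out.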
Random Grid Search & Stacking
• Random Grid Search combined with Stacked Ensembles
is a powerful combination.
• Ensembles perform particularly well if the models they are
based on (1) are individually strong, and (2) make
uncorrelated errors.
• Stacking uses a second-level metalearning algorithm to find the
optimal combination of base learners.
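To make the stacking idea concrete, here is a toy sketch with two base learners. It assumes a plain least-squares metalearner without an intercept as a stand-in (the slides do not say which metalearner H2O uses), and the predictions are made-up numbers constructed so that the two models err in opposite directions, an extreme case of the error diversity mentioned above. The function names `fit_metalearner` and `mse` are hypothetical.

```python
def fit_metalearner(base_preds, y):
    """Least-squares weights (no intercept) for two base learners.

    base_preds: list of (p1, p2) cross-validated predictions, one pair per row.
    y: list of true target values.
    """
    # Normal equations for a two-column design matrix: (X^T X) w = X^T y
    a11 = sum(p1 * p1 for p1, _ in base_preds)
    a12 = sum(p1 * p2 for p1, p2 in base_preds)
    a22 = sum(p2 * p2 for _, p2 in base_preds)
    b1 = sum(p1 * t for (p1, _), t in zip(base_preds, y))
    b2 = sum(p2 * t for (_, p2), t in zip(base_preds, y))

    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base predictions are collinear")

    # Solve the 2x2 system with Cramer's rule.
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


def mse(preds, y):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


# Made-up toy data: each model is off by 0.5, but in opposite directions.
y = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 1.5, 3.5, 3.5]
model_b = [0.5, 2.5, 2.5, 4.5]

w1, w2 = fit_metalearner(list(zip(model_a, model_b)), y)
stacked = [w1 * a + w2 * b for a, b in zip(model_a, model_b)]

print("weights:", w1, w2)
print("MSE A:", mse(model_a, y), "MSE B:", mse(model_b, y), "MSE stacked:", mse(stacked, y))
```

On this toy data the learned combination has lower error than either base model alone. With real models the gain is smaller, and the metalearner should be fit on cross-validated predictions to avoid overfitting.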
Who is it for?
H2O AutoML in R
H2O AutoML in Python
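The code shown on this slide did not survive extraction. The following is a hedged sketch of a typical AutoML run with the `h2o` Python package, not a reconstruction of the slide. The wrapper `run_automl`, its arguments, the 80/20 split, and the assumption of a classification target are all illustrative choices; running it requires `h2o` and a Java runtime.

```python
def run_automl(csv_path, target, max_runtime_secs=60, seed=1):
    """Train H2O AutoML on a CSV file and return the AutoML object.

    Hypothetical helper: csv_path and target are supplied by the caller,
    and the target column is assumed to be a classification label.
    """
    # Imported lazily so this file can be loaded without h2o installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)

    # For classification, the response column must be categorical.
    frame[target] = frame[target].asfactor()
    train, test = frame.split_frame(ratios=[0.8], seed=seed)

    # All other columns are used as predictors when x is not given.
    aml = H2OAutoML(max_runtime_secs=max_runtime_secs, seed=seed)
    aml.train(y=target, training_frame=train)

    print(aml.leaderboard.head())              # models ranked by the default metric
    print(aml.leader.model_performance(test))  # best model on held-out data
    return aml
```

A call such as `run_automl("my_data.csv", "label")` (both values are placeholders) would train for about a minute and print the top of the leaderboard.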
H2O AutoML in Flow GUI
H2O AutoML Leaderboard
Example Leaderboard for binary classification
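The leaderboard image is not available here, so the sketch below uses made-up model IDs and AUC values purely for illustration. It shows how a sorted leaderboard and a "Best of Family" selection (the top model of each algorithm family) could be derived from a list of results. `ModelResult`, `leaderboard`, and `best_of_family` are hypothetical names, and ranking by AUC is an assumption for the binary classification case.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    model_id: str
    family: str
    auc: float  # cross-validated AUC; the values below are invented


def leaderboard(results):
    """Return the models sorted from best to worst AUC."""
    return sorted(results, key=lambda r: r.auc, reverse=True)


def best_of_family(results):
    """Keep only the top-ranked model of each algorithm family."""
    best = {}
    for r in leaderboard(results):
        best.setdefault(r.family, r)  # first seen is the best of its family
    return list(best.values())


# Invented results, not output from a real AutoML run.
results = [
    ModelResult("GBM_1", "GBM", 0.78),
    ModelResult("GBM_2", "GBM", 0.80),
    ModelResult("DRF_1", "DRF", 0.76),
    ModelResult("GLM_1", "GLM", 0.70),
    ModelResult("DeepLearning_1", "DeepLearning", 0.74),
]

for row in leaderboard(results):
    print(f"{row.model_id:<16} {row.auc:.2f}")

print([r.model_id for r in best_of_family(results)])
```

The smaller "Best of Family" set is what makes that ensemble lightweight: it stacks one model per family instead of every model trained.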
H2O AutoML Tutorial
Learn H2O AutoML!
• Docs: https://tinyurl.com/h2o-automl-docs
• R & Py tutorials: https://tinyurl.com/h2o-automl-tutorials
• Blog: A Deep Dive into H2O’s AutoML
H2O Resources
• Documentation: http://docs.h2o.ai
• Tutorials: https://github.com/h2oai/h2o-tutorials
• Slide decks: https://github.com/h2oai/h2o-meetups
• Videos: https://www.youtube.com/user/0xdata
• Stack Overflow: https://stackoverflow.com/tags/h2o
• Google Group: https://tinyurl.com/h2ostream
• Gitter: http://gitter.im/h2oai/h2o-3
• Events & Meetups: http://h2o.ai/events
Contribute to H2O!
Get in touch over email, Gitter or JIRA.
https://github.com/h2oai/h2o-3/blob/master/CONTRIBUTING.md
Thank you!

