Copyright 2015 CATENATE Group – All rights reserved
H2O - Thirst for Machine Learning
Meetup Machine Learning/Data Science,
Rome, 15 March 2017
Gabriele Nocco, Senior Data Scientist
gabriele.nocco@catenate.com
Catenate s.r.l.
AGENDA
● H2O Introduction
● GBM
● Demo
H2O INTRODUCTION
H2O is an open-source, in-memory Machine Learning engine. It is Java-based and exposes convenient APIs in Java, Scala, Python and R, plus a notebook-like user interface called Flow.
This breadth of languages opens the framework to many different professional roles, from analysts to programmers to more “academic” data scientists. H2O can therefore serve as a complete infrastructure, from the prototype model to the engineered solution.
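As a minimal sketch of the Python API (the R, Scala and Flow interfaces follow the same model); the dataset path and column names below are placeholders, and the cluster is assumed to run locally:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()                                         # start or attach to a local H2O cluster
frame = h2o.import_file("data/train.csv")          # hypothetical CSV path
frame["label"] = frame["label"].asfactor()         # treat the target as categorical

model = H2OGradientBoostingEstimator(ntrees=100, max_depth=5, seed=42)
model.train(x=[c for c in frame.columns if c != "label"],
            y="label",
            training_frame=frame)

print(model.model_performance(frame))              # training metrics; use a validation frame in practice
h2o.cluster().shutdown()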
H2O INTRODUCTION - GARTNER
In 2017, H2O.ai became a Visionary in the Magic Quadrant for Data Science Platforms:
STRENGTHS
● Market awareness
● Customer satisfaction
● Flexibility and scalability
CAUTIONS
● Data access and preparation
● High technical bar for use
● Visualization and data exploration
● Sales execution
https://www.gartner.com/doc/reprints?id=1-3TKPVG1&ct=170215&st=sb
H2O INTRODUCTION - FEATURES
● H2O Eco-System Benefits:
○ Scalable to massive datasets on large clusters, fully parallelized
○ Low-latency Java (“POJO”) scoring code is auto-generated
○ Easy to deploy on a laptop, a server, a Hadoop cluster, a Spark cluster or HPC
○ APIs include R, Python, Flow, Scala, Java, JavaScript, REST
● Regularization techniques: Dropout, L1/L2
● Early stopping, N-fold cross-validation, grid search (see the sketch after this list)
● Handling of categorical, missing and sparse data
● Gaussian/Laplace/Poisson/Gamma/Tweedie regression with offsets, observation weights and various loss functions
● Unsupervised mode for nonlinear dimensionality reduction and outlier detection
● File types supported: CSV, ORC, SVMLight, ARFF, XLS, XLSX, Avro, Parquet
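A hedged sketch of how grid search, early stopping and the downloadable scoring artifacts fit together in the Python API; the dataset path, column names and hyperparameter values are illustrative assumptions, not taken from the talk:

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()
frame = h2o.import_file("data/train.csv")                   # hypothetical path
frame["label"] = frame["label"].asfactor()
train, valid = frame.split_frame(ratios=[0.8], seed=42)

grid = H2OGridSearch(
    model=H2OGradientBoostingEstimator(ntrees=500,
                                       stopping_rounds=5,   # early stopping on the validation metric
                                       stopping_metric="AUC",
                                       seed=42),
    hyper_params={"max_depth": [3, 5, 7], "learn_rate": [0.05, 0.1]})
grid.train(x=[c for c in frame.columns if c != "label"], y="label",
           training_frame=train, validation_frame=valid)

best = grid.get_grid(sort_by="auc", decreasing=True).models[0]
best.download_mojo(path=".")   # MOJO/POJO artifacts give low-latency scoring outside the cluster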
H2O INTRODUCTION - ALGORITHMS (diagram slide)
H2O INTRODUCTION - ARCHITECTURE (diagram slides)
H2O INTRODUCTION - H2O + TENSORFLOW
H2O can develop Deep Neural Networks natively, or through integration with TensorFlow. It is now possible to train very deep networks (from 5 to 1000 layers!) and to handle huge amounts of data, on the order of gigabytes or terabytes.
Another great advantage is the ability to exploit GPUs to perform the computations.
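As a hedged sketch of the native Deep Learning path (H2O's own H2ODeepLearningEstimator; the Deep Water backends for TensorFlow, MXNet and Caffe use a separate estimator not shown here). The architecture and dataset are illustrative assumptions:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
frame = h2o.import_file("data/train.csv")                  # hypothetical path
frame["label"] = frame["label"].asfactor()

dl = H2ODeepLearningEstimator(
    hidden=[200, 200, 200],             # three fully connected hidden layers
    epochs=20,
    activation="RectifierWithDropout",  # dropout regularization
    input_dropout_ratio=0.1,
    l1=1e-5)                            # L1 weight penalty
dl.train(x=[c for c in frame.columns if c != "label"], y="label", training_frame=frame)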
H2O INTRODUCTION - H2O + TENSORFLOW
With the release of TensorFlow, H2O has embraced the wave of enthusiasm around the growth of Deep Learning. Thanks to Deep Water, H2O lets us interact in a direct and simple way with Deep Learning tools like TensorFlow, MXNet and Caffe.
H2O INTRODUCTION - H2O + SPARK
One of the first plugins developed for H2O was the one for Apache Spark, named Sparkling Water. Binding to a rising open-source project such as Spark, with the computing power that distributed processing allows, has been a great driving force for the growth of H2O.
H2O INTRODUCTION - H2O + SPARK
A Sparkling Water application runs as a job that can be started with spark-submit. The Spark Master then produces the DAG and distributes the execution across the Workers, each of which loads the H2O libraries into its Java process.
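A hedged sketch of the PySpark side of Sparkling Water; exact class and method names (H2OContext.getOrCreate, asH2OFrame) vary slightly across Sparkling Water versions, and the data used here is purely illustrative:

# Run inside a Spark session launched with the Sparkling Water dependencies
# (e.g. via spark-submit or pyspark with the proper --packages/--jars options).
from pyspark.sql import SparkSession
from pysparkling import H2OContext

spark = SparkSession.builder.appName("sparkling-water-demo").getOrCreate()
hc = H2OContext.getOrCreate(spark)      # starts the H2O nodes inside the Spark executors

df = spark.read.csv("data/train.csv", header=True, inferSchema=True)   # hypothetical path
h2o_frame = hc.asH2OFrame(df)           # hand the distributed DataFrame over to H2O

# From here on the regular H2O API applies (e.g. H2OGradientBoostingEstimator).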
H2O INTRODUCTION - H2O + SPARK
The Sparkling Water solution is, of course, certified for all the major Spark distributions: Hortonworks, Cloudera and MapR. Databricks provides a Spark cluster in the cloud, and H2O works perfectly in this environment. H2O Rains with Databricks Cloud!
AGENDA
● H2O Introduction
● GBM
● Demo
GBM
Gradient Boosting Machine
Gradient Boosting Machine is one of the most powerful techniques for building predictive models. It can be applied to classification or regression, so it is a supervised algorithm.
It is one of the most widely used algorithms in the Kaggle community, performing better than SVMs, Decision Trees and Neural Networks in a large number of cases.
https://www.quora.com/Why-does-Gradient-boosting-work-so-well-for-so-many-Kaggle-problems
GBM can be an optimal solution when the size of the dataset or the available computing power does not allow training a Deep Neural Network.
GBM - KAGGLE
Kaggle is the biggest platform for Machine Learning contests in the world.
https://www.kaggle.com/
In early March 2017, Google announced the acquisition of the Kaggle community.
GBM - ORIGIN OF BOOSTING IDEA
A weak learner is an algorithm whose performance is only marginally better than random chance. Boosting was developed in the 1980s as the answer to the following question: “can we combine many weak learners to create a very strong one?”
Boosting revolves around filtering observations: new learners are focused on the samples that previous weak learners found difficult to classify.
Using this idea, we can train a succession of weak learning methods, each one focused on the patterns that were misclassified previously.
GBM - ADABOOST
The First Boosting Algorithm
The first algorithm to gain wide popularity in the boosting family was Adaptive Boosting, or AdaBoost for short. In the original formulation, the weak learners are decision trees with a single split, called decision stumps.
AdaBoost works by weighting the observations and sampling the dataset at each iteration with more emphasis on the instances that are difficult to classify. We then sequentially add stumps, each one trying to classify those hard instances better.
At every step, predictions are made by taking a majority vote of the weak learners’ outputs, weighted by a measure of their individual accuracy.
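A minimal from-scratch sketch of this reweighting idea (discrete AdaBoost for binary labels in {-1, +1}), using scikit-learn decision stumps as weak learners; it illustrates the mechanism and is not meant to replace a library implementation:

# Discrete AdaBoost on labels y in {-1, +1}: reweight hard examples,
# then combine stumps by an accuracy-weighted majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                           # uniform observation weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # a decision stump
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)     # weighted error
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)         # learner weight
        w *= np.exp(-alpha * y * pred)                # boost misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)                             # weighted majority vote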
GBM - GRADIENT BOOSTING
Generalization of AdaBoost as Gradient Boosting
In later years, it was realized that AdaBoost can be derived formally as the minimization of a specific cost function with an exponential loss. This allowed the algorithm to be recast in a statistical framework.
Gradient Boosting Machines, later called just gradient boosting (or gradient tree boosting when trees are used), are the natural generalization of AdaBoost to boosting with any loss function, following a gradient descent procedure:
GBM = Boosting + Gradient descent
This class of algorithms remains stage-wise additive, since new learners are added iteratively while the old ones are kept fixed. The generalization allows arbitrary differentiable loss functions to be used, providing more flexible algorithms that handle regression, multi-class classification and more.
GBM - GRADIENT BOOSTING
How Gradient Boosting Works
Summarizing, a GBM requires us to specify three components:
● The loss function to optimize (whose gradient the new weak learners will fit).
● The specific form of the weak learner (e.g., stumps).
● An additive scheme that combines the weak learners so as to minimize the loss function.
GBM - GRADIENT BOOSTING
Loss Function
The loss function determines the behavior of the algorithm. The only requirement is differentiability, so that gradient descent can be applied to it. Although arbitrary losses can be defined, in practice only a handful are used: for example, regression may use a squared error and classification may use a logarithmic loss.
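For concreteness, the two losses mentioned above and their negative gradients with respect to the model output F(x); in the squared-loss case the negative gradient is simply the residual, which is the key to the Additive Model slide below.

% Squared error (regression): the negative gradient is the residual
L\bigl(y, F(x)\bigr) = \tfrac{1}{2}\bigl(y - F(x)\bigr)^2,
\qquad -\frac{\partial L}{\partial F(x)} = y - F(x)

% Logarithmic (deviance) loss for y \in \{-1, +1\} (classification)
L\bigl(y, F(x)\bigr) = \log\bigl(1 + e^{-y\,F(x)}\bigr),
\qquad -\frac{\partial L}{\partial F(x)} = \frac{y}{1 + e^{\,y\,F(x)}}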
GBM - GRADIENT BOOSTING
Weak Learner
In H2O, the weak learners are implemented as decision trees, making this an instance of decision tree boosting. In order to allow their outputs to be added together, regression trees (which output real values) are used.
When building each decision tree, the algorithm iteratively selects split points in a greedy fashion, based on a measure of “purity” of the data, in order to minimize the loss. The depth of the trees can be increased to obtain more flexible decision boundaries.
Conversely, to limit overfitting we can constrain the topology of the trees by, e.g., limiting the depth, the number of splits, or the number of leaf nodes.
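A small sketch of what constraining a single weak learner looks like with a scikit-learn regression tree; the constraint values are arbitrary examples:

# A deliberately shallow regression tree: keeping the individual learner weak
# (limited depth, few leaves, minimum leaf size) is what limits overfitting.
from sklearn.tree import DecisionTreeRegressor

weak_learner = DecisionTreeRegressor(
    max_depth=3,
    max_leaf_nodes=8,
    min_samples_leaf=20)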
GBM - GRADIENT BOOSTING
Additive Model
Gradient descent is a generic iterative technique to minimize objective functions. At each iteration, the gradient of the loss function (e.g., the error on the training set) is computed and used to choose a set of parameters that decreases its value.
In a GBM, the optimization problem is formulated in terms of functions such as trees (functional optimization), which makes it relatively hard in general. The basic idea is to approximate this gradient using only its values at the training points.
In a GBM with squared loss, the resulting algorithm is extremely simple: at each step we train a new tree on the “residual errors” with respect to the previous weak learners. This can be seen as a gradient descent step with respect to our loss, where all previous weak learners are kept fixed and the gradient is approximated. This generalizes easily to different losses.
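A minimal from-scratch sketch of this residual-fitting loop for the squared loss, with scikit-learn regression trees as weak learners; the shrinkage factor anticipates the “learning rate” improvement described two slides below and is an illustrative choice:

# Least-squares gradient boosting: each new tree is fit to the current
# residuals (the negative gradient of the squared loss), then added to
# the ensemble with a small shrinkage factor.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    f0 = np.mean(y)                      # initial constant prediction
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction       # negative gradient of 1/2 (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gbm_predict(f0, trees, X, learning_rate=0.1):
    return f0 + learning_rate * sum(t.predict(X) for t in trees)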
GBM - GRADIENT BOOSTING
Output and Stop Condition
The output of the new tree is then added to the output of the existing sequence of trees, in an effort to correct or improve the final output of the model. In particular, a different weighting parameter is associated with each decision region of the newly constructed tree; this is done by solving a new optimization problem with respect to these weights.
Training stops once a fixed number of trees has been added, or once the loss reaches an acceptable level or no longer improves on an external validation dataset.
GBM - GRADIENT BOOSTING
Improvements to Basic Gradient Boosting
Gradient boosting is a greedy algorithm and can overfit a training dataset quickly. It benefits from regularization methods that penalize various parts of the algorithm and generally improve its performance by reducing overfitting.
There are four main enhancements to basic gradient boosting (mapped to concrete parameters in the sketch after this list):
● Tree constraints
● Learning rate (shrinkage)
● Stochastic gradient boosting (row/column subsampling)
● Penalized learning (L1/L2 regularization of the regression trees’ output)
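A hedged sketch of how the first three enhancements map onto H2O's GBM parameters (the values are illustrative). The fourth, penalized learning on the trees' outputs, is not a standard H2O GBM option; it appears instead in libraries such as XGBoost (reg_alpha/reg_lambda):

# Illustrative H2O GBM configuration: tree constraints, shrinkage and
# row/column subsampling, plus early stopping on a validation frame.
from h2o.estimators.gbm import H2OGradientBoostingEstimator

gbm = H2OGradientBoostingEstimator(
    ntrees=500,
    max_depth=5,              # tree constraint: limited depth
    min_rows=10,              # tree constraint: minimum observations per leaf
    learn_rate=0.05,          # learning rate / shrinkage
    sample_rate=0.8,          # stochastic boosting: row subsampling
    col_sample_rate=0.8,      # stochastic boosting: column subsampling
    stopping_rounds=5,        # stop when the validation metric no longer improves
    stopping_metric="AUC",
    seed=42)
# gbm.train(x=features, y=target, training_frame=train, validation_frame=valid)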
AGENDA
● H2O Introduction
● GBM
● Demo
Q&A