Scalable Data Science and
Deep Learning with H2O
Next.ML Workshop
San Francisco, 1/17/15
Arno Candel, H2O.ai
http://tiny.cc/h2o_next_ml_slides
Who am I?
PhD in Computational Physics, 2005, from ETH Zurich, Switzerland
6 years at SLAC - Accelerator Physics Modeling
2 years at Skytree - Machine Learning
13 months at H2O.ai - Machine Learning
15 years in Supercomputing & Modeling
Named “2014 Big Data All-Star” by Fortune Magazine
@ArnoCandel
H2O Deep Learning, @ArnoCandel
Outline
Introduction (10 mins)
Methods & Implementation (20 mins)
Results and Live Demos (20 mins)
Higgs boson classification
MNIST handwritten digits
eBay text classification
h2o-dev Outlook: Flow, Python
Part 2: Hands-On Session (40 mins)
Web GUI: Higgs dataset
R Studio: Adult, Higgs, MNIST datasets
3
H2O Deep Learning, @ArnoCandel
Teamwork at H2O.ai
Java, Apache v2 Open-Source
#1 Java Machine Learning project on GitHub
Join the community!
4
H2O Deep Learning, @ArnoCandel
H2O: Open-Source (Apache v2)
Predictive Analytics Platform
5
H2O Deep Learning, @ArnoCandel 6
H2O Architecture - Designed for speed, scale, accuracy & ease of use
Key technical points:
• distributed JVMs + REST API
• no Java GC issues (data in byte[], Double)
• lossless number compression
• Hadoop integration (v1, YARN)
• R package (CRAN)
Pre-built, fully featured algos: K-Means, NB, PCA, CoxPH, GLM, RF, GBM, DeepLearning
H2O Deep Learning, @ArnoCandel
Wikipedia: Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.
What is Deep Learning?
Input: Image
Output: User ID
7
Example: Facebook DeepFace
H2O Deep Learning, @ArnoCandel
What is NOT Deep
Linear models are not deep (by definition)
Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)
SVMs and kernel methods are not deep (2 layers: kernel + linear)
Classification trees are not deep (operate on original input space, no new features generated)
8
H2O Deep Learning, @ArnoCandel
1970s multi-layer feed-forward Neural Network
(stochastic gradient descent with back-propagation)
+ distributed processing for big data
(fine-grain in-memory MapReduce on distributed data)
+ multi-threaded speedup
(async fork/join worker threads operate at FORTRAN speeds)
+ smart algorithms for fast & accurate results
(automatic standardization, one-hot encoding of categoricals, missing value imputation, weight & bias initialization, adaptive learning rate, momentum, dropout/L1/L2 regularization, grid search, N-fold cross-validation, checkpointing, load balancing, auto-tuning, model averaging, etc.)
= powerful tool for (un)supervised machine learning on real-world data
H2O Deep Learning
9
all 320 cores maxed out
H2O Deep Learning, @ArnoCandel
“fully connected” directed graph of neurons
Input layer (3 neurons): age, income, employment
Hidden layer 1 (4 neurons), Hidden layer 2 (3 neurons)
Output layer (2 neurons): married, single
#neurons per layer: 3, 4, 3, 2; #connections: 3x4, 4x3, 3x2
Information flows from the input neurons through the hidden neurons to the output neurons.
Example Neural Network
10
H2O Deep Learning, @ArnoCandel
Prediction: Forward Propagation
“neurons activate each other via weighted sums”
Inputs xi (age, income, employment) feed hidden layers yj and zk, which produce per-class output probabilities pl (married, single):
yj = tanh(sumi(xi*uij) + bj)
zk = tanh(sumj(yj*vjk) + ck)
pl = softmax(sumk(zk*wkl) + dl)
softmax(xk) = exp(xk) / sumk(exp(xk)), so sum(pl) = 1
activation function: tanh; alternative: x -> max(0,x) “rectifier”
bj, ck, dl: bias values (independent of inputs)
pl is a non-linear function of xi: can approximate ANY function with enough layers!
11
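The forward pass above can be sketched in NumPy. Layer sizes match the 3-4-3-2 example network; the weights and input values here are illustrative, not anything H2O would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # softmax(x_k) = exp(x_k) / sum_k exp(x_k), shifted for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

def forward(x, U, b, V, c, W, d):
    # hidden layers use tanh; the alternative is the rectifier x -> max(0, x)
    y = np.tanh(U @ x + b)       # hidden layer 1
    z = np.tanh(V @ y + c)       # hidden layer 2
    return softmax(W @ z + d)    # output layer: per-class probabilities

# 3 inputs (age, income, employment) -> 4 -> 3 -> 2 outputs (married, single)
U, b = rng.normal(size=(4, 3)), np.zeros(4)
V, c = rng.normal(size=(3, 4)), np.zeros(3)
W, d = rng.normal(size=(2, 3)), np.zeros(2)

x = np.array([0.3, -1.2, 0.5])   # standardized inputs
p = forward(x, U, b, V, c, W, d)
```

Whatever the weights, the softmax output is a valid probability vector: both entries lie in [0, 1] and sum to 1.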
H2O Deep Learning, @ArnoCandel
Inputs xi: age, income, employment
Automatic standardization of data: xi has mean = 0, stddev = 1
Horizontalize (one-hot encode) categorical variables, e.g.
{full-time, part-time, none, self-employed} -> {0,1,0} = part-time, {0,0,0} = self-employed
Automatic initialization of weights:
Poor man’s initialization: random weights wkl
Default (better): Uniform distribution in +/- sqrt(6/(#units + #units_previous_layer))
Data preparation & Initialization
Neural Networks are sensitive to numerical noise, operate best in the linear regime (not saturated)
12
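The three preparation steps above can be sketched as follows; the income values and layer sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Standardize a numeric column: mean 0, stddev 1
income = np.array([30_000.0, 55_000.0, 80_000.0, 120_000.0])
income_std = (income - income.mean()) / income.std()

# One-hot encode a categorical, dropping one reference level,
# so {0,0,0} encodes the last level (here: self-employed)
levels = ["full-time", "part-time", "none"]   # self-employed -> all zeros
def one_hot(value):
    return [1 if value == lvl else 0 for lvl in levels]

# Uniform weight initialization in +/- sqrt(6 / (#units + #units_previous_layer))
def init_weights(fan_out, fan_in):
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

print(one_hot("part-time"))       # [0, 1, 0]
print(one_hot("self-employed"))   # [0, 0, 0]
W = init_weights(4, 3)
```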
H2O Deep Learning, @ArnoCandel
Training: Update Weights & Biases
Objective: minimize prediction error (MSE or cross-entropy)
For each training row, we make a prediction and compare with the actual label (supervised learning):
predicted: married 0.8, single 0.2; actual: married 1, single 0
Mean Square Error = (0.2² + 0.2²)/2 “penalize differences per-class”
Cross-entropy = -log(0.8) “strongly penalize non-1-ness”
Stochastic Gradient Descent: Update weights and biases via the gradient of the error (via back-propagation):
w <— w - rate * ∂E/∂w
13
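The two losses for the slide's example row (predicted 0.8/0.2, actual married) work out as:

```python
import math

# Prediction for one row: p(married) = 0.8, p(single) = 0.2
# Actual label: married, i.e. targets (1, 0)
predicted = {"married": 0.8, "single": 0.2}
actual    = {"married": 1.0, "single": 0.0}

# Mean Square Error penalizes per-class differences: (0.2^2 + 0.2^2)/2
mse = sum((actual[c] - predicted[c]) ** 2 for c in actual) / len(actual)

# Cross-entropy strongly penalizes "non-1-ness" of the true class: -log(0.8)
cross_entropy = -math.log(predicted["married"])

print(mse, cross_entropy)   # ≈ 0.04 and ≈ 0.223
```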
H2O Deep Learning, @ArnoCandel
Backward Propagation
How to compute ∂E/∂wi for wi <— wi - rate * ∂E/∂wi ?
Naive: For every i, evaluate E twice at (w1,…,wi±∆,…,wN)… Slow!
Backprop: Compute ∂E/∂wi via the chain rule, going backwards:
net = sumi(wi*xi) + b
y = activation(net)
E = error(y)
∂E/∂wi = ∂E/∂y * ∂y/∂net * ∂net/∂wi
= ∂(error(y))/∂y * ∂(activation(net))/∂net * xi
14
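For a single neuron, the chain rule and the naive two-evaluation scheme give the same number; the input, weight, and target values below are arbitrary.

```python
import math

# One neuron: net = w*x + b, y = tanh(net), E = (y - target)^2 / 2
x, w, b, target = 0.5, 0.3, 0.1, 1.0

def error(w):
    y = math.tanh(w * x + b)
    return 0.5 * (y - target) ** 2

# Chain rule: dE/dw = dE/dy * dy/dnet * dnet/dw
net = w * x + b
y = math.tanh(net)
dE_dy = y - target
dy_dnet = 1.0 - y ** 2            # derivative of tanh
dnet_dw = x
grad = dE_dy * dy_dnet * dnet_dw

# Naive alternative: two error evaluations per weight (slow for many weights)
delta = 1e-6
grad_naive = (error(w + delta) - error(w - delta)) / (2 * delta)
```

Backprop gets the same gradient with one backward sweep instead of 2N error evaluations.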
H2O Deep Learning, @ArnoCandel
H2O Deep Learning Architecture
nodes/JVMs: sync; threads: async communication
initial model: weights and biases w
map: each node trains a copy of the weights and biases with (some* or all of) its local data with asynchronous F/J threads
reduce: model averaging: average weights and biases from all nodes, e.g. w* = (w1+w2+w3+w4)/4 for 4 nodes; speedup is at least #nodes/log(#rows), arxiv:1209.4129v3
updated model w* lives in the H2O atomic in-memory K-V store
Keep iterating over the data (“epochs”), score from time to time
Query & display the model via JSON, WWW
*auto-tuned (default) or user-specified number of points per MapReduce iteration
15
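The map/reduce pattern above can be sketched with a toy linear model: each simulated "node" runs SGD on its own data shard (map), then the weights are averaged (reduce). H2O's real map step runs async fork/join threads inside each JVM on a neural net; everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic regression data with known weights
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(4000, 2))
y = X @ true_w + 0.01 * rng.normal(size=4000)

def local_train(w, X_shard, y_shard, rate=0.01):
    # map: one "node" refines its own copy of the weights on its local data
    for xi, yi in zip(X_shard, y_shard):
        w = w - rate * (w @ xi - yi) * xi
    return w

w = np.zeros(2)                               # initial model
for epoch in range(3):                        # keep iterating over the data
    shards = zip(np.array_split(X, 4), np.array_split(y, 4))
    local_models = [local_train(w.copy(), Xs, ys) for Xs, ys in shards]
    w = np.mean(local_models, axis=0)         # reduce: model averaging
```

After a few epochs the averaged model recovers the true weights closely, even though no node ever saw all the data in one update.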
H2O Deep Learning, @ArnoCandel
Adaptive learning rate - ADADELTA (Google): automatically set the learning rate for each neuron based on its training history
Grid Search and Checkpointing: run a grid search to scan many hyperparameters, then continue training the most promising model(s)
Regularization: L1 penalizes non-zero weights, L2 penalizes large weights, Dropout randomly ignores certain inputs
Hogwild!: intentional race conditions
Distributed mode: weight averaging
16
“Secret” Sauce to Higher Accuracy
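The grid-search-then-continue idea can be sketched in a few lines. The scoring function below is a hypothetical stand-in for "train a model, measure validation error"; the hyperparameter names only mirror typical H2O Deep Learning options.

```python
import itertools

# Hypothetical stand-in for training + validation scoring
def validation_error(hidden, l1):
    return abs(hidden - 200) / 1000.0 + abs(l1 - 1e-5)

grid = {"hidden": [50, 100, 200], "l1": [0.0, 1e-5, 1e-4]}
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

# score every combination, then keep the most promising model to
# continue training from its checkpoint
scored = sorted(combos, key=lambda params: validation_error(**params))
best = scored[0]
print(best)   # {'hidden': 200, 'l1': 1e-05}
```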
H2O Deep Learning, @ArnoCandel
Detail: Adaptive Learning Rate
Compute the moving average of ∆wi² at time t for window length rho:
E[∆wi²]t = rho * E[∆wi²]t-1 + (1-rho) * ∆wi²
Compute the RMS of ∆wi at time t with smoothing epsilon:
RMS[∆wi]t = sqrt( E[∆wi²]t + epsilon )
Do the same for ∂E/∂wi, then obtain the per-weight learning rate:
rate(wi, t) = RMS[∆wi]t-1 / RMS[∂E/∂wi]t
Adaptive annealing / progress: gradient-dependent learning rate, moving window prevents “freezing” (unlike ADAGRAD: no window)
Adaptive acceleration / momentum: accumulate previous weight updates, but over a window of time
cf. ADADELTA paper
17
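The per-weight bookkeeping above can be sketched directly from the formulas (cf. the ADADELTA paper); the rho and epsilon values here are illustrative, not H2O's defaults.

```python
import numpy as np

rho, epsilon = 0.95, 1e-6

def adadelta_step(w, grad, E_g2, E_dw2):
    # moving average of the squared gradient dE/dw
    E_g2 = rho * E_g2 + (1 - rho) * grad ** 2
    # rate(w, t) = RMS[dw]_(t-1) / RMS[dE/dw]_t
    rate = np.sqrt(E_dw2 + epsilon) / np.sqrt(E_g2 + epsilon)
    dw = -rate * grad
    # moving average of the squared update dw
    E_dw2 = rho * E_dw2 + (1 - rho) * dw ** 2
    return w + dw, E_g2, E_dw2, rate

# Each weight carries its own (E_g2, E_dw2) history; here just one weight,
# minimizing E(w) = w^2 with gradient 2w
w, E_g2, E_dw2 = 5.0, 0.0, 0.0
for _ in range(500):
    w, E_g2, E_dw2, rate = adadelta_step(w, 2.0 * w, E_g2, E_dw2)
```

Note there is no global learning rate anywhere: each weight's step size is derived from its own gradient and update history.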
H2O Deep Learning, @ArnoCandel
Detail: Dropout Regularization
18
Training: For each hidden neuron, for each training sample, for each iteration, ignore (zero out) a different random fraction p of input activations.
Testing: Use all activations, but reduce them by a factor p (to “simulate” the missing activations during training).
cf. Geoff Hinton's paper
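A minimal sketch of the train/test asymmetry, taking p as the dropped fraction as on the slide (for p = 0.5 the two scaling conventions in the literature coincide):

```python
import numpy as np

rng = np.random.default_rng(3)

p = 0.5   # fraction of input activations dropped during training

def dropout_train(activations):
    # zero out a different random fraction p per sample and iteration
    mask = rng.random(activations.shape) >= p
    return activations * mask

def dropout_test(activations):
    # use all activations, scaled so the expected sum matches training
    return activations * (1 - p)

a = np.ones(10_000)
train_mean = dropout_train(a).mean()   # ≈ 0.5: about half the units zeroed
test_mean = dropout_test(a).mean()     # exactly 0.5
```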
H2O Deep Learning, @ArnoCandel 19
Application: Higgs Boson Classification
Higgs vs Background
Large Hadron Collider: Largest experiment of mankind!
$13+ billion, 16.8 miles long, 120 MegaWatts, -456F, 1PB/day, etc.
Higgs boson discovery (July ’12) led to 2013 Nobel prize!
http://arxiv.org/pdf/1402.4735v2.pdf
Images courtesy CERN / LHC
HIGGS UCI Dataset:
21 low-level features AND
7 high-level derived features (physics formulae)
Train: 10M rows, Valid: 500k, Test: 500k rows
H2O Deep Learning, @ArnoCandel 20
Live Demo: Let’s see what Deep Learning
can do with low-level features alone!
? ? ?
Former baseline for AUC: 0.733 and 0.816
Algorithm | low-level H2O AUC | all-features H2O AUC
Generalized Linear Model | 0.596 | 0.684
Random Forest | 0.764 | 0.840
Gradient Boosted Trees | 0.753 | 0.839
Neural Net, 1 hidden layer | 0.760 | 0.830
H2O Deep Learning | ? | ?
add derived features
Higgs: Derived features are important!
H2O Deep Learning, @ArnoCandel
MNIST: digits classification
Standing world record: Without distortions or convolutions, the best-ever published error rate on the test set: 0.83% (Microsoft)
21
Train: 60,000 rows, 784 integer columns, 10 classes
Test: 10,000 rows, 784 integer columns, 10 classes
MNIST = Digitized handwritten digits database (Yann LeCun)
Data: 28x28 = 784 pixels with (gray-scale) values in 0…255
Yann LeCun: “Yet another advice: don't get
fooled by people who claim to have a solution
to Artificial General Intelligence. Ask them what
error rate they get on MNIST or ImageNet.”
H2O Deep Learning, @ArnoCandel 22
H2O Deep Learning beats MNIST
Standard 60k/10k data
No distortions
No convolutions
No unsupervised training
No ensemble
10 hours on 10 16-core nodes
World record!
0.83% test set error
http://learn.h2o.ai/content/hands-on_training/deep_learning.html
H2O Deep Learning, @ArnoCandel
POJO Model Export for
Production Scoring
23
Plain old Java code is
auto-generated to take
your H2O Deep Learning
models into production!
H2O Deep Learning, @ArnoCandel
Parallel Scalability
(for 64 epochs on MNIST, with “0.83%” parameters)
24
[Charts: Speedup and Training Time (minutes) vs number of H2O Nodes (1, 2, 4, 8, 16, 32, 63); 4 cores per node, 1 epoch per node per MapReduce; 2.7 mins training time at full scale]
H2O Deep Learning, @ArnoCandel
Goal: Predict the item from
seller’s text description
25
Train: 578,361 rows, 8,647 cols, 467 classes
Test: 64,263 rows, 8,647 cols, 143 classes
“Vintage 18KT gold Rolex 2 Tone in great condition”
Data: Bag-of-words vector 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 with 1s marking the words present (“vintage”, “gold”, “condition”, …)
Text Classification
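Bag-of-words vectorization can be sketched as follows; the vocabulary here is a tiny illustration, not the actual 8,647-word vocabulary.

```python
# Minimal bag-of-words vectorization sketch
vocab = ["condition", "gold", "great", "rolex", "tone", "vintage", "watch"]
index = {word: i for i, word in enumerate(vocab)}

def bag_of_words(text):
    vec = [0] * len(vocab)
    for token in text.lower().split():
        if token in index:
            vec[index[token]] = 1    # binary presence, as on the slide
    return vec

desc = "Vintage 18KT gold Rolex 2 Tone in great condition"
print(bag_of_words(desc))   # [1, 1, 1, 1, 1, 1, 0]
```

Each item description becomes a fixed-length sparse 0/1 vector, which is what the network consumes.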
H2O Deep Learning, @ArnoCandel
Out-Of-The-Box: 11.6% test set error after 10 epochs!
Predicts the correct class (out of 143) 88.4% of the time!
26
Note 2: No tuning was done

(results are for illustration only)
Train: 578,361 rows 8,647 cols 467 classes
Test: 64,263 rows 8,647 cols 143 classes
Note 1: H2O columnar-compressed in-memory
store only needs 60 MB to store 5 billion
values (dense CSV needs 18 GB)
Text Classification
H2O Deep Learning, @ArnoCandel
MNIST: Unsupervised Anomaly Detection
with Deep Learning (Autoencoder)
27
The good The bad The ugly
Download the script and run it yourself!
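The idea behind autoencoder anomaly detection, scoring each row by its reconstruction error, can be sketched with a linear autoencoder (equivalent to PCA) as a stand-in for H2O's deep autoencoder; the data and planted anomalies below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# "Good" rows live on a 2-dimensional subspace of an 8-dim space;
# 3 planted anomalies do not.
normal = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 8))
anomalies = 3.0 * rng.normal(size=(3, 8))
X = np.vstack([normal, anomalies])

# "Train" a linear autoencoder: top-2 principal directions
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
encode = Vt[:2].T                         # 8 -> 2 bottleneck ("encoder")
recon = (Xc @ encode) @ encode.T          # decode back to 8 dimensions

scores = ((Xc - recon) ** 2).mean(axis=1)   # per-row reconstruction MSE
worst = np.argsort(scores)[-3:]             # "the ugly": highest errors
```

Rows the model cannot reconstruct well (highest MSE) are flagged as anomalies; with a deep autoencoder the same scoring applies to non-linear structure.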
H2O Deep Learning, @ArnoCandel 28
How well did
Deep Learning do?
Let’s see how H2O did in the past 10 minutes!
Higgs: Live Demo (Continued)
<your guess?>
reference paper results
Any guesses for AUC on low-level features?
AUC=0.76 was the best for RF/GBM/NN (H2O)
H2O Deep Learning, @ArnoCandel
H2O Steam: Scoring Platform
29
Higgs Dataset Demo on 10-node cluster
Let’s score all our H2O models and compare them!
http://server:port/steam/index.html
Live Demo
H2O Deep Learning, @ArnoCandel 30
Live Demo on 10-node cluster:
<10 minutes runtime for all H2O algos!
Better than LHC baseline of AUC=0.73!
Scoring Higgs Models in H2O Steam
H2O Deep Learning, @ArnoCandel 31
Algorithm | Paper’s l-l AUC | low-level H2O AUC | all-features H2O AUC | Parameters (not heavily tuned), H2O running on 10 nodes
Generalized Linear Model | - | 0.596 | 0.684 | default, binomial
Random Forest | - | 0.764 | 0.840 | 50 trees, max depth 50
Gradient Boosted Trees | 0.73 | 0.753 | 0.839 | 50 trees, max depth 15
Neural Net, 1 layer | 0.733 | 0.760 | 0.830 | 1x300 Rectifier, 100 epochs
Deep Learning, 3 hidden layers | 0.836 | 0.850 | - | 3x1000 Rectifier, L2=1e-5, 40 epochs
Deep Learning, 4 hidden layers | 0.868 | 0.869 | - | 4x500 Rectifier, L1=L2=1e-5, 300 epochs
Deep Learning, 5 hidden layers | 0.880 | 0.871 | - | 5x500 Rectifier, L1=L2=1e-5
Deep Learning on low-level features alone beats everything else!
Prelim. H2O results compare well with the paper’s results* (TMVA & Theano)
Higgs Particle Detection with H2O
*Nature paper: http://arxiv.org/pdf/1402.4735v2.pdf
HIGGS UCI Dataset: 21 low-level features AND 7 high-level derived features
Train: 10M rows, Test: 500k rows
H2O Deep Learning, @ArnoCandel
Coming very soon: h2o-dev
New UI: Flow
New languages: Python, JavaScript
32
H2O Deep Learning, @ArnoCandel
h2o-dev Python Example
33
H2O Deep Learning, @ArnoCandel
Part 2: Hands-On Session
34
Web GUI
Import Higgs data, split into train/test
Train grid search Deep Learning model
Continue training the best model
ROC and Multi-Model Scoring
R Studio
Connect to running H2O Cluster from R
Run ML algos on 3 different datasets
More: Follow examples from http://learn.h2o.ai
(R scripts and data at http://data.h2o.ai)
H2O Deep Learning, @ArnoCandel
H2O Docker VM
35
http://h2o.ai/blog/2015/01/h2o-docker/
H2O will be at http://`boot2docker ip`:8996
H2O Deep Learning, @ArnoCandel
Import Higgs data
36
Enter
H2O Deep Learning, @ArnoCandel
Split Into Train/Test
37
H2O Deep Learning, @ArnoCandel
Train Grid Search DL Model
38
Enter
Enter
Enter
Enter
H2O Deep Learning, @ArnoCandel
Continue Training Best Model
39
Scroll
right
Enter
H2O Deep Learning, @ArnoCandel
Inspect ROC, thresholds, etc.
40
H2O Deep Learning, @ArnoCandel
Multi-Model Scoring
41
H2O Deep Learning, @ArnoCandel
Control H2O from R Studio
42
http://learn.h2o.ai/
R scripts on GitHub
1) Paste the content of http://tiny.cc/h2o_next_ml into R Studio
2) Execute line by line with Ctrl-Enter to run ML algorithms on the H2O Cluster via R
3) Check out the links below for more info
http://h2o.gitbooks.io
H2O Deep Learning, @ArnoCandel
Snippets from R script
43
Install H2O R package & connect to H2O Server
Run Deep Learning on MNIST
H2O Deep Learning, @ArnoCandel 44
H2O GitBooks
Also available: GBM & GLM GitBooks
at http://h2o.gitbooks.io
H2O World
learn.h2o.ai
R, EC2,
Hadoop
Deep
Learning
H2O Deep Learning, @ArnoCandel
H2O Kaggle Starter R Scripts
45
H2O Deep Learning, @ArnoCandel
Re-Live H2O World!
46
http://h2o.ai/h2o-world/
http://learn.h2o.ai
Watch the Videos
Day 2
• Speakers from Academia & Industry
• Trevor Hastie (ML)
• John Chambers (S, R)
• Josh Bloch (Java API)
• Many use cases from customers
• 3 Top Kaggle Contestants (Top 10)
• 3 Panel discussions
Day 1
• Hands-On Training
• Supervised
• Unsupervised
• Advanced Topics
• Marketing Use Case
• Product Demos
• Hacker-Fest with Cliff Click (CTO, HotSpot)
H2O Deep Learning, @ArnoCandel
You can participate!
47
- Images: Convolutional & Pooling Layers PUB-644
- Sequences: Recurrent Neural Networks PUB-1052
- Faster Training: GPGPU support PUB-1013
- Pre-Training: Stacked Auto-Encoders PUB-1014
- Ensembles PUB-1072
- Use H2O at Kaggle Challenges!
H2O Deep Learning, @ArnoCandel
Key Take-Aways
H2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning.
H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data!
Join our Community and Meetups!
https://github.com/h2oai
h2ostream community forum
www.h2o.ai
@h2oai
48
Thank you!

More Related Content

PDF
H2ODeepLearningThroughExamples021215
PDF
H2O Distributed Deep Learning by Arno Candel 071614
PDF
MLconf - Distributed Deep Learning for Classification and Regression Problems...
PDF
How to win data science competitions with Deep Learning
PDF
H2O Open Source Deep Learning, Arno Candel 03-20-14
PDF
Deep Learning in the Wild with Arno Candel
PDF
Alex Tellez, Deep Learning Applications
PDF
Deep Learning through Examples
H2ODeepLearningThroughExamples021215
H2O Distributed Deep Learning by Arno Candel 071614
MLconf - Distributed Deep Learning for Classification and Regression Problems...
How to win data science competitions with Deep Learning
H2O Open Source Deep Learning, Arno Candel 03-20-14
Deep Learning in the Wild with Arno Candel
Alex Tellez, Deep Learning Applications
Deep Learning through Examples

What's hot (20)

PPTX
Deep Learning with Python (PyData Seattle 2015)
PDF
Webinar: Deep Learning with H2O
PDF
San Francisco Hadoop User Group Meetup Deep Learning
PDF
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
PPTX
Introduction to Deep Learning
PDF
Introduction to Deep Learning with Python
PDF
Deep Learning and Reinforcement Learning
PDF
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
PDF
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
PPTX
An introduction to Deep Learning
PDF
Machine Learning and Deep Learning with R
PDF
Deep Learning And Business Models (VNITC 2015-09-13)
PDF
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
PPTX
Machine Learning, Deep Learning and Data Analysis Introduction
PDF
Deep Learning Cases: Text and Image Processing
PDF
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
PDF
Applying your Convolutional Neural Networks
PPTX
Diving into Deep Learning (Silicon Valley Code Camp 2017)
PDF
Using Deep Learning to do Real-Time Scoring in Practical Applications
PDF
Language translation with Deep Learning (RNN) with TensorFlow
 
Deep Learning with Python (PyData Seattle 2015)
Webinar: Deep Learning with H2O
San Francisco Hadoop User Group Meetup Deep Learning
Ted Willke, Senior Principal Engineer & GM, Datacenter Group, Intel at MLconf SF
Introduction to Deep Learning
Introduction to Deep Learning with Python
Deep Learning and Reinforcement Learning
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
An introduction to Deep Learning
Machine Learning and Deep Learning with R
Deep Learning And Business Models (VNITC 2015-09-13)
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
Machine Learning, Deep Learning and Data Analysis Introduction
Deep Learning Cases: Text and Image Processing
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Applying your Convolutional Neural Networks
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Using Deep Learning to do Real-Time Scoring in Practical Applications
Language translation with Deep Learning (RNN) with TensorFlow
 
Ad

Viewers also liked (20)

PDF
Transform your Business with AI, Deep Learning and Machine Learning
PDF
Deep Learning Computer Build
PDF
Deep Learning - The Past, Present and Future of Artificial Intelligence
PDF
Passive stereo vision with deep learning
PDF
Deep Learning and the state of AI / 2016
PPTX
New pedagogies for deep learning
PPTX
Squeezing Deep Learning Into Mobile Phones
PPTX
What Deep Learning Means for Artificial Intelligence
PPTX
Deep learning at nmc devin jones
PPTX
Best Deep Learning Post from LinkedIn Group
PDF
Donner - Deep Learning - Overview and practical aspects
PDF
Deep learning - Part I
ODP
Start a deep learning startup - tutorial
PDF
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
PPTX
Spark machine learning & deep learning
PDF
Introduction to Machine Learning and Deep Learning
PDF
Deep learning - Conceptual understanding and applications
PDF
Deep Water - GPU Deep Learning for H2O - Arno Candel
PPTX
Deep Learning in Computer Vision
PDF
Deep Learning - Convolutional Neural Networks
Transform your Business with AI, Deep Learning and Machine Learning
Deep Learning Computer Build
Deep Learning - The Past, Present and Future of Artificial Intelligence
Passive stereo vision with deep learning
Deep Learning and the state of AI / 2016
New pedagogies for deep learning
Squeezing Deep Learning Into Mobile Phones
What Deep Learning Means for Artificial Intelligence
Deep learning at nmc devin jones
Best Deep Learning Post from LinkedIn Group
Donner - Deep Learning - Overview and practical aspects
Deep learning - Part I
Start a deep learning startup - tutorial
Indoor Point Cloud Processing - Deep learning for semantic segmentation of in...
Spark machine learning & deep learning
Introduction to Machine Learning and Deep Learning
Deep learning - Conceptual understanding and applications
Deep Water - GPU Deep Learning for H2O - Arno Candel
Deep Learning in Computer Vision
Deep Learning - Convolutional Neural Networks
Ad

Similar to H2O Deep Learning at Next.ML (20)

PDF
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
PPTX
Online learning, Vowpal Wabbit and Hadoop
PDF
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
PDF
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
PDF
W2 - Multilayer Neural Network Lecture notes university of sydney
PPTX
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
PDF
The Back Propagation Learning Algorithm
PDF
Convolutional neural networks for image classification — evidence from Kaggle...
PDF
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
PDF
Top 10 Performance Gotchas for scaling in-memory Algorithms.
PDF
Ling liu part 02:big graph processing
PDF
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
PDF
Scalable Data Science and Deep Learning with H2O
PDF
Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...
PDF
My Postdoctoral Research
PDF
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
PPTX
Artificial Neural Network
PDF
Productive Use of the Apache Spark Prompt with Sam Penrose
PDF
Functional Programming with Immutable Data Structures
PPTX
Digit recognizer by convolutional neural network
H2O.ai's Distributed Deep Learning by Arno Candel 04/03/14
Online learning, Vowpal Wabbit and Hadoop
Arno candel scalabledatascienceanddeeplearningwithh2o_reworkboston2015
ArnoCandelScalabledatascienceanddeeplearningwithh2o_gotochg
W2 - Multilayer Neural Network Lecture notes university of sydney
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
The Back Propagation Learning Algorithm
Convolutional neural networks for image classification — evidence from Kaggle...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
Top 10 Performance Gotchas for scaling in-memory Algorithms.
Ling liu part 02:big graph processing
Arno candel scalabledatascienceanddeeplearningwithh2o_odsc_boston2015
Scalable Data Science and Deep Learning with H2O
Martin Takac - “Solving Large-Scale Machine Learning Problems in a Distribute...
My Postdoctoral Research
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Artificial Neural Network
Productive Use of the Apache Spark Prompt with Sam Penrose
Functional Programming with Immutable Data Structures
Digit recognizer by convolutional neural network

More from Sri Ambati (20)

PDF
H2O Label Genie Starter Track - Support Presentation
PDF
H2O.ai Agents : From Theory to Practice - Support Presentation
PDF
H2O Generative AI Starter Track - Support Presentation Slides.pdf
PDF
H2O Gen AI Ecosystem Overview - Level 1 - Slide Deck
PDF
An In-depth Exploration of Enterprise h2oGPTe Slide Deck
PDF
Intro to Enterprise h2oGPTe Presentation Slides
PDF
Enterprise h2o GPTe Learning Path Slide Deck
PDF
H2O Wave Course Starter - Presentation Slides
PDF
Large Language Models (LLMs) - Level 3 Slides
PDF
Data Science and Machine Learning Platforms (2024) Slides
PDF
Data Prep for H2O Driverless AI - Slides
PDF
H2O Cloud AI Developer Services - Slides (2024)
PDF
LLM Learning Path Level 2 - Presentation Slides
PDF
LLM Learning Path Level 1 - Presentation Slides
PDF
Hydrogen Torch - Starter Course - Presentation Slides
PDF
Presentation Resources - H2O Gen AI Ecosystem Overview - Level 2
PDF
H2O Driverless AI Starter Course - Slides and Assignments
PPTX
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
PDF
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
PPTX
Generative AI Masterclass - Model Risk Management.pptx
H2O Label Genie Starter Track - Support Presentation
H2O.ai Agents : From Theory to Practice - Support Presentation
H2O Generative AI Starter Track - Support Presentation Slides.pdf
H2O Gen AI Ecosystem Overview - Level 1 - Slide Deck
An In-depth Exploration of Enterprise h2oGPTe Slide Deck
Intro to Enterprise h2oGPTe Presentation Slides
Enterprise h2o GPTe Learning Path Slide Deck
H2O Wave Course Starter - Presentation Slides
Large Language Models (LLMs) - Level 3 Slides
Data Science and Machine Learning Platforms (2024) Slides
Data Prep for H2O Driverless AI - Slides
H2O Cloud AI Developer Services - Slides (2024)
LLM Learning Path Level 2 - Presentation Slides
LLM Learning Path Level 1 - Presentation Slides
Hydrogen Torch - Starter Course - Presentation Slides
Presentation Resources - H2O Gen AI Ecosystem Overview - Level 2
H2O Driverless AI Starter Course - Slides and Assignments
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Generative AI Masterclass - Model Risk Management.pptx

Recently uploaded (20)

PDF
System and Network Administration Chapter 2
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PPTX
ISO 45001 Occupational Health and Safety Management System
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
System and Network Administraation Chapter 3
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PPTX
Operating system designcfffgfgggggggvggggggggg
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PPTX
history of c programming in notes for students .pptx
PPTX
ai tools demonstartion for schools and inter college
PDF
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
PPTX
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
PDF
top salesforce developer skills in 2025.pdf
System and Network Administration Chapter 2
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
ISO 45001 Occupational Health and Safety Management System
Odoo POS Development Services by CandidRoot Solutions
System and Network Administraation Chapter 3
Navsoft: AI-Powered Business Solutions & Custom Software Development
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Softaken Excel to vCard Converter Software.pdf
Claude Code: Everyone is a 10x Developer - A Comprehensive AI-Powered CLI Tool
2025 Textile ERP Trends: SAP, Odoo & Oracle
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
How to Choose the Right IT Partner for Your Business in Malaysia
Operating system designcfffgfgggggggvggggggggg
Internet Downloader Manager (IDM) Crack 6.42 Build 41
history of c programming in notes for students .pptx
ai tools demonstartion for schools and inter college
Raksha Bandhan Grocery Pricing Trends in India 2025.pdf
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
top salesforce developer skills in 2025.pdf

H2O Deep Learning at Next.ML

  • 1. Scalable Data Science and Deep Learning with H2O Next.ML Workshop San Francisco, 1/17/15 Arno Candel, H2O.ai http://tiny.cc/h2o_next_ml_slides
  • 2. Who am I? PhD in Computational Physics, 2005
 from ETH Zurich Switzerland ! 6 years at SLAC - Accelerator Physics Modeling 2 years at Skytree - Machine Learning 13 months at H2O.ai - Machine Learning ! 15 years in Supercomputing & Modeling ! Named “2014 Big Data All-Star” by Fortune Magazine ! @ArnoCandel
  • 3. H2O Deep Learning, @ArnoCandel Outline Introduction (10 mins) Methods & Implementation (20 mins) Results and Live Demos (20 mins) Higgs boson classification MNIST handwritten digits Ebay text classification h2o-dev Outlook: Flow, Python Part 2: Hands-On Session (40 mins) Web GUI: Higgs dataset R Studio: Adult, Higgs, MNIST datasets 3
  • 4. H2O Deep Learning, @ArnoCandel Teamwork at H2O.ai Java, Apache v2 Open-Source #1 Java Machine Learning in Github Join the community! 4
  • 5. H2O Deep Learning, @ArnoCandel H2O: Open-Source (Apache v2) Predictive Analytics Platform 5
  • 6. H2O Deep Learning, @ArnoCandel 6 H2O Architecture - Designed for speed, scale, accuracy & ease of use Key technical points: • distributed JVMs + REST API • no Java GC issues 
 (data in byte[], Double) • loss-less number compression • Hadoop integration (v1,YARN) • R package (CRAN) Pre-built fully featured algos:
 K-Means, NB, PCA, CoxPH,
 GLM, RF, GBM, DeepLearning
  • 7. H2O Deep Learning, @ArnoCandel Wikipedia:
 Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple 
 non-linear transformations. What is Deep Learning? Input:
 Image Output:
 User ID 7 Example: Facebook DeepFace
  • 8. H2O Deep Learning, @ArnoCandel What is NOT Deep Linear models are not deep (by definition) ! Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy) ! SVMs and Kernel methods are not deep (2 layers: kernel + linear) ! Classification trees are not deep (operate on original input space, no new features generated) 8
  • 9. H2O Deep Learning, @ArnoCandel 1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) ! + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) ! + multi-threaded speedup (async fork/join worker threads operate at FORTRAN speeds) ! + smart algorithms for fast & accurate results (automatic standardization, one-hot encoding of categoricals, missing value imputation, weight & bias initialization, adaptive learning rate, momentum, dropout/l1/L2 regularization, grid search, 
 N-fold cross-validation, checkpointing, load balancing, auto-tuning, model averaging, etc.) ! = powerful tool for (un)supervised machine learning on real-world data H2O Deep Learning 9 all 320 cores maxed out
  • 10. H2O Deep Learning, @ArnoCandel “fully connected” directed graph of neurons age income employment married single Input layer Hidden layer 1 Hidden layer 2 Output layer 3x4 4x3 3x2#connections information flow input/output neuron hidden neuron 4 3 2#neurons 3 Example Neural Network 10
  • 11. H2O Deep Learning, @ArnoCandel age income employment yj = tanh(sumi(xi*uij)+bj) uij xi yj per-class probabilities
 sum(pl) = 1 zk = tanh(sumj(yj*vjk)+ck) vjk zk pl pl = softmax(sumk(zk*wkl)+dl) wkl softmax(xk) = exp(xk) / sumk(exp(xk)) “neurons activate each other via weighted sums” Prediction: Forward Propagation activation function: tanh alternative:
 x -> max(0,x) “rectifier” pl is a non-linear function of xi: can approximate ANY function with enough layers! bj, ck, dl: bias values
 (indep. of inputs) 11 married single
  • 12. H2O Deep Learning, @ArnoCandel age income employment xi Automatic standardization of data
 xi: mean = 0, stddev = 1 ! horizontalize categorical variables, e.g. {full-time, part-time, none, self-employed} 
 ->
 {0,1,0} = part-time, {0,0,0} = self-employed Automatic initialization of weights ! Poor man’s initialization: random weights wkl ! Default (better): Uniform distribution in
 +/- sqrt(6/(#units + #units_previous_layer)) Data preparation & Initialization Neural Networks are sensitive to numerical noise,
 operate best in the linear regime (not saturated) 12 married single wkl
  • 13. H2O Deep Learning, @ArnoCandel Mean Square Error = (0.22 + 0.22)/2 “penalize differences per-class” ! Cross-entropy = -log(0.8) “strongly penalize non-1-ness” Training: Update Weights & Biases Stochastic Gradient Descent: Update weights and biases via gradient of the error (via back-propagation): For each training row, we make a prediction and compare with the actual label (supervised learning): married10.8 predicted actual Objective: minimize prediction error (MSE or cross-entropy) w <— w - rate * ∂E/∂w 1 13 single00.2 E w rate
 • 14. H2O Deep Learning, @ArnoCandel Backward Propagation. How to compute ∂E/∂wi for the update wi <- wi - rate * ∂E/∂wi? Naive: for every i, evaluate E twice at (w1,…,wi±∆,…,wN)… Slow! Backprop: compute ∂E/∂wi via the chain rule, going backwards through net = sumi(wi*xi) + b, y = activation(net), E = error(y):
∂E/∂wi = ∂E/∂y * ∂y/∂net * ∂net/∂wi = ∂(error(y))/∂y * ∂(activation(net))/∂net * xi 14
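For a single tanh neuron with squared error, the chain-rule gradient can be checked against the slow finite-difference approach the slide dismisses (a toy verification in plain Python, not H2O code):

```python
import math

def forward_neuron(w, b, x):
    # y = activation(net), net = sum_i(w_i * x_i) + b
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(net)

def error(y, target):
    return 0.5 * (y - target) ** 2

def backprop_grad(w, b, x, target):
    # dE/dw_i = dE/dy * dy/dnet * dnet/dw_i = (y - target) * (1 - y^2) * x_i
    # (1 - y^2 is the derivative of tanh at net)
    y = forward_neuron(w, b, x)
    delta = (y - target) * (1.0 - y * y)
    return [delta * xi for xi in x]

def numeric_grad(w, b, x, target, eps=1e-6):
    # naive: evaluate E twice per weight -- slow, but a good sanity check
    grads = []
    for i in range(len(w)):
        wp = list(w); wp[i] += eps
        wm = list(w); wm[i] -= eps
        grads.append((error(forward_neuron(wp, b, x), target)
                      - error(forward_neuron(wm, b, x), target)) / (2 * eps))
    return grads
```

The two gradients agree to within the finite-difference error, which is why backprop (one backward pass) replaces 2N error evaluations.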
 • 15. H2O Deep Learning, @ArnoCandel H2O Deep Learning Architecture. Nodes/JVMs: sync; threads: async communication via the H2O atomic in-memory K-V store (K-V store and HTTPD on each node).
initial model: weights and biases w
map: each node trains a copy of the weights and biases with (some* or all of) its local data, using asynchronous F/J threads
reduce: model averaging: average the weights and biases from all nodes into the updated model, e.g. w* = (w1+w2+w3+w4)/4; speedup is at least #nodes/log(#rows) (arxiv:1209.4129v3)
Keep iterating over the data (“epochs”), score from time to time. Query & display the model via JSON, WWW.
*auto-tuned (default) or user-specified number of points per MapReduce iteration 15
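The reduce step above is easy to sketch: each "node" contributes its locally trained weight vector and the cluster averages them element-wise (illustrative plain Python, not the actual distributed code):

```python
def average_models(node_weights):
    # w* = (w1 + w2 + ... + wn) / n, element-wise over all weights
    n = len(node_weights)
    return [sum(ws) / n for ws in zip(*node_weights)]

# four nodes, each holding its own trained copy of a 3-weight model
w1, w2, w3, w4 = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]
w_star = average_models([w1, w2, w3, w4])
print(w_star)  # [2.0, 2.0, 2.0]
```

The averaged model w* is then redistributed to all nodes before the next pass over the data.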
 • 16. H2O Deep Learning, @ArnoCandel “Secret” Sauce to Higher Accuracy
Adaptive learning rate - ADADELTA (Google): automatically sets the learning rate for each neuron based on its training history
Grid Search and Checkpointing: run a grid search to scan many hyper-parameters, then continue training the most promising model(s)
Regularization:
L1: penalizes non-zero weights
L2: penalizes large weights
Dropout: randomly ignore certain inputs
Hogwild!: intentional race conditions
Distributed mode: weight averaging 16
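The L1 and L2 penalties enter the gradient step directly: L1 adds `l1 * sign(w)` and L2 adds `l2 * w` to each weight's gradient. A minimal plain-Python sketch (illustrative only, not H2O's implementation):

```python
def regularized_step(w, grad, rate, l1=0.0, l2=0.0):
    # E_total = E + l1 * |w| + (l2/2) * w^2
    # dE_total/dw = dE/dw + l1 * sign(w) + l2 * w
    def sign(v):
        return (v > 0) - (v < 0)
    return [wi - rate * (gi + l1 * sign(wi) + l2 * wi)
            for wi, gi in zip(w, grad)]
```

With `l1 = l2 = 0` this reduces to plain SGD; nonzero penalties shrink weights toward zero (L1 pushes them to exactly zero, L2 just keeps them small).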
 • 17. H2O Deep Learning, @ArnoCandel Detail: Adaptive Learning Rate (cf. ADADELTA paper)
Compute a moving average of ∆wi^2 at time t for window length rho: E[∆wi^2]t = rho * E[∆wi^2]t-1 + (1-rho) * ∆wi^2
Compute the RMS of ∆wi at time t with smoothing epsilon: RMS[∆wi]t = sqrt( E[∆wi^2]t + epsilon )
Do the same for ∂E/∂wi, then obtain the per-weight learning rate: rate(wi, t) = RMS[∆wi]t-1 / RMS[∂E/∂wi]t
Adaptive annealing / progress: gradient-dependent learning rate; the moving window prevents “freezing” (unlike ADAGRAD, which has no window). Adaptive acceleration / momentum: accumulate previous weight updates, but over a window of time. 17
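The per-weight bookkeeping above can be sketched as a toy single-weight update in plain Python (illustrative; `state` holds the two moving averages the slide defines):

```python
import math

def adadelta_update(w, grad, state, rho=0.95, eps=1e-6):
    # state = (E[g^2], E[dw^2]): moving averages for this weight
    eg2, edw2 = state
    eg2 = rho * eg2 + (1 - rho) * grad * grad   # E[g^2]_t
    rms_g = math.sqrt(eg2 + eps)                # RMS[dE/dw]_t
    rms_dw = math.sqrt(edw2 + eps)              # RMS[dw]_{t-1}
    dw = -(rms_dw / rms_g) * grad               # dw = -rate(w, t) * grad
    edw2 = rho * edw2 + (1 - rho) * dw * dw     # E[dw^2]_t
    return w + dw, (eg2, edw2)
```

Iterating this on a simple quadratic (gradient `2*w`) drives `w` toward the minimum at 0 without ever choosing a global learning rate by hand.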
 • 18. H2O Deep Learning, @ArnoCandel Detail: Dropout Regularization (cf. Geoff Hinton's paper) 18
Training: for each hidden neuron, for each training sample, for each iteration, ignore (zero out) a different random fraction p of the input activations.
Testing: use all activations, but reduce them by a factor p (to “simulate” the missing activations during training).
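A plain-Python sketch of the two dropout phases (illustrative; here `p` is the drop fraction, so test-time activations are scaled by the retained fraction `1 - p`):

```python
import random

def dropout_train(activations, p, rng=random):
    # training: zero out a random fraction p of the input activations
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_test(activations, p):
    # testing: keep all activations but scale them down so their
    # expected sum matches what the network saw during training
    return [a * (1.0 - p) for a in activations]
```

Because each training pass sees a different random subnetwork, the neurons cannot co-adapt to specific inputs, which is what makes dropout act as a regularizer.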
 • 19. H2O Deep Learning, @ArnoCandel Application: Higgs Boson Classification (Higgs vs Background). Large Hadron Collider: largest experiment of mankind! $13+ billion, 16.8 miles long, 120 MegaWatts, -456F, 1PB/day, etc. The Higgs boson discovery (July ’12) led to the 2013 Nobel prize! http://arxiv.org/pdf/1402.4735v2.pdf (Images courtesy CERN / LHC) HIGGS UCI Dataset: 21 low-level features AND 7 high-level derived features (physics formulae). Train: 10M rows, Valid: 500k rows, Test: 500k rows 19
 • 20. H2O Deep Learning, @ArnoCandel Live Demo: Higgs: derived features are important! Let's see what Deep Learning can do with low-level features alone! Former baseline for AUC: 0.733 and 0.816.
Algorithm | H2O AUC (low-level features) | H2O AUC (all features)
Generalized Linear Model | 0.596 | 0.684
Random Forest | 0.764 | 0.840
Gradient Boosted Trees | 0.753 | 0.839
Neural Net (1 hidden layer) | 0.760 | 0.830
H2O Deep Learning | ? | ?
(adding the derived features improves every algorithm) 20
 • 21. H2O Deep Learning, @ArnoCandel MNIST: digits classification. MNIST = digitized handwritten digits database (Yann LeCun). Data: 28x28 = 784 pixels with (gray-scale) values in 0…255. Train: 60,000 rows, 784 integer columns, 10 classes. Test: 10,000 rows, 784 integer columns, 10 classes. Standing world record: without distortions or convolutions, the best-ever published error rate on the test set is 0.83% (Microsoft). Yann LeCun: “Yet another advice: don't get fooled by people who claim to have a solution to Artificial General Intelligence. Ask them what error rate they get on MNIST or ImageNet.” 21
 • 22. H2O Deep Learning, @ArnoCandel 22 H2O Deep Learning beats MNIST: standard 60k/10k data, no distortions, no convolutions, no unsupervised training, no ensemble. 10 hours on 10 16-core nodes. World-record: 0.83% test set error! http://learn.h2o.ai/content/hands-on_training/deep_learning.html
  • 23. H2O Deep Learning, @ArnoCandel POJO Model Export for Production Scoring 23 Plain old Java code is auto-generated to take your H2O Deep Learning models into production!
 • 24. H2O Deep Learning, @ArnoCandel Parallel Scalability (for 64 epochs on MNIST, with the “0.83%” parameters; 4 cores per node, 1 epoch per node per MapReduce) 24 [Charts: Speedup grows to roughly 40x and training time drops to 2.7 minutes as the cluster scales from 1 to 63 H2O nodes]
 • 25. H2O Deep Learning, @ArnoCandel Text Classification. Goal: predict the item from the seller's text description, e.g. “Vintage 18KT gold Rolex 2 Tone in great condition”. Data: bag-of-words vector, e.g. 0,0,1,0,0,0,0,0,1,0,0,0,1,…,0 with 1s at the positions for vintage, gold, condition. Train: 578,361 rows, 8,647 cols, 467 classes. Test: 64,263 rows, 8,647 cols, 143 classes. 25
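The bag-of-words encoding can be sketched in a few lines of plain Python (illustrative; the tiny vocabulary here is a made-up stand-in for the real 8,647-column one):

```python
def bag_of_words(text, vocabulary):
    # one column per vocabulary word; 1 if the word occurs in the text
    words = {w.strip(".,").lower() for w in text.split()}
    return [1 if term in words else 0 for term in vocabulary]

vocab = ["condition", "gold", "rolex", "silver", "vintage", "watch"]
vec = bag_of_words("Vintage 18KT gold Rolex 2 Tone in great condition", vocab)
print(vec)  # [1, 1, 1, 0, 1, 0]
```

Each row of the training data is one such sparse 0/1 vector, which is why H2O's columnar compression (next slide) matters so much for this dataset.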
 • 26. H2O Deep Learning, @ArnoCandel Text Classification. Out-Of-The-Box: 11.6% test set error after 10 epochs! Predicts the correct class (out of 143) 88.4% of the time! 26 Train: 578,361 rows, 8,647 cols, 467 classes. Test: 64,263 rows, 8,647 cols, 143 classes. Note 1: H2O's columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB). Note 2: No tuning was done (results are for illustration only).
  • 27. H2O Deep Learning, @ArnoCandel MNIST: Unsupervised Anomaly Detection with Deep Learning (Autoencoder) 27 The good The bad The ugly Download the script and run it yourself!
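The anomaly-detection idea behind this slide: score each row by its reconstruction error under a trained autoencoder, and the rows with the largest errors are the "ugly" outliers. A plain-Python sketch with a stand-in reconstruction function (illustrative; in H2O the trained autoencoder model provides the actual reconstruction):

```python
def reconstruction_mse(row, reconstructed):
    # mean squared error between a row and its autoencoder reconstruction
    return sum((a - b) ** 2 for a, b in zip(row, reconstructed)) / len(row)

def rank_anomalies(rows, reconstruct):
    # highest reconstruction error first: (error, row_index) pairs
    scored = [(reconstruction_mse(r, reconstruct(r)), i)
              for i, r in enumerate(rows)]
    return sorted(scored, reverse=True)

# stand-in "autoencoder": pretends every feature is near the dataset mean,
# as a trained model would for typical rows
def toy_reconstruct(row):
    return [0.5 for _ in row]

rows = [[0.5, 0.5, 0.5], [0.6, 0.4, 0.5], [5.0, -3.0, 9.0]]  # last row is anomalous
print(rank_anomalies(rows, toy_reconstruct)[0][1])  # 2
```

Rows the model reconstructs well are "the good"; rows with the worst reconstruction error are the anomalies worth inspecting.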
 • 28. H2O Deep Learning, @ArnoCandel 28 Higgs: Live Demo (Continued). How well did Deep Learning do? Let's see how H2O did in the past 10 minutes! Any guesses for the AUC on low-level features? <your guess?> (reference: paper results; AUC=0.76 was the best for RF/GBM/NN in H2O)
  • 29. H2O Deep Learning, @ArnoCandel H2O Steam: Scoring Platform 29 Higgs Dataset Demo on 10-node cluster Let’s score all our H2O models and compare them! http://server:port/steam/index.html Live Demo
  • 30. H2O Deep Learning, @ArnoCandel 30 Live Demo on 10-node cluster: <10 minutes runtime for all H2O algos! Better than LHC baseline of AUC=0.73! Scoring Higgs Models in H2O Steam
 • 31. H2O Deep Learning, @ArnoCandel 31 Higgs Particle Detection with H2O. Deep Learning on low-level features alone beats everything else! Preliminary H2O results compare well with the paper's results* (TMVA & Theano). HIGGS UCI Dataset: 21 low-level features AND 7 high-level derived features. Train: 10M rows, Test: 500k rows. Parameters (not heavily tuned), H2O running on 10 nodes:
Algorithm | Paper's low-level AUC | H2O AUC (low-level) | H2O AUC (all features) | Parameters
Generalized Linear Model | - | 0.596 | 0.684 | default, binomial
Random Forest | - | 0.764 | 0.840 | 50 trees, max depth 50
Gradient Boosted Trees | 0.73 | 0.753 | 0.839 | 50 trees, max depth 15
Neural Net (1 layer) | 0.733 | 0.760 | 0.830 | 1x300 Rectifier, 100 epochs
Deep Learning (3 hidden layers) | 0.836 | 0.850 | - | 3x1000 Rectifier, L2=1e-5, 40 epochs
Deep Learning (4 hidden layers) | 0.868 | 0.869 | - | 4x500 Rectifier, L1=L2=1e-5, 300 epochs
Deep Learning (5 hidden layers) | 0.880 | 0.871 | - | 5x500 Rectifier, L1=L2=1e-5
*Nature paper: http://arxiv.org/pdf/1402.4735v2.pdf
  • 32. H2O Deep Learning, @ArnoCandel Coming very soon: h2o-dev New UI: Flow New languages: python, Javascript 32
  • 33. H2O Deep Learning, @ArnoCandel h2o-dev Python Example 33
 • 34. H2O Deep Learning, @ArnoCandel Part 2: Hands-On Session 34 Web GUI: Import Higgs data, split into train/test; train grid search Deep Learning model; continue training the best model; ROC and Multi-Model Scoring. R Studio: Connect to the running H2O Cluster from R; run ML algos on 3 different datasets. More: Follow examples from http://learn.h2o.ai (R scripts and data at http://data.h2o.ai)
 • 35. H2O Deep Learning, @ArnoCandel H2O Docker VM 35 http://h2o.ai/blog/2015/01/h2o-docker/ H2O will be at http://`boot2docker ip`:8996
  • 36. H2O Deep Learning, @ArnoCandel Import Higgs data 36 Enter
  • 37. H2O Deep Learning, @ArnoCandel Split Into Train/Test 37
 • 38. H2O Deep Learning, @ArnoCandel Train Grid Search DL Model 38 Enter
  • 39. H2O Deep Learning, @ArnoCandel Continue Training Best Model 39 Scroll right Enter
  • 40. H2O Deep Learning, @ArnoCandel Inspect ROC, thresholds, etc. 40
  • 41. H2O Deep Learning, @ArnoCandel Multi-Model Scoring 41
 • 42. H2O Deep Learning, @ArnoCandel Control H2O from R Studio 42 (http://learn.h2o.ai/ R scripts in github)
1) Paste the content of http://tiny.cc/h2o_next_ml into R Studio
2) Execute line by line with Ctrl-Enter to run ML algorithms on the H2O Cluster via R
3) Check out http://h2o.gitbooks.io and the links above for more info
  • 43. H2O Deep Learning, @ArnoCandel Snippets from R script 43 Install H2O R package & connect to H2O Server Run Deep Learning on MNIST
 • 44. H2O Deep Learning, @ArnoCandel 44 H2O GitBooks (H2O World, learn.h2o.ai; R, EC2, Hadoop; Deep Learning). Also available: GBM & GLM GitBooks at http://h2o.gitbooks.io
  • 45. H2O Deep Learning, @ArnoCandel H2O Kaggle Starter R Scripts 45
 • 46. H2O Deep Learning, @ArnoCandel Re-Live H2O World! 46 http://h2o.ai/h2o-world/ http://learn.h2o.ai Watch the Videos. Day 1 • Hands-On Training • Supervised • Unsupervised • Advanced Topics • Marketing Use Case • Product Demos • Hacker-Fest with 
Cliff Click (CTO, HotSpot) Day 2 • Speakers from Academia & Industry • Trevor Hastie (ML) • John Chambers (S, R) • Josh Bloch (Java API) • Many use cases from customers • 3 Top Kaggle Contestants (Top 10) • 3 Panel discussions
  • 47. H2O Deep Learning, @ArnoCandel You can participate! 47 - Images: Convolutional & Pooling Layers PUB-644 - Sequences: Recurrent Neural Networks PUB-1052 - Faster Training: GPGPU support PUB-1013 - Pre-Training: Stacked Auto-Encoders PUB-1014 - Ensembles PUB-1072 - Use H2O at Kaggle Challenges!
 • 48. H2O Deep Learning, @ArnoCandel Key Take-Aways 48. H2O is an open-source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning. H2O Deep Learning is ready to take your advanced analytics to the next level - try it on your data! Join our Community and Meetups! https://github.com/h2oai, h2ostream community forum, www.h2o.ai, @h2oai. Thank you!