H2O for Internet of Things
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
@matlabulous
Data Science Milan
Politecnico di Milano
10th October, 2016
Agenda
• First Talk (25 mins)
o About H2O.ai
o Demo
• A Simple Classification Task
• H2O’s Web Interface
o Why H2O?
• Our Community
• Our Customers
o What’s Next?
• New H2O Features
• Second Talk (25 mins)
o H2O for IoT
• Predictive Maintenance
• Anomaly Detection
• H2O’s R Interface
• Third Talk (25 mins)
o Deep Water
o Demo
• H2O + mxnet on GPU
• H2O’s Python Interface
Data and Code
• Please go to bit.ly/h2o_milan_1
• subfolders
o iot_use_case_1
o iot_use_case_2
Use Case 1
Predictive Maintenance
Data for Use Case 1: SECOM
https://archive.ics.uci.edu/ml/datasets/SECOM
We want to predict future failures.
The ML Problem – Pass/Fail
• Inputs
o 591 features
• Output
o Classification
• -1 = pass
• 1 = fail
• Size: 1567 Samples
Features (Numeric)
ID (excluded from modeling)
Features (Numeric)
Response (Classification)
-1 (Pass) or 1 (Fail)
Use Case 1: Predictive Maintenance
Step 1: R Packages
step_01_install_packages.R
Package ‘h2o’
Use Case 1: Predictive Maintenance
Step 2: Exploratory Analysis
step_02_exploratory_analysis.R
Importing SECOM data
Optional (different ways to import data)
Basic exploratory analysis
Convert -1 and 1 to categorical values
Note: Imbalanced dataset (only 104 fails)
Use H2O Flow (localhost:54321)
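The imbalance noted on this slide can be checked with simple arithmetic. The sketch below is plain Python, not the R script from the talk. It rebuilds a label vector from the counts quoted on the slides (1567 samples, 104 fails) and recodes -1/1 as categorical labels. The names `to_category`, `raw` and `labels` are illustrative assumptions, and the vector is synthetic, not the real SECOM data.

```python
from collections import Counter

N_SAMPLES = 1567  # dataset size quoted on the slides
N_FAILS = 104     # number of fails quoted on the slides


def to_category(value):
    """Recode the numeric response (-1 or 1) as a categorical label."""
    return {-1: "pass", 1: "fail"}[value]


# Synthetic label vector with the same class counts as the slides (not real data).
raw = [1] * N_FAILS + [-1] * (N_SAMPLES - N_FAILS)
labels = [to_category(v) for v in raw]

counts = Counter(labels)
fail_ratio = counts["fail"] / len(labels)
print(counts["pass"], counts["fail"], f"{fail_ratio:.1%}")
```

With these counts the fail share comes out at roughly 6.6%, which matches the "≈ 7%" figure used later in the deck.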
Use Case 1: Predictive Maintenance
Step 3: Building & Evaluating Models
step_03_basic_models.R
Define features & target
Split data with a random seed
Class 1 (fail) samples ≈ 7%
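A seeded split is what makes the train/test partition reproducible. The following is a minimal sketch in plain Python, not the H2O call used in the script. The function name `split_indices`, the 80/20 ratio and the seed value are assumptions chosen for illustration.

```python
import random


def split_indices(n_rows, train_ratio=0.8, seed=1234):
    """Shuffle row indices with a fixed seed, then cut into train and test."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * train_ratio)
    return indices[:cut], indices[cut:]


# Same seed, same split: rerunning gives identical partitions.
train_a, test_a = split_indices(1567)
train_b, test_b = split_indices(1567)
print(len(train_a), len(test_a), train_a == train_b)
```

Because fails are rare, it is worth checking the fail share in each part after splitting, since a purely random split does not guarantee identical class ratios.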
Train H2O models with default values
H2O automatically ignores columns with constant values
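A constant column carries no information for a model, which is why dropping it is safe. The sketch below shows the idea in plain Python. It is not H2O's implementation, and the helper `constant_columns` and the toy rows are hypothetical.

```python
def constant_columns(rows, names):
    """Return the names of columns whose value never changes across rows."""
    constant = []
    for j, name in enumerate(names):
        first = rows[0][j]
        if all(row[j] == first for row in rows):
            constant.append(name)
    return constant


# Toy data (made up): column "f2" never varies.
names = ["f1", "f2", "f3"]
rows = [
    [0.5, 100.0, 3.1],
    [0.7, 100.0, 2.9],
    [0.6, 100.0, 3.4],
]

dropped = constant_columns(rows, names)
usable = [n for n in names if n not in dropped]
print(dropped, usable)
```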
summary(model_xxx)
Use H2O Flow (localhost:54321)
Evaluate models with test data
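With so few fails, accuracy alone can be misleading when evaluating a model. As a rough illustration in plain Python, the sketch below scores a trivial "always pass" predictor against labels built from the counts on the slides (1567 samples, 104 fails). It uses the full-dataset counts rather than an actual test split, and `confusion_counts` is a hypothetical helper, not an H2O function.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Synthetic labels with the class counts from the slides (1 = fail, -1 = pass).
actual = [1] * 104 + [-1] * 1463
always_pass = [-1] * len(actual)  # baseline that never predicts a fail

tp, fp, tn, fn = confusion_counts(actual, always_pass)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)  # share of real fails that were caught
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

The baseline reaches about 93% accuracy while catching no fails at all, so metrics that focus on the fail class (recall, precision, AUC) are more informative here.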
Advanced Procedures
• Step 4 – Manual Tuning
• Step 5 – Early Stopping
• Step 6 – Grid Search
• Step 7 – Stacking Models (“h2oEnsemble”)
• Step 8 – Saving/Loading Models
• Please try them out later (bit.ly/h2o_milan_1)
Use Case 2:
Anomaly Detection
Anomaly (Outlier) Detection
• Definition
o Identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.
• Use Cases
o Bank Fraud
o Monitoring Manufacturing Lines
o Machine Learning
• Separate dataset and build different models
Photo credit: www.dbta.com
Deep Autoencoder for Anomaly Detection
• Consider the following three-layer neural network with one hidden layer and the same number of input neurons (features) as output neurons.
• The loss function is the mean squared error (MSE) between the input and the output. Hence, the network is forced to learn the identity via a nonlinear, reduced representation of the original data.
o e.g. High MSE = potential outliers
• Such an algorithm is called a deep autoencoder.
https://github.com/h2oai/h2o-training-book/blob/master/hands-on_training/anomaly_detection.md
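The per-sample reconstruction error described above can be written out directly. This is a minimal sketch in plain Python that assumes the reconstructed row has already been produced by some trained autoencoder. No network is trained here, and `reconstruction_mse` and the example rows are hypothetical.

```python
def reconstruction_mse(original, reconstructed):
    """Mean squared error between one input row and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("input and reconstruction must have the same length")
    squared = [(x - y) ** 2 for x, y in zip(original, reconstructed)]
    return sum(squared) / len(original)


# A row the network reproduces exactly, and one it reproduces badly (made up).
typical = reconstruction_mse([0.2, 0.4, 0.6], [0.2, 0.4, 0.6])
unusual = reconstruction_mse([0.0, 1.0, 0.0], [1.0, 0.0, 1.0])
print(typical, unusual)
```

Rows that the reduced representation cannot reproduce well get a high MSE, which is the signal used to rank potential outliers.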
MNIST Example – The Good Ones
Samples with Low Mean Squared Error (MSE)
MNIST Example – The Bad Ones
Samples with High Mean Squared Error (MSE)
MNIST Example – The Ugly Ones
Samples with Highest Mean Squared Error (MSE)
use_case_2_anomaly_detection.R
Define your own cut-off point
Build a Deep Autoencoder
Look at the MSE
Define cut-off
Outliers identified
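The cut-off step amounts to thresholding the per-sample MSE. A minimal sketch in plain Python follows. The MSE values and the cut-off of 0.1 are made up for illustration, and `flag_outliers` is a hypothetical helper. In practice the cut-off would be chosen after looking at the MSE distribution for the data at hand.

```python
def flag_outliers(mse_values, cutoff):
    """Return the indices of samples whose reconstruction MSE exceeds the cut-off."""
    return [i for i, mse in enumerate(mse_values) if mse > cutoff]


# Made-up reconstruction errors for six samples.
mse_values = [0.01, 0.02, 0.015, 0.30, 0.012, 0.25]
cutoff = 0.1  # user-defined threshold (assumption)

outliers = flag_outliers(mse_values, cutoff)
print(outliers)
```

A lower cut-off flags more samples for review, and a higher one flags fewer, so the choice depends on how costly a missed anomaly is compared with a false alarm.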
End of Second Talk – Thanks!
• Data Science Milan
• Gianmario Spacagna
• Politecnico di Milano
• Resources
o bit.ly/h2o_milan_1
o www.h2o.ai
o docs.h2o.ai
• Contact
o joe@h2o.ai
o @matlabulous
o github.com/woobe
