An introduction to Sparkling Water
Michal Malohlava
h2o.ai
Who Am I?
Background
• PhD in CS from Charles University in Prague, Czech Republic
• Postdoc at Purdue University experimenting with algorithms for large-scale computation
• Now software engineer at H2O.ai

Experience with domain-specific languages, distributed systems, software engineering, and big data.
H2O.ai
H2O team
Co-Founders: Sri Ambati, Cliff Click
Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie
H2O
Open-Source In-Memory Data Science Platform
• Highly optimized Java code (in-house)
• Distributed in-memory K-V store and map/reduce computation framework
• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)
• Read/write access to distributed data frames (R/Pandas-style)
• ML algorithms: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
• REST API with clients: interactive UI, R, Python
Sparkling Water
Provides
• Transparent integration of H2O into the Spark ecosystem
• Use of H2O Frames and algorithms through the Spark API

Excels in existing Spark workflows that require advanced machine learning algorithms
TYPICAL USE CASES
Where to use Sparkling Water?
Pipeline: Data Source -> Data munging -> Modelling -> Prediction processing
Model building (Modelling stage): Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
Where to use Sparkling Water?
Pipeline: Data Source -> Data parsing/munging -> Modelling
Data load/munging/exploration:
• Load and parse data directly into H2OFrame
• Ad hoc data transformation
Where to use Sparkling Water?
Off-line model training: Data Source -> Data munging -> Modelling
Deploy the model: export model in a binary format or as code
Stream processing: Data Stream -> Data munging -> Model prediction
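The two export options named on this slide (binary format or code) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `LinearModel`, `export_binary`, `load_binary`, and `export_code` are hypothetical names, not H2O or Sparkling Water API, and the coefficients stand in for a model trained off-line.

```python
import pickle


class LinearModel:
    """Toy stand-in for a trained model: intercept + sum(w_i * x_i)."""

    def __init__(self, intercept, weights):
        self.intercept = float(intercept)
        self.weights = [float(w) for w in weights]

    def predict(self, row):
        return self.intercept + sum(w * x for w, x in zip(self.weights, row))

    def export_binary(self):
        # Binary export: only the parameters are serialized; loading them
        # again requires this class (the "runtime") to be available.
        return pickle.dumps({"intercept": self.intercept, "weights": self.weights})

    @classmethod
    def load_binary(cls, blob):
        params = pickle.loads(blob)
        return cls(params["intercept"], params["weights"])

    def export_code(self, name="score"):
        # Code export: standalone source with the coefficients baked in,
        # so scoring needs no model class at all.
        parts = [repr(self.intercept)]
        parts += [f"{w!r} * row[{i}]" for i, w in enumerate(self.weights)]
        body = " + ".join(parts)
        return f"def {name}(row):\n    return {body}\n"


# "Off-line training" is skipped: the coefficients are hard-coded assumptions.
model = LinearModel(0.5, [2.0, -1.0])

# Deployment option 1: ship the binary blob and restore the model.
restored = LinearModel.load_binary(model.export_binary())

# Deployment option 2: ship generated source and load it as plain code.
source = model.export_code()
namespace = {}
exec(source, namespace)
score = namespace["score"]

# "Stream processing": score incoming rows one at a time.
stream = [[1.0, 2.0], [0.0, 4.0]]
binary_preds = [restored.predict(row) for row in stream]
code_preds = [score(row) for row in stream]
print(source)
print(binary_preds, code_preds)
```

The trade-off the sketch is meant to show: the binary form is compact but needs the model runtime on the scoring side, while the code form carries everything it needs and can be embedded directly in a stream-processing job.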
WHAT IS INSIDE?
Spark cluster (coordinated by the cluster manager):
• Driver node: Scala/Py main program with SparkContext and H2OContext
• Worker nodes: each runs a Spark executor, with H2O services running in the Spark executors

Data exchange inside the Spark cluster:
• Data Source -> DataFrame (Spark executors)
• Data Source -> H2OFrame (H2O services)
• h2oContext.asH2OFrame: DataFrame -> H2OFrame
• h2oContext.asDataFrame: H2OFrame -> DataFrame
TIME FOR DEMO!
Key Points to Remember
Sparkling Water integrates H2O into Spark
• Enables using advanced machine learning algorithms inside Spark workflows
• Offers an eager computation model and a mutable data structure, H2OFrame
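What "eager computation model, mutable data structure" means in practice can be sketched with two toy classes. Both are assumptions made for illustration and neither is the Spark or H2O API: `LazyDataset` mimics the lazy, immutable style of Spark transformations, and `EagerFrame` mimics the eager, in-place style attributed to H2OFrame above.

```python
class LazyDataset:
    """Immutable and lazy: map() only records work; collect() runs it."""

    def __init__(self, rows, ops=()):
        self._rows = tuple(rows)
        self._ops = tuple(ops)

    def map(self, fn):
        # Returns a new dataset; the original is left untouched.
        return LazyDataset(self._rows, self._ops + (fn,))

    def collect(self):
        out = []
        for row in self._rows:
            for fn in self._ops:
                row = fn(row)
            out.append(row)
        return out


class EagerFrame:
    """Mutable and eager: update() runs immediately and changes data in place."""

    def __init__(self, columns):
        self.columns = {name: list(vals) for name, vals in columns.items()}

    def update(self, name, fn):
        self.columns[name] = [fn(v) for v in self.columns[name]]


calls = []


def double(x):
    calls.append(x)  # record every invocation to show when work happens
    return x * 2


base = LazyDataset([1, 2, 3])
doubled = base.map(double)
calls_before_collect = len(calls)   # nothing has executed yet
lazy_result = doubled.collect()     # the recorded work runs here

frame = EagerFrame({"x": [1, 2, 3]})
frame.update("x", double)           # runs right away, overwrites column "x"

print(calls_before_collect, lazy_result, base.collect(), frame.columns)
```

In the lazy case the original dataset is still available unchanged after the transformation; in the eager case the column is overwritten as soon as `update` returns.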
THANK YOU.
@h2oai @mmalohlava
h2o.ai/download

github.com/h2oai/sparkling-water



Visit our booth K27 for live demos and more!

Introduction to Sparkling Water - Spark Summit East 2016