Mateusz Dymczyk
Senior Software Engineer
H2O.ai
@mdymczyk
Introduction to H2O4GPU
Practical Machine Learning
Moore’s Law
[Figure: 40 Years of Microprocessor Trend Data. Transistors (thousands) on a log scale from 10^2 to 10^7, plotted for 1980-2020. Single-threaded perf is annotated at 1.5X per year and 1.1X per year.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
GPU
[Figure: the same microprocessor trend plot (1980-2020, log scale from 10^2 to 10^7) with GPU-computing perf added: 1.5X per year, 1000X by 2025. Single-threaded perf is annotated at 1.5X per year and 1.1X per year. Stack labels: Applications, Systems, Algorithms, CUDA, Architecture.]
GPU architecture
Low latency vs. high throughput
GPU
• Optimized for data-parallel, throughput computation
• Architecture tolerant of memory latency
• More transistors dedicated to computation
CPU
• Optimized for low-latency access to cached data sets
• Control logic for out-of-order and speculative execution
GPU Enhanced Applications
Application code is split between the two processors:
• GPU: use the GPU to parallelize compute-intensive functions
• CPU: the rest of the sequential code stays on the CPU
Machine Learning and GPUs
Matrix Multiplication
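Why matrix multiplication suits a GPU can be shown in a few lines: every cell of the product depends on only one row of the left matrix and one column of the right one, so all cells can be computed independently. The sketch below is plain Python for illustration only; the name `matmul` is hypothetical, and a real GPU kernel would typically assign each (i, j) cell to its own thread instead of looping.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Each output cell c[i][j] is an independent dot product, so the
    # n * m cells could all be computed at the same time.
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
print(c)  # [[19, 22], [43, 50]]
```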
H2O4GPU
• Open source: https://github.com/h2oai/h2o4gpu
• Collection of important ML algorithms ported to the GPU (with a CPU fallback option):
• Gradient Boosting Machines (GBM)
• GLM
• Truncated SVD
• PCA
• KMeans
• (soon) Field Aware Factorization Machines
• Performance optimized, with multi-GPU support for certain algorithms
• Used within our own Driverless AI product to boost performance 30X
• Scikit-learn compatible Python API (and now an R API)
Gradient Boosting Machines
• Based on XGBoost
• Raw floating-point data is binned into quantiles
• Quantiles are stored in compressed form instead of as floats
• The compressed quantiles are transferred efficiently to the GPU
• Sparsity is handled directly, with high GPU efficiency
• Multi-GPU support by sharding rows and using NVIDIA NCCL AllReduce
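The binning step above can be sketched as follows. This is a simplified illustration, not the XGBoost or H2O4GPU implementation: the function names `quantile_cuts` and `bin_values` are hypothetical, the cut points are taken from a plain sort, and the "compression" is simply one byte per value in place of an 8-byte float (so it assumes at most 256 bins).

```python
from bisect import bisect_right


def quantile_cuts(values, n_bins):
    """Return n_bins - 1 cut points that split the values into roughly equal groups."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_bins] for i in range(1, n_bins)]


def bin_values(values, cuts):
    """Map each float to its bin index, stored as one byte per value."""
    # bytes() only accepts integers in 0..255, hence the 256-bin limit.
    return bytes(bisect_right(cuts, v) for v in values)


raw = [0.1, 2.5, 0.7, 9.3, 4.4, 1.2, 7.8, 3.3]
cuts = quantile_cuts(raw, n_bins=4)
binned = bin_values(raw, cuts)

print(cuts)          # [1.2, 3.3, 7.8]
print(list(binned))  # [0, 1, 0, 3, 2, 1, 3, 2]
```

Here eight floats shrink to eight bytes, and each of the four bins holds two values. Smaller data means less to copy to the GPU, which is the point of the compression bullets above.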
KMeans
• Significantly faster than Scikit-learn implementation (up to 50x)
• Significantly faster than other GPU implementations (5x-10x)
• Supports kmeans|| initialization
• Supports multiple GPUs by sharding the dataset
• Supports batching the data if it exceeds GPU memory
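Sharding works for KMeans because a centroid update only needs per-cluster sums and counts, and those add up across shards. The sketch below runs one Lloyd iteration this way in plain Python. It is an illustration under stated assumptions, not the H2O4GPU code: the function names are hypothetical, each list in `shards` stands in for the rows held by one GPU, and the summation in `kmeans_step` plays the role of a cross-device reduction.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def partial_stats(shard, centroids):
    """Per-shard work: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in shard:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def kmeans_step(shards, centroids):
    """One Lloyd iteration: combine the shard statistics, then update centroids."""
    stats = [partial_stats(shard, centroids) for shard in shards]
    new_centroids = []
    for k, old in enumerate(centroids):
        count = sum(counts[k] for _, counts in stats)
        if count == 0:
            new_centroids.append(list(old))  # leave an empty cluster where it was
            continue
        total = [sum(sums[k][d] for sums, _ in stats) for d in range(len(old))]
        new_centroids.append([t / count for t in total])
    return new_centroids


points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
shards = [[points[0], points[2]], [points[1], points[3]]]  # two "devices"
centroids = [[1.0, 1.0], [9.0, 9.0]]

updated = kmeans_step(shards, centroids)
print(updated)  # [[0.0, 1.0], [10.0, 11.0]]
```

Running the same step on a single shard holding all four points gives the same centroids, which is why the dataset can be split freely across devices.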
Truncated SVD & PCA
• Matrix decomposition
• Popular for text processing and dimensionality reduction
• The GPU accelerates the linear algebra operations
Truncated SVD & PCA
• The intrinsic dimensionality of certain datasets is much lower than the original (e.g. here 4096 vs. actual ~200)
• PCA can reduce the dimensionality and still preserve most of the variance
• The reduced data is a better input for further modeling, which then takes less time
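The idea of a low intrinsic dimensionality can be checked on a toy dataset: three features that almost lie on a single line. The sketch below computes the share of variance explained by the first principal component. It is an illustration only; the data is made up, the function names are hypothetical, and power iteration is used in place of a full decomposition to keep the code short (it is not how H2O4GPU computes PCA).

```python
def covariance(rows):
    """Sample covariance matrix of the rows (one feature per column)."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    centered = [[r[d] - means[d] for d in range(dim)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def top_eigenvalue(matrix, iterations=100):
    """Largest eigenvalue of a symmetric, non-zero matrix via power iteration."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient v^T M v for the unit vector v.
    mv = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return sum(a * b for a, b in zip(vec, mv))


# Made-up data: multiples of (1, 2, 2) plus a little noise.
data = [
    [2.0, 4.1, 3.9],
    [1.0, 1.9, 2.1],
    [0.0, 0.1, -0.1],
    [-1.0, -2.1, -1.9],
    [-2.0, -4.0, -4.0],
]

cov = covariance(data)
total_variance = sum(cov[i][i] for i in range(len(cov)))
ratio = top_eigenvalue(cov) / total_variance
print(f"variance explained by the first component: {ratio:.4f}")
```

For this data a single component carries more than 99% of the variance, so one column could replace three with little loss. The slide's 4096 vs. ~200 example is the same effect at a larger scale.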
Field Aware Factorization Machines
* under development
• Click Through Rate (CTR):
• One of the most important tasks in computational advertising
• Percentage of users who actually click on ads
• Until recently solved with logistic regression, which is bad at finding feature conjunctions (it learns the effect of each variable or feature individually)
Clicked  Publisher (P)  Advertiser (A)  Gender (G)
Yes      ESPN           Nike            Male
No       NBC            Adidas          Male
Field Aware Factorization Machines
* under development
• Separates the data into fields (Publisher, Advertiser, Gender) and features (ESPN, NBC, Adidas, Nike, Male, Female)
• Learns a latent vector for each (feature, field) pair to generate the model
• Used to win the first prize of three CTR competitions hosted by Criteo, Avazu, Outbrain,
and also the third prize of RecSys Challenge 2015.
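How the per-pair latent vectors are used can be sketched on the first row of the example table (ESPN, Nike, Male). For every pair of active features, each feature contributes the latent vector it learned for the other feature's field, and the dot products are summed. The code below is a minimal sketch: the weights are made-up numbers and not a trained model, the function names are hypothetical, and linear and bias terms are left out.

```python
from itertools import combinations
from math import exp


def ffm_score(row, weights):
    """Sum of pairwise interactions; row maps field -> active feature."""
    score = 0.0
    for (f1, j1), (f2, j2) in combinations(sorted(row.items()), 2):
        # Feature j1 uses the vector it learned for field f2, and vice versa.
        v1 = weights[(j1, f2)]
        v2 = weights[(j2, f1)]
        score += sum(a * b for a, b in zip(v1, v2))
    return score


def click_probability(row, weights):
    """Squash the score into (0, 1) with the logistic function."""
    return 1.0 / (1.0 + exp(-ffm_score(row, weights)))


# Hypothetical 2-dimensional latent vectors, keyed by (feature, field).
weights = {
    ("ESPN", "A"): [1.0, 0.0],
    ("ESPN", "G"): [0.5, 0.5],
    ("Nike", "P"): [1.0, 1.0],
    ("Nike", "G"): [0.0, 2.0],
    ("Male", "P"): [1.0, -1.0],
    ("Male", "A"): [0.5, 0.25],
}
row = {"P": "ESPN", "A": "Nike", "G": "Male"}

print(ffm_score(row, weights))  # 1.5
print(round(click_probability(row, weights), 3))  # 0.818
```

Unlike logistic regression, the score here changes with the combination of publisher, advertiser, and gender, which is how the model captures feature conjunctions.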
More info
• Code: http://github.com/h2oai/h2o4gpu
• Questions:
• https://stackoverflow.com/questions/tagged/h2o4gpu
• https://gitter.im/h2oai/h2o4gpu
Q&A
