Bring Your Own Recipes
Make Your Own AI
Confidential2 Confidential2
H2O Aquarium
• aquarium.h2o.ai
• H2O.ai's software-as-a-service platform for training and initial exploration
• Recommended for training sessions, workshops, and tutorials
• Driverless AI Test Drive: https://github.com/h2oai/tutorials/blob/master/DriverlessAI/Test-Drive/test-drive.md
• Your data will disappear after the allotted time period
• Run as many times as needed
Make Your Own AI: Agenda
• Where does BYOR fit into Driverless AI?
• What are custom recipes?
• Tutorial: Using custom recipes
• What does it take to write a recipe?
• Example deep dive with the experts
Key Capabilities of H2O Driverless AI
• Automatic Feature Engineering
• Automatic Visualization
• Machine Learning Interpretability (MLI)
• Automatic Scoring Pipelines
• Natural Language Processing
• Time Series Forecasting
• Flexibility of Data & Deployment
• NVIDIA GPU Acceleration
• Bring-Your-Own Recipes
Driverless AI Across Industries
The Workflow of Driverless AI
1 Drag and Drop Data
• Sources: SQL, HDFS, Snowflake, Amazon S3, Google BigQuery, Azure Blob Storage
2 Automatic Visualization
3 Bring Your Own Recipes
• Upload your own recipe(s): transformations, algorithms, scorers
4 Automatic Model Optimization
• Modelling dataset (X, Y) → Advanced Feature Engineering + Algorithm + Model Tuning, with survival of the fittest
• Model recipes: i.i.d. data, time series, more on the way
• Driverless AI executes automation on your recipes: feature engineering, model selection, hyper-parameter tuning, overfitting protection
5 Automatic Scoring Pipelines
• Driverless AI automates model scoring and deployment using your recipes
• Deploy low-latency scoring to production
• Model documentation
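The "survival of the fittest" idea in step 4 can be illustrated with a toy elitist search. This is a minimal sketch for intuition only, not Driverless AI's actual algorithm; the objective (`score`), the mutation rule (`mutate`), and the pretend optimum of 300 trees are all hypothetical.

```python
import random


def evolve(score_fn, mutate_fn, population, generations=5, keep=2, seed=0):
    """Toy survival-of-the-fittest loop: rank, keep the best, mutate them."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Rank candidates by score (higher is better) and keep the elite
        survivors = sorted(population, key=score_fn, reverse=True)[:keep]
        # Refill the population with mutated copies of random survivors
        children = [mutate_fn(rng.choice(survivors), rng)
                    for _ in range(len(population) - keep)]
        population = survivors + children
    return max(population, key=score_fn)


# Hypothetical objective: pretend 300 trees is the best setting
def score(params):
    return -abs(params["n_estimators"] - 300)


def mutate(params, rng):
    # Nudge the tree count up or down, never below 1
    step = rng.choice([-50, -10, 10, 50, 100])
    return {"n_estimators": max(1, params["n_estimators"] + step)}


start = [{"n_estimators": n} for n in (10, 20, 30, 40)]
best = evolve(score, mutate, start)
print(best)
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next.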
What is a Recipe…
• Machine learning pipelines model prepared data to solve a business question
• Transformations are applied to the original data to make it clean and as predictive as possible
• Additional datasets may be brought in to add insights
• The data is modeled with an algorithm that finds the optimal rules to solve the problem
• The best model is chosen using a specific metric, or scorer
• BYOR stands for Bring Your Own Recipe; it lets domain scientists solve their problems faster and more precisely by adding their expertise in the form of Python code snippets
• By providing your own custom recipes, you gain control over the optimization choices that Driverless AI makes to best solve your machine learning problems
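To make the three ingredients above concrete (a transformation, an algorithm, and a scorer), here is a deliberately tiny, dependency-free sketch. The function names and the toy data are hypothetical; real recipes plug into Driverless AI's own base classes rather than plain functions.

```python
import math


def transform(xs):
    """Transformation: standardize a numeric column (zero mean, unit variance)."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - mean) / std for x in xs]


def fit(xs, ys):
    """Algorithm: one-feature least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def rmse(actual, predicted):
    """Scorer: root mean squared error (lower is better)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
xt = transform(x)
model = fit(xt, y)
print(rmse(y, [model(v) for v in xt]))  # essentially zero: y is linear in x
```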
…and Why Do You Care?
• Flexibility, extensibility and customizations built into the Driverless AI platform
• New open source recipes built by the data science community, curated by Kaggle Grandmasters @ H2O.ai
• Data scientists can focus on domain-specific functions to build customizations
• 1-click upload of your recipes: models, scorers and transformations
• Driverless AI treats custom recipes as first-class citizens in the automatic machine learning workflow
• Every business can have a recipe cookbook for collaborative data science within their organization
https://h2oai.github.io/tutorials/
The Writing Recipes Process
• First, write and test your idea on sample data before wrapping it as a recipe
• Download the Driverless AI Recipes Repository for easy access to examples
• Use the Recipe Templates to ensure you have all required components
https://github.com/h2oai/driverlessai-recipes
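Following the advice to prototype first, a candidate feature can be written as a plain function and checked on a handful of sample rows before any recipe boilerplate is added. The `safe_ratio` feature and the sample values below are hypothetical.

```python
def safe_ratio(numerators, denominators):
    """Candidate feature idea: row-wise ratio, with None where the denominator is 0."""
    return [n / d if d else None for n, d in zip(numerators, denominators)]


# Hypothetical sample data standing in for two numeric columns
income = [50000.0, 72000.0, 0.0, 31000.0]
debt = [10000.0, 0.0, 5000.0, 15500.0]
ratios = safe_ratio(debt, income)
print(ratios)  # → [0.2, 0.0, None, 0.5]
```

Once the idea behaves as expected on sample data, the same logic can be moved into a recipe template.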
What does it take to write a custom recipe?
• Somewhere to write .py files
• To use or test your recipe, you need Driverless AI 1.7.0 or later
• BYOR is not available in the current LTS release series (1.6.X)
• To test your code locally, you need
  • Python 3.6, numpy, datatable, and the Driverless AI Python client
  • A Python development environment such as PyCharm or Spyder
• To write recipes, you need
  • The ability to write Python code
The Testing Recipes Process
• Upload to Driverless AI to automatically test on sample data, or
• Use the DAI Python or R client to automate this process, or
• Test locally using a dummy version of the RecipeTransformer class we will be extending
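A dummy base class for local testing might look like the following. This is an assumption-laden sketch: `DummyCustomTransformer` and `MeanCenterTransformer` are hypothetical names, and plain Python lists stand in for the datatable Frames that real recipes receive.

```python
class DummyCustomTransformer:
    """Hypothetical local stand-in for the Driverless AI base class.

    It only mimics the fit_transform/transform contract so recipe logic can be
    exercised without a Driverless AI install; it is not the real class.
    """

    def fit_transform(self, X, y=None):
        raise NotImplementedError

    def transform(self, X):
        raise NotImplementedError


class MeanCenterTransformer(DummyCustomTransformer):
    """Example recipe logic: subtract the training-set mean from one column."""

    def fit_transform(self, X, y=None):
        # Learn state from the training data only
        self.mean_ = sum(X) / len(X)
        return self.transform(X)

    def transform(self, X):
        # Reuse the learned mean on new data (e.g. validation or scoring rows)
        return [x - self.mean_ for x in X]


t = MeanCenterTransformer()
print(t.fit_transform([1.0, 2.0, 3.0]))  # → [-1.0, 0.0, 1.0]
print(t.transform([10.0]))               # → [8.0]
```

Keeping the learned state in `fit_transform` and only applying it in `transform` mirrors the split shown on the "Build Your Own Recipe" slide.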
What if I get stuck writing a custom recipe?
• Use error messages and stack traces from Driverless AI and your Python development environment to pinpoint what is causing the problem
• Write to the Driverless AI experiment logs (see the example in Advanced Options below)
• Read the FAQ and look at the templates: https://github.com/h2oai/driverlessai-recipes
• Follow along with the tutorial (coming soon): https://h2oai.github.io/tutorials/
• Ask on the community channel: https://www.h2o.ai/community/
Build Your Own Recipe
Full customization of the entire ML pipeline through a scikit-learn-style Python API
Custom Feature Engineering: fit_transform & transform
• Custom statistical transformations and embeddings for numbers, categories, text, date/time, time series, image, audio, zip, lat/long, ICD, ...
Custom Optimization Functions: f(id, actual, predicted, weight)
• Ranking, pricing, yield scoring, cost/reward, any business metric
Custom ML Algorithms: fit & predict
• Access to the ML ecosystem: H2O-3, sklearn, Keras, PyTorch, CatBoost, etc.
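As an illustration of a cost/reward business metric, the function below scores binary predictions by assigning a gain to true positives and costs to false positives and false negatives. The payoff values, the 0.5 threshold, and the function name are hypothetical, and the row id argument from the slide's signature is omitted; in an actual recipe this logic would live inside a scorer class, whose exact signature should be taken from the repository templates.

```python
def profit_score(actual, predicted, weight=None, threshold=0.5,
                 gain_tp=100.0, cost_fp=20.0, cost_fn=50.0):
    """Hypothetical cost/reward metric for a binary problem (higher is better)."""
    if weight is None:
        weight = [1.0] * len(actual)
    total = 0.0
    for a, p, w in zip(actual, predicted, weight):
        flagged = p >= threshold
        if flagged and a == 1:
            total += w * gain_tp      # caught a positive
        elif flagged and a == 0:
            total -= w * cost_fp      # false alarm
        elif not flagged and a == 1:
            total -= w * cost_fn      # missed a positive
    # Average payoff per unit of weight
    return total / sum(weight)


print(profit_score([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # → 7.5
```

Here one true positive (+100), one false positive (-20), and one false negative (-50) give 30 over four rows.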
https://h2oai.github.io/tutorials/
Dive into H2O
https://www.eventbrite.com/e/dive-into-h2o-new-york-tickets-76351721053
More details:
FAQ / Architecture Diagram etc.
https://github.com/h2oai/driverlessai-recipes
Bring Your Own Recipes
• What is BYOR?
• Building a Transformer
• Building a Scorer
• Building a Model Algorithm
• Advanced Options
• Writing Recipes Help
Advanced Options: Importing Packages
• Install and use the exact version of the exact package you need for your recipe
• _global_modules_needed_by_name
  • Use before the class definition when multiple recipes in one file need the package
• _modules_needed_by_name
  • Use inside the class definition

"""Row-by-row similarity between two text columns based on FuzzyWuzzy"""
# https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
# https://github.com/seatgeek/fuzzywuzzy
from h2oaicore.transformer_utils import CustomTransformer
import datatable as dt
import numpy as np

# Pinned package shared by every recipe in this file
_global_modules_needed_by_name = ['nltk==3.4.3']
import nltk
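Because Driverless AI installs the pinned packages itself, a quick local check can confirm that your development environment matches the pins before you test there. This sketch uses the standard library's `importlib.metadata` (Python 3.8+); `pin_satisfied` is a hypothetical helper that only understands the simple `name==version` form.

```python
from importlib import metadata


def pin_satisfied(requirement):
    """Return True if a 'name==version' pin matches the installed package."""
    name, _, wanted = requirement.partition("==")
    try:
        installed = metadata.version(name.strip())
    except metadata.PackageNotFoundError:
        return False  # not installed at all
    # A bare name (no '==') only needs the package to be present
    return installed == wanted.strip() if wanted else True


_global_modules_needed_by_name = ['nltk==3.4.3']
for req in _global_modules_needed_by_name:
    print(req, pin_satisfied(req))
```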
Advanced Options: Similar Recipes
• Extend your custom recipes when there are multiple options or similar methods and you want all of them to be tested

class FuzzyQRatioTransformer(FuzzyBaseTransformer, CustomTransformer):
    _method = "QRatio"


class FuzzyWRatioTransformer(FuzzyBaseTransformer, CustomTransformer):
    _method = "WRatio"


class ZipcodeTypeTransformer(ZipcodeLightBaseTransformer, CustomTransformer):
    def get_property_name(self, value):
        return 'zip_code_type'


class ZipcodeCityTransformer(ZipcodeLightBaseTransformer, CustomTransformer):
    def get_property_name(self, value):
        return 'city'
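The base-class-plus-`_method` pattern can be tried without FuzzyWuzzy. In this sketch, the standard library's `difflib.SequenceMatcher` is swapped in for FuzzyWuzzy's scorers, and `SimilarityBase`, `RatioSimilarity`, and `TokenSortSimilarity` are hypothetical names. The point is that each subclass only picks a method, so adding a variant is a two-line class.

```python
import difflib


class SimilarityBase:
    """Shared logic; subclasses only choose a scoring method via _method."""
    _method = None

    def score(self, a, b):
        if self._method == "ratio":
            # Character-level similarity of the full strings, scaled to 0-100
            return round(100 * difflib.SequenceMatcher(None, a, b).ratio())
        if self._method == "token_sort":
            # Same measure, but ignoring word order and case
            a = " ".join(sorted(a.lower().split()))
            b = " ".join(sorted(b.lower().split()))
            return round(100 * difflib.SequenceMatcher(None, a, b).ratio())
        raise ValueError(f"unknown method {self._method!r}")


class RatioSimilarity(SimilarityBase):
    _method = "ratio"


class TokenSortSimilarity(SimilarityBase):
    _method = "token_sort"


for cls in (RatioSimilarity, TokenSortSimilarity):
    print(cls.__name__, cls().score("new york city", "city new york"))
```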
Advanced Options: Recipe Parameters
• set_default_params
  • Parameters of models or transformers
  • Access them in functions with self.params

from h2oaicore.models import CustomModel
from h2oaicore.systemutils import physical_cores_count


class ExtraTreesModel(CustomModel):
    _display_name = "ExtraTrees"
    _description = "Extra Trees Model based on sklearn"

    def set_default_params(self, accuracy=None, time_tolerance=None,
                           interpretability=None, **kwargs):
        self.params = dict(
            random_state=kwargs.get("random_state", 1234),
            # Never use more than 1000 trees by default
            n_estimators=min(kwargs.get("n_estimators", 100), 1000),
            # Classification vs. regression split criterion
            # ("mse" is called "squared_error" in newer scikit-learn)
            criterion="gini" if self.num_classes >= 2 else "mse",
            n_jobs=self.params_base.get('n_jobs', max(1, physical_cores_count)),
        )
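The defaulting logic in `set_default_params` can be exercised locally with a stand-in class. In this sketch, `ExtraTreesParamsSketch` is hypothetical, `os.cpu_count()` (which counts logical CPUs) stands in for `physical_cores_count`, and `num_classes` is assumed to be 1 for regression.

```python
import os


class ExtraTreesParamsSketch:
    """Hypothetical stand-in mirroring the slide's set_default_params logic."""

    def __init__(self, num_classes, params_base=None):
        self.num_classes = num_classes          # assumed 1 for regression
        self.params_base = params_base or {}
        self.params = {}

    def set_default_params(self, accuracy=None, time_tolerance=None,
                           interpretability=None, **kwargs):
        self.params = dict(
            random_state=kwargs.get("random_state", 1234),
            # Callers may ask for more trees, but the default caps them at 1000
            n_estimators=min(kwargs.get("n_estimators", 100), 1000),
            criterion="gini" if self.num_classes >= 2 else "mse",
            # Prefer an explicit n_jobs; otherwise use all available CPUs
            n_jobs=self.params_base.get("n_jobs", max(1, os.cpu_count() or 1)),
        )


clf = ExtraTreesParamsSketch(num_classes=2)
clf.set_default_params(n_estimators=5000)
print(clf.params)
```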
Advanced Options: Recipe Parameters
• mutate_params
  • Random permutations of parameter options for transformers and models
  • The options chosen for the final model can be found in AutoDoc

import numpy as np
from h2oaicore.models import CustomModel


class ExtraTreesModel(CustomModel):
    _display_name = "ExtraTrees"
    _description = "Extra Trees Model based on sklearn"

    def mutate_params(self, accuracy=10, **kwargs):
        # Higher accuracy settings explore larger forests
        if accuracy > 8:
            estimators_list = [100, 200, 300, 500, 1000, 2000]
        elif accuracy >= 5:
            estimators_list = [50, 100, 200, 300, 400, 500]
        else:
            estimators_list = [10, 50, 100, 150, 200, 250, 300]
        # Modify certain parameters for tuning
        self.params["n_estimators"] = int(np.random.choice(estimators_list))
        # Parentheses let the conditional expression span several lines
        self.params["criterion"] = (np.random.choice(["gini", "entropy"])
                                    if self.num_classes >= 2
                                    else np.random.choice(["mse", "mae"]))
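The tiered search space in `mutate_params` can be checked as a free function with a seeded random generator. This is a sketch: the function signature is hypothetical, and Python's `random` module replaces `np.random` so that runs are reproducible through an injected `rng`.

```python
import random


def mutate_params(params, num_classes, accuracy=10, rng=random):
    """Sketch of the slide's mutate_params logic as a free function."""
    # Higher accuracy settings draw from larger forests
    if accuracy > 8:
        estimators_list = [100, 200, 300, 500, 1000, 2000]
    elif accuracy >= 5:
        estimators_list = [50, 100, 200, 300, 400, 500]
    else:
        estimators_list = [10, 50, 100, 150, 200, 250, 300]
    params["n_estimators"] = rng.choice(estimators_list)
    params["criterion"] = (rng.choice(["gini", "entropy"]) if num_classes >= 2
                           else rng.choice(["mse", "mae"]))
    return params


rng = random.Random(42)
for _ in range(3):
    # Each call proposes a different candidate for the search to evaluate
    print(mutate_params({}, num_classes=2, accuracy=7, rng=rng))
```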
Advanced Options: Writing to Logs
• Leave notes in the experiment logs

from h2oaicore.systemutils import make_experiment_logger, loggerinfo, loggerwarning
...
# Stays None when the recipe runs without an experiment context
logger = None
if self.context and self.context.experiment_id:
    logger = make_experiment_logger(
        experiment_id=self.context.experiment_id,
        tmp_dir=self.context.tmp_dir,
        experiment_tmp_dir=self.context.experiment_tmp_dir
    )
...
loggerinfo(logger, "Prophet will use {} workers for fitting".format(n_jobs))
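When a recipe runs outside Driverless AI there is no experiment context, so the logger stays `None`. The sketch below mimics that guard with Python's standard `logging` module; `make_logger` and `log_info` are hypothetical stand-ins for `make_experiment_logger` and `loggerinfo`, not their real implementations.

```python
import logging


def make_logger(experiment_id):
    """Stand-in for make_experiment_logger: one named stdlib logger per experiment."""
    if not experiment_id:
        return None  # no experiment context, nothing to log to
    logger = logging.getLogger(f"experiment.{experiment_id}")
    logger.setLevel(logging.INFO)
    return logger


def log_info(logger, msg):
    """Mimic the guard pattern: only log when an experiment logger exists."""
    if logger is not None:
        logger.info(msg)


n_jobs = 4
log_info(make_logger(None), "ignored outside an experiment")
log_info(make_logger("abc123"), "Prophet will use {} workers for fitting".format(n_jobs))
```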

Get Started with Driverless AI Recipes - Hands-on Training

Editor's Notes

  • #3: Took ~6 minutes without pre-warming. /data/Smalldata/gbm_test/titanic.csv /data/Kaggle/. CreditCard/CreditCard-train.csv This is not up to date: https://github.com/h2oai/tutorials/blob/master/DriverlessAI/aquarium/aquarium.md
  • #5: DAI quick overview. Types of problems we can handle: time series, NLP, binary classification, multiclass classification, regression. New engineered features to get new value out of your data. Not a black box: MLI and AutoDoc. Production-ready code (including all data transformations). Recipes to augment this process with your business knowledge.
  • #6: Driverless AI is a platform that is applicable across industries. It is general purpose: not built for a single vertical or use case, but usable for a wide range of basically all supervised problems. Name a few use cases and industries. Domain scientists and SMEs are king when it comes to knowing their data and how to use it. Combine the two to turbocharge time to solution. This is where recipes come in: we allow this expert knowledge to be added into Driverless AI, which refines the process for an individual use case. Horizontal, not vertical: the core capabilities are agnostic, and specific datasets and use cases can be refined with domain expertise. It is meant to save time, not to replace people but to augment them, making them more efficient and providing guidance on how to operate on the data.
  • #7: At a very high level, here's how Driverless AI works. Ingest data from any data source: Hadoop, Snowflake, S3 object storage, Google BigQuery; Driverless AI is agnostic about the data source. Use Automatic Visualization and its various plots, graphics and charts to look at the data and understand its shape, outliers, missing values and so on. This is where a data scientist can quickly spot things such as bias in the data. Based on the problem type, Driverless AI uses recipes to do advanced feature engineering automatically, while the model iterates across thousands of choices, tunes parameters, and looks for the best fit. Finally, Driverless AI can build an automatic scoring pipeline, which means it can generate Python and Java code to deploy low-latency scoring of that model into production. Imagine taking that scored model and propagating it across every edge device, on smartphones or in cars, to continuously generate value. Throughout this process, Machine Learning Interpretability gives the data scientist reason codes and insight into what model was generated and which features were used to build it. Automatic documentation gives an in-depth explanation of the entire feature engineering process. This satisfies the desire to trust AI through explainability. The entire process is done through a graphical user interface, making it easy for even a novice data scientist to be productive immediately. Of course, acceleration to achieve faster time to insight is important, and an IBM Power System server with GPUs (such as the AC922) will give the highest level of acceleration to gain results and insights faster. The slide makes reference to IID (Independent and Identically Distributed) data.
This refers to data where the individual rows (or observations) are essentially independent of each other, unlike time series data, where rows are related because they are collected over time. For example, predicting whether a customer applying for a new mortgage is likely to default at some point in the future would use IID data such as age, income, current net worth and so on. For time series data, think of a utility company that tracks resource utilization over the course of days, weeks and years, where usage trends over time are a factor in the prediction process.
  • #9: Recipes: bring in your own domain knowledge. Bring in existing IP and reuse it on top of the engine. Newer data scientists can reuse the IP of their senior colleagues.