Get Started with Driverless AI Recipes - Hands-on Training

Bring Your Own Recipes
Make Your Own AI

Confidential2 Confidential2
H2O Aquarium
• aquarium.h2o.ai
• H2O.ai’s software-as-a-service platform for training and initial exploration
• Recommended for training, workshops, and tutorials
• Driverless AI Test Drive
• https://github.com/h2oai/tutorials/blob/master/DriverlessAI/Test-Drive/test-drive.md
• Your data will disappear once the time period ends
• Run as many times as needed

Make Your Own AI: Agenda
• Where does BYOR fit into Driverless AI?
• What are custom recipes?
• Tutorial: Using custom recipes
• What does it take to write a recipe?
• Example deep dive with the experts

Key Capabilities of H2O Driverless AI
• Automatic Feature Engineering
• Automatic Visualization
• Machine Learning Interpretability (MLI)
• Automatic Scoring Pipelines
• Natural Language Processing
• Time Series Forecasting
• Flexibility of Data & Deployment
• NVIDIA GPU Acceleration
• Bring-Your-Own Recipes

Driverless AI Across Industries

The Workflow of Driverless AI
[Workflow diagram]
1 Drag and Drop Data
• Sources: SQL, HDFS, Snowflake, Amazon S3, Google BigQuery, Azure Blob Storage
• Modelling dataset: X, Y
2 Automatic Visualization
3 Bring Your Own Recipes
• Upload your own recipe(s): Transformations, Algorithms, Scorers
• Driverless AI executes automation on your recipes: feature engineering, model selection, hyper-parameter tuning, overfitting protection
• Driverless AI automates model scoring and deployment using your recipes
4 Automatic Model Optimization
• Model Recipes: i.i.d. data, time-series, more on the way
• Advanced Feature Engineering + Algorithm + Model Tuning
• Survival of the Fittest
5 Automatic Scoring Pipelines
• Model Documentation
• Deploy low-latency scoring to production

What is a Recipe…
• Machine learning pipelines model prepped data to solve a business question
• Transformations are done on the original data to ensure it’s clean and most predictive
• Additional datasets may be brought in to add insights
• The data is modeled using an algorithm to find the optimal rules to solve the problem
• We determine the best model by using a specific metric, or scorer
• BYOR stands for Bring Your Own Recipe and it allows domain scientists to solve their
problems faster and with more precision by adding their expertise in the form of Python
code snippets
• By providing your own custom recipes, you can gain control over the optimization choices
that Driverless AI makes to best solve your machine learning problems

…and Why Do You Care?
• Flexibility, extensibility and customizations built into the Driverless AI platform
• New open source recipes built by the data science community, curated by Kaggle Grand Masters @ H2O.ai
• Data scientists can focus on domain-specific functions to build customizations
• 1-click upload of your recipes – models, scorers and transformations
• Driverless AI treats custom recipes as first-class citizens in the automatic machine learning workflow
• Every business can have a recipe cookbook for collaborative data science within their organization

https://h2oai.github.io/tutorials/


The Writing Recipes Process
• First, write and test your idea on sample data before wrapping it as a recipe
• Download the Driverless AI Recipes Repository for easy access to examples
• Use the Recipe Templates to ensure you have all required components
https://github.com/h2oai/driverlessai-recipes

What does it take to write a custom recipe?
• Somewhere to write .py files
• To use or test your recipe, you need Driverless AI 1.7.0 or later
  • BYOR is not available in the current LTS release series (1.6.X)
• To test your code locally, you need
  • Python 3.6, numpy, datatable, and the Driverless AI Python client
  • A Python development environment such as PyCharm or Spyder
• To write recipes, you need
  • The ability to write Python code

The Testing Recipes Process
• Upload to Driverless AI to automatically test on sample data
or
• Use the DAI Python or R client to automate this process
or
• Test locally using a dummy version of the RecipeTransformer class we will be extending
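The local-testing option above can be sketched roughly as follows. This is a minimal illustration, not the Driverless AI implementation: the stand-in `CustomTransformer` base class and the toy `TextLengthTransformer` recipe are both hypothetical, and a plain Python list stands in for the datatable frame a real recipe would receive.

```python
class CustomTransformer:
    """Hypothetical local stand-in; the real base class ships with Driverless AI."""

    def fit_transform(self, X, y=None):
        raise NotImplementedError

    def transform(self, X):
        raise NotImplementedError


class TextLengthTransformer(CustomTransformer):
    """Toy recipe (assumption for illustration): string -> character count."""

    def fit_transform(self, X, y=None):
        # Nothing is learned from the data, so fitting equals transforming.
        return self.transform(X)

    def transform(self, X):
        # Missing values are mapped to 0 instead of raising an error.
        return [len(value) if value is not None else 0 for value in X]


# Exercise the recipe on a small hand-made sample before uploading it.
sample = ["driverless", "ai", None, ""]
result = TextLengthTransformer().fit_transform(sample)
```

Running the recipe against a handful of edge cases (missing values, empty strings) locally tends to surface bugs faster than waiting for the upload-time test.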

What if I get stuck writing a custom recipe?
• Use error messages and stack traces from Driverless AI and your Python development environment to pinpoint what is causing the problem
• Write to the Driverless AI Experiment Logs (see the example under Advanced Options: Writing to Logs)
• Read the FAQ and look at the templates: https://github.com/h2oai/driverlessai-recipes
• Follow along with the tutorial (Coming Soon): https://h2oai.github.io/tutorials/
• Ask on the community channel: https://www.h2o.ai/community/

Build Your Own Recipe
Full customization of the entire ML Pipeline through scikit-learn Python API
Custom Feature Engineering – fit_transform & transform
• Custom statistical transformations and embeddings for numbers, categories,
text, date/time, time-series, image, audio, zip, lat/long, ICD, ...
Custom Optimization Functions – f(id, actual, predicted, weight)
• Ranking, Pricing, Yield Scoring, Cost/Reward, any Business Metrics
Custom ML Algorithms – fit & predict
• Access to ML ecosystem: H2O-3, sklearn, Keras, PyTorch, CatBoost, etc.
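As a rough illustration of a custom optimization function of the form f(actual, predicted, weight), the sketch below computes an average business cost for a binary problem. It is plain Python rather than the Driverless AI scorer interface, and the function name, the 0.5 threshold, and the two cost values are assumptions chosen only for the example.

```python
from typing import Optional, Sequence


def cost_weighted_error(
    actual: Sequence[int],
    predicted: Sequence[float],
    weight: Optional[Sequence[float]] = None,
    threshold: float = 0.5,            # assumed decision threshold
    false_positive_cost: float = 1.0,  # assumed business cost
    false_negative_cost: float = 5.0,  # assumed business cost
) -> float:
    """Average cost per weighted row; lower is better."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if weight is None:
        weight = [1.0] * len(actual)

    total_cost = 0.0
    total_weight = 0.0
    for y, p, w in zip(actual, predicted, weight):
        label = 1 if p >= threshold else 0
        if label == 1 and y == 0:
            total_cost += w * false_positive_cost
        elif label == 0 and y == 1:
            total_cost += w * false_negative_cost
        total_weight += w
    return total_cost / total_weight if total_weight else 0.0


# One false positive (cost 1) and one false negative (cost 5) over 4 rows.
score = cost_weighted_error([1, 0, 1, 0], [0.9, 0.7, 0.2, 0.1])
```

Encoding asymmetric costs directly in the metric is what lets the optimization favor models that match the business trade-off rather than raw accuracy.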

https://h2oai.github.io/tutorials/

Dive into H2O
https://www.eventbrite.com/e/dive-into-h2o-new-york-tickets-76351721053

More details:
FAQ / Architecture Diagram etc.
https://github.com/h2oai/driverlessai-recipes

Bring Your Own Recipes
• What is BYOR?
• Building a Transformer
• Building a Scorer
• Building a Model Algorithm
• Advanced Options
• Writing Recipes Help

Advanced Options: Importing Packages
• Install and use the exact version of the exact package you need for your recipe
• _global_modules_needed_by_name
  • Use before the class definition when multiple recipes in one file need the package
• _modules_needed_by_name
  • Use in the class definition
"""Row-by-row similarity between two text columns based on FuzzyWuzzy"""
# https://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
# https://github.com/seatgeek/fuzzywuzzy
from h2oaicore.transformer_utils import CustomTransformer
import datatable as dt
import numpy as np

# Declare the pinned package before importing it
_global_modules_needed_by_name = ['nltk==3.4.3']
import nltk
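The difference between the two declarations can be sketched as below. The stand-in base class, the `FuzzyRatioRecipe` class, and the `required_packages` helper are hypothetical and exist only to show where each list lives; how Driverless AI actually resolves and installs the packages is not shown here.

```python
# Module level: shared by every recipe class defined in this file.
_global_modules_needed_by_name = ["nltk==3.4.3"]


class CustomTransformer:
    """Hypothetical stand-in for the Driverless AI base class."""


class FuzzyRatioRecipe(CustomTransformer):
    # Class level: only this recipe needs the package.
    _modules_needed_by_name = ["fuzzywuzzy"]


class PlainRecipe(CustomTransformer):
    """A second recipe in the same file with no extra requirements."""


def required_packages(recipe_cls):
    """Illustrative helper (not a Driverless AI function): list every
    requirement string that applies to one recipe class."""
    own = list(getattr(recipe_cls, "_modules_needed_by_name", []))
    return _global_modules_needed_by_name + own


fuzzy_needs = required_packages(FuzzyRatioRecipe)
plain_needs = required_packages(PlainRecipe)
```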

Advanced Options: Similar Recipes
• Extend your custom recipes when there are multiple options or similar methods and you want all of them to be tested
class FuzzyQRatioTransformer(FuzzyBaseTransformer, CustomTransformer):
    _method = "QRatio"


class FuzzyWRatioTransformer(FuzzyBaseTransformer, CustomTransformer):
    _method = "WRatio"


class ZipcodeTypeTransformer(ZipcodeLightBaseTransformer, CustomTransformer):
    def get_property_name(self, value):
        return 'zip_code_type'


class ZipcodeCityTransformer(ZipcodeLightBaseTransformer, CustomTransformer):
    def get_property_name(self, value):
        return 'city'
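The pattern above, where a shared base class does the work and each subclass only picks a variant, might look like the following in a self-contained form. This is a sketch under assumptions: `SimilarityBase` and its subclasses are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for the FuzzyWuzzy scoring functions.

```python
from difflib import SequenceMatcher


class SimilarityBase:
    """Hypothetical shared base: subclasses only choose `_method`."""

    _method = None  # name of the comparison method to dispatch to

    def ratio(self, a, b):
        return SequenceMatcher(None, a, b).ratio()

    def case_insensitive_ratio(self, a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def transform(self, pairs):
        # Look up the variant named by the subclass and apply it row by row.
        compare = getattr(self, self._method)
        return [compare(a, b) for a, b in pairs]


class RatioTransformer(SimilarityBase):
    _method = "ratio"


class CaseInsensitiveRatioTransformer(SimilarityBase):
    _method = "case_insensitive_ratio"


pairs = [("H2O", "h2o"), ("recipe", "recipe")]
exact = RatioTransformer().transform(pairs)
relaxed = CaseInsensitiveRatioTransformer().transform(pairs)
```

Because each variant is its own class, every one of them can be offered to the experiment as a separate candidate.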

Advanced Options: Recipe Parameters
• set_default_params
  • Parameters of models or transformers
  • Access in functions with self.params
from h2oaicore.models import CustomModel
from h2oaicore.systemutils import physical_cores_count


class ExtraTreesModel(CustomModel):
    _display_name = "ExtraTrees"
    _description = "Extra Trees Model based on sklearn"

    def set_default_params(self, accuracy=None, time_tolerance=None,
                           interpretability=None, **kwargs):
        # Defaults can be overridden via kwargs; n_estimators is capped at 1000
        self.params = dict(
            random_state=kwargs.get("random_state", 1234),
            n_estimators=min(kwargs.get("n_estimators", 100), 1000),
            criterion="gini" if self.num_classes >= 2 else "mse",
            n_jobs=self.params_base.get('n_jobs', max(1, physical_cores_count)))

Advanced Options: Recipe Parameters
• mutate_params
  • Random permutations of parameter options for transformers and models
  • Can get the options chosen in the final model from Auto Doc
import numpy as np


class ExtraTreesModel(CustomModel):
    _display_name = "ExtraTrees"
    _description = "Extra Trees Model based on sklearn"

    def mutate_params(self, accuracy=10, **kwargs):
        # Higher accuracy settings search over larger ensembles
        if accuracy > 8:
            estimators_list = [100, 200, 300, 500, 1000, 2000]
        elif accuracy >= 5:
            estimators_list = [50, 100, 200, 300, 400, 500]
        else:
            estimators_list = [10, 50, 100, 150, 200, 250, 300]
        # Modify certain parameters for tuning
        self.params["n_estimators"] = int(np.random.choice(estimators_list))
        self.params["criterion"] = (
            np.random.choice(["gini", "entropy"]) if self.num_classes >= 2
            else np.random.choice(["mse", "mae"]))
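To see how defaults and mutation interact without a Driverless AI install, the two methods can be mimicked on a small stand-in class. Everything here is an assumption for illustration: `TinyModel` is not the Driverless AI base class, the injectable `rng` argument is added only to make the run repeatable, and the standard-library `random` module replaces numpy.

```python
import random


class TinyModel:
    """Hypothetical stand-in for a recipe class."""

    def __init__(self, num_classes=2):
        self.num_classes = num_classes
        self.params = {}

    def set_default_params(self, **kwargs):
        # Start from defaults; caller overrides are capped at 1000 trees.
        self.params = dict(
            random_state=kwargs.get("random_state", 1234),
            n_estimators=min(kwargs.get("n_estimators", 100), 1000),
        )

    def mutate_params(self, accuracy=10, rng=random, **kwargs):
        # Higher accuracy settings open up larger search ranges.
        if accuracy > 8:
            choices = [100, 200, 300, 500, 1000, 2000]
        elif accuracy >= 5:
            choices = [50, 100, 200, 300, 400, 500]
        else:
            choices = [10, 50, 100, 150, 200, 250, 300]
        # Only the tuned parameter changes; other defaults are kept.
        self.params["n_estimators"] = rng.choice(choices)


model = TinyModel()
model.set_default_params(n_estimators=5000)  # request above the cap
defaults = dict(model.params)
model.mutate_params(accuracy=3, rng=random.Random(0))
```

The defaults give every experiment a sensible starting point, while each mutation call samples one new candidate configuration to try.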

Advanced Options: Writing to Logs
• Leave notes in the experiment logs
from h2oaicore.systemutils import make_experiment_logger, loggerinfo, loggerwarning
...
# Get the logger if an experiment context exists
logger = None
if self.context and self.context.experiment_id:
    logger = make_experiment_logger(
        experiment_id=self.context.experiment_id,
        tmp_dir=self.context.tmp_dir,
        experiment_tmp_dir=self.context.experiment_tmp_dir
    )
...
loggerinfo(logger, "Prophet will use {} workers for fitting".format(n_jobs))
