Lessons from Driverless AI going to Production

Tom Kraljevic / Venkatesh Yadav
H2O.ai

Outline
• Driverless AI software distributions and supported environments
• Hardware Recommendations
• End-to-end steps from hardware uncrating to Machine Learning pipeline creation
• Data Sources
• Automating Driverless AI training
• Productionizing Driverless AI pipelines
• Top customer questions

Driverless AI Software Distributions and Supported Environments
• Cloud marketplace BYOL (bring your own license) offerings
  • Amazon AWS AMI
  • Microsoft Azure Marketplace
  • Google Cloud Platform
  • Nimbix, Paperspace
  • IBM Cloud Private
  • NVIDIA DGX Registry
• Install on your own
  • Cloud (for experimenting or for serious use)
  • Servers (for serious use)
  • Desktop/Laptop (for experimenting with small data)

Cloud - Microsoft Azure Marketplace

Install on Your Own
• RPM package
• DEB package
• Docker image

RPM
Supported CPU | Supported OS | Supported CUDA | Supported GPU
IBM Power P8 | RHEL 7 | CUDA 8.0, CUDA 9.0 (CUDA 9.2 soon...) | Kepler, Pascal, Volta
IBM Power P9 | RHEL 7 | CUDA 9.0 (CUDA 9.2 soon...) | Volta
x86_64 | RHEL 7, SLES 12 | CUDA 8.0, CUDA 9.0 (CUDA 9.2 soon...) | Kepler, Pascal, Volta

DEB
Supported CPU | Supported OS | Supported CUDA | Supported GPU
IBM Power P8 | Ubuntu 16.04 | CUDA 8.0, CUDA 9.0 (CUDA 9.2 soon...) | Kepler, Pascal, Volta
IBM Power P9 | Ubuntu (GPU support not yet available...) | not yet available | not yet available
x86_64 | Ubuntu 16.04 | CUDA 8.0, CUDA 9.0 (CUDA 9.2 soon...) | Kepler, Pascal, Volta
x86_64 | Ubuntu 16.04 on Windows (via WSL) | none | none

Docker Image
Supported CPU | Supported Host OS | Supported Container CUDA | Supported GPU
IBM Power P8 | Ubuntu 16.04 | CUDA 8.0, CUDA 9.0 | Kepler, Pascal, Volta
IBM Power P8 | RHEL 7 | Soon... | Soon...
IBM Power P9 | Ubuntu (GPU support not yet available...) | not yet available | not yet available
IBM Power P9 | RHEL 7 | Soon... | Soon...
x86_64 | Ubuntu 16.04 | CUDA 8.0, CUDA 9.0 | Kepler, Pascal, Volta
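For install scripts that need to pick a package, the three tables above can be encoded as a small lookup. This is a minimal sketch: the string identifiers ("power8", "ubuntu16.04-wsl", and so on) are hypothetical names chosen for illustration, an empty tuple means "installs, but CPU-only" (the WSL row), and combinations marked "Soon..." or "not yet available" are simply left out.

```python
# Support matrix transcribed from the RPM, DEB and Docker tables.
# Key: (package, cpu, os). Value: tuple of supported CUDA versions.
# () means the package installs but runs CPU-only (the WSL row).
# "Soon..." / "not yet available" combinations are intentionally absent.
# All identifier strings here are hypothetical, chosen for this sketch.
SUPPORT_MATRIX = {
    ("rpm", "power8", "rhel7"): ("8.0", "9.0"),
    ("rpm", "power9", "rhel7"): ("9.0",),
    ("rpm", "x86_64", "rhel7"): ("8.0", "9.0"),
    ("rpm", "x86_64", "sles12"): ("8.0", "9.0"),
    ("deb", "power8", "ubuntu16.04"): ("8.0", "9.0"),
    ("deb", "x86_64", "ubuntu16.04"): ("8.0", "9.0"),
    ("deb", "x86_64", "ubuntu16.04-wsl"): (),
    ("docker", "power8", "ubuntu16.04"): ("8.0", "9.0"),
    ("docker", "x86_64", "ubuntu16.04"): ("8.0", "9.0"),
}


def cuda_options(package, cpu, os_name):
    """Return supported CUDA versions, () for CPU-only, or None if unsupported."""
    return SUPPORT_MATRIX.get((package.lower(), cpu.lower(), os_name.lower()))
```

For example, cuda_options("rpm", "power9", "rhel7") returns ("9.0",), while cuda_options("docker", "power8", "rhel7") returns None because that row is still "Soon...".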

Hardware Recommendations
• IBM Power
  • P8 with 4 (or more) Pascal/Volta GPUs (“Minsky”)
    • Lots of CPU cores (100+)
    • Lots of CPU memory (256 GB+)
    • Fast storage (SSD/NVMe)
  • P9 with 4 (or more) Volta GPUs (“Newell”)
    • Lots of CPU cores (one of my test systems has 160 cores)
• x86_64
  • 2 or more Xeon sockets
  • 4 or more Pascal/Volta GPUs
• Insights
  • Don’t skimp on CPU cores and memory; whenever the GPUs aren’t busy, CPU and memory become the bottleneck
  • Fast storage makes a big difference in Docker-based environments
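A quick pre-install check can flag machines that fall short of the CPU and memory guidance above. This sketch uses the P8 figures (100+ cores, 256 GB+) as rules of thumb, not as limits enforced by Driverless AI. Note that os.cpu_count() reports logical CPUs (hardware threads), which on SMT-enabled Power systems is several times the physical core count.

```python
import os

# Rules of thumb from the IBM Power P8 recommendations
# (guidance from the slides, not requirements enforced by Driverless AI).
MIN_CPU_CORES = 100
MIN_MEMORY_GB = 256


def total_memory_gb():
    """Physical memory in GB, via POSIX sysconf names available on Linux."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3


def hardware_warnings(cpu_cores, memory_gb):
    """Return human-readable warnings for CPU/memory below the recommendations."""
    warnings = []
    if cpu_cores < MIN_CPU_CORES:
        warnings.append(f"{cpu_cores} CPU cores (recommended: {MIN_CPU_CORES}+)")
    if memory_gb < MIN_MEMORY_GB:
        warnings.append(f"{memory_gb:.0f} GB of memory (recommended: {MIN_MEMORY_GB}+ GB)")
    return warnings
```

On the target Linux host, call hardware_warnings(os.cpu_count() or 0, total_memory_gb()); an empty list means the machine meets both recommendations.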

End-to-End Uncrating to Creating: Bringing DAI to a new IBM P9 System
• Enable Red Hat Linux subscription
• Install GPU drivers
• Install CUDA 9.0
• Grow the disk volume mounted at ‘/’
• Open firewall port 12345
• Download Driverless AI
• Install Driverless AI
• Use Driverless AI from your web browser
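After the GPU drivers and CUDA are installed and the machine has rebooted, it is worth confirming that every GPU is visible before installing Driverless AI. This is a sketch that runs nvidia-smi -L (which prints one line per GPU) and treats any "Unknown error" in the output as a failure, since that is the symptom the udev rule edit in the detailed install steps is meant to prevent. The parsing rule (lines starting with "GPU ") is an assumption based on the usual -L output format.

```python
import subprocess


def parse_gpu_list(output):
    """Return one entry per GPU from `nvidia-smi -L` output.

    Raises RuntimeError if the output contains "Unknown error", the symptom
    the udev rule edit in the install steps is meant to prevent.
    """
    if "Unknown error" in output:
        raise RuntimeError("nvidia-smi reported 'Unknown error'; check the udev rule change")
    # Assumed format: one line per GPU, e.g. "GPU 0: Tesla V100-SXM2-16GB (UUID: ...)"
    return [line.strip() for line in output.splitlines() if line.strip().startswith("GPU ")]


def list_gpus():
    """Run `nvidia-smi -L` and parse its combined stdout/stderr."""
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return parse_gpu_list(result.stdout + result.stderr)
```

On a P9 system with four Volta GPUs, len(list_gpus()) should be 4.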

End-to-End Uncrating to Creating: Bringing DAI to a new IBM P9 System (Step-by-Step Commands)
• [ Enable Red Hat Linux subscription ]
• [ (Optional) Enable SELinux if you want it ]
• yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
• yum install dkms
• yum groupinstall "Development Tools"
  • Needed to build the GPU drivers
• wget http://us.download.nvidia.com/tesla/396.26/nvidia-driver-local-repo-rhel7-396.26-1.0-1.ppc64le.rpm
• yum localinstall nvidia-driver*.rpm
• wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/ppc64le/cuda-repo-rhel7-9.2.88-1.ppc64le.rpm
• yum localinstall cuda-repo*.rpm
• yum install cuda-9-0.ppc64le
• systemctl enable nvidia-persistenced
• cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
• sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-redhat.rules
  • Needed so that nvidia-smi does not report "Unknown error"
• reboot
• [ Grow the disk volume mounted at '/' (the default size was very small) ]
• firewall-cmd --zone=public --add-port=12345/tcp --permanent
• firewall-cmd --reload
  • Applies the permanent rule to the running firewall
• wget http://.../dai-rpm.dai
• yum localinstall dai.rpm
• systemctl start dai
• Open http://dai-host:12345 in a web browser
• [ Import dataset ]
• [ Run an experiment (the "Predict" menu item) ]
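Once systemctl start dai has run, a TCP connection attempt from another machine confirms both that the service is listening and that port 12345 is open in the firewall. This is a sketch using only the standard library. It checks reachability, not login. "dai-host" is the placeholder hostname from the steps above, and the timeout values are arbitrary choices.

```python
import socket
import time


def wait_for_port(host, port=12345, timeout=120.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening and reachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("dai-host") returns True once the web UI port accepts connections. Polling with a deadline gives the service time to start up after a reboot.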

Data Sources
• File Formats
  • csv, tsv, txt, dat, tgz, gz, bz2, zip, xz, xls, xlsx, nff, feather, bin, arff, parquet
• Connectors
  • Local filesystem
  • HDFS
  • S3
  • Google Cloud Storage
  • Google BigQuery
  • (in development) Minio
  • (in development) Snowflake
  • More connectors are being added on a first-come, first-served basis
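When datasets are uploaded by a script, it can help to filter out unsupported files before calling the server. This sketch checks only a file's final extension against the list above (so "data.csv.gz" is accepted because of ".gz"). It does not verify that the file's contents are actually readable.

```python
from pathlib import PurePath

# File extensions listed on the Data Sources slide.
SUPPORTED_EXTENSIONS = {
    "csv", "tsv", "txt", "dat", "tgz", "gz", "bz2", "zip", "xz",
    "xls", "xlsx", "nff", "feather", "bin", "arff", "parquet",
}


def is_supported_file(path):
    """True if the path's final extension (case-insensitive) is in the list."""
    suffix = PurePath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

For example, is_supported_file("/data/Kaggle/CreditCard/CreditCard-train.csv") is True, while a file with no extension is rejected.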

Automating Driverless AI Training (Python)
from h2oai_client import Client

# Connection details for the Driverless AI server
address = 'http://ip_where_driverless_is_running:12345'
username = 'username'
password = 'password'
h2oai = Client(address=address, username=username, password=password)

# Upload the training and test datasets (the *_sync calls wait until done)
train_path = '/data/Kaggle/CreditCard/CreditCard-train.csv'
test_path = '/data/Kaggle/CreditCard/CreditCard-test.csv'
train = h2oai.create_dataset_sync(train_path)
test = h2oai.create_dataset_sync(test_path)

# Ask Driverless AI for suggested experiment settings for this target
target = "default payment next month"
params = h2oai.get_experiment_tuning_suggestion(dataset_key=train.key,
                                                 target_col=target,
                                                 is_classification=True,
                                                 is_time_series=False)

# Run the experiment to completion, then download the test-set predictions
experiment = h2oai.start_experiment_sync(params)
h2oai.download(src_path=experiment.test_predictions_path, dest_dir=".")
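In automated jobs it is usually safer not to hard-code the address and credentials shown above. This sketch reads them from environment variables. The names DAI_ADDRESS, DAI_USERNAME and DAI_PASSWORD are hypothetical conventions for this example, not variables the client reads on its own.

```python
import os

# Hypothetical variable names for this sketch; the client does not read them itself.
REQUIRED_VARS = ("DAI_ADDRESS", "DAI_USERNAME", "DAI_PASSWORD")


def connection_settings(env=None):
    """Build keyword arguments for Client(...) from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "address": env["DAI_ADDRESS"],
        "username": env["DAI_USERNAME"],
        "password": env["DAI_PASSWORD"],
    }
```

With these variables set, the connection line becomes h2oai = Client(**connection_settings()). A scheduled job then fails fast with a clear message if a variable is missing.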

Productionizing Driverless AI Pipelines
• Driverless AI MOJO pipeline (+ model) artifact
  • Small, lightweight footprint
  • Low latency
  • Designed for real-time applications (scoring one row at a time)
  • Java implementation
  • MOJOs are available for both the feature-engineering pipeline and MLI (to get reason codes in production)
• Driverless AI Python pipeline (+ model) artifact
  • Heavy footprint
  • Suitable for batch applications
  • Serves as the reference implementation for MOJO testing
  • Usually gets new features first
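For batch use of the Python pipeline, a common pattern is to feed rows in fixed-size chunks so that memory stays bounded on large inputs. This sketch shows only the chunking logic. score_batch is a hypothetical placeholder for whatever batch-scoring call your Python scoring pipeline exposes, and it is assumed to return one prediction per input row.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def score_in_batches(rows, score_batch, batch_size=1000):
    """Call score_batch on each batch and concatenate predictions in order.

    score_batch is a hypothetical stand-in for the scoring pipeline's batch
    call; it must return one prediction per input row.
    """
    predictions = []
    for batch in batches(rows, batch_size):
        predictions.extend(score_batch(batch))
    return predictions
```

Because batches accepts any iterable, the rows can come straight from a CSV reader or a database cursor without first being loaded fully into memory.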

Driverless AI Java MOJO Code Example
import java.io.IOException;

import ai.h2o.mojos.runtime.MojoPipeline;
import ai.h2o.mojos.runtime.frame.MojoFrame;
import ai.h2o.mojos.runtime.frame.MojoFrameBuilder;
import ai.h2o.mojos.runtime.frame.MojoRowBuilder;
import ai.h2o.mojos.runtime.utils.SimpleCSV;

public class Main {
  public static void main(String[] args) throws IOException {
    // Load the MOJO pipeline from disk
    MojoPipeline model = MojoPipeline.loadFrom("pipeline.mojo");

    // Get a frame builder and fill in the input columns for one row
    MojoFrameBuilder frameBuilder = model.getInputFrameBuilder();
    MojoRowBuilder rowBuilder = frameBuilder.getMojoRowBuilder();
    rowBuilder.setValue("AGE", "68");
    rowBuilder.setValue("RACE", "2");
    rowBuilder.setValue("DCAPS", "2");
    rowBuilder.setValue("VOL", "0");
    rowBuilder.setValue("GLEASON", "6");
    frameBuilder.addRow(rowBuilder);

    // Create a frame which can be transformed by the MOJO pipeline
    MojoFrame iframe = frameBuilder.toMojoFrame();

    // Transform the input frame with the MOJO pipeline
    MojoFrame oframe = model.transform(iframe);

    // Output the prediction as CSV
    SimpleCSV outCsv = SimpleCSV.read(oframe);
    outCsv.write(System.out);
  }
}
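For quick checks without compiling Java, the same input row can be written to a CSV file and scored from the command line. This sketch covers only the CSV writing and the command construction. The java command line is an assumption modeled on the example script that ships with the MOJO pipeline download (class ai.h2o.mojos.ExecuteMojo, jar mojo2-runtime.jar). Verify both names, and any license-file option, against your own download before relying on them.

```python
import csv
import io

# Same input row as the Java example (column names taken from that code).
ROW = {"AGE": "68", "RACE": "2", "DCAPS": "2", "VOL": "0", "GLEASON": "6"}


def rows_to_csv(rows, columns):
    """Serialize dict rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def mojo_cli_command(mojo_path, csv_path, runtime_jar="mojo2-runtime.jar"):
    """Build the java command line for batch-scoring a CSV file.

    ASSUMPTION: class and jar names are modeled on the example script in the
    MOJO pipeline download; check them for your Driverless AI version.
    """
    return ["java", "-cp", runtime_jar, "ai.h2o.mojos.ExecuteMojo", mojo_path, csv_path]
```

Write rows_to_csv([ROW], list(ROW)) to example.csv, then run the list from mojo_cli_command("pipeline.mojo", "example.csv") with subprocess.run(..., check=True).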

Top Customer Questions - Installation
• Can Driverless AI run on CPU-only machines?
• Can Driverless AI be installed natively (RPM or DEB package), without Docker?
• Can Driverless AI be integrated with Active Directory/LDAP for authentication/authorization?
• Can Driverless AI be secured with SSL?
• Can I run multiple instances of Driverless AI on one GPU server?
• Can I divide a server's GPU resources among Driverless AI instances?
• Can Driverless AI run on my Windows 7 laptop?
• Can Driverless AI run in an air-gapped environment?

Top Customer Questions - Deployment
• Can the model (& pipeline) be deployed as a Docker container?
• Can the model (& pipeline) be deployed as a microservice in Kubernetes?
• Does Driverless AI support one-click model (& pipeline) deployment?
• How do I scale a Driverless AI MOJO model (& pipeline) in production?
• What are the different deployment patterns for a Driverless AI MOJO model (& pipeline)?
