ONNX MEETS FLINK
The long trudge towards integrating PyTorch, Chainer, CNTK, MXNet and other models in Flink
streaming applications.
The Problem/Motivation
ONNX
Overview
Limitations
End-to-end example with Java Embedded Python
Overview
Goals
Remove the barrier between A.I. "research" and "production."
Enable access to recent state-of-the-art models from major conferences and Python-based frameworks
Specifically, integrate deep learning models written in Python frameworks like PyTorch, CNTK, and Chainer into Flink pipelines for real-time inference on streaming data.
Challenge(s)
Poor Python support in Flink, and conversely poor ONNX support in Java
Converting a model to ONNX itself can be quite arduous
It can be challenging to rewrite pre-processing code in Java
Goals and challenges
International Conference on Learning Representations (ICLR) statistics
2018: 87 papers mentioned PyTorch (compared to 228 that mentioned Tensorflow)
2019: 252 papers mentioned PyTorch (compared to 266 that mentioned Tensorflow). Roughly a 190% increase!
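As a sanity check, the growth figure above is just the fractional change between the two counts:

```python
# ICLR papers mentioning PyTorch, 2018 vs. 2019 (figures from the slide above).
papers_2018 = 87
papers_2019 = 252
increase = (papers_2019 - papers_2018) / papers_2018  # fractional growth
percent_increase = round(increase * 100)              # rounds to 190
```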
The rise of PyTorch
[Chart: PyTorch vs. Tensorflow paper mentions, plus PyTorch-powered frameworks in NLP and other areas, e.g. torchvision, torchcv]
What is ONNX?
Open Neural Network Exchange: a standard format for representing deep learning models
WHY USE ONNX?
Backends that (at least in theory) run in a large number of
environments.
Can export models from a variety of frameworks to a single standard format
Exported models generally smaller (in terms of space) than full
models.
Overview of possible ways to
integrate ONNX models into Flink
Create a micro-service and use in conjunction with Flink
AsyncIO.
Use Java embedded Python (JEP) and run using Caffe2 (or
Tensorflow)
Load model natively into Java/Scala and run with a JVM backend
framework
ONNX frameworks overview
The ONNX Scoreboard measures the operations each backend supports
OPTIONS
ONNX options
Current backends
Caffe2 (Python, C++)
CNTK (C++, C#, Python, Java experimental)
Tensorflow-ONNX (Python) [not analogous to Tensorflow]
VESPA (Java)
Menoh (C++, Java, C#, Ruby, NodeJS)
Menoh in Java
Only 19 of the 116 ops available (so pretty limited for now)
import jp.preferred.menoh.DType;
import jp.preferred.menoh.ModelRunner;
import jp.preferred.menoh.ModelRunnerBuilder;

try (
    ModelRunnerBuilder builder = ModelRunner
        // Load ONNX model data
        .fromOnnxFile("squeezenet.onnx")
        // Define input profile (name, dtype, dims) and output profiles (name, dtype);
        // Menoh calculates the dims of outputs automatically at build time
        .addInputProfile(conv11InName, DType.FLOAT,
            new int[] {batchSize, channelNum, height, width})
        .addOutputProfile(fc6OutName, DType.FLOAT)
        .addOutputProfile(softmaxOutName, DType.FLOAT)
        // Configure backend
        .backendName("mkldnn")
        .backendConfig("");
    ModelRunner runner = builder.build()
) {
    // The builder can be closed explicitly after building the model runner
    builder.close();
    // ... run inference with the runner here
}
WHEN NOT TO USE ONNX?
Export process in many cases is difficult and time consuming!
Backends have limited support for various operations.
For instance, Yolo2 still cannot be run on even the Caffe2 or Tensorflow backend due to missing support for the ImageScaler op.
Some models have to be re-trained before exporting
Flink calls the model's "API" using AsyncIO, just as it would any other API connection
Pros
Use Docker container to capture exact model dependencies (smaller
container than with Flink+Model)
No (extensive) re-writing of code needed
Cons
Have to handle scaling/maintaining a separate service
AsyncIO and Microservice Model
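A minimal sketch of the micro-service half of this pattern, using only the Python standard library. The `/predict` endpoint and the dummy score are hypothetical stand-ins for a real ONNX-backed model server that Flink's AsyncIO operator would call:

```python
# Toy model server: Flink's AsyncIO would POST text here and get JSON back.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # A real service would run the ONNX model here; we echo a fake score.
        result = {"text": body["text"], "score": 0.5}
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

def predict(text):
    req = Request(
        "http://127.0.0.1:%d/predict" % server.server_port,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())

response = predict("I love Berlin .")
```

Packaging this behind Docker is what keeps the model's Python dependencies out of the Flink image, at the cost of operating a second service.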
Uses JNI and the Cython API to start up the Python
interpreter inside the JVM
Faster than many alternatives
Can use pretty much any Python library including numpy,
Tensorflow, PyTorch, Keras, etc
Automatically converts Java primitives, Strings, and
jep.NDArrays sent into the Python interpreter into Python
primitives, strings, and numpy.ndarrays
Java Embedded Python (JEP)
Using PyTorch directly with JEP
Easiest solution: use Kubernetes
AIStream JEP Flink Docker container
Setup can be a bit painful
Have to get Python dependencies on all
Flink nodes
Job needs path to Python
An "UnsatisfiedLinkError" is very common
Bootstrap script possible for EMR on AWS
NLP framework written in PyTorch with a state of the art named
entity recognition (NER) model.
from flair.data import Sentence
from flair.models import SequenceTagger
# make a sentence
sentence = Sentence('I love Berlin .')
# load the NER tagger
tagger = SequenceTagger.load('ner')
# run NER over sentence
tagger.predict(sentence)
Easy to train and combine with new methods
The framework handles complex preprocessing, and its models subclass PyTorch modules (therefore exporting to ONNX is not fun)
Named entity recognition on Flink
data stream with Flair
import jep.JepException;
import jep.SharedInterpreter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class FlairMap extends RichMapFunction<TweetData, String> {
    private transient SharedInterpreter j;

    @Override
    public void open(Configuration c) {
        try {
            // One embedded Python interpreter per task
            j = new SharedInterpreter();
            j.eval("from flair.data import Sentence");
            j.eval("from flair.models import SequenceTagger");
            j.eval("model = SequenceTagger.load('ner')");
        } catch (JepException e) {
            e.printStackTrace();
        }
    }

    @Override
    public String map(TweetData tweet) throws JepException {
        String tweetText = tweet.tweetText.replaceAll("[^A-Za-z0-9]", " ");
        j.set("text", tweetText);
        j.eval("s = Sentence(text)");
        j.eval("model.predict(s)");
        Object result = j.getValue("s.get_spans('ner')");
        return result.toString();
    }
}
Sentiment Analysis with Flair
from flair.models import TextClassifier
from flair.data import Sentence
classifier = TextClassifier.load('en-sentiment')
sentence = Sentence('Twitter is a really good company!')
classifier.predict(sentence)
# print sentence with predicted labels
print('Sentence sentiment is: ' + str(sentence.labels))
@Override
public String map(TweetData tweet) throws JepException {
    String tweetText = tweet.tweetText.replaceAll("[^A-Za-z0-9]", " ");
    j.set("text", tweetText);
    j.eval("s = Sentence(text)");
    j.eval("model.predict(s)");
    Object result = j.getValue("s.labels");
    return result.toString();
}
Consume data from Twitter Source using Flink Twitter Connector
Filter out non-English Tweets
Alternatively could load multilingual NER model(s)
Named Entity Recognition on Tweets (remove non-entities)
Sentiment Analysis on Tweet (entity, label, sentiment)
Convert to Table. Run query
Putting it all together
https://github.com/isaacmg/dl_java_stream
SELECT entity, sentiment, count(entity)
FROM Tweets
GROUP BY entity, sentiment
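The same GROUP BY, sketched in plain Python over in-memory rows (the sample tweets are made up):

```python
# What the Flink SQL query computes, over a toy list of (entity, sentiment) rows.
from collections import Counter

rows = [
    ("Berlin", "POSITIVE"),
    ("Berlin", "POSITIVE"),
    ("Twitter", "NEGATIVE"),
]
# key = (entity, sentiment), value = count(entity) within that group
counts = Counter(rows)
```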
Currently it is easiest to either use JEP or a micro-service + AsyncIO
Saves the time of converting the model to ONNX
No need to re-write code
Promising frameworks in the works like Menoh, VESPA, DL4J, etc. should eventually support ONNX natively but aren't mature enough yet.
Conclusions
Flink Forward San Francisco 2019: Deploying ONNX models on Flink - Isaac Mckillen-Godfried