Improving computer vision models at scale presentation

© Cloudera, Inc. All rights reserved.
Improving computer vision models at scale
Jan Kunigk | Principal Solutions Architect
Dr. Mirko Kämpf | Senior Solutions Architect
Marton Balassi | Solutions Architect

This slide deck is an updated version of our talk at Strata 2018 in London.

Motivation

Imagine the possibilities...
• detect dangerous situations in traffic
• detect fires in forests or landfills early, using infrared drone imagery
• detect extremely hard-to-find tumors
• detect combatants in satellite data
• detect violence in subway stations
• detect broken parts on a manufacturing line
... all at scale!

Requirements
• Fast random access to images
• Free text search for labels
• Visual user interface
• Execute existing Python and Scala deep learning pipelines at scale
• Automatic indexing of labels
• Easy model comparison
• Search for complex scenarios

Building blocks of our solution
• Fast random access to images
  • HBase stores both the images and the corresponding labels
• Free text search of labels
  • Solr indexes are used to query the data
• Enrichment and augmentation with secondary data sources (e.g. GPS, CAN bus)
  • Hive/Impala tables store the enrichment data
• Visual interface
  • A Hue dashboard provides the UI
• Execute existing Python and Scala deep learning pipelines at scale
  • (Py)Spark is used to scale out the computation
• Automatic indexing of labels
  • The Lily indexer automatically populates the Solr collection
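The HBase-plus-Lily combination above can be illustrated with a minimal sketch in plain Python. It assumes one HBase row per image with column families `img_all` and `tags` (names taken from the pipeline slide), one `tags` qualifier per model, and hypothetical Solr field names `labels_<model>`. The real Lily HBase Indexer is configured declaratively rather than in Python; this only mimics the kind of cell-to-field mapping it performs.

```python
# Hypothetical sketch: one HBase row per image, and a Lily-style mapping of
# its cells to a Solr document. Family names follow the pipeline slide;
# qualifiers and Solr field names are assumptions.


def build_hbase_cells(jpg_bytes, labels_by_model):
    """Return {b"family:qualifier": value} cells for one image row."""
    cells = {b"img_all:jpg": jpg_bytes}
    for model, labels in labels_by_model.items():
        # One tags column per model, labels stored comma-separated
        cells[f"tags:{model}".encode()] = ",".join(labels).encode()
    return cells


def cells_to_solr_doc(row_key, cells):
    """Map the tags cells of a row to one multivalued Solr field per model."""
    doc = {"id": row_key}
    for column, value in cells.items():
        family, qualifier = column.decode().split(":", 1)
        if family != "tags":
            continue  # image bytes stay in HBase and are not indexed
        field = "labels_" + qualifier.replace("-", "_")
        doc[field] = value.decode().split(",") if value else []
    return doc
```

For example, cells built from `{"retinanet": ["person", "truck"], "tiny-yolo": ["person"]}` and row key `"20180428152330"` map to `{"id": "20180428152330", "labels_retinanet": ["person", "truck"], "labels_tiny_yolo": ["person"]}`. Keeping the JPEG bytes out of the index keeps Solr small, while HBase still serves the image by row key.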

Solution overview
Main users: Data Scientists and Domain Experts

Data Engineering and Model Lifecycle

Classifying an image
• Image classification
  • InceptionV3 [1]
• Semantic segmentation
  • SegNet [2]
• Bounding boxes
  • RetinaNet [3]
• Object masking
  • Detectron [4]

[1] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna: "Rethinking the Inception Architecture for Computer Vision", https://arxiv.org/abs/1512.00567
[2] Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio: "The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation", https://arxiv.org/abs/1611.09326; SegNet demo: http://mi.eng.cam.ac.uk/projects/segnet/demo.php#demo
[3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár: "Focal Loss for Dense Object Detection", https://arxiv.org/abs/1708.02002; implementation: https://github.com/fizyr/keras-retinanet
[4] Detectron: https://github.com/facebookresearch/Detectron

Typical use case for model improvement
Consider using a tool that visualizes layer activations, such as https://github.com/raghakot/keras-vis.

Data Science workflow with CDSW
• Models are trained on GPUs
• Since version 1.1, Cloudera Data Science Workbench natively supports GPUs
• Your Ops people will appreciate it
https://blog.cloudera.com/blog/2017/07/prophecy-fulfilled-keras-and-cloudera-data-science-workbench/
https://blog.cloudera.com/blog/2017/09/customizing-docker-images-in-cloudera-data-science-workbench/

Access Patterns & Model Lifecycle with CDSW
(Figure: CDH, run by the IT crowd, and CDSW, used by the data crowd, both with GPUs, working on shared images and tags. Labeled components: Cloudera Data Science Workbench for algorithm prototyping & model training; HUE Query Editor / Dashboards for ad-hoc analysis using SQL and Search, and for data engineering & curation; Solr, augmentation, and HBase holding the images; a Python job (awesome.py); Tenzing. Access patterns: search for images by properties and context; access compliance and governance data; search for time series patterns. GPU use on the cluster side depends on YARN resource types.)

Full Data Pipeline
(Figure: frames extracted with ffmpeg are stored as images in HBase, keyed by timestamp (e.g. 20180428152330), with column families CF:img_all (jpg), CF:tags (labels from imagenet, retinanet, and tiny-yolo models, e.g. stop-sign, person, truck, boat, bicycle, traffic light) and CF:geo. GPS data in NMEA format is converted to Avro (gps2avro, using pynmea2) and enriched with OpenStreetMap geodata (area, tunnel, bridge; e.g. "B14 | yes | no") via the Overpass API (overpy). Hive reads HBase through the HBaseStorageHandler, and Lily (hbase-indexer-mr-job.jar) indexes the tags into Solr. Also shown: Tenzing and a GPS track plotted by lat/lon.)
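The `gps2avro` step uses pynmea2 to parse NMEA sentences. As a dependency-free illustration (not the authors' `gps2avro` code), the sketch below validates the NMEA checksum and converts the `ddmm.mmmm` coordinates of a GGA sentence to decimal degrees; the function names are hypothetical, and only GGA sentences are handled.

```python
from functools import reduce


def nmea_checksum_ok(sentence):
    """Validate the XOR checksum of an NMEA sentence such as '$GPGGA,...*47'."""
    body, _, checksum = sentence.strip().lstrip("$").partition("*")
    calc = reduce(lambda acc, ch: acc ^ ord(ch), body, 0)
    return checksum != "" and calc == int(checksum, 16)


def to_decimal_degrees(value, hemisphere):
    """Convert NMEA (d)ddmm.mmmm plus N/S/E/W into signed decimal degrees.

    Raises ValueError on an empty field (e.g. no GPS fix)."""
    dot = value.index(".")
    degrees = int(value[:dot - 2])
    minutes = float(value[dot - 2:])
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence):
    """Return (hhmmss UTC time, latitude, longitude) from a GGA sentence."""
    if not nmea_checksum_ok(sentence):
        raise ValueError("bad NMEA checksum")
    fields = sentence.split("*")[0].split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")
    lat = to_decimal_degrees(fields[2], fields[3])
    lon = to_decimal_degrees(fields[4], fields[5])
    return fields[1], lat, lon
```

With the widely used example sentence `$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47`, `parse_gga` returns `"123519"` and approximately 48.1173 / 11.5167. The resulting (time, lat, lon) records are what would then be written to Avro and joined with the image timestamps.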

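Time domains / resolution
(Figure: GPS, image, and speed data are recorded in different time domains and at different resolutions.)
Matching each image with the speed reading closest in time, using window functions:
SELECT system_id, img_time_s, speed, time_gap
FROM
(SELECT
  row_number() OVER (
    PARTITION BY system_id, img_time_s ORDER BY time_gap ASC) AS rnk,
  system_id, img_time_s, speed, time_gap
 FROM
 (
  SELECT
    img_domain.system_id AS system_id,
    img_domain.time_s AS img_time_s,
    speed_domain.speed AS speed,
    abs(img_domain.time_s - speed_domain.time_s) AS time_gap
  FROM img_domain JOIN speed_domain
    -- assumes the speed (CAN bus) table identifies the vehicle as car_id
    ON img_domain.system_id = speed_domain.car_id
 ) pairs
) ranked
WHERE rnk = 1
SQL: a cool language that supports range queries (not yet existing) could simply say:
SELECT system_id, speed
FROM img_domain, speed_domain
WHERE WITHIN(img_domain.time_s, speed_domain.time_s, 1.5s)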
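The window-function query effectively performs an "as-of" join: for every image it keeps the speed reading with the smallest time gap. The same idea, including the 1.5 s tolerance that the hypothetical `WITHIN` would express, can be sketched for a single vehicle in plain Python with a binary search over sorted timestamps; `nearest_within` and its tuple formats are assumptions for illustration.

```python
import bisect


def nearest_within(img_times, speed_samples, tolerance_s=1.5):
    """For each image timestamp, pick the speed sample closest in time.

    speed_samples: list of (time_s, speed) tuples sorted by time_s.
    Returns (img_time, speed, gap) tuples; speed and gap are None when no
    sample lies within tolerance_s seconds.
    """
    times = [t for t, _ in speed_samples]
    matches = []
    for t_img in img_times:
        i = bisect.bisect_left(times, t_img)
        # Only the neighbours just before and at/after t_img can be closest
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - t_img), default=None)
        if best is not None and abs(times[best] - t_img) <= tolerance_s:
            matches.append((t_img, speed_samples[best][1], abs(times[best] - t_img)))
        else:
            matches.append((t_img, None, None))
    return matches
```

Unlike the SQL version, which builds every image/speed pair per vehicle before ranking, this touches only two candidates per image. For data that fits in memory, pandas `merge_asof` with `direction="nearest"` and a `tolerance` argument implements the same matching.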

PySpark implementation (Keras)
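from pyspark import SparkConf, SparkContext
from keras.applications.inception_v3 import InceptionV3
import common  # project-specific HBase I/O and output helpers

# FLAGS, MODEL_NAME and run_inference_on_image are defined elsewhere in the project
def predict(iterator):
    # Build and load the model once per partition, not once per image
    model = InceptionV3(weights=None)
    model.load_weights(FLAGS.weights_file)
    # Each element is a (row_key, image) pair
    return [(x[0], run_inference_on_image(model, x[1])) for x in iterator]

def main():
    conf = SparkConf().setAppName(MODEL_NAME)
    sc = SparkContext(conf=conf)
    hbase_io = common.HbaseIO(FLAGS)
    out_format = common.OutputFormatter(FLAGS, MODEL_NAME)
    hbase_images = hbase_io.load_from_hbase(sc)
    # Parentheses let the chained call span two lines
    classified_images = (hbase_images.mapPartitions(predict)
                         .map(out_format.imagenet_format))
    classified_images.foreachPartition(hbase_io.put_to_hbase)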

• The Python environment with TensorFlow is distributed to the executors at runtime; it is not preinstalled on the nodes
• The individual models only need to implement the following functions:
  • prepare
  • predict
  • output_format
• Conceptually, this is very close to the scikit-learn or Spark ML Pipelines approach
• Deep Learning Pipelines can be a way to streamline the implementation:
https://databricks.com/blog/2017/06/06/databricks-vision-simplify-large-scale-deep-learning.html
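The bullet above says each model only implements `prepare`, `predict` and `output_format`. One possible shape for that contract is sketched below as a Python abstract base class, with a generic partition function that could be passed to `mapPartitions`. The class names, method signatures and the toy `BrightnessModel` are hypothetical stand-ins, not the project's actual code.

```python
from abc import ABC, abstractmethod


class ImageModel(ABC):
    """Hypothetical per-model contract: prepare, predict, output_format."""

    @abstractmethod
    def prepare(self):
        """Load the (expensive) model once and return it."""

    @abstractmethod
    def predict(self, model, image):
        """Run inference on one image and return raw predictions."""

    @abstractmethod
    def output_format(self, row_key, predictions):
        """Turn predictions into (row_key, {column: value}) for HBase."""


def run_partition(model_impl, rows):
    """Generic mapPartitions body that works for any ImageModel."""
    model = model_impl.prepare()  # runs once per partition
    for row_key, image in rows:
        yield model_impl.output_format(row_key, model_impl.predict(model, image))


class BrightnessModel(ImageModel):
    """Toy stand-in for a real network: labels an image 'dark' or 'bright'."""

    def prepare(self):
        return {"threshold": 128}

    def predict(self, model, image):
        mean = sum(image) / len(image)
        return ["bright" if mean >= model["threshold"] else "dark"]

    def output_format(self, row_key, predictions):
        return row_key, {"tags:brightness": ",".join(predictions)}
```

In Spark this would be used as `hbase_images.mapPartitions(lambda rows: run_partition(BrightnessModel(), rows))`. Locally, `list(run_partition(BrightnessModel(), [("k1", [10, 20]), ("k2", [200, 250])]))` yields `[("k1", {"tags:brightness": "dark"}), ("k2", {"tags:brightness": "bright"})]`. Swapping in another model then means writing one new subclass, while the Spark driver stays unchanged.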

Spark implementation (dl4j / Scala)
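import org.apache.spark.{SparkConf, SparkContext}
import org.deeplearning4j.util.ModelSerializer
import org.nd4j.linalg.api.ndarray.INDArray

// modelLoc, run_inference_on_image and the common.* helpers are
// project-specific and defined elsewhere
def predict(pairs: Iterator[(String, (INDArray, Int, Int))]) = {
  // Restore the model once per partition, not once per image
  val model = ModelSerializer.restoreComputationGraph(modelLoc)
  pairs.map { case (name, image) =>
    (name, run_inference_on_image(model, image))
  }
}
def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("classify-images") // app name is illustrative
  val sc = new SparkContext(conf)
  val hbase_io = common.HbaseIO(args)
  val out_format = common.OutputFormatter(args)
  val hbase_images = hbase_io.load_from_hbase(sc)
  val classified_images = hbase_images.mapPartitions(predict)
    .map(out_format.imagenet_format)
  classified_images.foreachPartition(hbase_io.put_to_hbase)
}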

Demo

Moving further
Label Quality Inspection

Visual label inspection via HUE:
Label quality & Relations between objects
Index contains:
- object relations
- predicted labels
- object statistics
Rendered bounding boxes are key to visual inspection
>>> easy comparison of multiple model classes (A, B) or model versions (C1, C2).
(Figure: detections of Model A and Model B rendered side by side)
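Before looking at rendered boxes, it can help to narrow down which images to inspect. A minimal sketch, assuming each model's labels are available per image row key (e.g. read back from the `tags` columns), lists the images on which two models or model versions disagree. It compares label sets, so differences in object counts are ignored; the function names are hypothetical.

```python
def compare_models(labels_a, labels_b):
    """Find images where two models (or model versions) disagree.

    labels_a, labels_b: dict mapping image row key -> iterable of labels.
    Returns {row_key: (only_in_a, only_in_b)} for disagreeing images.
    """
    disagreements = {}
    for key in sorted(set(labels_a) | set(labels_b)):
        a = set(labels_a.get(key, ()))
        b = set(labels_b.get(key, ()))
        if a != b:
            disagreements[key] = (sorted(a - b), sorted(b - a))
    return disagreements


def agreement_rate(labels_a, labels_b):
    """Fraction of images for which both models produce the same label set."""
    keys = set(labels_a) | set(labels_b)
    if not keys:
        return 1.0
    return 1.0 - len(compare_models(labels_a, labels_b)) / len(keys)
```

The row keys returned by `compare_models` are natural candidates to filter the Hue dashboard on, so that the side-by-side rendering is spent on images where the models actually differ.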

From labels to meaning ...
Person in front of car: the bounding boxes overlap,
so this property of the object pair becomes a fact.
Facts are added to a multivalued field of a
document in a Solr index.
Query:
q=overlap_category:car-person OR overlap_category:person-car
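The overlap rule can be sketched in plain Python. This assumes boxes given as `(x_min, y_min, x_max, y_max)` pixel coordinates, measures overlap as intersection area divided by the area of the smaller box, and uses an arbitrary 0.1 threshold. All of these are illustrative choices, not the authors' exact heuristic.

```python
from itertools import combinations


def overlap_ratio(a, b):
    """Intersection area divided by the smaller box's area.

    Boxes are (x_min, y_min, x_max, y_max) in pixels."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / smaller if smaller > 0 else 0.0


def overlap_facts(detections, min_ratio=0.1):
    """Turn overlapping detections into 'label-label' facts.

    detections: list of (label, box); pair order follows detection order."""
    facts = set()
    for (la, ba), (lb, bb) in combinations(detections, 2):
        if overlap_ratio(ba, bb) >= min_ratio:
            facts.add(f"{la}-{lb}")
    return sorted(facts)
```

The resulting list would go into the multivalued `overlap_category` field, e.g. `{"id": "20180428152330", "overlap_category": ["car-person"]}`. Because the pair order here simply follows detection order, both `car-person` and `person-car` can occur, which matches the OR in the query above; sorting the two labels of each pair would make a single term sufficient.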

Moving further
Semantic Search

How to identify relations?
1. Build an ontology for traffic scenes or any other domain you work in.
2. Map statistical object properties to an RDF graph using heuristics.
3. Combine scene graphs in a triple store.
4. Enable search with SPARQL.

How to identify semantic relations?
• Build an ontology for traffic scenes
• Map statistical object properties to an RDF graph using heuristics
• Combine scene graphs (triple store)
• Search with SPARQL

• Object detection
  • Deep neural networks
• Bounding box analysis
  • Rendering of BBs with labels
• Geometry-based heuristics
  • Overlap ratios
  • Orientation analysis
• Solr search by
  • Label
  • Relation
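The geometry-based heuristics can be sketched as a mapping from detections to subject-predicate-object triples. This is a minimal plain-Python stand-in for an RDF library, a triple store, and SPARQL. The namespace `http://example.org/traffic#`, the predicates `contains`, `label` and `inFrontOf`, and the depth heuristic are all assumptions for illustration. The heuristic treats image y coordinates as growing downward, so of two overlapping objects, the one whose box reaches lower in the image is assumed to be closer to the camera.

```python
from itertools import permutations

EX = "http://example.org/traffic#"  # hypothetical ontology namespace


def boxes_overlap(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def scene_triples(image_id, detections):
    """Map one image's detections [(label, box), ...] to (s, p, o) triples."""
    image = f"{EX}{image_id}"
    nodes = []
    triples = []
    for n, (label, box) in enumerate(detections):
        node = f"{EX}{image_id}_obj{n}"
        nodes.append((node, box))
        triples.append((image, f"{EX}contains", node))
        triples.append((node, f"{EX}label", label))
    # Heuristic (assumption): y grows downward, so of two overlapping boxes the
    # one whose bottom edge is lower (larger y_max) is closer to the camera.
    for (na, ba), (nb, bb) in permutations(nodes, 2):
        if boxes_overlap(ba, bb) and ba[3] > bb[3]:
            triples.append((na, f"{EX}inFrontOf", nb))
    return triples


def match(triples, s=None, p=None, o=None):
    """Triple-pattern lookup; None acts as a wildcard, like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def in_front_of(triples, front_label, back_label):
    """All (front, back) node pairs, e.g. a person in front of a car."""
    labels = {s: o for s, _, o in match(triples, p=f"{EX}label")}
    return [(s, o) for s, _, o in match(triples, p=f"{EX}inFrontOf")
            if labels.get(s) == front_label and labels.get(o) == back_label]
```

In a real deployment the triples would be loaded into a triple store, and `in_front_of` would become a SPARQL basic graph pattern such as `?p ex:label "person" . ?c ex:label "car" . ?p ex:inFrontOf ?c`.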

Why search on a knowledge base?
• This approach makes it easy to search for complex scenarios:
  • THINGS (pedestrian, stop sign, hot spot, gun, …)
  • RELATIONSHIPS (close by, in front of, above, underneath, ...)
  • ACTIVITIES (danger, theft, evasion, escape)
  • SITUATIONS (combinations of THINGS, RELATIONSHIPS, and ACTIVITIES)
• ... and it is very fast, even in huge image collections.
• Knowledge graphs remove the need to know Solr schema details.

Implementation of complementary search channels ...
Triplification using local graphs

Summary
What we can do with images today:
• Search for combinations and counts of objects at scale: "at least 5 cars and 2 trucks"
• Search for basic relationships among those objects: "in front of", "in a line"
• Enrich the search experience with other domains: geospatial, sensor data, etc.
This helps to:
• Gain a better understanding of the quality of our CV models/apps
• Discover corner cases, improve the model lifecycle, and build new (data) products faster
In the future:
• Focus on semantic search, advanced visualization, and improved model lifecycles
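A query like "at least 5 cars and 2 trucks" becomes a range query once per-label object counts are indexed. The sketch assumes hypothetical integer fields named `count_<label>` in the Solr documents; the `field:[n TO *]` range syntax is standard Solr/Lucene query syntax.

```python
from collections import Counter


def field_name(label):
    """Hypothetical count field, e.g. 'traffic light' -> 'count_traffic_light'."""
    return "count_" + label.replace(" ", "_").replace("-", "_")


def count_fields(labels):
    """Per-image count fields to add to the Solr document."""
    return {field_name(label): n for label, n in Counter(labels).items()}


def at_least_query(minimums):
    """Solr query requiring at least n objects per label, e.g. {'car': 5, 'truck': 2}."""
    return " AND ".join(f"{field_name(label)}:[{n} TO *]"
                        for label, n in sorted(minimums.items()))


def matches_at_least(labels, minimums):
    """Local check of the same condition for a single image's labels."""
    counts = Counter(labels)
    return all(counts[label] >= n for label, n in minimums.items())
```

For example, `at_least_query({"car": 5, "truck": 2})` returns `"count_car:[5 TO *] AND count_truck:[2 TO *]"`. Indexing counts at write time keeps such queries a simple filter on numeric fields instead of a scan over label lists.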

Thank you
jk@cloudera.com
mirko@cloudera.com
mbalassi@cloudera.com

Appendix: Getting data
There are many great datasets out there for research purposes:
• Cityscapes, https://www.cityscapes-dataset.com/
• COCO, http://cocodataset.org/#home
• YouTube-8M, https://research.google.com/youtube8m/
