Object Recognition for Fun and Profit
Anil Thomas
SV Deep Learning Meetup
November 17th, 2015
Outline
•  Neon examples
•  Intro to convnets
•  Convolutional autoencoder
•  Whale recognition challenge
NEON
Neon
•  Backends: NervanaCPU, NervanaGPU, NervanaEngine (internal)
•  Datasets: Images: ImageNet, CIFAR-10, MNIST; Captions: flickr8k, flickr30k, COCO; Text: Penn Treebank, hutter-prize, IMDB, Amazon
•  Initializers: Constant, Uniform, Gaussian, Glorot Uniform
•  Learning rules: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad
•  Activations: Rectified Linear, Softmax, Tanh, Logistic
•  Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, Recurrent Sum, LookupTable
•  Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
•  Metrics: Misclassification, TopKMisclassification, Accuracy
•  Modular components
•  Extensible, OO design
•  Documentation: neon.nervanasys.com
HANDS ON EXERCISE
INTRO TO CONVNETS
Convolution
Input        Filter    Output
0 1 2        0 1       19 25
3 4 5        2 3       37 43
6 7 8
Example: (0, 1, 3, 4) · (0, 1, 2, 3) = 19
•  Each element in the output is the result of a dot product between two vectors
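A minimal NumPy sketch of the operation above, assuming stride 1 and no padding, reproduces the slide's numbers. Like most deep learning libraries, it slides the filter without flipping it, so strictly speaking it computes a cross-correlation. The function name `conv2d_valid` is our own, not a neon API.

```python
import numpy as np

def conv2d_valid(x, w):
    """Slide filter w over x (stride 1, no padding).

    Each output element is the dot product between the flattened filter
    and the flattened input patch currently under it.
    """
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = np.dot(patch.ravel(), w.ravel())
    return out

x = np.arange(9).reshape(3, 3)  # input from the slide: 0..8
w = np.arange(4).reshape(2, 2)  # filter from the slide: 0..3
print(conv2d_valid(x, w))       # 19 25 / 37 43
```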
Convolutional layer
[Figure: the nine input units (0–8) connected to the four output units (19, 25, 37, 43); each output unit receives four inputs through the filter weights 0, 1, 2 and 3]
Convolutional layer
0 × 0 + 1 × 1 + 3 × 2 + 4 × 3 = 19
•  The weights are shared among the units.
[Figure: input units 0–8 connected to output units through the shared weights 0, 1, 2, 3]
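Weight sharing can be made concrete by writing the layer as a matrix multiply. The sketch below uses a hypothetical helper, `conv_as_matrix`, and assumes stride 1 and no padding. It builds the 4 × 9 matrix that a 2 × 2 filter implies for a 3 × 3 input: each row is one output unit, holds the same four filter weights at that unit's input positions, and is zero elsewhere. Multiplying by the flattened input gives 19, 25, 37, 43 again.

```python
import numpy as np

def conv_as_matrix(w, in_shape):
    """Dense matrix applied by a stride-1, no-padding convolution.

    Row r is output unit r; it contains copies of the shared filter
    weights at the positions of that unit's input patch.
    """
    H, W = in_shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    M = np.zeros((oh * ow, H * W), dtype=w.dtype)
    for i in range(oh):
        for j in range(ow):
            row = i * ow + j
            for a in range(kh):
                for b in range(kw):
                    M[row, (i + a) * W + (j + b)] = w[a, b]
    return M

x = np.arange(9).reshape(3, 3)
w = np.arange(4).reshape(2, 2)
M = conv_as_matrix(w, x.shape)
print(M @ x.ravel())  # [19 25 37 43]
```

Every row sums to the same value (0 + 1 + 2 + 3 = 6) because every output unit reuses the same four weights.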
Recognizing patterns
Detected the pattern!
[Figures: a 3×3 input with three color channels, R (R0–R8), G (G0–G8) and B (B0–B8)]
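The figures show an input with three color channels. Assuming the standard arrangement, where the filter has one slice per channel, each output element is still a single dot product, now taken over the patch in all channels at once. Equivalently, it is the sum of the per-channel 2-D convolutions. The sketch below checks that equivalence on random data. The shapes and function names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def conv2d_multichannel(x, w):
    """Convolve a (C, H, W) input with a (C, kh, kw) filter.

    The dot product for each output position spans every channel, so
    C input channels produce a single 2-D feature map.
    """
    _, h, wd = x.shape
    _, kh, kw = w.shape
    out = np.zeros((h - kh + 1, wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w)
    return out

def conv2d_single(x2, w2):
    """Single-channel version of the same sliding dot product."""
    kh, kw = w2.shape
    return np.array([[np.sum(x2[i:i + kh, j:j + kw] * w2)
                      for j in range(x2.shape[1] - kw + 1)]
                     for i in range(x2.shape[0] - kh + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3, 3))  # hypothetical 3-channel (R, G, B) 3x3 input
w = rng.standard_normal((3, 2, 2))  # hypothetical 2x2 filter, one slice per channel

combined = conv2d_multichannel(x, w)
per_channel_sum = sum(conv2d_single(x[c], w[c]) for c in range(x.shape[0]))
print(combined.shape)                          # (2, 2)
print(np.allclose(combined, per_channel_sum))  # True
```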
Max pooling
Input        Output
0 1 2        4 5
3 4 5        7 8
6 7 8
Example: Max(0, 1, 3, 4) = 4
•  Each element in the output is the maximum value within the pooling window
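Here is a NumPy sketch of max pooling, assuming a 2 × 2 window. With stride 1, it reproduces the slide's 4 5 / 7 8 output. With stride 2, as in the `Pooling(2, strides=2)` layers of the classifier code, it gives non-overlapping pooling. The helper name is hypothetical.

```python
import numpy as np

def max_pool2d(x, size=2, stride=1):
    """Each output element is the maximum of one size x size window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

x = np.arange(9).reshape(3, 3)
print(max_pool2d(x))  # 4 5 / 7 8, as on the slide
print(max_pool2d(np.arange(16).reshape(4, 4), size=2, stride=2))  # 5 7 / 13 15
```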
Deconvolution
•  Ill-posed problem (inverting a convolution), but we can approximate it
•  Scatter versus gather
•  Used in conv layer backprop
•  Equivalent to convolution with a flipped kernel on zero-padded input
•  Useful for convolutional autoencoders
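Deconvolution shows up in conv layer backprop because a convolution is linear in its input. The gradient of the loss with respect to the input is therefore obtained by scattering each upstream gradient value back through the same filter, which is exactly a deconvolution. The sketch below checks this against a finite-difference gradient for the loss L = sum(g * conv(x, w)). It uses hypothetical helper names and random data.

```python
import numpy as np

def conv2d_valid(x, w):
    """Stride-1, no-padding sliding dot product."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * w) for j in range(ow)]
                     for i in range(oh)])

def deconv2d_scatter(g, w):
    """Scatter: each element of g adds a scaled copy of w to the output."""
    kh, kw = w.shape
    out = np.zeros((g.shape[0] + kh - 1, g.shape[1] + kw - 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            out[i:i + kh, j:j + kw] += g[i, j] * w
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 4))
w = rng.standard_normal((2, 2))
g = rng.standard_normal((3, 3))  # upstream gradient dL/dy for L = sum(g * y)

# Central-difference gradient of L with respect to each input element.
eps = 1e-6
num_grad = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    xp = x.copy()
    xp[idx] += eps
    xm = x.copy()
    xm[idx] -= eps
    num_grad[idx] = (np.sum(g * conv2d_valid(xp, w))
                     - np.sum(g * conv2d_valid(xm, w))) / (2 * eps)

print(np.allclose(num_grad, deconv2d_scatter(g, w)))  # True
```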
Deconv layer
Input    Filter    Output
0 1      0 1       0  0  1
2 3      2 3       0  4  6
                   4 12  9
Deconv layer
3 × 1 + 1 × 3 = 6
[Figure: the four input units (0–3) scattered through the shared weights 0, 1, 2, 3 onto the nine output units 0, 0, 1, 0, 4, 6, 4, 12, 9]
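The deconv slides can be reproduced in NumPy. `deconv_scatter` adds each input value times the filter into the output (scatter). `deconv_gather` computes the same result by zero-padding the input and convolving with the flipped filter (gather), which matches the "flipped kernel on zero-padded input" bullet. The sketch assumes stride 1, and the function names are our own.

```python
import numpy as np

def deconv_scatter(x, w):
    """Scatter form: x[i, j] * w is added into a kh x kw output window."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] + kh - 1, x.shape[1] + kw - 1), dtype=x.dtype)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i:i + kh, j:j + kw] += x[i, j] * w
    return out

def deconv_gather(x, w):
    """Gather form: zero-pad x by k-1 on each side, then slide the flipped filter."""
    kh, kw = w.shape
    padded = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)))  # zeros by default
    flipped = w[::-1, ::-1]
    oh, ow = padded.shape[0] - kh + 1, padded.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

x = np.arange(4).reshape(2, 2)  # input from the slide: 0 1 / 2 3
w = np.arange(4).reshape(2, 2)  # filter from the slide: 0 1 / 2 3
print(deconv_scatter(x, w))     # 0 0 1 / 0 4 6 / 4 12 9
print(np.array_equal(deconv_scatter(x, w), deconv_gather(x, w)))  # True
```

The output element 6 receives two contributions, 3 × 1 (from input 3) and 1 × 3 (from input 1).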
Convolutional autoencoder
[Figure panels: Input, Conv1, Conv2, Conv3, Deconv1, Deconv2, Deconv3]
RIGHT WHALE RECOGNITION
“Face” recognition for whales
•  Identify whales in aerial photographs
•  ~4500 labeled images, ~450 whales
•  ~7000 test images
•  Pictures taken over 10 years
•  Automating the identification process will aid conservation efforts
•  https://www.kaggle.com/c/noaa-right-whale-recognition
•  $10,000 prize pool
Right whales
•  One of the most endangered whales
•  Fewer than 500 North Atlantic right whales left
•  Hunted almost to extinction
•  Makes a V-shaped blow
•  Has the largest testicle in the animal kingdom
•  Eats 2000 pounds of plankton a day
“All y’all look alike!”
Source: http://rwcatalog.neaq.org/
Source: https://teacheratsea.files.wordpress.com/2015/05/img_2292.jpg
Brute force approach
Churchill
Quasimodo
Aphrodite ?
*Not actual names
A better method
Churchill
Quasimodo
Aphrodite ?
Object localization
•  Many approaches in the literature
•  OverFeat (http://arxiv.org/pdf/1312.6229v4.pdf)
•  R-CNN (http://arxiv.org/pdf/1311.2524v5.pdf)
Even better!
Churchill
Quasimodo
Aphrodite ?
Getting mugshots
•  How to go from [full photo] to [mugshot]?
•  Training set can be manually labeled
•  No manual operations allowed on test set!
•  Estimate the heading (angle) of the whale using a CNN?
Estimate angle
220°
160°
120° ?
An easier way to estimate angle
•  Find two points along the whale’s body
•  θ = arctan((y1 − y2) / (x1 − x2))
•  But how do you label the test images?
[Figure: the heading angle θ defined by the two points]
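One caveat with the arctan formula: arctan of the ratio only returns angles between −90° and 90°. It cannot tell a whale heading at 40° from one heading at 220°, and it fails when x1 = x2. A sketch using `math.atan2`, which looks at the signs of both differences, follows. It assumes (x1, y1) is the point toward the head, which is a hypothetical convention. In image coordinates, where y points down, the angle increases clockwise on screen.

```python
import math

def heading_degrees(p1, p2):
    """Heading of the vector from p2 to p1, in degrees in [0, 360).

    atan2 distinguishes directions 180 degrees apart and handles
    x1 == x2 without dividing by zero.
    """
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y1 - y2, x1 - x2)) % 360

print(heading_degrees((1, 1), (0, 0)))    # ~45
print(heading_degrees((-1, -1), (0, 0)))  # ~225 (plain arctan would also say 45)
```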
Train with co-ords?
(80, 80)
(90, 130)
(80, 190) ?
Train with a mask
?
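The slides do not spell out how the mask is built. One hypothetical approach is sketched below. To make the training target, draw a small filled disc around a labeled point; the radius here is arbitrary. To recover a point from a predicted mask, take the centroid of the pixels above a threshold. Both helper names, the disc shape and the threshold are assumptions, not the author's method.

```python
import numpy as np

def make_point_mask(shape, point, radius=5):
    """Hypothetical target: 1.0 inside a disc centered on (x, y), 0.0 elsewhere."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    x0, y0 = point
    return ((xs - x0) ** 2 + (ys - y0) ** 2 <= radius ** 2).astype(np.float32)

def mask_centroid(mask, threshold=0.5):
    """Point estimate: centroid (x, y) of the pixels above threshold, or None."""
    ys, xs = np.nonzero(mask > threshold)
    if len(xs) == 0:
        return None  # nothing detected
    return float(xs.mean()), float(ys.mean())

mask = make_point_mask((256, 256), (90, 130))
print(mask_centroid(mask))  # (90.0, 130.0)
```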
Code for convolutional encoder
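# Import paths assume the neon 1.x package layout. `train` and `val` (data
# iterators) and `args` (e.g. from NeonArgparser) are defined elsewhere.
from neon.initializers import Gaussian
from neon.optimizers import Adadelta
from neon.transforms import Rectlin, SumSquared
from neon.layers import Conv, Deconv, GeneralizedCost
from neon.models import Model
from neon.callbacks.callbacks import Callbacks

init = Gaussian(scale=0.1)
opt = Adadelta(decay=0.9)
common = dict(init=init, batch_norm=True, activation=Rectlin())
layers = []
nchan = 128
# Encoder: a 2x2 stride-2 conv halves the resolution, then 16 3x3 convs
# whose width is halved down to a minimum of 16 channels.
layers.append(Conv((2, 2, nchan), strides=2, **common))
for idx in range(16):
    layers.append(Conv((3, 3, nchan), **common))
    if nchan > 16:
        nchan //= 2  # integer division keeps the channel count an int
# Decoder: 15 3x3 deconvs, a 4x4 stride-2 deconv to restore resolution,
# and a final single-channel deconv that produces the output map.
for idx in range(15):
    layers.append(Deconv((3, 3, nchan), **common))
layers.append(Deconv((4, 4, nchan), strides=2, **common))
layers.append(Deconv((3, 3, 1), init=init))
cost = GeneralizedCost(costfunc=SumSquared())
mlp = Model(layers=layers)
callbacks = Callbacks(mlp, train, eval_set=val, **args.callback_args)
mlp.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)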
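You can check that the decoder undoes the encoder by tracing the spatial size through the layers. The sketch assumes zero padding, which we believe is the neon default for Conv and Deconv. It also assumes the standard output-size formulas: conv gives floor((n + 2p − k) / s) + 1, and deconv gives (n − 1)s − 2p + k. Under those assumptions, any even input size n ≥ 66 comes back out unchanged. Odd sizes lose a pixel in the stride-2 conv.

```python
def conv_out(n, k, stride=1, pad=0):
    """Output size of a convolution along one spatial dimension."""
    return (n + 2 * pad - k) // stride + 1

def deconv_out(n, k, stride=1, pad=0):
    """Output size of a deconvolution (transposed convolution)."""
    return (n - 1) * stride - 2 * pad + k

def trace_autoencoder(n):
    """Spatial size after each stage of the encoder/decoder code."""
    sizes = [n]
    n = conv_out(n, 2, stride=2)    # Conv((2, 2, nchan), strides=2)
    sizes.append(n)
    for _ in range(16):             # 16 x Conv((3, 3, nchan))
        n = conv_out(n, 3)
    sizes.append(n)
    for _ in range(15):             # 15 x Deconv((3, 3, nchan))
        n = deconv_out(n, 3)
    sizes.append(n)
    n = deconv_out(n, 4, stride=2)  # Deconv((4, 4, nchan), strides=2)
    sizes.append(n)
    n = deconv_out(n, 3)            # Deconv((3, 3, 1))
    sizes.append(n)
    return sizes

print(trace_autoencoder(256))  # [256, 128, 96, 126, 254, 256]
```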
Code for classifier
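# Import paths assume the neon 1.x package layout. `train` and `val` (data
# iterators) and `args` (e.g. from NeonArgparser) are defined elsewhere.
from neon.initializers import Gaussian
from neon.optimizers import Adadelta
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti
from neon.layers import Conv, Pooling, DropoutBinary, Affine, GeneralizedCost
from neon.models import Model
from neon.callbacks.callbacks import Callbacks

init = Gaussian(scale=0.01)
opt = Adadelta(decay=0.9)
common = dict(init=init, batch_norm=True, activation=Rectlin())
layers = []
nchan = 64
layers.append(Conv((2, 2, nchan), strides=2, **common))
# Six conv + pooling blocks; the width doubles each block, capped at 1024.
for idx in range(6):
    if nchan > 1024:
        nchan = 1024
    layers.append(Conv((3, 3, nchan), strides=1, **common))
    layers.append(Pooling(2, strides=2))
    nchan *= 2
layers.append(DropoutBinary(keep=0.5))
# Softmax over 447 classes, one per whale ID.
layers.append(Affine(nout=447, init=init, activation=Softmax()))
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
mlp = Model(layers=layers)
callbacks = Callbacks(mlp, train, eval_set=val, **args.callback_args)
mlp.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)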
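The width schedule in the classifier loop is easy to misread because the cap is checked before each Conv is added, after the previous doubling. The hypothetical helper below replays the loop in plain Python; it is not part of neon. It shows that the six 3 × 3 Conv layers get 64, 128, 256, 512, 1024 and 1024 channels.

```python
def classifier_channels(nchan=64, blocks=6, cap=1024):
    """Replay the conv/pool loop: widths of the 3x3 Conv layers in order."""
    widths = []
    for _ in range(blocks):
        if nchan > cap:   # cap applied to the already-doubled width
            nchan = cap
        widths.append(nchan)
        nchan *= 2        # doubled after each Conv + Pooling block
    return widths

print(classifier_channels())  # [64, 128, 256, 512, 1024, 1024]
```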
Results – heatmaps
[Figure panels: Input, epoch 0, epoch 2, epoch 4, epoch 6]
Prediction indicated by
Results – sample crops from test set
Acknowledgements
•  NOAA Fisheries
•  Kaggle
•  Developers of sloth
•  Playground Global