Anil Thomas - Object recognition

Object Recognition for Fun and Profit
Anil Thomas
SV Deep Learning Meetup
November 17th, 2015

Outline
•  Neon examples
•  Intro to convnets
•  Convolutional autoencoder
•  Whale recognition challenge

Neon
Backends: NervanaCPU, NervanaGPU, NervanaEngine (internal)
Datasets: Images (ImageNet, CIFAR-10, MNIST); Captions (flickr8k, flickr30k, COCO); Text (Penn Treebank, hutter-prize, IMDB, Amazon)
Initializers: Constant, Uniform, Gaussian, Glorot Uniform
Learning rules: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad
Activations: Rectified Linear, Softmax, Tanh, Logistic
Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, Recurrent Sum, LookupTable
Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
Metrics: Misclassification, TopKMisclassification, Accuracy
•  Modular components
•  Extensible, OO design
•  Documentation
•  neon.nervanasys.com

Convolution
Input (3×3):
0 1 2
3 4 5
6 7 8
Kernel (2×2):
0 1
2 3
Output (2×2):
19 25
37 43
•  Each element in the output is the result of a dot product between two vectors, e.g. (0, 1, 3, 4) · (0, 1, 2, 3) = 19
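The dot-product view above can be checked with a few lines of plain Python. This is a minimal sketch, assuming no padding and a stride of 1 (which is what the 3×3 input and 2×2 output on the slide imply); the function name `conv2d_valid` is made up for illustration and is not part of neon.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Dot product between the flattened patch and the flattened kernel.
            total = sum(image[r + a][c + b] * kernel[a][b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        out.append(row)
    return out


image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
kernel = [[0, 1], [2, 3]]
result = conv2d_valid(image, kernel)
print(result)  # [[19, 25], [37, 43]]
```

Note that, as is common in deep learning code, the kernel is not flipped here, so strictly speaking this is a cross-correlation.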

Convolutional layer
[Figure: the same convolution drawn as a layer. The nine inputs 0 to 8 feed four output units (19, 25, 37, 43); each unit connects to four of the inputs through the weights 0, 1, 2, 3.]

Convolutional layer
[Figure: the same layer with one output unit highlighted.]
0×0 + 1×1 + 3×2 + 4×3 = 19
The weights are shared among the units.
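One way to see the weight sharing is to write the layer as an ordinary matrix multiplication in which every row reuses the same four kernel values. The sketch below assumes the 3×3 input and 2×2 kernel from the slides, with no padding and stride 1; `conv_as_matrix` is a hypothetical helper name.

```python
def conv_as_matrix(kernel, ih, iw):
    """Build the (outputs x inputs) weight matrix of a conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    rows = []
    for r in range(ih - kh + 1):
        for c in range(iw - kw + 1):
            row = [0] * (ih * iw)
            for a in range(kh):
                for b in range(kw):
                    # Every output unit uses the same kernel values,
                    # only shifted to a different set of input positions.
                    row[(r + a) * iw + (c + b)] = kernel[a][b]
            rows.append(row)
    return rows


weights = conv_as_matrix([[0, 1], [2, 3]], 3, 3)
inputs = list(range(9))  # the 3x3 input, flattened row by row
outputs = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
print(outputs)  # [19, 25, 37, 43]
```

The matrix has 4 × 9 = 36 entries but only four distinct parameters, which is why a convolutional layer needs far fewer weights than a fully connected one of the same size.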

Recognizing patterns
Detected the pattern!

[Figure, four animation frames: a 3×3 color image shown as three channel planes]
B0 B1 B2
B3 B4 B5
B6 B7 B8
G0 G1 G2
G3 G4 G5
G6 G7 G8
R0 R1 R2
R3 R4 R5
R6 R7 R8
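The slide text does not preserve how the filter is applied to the B, G and R planes. The sketch below assumes the usual convention: the filter has one 2-D slice per channel, and each output element sums the per-channel dot products. The function name and the toy values are illustrative only.

```python
def conv2d_multichannel(channels, filters):
    """channels[k] and filters[k] are the 2-D plane and kernel for channel k."""
    ih, iw = len(channels[0]), len(channels[0][0])
    kh, kw = len(filters[0]), len(filters[0][0])
    out = [[0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for plane, kernel in zip(channels, filters):
        for r in range(ih - kh + 1):
            for c in range(iw - kw + 1):
                # Accumulate this channel's contribution to the output element.
                out[r][c] += sum(plane[r + a][c + b] * kernel[a][b]
                                 for a in range(kh) for b in range(kw))
    return out


plane = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
kernel = [[0, 1], [2, 3]]
# Toy input: three identical channels and three identical kernel slices.
feature_map = conv2d_multichannel([plane] * 3, [kernel] * 3)
print(feature_map)  # [[57, 75], [111, 129]]
```

With identical channels the result is simply three times the single-channel output; in a real image each plane and each kernel slice would differ.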

Max pooling
Input (3×3):
0 1 2
3 4 5
6 7 8
Output (2×2):
4 5
7 8
•  Each element in the output is the maximum value within the pooling window, e.g. Max(0, 1, 3, 4) = 4
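The pooling example can be reproduced with a short sketch. It assumes a 2×2 window moved with stride 1, which is what the 3×3 input and 2×2 output imply; `max_pool` is an illustrative name, not a neon function.

```python
def max_pool(image, size=2, stride=1):
    """Take the maximum over each size x size window."""
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(0, ih - size + 1, stride):
        row = []
        for c in range(0, iw - size + 1, stride):
            row.append(max(image[r + a][c + b]
                           for a in range(size) for b in range(size)))
        out.append(row)
    return out


pooled = max_pool([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(pooled)  # [[4, 5], [7, 8]]
```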

Deconvolution
•  Ill-posed problem, but we can approximate
•  Scatter versus gather
•  Used in conv layer backprop
•  Equivalent to convolution with a flipped kernel on zero-padded input
•  Useful for convolutional autoencoders
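The "scatter versus gather" and "flipped kernel on zero-padded input" points can be illustrated side by side. This is a sketch under the assumption of stride 1, using a 2×2 input and 2×2 kernel that both hold the values 0, 1, 2, 3; the two function names are hypothetical.

```python
def deconv_scatter(x, kernel):
    """Each input element scatters a scaled copy of the kernel into the output."""
    xh, xw = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (xw + kw - 1) for _ in range(xh + kh - 1)]
    for r in range(xh):
        for c in range(xw):
            for a in range(kh):
                for b in range(kw):
                    out[r + a][c + b] += x[r][c] * kernel[a][b]
    return out


def deconv_gather(x, kernel):
    """Same result as a convolution with the flipped kernel on zero-padded input."""
    xh, xw = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-pad the input by (kernel size - 1) on every side.
    padded = [[0] * (xw + 2 * (kw - 1)) for _ in range(xh + 2 * (kh - 1))]
    for r in range(xh):
        for c in range(xw):
            padded[r + kh - 1][c + kw - 1] = x[r][c]
    # Flip the kernel in both dimensions.
    flipped = [row[::-1] for row in kernel[::-1]]
    return [[sum(padded[r + a][c + b] * flipped[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(xw + kw - 1)]
            for r in range(xh + kh - 1)]


x = [[0, 1], [2, 3]]
k = [[0, 1], [2, 3]]
scattered = deconv_scatter(x, k)
gathered = deconv_gather(x, k)
print(scattered)  # [[0, 0, 1], [0, 4, 6], [4, 12, 9]]
print(gathered)   # [[0, 0, 1], [0, 4, 6], [4, 12, 9]]
```

Both routes give the same 3×3 output: the scatter form loops over inputs, the gather form loops over outputs.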

Deconv layer
Input (2×2):
0 1
2 3
Kernel (2×2):
0 1
2 3
Output (3×3):
0 0 1
0 4 6
4 12 9

Deconv layer
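[Figure: the same deconvolution drawn as a layer. The four inputs 0 to 3 feed nine output units (0, 0, 1, 0, 4, 6, 4, 12, 9) through the shared weights 0, 1, 2, 3, with one output unit highlighted.]
1×3 + 3×1 = 6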

Convolutional autoencoder
Input → Conv1 → Conv2 → Conv3 → Deconv1 → Deconv2 → Deconv3
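To see why a stack of deconvolutions can undo the shrinkage caused by a stack of convolutions, it helps to trace the spatial size through the pipeline. The sketch below uses the standard size formulas for unpadded layers; the 32-pixel input and 3×3 kernels are hypothetical values chosen for illustration, not taken from the talk.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution."""
    return (size - kernel) // stride + 1


def deconv_out(size, kernel, stride=1):
    """Output width of an unpadded deconvolution."""
    return (size - 1) * stride + kernel


sizes = [32]  # hypothetical input width
for _ in range(3):  # Conv1, Conv2, Conv3
    sizes.append(conv_out(sizes[-1], kernel=3))
for _ in range(3):  # Deconv1, Deconv2, Deconv3
    sizes.append(deconv_out(sizes[-1], kernel=3))
print(sizes)  # [32, 30, 28, 26, 28, 30, 32]
```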

“Face” recognition for whales
•  Identify whales in aerial photographs
•  ~4500 labeled images, ~450 whales
•  ~7000 test images
•  Pictures taken over 10 years
•  Automating the identification process will aid conservation efforts
•  https://www.kaggle.com/c/noaa-right-whale-recognition
•  $10,000 prize pool

Right whales
•  One of the most endangered whales
•  Fewer than 500 North Atlantic right whales left
•  Hunted almost to extinction
•  Makes a V-shaped blow
•  Has the largest testicles in the animal kingdom
•  Eats 2000 pounds of plankton a day

“All y’all look alike!”

Source: http://rwcatalog.neaq.org/

Source: https://teacheratsea.files.wordpress.com/2015/05/img_2292.jpg

Brute force approach
Churchill
Quasimodo
Aphrodite ?
*Not actual names

A better method
Churchill
Quasimodo
Aphrodite ?

Object localization
•  Many approaches in the literature
•  OverFeat (http://arxiv.org/pdf/1312.6229v4.pdf)
•  R-CNN (http://arxiv.org/pdf/1311.2524v5.pdf)

Even better!
Churchill
Quasimodo
Aphrodite ?

Getting mugshots
•  How to go from [full aerial photo] to [cropped head shot]? (images not preserved)
•  Training set can be manually labeled
•  No manual operations allowed on test set!
•  Estimate the heading (angle) of the whale using a CNN?

Estimate angle
220°
160°
120° ?

An easier way to estimate angle
•  Find two points along the whale’s body
•  θ = arctan((y1 - y2) / (x1 - x2))
•  But how do you label the test images?
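The two-point formula can be sketched as follows. One deliberate change from the slide: the sketch uses `math.atan2` instead of a plain arctangent of the ratio, since that avoids a division by zero when x1 equals x2 and distinguishes opposite headings. The function name and the point order (first point minus second point) are assumptions for illustration.

```python
import math


def heading_degrees(p1, p2):
    """Angle of the line from p2 to p1, in degrees in [0, 360)."""
    x1, y1 = p1
    x2, y2 = p2
    # atan2 takes the numerator and denominator of the slide's ratio separately.
    return math.degrees(math.atan2(y1 - y2, x1 - x2)) % 360.0


diagonal = heading_degrees((1, 1), (0, 0))   # about 45 degrees
reverse = heading_degrees((0, 0), (1, 1))    # about 225 degrees
vertical = heading_degrees((0, 1), (0, 0))   # about 90 degrees, no division by zero
```

In image coordinates the y axis usually points down, so the sense of rotation is mirrored compared with the usual mathematical convention.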

Train with co-ords?
(80, 80)
(90, 130)
(80, 190) ?

Code for convolutional encoder
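# Imports assume the neon 1.x package layout.
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import Conv, Deconv, GeneralizedCost
from neon.models import Model
from neon.optimizers import Adadelta
from neon.transforms import Rectlin, SumSquared

# train, val and args (data iterators and parsed command-line arguments)
# are defined elsewhere in the script.
init = Gaussian(scale=0.1)
opt = Adadelta(decay=0.9)
common = dict(init=init, batch_norm=True, activation=Rectlin())

# Encoder: a strided 2x2 convolution, then sixteen 3x3 convolutions,
# halving the channel count after each one until it reaches 16.
# (Loop body inferred from the flattened slide text.)
layers = []
nchan = 128
layers.append(Conv((2, 2, nchan), strides=2, **common))
for idx in range(16):
    layers.append(Conv((3, 3, nchan), **common))
    if nchan > 16:
        nchan //= 2  # integer division keeps nchan an int

# Decoder: deconvolutions ending in a single-channel output map.
layers.append(Deconv((3, 3, nchan), **common))
layers.append(Deconv((4, 4, nchan), strides=2, **common))
layers.append(Deconv((3, 3, 1), init=init))

cost = GeneralizedCost(costfunc=SumSquared())
mlp = Model(layers=layers)
callbacks = Callbacks(mlp, train, eval_set=val, **args.callback_args)
mlp.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)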

Code for classifier
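# Imports assume the neon 1.x package layout. DropoutBinary is used as on
# the slide, which does not show where it comes from.
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import Affine, Pooling, GeneralizedCost
from neon.models import Model
from neon.optimizers import Adadelta
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti

# train, val and args are defined elsewhere in the script.
init = Gaussian(scale=0.01)
opt = Adadelta(decay=0.9)
common = dict(init=init, batch_norm=True, activation=Rectlin())

layers = []
nchan = 64
# The convolutional layers (and any loop around the next four statements)
# are missing from the extracted slide text and are left as a gap here.
if nchan > 1024:
    nchan = 1024
layers.append(Pooling(2, strides=2))
nchan *= 2
layers.append(DropoutBinary(keep=0.5))
# One softmax output per whale class (the slides cite ~450 whales).
layers.append(Affine(nout=447, init=init, activation=Softmax()))

cost = GeneralizedCost(costfunc=CrossEntropyMulti())
mlp = Model(layers=layers)
callbacks = Callbacks(mlp, train, eval_set=val, **args.callback_args)
mlp.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)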

Results – heatmaps
[Figure panels: Input, epoch 0, epoch 2, epoch 4, epoch 6]
Prediction indicated by [marker image not preserved]
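The slides do not say how a heatmap is turned into a predicted location. A simple possibility, shown here purely as an assumption, is to take the position of the strongest response. The function name and the toy heatmap are illustrative.

```python
def heatmap_peak(heatmap):
    """Return the (row, column) of the largest value in a 2-D heatmap."""
    best = (0, 0)
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best[0]][best[1]]:
                best = (r, c)
    return best


# Toy 3x3 heatmap with its strongest response in the center.
heatmap = [[0.0, 0.1, 0.0],
           [0.2, 0.9, 0.3],
           [0.0, 0.1, 0.0]]
peak = heatmap_peak(heatmap)
print(peak)  # (1, 1)
```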

Results – sample crops from test set

Acknowledgements
•  NOAA Fisheries
•  Kaggle
•  Developers of sloth
•  Playground Global
