Deep Learning and Image Analytics using Python
sanparith.marukatat@nectec.or.th
Code examples are available at https://goo.gl/PKLd97
Neural Networks Timeline
[Figure: neural networks timeline, annotated with: learning technique for deep structure; big data; computing power (GPU, etc.)]
Neural Networks
• Neurons are connected via synapses
• A neuron receives activations from other neurons
• When these activations reach a threshold, it fires an electrical signal to other neurons
http://en.wikipedia.org/wiki/Neuron
Artificial Neural Networks
[Figure: an artificial neuron combining weighted inputs, with example weight and input values]
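A minimal sketch of a single artificial neuron, assuming the usual model: a weighted sum of the inputs plus a bias, passed through either a hard threshold (the biological "fires or not") or a sigmoid (its smooth, differentiable version). The function name `neuron` and all numbers are illustrative, not taken from the figure.

```python
import math

def neuron(inputs, weights, bias, activation="sigmoid"):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    if activation == "step":
        # Hard threshold: the neuron "fires" (outputs 1) when the sum reaches 0.
        return 1.0 if s >= 0 else 0.0
    # Sigmoid: a smooth version of the threshold, needed for gradient-based training.
    return 1.0 / (1.0 + math.exp(-s))

# Illustrative inputs and weights (hypothetical values).
x = [1.0, 0.5, -1.0]
w = [0.4, 0.6, 0.2]
print(neuron(x, w, bias=-0.1))                     # sigmoid(0.4), about 0.599
print(neuron(x, w, bias=-0.1, activation="step"))  # 1.0
```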
Multi-Layer Perceptron
• Number of input nodes = number of features
• 1 hidden layer
• Full connection between consecutive layers
• 2 classes
• 1 output node, with class labels +1 and -1 (or 0)
• More than 2 classes
• Number of output nodes = number of classes (WHY?)
• Each output node is associated with a single class
• Classification rule: assign the input pattern to the class whose corresponding output node gives the maximal value
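The multi-class classification rule can be sketched as a forward pass through a fully connected MLP followed by an argmax over the output nodes. This assumes sigmoid units, a single hidden layer, and random (untrained) weights; the layer sizes and helper names (`dense`, `init_layer`, `classify`) are hypothetical.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def dense(inputs, weights, biases):
    """Fully connected layer: weights[j] holds the incoming weights of node j."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, w_j)) + b_j)
            for w_j, b_j in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(x, layers):
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

def classify(x, layers):
    """Pick the class whose output node gives the maximal value."""
    outputs = mlp_forward(x, layers)
    return max(range(len(outputs)), key=lambda k: outputs[k])

rng = random.Random(0)
n_features, n_hidden, n_classes = 4, 5, 3   # hypothetical sizes
layers = [init_layer(n_features, n_hidden, rng), init_layer(n_hidden, n_classes, rng)]
x = [0.2, -1.0, 0.5, 0.3]
print(mlp_forward(x, layers))   # one value per class
print(classify(x, layers))      # index of the winning class
```

With one output node per class, training targets become one-hot vectors, so each node can learn "how much does this input look like my class" independently.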
Deep learning and image analytics using Python by Dr Sanparit
CSV format
ex1: MLP
Load data
Split into
• input feature vector
• class
Normalize input
Random split
Build an MLP
• 8 input nodes
• 1 hidden layer
• 100 hidden nodes
• 1 output node
• Sigmoid units
• Cross-entropy
• Adam optimizer
Training
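The deck's examples appear to use Keras; the stdlib-only sketch below covers just the data steps of ex1 (load CSV, split into features and class, normalize, random split), plus the binary cross-entropy loss and the parameter count of the 8-100-1 network. The toy CSV, the 20% test ratio, and z-score normalization are assumptions for illustration. Following the slide order, statistics are computed before the split; in practice they are often computed on the training set only.

```python
import csv
import io
import math
import random

def normalize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    n_cols = len(rows[0])
    means = [sum(r[c] for r in rows) / len(rows) for c in range(n_cols)]
    stds = []
    for c in range(n_cols):
        var = sum((r[c] - means[c]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # avoid dividing by zero for constant columns
    return [[(r[c] - means[c]) / stds[c] for c in range(n_cols)] for r in rows]

def random_split(X, y, test_ratio=0.2, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_ratio)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def binary_cross_entropy(t, y, eps=1e-12):
    """Loss for one sigmoid output y in (0, 1) and a target t in {0, 1}."""
    y = min(max(y, eps), 1 - eps)
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

# Hypothetical CSV: each row holds 8 feature values followed by a 0/1 class label.
csv_text = "\n".join(
    ",".join(str(v) for v in [i + j for j in range(8)] + [i % 2]) for i in range(10))
records = [[float(v) for v in rec] for rec in csv.reader(io.StringIO(csv_text))]

X = normalize([r[:8] for r in records])  # input feature vectors
y = [int(r[8]) for r in records]          # class labels
X_train, y_train, X_test, y_test = random_split(X, y)
print(len(X_train), len(X_test))                    # 8 2
print(binary_cross_entropy(1, 0.9))                 # -log(0.9), about 0.105
print(dense_params(8, 100) + dense_params(100, 1))  # 1001
```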
Why a bias?
• Parameters = weights
• How to train: with the gradient
Gradient
• The gradient of a function f with a set of parameters θ is the vector of partial derivatives of f with respect to each parameter θi: ∇f(θ) = (∂f/∂θ1, …, ∂f/∂θn)
• The gradient indicates the direction of change of θ in which f(θ) increases the fastest
• Question: How can we use the gradient to train neural networks?
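One standard answer to the question is gradient descent: since the gradient points toward the fastest increase, repeatedly step in the opposite direction to decrease f. The sketch below estimates the gradient with central finite differences (an assumption for illustration; backprop computes it exactly) on a toy quadratic whose minimum is known.

```python
def numerical_gradient(f, theta, h=1e-6):
    """Vector of partial derivatives df/dtheta_i, estimated by central differences."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        plus[i] += h
        minus = list(theta)
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def gradient_descent(f, theta, lr=0.1, steps=100):
    """Step against the gradient, i.e. in the direction in which f decreases fastest."""
    for _ in range(steps):
        g = numerical_gradient(f, theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Toy objective with its minimum at (3, -1).
f = lambda th: (th[0] - 3) ** 2 + (th[1] + 1) ** 2
print(numerical_gradient(f, [0.0, 0.0]))  # approximately [-6, 2]
print(gradient_descent(f, [0.0, 0.0]))    # close to [3, -1]
```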
Error Back-propagation (Backprop)
• Squared error E
• The gradient points in the direction in which E increases -> So what?
• Use the chain rule
• h(x) = f(g(x))
• h'(x) = ?
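The chain rule gives h'(x) = f'(g(x)) · g'(x); backprop applies it layer by layer. The sketch checks the rule numerically for an illustrative choice f = sin and g(x) = x² (both hypothetical, picked only because their derivatives are familiar).

```python
import math

def derivative(fn, x, h=1e-6):
    """Central-difference estimate of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

g = lambda x: x ** 2        # inner function
f = math.sin                # outer function
h_fn = lambda x: f(g(x))    # h(x) = f(g(x)) = sin(x^2)

# Chain rule: h'(x) = f'(g(x)) * g'(x) = cos(x^2) * 2x
chain_rule = lambda x: math.cos(g(x)) * 2 * x

x0 = 1.3
print(derivative(h_fn, x0), chain_rule(x0))  # the two values agree closely
```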
Backprop (1)
• If j is on output layer
• If j is on hidden layer
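The two cases can be sketched for a network with one hidden layer, sigmoid units, and squared error E = ½ Σ (y_k − t_k)². The notation here is an assumption (s_j = weighted input of node j, δ_j = ∂E/∂s_j): for an output node δ_j = (y_j − t_j) f'(s_j); for a hidden node δ_j = f'(s_j) Σ_k w_kj δ_k; and ∂E/∂w_ji = δ_j × (input i of node j). The gradients are checked against finite differences.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, W1, W2):
    """Hidden and output activations; the last weight in each row is the bias (input fixed at 1)."""
    h = [sigmoid(sum(w * a for w, a in zip(row, x + [1.0]))) for row in W1]
    y = [sigmoid(sum(w * a for w, a in zip(row, h + [1.0]))) for row in W2]
    return h, y

def squared_error(x, t, W1, W2):
    _, y = forward(x, W1, W2)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(x, t, W1, W2):
    """Gradients dE/dW1 and dE/dW2 for sigmoid units and squared error."""
    h, y = forward(x, W1, W2)
    # j on the output layer: delta_j = (y_j - t_j) * f'(s_j), with f'(s) = y (1 - y)
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, t)]
    # j on the hidden layer: delta_j = f'(s_j) * sum_k w_kj * delta_k (errors flow backward)
    d_hid = [h[j] * (1 - h[j]) * sum(W2[k][j] * d_out[k] for k in range(len(W2)))
             for j in range(len(h))]
    # dE/dw_ji = delta_j * (activation i feeding node j)
    g2 = [[dk * a for a in h + [1.0]] for dk in d_out]
    g1 = [[dj * a for a in x + [1.0]] for dj in d_hid]
    return g1, g2

def finite_difference(W, i, j, x, t, W1, W2, eps=1e-6):
    """Central-difference estimate of dE/dW[i][j]; W (W1 or W2) is restored afterwards."""
    original = W[i][j]
    W[i][j] = original + eps
    e_plus = squared_error(x, t, W1, W2)
    W[i][j] = original - eps
    e_minus = squared_error(x, t, W1, W2)
    W[i][j] = original
    return (e_plus - e_minus) / (2 * eps)

rng = random.Random(1)
n_in, n_hid, n_out = 3, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
x, t = [0.5, -0.2, 0.1], [1.0, 0.0]

g1, g2 = backprop(x, t, W1, W2)
print(g1[0][1], finite_difference(W1, 0, 1, x, t, W1, W2))  # should agree closely
```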
Backprop (2)
• Computation proceeds backward from the output layer
• Changing the objective function affects only the output nodes
• e.g., cross-entropy for classification problems
• Changing the activation function affects only its partial derivative with respect to s_j^l
• Can be applied to any NN structure
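To illustrate that the objective only changes the output-node term: for a sigmoid output y = f(s), squared error gives ∂E/∂s = (y − t) f'(s), while cross-entropy gives ∂E/∂s = y − t because the f'(s) factor cancels. The sketch checks both numerically at an illustrative, confidently wrong output (s = 2, t = 0), where the squared-error gradient is much smaller.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def cross_entropy(s, t):
    y = sigmoid(s)
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

def squared(s, t):
    return 0.5 * (sigmoid(s) - t) ** 2

def d_ds(loss, s, t, h=1e-6):
    """Numerical derivative of the loss with respect to the output node's weighted input s."""
    return (loss(s + h, t) - loss(s - h, t)) / (2 * h)

s, t = 2.0, 0.0   # output y is about 0.88 while the target is 0
y = sigmoid(s)
print(d_ds(cross_entropy, s, t), y - t)            # cross-entropy: delta = y - t
print(d_ds(squared, s, t), (y - t) * y * (1 - y))  # squared error: delta = (y - t) f'(s)
```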
Weights update
• Basic update: uses the learning rate
• Common update today: learning rate + momentum + weight decay
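A sketch of both updates, assuming one common formulation (libraries differ in details): basic w ← w − η ∂E/∂w; with momentum μ and weight decay λ, v ← μv − η(∂E/∂w + λw) and w ← w + v. The toy objective E(w) = ½ Σ w_i² (whose gradient is w) and the hyperparameter values are illustrative.

```python
def sgd_basic(w, grad, lr):
    """Basic update: w <- w - lr * dE/dw."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def sgd_momentum_decay(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One common form: v <- momentum * v - lr * (dE/dw + weight_decay * w); w <- w + v."""
    velocity = [momentum * vi - lr * (gi + weight_decay * wi)
                for vi, gi, wi in zip(velocity, grad, w)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Minimize E(w) = sum(w_i^2) / 2, whose gradient is simply w.
w, v = [4.0, -2.0], [0.0, 0.0]
for _ in range(300):
    w, v = sgd_momentum_decay(w, grad=w, velocity=v, lr=0.05)
print(w)  # close to [0, 0]
```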
Optimizers
• SGD (stochastic gradient descent)
• Adadelta: adaptive learning rate method
• RMSprop: divides the gradient by a running average of its recent magnitude
• Adam: uses running estimates of the first and second moments of the gradient to scale the update
• Nadam: Adam with Nesterov momentum
• …
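A minimal sketch of the Adam update with bias correction, using the commonly cited defaults β1 = 0.9, β2 = 0.999, ε = 1e-8; the learning rate 0.1 and the toy objective (two coordinates with very different curvature) are assumptions chosen for the example. Dividing by the square root of the second moment gives each parameter its own effective step size, which is the RMSprop idea.

```python
import math

def adam_minimize(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Scale each step by running estimates of the gradient's 1st and 2nd moments."""
    m = [0.0] * len(theta)  # first moment: running mean of gradients
    v = [0.0] * len(theta)  # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction (m, v start at 0)
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        theta = [th - lr * mh / (math.sqrt(vh) + eps)
                 for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta

# Gradient of (theta_1 - 3)^2 + 10 * (theta_2 + 1)^2, minimum at (3, -1).
grad = lambda th: [2 * (th[0] - 3), 20 * (th[1] + 1)]
print(adam_minimize(grad, [0.0, 0.0]))  # close to [3, -1]
```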
G. Hinton, Neural Networks for Machine Learning, Lecture 6c: The momentum method
https://www.youtube.com/watch?v=8yg2mRJx-z4
ex2: MNIST with MLP
• Load MNIST data
• Bitmap 28x28 pixels = 784 features
• 10 classes
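The input encoding for the MLP can be sketched as flattening each 28x28 bitmap into a 784-element vector and encoding the label as a 10-element one-hot target (one output node per class). Scaling pixels from 0-255 to [0, 1] is an assumption; the image below is a hypothetical stand-in for an MNIST digit.

```python
def flatten(image):
    """Turn a 28x28 bitmap (list of rows) into a 784-element feature vector in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def one_hot(label, n_classes=10):
    """Target vector with a 1 at the position of the class label."""
    return [1.0 if k == label else 0.0 for k in range(n_classes)]

# Hypothetical blank 28x28 image with a single white pixel at row 3, column 5.
image = [[0] * 28 for _ in range(28)]
image[3][5] = 255
features = flatten(image)
print(len(features), features.index(1.0))  # 784 89  (3 * 28 + 5)
print(one_hot(7))
```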
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proc. of the IEEE, November 1998
[Figure: MNIST results of MLP and CNN classifiers]
Convolutional NN (CNN)
• Image convolution
• Feature extractor (convolution layers) + classifier
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proc. of the IEEE, November 1998
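The feature-extractor part rests on image convolution: a small kernel slides over the image and each output value is the weighted sum of the pixels under it. This sketch uses 'valid' positions only and, like CNN libraries, does not flip the kernel (strictly a cross-correlation). The edge-detector kernel and image are illustrative.

```python
def conv2d(image, kernel):
    """'valid' 2D convolution as used in CNNs (no kernel flip, no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A vertical-edge detector on an image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0]]: strong response only at the edge
```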
Conv2D
• Input shape = (nchannels, w, w)
• Data format = 'channels_first'
• Conv2D(filters, kernel_size, strides, padding, data_format)
• filters = number of convolution kernels = number of output channels
• kernel_size: e.g. (3,3)
• padding: 'same' or 'valid'
• strides: step size used when sliding the kernel across the image
• e.g. Conv2D(10, (3,3), padding='same')
• Output shape = (10, w, w)
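The output shape and parameter count of a Conv2D layer can be computed by hand. This sketch uses the standard output-size formulas (to our understanding these match Keras for dilation 1): 'valid' gives ⌊(w − k)/s⌋ + 1 and 'same' gives ⌈w/s⌉. Each filter has kh·kw·nchannels weights plus one bias. The helper names are hypothetical.

```python
import math

def conv_output_size(w, kernel, stride=1, padding="valid"):
    """Spatial output size of a 2D convolution along one axis."""
    if padding == "same":
        return math.ceil(w / stride)        # zero-padding keeps w when stride = 1
    if padding == "valid":
        return (w - kernel) // stride + 1  # only positions where the kernel fits entirely
    raise ValueError("padding must be 'same' or 'valid'")

def conv2d_output_shape(input_shape, filters, kernel_size, strides=(1, 1), padding="valid"):
    """channels_first: input (nchannels, h, w) -> output (filters, h', w')."""
    _, h, w = input_shape
    return (filters,
            conv_output_size(h, kernel_size[0], strides[0], padding),
            conv_output_size(w, kernel_size[1], strides[1], padding))

def conv2d_params(in_channels, filters, kernel_size):
    """Each filter has kh * kw * in_channels weights plus one bias."""
    return filters * (kernel_size[0] * kernel_size[1] * in_channels + 1)

print(conv2d_output_shape((1, 28, 28), 10, (3, 3), padding="same"))   # (10, 28, 28)
print(conv2d_output_shape((1, 28, 28), 10, (3, 3), padding="valid"))  # (10, 26, 26)
print(conv2d_output_shape((1, 28, 28), 10, (3, 3), strides=(2, 2), padding="same"))  # (10, 14, 14)
print(conv2d_params(1, 10, (3, 3)))                                   # 100
```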
ex3: MNIST with CNN
• BatchNormalization: normalizes the outputs of a layer
• MaxPooling: reduces the size of the feature maps (alternative: AveragePooling)
• Is this model larger or smaller than the previous MLP?
• ReLU(x) = max{0, x}
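A small sketch of ReLU followed by 2x2 max pooling and average pooling on a 4x4 feature map, assuming non-overlapping windows (stride = window size), so each side of the map is halved. The feature-map values are illustrative.

```python
def relu(x):
    return max(0.0, float(x))

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping size x size pooling (stride = size)."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

raw = [[1, -2, 3, 0],
       [-1, 5, -3, 2],
       [0, 1, -1, -4],
       [2, -2, 6, 1]]
fmap = [[relu(v) for v in row] for row in raw]
print(fmap)                      # negative activations are set to 0
print(pool2d(fmap))              # [[5.0, 3.0], [2.0, 6.0]]
print(pool2d(fmap, mode="avg"))  # [[1.5, 1.25], [0.75, 1.75]]
```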
The MLP has 79,510 params and yields 96% accuracy
The MLP takes ~2 s/epoch
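The 79,510 figure is consistent with a fully connected 784-100-10 network (784 inputs, 100 hidden nodes, 10 outputs, with biases); that architecture is our assumption, since the slide does not list it. Each dense layer contributes n_in × n_out weights plus n_out biases.

```python
def dense_params(n_in, n_out):
    """A fully connected layer has n_in * n_out weights plus n_out biases."""
    return n_in * n_out + n_out

layers = [784, 100, 10]  # assumed architecture
total = sum(dense_params(a, b) for a, b in zip(layers, layers[1:]))
print(total)  # 79510 = (784 * 100 + 100) + (100 * 10 + 10)
```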
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proc. of the IEEE, November 1998
[Figure: MNIST results of MLP and CNN classifiers; annotation: 1.2 million params + preprocessing]
• CNN achieves better results than MLP
• MLP structure is simpler but uses a larger number of parameters
• CNN is deeper
• CNN is slower -> GPUs (since 2010, 2012 until now!)
• The top layers of a CNN are an MLP
• An MLP with a deeper structure yields bad results -> vanishing gradient problem
Gradient Vanishing
• Backprop: the gradient reaching the lower layers is a product of many factors that are < 1, so it shrinks toward zero
• Solutions
• Pretraining: stack of RBMs, stack of autoencoders
• CNN: shared weights
• ReLU: f' = 1 or 0
G. Hinton, S. Osindero, and Y.-W. Teh, "A Fast Learning Algorithm for Deep Belief Nets", Neural Computation, 18, pp. 1527-1554, 2006
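A simplified illustration, ignoring the weights: backprop multiplies one activation-derivative factor per layer. The sigmoid derivative is at most 0.25, so even in its best case the product over 10 layers is 0.25^10 ≈ 1e-6, whereas an active ReLU unit contributes a factor of exactly 1. The depth of 10 is an arbitrary choice for the example.

```python
import math

def sigmoid_derivative(s):
    y = 1.0 / (1.0 + math.exp(-s))
    return y * (1.0 - y)          # at most 0.25, reached at s = 0

def relu_derivative(s):
    return 1.0 if s > 0 else 0.0  # exactly 1 for active units

n_layers = 10
# One derivative factor per layer, taking the sigmoid's best case (s = 0).
sigmoid_factor = math.prod(sigmoid_derivative(0.0) for _ in range(n_layers))
relu_factor = math.prod(relu_derivative(1.0) for _ in range(n_layers))
print(sigmoid_factor)  # 0.25 ** 10, about 9.5e-07
print(relu_factor)     # 1.0
```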
Labeled Faces in the Wild
Y. Sun et al. "Deep Learning Face Representation from Predicting 10,000 Classes", CVPR 2014
http://vis-www.cs.umass.edu/lfw/
ex4: DeepID network
• Sun et al. used 60 of these networks
• Each one is trained on a different part (patch) of the face images
Y. Sun et al. "Deep Learning Face Representation from Predicting 10,000 Classes", CVPR 2014
• The same network structure trained on different datasets yields different performance
• You should now know how to construct a basic CNN
• The design of the CNN structure is an open problem
• The number of kernels
• The depth of the network
• Whether to reduce the size of the feature maps or not
• Activations
• …
Reuse trained CNN
[Figure: almost the same structure; DeepID trained on CelebFace and tested on LFW]
Reuse trained CNN
• Food & Restaurant domain
• Unconstrained images
• Manual tags
• Food / Non-food
Some results
• GIST (global feature) + SVM (RBF kernel): 85.57%
• SIFT (local feature) + BoF (bag of features) + SVM (histogram intersection kernel): 89.69%
• SIFT + SPM (spatial pyramid matching) + LLC (locality-constrained linear coding) + SVM (linear): 91.48%
• CNN (AlexNet trained on another dataset) + SVM (linear): 93.58%
S. Lazebnik et al. "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", CVPR 2006
J. Wang et al. "Locality-constrained Linear Coding for Image Classification", CVPR 2010
D. Lowe "Object recognition from local scale-invariant features", ICCV 1999
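The best result above reuses a trained CNN as a fixed feature extractor and trains only a linear SVM on top. The sketch covers the second half: a linear SVM trained by stochastic sub-gradient descent on the L2-regularized hinge loss. The feature vectors are synthetic stand-ins (two well-separated clusters), NOT real AlexNet features, and the hyperparameters are illustrative; a real pipeline would typically use an SVM library.

```python
import random

def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=50, seed=0):
    """Linear SVM: stochastic sub-gradient descent on lam/2 * ||w||^2 + hinge loss.

    labels are +1 (e.g. food) or -1 (non-food)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(features[0]), 0.0
    data = list(zip(features, labels))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            hinge_active = margin < 1  # the sample is inside the margin or misclassified
            w = [wi - lr * (lam * wi - (y * xi if hinge_active else 0.0))
                 for wi, xi in zip(w, x)]
            if hinge_active:
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical stand-in for CNN features of food / non-food images.
rng = random.Random(42)
feats = ([[rng.gauss(2.0, 0.5) for _ in range(4)] for _ in range(20)]
         + [[rng.gauss(-2.0, 0.5) for _ in range(4)] for _ in range(20)])
labels = [1] * 20 + [-1] * 20
w, b = train_linear_svm(feats, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(feats, labels)) / len(labels)
print(accuracy)
```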
ImageNet challenge
• 2010-2012: SVM + spatial pyramid + local features
• 2012: AlexNet (8 layers, 60M params, dropout, ReLU, GPU)
• 2013: OverFeat (8 layers, bounding box regression)
• 2014: GoogLeNet (22 layers, ~12× fewer params than AlexNet, Inception module), VGG (3x3 kernels, 16-19 layers)
• 2015: ResNet (152 layers, skip connections)
• 2016: Combined models (ResNet, Inception, Inception-ResNet, Wide-ResNet, …)
Overfit problem
• Understanding vs. memorizing
• Rule of thumb: when #params is large, the model tends to overfit
• Problem: the NN structure (and hence #params) is defined before training!
• Solutions
• Early stopping
• Weight decay
• Optimal brain damage
• Dropout (~simulated brain damage)
• Increase training data
[Figure: training error and validation error vs. iterations]
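Two of the listed remedies, sketched under stated assumptions: early stopping with a "patience" counter (stop once validation error has not improved for a few epochs, keep the best epoch), and inverted dropout (zero each unit with probability `rate` during training and rescale the survivors so the expected activation is unchanged). The validation curve and the patience value are hypothetical.

```python
import random

def early_stopping_epoch(val_errors, patience=3):
    """Epoch whose weights to keep; stops once the validation error has not
    improved for `patience` consecutive epochs."""
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

def dropout(activations, rate, rng, training=True):
    """Inverted dropout; does nothing at test time."""
    if not training or rate == 0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hypothetical validation curve: decreases, then rises as the model starts to overfit.
val = [0.50, 0.40, 0.33, 0.30, 0.31, 0.32, 0.35, 0.40]
print(early_stopping_epoch(val))  # 3
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```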
Inception module
[Figure: original design and variations]
• Explore various methods to combine convolutions
C. Szegedy et al. "Rethinking the Inception Architecture for Computer Vision", CVPR 2016
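A bookkeeping sketch of an Inception-style block in the style of the original design: parallel 1x1, 1x1 → 3x3, 1x1 → 5x5, and 3x3 max-pool → 1x1 branches with 'same' padding, concatenated along the channel axis. The channel counts are illustrative (hypothetical) and biases are omitted. It shows why the 1x1 "reduce" convolutions matter: the block has far fewer weights than a single 5x5 convolution with the same number of output channels.

```python
def conv_params(in_ch, out_ch, k):
    """Weights of a k x k convolution (biases omitted for brevity)."""
    return k * k * in_ch * out_ch

def inception_block(in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
    """Returns (output channels, weight count) of the concatenated parallel branches."""
    branches = [
        (c1, conv_params(in_ch, c1, 1)),
        (c3, conv_params(in_ch, c3_reduce, 1) + conv_params(c3_reduce, c3, 3)),
        (c5, conv_params(in_ch, c5_reduce, 1) + conv_params(c5_reduce, c5, 5)),
        (pool_proj, conv_params(in_ch, pool_proj, 1)),  # max-pool has no weights
    ]
    out_channels = sum(ch for ch, _ in branches)
    params = sum(p for _, p in branches)
    return out_channels, params

# Illustrative channel counts (hypothetical).
out_ch, params = inception_block(192, 64, 96, 128, 16, 32, 32)
naive = conv_params(192, 256, 5)  # one 5x5 conv with the same 256 output channels
print(out_ch, params, naive)      # 256 163328 1228800
```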
Xception module
• A convolution kernel finds correlations in 3D (2D spatial + 1D channel)
• Inception hypothesis: cross-channel and spatial correlations can be decoupled
• Extreme case: Xception module
F. Chollet "Xception: Deep Learning with Depthwise Separable Convolutions", arXiv:1610.02357
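Decoupling spatial and cross-channel correlations is what a depthwise separable convolution does: a k x k depthwise convolution applied to each input channel separately, plus a 1x1 pointwise convolution that mixes channels. The sketch compares weight counts with a standard convolution (biases omitted; the sizes are illustrative). The order of the two steps does not change the count.

```python
def standard_conv_params(k, in_ch, out_ch):
    """One k x k kernel spans all input channels: space and channels modeled jointly."""
    return k * k * in_ch * out_ch

def separable_conv_params(k, in_ch, out_ch):
    """Depthwise k x k conv per input channel (spatial) + 1x1 pointwise conv (cross-channel)."""
    depthwise = k * k * in_ch
    pointwise = in_ch * out_ch
    return depthwise + pointwise

print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768
```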
ResNet
• Add skip connections
• Weights of unnecessary blocks are driven toward zero, so such a block only learns a residual and its output stays close to its input
• Acts like a mixture of several shallower networks
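A deliberately simplified residual block, y = x + F(x), where F is a single fully connected layer with ReLU (real ResNet blocks use convolutions, batch normalization, and apply ReLU after the addition). It shows why unnecessary blocks are harmless: with zero weights, F(x) = 0 and the block is exactly the identity. Helper names are hypothetical.

```python
def dense_relu(x, weights, biases):
    """Fully connected layer + ReLU; weights[j] holds the incoming weights of unit j."""
    return [max(0.0, sum(w * xi for w, xi in zip(w_j, x)) + b_j)
            for w_j, b_j in zip(weights, biases)]

def residual_block(x, weights, biases):
    """y = x + F(x): the skip connection adds the input back to the block's output."""
    fx = dense_relu(x, weights, biases)
    return [xi + fi for xi, fi in zip(x, fx)]

x = [0.5, -1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]
print(residual_block(x, zeros, [0.0] * 3))  # [0.5, -1.0, 2.0]: zero weights give the identity
w = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]]
print(residual_block(x, w, [0.0] * 3))      # small weights give a small correction to x
```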
ResNet in Keras
How to improve further?
• Change the CNN structure
• Pre-processing
• Increase training data: e.g., use tangent vectors
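A simplified sketch of the tangent-vector idea, under our own assumptions: approximate the effect of a small transformation (here only horizontal translation) by its tangent vector t, estimated as the difference between a 1-pixel-shifted image and the original, then create new training samples x + α·t for a few small α. The α values and the tiny image are hypothetical.

```python
def shift_right(image, pixels=1):
    """Translate a 2D image to the right, filling vacated columns with 0."""
    return [[0.0] * pixels + row[:-pixels] for row in image]

def translation_tangent(image):
    """Finite-difference tangent vector of the image w.r.t. horizontal translation."""
    shifted = shift_right(image)
    return [[s - p for s, p in zip(srow, prow)] for srow, prow in zip(shifted, image)]

def augment(image, alphas=(-0.5, 0.5)):
    """New samples x + alpha * t along the tangent direction t."""
    t = translation_tangent(image)
    return [[[p + a * d for p, d in zip(prow, trow)] for prow, trow in zip(image, t)]
            for a in alphas]

image = [[0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0]]
t = translation_tangent(image)
print(t)                     # [[0.0, -1.0, 1.0], [0.0, -1.0, 1.0]]
print(len(augment(image)))   # 2 new samples
```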
Q & A
