Francesco Pugliese, PhD
Italian National Institute of Statistics, Division "Information and Application Architecture", Directorate for Methodology and Statistical Design
Matteo Testi, MSc
Data Science at Sferanet S.r.l.
Email Francesco Pugliese: francesco.pugliese@istat.it
Email Matteo Testi: testi@sferaspa.com
• Image classification is the task of taking an input image and outputting a class (cat, dog, etc.) or a probability over the classes that best describes the image. For humans, this kind of recognition is one of the first skills we learn.
• When we see an image, or simply look at the world around us, most of the time we can immediately characterize the scene and give each object a label, without even consciously noticing it.
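To make "a probability over the classes" concrete, here is a minimal Python sketch; the label set and raw scores are made up for illustration. A softmax turns a network's raw output scores into per-class probabilities.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    exp = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp / exp.sum()

labels = ["cat", "dog", "bird"]       # hypothetical label set
logits = np.array([3.1, 1.2, -0.5])   # hypothetical network outputs for one image
for label, p in zip(labels, softmax(logits)):
    print(f"{label}: {p:.2%}")        # here the image is most likely a cat
```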
What we see vs. what computers see (an array of pixel intensities).
Convolutional Neural Networks (CNNs) are biologically inspired variants of MLPs. From Hubel and Wiesel's early work on the cat's visual cortex, we know the visual cortex contains a complex arrangement of cells. These cells are sensitive to small sub-regions of the visual field, called receptive fields.
• For example, some neurons fired when exposed to vertical edges and others when shown horizontal or diagonal edges.
• Such neurons act as feature identifiers.
LeNet was one of the very first convolutional neural networks and helped propel the field of Deep Learning. This pioneering work by Yann LeCun was named LeNet-5 after several successful earlier iterations dating back to 1988.
The MNIST database of handwritten digits comprises 70,000 patterns:
• 60,000 training images
• 10,000 test images
LeNet has been applied to this dataset with an error rate of 0.95% (about 99% accuracy).
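A minimal LeNet-5-style sketch in Keras, assuming the standard tf.keras API; the layer sizes follow the classic architecture, while the optimizer and epoch count are illustrative rather than tuned.

```python
from tensorflow.keras import datasets, layers, models

# Load MNIST: 60,000 training and 10,000 test images, 28x28 grayscale.
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add a channel axis, scale to [0, 1]
x_test = x_test[..., None] / 255.0

# LeNet-5-style stack: two conv/pool stages followed by three dense layers.
model = models.Sequential([
    layers.Conv2D(6, 5, padding="same", activation="tanh", input_shape=(28, 28, 1)),
    layers.AveragePooling2D(2),
    layers.Conv2D(16, 5, activation="tanh"),
    layers.AveragePooling2D(2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128,
          validation_data=(x_test, y_test))
```

A few epochs of this sketch typically reach roughly 98-99% test accuracy, consistent with the error rate quoted above.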
So far, across dozens of experiments, the resulting performance has been orders of magnitude better than that of other machine learning techniques available today. Three factors made this possible:
GPU
• The advent of GPUs makes it possible to train very large neural networks, even with more than 150 million parameters.
BIG DATA
• A new generation of larger training and test sets.
DROPOUT
• Better model regularization techniques have been discovered, such as "Dropout" and "Data Augmentation".
- A recent study supports a relationship between vision capabilities and intelligence (Tsukahara et al., 2016).
- Computer Vision needs human-like abilities.
EVERYDAY LIFE vs. BIOMEDICAL IMAGES
• A new generation of machines might accomplish typical human tasks such as recognizing and moving objects, driving cars, cultivating fields, cleaning streets, collecting city garbage, etc.
ImageNet: a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories.
• Since 2010, a competition called the "ImageNet Large-Scale Visual Recognition Challenge" (ILSVRC) has used a subset of ImageNet with roughly 1,000 images in each of 1,000 categories:
Train Set: 1.2 million training images
Validation Set: 50,000 images
Test Set: 150,000 images
Kaggle: founded in 2010 as a platform for predictive modeling and analytics competitions, on which companies and researchers post their data.
• Statisticians and data scientists from all over the world compete to produce the best models.
• Data Science Bowl 2017 was the biggest competition, focused on lung cancer detection. It was funded by the Arnold Foundation and awarded $1 million in prizes ($500,000 for 1st place).
Train Set: around 150 labelled CT scan images per patient, from 1,200 patients, encoded in DICOM format.
Stage 1 Test Set: CT scans from 190 patients.
Stage 2 Test Set: CT scans from 500 patients.
Grand Challenges in Biomedical Image Analysis: a website hosting new competitions in the biomedical field. Specifically, LUNA (LUng Nodule Analysis) focuses on a large-scale evaluation of automatic nodule detection algorithms.
Train Set: the LIDC/IDRI database, consisting of 888 CT scans labelled by 4 expert radiologists.
Each neuron in the convolutional layer is connected only to a local region of the input volume spatially. In this example there are 5 neurons along the depth, all looking at the same region.
Convolutional Neural Networks (CNNs) are biologically inspired variants of MLPs. We know the visual cortex contains a complex arrangement of cells (Hubel, D. and Wiesel, T., 1968). These cells are sensitive to small sub-regions of the visual field, called receptive fields. Other layers are: ReLU layer, Pool layer. Typical CNN settings are: a) number of kernels (filters), b) receptive field size F, c) padding P, d) stride S. For an input of spatial size W, these parameters are tied by the standard output-size equation:
    output size = (W - F + 2P) / S + 1
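A small Python helper applying this relation; the worked example in the comment uses AlexNet's well-known first layer, not a figure from these slides.

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    assert (w - f + 2 * p) % s == 0, "hyperparameters do not tile the input evenly"
    return (w - f + 2 * p) // s + 1

# AlexNet's first layer: 227x227 input, 11x11 kernels, no padding, stride 4.
print(conv_output_size(w=227, f=11, p=0, s=4))  # -> 55
```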
ConvNet timeline: AlexNet (2012) → VGG Net (2nd ranked in 2014) → GoogLeNet (2014) → Residual Nets (2015).
Traditional issues with convolutional layers:
• Wide convolutional layers lead to overfitting and to the vanishing-gradient problem with the solver (SGD, Adam, etc.).
• Shallow architectures produce raw features (the depth needs to be pushed further).
Model regularization (a Keras sketch follows this list):
• Dropout, which counters co-adaptation and acts as a models ensemble (Srivastava, N. et al., 2014).
• Weight penalty L1/L2.
• Data Augmentation (crop, flip, rotation, ZCA whitening, etc.).
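A hedged Keras sketch of these regularizers; the layer choices and hyperparameters are illustrative stand-ins, not a recipe from the slides (ZCA whitening has no built-in Keras layer and is omitted).

```python
from tensorflow.keras import layers, models, regularizers

# Data augmentation: crop, flip, and rotation applied on the fly during training.
augment = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomCrop(28, 28),  # assumes 32x32 inputs; takes a random 28x28 window
])

model = models.Sequential([
    augment,
    layers.Conv2D(32, 3, activation="relu",
                  kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dropout(0.5),  # Dropout (Srivastava, N. et al., 2014)
    layers.Dense(10, activation="softmax"),
])
```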
Critical Features (Krizhevsky, A. et al., 2012):
• 8 trainable layers: 5 convolutional layers and 3 fully connected layers (see the Keras sketch below).
• Max pooling layers after the 1st, 2nd and 5th convolutional layers.
• Rectified Linear Units (ReLUs) (Nair, V., & Hinton, G. E., 2010).
• Local Response Normalization.
• 60 million parameters, 650 thousand neurons.
• Regularization: Dropout (prob. 0.5 in the first 2 fully connected layers) and Data Augmentation (translations, horizontal reflections, PCA on RGB channels).
• Trained on 2 GTX 580 3 GB GPUs.
Results:
• 1 CNN: 40.7% Top-1 Error, 18.2% Top-5 Error.
• 5 CNNs: 38.1% Top-1 Error, 16.4% Top-5 Error.
• SIFT+FVs: 26.2% Top-5 Error (Sánchez, J., et al., 2013).
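A compact, single-stream Keras sketch of the 8 trainable layers listed above. Local Response Normalization is left out because tf.keras has no built-in LRN layer (only the low-level op tf.nn.local_response_normalization), and the original two-GPU split is not reproduced.

```python
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Conv2D(96, 11, strides=4, activation="relu",
                  input_shape=(227, 227, 3)),
    layers.MaxPooling2D(3, strides=2),               # pooling after the 1st conv
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),               # pooling after the 2nd conv
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),               # pooling after the 5th conv
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                             # dropout on the 1st FC layer
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                             # dropout on the 2nd FC layer
    layers.Dense(1000, activation="softmax"),        # 1,000 ImageNet classes
])
```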
Critical Features (Simonyan, K., & Zisserman, A., 2014):
• Kernels with small receptive fields: 3x3, the smallest size that captures the notion of left/right, up/down, and center. It is easy to see that a stack of two 3×3 conv. layers (without spatial pooling in between) has an effective receptive field of 5×5, and so on (checked in the snippet below).
• Small receptive fields are a way to increase the nonlinearity of the decision function of the conv. layers.
• Increasing-depth architectures: VGG-16 (2xConv3-64, 2xConv3-128, 3xConv3-256, 6xConv3-512, 3xFC), VGG-19 (same as VGG-16 but with 8xConv3-512).
• Upside: less complex topology; outperforms GoogLeNet on single-network classification accuracy.
• Downside: 138 million parameters for VGG-16!
Results:
• Multi-ConvNet model: (D/[256;512]/256,384,512), (E/[256;512]/256,384,512), multi-crop & dense eval: 23.7% Top-1 Error, 6.8% Top-5 Error.
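A two-line check of the stacked-3×3 claim: with stride 1 and no pooling, n stacked k×k convolutions see a region of size 1 + n(k − 1).

```python
def effective_receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked k x k convs, stride 1, no pooling."""
    return 1 + n_layers * (kernel - 1)

print(effective_receptive_field(2))  # -> 5: two 3x3 convs act like one 5x5
print(effective_receptive_field(3))  # -> 7: three 3x3 convs act like one 7x7
```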
Critical Features (Szegedy, C., et al., 2015):
• Computationally efficient deep architecture: 22 layers.
• Why the name "Inception"? Because the module represents a network within a network. If you don't get the reference, go watch Christopher Nolan's "Inception"; computer scientists are hilarious.
• Inception module: basically the parallel combination of 1×1, 3×3, and 5×5 convolutional filters (sketched below).
• Bottleneck layer: the great insight of the Inception module is the use of 1×1 convolutional blocks (NiN) to reduce the number of features before the expensive parallel blocks.
• Upside: 4 million parameters!
• Downside: not scalable!
Results:
• 7-model ensemble: 6.67% Top-5 Error.
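A hedged Keras sketch of a single Inception module, with 1×1 bottleneck blocks placed before the expensive 3×3 and 5×5 branches; the input shape and filter counts are illustrative.

```python
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(28, 28, 192))
b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(inputs)  # 1x1 branch
b2 = layers.Conv2D(96, 1, padding="same", activation="relu")(inputs)  # bottleneck
b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(b2)     # 3x3 branch
b3 = layers.Conv2D(16, 1, padding="same", activation="relu")(inputs)  # bottleneck
b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(b3)      # 5x5 branch
b4 = layers.MaxPooling2D(3, strides=1, padding="same")(inputs)
b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)      # pool projection
outputs = layers.Concatenate()([b1, b2, b3, b4])  # branches stacked along the depth
inception_module = Model(inputs, outputs)
```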
Critical Features (He, K., et al., 2016):
• Degradation problem: stacking more and more layers IS NOT automatically better. As network depth increases, accuracy saturates and then degrades rapidly! It is an issue of "solvers".
• ResNet solves the degradation problem by fitting a residual mapping, which is easier to optimize.
• Shortcut connections: identity connections that skip one or more layers (see the sketch below).
• Very deep architectures: up to 1,202 layers (and WideResNet with only 19.4 million parameters)!
• Upside: accuracy keeps increasing with more depth.
• Downside: they don't consider other architectural breakthroughs.
Results:
• ResNet: 3.57% Top-5 Error.
• CNNs show superhuman abilities at image recognition! Human Top-5 error is estimated at 5% (Johnson, R. C., 2015).
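A minimal Keras sketch of a residual block: the identity shortcut means the stacked layers only have to fit the residual F(x) = H(x) − x, and the block outputs F(x) + x. The filter count and block layout are illustrative.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    """Basic two-conv residual block with an identity shortcut."""
    # Assumes x already has `filters` channels; otherwise the shortcut
    # would need a 1x1 conv projection before the addition.
    shortcut = x                                   # shortcut connection (identity)
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                # output = F(x) + x
    return layers.Activation("relu")(y)
```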
Problems:
• Feature extraction: in biomedicine, feature extraction is not as easy as in an ImageNet competition with general images. A prior image-preprocessing step is needed; this is called Segmentation.
• On the Kaggle website there are whole competitions dedicated just to segmentation. One of these was called "Ultrasound Nerve Segmentation".
Critical Features (Ronneberger, O., et al., 2015):
• U-NET can be trained end-to-end from very few images and outperforms the prior best methods.
• It consists of a contracting path (left side) to capture context and a symmetric expansive path (right side) enabling precise localization (a reduced sketch follows).
• The upsampling part (repeating rows and columns) has a large number of feature channels, which allows the network to propagate context information to higher-resolution layers.
• Spatial Dropout: dropout of whole feature maps.
• Upside: works with a small training set.
• Downside: risk of overfitting.
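A heavily reduced Keras sketch of the U-Net idea: one contracting level, one expansive level, and a skip connection carrying high-resolution features across; the real network has four such levels and many more channels.

```python
from tensorflow.keras import Input, Model, layers

inputs = Input(shape=(128, 128, 1))
c1 = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
p1 = layers.MaxPooling2D(2)(c1)                          # contracting path
c2 = layers.Conv2D(128, 3, padding="same", activation="relu")(p1)
u1 = layers.UpSampling2D(2)(c2)                          # upsampling repeats rows/cols
m1 = layers.Concatenate()([u1, c1])                      # skip connection: context + detail
c3 = layers.Conv2D(64, 3, padding="same", activation="relu")(m1)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)  # per-pixel segmentation mask
mini_unet = Model(inputs, outputs)
```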
Pipeline: candidate nodule selection via UNET → dilation, erosion, and nodule distance merging → false positive reduction via WideResNet.
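A hedged SciPy sketch of the middle, morphological stage of this pipeline; the random mask, iteration counts, and merging step are illustrative stand-ins for the real UNET output and thresholds.

```python
import numpy as np
from scipy import ndimage

mask = np.random.rand(64, 64) > 0.95                # stand-in for a UNET output mask
mask = ndimage.binary_dilation(mask, iterations=2)  # dilation: close small gaps
mask = ndimage.binary_erosion(mask, iterations=1)   # erosion: remove speckle
labels, n = ndimage.label(mask)                     # connected nodule candidates
centroids = ndimage.center_of_mass(mask, labels, range(1, n + 1))
# Candidates whose centroids fall within some distance threshold would then be
# merged, and the survivors passed to the WideResNet false-positive filter.
print(f"{n} candidate nodules, centroids: {centroids}")
```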
Tsukahara, J. S., Harrison, T. L., & Engle, R. W. (2016). The relationship between baseline pupil size and intelligence. Cognitive Psychology, 91, 109-123.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195(1), 215-243.
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Sánchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3), 222-245.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with
convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (pp. 770-778).
Johnson, R. C. (2015). Microsoft, Google beat humans at image recognition. EE Times.
Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for biomedical image
segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp.
234-241). Springer International Publishing.
Thank you for your attention.
Francesco Pugliese
Matteo Testi