Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
DEEP LEARNING WORKSHOP
Dublin City University
28-29 April 2017
Learning where to look: focus and attention in deep vision
Overview
Visual attention models and their applications
Deep vision for medical image analysis
Deep crowd analysis
Interactive deep vision: image segmentation
Visual Attention Models and their Applications
The importance of visual attention
Why don’t we see the changes?
We don’t really see the whole image
We only focus on small specific regions: the salient parts
Different people reliably attend to the same regions when shown the same image
(Figure panels: what we perceive; where we look; what we actually see)
Can we predict where humans will look?
Yes! Computational models of visual saliency
Why might this be useful?
SalNet: deep visual saliency model
Predicts a map of visual attention from image pixels
(finds the parts of the image that stand out)
● Feedforward, 8-layer, fully convolutional architecture
● Bottom 3 layers initialized by transfer learning from a VGG-M model pretrained on ImageNet
● Trained on the SALICON dataset (crowdsourced attention data collected by simulating fixations with mouse movements and artificial foveation)
● Top 5 on the MIT300 saliency benchmark: http://saliency.mit.edu/results_mit300.html
(Figure: example images with ground-truth and predicted saliency maps)
Pan, McGuinness, et al. Shallow and Deep Convolutional Networks for Saliency Prediction, CVPR 2016. http://arxiv.org/abs/1603.00845
SalGAN
(Figure: SalGAN architecture, trained with an adversarial loss and a data loss)
Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” arXiv. 2017.
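SalGAN trains its generator on two signals: a data loss comparing the predicted saliency map with the ground truth, and an adversarial loss from a discriminator. The sketch below is a minimal illustration, assuming the data loss is per-pixel binary cross-entropy on maps with values in [0, 1] and that the generator's adversarial term is -log D(image, predicted map). The weight `alpha` and every function name are hypothetical, not taken from the paper.

```python
import math

EPS = 1e-7  # clamp probabilities away from 0 and 1 so log() stays finite


def bce_data_loss(pred, target):
    """Mean per-pixel binary cross-entropy between flattened saliency maps in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def generator_adversarial_loss(d_score):
    """Generator term -log D(image, predicted map); d_score is the discriminator's 'real' probability."""
    return -math.log(min(max(d_score, EPS), 1.0))


def salgan_style_generator_loss(pred, target, d_score, alpha=0.05):
    """alpha * data loss + adversarial loss (alpha is an illustrative value, not from the paper)."""
    return alpha * bce_data_loss(pred, target) + generator_adversarial_loss(d_score)
```

For example, `salgan_style_generator_loss([0.9, 0.1], [1.0, 0.0], d_score=0.5)` mixes a small data error with the penalty for a discriminator that is unsure the map is real. The data term keeps the prediction close to the ground truth; the adversarial term pushes it toward maps the discriminator cannot tell apart from real ones.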
SalNet and SalGAN benchmarks
Applications of visual attention
● Intelligent image cropping
● Image retrieval
● Improved image classification
Intelligent image cropping
Image retrieval: query by example
Given:
● An example query image that illustrates the user's information need
● A very large dataset of images
Task:
● Rank all images in the dataset according to how likely they are to fulfil the user's information need
Retrieval benchmarks
● Oxford Buildings (2007)
● Paris Buildings (2008)
● TRECVID INS (2014)
Bags of convolutional features instance search
Objective: rank images according to their relevance to the query image
Local CNN features and BoW
● Pretrained VGG-16 network
● Features from conv-5
● L2-norm, PCA, L2-norm
● K-means clustering -> BoW
● Cosine similarity
● Query augmentation, spatial reranking
Scalable, fast, and high-performing on Oxford 5K, Paris 6K and TRECVID INS
(Figure: BoW descriptor)
Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search, ICMR 2016. http://arxiv.org/abs/1604.04653
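The encoding and matching steps above can be sketched in pure Python. This is a minimal sketch, assuming the visual vocabulary (k-means centroids) has already been learned, omitting the PCA step for brevity, and using raw term counts without any tf-idf weighting. The tiny hand-made vectors stand in for conv-5 activations, and all function names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nearest_centroid(v, centroids):
    """Index of the visual word whose centroid is closest in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])))


def bow_descriptor(local_features, centroids):
    """Assign each L2-normalized local feature to a visual word and count occurrences."""
    hist = [0.0] * len(centroids)
    for f in local_features:
        hist[nearest_centroid(l2_normalize(f), centroids)] += 1.0
    return hist


def cosine_similarity(a, b):
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / denom if denom > 0 else 0.0


def rank_database(query_hist, database_hists):
    """Return database indices sorted by decreasing cosine similarity to the query."""
    scores = [cosine_similarity(query_hist, h) for h in database_hists]
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

With two visual words at `[1, 0]` and `[0, 1]`, a query whose features all point along the first axis ranks a database image containing that word above one that does not. Because each image becomes a sparse histogram, the same scheme can use an inverted index, which is what makes BoW retrieval scale.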
Using saliency to improve retrieval
(Figure: image -> CNN -> semantic features; image -> saliency CNN -> importance weighting; weighted features -> pooling (e.g. BoW) -> image descriptors)
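The importance-weighting step can be sketched as follows. This assumes an H×W grid of local feature vectors and a saliency map of the same spatial size with values in [0, 1], and it uses weighted sum pooling as a simple stand-in for the BoW pooling stage. The center-prior baseline in the results table can be imitated by swapping the saliency map for a Gaussian centered on the image. `sigma_frac` and the function names are hypothetical.

```python
import math


def center_prior(height, width, sigma_frac=0.3):
    """Gaussian weight map peaking at the image center (sigma is a fraction of each side)."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_frac * height, sigma_frac * width
    return [[math.exp(-(((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2) / 2.0)
             for x in range(width)] for y in range(height)]


def weighted_sum_pool(features, weights):
    """Scale each local feature features[y][x] by weights[y][x] and sum into one descriptor."""
    dim = len(features[0][0])
    pooled = [0.0] * dim
    for feat_row, w_row in zip(features, weights):
        for feat, w in zip(feat_row, w_row):
            for d in range(dim):
                pooled[d] += w * feat[d]
    return pooled
```

Features at locations with zero saliency contribute nothing, so background clutter has less influence on the final descriptor. With BoW pooling, the same weights would scale each feature's vote in the histogram instead of its vector.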
Saliency weighted retrieval (mean average precision)
                 Oxford          Paris           INSTRE
                 Global  Local   Global  Local   Global  Local
No weighting     0.614   0.680   0.621   0.720   0.304   0.472
Center prior     0.656   0.702   0.691   0.758   0.407   0.546
Saliency         0.680   0.717   0.716   0.770   0.514   0.617
QE saliency      -       0.784   -       0.834   -       0.719
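The scores above are mean average precision (mAP): average precision is computed for each query's ranked list and then averaged over queries. A minimal sketch of the standard non-interpolated version follows. The Oxford and Paris benchmarks ship their own evaluation code with extra conventions (for example "junk" images that are ignored), so official numbers may differ in detail from this simplified version.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated AP: mean of precision@k over the ranks k where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranking `a, b, c, d` with relevant images `{a, c}`, the relevant hits occur at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6.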
12.4%
Using saliency to improve image classification
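(Figure: classification network with an RGB input stream and a saliency input stream. RGB stream layers shown: Conv 1, Conv 3, Conv 4, Conv 5, FC 1, FC 1, FC 3 (output), with dropout, batch normalization and max-pooling. Saliency stream layers shown: Conv 1, batch normalization, max-pooling.)
Figure credit: Eric Arazo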
Why does it improve classification accuracy?
Acoustic guitar +25 %
Volleyball +23 %
Deep Vision for Medical Image Analysis
Task: predict the Kellgren-Lawrence (KL) grade of knee osteoarthritis from X-ray images
Pipeline: first locate the knee joint, then classify its KL grade
Detection performance
(Figure: FCN detection performance)
Baselines, detection accuracy at Jaccard index J > 0.5:
● Template matching: 8.3%
● SVM on handcrafted features: 38.6%
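A detection is counted as correct when the Jaccard index (intersection over union) between the predicted and ground-truth regions exceeds 0.5. The sketch below shows this criterion, assuming axis-aligned boxes given as `(x_min, y_min, x_max, y_max)` and one predicted box per image. The function names are hypothetical.

```python
def jaccard(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))  # overlap width (0 if disjoint)
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))  # overlap height (0 if disjoint)
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def detection_accuracy(predicted, ground_truth, threshold=0.5):
    """Fraction of images whose predicted box overlaps the ground truth with J > threshold."""
    correct = sum(1 for p, g in zip(predicted, ground_truth) if jaccard(p, g) > threshold)
    return correct / len(ground_truth)
```

Two 2×2 boxes that are offset by half their width have J = 1/3, so under this criterion they count as a miss even though they overlap substantially.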
Multi-objective learning helps!
The same network both regresses the KL grade as a continuous value and predicts it as a discrete class
Both objectives are trained jointly
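One way to write such a joint objective is to sum a regression loss on the continuous grade and a classification loss on the discrete grade. This is a minimal sketch, assuming mean squared error for the regression head, softmax cross-entropy over the five KL grades (0 to 4) for the classification head, and a hypothetical weighting `lam`. The actual losses and weights used in this work are not given on the slide.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def joint_kl_loss(grade_pred, class_logits, true_grade, lam=1.0):
    """Squared error on the regressed grade plus lam * cross-entropy on the discrete grade.

    true_grade is an integer KL grade 0-4 and also indexes class_logits.
    """
    mse = (grade_pred - true_grade) ** 2
    ce = -math.log(softmax(class_logits)[true_grade])
    return mse + lam * ce
```

The regression term penalizes predictions in proportion to how far off they are (grade 1 vs. 3 costs more than 1 vs. 2), while the classification term supplies a sharper per-class signal. Training on both gives the shared layers more supervision than either objective alone.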
Comparison with the state of the art
How far are we from human-level accuracy?
Most errors confuse grades 0 and 1, or grades 1 and 2.
Human experts have a hard time with these grades too.
Agreement among human graders on the OAI (Osteoarthritis Initiative) data
● weighted kappa of 0.70 [0.65-0.76]
Human-machine agreement
● weighted kappa of 0.67 [0.65-0.68]
Predictions agree with the “gold standard” about as well as the “gold standard” agrees with itself.
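Weighted Cohen's kappa measures agreement beyond chance while penalizing disagreements by how many grades apart they are. The sketch below uses quadratic weights by default (`power=2`), a common choice for ordinal grades. Whether the study used linear or quadratic weights is an assumption here, not stated on the slide, and `num_classes` must be at least 2.

```python
def weighted_kappa(rater_a, rater_b, num_classes, power=2):
    """Cohen's kappa with weights |i - j|**power / (num_classes - 1)**power (power=2: quadratic)."""
    n = len(rater_a)
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n  # joint distribution of the two raters' grades
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = abs(i - j) ** power / (num_classes - 1) ** power
            num += w * observed[i][j]           # weighted observed disagreement
            den += w * hist_a[i] * hist_b[j]    # weighted disagreement expected by chance
    return 1.0 - num / den if den > 0 else 1.0
```

Perfect agreement gives 1.0, chance-level agreement gives roughly 0, and systematic disagreement can go negative. Because a 0-vs-1 confusion costs far less than a 0-vs-4 confusion, the errors described above lower the score only modestly.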
Confusion matrix
Neonatal brain image segmentation
Volumetric semantic segmentation: label each pixel with its brain tissue class.
Applications:
● Prerequisite for volumetric analysis
● Early identification of risk factors for impaired brain development
Challenge:
● Neonatal brains look very different from adult brains
● Sparse training data! The Neobrains challenge provides only 2 training examples
The task
(Figure legend: cerebellum, unmyelinated white matter, cortical grey matter, ventricles, cerebrospinal fluid, brainstem)
Model
● 8-layer FCN
● 64/96 convolution filters per layer
● Atrous (dilated) convolution to increase the receptive field without sacrificing prediction resolution
● 9-way per-pixel softmax over classes
● Binary cross entropy loss
● L2 regularization
● Aggressive data augmentation: scale, crop, rotate, flip, gamma
● Trained on 2 axial volumes (~50 slices per volume) for 500 epochs using the Adam optimizer
Animation from: http://nicolovaligi.com/deep-learning-models-semantic-segmentation.html
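The benefit of dilation can be shown with a receptive-field calculation. For stride-1 convolutions, a layer with kernel size k and dilation d widens the receptive field by (k - 1) * d pixels, so dilation adds context without any downsampling. This sketch assumes 3×3 kernels, stride 1 and no pooling. The dilation schedule in the usage line is purely illustrative, not the one used in the model, and the helper names are hypothetical.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D dilated convolution (cross-correlation, as in most deep learning libraries)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [sum(kernel[t] * signal[i + t * dilation] for t in range(len(kernel)))
            for i in range(len(signal) - span)]
```

Eight plain 3×3 layers see 17 pixels (`receptive_field([3] * 8, [1] * 8)`), while an illustrative schedule such as `[1, 1, 2, 4, 8, 16, 1, 1]` reaches 69 pixels with the same number of weights. Because no pooling is needed, the output keeps full resolution, which matters for thin structures such as cortical grey matter.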
Sample results
(Figure legend: cerebellum, unmyelinated white matter, cortical grey matter, ventricles, cerebrospinal fluid, brainstem)
Tissue                      Ours   LRDE_LTCI   UPF_SIMBioSys
Cerebellum                  0.92   0.94        0.94
Myelinated white matter     0.51   0.06        0.54
Basal ganglia and thalami   0.91   0.91        0.93
Ventricles                  0.89   0.87        0.83
Unmyelinated white matter   0.93   0.93        0.91
Brainstem                   0.82   0.85        0.85
Cortical grey matter        0.88   0.87        0.85
Cerebrospinal fluid         0.83   0.83        0.79
UWM+MWM                     0.93   0.93        0.90
CSF+Ven                     0.84   0.84        0.79
Mean                        0.85   0.80        0.83
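The table does not name its metric. The NeoBrainS12 challenge reports Dice overlap per tissue class, and assuming these scores are Dice coefficients, each entry could be computed per class from label maps as sketched below. Label maps are flattened to lists, and the class ids in the usage line are hypothetical.

```python
def dice(pred_labels, true_labels, cls):
    """Dice = 2 * |P intersect T| / (|P| + |T|) for one class over flattened label maps."""
    p = [x == cls for x in pred_labels]
    t = [y == cls for y in true_labels]
    inter = sum(1 for a, b in zip(p, t) if a and b)
    size = sum(p) + sum(t)
    return 2.0 * inter / size if size > 0 else 1.0  # both empty: treat as perfect agreement
```

For example, `dice([1, 1, 0, 0], [1, 0, 0, 0], 1)` is 2/3. Because Dice is normalized by region size, small and thin classes such as myelinated white matter are penalized heavily for a few wrong pixels, which is consistent with their much lower scores in the table.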
Neobrains challenge
● New state of the art on the Neobrains infant brain segmentation challenge for axial volume segmentation
● Deep learning with only 2 training examples!
● No ensembling yet; the best competing approach is a large ensemble.
● The second-best approach is also a deep net.
Deep Vision for Crowd Analysis
Fully convolutional crowd counting
(Figure: FCN with 8 conv layers and 2 max-pooling layers -> crowd density estimate -> ∑ -> crowd count estimate)
Marsden et al. Fully convolutional crowd counting on highly congested scenes. VISAPP 2017. https://arxiv.org/abs/1612.00220
True count: 1544
Predicted count: 1566
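The ∑ in the figure is the key step: the crowd count is the sum of the predicted density map over all pixels. The sketch below illustrates this and also shows one common way of building ground-truth density maps: a normalized Gaussian at each annotated head, so that each person contributes exactly 1. That construction is an assumption here, not something stated on the slide, and `sigma` and the function names are hypothetical.

```python
import math


def gaussian_density_map(height, width, head_points, sigma=2.0):
    """Density map with one Gaussian per annotated (y, x) head, each normalized to sum to 1."""
    density = [[0.0] * width for _ in range(height)]
    for (py, px) in head_points:
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))  # normalize over the grid so truncation loses no mass
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_from_density(density):
    """Crowd count estimate = sum over the density map."""
    return sum(sum(row) for row in density)
```

For example, a 16×16 map built from two head annotations sums to 2. Regressing a density map rather than a single number gives the FCN a per-pixel training signal and lets it handle arbitrary image sizes.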
Benchmark results: UCF CC 50 dataset
94-4,543 people per image
State of the art improved by 11% (MAE) and 13% (MSE)
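Crowd-counting benchmarks conventionally report the mean absolute error (MAE) of the per-image counts and a so-called "MSE", which in most crowd-counting papers is the square root of the mean squared error. The helpers below are a minimal sketch of both, plus the relative improvement behind a "state of the art improved by X%" claim. The function names are hypothetical.

```python
import math


def mae(true_counts, predicted_counts):
    """Mean absolute error of per-image counts."""
    return sum(abs(t - p) for t, p in zip(true_counts, predicted_counts)) / len(true_counts)


def crowd_mse(true_counts, predicted_counts):
    """The 'MSE' of crowd-counting papers: square root of the mean squared count error."""
    n = len(true_counts)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_counts, predicted_counts)) / n)


def relative_improvement(old, new):
    """Fractional error reduction relative to a previous state of the art."""
    return (old - new) / old
```

For the single example image above (true count 1544, predicted 1566), both metrics equal 22. Across a dataset, the squared form penalizes a few large misses more than MAE does, which is why the two improvements can differ.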
Interactive Deep Vision
Interactive image segmentation
DeepClip: training
(Figure components: input image (crop from MS COCO) -> image encoder FCN; auto-generated user outline (Gaussian process) -> interaction encoder FCN; channel concatenation -> segmentation decoder FCN with dilated convolutions -> pixel predictions; cross-entropy loss; ground truth (MS COCO); weak labels)
DeepClip: prediction
(Figure components: input image -> image encoder FCN; user interactions -> interaction encoder FCN; channel concatenation -> segmentation decoder FCN with dilated convolutions -> pixel predictions -> graph cuts optimizer -> binary segmentation -> vector tracer)
Closing remarks
● Deep learning has completely revolutionized computer vision
● Human visual attention is important! Incorporating visual attention models helps in many tasks
● You don’t need a huge amount of training data to train an effective deep model
● Simulation techniques are effective for data generation
● Multi-task deep learning is an effective way of providing “more signal” during training.
Questions?
