Comparison of Fine-tuning and Extension
Strategies for Deep Convolutional Neural
Networks
Nikiforos Pittaras1, Foteini Markatopoulou1,2, Vasileios Mezaris1,
and Ioannis Patras2
1Information Technologies Institute / Centre for Research and Technology Hellas
2Queen Mary University of London
Problem
[figure slide]

Typical solution
[figure slides; annotation: "We evaluate DCNN-based approaches"]

Transfer learning
[figure slide]
Literature review: Fine-tuning strategies

• Replacing the classification layer with a new output layer [2,5,18] (FT1-def in the results below)
  • The new layer is learned from scratch
  • All the other layers are fine-tuned
(a minimal sketch follows this list)
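As a hedged illustration only (the paper's experiments use Caffe-trained networks, not this code), a minimal PyTorch sketch of the replace-the-classifier idea, with torchvision's AlexNet standing in for CaffeNet and a placeholder concept count:

```python
# Minimal sketch of the "replace the classification layer" strategy
# (FT1-def); torchvision's AlexNet stands in for CaffeNet and the
# concept count is a placeholder.
import torch.nn as nn
from torchvision import models

NUM_CONCEPTS = 345  # placeholder: number of target-domain concepts

model = models.alexnet(weights="IMAGENET1K_V1")  # pre-trained on ImageNet
# New output layer, learned from scratch.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CONCEPTS)
# All other layers keep their pre-trained weights and are fine-tuned
# together with the new layer during training.
```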
• Re-initializing the last N layers [1,9,18] (FT2-reN in the results below)
  • The last N layers are learned from scratch
  • The first M-N layers, M being the network depth, are fine-tuned with a low learning rate [18] or could remain frozen [1,9]
(a minimal sketch follows)
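A corresponding sketch of the re-initialization strategy for N = 2, again in PyTorch with AlexNet as a stand-in; here the earlier layers are frozen, as in [1,9] (fine-tuning them with a low learning rate, as in [18], is the alternative):

```python
# Minimal sketch of "re-initialize the last N layers" (FT2-re2, N = 2);
# AlexNet stands in for CaffeNet, the concept count is a placeholder.
import torch.nn as nn
from torchvision import models

NUM_CONCEPTS = 345

model = models.alexnet(weights="IMAGENET1K_V1")
# The last two fully-connected layers are re-initialized (learned from scratch).
model.classifier[4] = nn.Linear(4096, 4096)
model.classifier[6] = nn.Linear(4096, NUM_CONCEPTS)
# The first M-N layers remain frozen here; with a small non-zero learning
# rate instead, they would be fine-tuned as in [18].
for p in model.features.parameters():
    p.requires_grad = False
for p in model.classifier[:4].parameters():
    p.requires_grad = False
```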
• Extending the network by one or more fully connected layers [1,8,9,15] (the FT3-ex strategy, detailed next)
FT3-ex: extension strategy
• Add one or more FC layers before the classification layer
• Insert one or more FC layers for each auxiliary classifier
• We use the output of the last three layers as features to train LR classifiers
• We also evaluate the direct output of each network
(a minimal sketch of the extension follows)
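A minimal PyTorch sketch of the extension idea (FT3-ex1-d), under the same stand-in assumptions as above; d and the concept count are placeholders:

```python
# Minimal sketch of FT3-ex1-d: one new fully-connected "extension" layer
# of dimension d is inserted before a new classification layer; both are
# learned from scratch while the pre-trained layers are fine-tuned.
import torch.nn as nn
from torchvision import models

D, NUM_CONCEPTS = 128, 345  # placeholders, e.g. FT3-ex1-128

model = models.alexnet(weights="IMAGENET1K_V1")
in_dim = model.classifier[6].in_features  # 4096 in AlexNet
model.classifier[6] = nn.Sequential(
    nn.Linear(in_dim, D),        # extension layer (new)
    nn.ReLU(inplace=True),
    nn.Linear(D, NUM_CONCEPTS),  # classification layer (new)
)
# FT3-ex2-d would chain two extension layers before the classifier.
```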
Evaluation setup

Dataset: TRECVID SIN 2013
• 800 and 200 hours of Internet Archive videos for training and testing
• One keyframe per video shot
• Evaluated concepts: 38; evaluation measure: MXinfAP (see the AP sketch after this list)

Dataset: PASCAL VOC 2012
• 5717 training, 5823 validation and 10991 test images
• Evaluation on the validation set instead of the original test set
• Evaluated concepts: 20; evaluation measure: MAP

We fine-tuned 3 pre-trained ImageNet DCNNs:
• CaffeNet-1k, trained on 1000 ImageNet categories
• GoogLeNet-1k, trained on the same 1000 ImageNet categories
• GoogLeNet-5k, trained on 5055 ImageNet categories
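Both measures build on average precision per concept; below is a toy sketch of non-interpolated AP (MXinfAP, the TRECVID-style extended inferred AP computed on partially annotated ground truth, is more involved and not reproduced here):

```python
# Toy average-precision computation for one concept; MAP is the
# arithmetic mean of AP over all evaluated concepts (38 or 20 here).
def average_precision(scores, labels):
    """Non-interpolated AP: `labels` are 0/1, `scores` are detector outputs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.8333...
```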
Evaluation setup

For each pair of utilized network and fine-tuning strategy we evaluate:
• The direct output of the network
• Logistic regression (LR) classifiers trained on DCNN-based features
  • One LR classifier per concept, trained using the output of one layer
  • The late-fused output (arithmetic mean) of LR classifiers trained using the last three layers (sketched below)
We also evaluate the two auxiliary classifiers of the GoogLeNet-based networks
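A hedged scikit-learn sketch of this protocol for one concept, not the authors' pipeline; feature arrays, dimensions and labels are dummies:

```python
# Sketch: one LR classifier per layer on last-3-layer features, fused by
# arithmetic mean. Feature arrays, dimensions and labels are dummies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dims = (4096, 1024, 345)  # placeholder dimensions of the last three layers
train_feats = [rng.standard_normal((100, d)) for d in dims]
test_feats = [rng.standard_normal((10, d)) for d in dims]
y = rng.integers(0, 2, 100)  # dummy binary labels for one concept

layer_scores = []
for X_train, X_test in zip(train_feats, test_feats):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y)
    layer_scores.append(clf.predict_proba(X_test)[:, 1])

fused = np.mean(layer_scores, axis=0)  # late fusion: arithmetic mean
```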
Preliminary experiments for parameter selection
• Classification accuracy for the FT1-def strategy and CaffeNet-1k-60-SIN
  • k: the learning-rate multiplier of the pre-trained layers
  • e: the number of training epochs
• [Accuracy table not reproduced here: the best accuracy per e is underlined; the globally best accuracy is bold and underlined]
• Accuracy improves with:
  • Smaller learning rates for the pre-trained layers
  • More training epochs
(a configuration sketch follows)
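The learning-rate multiplier maps naturally onto per-parameter-group learning rates; a PyTorch sketch with placeholder values of k and e, not the values selected in the paper:

```python
# Sketch of the k/e parameterization: pre-trained layers train at
# base_lr * k, the new output layer at base_lr. Values are placeholders.
import torch
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights="IMAGENET1K_V1")
model.classifier[6] = nn.Linear(4096, 345)  # new output layer (FT1-def)

new_params = list(model.classifier[6].parameters())
new_ids = {id(p) for p in new_params}
pretrained_params = [p for p in model.parameters() if id(p) not in new_ids]

base_lr, k, epochs = 0.001, 0.1, 30  # placeholder k and e
optimizer = torch.optim.SGD(
    [{"params": pretrained_params, "lr": base_lr * k},  # fine-tuned slowly
     {"params": new_params, "lr": base_lr}],            # learned from scratch
    momentum=0.9,
)
```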
Experimental results – # target concepts
• Goal: Assess the impact of the number of target concepts
• Experiment: We fine-tune the network for either 60 or all 345 concepts; we evaluate the same 38 concepts (a subset of the 60) in both cases
• Conclusion: Fine-tuning a network for more concepts improves concept detection accuracy
[Bar chart: MXinfAP(%) per fine-tuning strategy (FT1-def through FT3-ex2-4096) on the TRECVID SIN dataset, comparing CaffeNet-1k-60-SIN and CaffeNet-1k-345-SIN]
Experimental results – direct output
• Goal: Assess DCNNs as standalone classifiers (direct output)
• Experiment: Fine-tuning on the TRECVID SIN dataset
• Conclusion: FT3-ex1-64 and FT3-ex1-128 constitute the top-two methods
irrespective of the employed DCNN
[Bar chart: MXinfAP(%) per fine-tuning strategy on the TRECVID SIN dataset, for CaffeNet-1k-SIN, GoogLeNet-1k-SIN and GoogLeNet-5k-SIN]
Experimental results – direct output
• Goal: Assess DCNNs as standalone classifiers (direct output)
• Experiment: Fine-tuning on the PASCAL VOC dataset
• Conclusion: FT3-ex1-512 and FT3-ex1-1024 are the best-performing strategies for the CaffeNet network
[Bar chart: MAP(%) per fine-tuning strategy on the PASCAL VOC dataset, for CaffeNet-1k-VOC, GoogLeNet-1k-VOC and GoogLeNet-5k-VOC]
Experimental results – direct output
• Goal: Assess DCNNs as standalone classifiers (direct output)
• Experiment: Fine-tuning on the PASCAL VOC dataset
• Conclusion: FT3-ex1-2048 and FT3-ex1-4096 are the top-two methods for the GoogLeNet-based networks
[Same MAP(%) bar chart as above, repeated here to highlight the GoogLeNet-based results]
Experimental results – direct output
• Main conclusion: The FT3-ex strategy with one extension layer is always the best solution
• The optimal dimension of the extension layer depends on the dataset and the network architecture
Experimental results – DCNN features
• Goal: Assess DCNNs as feature generators (DCNN-based features)
• Experiment: LR concept detectors trained on the output of the last 3 layers and fused by arithmetic mean
• Conclusion: FT3-ex1-512 is in the top-five methods; FT3-ex2-64 is always among the five worst fine-tuning methods
[Bar chart: MXinfAP(%) per fine-tuning strategy on the TRECVID SIN dataset (DCNN-feature results), for CaffeNet-1k-SIN, GoogLeNet-1k-SIN and GoogLeNet-5k-SIN]
Experimental results – DCNN features
• The same conclusions hold for the PASCAL VOC Dataset
[Bar chart: MAP(%) per fine-tuning strategy on the PASCAL VOC dataset (DCNN-feature results), for CaffeNet-1k-VOC, GoogLeNet-1k-VOC and GoogLeNet-5k-VOC]
Experimental results – DCNN features
• Main conclusion: The FT3-ex strategy almost always outperforms the other two fine-tuning strategies
• FT3-ex1-512 is in the top-five methods
• Additional conclusions (drawn from results presented in the paper):
  • Features extracted from the top layers are more accurate than those from layers positioned lower in the network; the optimal layer varies, depending on the target-domain dataset
  • It is better to combine features extracted from many layers
  • The presented results correspond to the fused output of the last 3 layers
Conclusions
• The extension strategy almost always outperforms all the other strategies
  • Increase the depth with one fully-connected layer
  • Fine-tune the rest of the layers
• DCNN-based features significantly outperform the direct output
  • In a few cases the direct output works comparably well
  • Choose based on the application in which the DCNN will be used, e.g., the time and memory limitations of real-time applications
• It is better to combine features extracted from many layers
References
[1] Campos, V., Salvador, A., Giro-i Nieto, X., Jou, B.: Diving deep into sentiment: understanding fine-tuned
CNNs for visual sentiment prediction. In: 1st International Workshop on Affect and Sentiment in Multimedia
(ASM 2015), pp. 57–62. ACM, Brisbane (2015)
[2] Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into
convolutional nets. In: British Machine Vision Conference (2014)
[5] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and
semantic segmentation. In: Computer Vision and Pattern Recognition (CVPR 2014) (2014)
[8] Markatopoulou, F., et al.: ITI-CERTH participation in TRECVID 2015. In: TRECVID 2015 Workshop. NIST,
Gaithersburg (2015)
[9] Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using
convolutional neural networks. In: Computer Vision and Pattern Recognition (CVPR 2014) (2014)
[15] Snoek, C., Fontijne, D., van de Sande, K.E., Stokman, H., et al.: Qualcomm Research and University of
Amsterdam at TRECVID 2015: recognizing concepts, objects, and events in video. In: TRECVID 2015 Workshop.
NIST, Gaithersburg (2015)
[18] Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? CoRR
abs/1411.1792 (2014)
Thank you for your attention!
Questions?
More information and contact:
Dr. Vasileios Mezaris
bmezaris@iti.gr
http://www.iti.gr/~bmezaris