Remote Sensing Laboratory
Dept. of Information Engineering and Computer Science
University of Trento
Via Sommarive, 14, I-38123 Povo, Trento, Italy

Multi-Label Remote Sensing Image Retrieval by Using Deep Features

STUDENT
Michele Compri
E-mail: michele.compri@studenti.unitn.it

THESIS ADVISORS
Begüm Demir (Unitn)
Xavier Giró-i-Nieto (UPC)
University of Trento, Italy

Outline
1. Introduction
2. Aim of the Thesis
3. Proposed Approach to Multi-Label RS Image Retrieval
4. Experimental Results
5. Conclusion

Introduction
✓ During the last decade, advances in remote sensing (RS) technology have led to an increased volume of RS images.
✓ Earth observation (EO) data archives grow rapidly, motivating the need for efficient and effective content-based image retrieval (CBIR) methods.
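[Figure: CBIR pipeline. Image representation: the query is described by a feature vector v = (v1, ..., vn) and each archive image i = 1, ..., k by vi = (vi1, ..., vin). Image matching: a similarity metric (Euclidean distance, cosine similarity) compares the query with the archive images. Ranking: archive images are sorted by similarity.]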
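The image matching step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the feature vectors are made-up toy values and the helper names are hypothetical. It compares the two metrics listed on this slide and checks a useful property: for L2-normalised feature vectors, the squared Euclidean distance equals 2 - 2 × cosine similarity, so both metrics yield the same ranking.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity between two feature vectors (larger = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical toy features: query v and archive vectors v_1..v_k.
query = [0.9, 0.1, 0.4]
archive = [[1.0, 0.0, 0.5], [0.1, 0.9, 0.2], [0.8, 0.3, 0.3]]

q = l2_normalize(query)
unit_archive = [l2_normalize(v) for v in archive]

# On unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b).
for v in unit_archive:
    assert abs(euclidean(q, v) ** 2 - (2 - 2 * cosine(q, v))) < 1e-9

# Ranking: ascending distance vs descending similarity.
rank_by_distance = sorted(range(len(unit_archive)), key=lambda i: euclidean(q, unit_archive[i]))
rank_by_cosine = sorted(range(len(unit_archive)), key=lambda i: -cosine(q, unit_archive[i]))
print(rank_by_distance, rank_by_cosine)  # [0, 2, 1] [0, 2, 1]
```

Without normalisation the two metrics can disagree, because Euclidean distance is sensitive to vector magnitude while cosine similarity is not.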

Aim of the Thesis
✓ In CBIR systems for RS, images are usually annotated with a single label, which is then used for both image representation and image matching.
✓ This strategy does not fit the complexity of RS images well, since each image may be associated with multiple labels.
[Figure: example images with their single label and multi-labels. Airplane: Airplane, Cars, Grass, Trees. Parking Lot: Cars, Pavement. Tennis Court: Bare-soil, Court, Grass, Trees.]
Proposed solution: investigate the effectiveness of different deep learning architectures for multi-label RS image retrieval.

Proposed Approach: General View
[Figure: system overview. Blocks: XTr, training set TTr, fine-tuning, pretrained deep CNN, fine-tuned deep CNN, feature extraction, retrieval, N retrieved images.]
The system is composed of three main stages:
● Pretrained Architecture
● Fine-Tuning
● Retrieval

Proposed Approach: Pretrained Architectures
✓ Since training a CNN from scratch takes a lot of time and a huge amount of data, models pretrained on ImageNet are considered.
✓ In particular, three different architectures pretrained on ImageNet have been considered:
➢ VGG16: CNN with 16 weight layers, intermediate max pooling layers and 3 fully connected (FC) layers
➢ Inception V3: improved version of GoogLeNet; compared with VGG16 it contains more layers but far fewer parameters, because it replaces the large FC layers with global average pooling
➢ ResNet50: deeper CNN (50 layers) built from residual blocks, whose shortcut connections let data flow by skipping the convolutional blocks
✓ Since RS images differ from the images in ImageNet, a fine-tuning approach is adopted to adapt the learned features to RS data.
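The parameter argument for Inception V3 can be checked with simple arithmetic. The sketch below counts the weights of a dense layer placed after a flattened feature map versus after global average pooling. The feature-map shapes (7 × 7 × 512 at the end of VGG16's convolutional part for 224 × 224 inputs, 8 × 8 × 2048 at the end of Inception V3 for 299 × 299 inputs) are the standard ImageNet configurations and are assumptions here, not figures from the slides; the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def flatten_size(height, width, channels):
    """Number of inputs when a feature map is flattened."""
    return height * width * channels


def gap_size(height, width, channels):
    """Global average pooling keeps one value per channel."""
    return channels


# VGG16: first FC layer (4096 units) on the flattened 7x7x512 map.
vgg_fc1 = dense_params(flatten_size(7, 7, 512), 4096)

# Inception V3-like head: 1000-class output on an 8x8x2048 map.
flat_head = dense_params(flatten_size(8, 8, 2048), 1000)
gap_head = dense_params(gap_size(8, 8, 2048), 1000)

print(vgg_fc1)              # 102764544
print(flat_head, gap_head)  # 131073000 2049000
```

Global average pooling removes the spatial dimensions before the dense layer, so the weight matrix shrinks by a factor of height × width (64 here).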

Proposed Approach: Fine-Tuning
[Figure: fine-tuning. The training set TTr is used to fine-tune the pretrained deep CNN: its classifier is replaced by a new classifier; the new classifier and the high-level layers are trainable, while the remaining layers are frozen.]
➢ Fine-tuning is a transfer learning strategy that reuses the generic features of the pretrained architecture while training only the top of the architecture.
➢ Fine-tuning consists of two phases:
■ replacing the classifier
■ training only the top of the architecture
➢ Since multiple labels are considered, a sigmoid activation and the binary cross-entropy cost function are used.
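To make the multi-label choice concrete, here is a minimal pure-Python sketch of a sigmoid output with binary cross-entropy. Each label is treated as an independent yes/no decision, so several labels can be active at once, unlike softmax where the class scores compete. The logits and label vector are hypothetical toy values; in Keras the same setup would typically be a final `Dense` layer with `activation='sigmoid'` compiled with `loss='binary_crossentropy'`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y_true, logits, eps=1e-7):
    """Mean binary cross-entropy over labels (one independent sigmoid per label)."""
    total = 0.0
    for y, z in zip(y_true, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical 4-label example: labels 0 and 2 are present.
y_true = [1, 0, 1, 0]
logits = [2.0, -1.5, 0.8, -3.0]

probs = [sigmoid(z) for z in logits]
predicted = [int(p > 0.5) for p in probs]  # threshold each label independently
loss = binary_cross_entropy(y_true, logits)
print(predicted, round(loss, 4))
```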

Proposed Approach: Feature Extraction
[Figure: feature extraction. The fine-tuned deep CNN (VGG16 shown: blocks 1 to 5 followed by the classifier) maps each test-set image to a feature vector v = (v1, ..., vn), which is passed to the retrieval stage.]

Proposed Approach: Retrieval
[Figure: image retrieval. The feature vector v = (v1, ..., vn) extracted by the fine-tuned deep CNN (VGG16 shown: blocks 1 to 5 followed by the classifier) is matched against the features of the image dataset to retrieve the most similar images.]
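The retrieval stage can be sketched as a top-N search over precomputed feature vectors. In this hedged example the features are toy 2-D vectors standing in for the CNN features, Euclidean distance is used as the matching metric (one of the two metrics named in the Introduction), and `retrieve_top_n` is a hypothetical helper; N defaults to 20 to mirror the experimental setting.

```python
import heapq
import math


def retrieve_top_n(query_vec, archive_vecs, n=20):
    """Return indices of the n archive images closest to the query (Euclidean distance)."""
    return heapq.nsmallest(
        n,
        range(len(archive_vecs)),
        key=lambda i: math.dist(query_vec, archive_vecs[i]),
    )


# Hypothetical features of a small archive.
archive = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5], [1.0, 0.0], [-1.0, 0.0]]
query = [1.0, 0.0]

print(retrieve_top_n(query, archive, n=2))  # [3, 1]
print(retrieve_top_n(query, archive))       # [3, 1, 2, 0, 4]
```

`heapq.nsmallest` avoids sorting the whole archive when only N results are needed; `math.dist` requires Python 3.8 or later.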

Data Set Description
UC Merced Land Use benchmark archive: 2100 images categorized under 21 land-cover classes (categories) and annotated with 17 primitive classes (multi-labels).
Multi-labels (17 primitive classes): Airplane, Bare-soil, Buildings, Cars, Chaparral, Court, Dock, Field, Grass, Mobile-home, Pavement, Sand, Sea, Ship, Tanks, Trees, Water
Single labels (21 broad categories): Agricultural, Airplane, Baseball Diamond, Beach, Buildings, Chaparral, Dense Residential, Forest, Freeway, Golf Course, Harbor, Intersection, Medium Residential, Mobile Home Park, Overpass, Parking Lot, River, Runway, Sparse Residential, Storage Tanks, Tennis Court
[Figure: example images with their single label and multi-labels. Airplane: Airplane, Cars, Grass, Trees. Parking Lot: Cars, Pavement. Tennis Court: Bare-soil, Court, Grass, Trees.]
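Multi-label annotations are commonly represented as binary (multi-hot) vectors over the primitive classes, which is also the target format expected by a sigmoid output layer. The sketch below assumes an alphabetical ordering of the 17 primitive classes; the class order and the helper names are illustrative choices, not taken from the thesis.

```python
PRIMITIVE_CLASSES = [
    "Airplane", "Bare-soil", "Buildings", "Cars", "Chaparral", "Court",
    "Dock", "Field", "Grass", "Mobile-home", "Pavement", "Sand",
    "Sea", "Ship", "Tanks", "Trees", "Water",
]
CLASS_INDEX = {name: i for i, name in enumerate(PRIMITIVE_CLASSES)}


def to_multi_hot(labels):
    """Encode a collection of primitive-class names as a 0/1 vector of length 17."""
    vec = [0] * len(PRIMITIVE_CLASSES)
    for name in labels:
        vec[CLASS_INDEX[name]] = 1
    return vec


def from_multi_hot(vec):
    """Decode a 0/1 vector back to the list of class names (in class order)."""
    return [PRIMITIVE_CLASSES[i] for i, bit in enumerate(vec) if bit]


airplane_labels = ["Airplane", "Cars", "Grass", "Trees"]
encoded = to_multi_hot(airplane_labels)
print(encoded)
print(from_multi_hot(encoded))  # ['Airplane', 'Cars', 'Grass', 'Trees']
```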

Experimental Setup
✓ The framework used is Keras, a deep learning Python library that runs on top of Theano, a numerical computation library.
✓ The dataset is split into an 80% training set and a 20% test set.
✓ Different values of each meta-parameter have been tested during fine-tuning.
Meta-parameter    Initial   Final
Optimizer         SGD       Adam
Learning rate     0.001     0.01
Weight decay      0         0.3678
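A random 80/20 split of the 2100 UC Merced images gives 1680 training and 420 test images. The slides do not say how the split was drawn (for example, whether it was stratified per category), so the sketch below assumes a plain random split with a fixed seed for reproducibility; `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_images, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into training and test subsets."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # seeded shuffle, reproducible
    n_train = round(n_images * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(2100)
print(len(train_idx), len(test_idx))  # 1680 420
```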

Experimental Setup
✓ For fine-tuning, each architecture is split into fine-tuned layers and frozen layers.
✓ Fine-tuned layers: during the training phase, the weights of these layers are updated according to the considered archive.
✓ Frozen layers: the part of the architecture whose weights do not change (generic features).
Architecture    Fine-tuned layers (top)
VGG-16          14-18
Inception V3    172-217
ResNet50        152-174
[Figure: the new classifier and the high-level layers are trainable; the remaining layers are frozen.]
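The split between frozen and fine-tuned layers can be expressed as a per-layer trainable flag. In Keras each layer exposes a `trainable` attribute that serves this purpose; the sketch below mimics it with a plain `Layer` dataclass so it runs without any deep learning library. The 19-layer backbone and the convention that the ranges in the table are 0-based and inclusive are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Stand-in for a network layer; only the trainable flag matters here."""
    name: str
    trainable: bool = True


def set_fine_tuned_range(layers, first, last):
    """Make layers first..last (inclusive) trainable and freeze all the others."""
    for i, layer in enumerate(layers):
        layer.trainable = first <= i <= last
    return layers


# Hypothetical 19-layer backbone; fine-tune layers 14-18 as in the VGG-16 row.
backbone = [Layer(f"layer_{i}") for i in range(19)]
set_fine_tuned_range(backbone, 14, 18)

trainable = [layer.name for layer in backbone if layer.trainable]
frozen = [layer.name for layer in backbone if not layer.trainable]
print(len(trainable), len(frozen))  # 5 14
```

In Keras, changes to `trainable` only take effect after the model is compiled again.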

Experimental Results
✓ Baseline experiment: performance of the original pretrained deep architectures when retrieving the 20 most similar images.
✓ Three metrics are used to evaluate performance: accuracy, precision and recall.

Architecture    Accuracy   Precision   Recall
VGG-16          58.22%     69.40%      69.95%
Inception V3    52.15%     63.08%      62.64%
ResNet50        66.89%     76.27%      78.06%
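The slides do not define the three metrics. A common choice in multi-label RS retrieval, assumed here, is to compare the label set Q of the query with the label set R of each retrieved image: accuracy = |Q ∩ R| / |Q ∪ R|, precision = |Q ∩ R| / |R| and recall = |Q ∩ R| / |Q|, averaged over the retrieved images (20 in the experiments). With these definitions accuracy can never exceed precision or recall, which is consistent with the tables. The function name and the label sets below are illustrative.

```python
def multilabel_retrieval_scores(query_labels, retrieved_label_sets):
    """Average accuracy, precision and recall between the query labels and
    the labels of each retrieved image.

    Assumes every image has at least one label (no empty sets).
    """
    q = set(query_labels)
    accuracy = precision = recall = 0.0
    for labels in retrieved_label_sets:
        r = set(labels)
        common = len(q & r)
        accuracy += common / len(q | r)
        precision += common / len(r)
        recall += common / len(q)
    n = len(retrieved_label_sets)
    return accuracy / n, precision / n, recall / n


query = {"Bare-soil", "Buildings", "Cars", "Grass", "Pavement", "Trees"}
retrieved = [
    {"Buildings", "Cars", "Grass", "Pavement", "Trees"},
    {"Buildings", "Cars", "Court", "Pavement", "Trees"},
]
acc, prec, rec = multilabel_retrieval_scores(query, retrieved)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
# accuracy=0.7024 precision=0.9000 recall=0.7500
```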

Experimental Results
✓ Performance of the fine-tuned architectures on the top 20 retrieved images:

Architecture    Accuracy   Precision   Recall
VGG-16          70.97%     80.54%      81.61%
Inception V3    66.97%     76.69%      77.53%
ResNet50        72.51%     82.18%      83.05%

✓ Gain of fine-tuning with respect to the models pretrained on ImageNet (percentage points):

Architecture    Accuracy   Precision   Recall
VGG-16          +12.75     +11.14      +11.66
Inception V3    +14.82     +13.61      +14.89
ResNet50        +5.62      +5.91       +4.99
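The gains reported for fine-tuning are absolute differences between the fine-tuned and the baseline scores, i.e. percentage points rather than relative percentages. The short check below recomputes them from the two result tables (values copied from the slides) and, for comparison, the relative improvement in accuracy.

```python
# (accuracy, precision, recall) in %, top-20 retrieval, copied from the tables.
baseline = {
    "VGG-16": (58.22, 69.40, 69.95),
    "Inception V3": (52.15, 63.08, 62.64),
    "ResNet50": (66.89, 76.27, 78.06),
}
fine_tuned = {
    "VGG-16": (70.97, 80.54, 81.61),
    "Inception V3": (66.97, 76.69, 77.53),
    "ResNet50": (72.51, 82.18, 83.05),
}

# Absolute gain in percentage points for each metric.
gains_pp = {
    arch: tuple(round(f - b, 2) for f, b in zip(fine_tuned[arch], baseline[arch]))
    for arch in baseline
}
# Relative accuracy gain, as a percentage of the baseline accuracy.
relative_acc_gain = {
    arch: 100 * (fine_tuned[arch][0] - baseline[arch][0]) / baseline[arch][0]
    for arch in baseline
}

for arch in baseline:
    print(f"{arch}: gain (pp) = {gains_pp[arch]}, "
          f"relative accuracy gain = {relative_acc_gain[arch]:.1f}%")
```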

Experimental Results
✓ Performance of an SVM using SIFT features vs the fine-tuned ResNet50:

Method                   Accuracy   Precision   Recall
SVM (SIFT features)      70.39%     80.32%      76.08%
ResNet50 (fine-tuned)    72.51%     82.18%      83.05%

Experimental Results
[Figure: qualitative results. Query: Intersection (Bare-soil, Buildings, Cars, Grass, Pavement, Trees). For VGG16, Inception V3 and ResNet50, the images retrieved at ranks 1, 10 and 20 are shown with their single labels and multi-labels.]

Conclusion
✓ Unlike existing CBIR systems in RS, which usually rely on single labels, this work retrieves multi-label RS images and investigates the effectiveness of different deep learning architectures.
✓ Three architectures pretrained on ImageNet are considered: VGG16, Inception V3 and ResNet50.
✓ These off-the-shelf models are fine-tuned with a subset of RS images and their multi-label information.
✓ The retrieval experiments show that both the architectures and the fine-tuning strategy are effective for multi-label RS image retrieval: fine-tuning improves all three architectures, and the fine-tuned ResNet50 achieves the best results, also outperforming an SVM with SIFT features.
✓ Future developments:
▪ analyze different architectures
▪ take data augmentation into consideration
▪ collect more data to train the architectures from scratch

THANKS FOR YOUR ATTENTION!
