Multi-label Remote Sensing Image Retrieval based on Deep Features

Remote Sensing Laboratory
Dept. of Information Engineering and Computer Science
University of Trento
Via Sommarive, 14, I-38123 Povo, Trento, Italy
STUDENT
Michele Compri
Multi-Label Remote Sensing Image
Retrieval By Using Deep Features
E-mail: michele.compri@studenti.unitn.it
THESIS ADVISORS
Begüm Demir (Unitn)
Xavier Giró-i-Nieto (UPC)

University of Trento, Italy
Outline
1. Introduction
2. Aim of the Thesis
3. Proposed Approach to Multi-Label RS Image Retrieval
4. Experimental Results
5. Conclusion

Introduction
✓ Over the last decade, advances in remote sensing (RS) technology have led to a rapidly increasing volume of RS images.
✓ Earth observation (EO) data archives are growing rapidly, motivating the need for efficient and effective content-based image retrieval (CBIR) methods.
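[Figure: CBIR pipeline. Image representation: the query is described by a feature vector v = (v1, ..., vn), and the k archive images by v1 = (v11, ..., v1n), ..., vk = (vk1, ..., vkn). Image matching: the query is compared with each archive image using a similarity metric (e.g., Euclidean distance or cosine similarity). Ranking: the archive images are sorted by their similarity to the query.]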
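A minimal NumPy sketch of the matching and ranking stages, assuming the feature vectors have already been extracted: the query vector is compared with every archive vector using either Euclidean distance (smaller means more similar) or cosine similarity (larger means more similar), and the archive is sorted accordingly. The function name `rank_archive` and its parameters are illustrative, not taken from the thesis.

```python
import numpy as np


def rank_archive(query, archive, metric="euclidean", top_n=20):
    """Return the indices of the top_n archive images most similar to the query.

    query:   feature vector v of shape (n,)
    archive: matrix of shape (k, n) whose rows are v1, ..., vk
    """
    query = np.asarray(query, dtype=float)
    archive = np.asarray(archive, dtype=float)
    if metric == "euclidean":
        # Smaller distance means more similar, so sort in ascending order.
        scores = np.linalg.norm(archive - query, axis=1)
        order = np.argsort(scores)
    elif metric == "cosine":
        # Larger cosine similarity means more similar, so sort in descending order.
        norms = np.linalg.norm(archive, axis=1) * np.linalg.norm(query)
        scores = archive @ query / np.maximum(norms, 1e-12)
        order = np.argsort(-scores)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return order[:top_n]


archive = np.array([[3.0, 0.0], [1.0, 0.5]])
query = np.array([1.0, 0.0])
print(rank_archive(query, archive, "euclidean", top_n=2))  # [1 0]
print(rank_archive(query, archive, "cosine", top_n=2))     # [0 1]
```

The two metrics can rank the same archive differently: Euclidean distance is sensitive to the magnitude of the feature vectors, while cosine similarity compares only their direction.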

Aim of the Thesis
✓ In most CBIR systems in RS, each image is described by a single label, which is used for both image representation and image matching.
✓ This strategy does not fit the complexity of RS images, each of which may be associated with multiple labels.
[Figure: example images with their single label and multi-labels. Airplane: Airplane, Cars, Grass, Trees. Parking Lot: Cars, Pavement. Tennis Court: Bare-soil, Court, Grass, Trees.]
Proposed solution: investigate the effectiveness of different deep learning architectures for multi-label RS image retrieval.

Proposed Approach: General View
[Figure: general view. The pretrained deep CNN is fine-tuned on the training set TTr; the fine-tuned deep CNN is then used for feature extraction, and the retrieval stage returns the N most similar images.]
The system is composed of three main stages:
● Pretrained architectures
● Fine-tuning
● Retrieval

Proposed Approach: Pretrained Architectures
✓ Since training a CNN from scratch requires a lot of time and a huge amount of data, models pretrained on ImageNet are used.
✓ In particular, three different architectures pretrained on ImageNet have been considered:
➢ VGG16: a CNN with 16 weight layers, intermediate max-pooling layers and 3 fully connected (FC) layers
➢ Inception V3: an improved version of GoogLeNet that, compared with VGG16, has more layers but far fewer parameters, because it replaces the large FC layers with global average pooling
➢ ResNet50: a 50-layer CNN built from residual blocks, whose shortcut connections let the input skip the convolutional layers of a block and be added to their output
✓ Since RS images differ from the natural images in ImageNet, the pretrained models are fine-tuned to adapt their features to RS data.
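For reference, the two mechanisms mentioned above can be written in their standard form (the notation here is ours, not the thesis's): a residual block adds its input x to the output of its stacked layers F, and global average pooling reduces each channel of an H × W × C feature map A to its mean, which is what lets Inception V3 drop the large FC layers.

```latex
% Residual block (ResNet): the shortcut adds the block input x to the output
% of the stacked convolutional layers F, so the block only has to learn the
% residual F(x) = y - x.
\mathbf{y} = \mathcal{F}\left(\mathbf{x}, \{W_i\}\right) + \mathbf{x}

% Global average pooling (Inception V3): each of the C channels of an
% H x W x C feature map A is averaged into a single value (no parameters).
g_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} A_{ijc}, \qquad c = 1, \dots, C
```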

Proposed Approach: Fine-Tuning
[Figure: fine-tuning. Starting from the pretrained deep CNN (architecture + classifier), the classifier is replaced by a new classifier; the high-level layers are trainable, while the remaining layers are frozen. Training uses the training set TTr.]
➢ Fine-tuning is a transfer learning strategy that reuses the generic features of a pretrained architecture while training only the top of the network.
➢ Fine-tuning consists of two phases:
■ Replace the classifier
■ Train only the top of the architecture
➢ Since the problem is multi-label, sigmoid activations in the output layer and binary cross-entropy as the cost function are used.
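A minimal NumPy sketch of why this choice suits the multi-label case: the sigmoid gives each output unit its own independent probability, so several labels (e.g., Cars and Pavement) can be active at once, whereas a softmax would force the label probabilities to sum to one. The function names are illustrative; in Keras the same choice corresponds to a `Dense` output layer with `activation="sigmoid"` compiled with `loss="binary_crossentropy"`.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function: maps each logit to an independent probability."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels.

    y_true: multi-hot targets of shape (batch, num_labels), entries in {0, 1}
    y_prob: sigmoid outputs of the same shape
    """
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return losses.mean()


# One image, three labels: the first and third are present, the second is not.
logits = np.array([[4.0, -4.0, 0.0]])
targets = np.array([[1.0, 0.0, 1.0]])
loss = binary_cross_entropy(targets, sigmoid(logits))
print(round(float(loss), 4))  # 0.2431
```

Each label contributes its own log-loss term, so a confident correct prediction (logit ±4) costs little, while the undecided third output (probability 0.5) dominates the loss.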

Proposed Approach: Feature Extraction
[Figure: feature extraction. Test-set images are fed to the fine-tuned deep CNN (VGG16 shown: blocks 1 to 5 followed by the classifier and the output layer), and a feature vector v = (v1, ..., vn) is extracted and passed to the retrieval stage.]
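The slide does not state which layer v is read from or whether it is pooled or normalized. A common choice, assumed here, is to take the activation map of a convolutional block, average it over the spatial positions (global average pooling) and L2-normalize the result. The NumPy sketch below uses a random array as a stand-in for the CNN output; the name `feature_vector` is hypothetical. In Keras, such an intermediate output could be obtained with `keras.Model(inputs=model.input, outputs=model.get_layer(name).output)`.

```python
import numpy as np


def feature_vector(activation_map):
    """Turn an (H, W, C) activation map into an L2-normalized C-dim descriptor.

    Assumption: v is obtained by global average pooling over the spatial
    positions of a convolutional block's output, then L2 normalization.
    """
    fmap = np.asarray(activation_map, dtype=float)
    v = fmap.mean(axis=(0, 1))  # average each channel over H x W
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


# Stand-in for the pooled output of VGG16's last convolutional block
# (7 x 7 x 512 for a 224 x 224 input); real values would come from the CNN.
rng = np.random.default_rng(0)
v = feature_vector(rng.random((7, 7, 512)))
print(v.shape)  # (512,)
```

L2 normalization makes Euclidean ranking equivalent to cosine ranking, since for unit vectors the squared distance equals 2 minus twice the cosine similarity.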

Proposed Approach: Retrieval
[Figure: image retrieval. Test-set images are passed through the fine-tuned deep CNN (VGG16 shown: blocks 1 to 5, classifier, output); the extracted feature vector v = (v1, ..., vn) is matched against the features of the image dataset, and the most similar images are retrieved.]

Data Set Description
UC Merced Land Use benchmark archive: 2100 images, each assigned to one of 21 land-cover categories (single labels) and annotated with one or more of 17 primitive classes (multi-labels).
Single labels (broad categories): Agricultural, Airplane, Baseball Diamond, Beach, Buildings, Chaparral, Dense Residential, Forest, Freeway, Golf Course, Harbor, Intersection, Medium Residential, Mobile Home Park, Overpass, Parking Lot, River, Runway, Sparse Residential, Storage Tanks, Tennis Court
Multi-labels (primitive classes): Airplane, Bare-soil, Buildings, Cars, Chaparral, Court, Dock, Field, Grass, Mobile-home, Pavement, Sand, Sea, Ship, Tanks, Trees, Water
Examples (single label: multi-labels):
Airplane: Airplane, Cars, Grass, Trees
Parking Lot: Cars, Pavement
Tennis Court: Bare-soil, Court, Grass, Trees
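The 17 primitive classes give each image a binary (multi-hot) label vector. A minimal sketch of this encoding, assuming the classes are ordered alphabetically; the ordering and the names `PRIMITIVE_CLASSES` and `multi_hot` are illustrative, not taken from the thesis.

```python
# The 17 primitive classes of the multi-label UC Merced annotation,
# in an (assumed) alphabetical order.
PRIMITIVE_CLASSES = [
    "Airplane", "Bare-soil", "Buildings", "Cars", "Chaparral", "Court",
    "Dock", "Field", "Grass", "Mobile-home", "Pavement", "Sand",
    "Sea", "Ship", "Tanks", "Trees", "Water",
]
INDEX = {name: i for i, name in enumerate(PRIMITIVE_CLASSES)}


def multi_hot(labels):
    """Encode a set of primitive-class names as a 17-dim 0/1 vector."""
    vec = [0] * len(PRIMITIVE_CLASSES)
    for name in labels:
        vec[INDEX[name]] = 1  # raises KeyError for an unknown class name
    return vec


# The Airplane example image: Airplane, Cars, Grass, Trees
print(multi_hot(["Airplane", "Cars", "Grass", "Trees"]))
# → [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```

Vectors of this form are the targets for the sigmoid outputs during fine-tuning, one position per primitive class.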

Experimental Setup
✓ The framework used is Keras, a deep learning Python library running on top of Theano, a numerical computation library.
✓ The dataset is split into an 80% training set and a 20% test set.
✓ Different values of each hyperparameter were tested during fine-tuning:
Hyperparameter   Initial   Final
Optimizer        SGD       Adam
Learning rate    0.001     0.01
Weight decay     0         0.3678
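The slide does not say whether the 80/20 split is random or stratified by category. A minimal sketch of a simple random split with a fixed seed (an assumption; the function name is hypothetical) shows what it means for the 2100 images. Since UC Merced contains 100 images per category, a stratified split would instead take 80 training and 20 test images from each category.

```python
import numpy as np


def train_test_split_indices(num_images, train_fraction=0.8, seed=0):
    """Randomly split image indices into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_images)
    n_train = int(round(train_fraction * num_images))
    return perm[:n_train], perm[n_train:]


train_idx, test_idx = train_test_split_indices(2100)
print(len(train_idx), len(test_idx))  # 1680 420
```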

Experimental Setup
✓ For fine-tuning, each architecture is split into fine-tuned layers and frozen layers.
✓ Fine-tuned layers: layers whose weights are updated during training on the considered archive.
✓ Frozen layers: the part of the architecture whose weights do not change (generic features).
Architecture   Fine-tuned layers (top)
VGG-16         14-18
Inception V3   172-217
ResNet50       152-174
[Figure: a new classifier on top of the trainable high-level layers; the remaining layers are frozen.]
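A minimal sketch of how the layer ranges in the table could be applied, assuming they are 0-based positions in the `layers` list of the pretrained base network and that every layer from the first listed index up to the top is fine-tuned. The helper `freeze_below` and the dictionary `FIRST_FINE_TUNED` are hypothetical; layer counts also differ between Keras versions, so the indices should be checked against the model actually loaded.

```python
from types import SimpleNamespace

# Hypothetical mapping: index of the first fine-tuned layer, from the table
# above (layers from this index up to the top are trainable).
FIRST_FINE_TUNED = {"VGG-16": 14, "Inception V3": 172, "ResNet50": 152}


def freeze_below(layers, first_trainable):
    """Freeze layers[:first_trainable] and make the remaining layers trainable.

    Works with any objects exposing a boolean `trainable` attribute, such as
    the entries of a Keras `base_model.layers` list.
    """
    for i, layer in enumerate(layers):
        layer.trainable = i >= first_trainable
    return layers


# Stand-in with 19 layers, the size of a Keras VGG16 base without its
# classifier (counting the input layer).
dummy_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(19)]
freeze_below(dummy_layers, FIRST_FINE_TUNED["VGG-16"])
print([layer.name for layer in dummy_layers if layer.trainable])
# → ['layer_14', 'layer_15', 'layer_16', 'layer_17', 'layer_18']
```

In Keras, changes to `trainable` take effect only when the model is compiled, so the model should be compiled after calling such a helper. The new classifier is not part of `base_model.layers`, so it stays trainable.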

Experimental Results
✓ Baseline experiment: performance of the original pretrained deep architectures (without fine-tuning) when retrieving the 20 most similar images.
✓ Three metrics are used to evaluate performance: accuracy, precision and recall.
Architecture   Accuracy   Precision   Recall
VGG-16         58.22%     69.40%      69.95%
Inception V3   52.15%     63.08%      62.64%
ResNet50       66.89%     76.27%      78.06%
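The slides do not define the three metrics. A common choice in multi-label retrieval, assumed here, compares the label set Q of the query with the label set R of each retrieved image: accuracy = |Q ∩ R| / |Q ∪ R|, precision = |Q ∩ R| / |R| and recall = |Q ∩ R| / |Q|, each averaged over the retrieved images (and then over all queries). The function name is illustrative and the example labels come from the qualitative Intersection example.

```python
def multilabel_scores(query_labels, retrieved_labels):
    """Example-based accuracy, precision and recall for one query.

    Assumed definitions (query labels Q, labels R of one retrieved image):
      accuracy  = |Q & R| / |Q | R|
      precision = |Q & R| / |R|
      recall    = |Q & R| / |Q|
    each averaged over the retrieved images.
    """
    q = set(query_labels)
    acc = prec = rec = 0.0
    for labels in retrieved_labels:
        r = set(labels)
        inter = len(q & r)
        acc += inter / len(q | r) if (q | r) else 0.0
        prec += inter / len(r) if r else 0.0
        rec += inter / len(q) if q else 0.0
    n = len(retrieved_labels)
    return acc / n, prec / n, rec / n


query = ["Grass", "Pavement", "Trees"]
retrieved = [
    ["Buildings", "Cars", "Grass", "Pavement", "Trees"],
    ["Pavement", "Trees"],
]
acc, prec, rec = multilabel_scores(query, retrieved)
print(round(acc, 4), round(prec, 4), round(rec, 4))  # 0.6333 0.8 0.8333
```

With top-20 retrieval, `retrieved` would hold the label sets of the 20 highest-ranked images.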

✓ Performance of the fine-tuned architectures on the top 20 retrieved images:
Architecture   Accuracy   Precision   Recall
VGG-16         70.97%     80.54%      81.61%
Inception V3   66.97%     76.69%      77.53%
ResNet50       72.51%     82.18%      83.05%
✓ Gain of fine-tuning with respect to the models pretrained on ImageNet (absolute difference, in percentage points):
Architecture   Accuracy   Precision   Recall
VGG-16         +12.75     +11.14      +11.66
Inception V3   +14.82     +13.61      +14.89
ResNet50       +5.62      +5.91       +4.99
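The gains are plain differences between the fine-tuned and baseline tables, so they can be checked directly. A small sketch, with the values copied from the two tables above and illustrative variable names:

```python
# Accuracy, precision and recall (in %) copied from the two tables above.
baseline = {
    "VGG-16": (58.22, 69.40, 69.95),
    "Inception V3": (52.15, 63.08, 62.64),
    "ResNet50": (66.89, 76.27, 78.06),
}
fine_tuned = {
    "VGG-16": (70.97, 80.54, 81.61),
    "Inception V3": (66.97, 76.69, 77.53),
    "ResNet50": (72.51, 82.18, 83.05),
}

# Gain = fine-tuned minus baseline, in percentage points (not relative %).
gains = {
    arch: tuple(round(ft - base, 2) for ft, base in zip(fine_tuned[arch], baseline[arch]))
    for arch in baseline
}
print(gains["VGG-16"])  # (12.75, 11.14, 11.66)
```

The largest gains are for Inception V3, which starts from the weakest baseline, while ResNet50 gains the least but remains the best model after fine-tuning.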

✓ Performance of an SVM using SIFT features vs. the fine-tuned ResNet50:
Method                  Accuracy   Precision   Recall
SVM (SIFT features)     70.39%     80.32%      76.08%
ResNet50 (fine-tuned)   72.51%     82.18%      83.05%

[Figure: qualitative retrieval results for a query image of the Intersection category (labels: Grass, Pavement, Tree). For VGG16, Inception V3 and ResNet50, the images retrieved at ranks 1, 10 and 20 are shown with their categories and labels, e.g., Intersection (Buildings, Cars, Grass, Pavement, Tree), Tennis Court (Buildings, Cars, Court, Pavement, Tree), Sparse Residential (Buildings, Grass, Pavement, Tree) and Medium Residential (Tree).]

Conclusion
✓ Unlike existing single-label CBIR systems, this work addresses multi-label RS image retrieval by investigating the effectiveness of different deep learning architectures.
✓ Three different architectures pretrained on ImageNet are considered: VGG16, Inception V3 and ResNet50.
✓ These off-the-shelf models are fine-tuned with a subset of RS images and their multi-label information.
✓ The retrieval experiments show that these architectures, and the fine-tuning strategy in particular, are effective for multi-label RS image retrieval: fine-tuning improves all three models, and the fine-tuned ResNet50 gives the best results.
✓ Future developments:
▪ Analyze different architectures
▪ Consider data augmentation
▪ Collect more data to train the architectures from scratch

THANKS FOR YOUR ATTENTION!
