An accurate retrieval through R-MAC+ descriptors for landmark recognition

Federico Magliani, Andrea Prati
ICDSC 2018 – Eindhoven, Netherlands – 3-4 September 2018

Agenda
➢ Motivations
➢ Summary of contributions
➢ Related works
➢ Introduction to R-MAC descriptors
➢ Proposed approach (R-MAC+)
➢ Experimental results
➢ Conclusions

Motivations
Landmark Recognition problem
➢ Try to understand what is in front
of you and retrieve similar images.
➢ Semantic gap: for a human, this task
is pretty simple thanks to personal
experience, but a computer can use
only the info available in the images.
➢ It is far from being solved
(viewpoint, illumination conditions,
image resolution, ...).

Motivations
➢ Challenges
○ High accuracy retrieval (precision)
○ Fast search (response to query)
○ Low memory footprint (mobile friendly)
○ Work well with big data (>1M images)
➢ Possible applications
○ Augmented reality (tourism)
○ Person Re-ID (video-surveillance)
○ Online clothes search (fashion)

Summary of contributions
➢ A new region detector for CNN feature maps, implemented through grids that respect
the aspect ratio of the images.
➢ An improvement in the effectiveness of the multi-resolution approach for R-MAC
descriptors.
➢ A novel retrieval method that checks the similarities between query descriptors and
regions of database R-MAC descriptors. It outperforms the results of R-MAC
descriptors on Oxford5k and Paris6k by +7% and +3%, respectively.

Related works
➢ Bag of Words (BoW): first method for solving the problem (different
techniques: vocabulary tree, …).
➢ VLAD: similar to BoW, but using the residuals of the descriptors
(residual = feature descriptor - closest center in the vocabulary).
➢ CNN based: extract features from intermediate layers of CNN
architectures and then apply previous embedding techniques (BLCF, ...).
➢ MAC: max pooling applied on CNN features
➢ R-MAC: regional MAC descriptors created through the application of a
rigid-grid mechanism
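
The VLAD residual idea from the list above can be sketched in a few lines. This is a minimal illustration in plain Python, not the implementation used in the cited works: the function name `vlad`, the use of nested lists and the final l2-normalization are assumptions made for this example.

```python
import math

def vlad(descriptors, centers):
    """Sum the residuals (descriptor - closest center) per center, then l2-normalize."""
    dim = len(centers[0])
    acc = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # Index of the closest center by squared Euclidean distance.
        k = min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centers[j])))
        for i in range(dim):
            acc[k][i] += d[i] - centers[k][i]
    # Concatenate the per-center residual sums into one vector.
    flat = [x for row in acc for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat
```

For K centers of dimension D, the resulting vector has K x D entries, which is why VLAD is more compact than a large BoW histogram for the same vocabulary.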

R-MAC (Regional MAC) descriptors
Consider a rectangular region $R \subseteq \Omega = (1, W) \times (1, H)$ and define the regional feature vector:
$$f_R = (f_{R,1} \dots f_{R,i} \dots f_{R,K})^\top$$
where $f_{R,i} = \max_{p \in R} X_i(p)$ is the maximum activation of the $i$-th channel on the considered region.
Then we calculate the feature vector associated with each region, and post-process it with
$\ell_2$-normalization, PCA-whitening and $\ell_2$-normalization. We combine the collection of regional feature
vectors into a single image vector by summing them and $\ell_2$-normalizing in the end.
We define the regions on the response maps and sample square regions at
L different scales:
➢ at the largest scale (l=1), the region size is as large as
possible (height = width = min(W,H))
➢ at every other scale l, we uniformly sample l x (l+m-1)
regions of width 2 min(W,H)/(l+1), with m=2.
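
The aggregation described on this slide can be sketched as follows. This is a hedged illustration in plain Python: the function names, the nested-list layout of the feature maps and the end-exclusive region tuples are assumptions of this example, and the PCA-whitening step is deliberately left out because it requires a projection learned on a separate image set.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def regional_mac(feature_maps, region):
    """Maximum activation of each channel inside one region.

    feature_maps: K channels, each an H x W nested list.
    region: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    """
    x0, y0, x1, y1 = region
    return [max(ch[y][x] for y in range(y0, y1) for x in range(x0, x1))
            for ch in feature_maps]

def rmac(feature_maps, regions):
    """Sum the l2-normalized regional vectors and l2-normalize the result."""
    total = [0.0] * len(feature_maps)
    for region in regions:
        f = l2_normalize(regional_mac(feature_maps, region))
        # PCA-whitening and the second l2-normalization would go here;
        # they are omitted in this sketch.
        total = [t + x for t, x in zip(total, f)]
    return l2_normalize(total)
```

Calling `regional_mac` with a single region covering the whole map gives the plain MAC descriptor, so MAC is the special case of one region.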

R-MAC (Regional MAC) descriptors
Settings:
➢ Fully convolutional off-the-shelf VGG16
➢ Pool5
➢ Spatial Max pooling
➢ High Resolution images
➢ Global descriptor based on aggregating region vectors
➢ Sliding window approach
Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv 2015.

Proposed approach: R-MAC+
New multi-resolution approach: the images are resized by +25%, -25% and 0% on the largest
side, preserving the aspect ratio of the image.
➢ This strategy is an alternative to the first multi-resolution approach, which resized the
image to a fixed size (550px, 800px and 1050px on the largest side), retaining the aspect
ratio of the image.
➢ This strategy should enlarge the feature maps, yielding more features and
therefore more local maxima than the previous multi-resolution
R-MAC. This approach is connected to the new region detector, which detects a
reduced number of regions (15) instead of the 20 of the original one.
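
As a rough illustration of the resizing rule, the three target sizes for one image could be computed as below. The function name `multi_resolution_sizes` and the rounding to whole pixels are assumptions of this sketch; the slides only state the three relative changes and that the aspect ratio is preserved.

```python
def multi_resolution_sizes(width, height, deltas=(-0.25, 0.0, 0.25)):
    """Return one (width, height) per delta, scaling the largest side by (1 + delta)."""
    largest = max(width, height)
    sizes = []
    for delta in deltas:
        # Round the new largest side to whole pixels, then scale both sides
        # by the same factor so the aspect ratio is kept.
        scale = round(largest * (1 + delta)) / largest
        sizes.append((round(width * scale), round(height * scale)))
    return sizes
```

For a 1024 x 768 image this gives 768 x 576, 1024 x 768 and 1280 x 960, so the sizes follow the input image instead of being fixed at 550px, 800px and 1050px.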

A new mechanism for region detection in the CNN feature maps (15 regions):
● l=0 → 1 region covering the entire image;
● l=1 → 2 square regions (widthRegion = heightRegion = min(H,W));
● l=2 → 6 rectangular regions (widthRegion = heightRegion = ⌈2*min(W,H)/(l+1)⌉), arranged along the
horizontal axis (the width and height of the regions are adapted to cover the whole image);
● l=3 → 6 rectangular regions (widthRegion = heightRegion = ⌈2*min(W,H)/(l+2)⌉), arranged along the
vertical axis (the width and height of the regions are adapted to cover the whole image).
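
The nominal region sizes per level can be checked with a short sketch. It only evaluates the formulas listed above for a W x H feature map. The exact placement of the regions and the way their width and height are adapted to cover the image are not given on the slide, so they are not modelled here. The function name `rmac_plus_region_plan` is hypothetical.

```python
import math

def rmac_plus_region_plan(W, H):
    """Return {level: (number_of_regions, nominal_side)} for a W x H feature map."""
    short = min(W, H)
    return {
        0: (1, None),                             # one region covering the whole map
        1: (2, short),                            # two squares of side min(W, H)
        2: (6, math.ceil(2 * short / (2 + 1))),   # l + 1 with l = 2
        3: (6, math.ceil(2 * short / (3 + 2))),   # l + 2 with l = 3
    }
```

Summing the region counts over the four levels gives 1 + 2 + 6 + 6 = 15, the number of regions stated in the heading.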

A new retrieval method based on db regions (regional MAC descriptors of the database images) and the
R-MAC descriptors of the query images (+7% on Oxford5k and +4% on Paris6k over previous results).
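
One possible reading of this retrieval scheme is sketched below. The slide does not say how the per-region similarities are combined into one score per database image, so taking the best-matching region is an assumption of this example, as are the function names and the dictionary layout of the database.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are l2-normalized."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_db_regions(query_descriptor, db_regions):
    """Rank database images by their best-matching region.

    db_regions: {image_id: [region descriptor, ...]}, all vectors l2-normalized.
    Scoring an image by its maximum region similarity is an assumption of
    this sketch, not a detail confirmed by the slides.
    """
    scores = {image_id: max(dot(query_descriptor, r) for r in regions)
              for image_id, regions in db_regions.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Compared with one global descriptor per database image, this stores one vector per region (15 per image with the detector above), which trades memory for accuracy.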

Datasets and evaluation metric
Datasets:
➢ Holidays (1491 images: 500 classes, 500 queries).
➢ Oxford5k (5063 images, 11 classes, 55 queries).
➢ Paris6k (6412 images, 11 classes, 55 queries).
Evaluation metric:
➢ mAP (mean Average Precision) → mean over all queries of the Average Precision score,
which rewards correct results according to their position in the ranking.
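
The metric can be made concrete with a small sketch. It uses the textbook definition of Average Precision (the mean of the precision values at the ranks of the correct results). The official evaluation scripts of the individual benchmarks may differ in details, for instance in how they treat ambiguous images, so this is an illustration and not a replacement for them.

```python
def average_precision(ranked_relevance, num_relevant):
    """ranked_relevance: list of booleans, True where the ranked result is correct."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / num_relevant if num_relevant else 0.0

def mean_average_precision(per_query):
    """per_query: list of (ranked_relevance, num_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in per_query) / len(per_query)
```

For a ranking with correct results at positions 1 and 3 and two relevant images in total, the Average Precision is (1/1 + 2/3) / 2 ≈ 0.83.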

Results
Method | Network | Holidays (original / rotated) | Oxf5k | Paris6k
--- | --- | --- | --- | ---
MAC | VGG19 | 76.26 % | 57.44 % | 73.15 %
R-MAC | VGG19 | 87.65 % | 65.56 % | 82.80 %
R-MAC | ResNet50 | 92.55 % | 71.77 % | 83.31 %
M-R R-MAC+ | ResNet50 | 94.63 % / 95.58 % | 78.88 % | 88.63 %
M-R R-MAC+ with retrieval based on db regions | ResNet50 | 94.37 % / 95.87 % | 85.39 % | 91.90 %

Results after query expansion (QE)
Method | Network | Holidays (original / rotated) | Oxf5k | Paris6k
--- | --- | --- | --- | ---
M-R R-MAC+ | ResNet50 | 94.97 % / 95.97 % | 86.45 % | 92.01 %
M-R R-MAC+ with retrieval based on db regions | ResNet50 | 94.42 % / 96.05 % | 87.92 % | 93.64 %
M-R R-MAC+ with retrieval based on db regions and query expansion based on db regions | ResNet50 | 94.28 % / 95.91 % | 88.78 % | 92.30 %

Comparison with the state of the art

Conclusions
➢ We propose several improvements to R-MAC descriptors in order to make
retrieval more accurate:
○ A multi-resolution approach that uses bigger feature maps than the previous one.
○ A new region detector based on adaptable grids, which captures more local
maxima.
○ A novel retrieval method based on db regions that substantially boosts performance on
Oxford5k and Paris6k.
➢ The proposed method outperforms the state of the art on Holidays, on both the
original and rotated versions. It also outperforms the state-of-the-art results on
some other public benchmarks without fine-tuning.

Thank you for your attention!
Questions?
http://implab.ce.unipr.it