An accurate retrieval through R-MAC+
descriptors for landmark recognition
Federico Magliani, Andrea Prati
ICDSC 2018 – Eindhoven, Netherlands – 3-4 September 2018
Agenda
➢ Motivations
➢ Summary of contributions
➢ Related works
➢ Introduction to R-MAC descriptors
➢ Proposed approach (R-MAC+)
➢ Experimental results
➢ Conclusions
Motivations
Landmark Recognition problem
➢ Try to understand what is in front of you and retrieve similar images.
➢ Semantic gap: for a human, this task is fairly simple thanks to personal
experience, but a computer can use only the information available in the images.
➢ The problem is far from being solved
(viewpoint, illumination conditions,
image resolution, ...).
Motivations
➢ Challenges
○ High retrieval accuracy (precision)
○ Fast search (response to query)
○ Reduced memory footprint (mobile friendly)
○ Work well with big data (>1M images)
➢ Possible applications
○ Augmented reality (tourism)
○ Person Re-ID (video-surveillance)
○ Online clothes search (fashion)
Summary of contributions
➢ A new region detector for CNN feature maps, implemented through grids that respect
the aspect ratio of the images.
➢ An improvement in the effectiveness of the multi-resolution approach for R-MAC
descriptors.
➢ A novel retrieval method that checks the similarity between query descriptors and
regions of database R-MAC descriptors. It outperforms the results of R-MAC
descriptors on Oxford5k and Paris6k by +7% and +3%, respectively.
Related works
➢ Bag of Words (BoW): the first method for solving the problem (different
techniques: vocabulary tree, …).
➢ VLAD: similar to BoW, but using the residuals of the descriptors
(= feature descriptor - closest center in the vocabulary).
➢ CNN based: extract features from intermediate layers of CNN
architectures and then apply previous embedding techniques (BLCF, ...).
➢ MAC: max pooling applied to CNN features.
➢ R-MAC: regional MAC descriptors created through the application of a
rigid-grid mechanism.
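The VLAD residual mentioned in the list above can be illustrated with a minimal sketch. It assumes NumPy, a vocabulary that has already been learned (for example with k-means), and a final l2-normalization; the function name `vlad` and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors (N x D) against a vocabulary (K x D).

    Assumption: the vocabulary `centers` was learned beforehand.
    """
    # Squared Euclidean distance from every descriptor to every center.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    # Accumulate the residuals (descriptor - closest center) per center.
    out = np.zeros(centers.shape, dtype=np.float64)
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            out[k] = (members - centers[k]).sum(axis=0)
    out = out.ravel()
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

# Toy example: two centers, three 2-D descriptors.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
v = vlad(descriptors, centers)
```

Unlike BoW, which only counts how many descriptors fall on each visual word, the resulting vector keeps the direction in which the descriptors deviate from their word.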
R-MAC (Regional MAC) descriptors
Consider a rectangular region $R \subseteq \Omega = [1, W] \times [1, H]$ and define the regional feature vector
$$\mathbf{f}_R = [f_{R,1} \dots f_{R,i} \dots f_{R,K}]^{\top}, \qquad f_{R,i} = \max_{p \in R} X_i(p),$$
where $f_{R,i}$ is the maximum activation of the $i$-th channel over the considered region.
The feature vector associated with each region is then computed and post-processed with
$\ell_2$-normalization, PCA-whitening and a second $\ell_2$-normalization. The collection of regional feature
vectors is combined into a single image vector by summing them and $\ell_2$-normalizing the result.
Square regions are sampled on the response maps at L different scales:
➢ at the largest scale (l=1), the region size is as large as possible
(height = width = min(W,H));
➢ at every other scale l, l x (l+m-1) regions of width 2 min(W,H)/(l+1)
are uniformly sampled (with m=2).
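The sampling rule and the aggregation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the reference implementation: regions are placed at evenly spaced offsets, the short side receives l regions and the long side l+m-1, and the PCA-whitening step is omitted because it needs a learned projection. The names `sample_regions`, `rmac` and `l2n` are hypothetical.

```python
import numpy as np

def l2n(v, eps=1e-12):
    """l2-normalize a vector."""
    return v / (np.linalg.norm(v) + eps)

def sample_regions(W, H, L=3, m=2):
    """Return (x, y, side) square regions on a W x H feature map.

    Assumption: offsets are evenly spaced along each axis.
    """
    regions = []
    short = min(W, H)
    for l in range(1, L + 1):
        side = int(np.ceil(2 * short / (l + 1)))
        n_short, n_long = l, l + m - 1
        nx, ny = (n_long, n_short) if W >= H else (n_short, n_long)
        xs = np.linspace(0, W - side, nx).astype(int)
        ys = np.linspace(0, H - side, ny).astype(int)
        regions += [(x, y, side) for y in ys for x in xs]
    return regions

def rmac(X, L=3):
    """X: feature maps of shape (K, H, W). Returns an l2-normalized K-dim vector."""
    K, H, W = X.shape
    agg = np.zeros(K)
    for x, y, side in sample_regions(W, H, L):
        # Regional MAC: maximum activation of each channel inside the region.
        f = X[:, y:y + side, x:x + side].max(axis=(1, 2))
        # PCA-whitening and the second l2-normalization would go here (omitted).
        agg += l2n(f)
    return l2n(agg)

# Toy feature maps: 8 channels on a 9 x 6 (W x H) grid.
X = np.random.default_rng(0).random((8, 6, 9))
descriptor = rmac(X)
```

With L=3 and m=2 the rule yields 2 + 6 + 12 = 20 regions per image.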
R-MAC (Regional MAC) descriptors
Settings:
➢ Fully convolutional off-the-shelf VGG16
➢ Pool5
➢ Spatial Max pooling
➢ High Resolution images
➢ Global descriptor based on aggregating region vectors
➢ Sliding window approach
Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv 2015.
Proposed approach: R-MAC+
New multi-resolution approach: the images are resized by +25%, -25% and 0% on the largest
side, respecting the aspect ratio of the image.
➢ This strategy is an alternative to the first multi-resolution approach, which resized the
image to a fixed size (550px, 800px and 1050px on the largest side) while retaining the
aspect ratio of the image.
➢ This strategy should enlarge the feature maps, yielding more features and therefore
more local maxima than the previous multi-resolution R-MAC. It is tied to the new
region detector, which detects fewer regions (15) than the 20 of the original one.
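The difference between the two resizing strategies can be made concrete with a small sketch. It only computes target sizes, assumes rounding to the nearest pixel, and uses hypothetical function names; the actual resizing and feature extraction are left out.

```python
def relative_sizes(width, height, factors=(0.75, 1.0, 1.25)):
    """Sizes for the relative strategy: -25%, 0% and +25% of the original size."""
    return [(max(1, round(width * s)), max(1, round(height * s))) for s in factors]

def fixed_sizes(width, height, targets=(550, 800, 1050)):
    """Sizes for the earlier strategy: the largest side is set to a fixed length."""
    longest = max(width, height)
    sizes = []
    for t in targets:
        s = t / longest  # a single scale factor keeps the aspect ratio
        sizes.append((max(1, round(width * s)), max(1, round(height * s))))
    return sizes

# A 1024 x 768 image under both strategies.
rel = relative_sizes(1024, 768)
fix = fixed_sizes(1024, 768)
```

For an image larger than 1050px on its largest side, the relative strategy produces a bigger input at every scale, and hence bigger feature maps, than the fixed one.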
Proposed approach: R-MAC+
A new mechanism for region detection in the CNN feature maps (15 regions):
● l=0 → 1 region covering the entire image;
● l=1 → 2 square regions (widthRegion = heightRegion = min(H,W));
● l=2 → 6 rectangular regions (widthRegion = heightRegion = ⌈2*min(W,H)/(l+1)⌉), arranged along the
horizontal axis (the width and height of the regions are adapted to cover the whole image);
● l=3 → 6 rectangular regions (widthRegion = heightRegion = ⌈2*min(W,H)/(l+2)⌉), arranged along the
vertical axis (the width and height of the regions are adapted to cover the whole image).
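As a worked example of the formulas above, the sketch below tabulates the region count and the nominal region size at each level for a given feature map. It covers only the nominal sizes: the adaptation of width and height to cover the whole image, and the exact placement of the regions, are not specified on the slide and are not modeled. The function name is hypothetical.

```python
import math

def rmac_plus_levels(W, H):
    """Nominal (count, width, height) per level for a W x H feature map."""
    short = min(W, H)
    side_l2 = math.ceil(2 * short / (2 + 1))  # l=2 uses l+1 in the denominator
    side_l3 = math.ceil(2 * short / (3 + 2))  # l=3 uses l+2 in the denominator
    return {
        0: (1, W, H),              # one region covering the entire map
        1: (2, short, short),      # two square regions
        2: (6, side_l2, side_l2),  # six regions along the horizontal axis
        3: (6, side_l3, side_l3),  # six regions along the vertical axis
    }

levels = rmac_plus_levels(40, 30)
total_regions = sum(count for count, _, _ in levels.values())
```

For a 40 x 30 map the nominal sides are 30, 20 and 12 at levels 1, 2 and 3, and the counts add up to 1 + 2 + 6 + 6 = 15.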
Proposed approach: R-MAC+
A new retrieval method that compares the R-MAC descriptors of the query images against the db
regions (MAC descriptors of the database images): +7% on Oxford5k and +3% on Paris6k over the previous results.
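One way to picture retrieval on db regions is the sketch below. The slide does not say how the per-region similarities are turned into one score per database image; taking the maximum dot product over the regions is an assumption made here for illustration, as are the function name and the toy data.

```python
import numpy as np

def rank_by_db_regions(query, db_regions):
    """Rank database images by their best-matching region.

    query: (D,) l2-normalized R-MAC descriptor of the query image.
    db_regions: list with one (R_i, D) array per database image, holding the
        l2-normalized MAC descriptors of its regions.
    Assumption: the image score is the maximum similarity over its regions.
    """
    scores = np.array([(regions @ query).max() for regions in db_regions])
    order = np.argsort(-scores)  # best match first
    return order, scores

# Toy example with 2-D descriptors and three database images.
query = np.array([1.0, 0.0])
db_regions = [
    np.array([[0.0, 1.0], [0.6, 0.8]]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 1.0]]),
]
order, scores = rank_by_db_regions(query, db_regions)
```

Storing one descriptor per region multiplies the database size by the number of regions, which is one reason a detector with 15 regions instead of 20 matters for this scheme.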
Datasets and evaluation metric
Datasets:
➢ Holidays (1491 images: 500 classes, 500 queries).
➢ Oxford5k (5063 images, 11 classes, 55 queries).
➢ Paris6k (6412 images, 11 classes, 55 queries).
Evaluation metric:
➢ mAP (mean Average Precision) → the mean, over all queries, of the Average Precision
score, which depends on the positions of the correct results in the ranking.
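The metric can be sketched in a few lines of Python. This is the plain textbook definition of Average Precision; the evaluation scripts distributed with the benchmarks may differ in details, so the numbers it produces are not guaranteed to match the official ones. Function names and the toy rankings are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values measured at the rank of each correct result."""
    relevant = set(relevant_ids)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, ground_truth):
    """rankings and ground_truth are dicts keyed by query id."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)

# Two toy queries.
rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y"]}
ground_truth = {"q1": {"a", "c"}, "q2": {"y"}}
m_ap = mean_average_precision(rankings, ground_truth)
```

In the toy example, q1 has correct results at ranks 1 and 3, giving (1/1 + 2/3)/2, and q2 has its only correct result at rank 2, giving 1/2.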
Results
Method                                         | Network  | Holidays (original/rotated) | Oxf5k   | Paris6k
MAC                                            | VGG19    | 76.26 %                     | 57.44 % | 73.15 %
R-MAC                                          | VGG19    | 87.65 %                     | 65.56 % | 82.80 %
R-MAC                                          | ResNet50 | 92.55 %                     | 71.77 % | 83.31 %
M-R R-MAC+                                     | ResNet50 | 94.63 % / 95.58 %           | 78.88 % | 88.63 %
M-R R-MAC+ with retrieval based on db regions  | ResNet50 | 94.37 % / 95.87 %           | 85.39 % | 91.90 %
Results after QE application
Method                                                                                   | Network  | Holidays (original/rotated) | Oxf5k   | Paris6k
M-R R-MAC+                                                                               | ResNet50 | 94.97 % / 95.97 %           | 86.45 % | 92.01 %
M-R R-MAC+ with retrieval based on db regions                                            | ResNet50 | 94.42 % / 96.05 %           | 87.92 % | 93.64 %
M-R R-MAC+ with retrieval based on db regions and query expansion based on db regions    | ResNet50 | 94.28 % / 95.91 %           | 88.78 % | 92.30 %
Comparison with the state of the art
Conclusions
➢ We propose several improvements to R-MAC descriptors in order to make the
retrieval more accurate:
○ A multi-resolution approach that uses bigger feature maps than the previous one.
○ A new region detector based on adaptable grids, which captures more local
maxima.
○ A novel retrieval method based on db regions, which strongly boosts the performance on
Oxford5k and Paris6k.
➢ The proposed method outperforms the state of the art on Holidays, on both the
original and the rotated version. It also outperforms the state-of-the-art results on
some other public benchmarks without applying fine-tuning.
Thank you for your attention!
questions?
http://implab.ce.unipr.it