Placing Images with Refined Language Models and Similarity Search with PCA-reduced VGG Features
Giorgos Kordopatis-Zilos¹, Adrian Popescu², Symeon Papadopoulos¹ and Yiannis Kompatsiaris¹
¹ Information Technologies Institute (ITI), CERTH, Greece
² CEA LIST, 91190 Gif-sur-Yvette, France
MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.
Summary
Tag-based location estimation (1 run)
• Built upon the scheme of our 2015 participation [1] (Kordopatis-Zilos et al., MediaEval 2015)
• Based on a refined probabilistic Language Model
Visual-based location estimation (1 run)
• Extract PCA-reduced VGG features to compute image similarities
• Geospatial clustering scheme of the most visually similar images
Hybrid location estimation (3 runs)
• Combination of the textual and visual approaches using a set of rules
Training sets
• Training set released by the organisers (≈4.7M geotagged items)
• YFCC dataset, excl. images from users in test set (≈40M geotagged items)
• External data derived from gazetteers, i.e. Geonames and OpenStreetMap
Tag-based location estimation
• Processing steps of the approach
– Offline: language model construction
– Online: location estimation
Pre-processing
• Tags and titles of the training set items are processed
• Apply
– URL decoding
– lowercase transformation
– tokenization
• Remove
– accents
– symbols
– punctuation
• Multi-word tags are split into their individual terms, which are also included in the item's term set
• Numeric terms and terms shorter than three characters are discarded
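The pre-processing steps listed above can be sketched as follows. This is only an illustrative reading of the slide, not the authors' code: the function names `normalize` and `preprocess` are hypothetical, and the exact rules for what counts as a symbol are an assumption.

```python
import re
import unicodedata
from urllib.parse import unquote_plus


def normalize(text):
    """URL-decode, lowercase, strip accents/symbols/punctuation, tokenize."""
    text = unquote_plus(text).lower()
    # Strip accents: decompose characters, then drop the combining marks.
    text = "".join(ch for ch in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(ch))
    # Assumption: anything that is not a letter, digit or space is a symbol.
    return re.sub(r"[^\w\s]|_", " ", text).split()


def preprocess(tags, title=""):
    """Return the term set of one item, built from its tags and title."""
    terms = set()
    for tag in tags:
        tokens = normalize(tag)
        terms.add(" ".join(tokens))  # the (possibly multi-word) tag itself
        terms.update(tokens)         # plus its individual terms
    terms.update(normalize(title))
    # Discard numeric terms and terms shorter than three characters.
    return {t for t in terms
            if len(t) >= 3 and not t.replace(" ", "").isdigit()}
```

For example, `preprocess(["Eiffel+Tower", "caf%C3%A9", "2016", "NY"], "Paris, at night!")` keeps the multi-word tag `eiffel tower` alongside its parts and drops `2016`, `ny` and `at`.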
Language Model (LM)
• LM-based estimation
– The Most Likely Cell (mlc) is the cell with the highest probability; it is used to produce the location estimate
$mlc_j = \arg\max_i \sum_{k=1}^{T_j} p(t_k \mid c_i) \cdot w(t_k)$
Inspired by [3]: (Popescu, MediaEval 2013)
• LM generation scheme
– divide the earth's surface into rectangular cells with a side length of 0.01°
– calculate term-cell probabilities
$p(t \mid c) = N_u / N_t$
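A minimal sketch of the two steps on this slide, building the language model offline and picking the most likely cell online. The slide does not define N_u and N_t; the code assumes N_u is the number of distinct users who used term t inside cell c and N_t is the number of distinct users who used t anywhere. The function names and the input tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, side=0.01):
    """Integer index of the side x side degree cell containing a point."""
    return (math.floor(lat / side), math.floor(lon / side))


def build_language_model(items, side=0.01):
    """items: iterable of (user_id, lat, lon, terms). Returns p[term][cell]."""
    users_tc = defaultdict(lambda: defaultdict(set))  # term -> cell -> users
    users_t = defaultdict(set)                        # term -> users
    for user, lat, lon, terms in items:
        cell = cell_of(lat, lon, side)
        for t in terms:
            users_tc[t][cell].add(user)
            users_t[t].add(user)
    # Assumed reading of p(t|c) = N_u / N_t (see lead-in).
    return {t: {c: len(u) / len(users_t[t]) for c, u in cells.items()}
            for t, cells in users_tc.items()}


def most_likely_cell(terms, lm, weight=lambda t: 1.0):
    """argmax over cells of sum_k p(t_k|c) * w(t_k); None if no term is known."""
    scores = defaultdict(float)
    for t in terms:
        for cell, p in lm.get(t, {}).items():
            scores[cell] += p * weight(t)
    return max(scores, key=scores.get) if scores else None
```

The `weight` argument stands in for the term weights w(t); it defaults to 1.0 here so the sketch runs without a weighting scheme.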
Feature Selection and Weighting
Feature Weighting
• Locality weight function, based on the term's relative position in T
• Spatial Entropy weight function, a Gaussian function of the term's spatial entropy
• Linear combination of the two weights
Feature Selection
• Calculate term locality using a grid of 0.01°×0.01°
• When a user uses a given term, they are assigned to the entire cell neighborhood instead of a unique cell as in [1]
$l(t) = N_t \cdot \frac{\sum_{c \in C} \sum_{u \in U_{t,c}} |\{u' \mid u' \in U_{t,c},\ u' \neq u\}|}{N_t^2}$
• Terms with non-zero locality score form the term set $T$
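The locality score can be sketched for a single term as below. Several points are assumptions rather than statements from the slide: N_t is taken to be the number of distinct users of the term, U_{t,c} the set of users assigned to cell c, and the "cell neighborhood" the 3×3 block of cells around the user's own cell. The function name `locality` is hypothetical.

```python
import math
from collections import defaultdict


def locality(user_positions, side=0.01, neighborhood=True):
    """user_positions: iterable of (user_id, lat, lon) for ONE term."""
    users_in_cell = defaultdict(set)  # cell -> users assigned to it (U_{t,c})
    all_users = set()
    # Assumption: the neighborhood is the 3x3 block around the user's cell.
    offsets = (-1, 0, 1) if neighborhood else (0,)
    for user, lat, lon in user_positions:
        ci, cj = math.floor(lat / side), math.floor(lon / side)
        for di in offsets:
            for dj in offsets:
                users_in_cell[(ci + di, cj + dj)].add(user)
        all_users.add(user)
    n_t = len(all_users)
    if n_t == 0:
        return 0.0
    # For every user u in a cell, count the other users of that cell:
    # a cell with n users contributes n * (n - 1).
    pairs = sum(len(u) * (len(u) - 1) for u in users_in_cell.values())
    return n_t * pairs / n_t ** 2
```

A term used by a single user, or by users who never share a cell, scores zero and would be dropped from T. With the neighborhood assignment, two users in adjacent cells still contribute to the score.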
Refinements
• Multiple Grids
– build an additional LM using a finer grid (cell side length of 0.001°)
– combine the MLCs of the individual language models
• Similarity search [5] (Van Laere et al., ICMR 2011)
– determine the $k_t$ most similar training images in the MLC
– their center of gravity is the final location estimate
From [2]: (Kordopatis-Zilos et al., PAISI 2015)
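The similarity-search refinement can be illustrated with the sketch below. The slide does not say how textual similarity is measured; Jaccard similarity over term sets is used here as an assumption, the default `k_t=5` is an arbitrary placeholder, and the center of gravity is approximated by the arithmetic mean of coordinates, which is reasonable inside a cell this small.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refine_in_cell(query_terms, cell_items, k_t=5):
    """cell_items: list of (lat, lon, terms) training items inside the MLC.

    Returns the mean position of the k_t most similar items, or None.
    """
    ranked = sorted(cell_items,
                    key=lambda item: jaccard(query_terms, item[2]),
                    reverse=True)
    top = ranked[:k_t]
    if not top:
        return None
    lat = sum(item[0] for item in top) / len(top)
    lon = sum(item[1] for item in top) / len(top)
    return lat, lon
```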
Visual-based location estimation
• Main Objectives
• Ensure that the visual features are generic and transferable
• Provide a compact representation of the features
• Model building
• CNN features extracted by fine-tuning the VGG model [4]
• Training data: ~5K Points Of Interest (POIs) and over 7M Flickr images, collected using queries with:
– the POI name and a radius of 5km around its coordinates
– the POI name and the associated city name
• Outputs of the fc7 layer (4096-d) are compressed to 128-d using PCA, learned on a subset of 250,000 training images
• Similarity Search based on the PCA-reduced CNN features
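The PCA compression and similarity search can be sketched with NumPy as follows. This assumes the fc7 descriptors are already available as a matrix with one row per image; the L2 normalisation and the use of cosine similarity are assumptions, since the slide does not name the similarity measure, and the function names are hypothetical.

```python
import numpy as np


def fit_pca(features, n_components=128):
    """Learn a PCA projection from an (n_images, n_dims) feature matrix."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(features, mean, components):
    """Reduce features and L2-normalise them (normalisation is an assumption)."""
    reduced = (features - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.maximum(norms, 1e-12)


def most_similar(query, index, k=20):
    """Indices of the k index rows with the highest cosine similarity."""
    sims = index @ query  # dot product equals cosine for unit vectors
    return np.argsort(-sims)[:k]
```

In the setting described on the slide, `fit_pca` would be called once on the 250,000-image subset and `project` applied to every training and test descriptor; an exhaustive dot product is used here only for brevity.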
Visual-based location estimation
Location Estimation
• Geospatial clustering of 𝑘 𝑣 = 20 visually most similar images
• The largest cluster (or the first in case of equal size) is selected and
its centroid is used as the location estimate
Visual Confidence
• Confidence metric for the visual estimation is based on the size of
the largest cluster
$conf_v(i) = \max\left(\frac{n_i - n_t}{k_v - n_t},\ 0\right)$
$n_i$: number of neighbors in the largest cluster of image $i$
$n_t$: configuration parameter controlling the "strictness" of the confidence score
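A sketch of the visual estimate and its confidence score. The slide does not specify the geospatial clustering scheme, so grouping neighbours by 0.01° grid cell is used purely as a stand-in assumption; the default value of `n_t` is likewise a placeholder and the function name is hypothetical.

```python
import math
from collections import defaultdict


def visual_estimate(neighbors, n_t=5, side=0.01):
    """neighbors: (lat, lon) of the k_v most similar images, best match first.

    Returns ((lat, lon), confidence).
    """
    k_v = len(neighbors)
    if k_v <= n_t:
        raise ValueError("k_v must be larger than n_t")
    # Assumption: a cluster is the set of neighbours falling in one grid cell.
    clusters = defaultdict(list)  # insertion order follows similarity rank
    for lat, lon in neighbors:
        key = (math.floor(lat / side), math.floor(lon / side))
        clusters[key].append((lat, lon))
    # max() returns the first maximal element, so ties go to the first cluster.
    largest = max(clusters.values(), key=len)
    lat = sum(p[0] for p in largest) / len(largest)
    lon = sum(p[1] for p in largest) / len(largest)
    conf = max((len(largest) - n_t) / (k_v - n_t), 0.0)
    return (lat, lon), conf
```

With k_v = 20 and n_t = 5, a largest cluster of 12 neighbours gives a confidence of 7/15, while any cluster of five or fewer gives zero.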
Hybrid-based location estimation
• A set of rules determines whether the text or the visual approach provides the final estimate
• The visual estimate is chosen when:
→ no estimate could be produced by the text approach
→ the visual estimate falls inside the borders of the mlc
→ a comparison of the confidence scores $conf_v$ and $conf_t$ [1] favours it
• Otherwise the text estimate is selected
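The selection rules can be written as a short decision function. The slide does not state how the two confidence scores are compared, so the test `conf_v > conf_t` below is an assumption, as are the function name and the representation of the mlc as a bounding box.

```python
def hybrid_estimate(text_est, visual_est, mlc_bounds, conf_t, conf_v):
    """Choose between the text and visual estimates.

    text_est, visual_est: (lat, lon) or None.
    mlc_bounds: (lat_min, lat_max, lon_min, lon_max) of the mlc, or None.
    """
    if text_est is None:          # rule 1: the text approach produced nothing
        return visual_est
    if visual_est is None:
        return text_est
    if mlc_bounds is not None:    # rule 2: visual estimate lies inside the mlc
        lat_min, lat_max, lon_min, lon_max = mlc_bounds
        lat, lon = visual_est
        if lat_min <= lat < lat_max and lon_min <= lon < lon_max:
            return visual_est
    if conf_v > conf_t:           # rule 3: assumed form of the comparison
        return visual_est
    return text_est               # otherwise keep the text estimate
```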
Runs and Results
RUN-1: Tag-based location estimation + released training set
RUN-2: Visual-based location estimation + released training set
RUN-3: Hybrid location estimation + released training set
RUN-4: Hybrid location estimation + YFCC dataset
RUN-5: Hybrid location estimation + YFCC + External data
RUN-E: Visual-based location estimation + entire YFCC dataset
[Figure: results for images]
Runs and Results
RUN-1: Tag-based location estimation + released training set
RUN-2: Visual-based location estimation + released training set
RUN-3: Hybrid location estimation + released training set
RUN-4: Hybrid location estimation + YFCC dataset
RUN-5: Hybrid location estimation + YFCC + External data
[Figure: results for videos]
References
[1] G. Kordopatis-Zilos, A. Popescu, S. Papadopoulos, and Y. Kompatsiaris. Socialsensor at MediaEval Placing Task 2015. In MediaEval 2015 Placing Task, 2015.
[2] G. Kordopatis-Zilos, S. Papadopoulos, and Y. Kompatsiaris. Geotagging social media content with a refined language modelling approach. In Intelligence and Security Informatics, pages 21–40, 2015.
[3] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task. In MediaEval 2013 Placing Task, 2013.
[4] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[5] O. Van Laere, S. Schockaert, and B. Dhoedt. Finding locations of Flickr resources using language models and similarity search. In ICMR '11, pages 48:1–48:8, New York, NY, USA, 2011. ACM.
Thank you!
Data/Code:
– https://github.com/MKLab-ITI/multimedia-geotagging/
Get in touch:
– Giorgos Kordopatis-Zilos: georgekordopatis@iti.gr
– Symeon Papadopoulos: papadop@iti.gr / @sympap
