Dynamic Two-Stage Image Retrieval from
     Large Multimodal Databases

                      Avi Arampatzis
                    Konstantinos Zagoris
                   Savvas Chatzichristofis


    Department of Electrical and Computer Engineering
            Democritus University of Thrace
                  Xanthi 67100, Greece
                         ECIR 2011


                                                         Outline

1. Content-based Image Retrieval (CBIR): global vs local features, scalability

2. A problem of global features: The Red Tomato / Red Pie-Chart Problem

3. Multimodal Information Collections

4. Two-stage Image Retrieval from Multimedia Databases
     related work
     our main contributions

5. Experiments on the ImageCLEF 2010 Wikipedia Test Collection


                        Content-based Image Retrieval (CBIR)

 In CBIR, images are represented by either
     local features:
      - computed at multiple image points; capable of recognizing objects
      - computationally expensive, e.g. due to high dimensionality
     global features:
      - describe each image with a single vector capturing color, texture, shape, etc.
      - computationally cheap (relatively)
        - thus, more popular

 CBIR with either global or local features
     does not scale up well to large databases efficiency-wise


                                     CBIR with Global Features

 notoriously noisy for image queries of low generality
  (generality = the fraction of relevant images in a collection)
     In text retrieval: documents matching no query keywords are not retrieved
     In image retrieval: CBIR will rank the whole collection
      - The Red Tomato / Red Pie-Chart Problem [next]
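
To make the generality definition concrete, here is a small Python sketch. The helper name `generality` and all the counts are made-up values for illustration; none of them come from the talk.

```python
def generality(num_relevant: int, num_items: int) -> float:
    """Fraction of relevant items among the items being ranked."""
    if num_items <= 0:
        raise ValueError("num_items must be positive")
    return num_relevant / num_items


# Hypothetical query: 20 relevant images hidden in a 200,000-image collection.
# CBIR ranks all 200,000 items, so this is the generality it has to cope with.
whole_collection = generality(20, 200_000)

# Hypothetical smaller candidate set: 500 items, 15 of them relevant.
# The same query now faces a much less noisy ranking problem.
candidate_set = generality(15, 500)

print(f"whole collection: {whole_collection:.4%}, candidate set: {candidate_set:.2%}")
```

The lower the generality, the more room there is for visually similar but irrelevant items to crowd the early ranks.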


                    The Red Tomato / Red Pie-Chart Problem




  [Figure: the image query (left) and items from the collection (right)]



 If the query has low generality,
     early ranks may be dominated by spurious results (such as the pie-chart),
     possibly ranked even before red tomatoes on non-white backgrounds


                         Contemporary Information Collections

 Large

 Multimodal (= providing different ways of retrieval)


A good example is Wikipedia:

 Large:
     3,605,426 articles (English version alone), “millions of images”, etc.

 Multimodal:
     Topics are covered in several languages.
     Topics may include non-textual media (image, sound, video, etc.)
     annotated in a variety of metadata fields (caption, description, etc.)
    Thus, there are several ways to get to the same info/topic.


                  Image Retrieval from Multimodal Databases
In image retrieval systems, users are assumed to target visual similarity. Thus,

 Image is the primary medium.
     image modalities are more important than others

 Modalities from other media are secondary.
     but they can still help with the problems of CBIR:
      - low generality (Red Tomato/Pie-Chart) problem (esp. for global features)
      - speed

Traditionally, fusion has been used, but:

 weighting of media is not trivial; usually requires training data

 not theoretically sound: influence of textual scores may worsen the visual quality
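
As an illustration of why the media weight matters, the following Python sketch uses a weighted linear combination of per-item scores. This is one common form of fusion, chosen here as an assumption; the slides do not say which fusion scheme is meant. The function name `fuse_scores`, the file names, and all scores are hypothetical, and scores are assumed to be already normalized to [0, 1].

```python
from typing import Dict, List


def fuse_scores(text_scores: Dict[str, float],
                image_scores: Dict[str, float],
                text_weight: float) -> List[str]:
    """Rank items by text_weight * text + (1 - text_weight) * image, best first."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    items = set(text_scores) | set(image_scores)
    fused = {
        item: text_weight * text_scores.get(item, 0.0)
        + (1.0 - text_weight) * image_scores.get(item, 0.0)
        for item in items
    }
    # Sort by descending fused score; ties are broken by item name.
    return sorted(fused, key=lambda item: (-fused[item], item))


# Hypothetical normalized scores for a "red tomato" query.
text_scores = {"tomato.jpg": 0.9, "salad.jpg": 0.6, "pie_chart.png": 0.0}
image_scores = {"tomato.jpg": 0.7, "salad.jpg": 0.3, "pie_chart.png": 0.95}

mostly_visual = fuse_scores(text_scores, image_scores, text_weight=0.2)
mostly_textual = fuse_scores(text_scores, image_scores, text_weight=0.8)
print(mostly_visual)
print(mostly_textual)
```

With these toy numbers the top-ranked item changes with the weight, which is the sensitivity that usually has to be tuned on training data.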


       Two-stage Image Retrieval from Multimedia Databases
Key idea for improving CBIR effectiveness:
Before CBIR, raise query generality by reducing collection size via filtering.
Assumptions:

 Query is expressed in the primary medium, i.e. image,

 accompanied by a query in a secondary medium, e.g. text.

Two-stage retrieval:

 1. Rank the collection by the secondary medium/query, e.g. text.

 2. Draw a rank threshold K.

 3. Re-rank only the top-K items with CBIR.

Using a ‘cheaper’ secondary medium, we also improve speed.
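
The procedure on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `two_stage_retrieval`, the file names, and all scores are hypothetical, and items ranked below the threshold are simply dropped here because the slides do not say what happens to them.

```python
from typing import Callable, Dict, List


def two_stage_retrieval(text_scores: Dict[str, float],
                        visual_score: Callable[[str], float],
                        k: int) -> List[str]:
    """Rank by text, keep the top-k items, then re-rank those by visual score."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Stage 1: rank by the secondary medium (text), best first.
    text_ranking = sorted(text_scores, key=lambda item: (-text_scores[item], item))
    # Rank threshold: only the top-k items move on to the costly stage.
    candidates = text_ranking[:k]
    # Stage 2: re-rank the candidates by visual similarity to the image query.
    return sorted(candidates, key=lambda item: (-visual_score(item), item))


# Toy data: text scores exist only for items matching the text query.
text_scores = {"tomato_1.jpg": 3.2, "tomato_2.jpg": 2.9,
               "tomato_soup.jpg": 2.5, "ketchup.jpg": 0.4}
visual_similarity = {"tomato_1.jpg": 0.6, "tomato_2.jpg": 0.9,
                     "tomato_soup.jpg": 0.2, "ketchup.jpg": 0.8,
                     "pie_chart.png": 0.95}
scored_visually: List[str] = []


def visual_score(item: str) -> float:
    scored_visually.append(item)  # record which items reach the costly stage
    return visual_similarity[item]


results = two_stage_retrieval(text_scores, visual_score, k=3)
print(results)
print(len(scored_visually), "of", len(visual_similarity), "items scored visually")
```

In this toy run the pie chart never reaches the visual stage, even though it has the highest visual similarity, and only three items incur the visual scoring cost.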


                                         Previous Related Work

Best-results re-ranking by visual content has been seen before, but

 for other purposes, e.g. result clustering, diversity, . . .

 using external info (e.g. sets of images), or training data, . . .

 They all used
     global features
     a static predefined threshold for all queries

Effectiveness results have been mixed,
and some studies did not provide a comparative evaluation.


                                            Main Contributions

In view of the related literature, our main contributions are:

 dynamic thresholds per query
     no external information
     no training data

 extensive evaluation of thresholding types and levels


              The ImageCLEF 2010 Wikipedia Test Collection

 237,434 items
     Primary medium: image
      - Heterogeneous: color natural images, graphics, greyscale images, etc.
      - variety of image sizes
      - It's a large benchmark image database by today's standards.
     + noisy and incomplete user-supplied annotations
     + Wikipedia articles containing the images
     Annotations and articles come in 3 languages: En, Fr, De.

 70 test topics
     Visual part:
      - 1 or more example images
     Textual part:
      - 3 title fields, one per language (En, Fr, De)
     Topics are assessed by visual similarity.


                                         Indexing and Retrieval

 text: only English annotations + English query
     Lemur Toolkit V4.11 / Indri V2.11
      - tf.idf retrieval model (works better with our thresholding methods)
      - default settings + Krovetz stemmer

 image:
     Joint Composite Descriptor (JCD)
      - captures color and texture information
      - developed for color natural images
     Spatial Color Distribution (SpCD)
      - captures color and its spatial distribution
      - more suitable for colored graphics (fewer colors, less texture)
     JCD and SpCD are found to be more effective than MPEG-7 descriptors.


                                   Thresholding and Re-ranking

 Static Thresholding: a fixed pre-selected rank threshold K for all topics
     K = 25, 50, 100, 250, 500, 1000.

 Dynamic Thresholding: a variable rank threshold per topic
     Score-Distributional Threshold Optimization (SDTO)
      [Arampatzis et al., "Where to Stop Reading a Ranked List?", SIGIR 2009]
     Two types:
      - Threshold on Precision: g = 0.990, 0.950, 0.800, 0.500, 0.330, 0.100
      - Threshold on prel (probability of relevance): θ = 0.990, 0.950, 0.800, 0.500, 0.330, 0.100
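
The two dynamic threshold types can be illustrated with a simplified Python sketch. It assumes that prel denotes an estimated probability of relevance and that such an estimate is already available for every rank of the text ranking; how SDTO derives these estimates from score distributions is not shown on the slides and is not reproduced here. The function names and the probability lists are hypothetical.

```python
from typing import Sequence


def cutoff_on_prel(prel_by_rank: Sequence[float], theta: float) -> int:
    """Number of leading ranks whose estimated probability of relevance is >= theta."""
    k = 0
    for p in prel_by_rank:
        if p < theta:
            break
        k += 1
    return k


def cutoff_on_precision(prel_by_rank: Sequence[float], g: float) -> int:
    """Largest K whose expected precision (mean prel over the top K) is >= g."""
    best_k = 0
    expected_relevant = 0.0
    for rank, p in enumerate(prel_by_rank, start=1):
        expected_relevant += p
        if expected_relevant / rank >= g:
            best_k = rank
    return best_k


# Hypothetical per-rank estimates for two topics; prel decreases down the ranking.
topic_a = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20, 0.10]
topic_b = [0.70, 0.30, 0.10]

for name, prel in (("topic_a", topic_a), ("topic_b", topic_b)):
    print(name,
          "K on prel:", cutoff_on_prel(prel, theta=0.5),
          "K on precision:", cutoff_on_precision(prel, g=0.8))
```

Unlike a static K, both cutoffs adapt to the topic: a ranking with many confident matches yields a larger K than one whose estimates drop off quickly.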


                        Initial Experiments: Setting a Baseline
                 item scoring by                  MAP      P@10     P@20     bpref
                 JCD_1                            .0058    .0486    .0479    .0352
                 max_i JCD_i                      .0072    .0614    .0614    .0387
                 max_i JCD_i + max_i SpCD_i       .0112    .0871    .0886    .0415
                 tf.idf (text-only)               .1293    .3614    .3314    .1806


 Image-only runs provide very weak baselines.

 We chose the text-only run as a baseline for the statistical significance testing.

 This makes sense also from an efficiency point of view:

     if using a secondary text modality for image retrieval is more effective than
      current CBIR methods, then there is no reason at all for using
      computationally costly CBIR methods.
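
One way to read the image-only scoring rules in the table above is sketched below in Python. The interpretation is an assumption: JCD_i and SpCD_i are taken to be the similarity between a collection item and the i-th example image of the topic under the respective descriptor, so that max_i picks the best-matching example. The function name and the similarity values are hypothetical.

```python
from typing import Dict, Sequence


def score_item(similarities: Dict[str, Sequence[float]]) -> float:
    """Sum, over descriptors, of the best similarity to any query example image."""
    return sum(max(per_example) for per_example in similarities.values())


# Hypothetical similarities of one collection item to two example images.
item_similarities = {
    "JCD": [0.2, 0.7],   # similarity to example image 1 and to example image 2
    "SpCD": [0.5, 0.1],
}

jcd_first_example_only = item_similarities["JCD"][0]            # "JCD_1"
jcd_best_example = score_item({"JCD": item_similarities["JCD"]})  # "max_i JCD_i"
combined = score_item(item_similarities)                          # both descriptors
print(jcd_first_example_only, jcd_best_example, combined)
```

Under this reading, using more example images and more descriptors can only raise an item's score, which matches the ordering of the three image-only rows.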


                                                         Results

                       JCD_1                                   max_i JCD_i + max_i SpCD_i
 threshold    avg. K   MAP       P@10      P@20      bpref     MAP       P@10      P@20      bpref
 text-only    n/a      .1293     .3614     .3314     .1806     .1293     .3614     .3314     .1806

 K = 25       25       .1162™    .3957 -   .3457˜    .1641™    .1168 -   .3943 -   .3436˜    .1659™
 K = 50       50       .1144™    .3829 -   .3579˜    .1608™    .1154 -   .3986 -   .3557 -   .1648™
 K = 100      100      .1138 -   .3786 -   .3471 -   .1609™    .1133 -   .3900 -   .3486 -   .1623 -
 K = 250      250      .1081™    .3414 -   .3164 -   .1644 -   .1092™    .3771 -   .3564 -   .1664 -
 K = 500      500      .0968™    .3200 -   .3007 -   .1575™    .0999™    .3557 -   .3250 -   .1590™
 K = 1000     1000     .0865™    .2871™    .2729™    .1493™    .0909™    .3329 -   .3064 -   .1511™

 g = .9900    49       .1364 -   .4214˜    .3550 -   .1902˜    .1385˜    .4371˜œ   .3743˜    .1921˜
 g = .9500    68       .1352 -   .4171˜    .3586 -   .1912˜    .1386˜    .4500˜œ   .3836˜    .1932˜
 g = .8000    95       .1318 -   .4000 -   .3536 -   .1892 -   .1365 -   .4443˜œ   .3871˜    .1924 -
 g = .5000    151      .1196 -   .3814 -   .3393 -   .1808 -   .1226 -   .4043 -   .3550 -   .1813 -
 g = .3333    237      .1085™    .3500 -   .3000 -   .1707 -   .1121™    .3857 -   .3364 -   .1734 -
 g = .1000    711      .0864™    .2871™    .2621™    .1461™    .0909™    .3357 -   .2964 -   .1487™

 θ = .9900    42       .1342 -   .4043 -   .3414 -   .1865 -   .1375˜    .4371˜œ   .3700˜    .1897˜
 θ = .9500    51       .1371 -   .4214˜    .3586 -   .1903˜    .1417˜œ   .4500˜œ   .3864˜œ   .1924˜œ
 θ = .8000    81       .1384˜    .4229˜    .3614 -   .1921˜    .1427˜œ   .4629˜œ   .3871˜œ   .1961˜œ
 θ = .5000    91       .1367 -   .4057 -   .3571 -   .1919˜    .1397˜    .4400˜œ   .3829˜    .1937˜
 θ = .3333    109      .1375 -   .4129 -   .3636 -   .1933˜    .1404˜    .4500˜œ   .3907˜œ   .1949˜œ
 θ = .1000    130      .1314 -   .4100 -   .3629 -   .1866 -   .1370 -   .4371˜    .3843˜    .1922˜

 image-only   n/a      .0058™    .0486™    .0479™    .0352™    .0112™    .0871™    .0886™    .0415™


                                                         Results

 Static thresholding improves initial Precision at the cost of MAP and bpref.

 Dynamic thresholding on Precision or prel does not have this drawback.

 The level of static and precision thresholds greatly influences effectiveness;
  unsuitable choices (e.g. too loose) lead to degraded performance.

 Prel thresholds are much more robust in this respect.

 Better CBIR at the second stage leads to overall improvements, but the
  thresholding type seems more important: while the two CBIR methods vary
  greatly in performance (the better one is almost twice as effective as the
  other), static thresholding is not influenced much by this choice. Dynamic
  methods benefit more from improved CBIR.
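
For readers less familiar with the measures contrasted above, here is a short Python sketch of precision at a rank cutoff and of average precision (MAP is the mean of average precision over all topics). The item names and relevance judgments are invented, and bpref is not covered.

```python
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranking[:k] if item in relevant) / k


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each relevant item's rank, over all relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


relevant = {"a", "b", "c", "d"}
short_clean = ["a", "b"]                       # precise, but misses "c" and "d"
long_noisy = ["a", "x", "b", "c", "y", "d"]    # noisier early on, finds everything

for name, ranking in (("short_clean", short_clean), ("long_noisy", long_noisy)):
    print(name, precision_at_k(ranking, relevant, 2), average_precision(ranking, relevant))
```

The toy rankings show that the two measures can disagree: a list can win on early precision and still lose on average precision because relevant items never retrieved count against it.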


                                                    Conclusions

 Two-stage retrieval with dynamic thresholding:
     more effective and robust than static thresholding
     practically insensitive to a wide range of choices for the optimization measure
     significantly beats the text-only and several image-only baselines

 Two-stage retrieval, irrespective of thresholding type:
     efficiency benefit:
      - it greatly cuts down on expensive image operations
      - on average, only 2 to 5 in 10,000 images had to be scored
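
The "2 to 5 in 10,000" figure can be checked with a few lines of Python. The sketch assumes that the second numeric column of the results table is the average threshold K per topic, and it uses the smallest and largest values listed there for the prel thresholds (42 and 130) together with the collection size of 237,434 items. The helper name is hypothetical.

```python
COLLECTION_SIZE = 237_434  # items in the ImageCLEF 2010 Wikipedia collection


def scored_per_10k(avg_k: float, collection_size: int = COLLECTION_SIZE) -> float:
    """Images scored by CBIR per 10,000 images in the collection."""
    return 10_000 * avg_k / collection_size


for avg_k in (42, 130):
    print(f"average K = {avg_k}: {scored_per_10k(avg_k):.2f} per 10,000 images")
```

Both values fall roughly in the range quoted on the slide.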


                                               Further Research

 Compare two-stage retrieval to fusion. [Did this; see the ECIR 2011 poster]
     Both methods work better than text-only and image-only;
      two-stage is slightly better than fusion, but the difference is non-significant.
     Both methods are robust in different ways:
      - Fusion provides less variability across topics but it is sensitive to the
        weighting parameter of the contributing media.
      - Two-stage provides a much lower sensitivity to its thresholding parameter
        but has a higher variability across topics.

 Try two-stage retrieval with other types of image descriptors,
  e.g. the visual codebook approach (TOP-SURF).

 Generalization to multi-stage retrieval, where rankings for the media are
  successively thresholded and re-ranked with respect to a media hierarchy.




                                                  avi@ee.duth.gr



                                                       Thank you!
http://www.mmretrieval.net

More Related Content

PPTX
MultiModal Retrieval Image
PPTX
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
PPTX
Developing Document Image Retrieval System
PPTX
Automatic Image Annotation
PPTX
Scene Text Detection on Images using Cellular Automata
PPT
Color reduction using the combination of the kohonen self organized feature m...
PDF
Test PDF
PDF
IRJET - Object Detection using Deep Learning with OpenCV and Python
MultiModal Retrieval Image
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
Developing Document Image Retrieval System
Automatic Image Annotation
Scene Text Detection on Images using Cellular Automata
Color reduction using the combination of the kohonen self organized feature m...
Test PDF
IRJET - Object Detection using Deep Learning with OpenCV and Python

What's hot (20)

PDF
An adaptive-model-for-blind-image-restoration-using-bayesian-approach
PDF
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
PDF
Btv thesis defense_v1.02-final
PDF
TMS workshop on machine learning in materials science: Intro to deep learning...
PDF
Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM
PDF
Artem Baklanov - Votes Aggregation Techniques in Geo-Wiki Crowdsourcing Game:...
PDF
Optimized Neural Network for Classification of Multispectral Images
PDF
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
PDF
Secure Multi-Party Computation Based Privacy Preserving Extreme Learning Mach...
PDF
Volume 2-issue-6-1930-1932
PDF
A0360109
PDF
184816386 x mining
PDF
Bhadale group of companies ai neural networks and algorithms catalogue
PDF
Indoor Point Cloud Processing
PDF
Fractional step discriminant pruning
PPTX
Decision Transformer: Reinforcement Learning via Sequence Modeling
PDF
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
PPT
Tools for Image Retrieval in Large Multimedia Databases
PDF
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
PPT
Radial Thickness Calculation and Visualization for Volumetric Layers-8397
An adaptive-model-for-blind-image-restoration-using-bayesian-approach
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Btv thesis defense_v1.02-final
TMS workshop on machine learning in materials science: Intro to deep learning...
Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM
Artem Baklanov - Votes Aggregation Techniques in Geo-Wiki Crowdsourcing Game:...
Optimized Neural Network for Classification of Multispectral Images
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
Secure Multi-Party Computation Based Privacy Preserving Extreme Learning Mach...
Volume 2-issue-6-1930-1932
A0360109
184816386 x mining
Bhadale group of companies ai neural networks and algorithms catalogue
Indoor Point Cloud Processing
Fractional step discriminant pruning
Decision Transformer: Reinforcement Learning via Sequence Modeling
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Tools for Image Retrieval in Large Multimedia Databases
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
Radial Thickness Calculation and Visualization for Volumetric Layers-8397
Ad

Viewers also liked (18)

PPTX
Comparative Performance Evaluation of Image Descriptors Over IEEE 802.11b Noi...
PPTX
Content and Metadata Based Image Document Retrieval (in Greek)
PPTX
Handwritten and Machine Printed Text Separation in Document Images using the ...
PPTX
Svm based cbir of breast masses on mammograms
PPTX
Text extraction using document structure features and support vector machines
PPTX
Query expansion based on visual content new
PDF
A Review of Feature Extraction Techniques for CBIR based on SVM
DOC
PPTX
Segmentation - based Historical Handwritten Word Spotting using document-spec...
PPTX
Cbir final ppt
PDF
Literature Review on Content Based Image Retrieval
PPTX
Content based image retrieval using clustering Algorithm(CBIR)
PPTX
Content Based Image Retrieval
PDF
PDF
Content based image retrieval (cbir) using
PDF
Content Based Image Retrieval
PDF
Faro Visual Attention For Implicit Relevance Feedback In A Content Based Imag...
PPT
Content based image retrieval(cbir)
Comparative Performance Evaluation of Image Descriptors Over IEEE 802.11b Noi...
Content and Metadata Based Image Document Retrieval (in Greek)
Handwritten and Machine Printed Text Separation in Document Images using the ...
Svm based cbir of breast masses on mammograms
Text extraction using document structure features and support vector machines
Query expansion based on visual content new
A Review of Feature Extraction Techniques for CBIR based on SVM
Segmentation - based Historical Handwritten Word Spotting using document-spec...
Cbir final ppt
Literature Review on Content Based Image Retrieval
Content based image retrieval using clustering Algorithm(CBIR)
Content Based Image Retrieval
Content based image retrieval (cbir) using
Content Based Image Retrieval
Faro Visual Attention For Implicit Relevance Feedback In A Content Based Imag...
Content based image retrieval(cbir)
Ad

Similar to Dynamic Two-Stage Image Retrieval from Large Multimodal Databases (20)

PDF
Mri brain image retrieval using multi support vector machine classifier
PDF
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
PDF
Ijaems apr-2016-16 Active Learning Method for Interactive Image Retrieval
PDF
Content-Based Image Retrieval by Multi-Featrus Extraction and K-Means Clustering
PDF
A Survey on Techniques Used for Content Based Image Retrieval
PDF
Global Descriptor Attributes Based Content Based Image Retrieval of Query Images
PDF
Efficient CBIR Using Color Histogram Processing
PDF
A Comparative Study of Content Based Image Retrieval Trends and Approaches
PDF
Volume 2-issue-6-2077-2080
PDF
Volume 2-issue-6-2077-2080
PDF
An Unsupervised Cluster-based Image Retrieval Algorithm using Relevance Feedback
PDF
Dc31472476
PDF
A Survey on Image retrieval techniques with feature extraction
PDF
Et35839844
PDF
Ijcet 06 10_004
PDF
Relevance feedback a novel method to associate user subjectivity to image
PDF
Feature extraction techniques on cbir a review
PDF
Survey on Multiple Query Content Based Image Retrieval Systems
Mri brain image retrieval using multi support vector machine classifier
Multimedia Databases: Performance Measure Benchmarking Model (PMBM) Framework
Ijaems apr-2016-16 Active Learning Method for Interactive Image Retrieval
Content-Based Image Retrieval by Multi-Featrus Extraction and K-Means Clustering
A Survey on Techniques Used for Content Based Image Retrieval
Global Descriptor Attributes Based Content Based Image Retrieval of Query Images
Efficient CBIR Using Color Histogram Processing
A Comparative Study of Content Based Image Retrieval Trends and Approaches
Volume 2-issue-6-2077-2080
Volume 2-issue-6-2077-2080
An Unsupervised Cluster-based Image Retrieval Algorithm using Relevance Feedback
Dc31472476
A Survey on Image retrieval techniques with feature extraction
Et35839844
Ijcet 06 10_004
Relevance feedback a novel method to associate user subjectivity to image
Feature extraction techniques on cbir a review
Survey on Multiple Query Content Based Image Retrieval Systems

Recently uploaded (20)

PDF
Empathic Computing: Creating Shared Understanding
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Approach and Philosophy of On baking technology
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Electronic commerce courselecture one. Pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Encapsulation theory and applications.pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
A Presentation on Artificial Intelligence
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Review of recent advances in non-invasive hemoglobin estimation
Empathic Computing: Creating Shared Understanding
Chapter 3 Spatial Domain Image Processing.pdf
Network Security Unit 5.pdf for BCA BBA.
Approach and Philosophy of On baking technology
Dropbox Q2 2025 Financial Results & Investor Presentation
MYSQL Presentation for SQL database connectivity
Electronic commerce courselecture one. Pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation theory and applications.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Digital-Transformation-Roadmap-for-Companies.pptx
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Unlocking AI with Model Context Protocol (MCP)
Encapsulation_ Review paper, used for researhc scholars
A Presentation on Artificial Intelligence
The Rise and Fall of 3GPP – Time for a Sabbatical?
Review of recent advances in non-invasive hemoglobin estimation

Dynamic Two-Stage Image Retrieval from Large Multimodal Databases

  • 1. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis Konstantinos Zagoris Savvas Chatzichristofis Department of Electrical and Computer Engineering Democritus University of Thrace Xanthi 67100, Greece ECIR 2011
  • 2. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 2 Outline 1. Content-based Image Retrieval (CBIR): global vs local features, scalability 2. A problem of global features: The Red Tomato / Red Pie-Chart Problem 3. Multimodal Information Collections 4. Two-stage Image Retrieval from Multimedia Databases related work our main contributions 5. Experiments on the ImageCLEF 2010 Wikipedia Test Collection
  • 3. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 3 Content-based Image Retrieval (CBIR) In CBIR, images are represented by either local features: ¦ computed at multiple image points; capable of recognizing objects ¦ computationally expensive, e.g. due to high dimensionality global features: ¦ generalize images with single vectors capturing color, texture, shape, etc. ¦ computationally cheap (relatively) ¥ thus, more popular CBIR with either global or local features does not scale up well to large databases efficiency-wise
  • 4. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 4 CBIR with Global Features notoriously noisy for image queries of low generality (generality = the fraction of relevant images in a collection) In text retrieval: documents matching no query keywords are not retrieved In image retrieval: CBIR will rank the whole collection ¦ The Red Tomato / Red Pie-Chart Problem [next]
  • 5. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 5 The Red Tomato / Red Pie-Chart Problem Image query: Collection items: If the query has a low generality early ranks may be dominated by spurious results (such as the pie-chart) possibly ranked even before red tomatoes on non-white backgrounds
  • 7. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 7 Contemporary Information Collections Large Multimodal (= provide different manners of retrieval ) A good example is Wikipedia: Large: 3,605,426 articles (English version alone), “millions of images”, etc. Multimodal: Topics are covered in several languages. Topics may include non-textual media (image, sound, video, etc.) annotated in a variety of metadata fields (caption, description, etc.) Thus, there are several ways to get to the same info/topic.
  • 8. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 8 Image Retrieval from Multimodal Databases In image retrieval systems, users are assumed to target visual similarity. Thus, Image is the primary medium. image modalities are most important than others Modalities from other media are secondary. but they can still help with the problems of CBIR: ¦ low generality (Red Tomato/Pie-Chart) problem (esp. for global feats.) ¦ speed Traditionally, fusion has been used, but: weighing of media is not trivial; usually requires training data not theoretically sound: influence of textual scores may worsen the visual quality
  • 9. Dynamic Two-Stage Image Retrieval from Large Multimodal Databases Avi Arampatzis, Konstantinos Zagoris, Savvas Chatzichristofis 9 Two-stage Image Retrieval from Multimedia Databases Key idea for improving CBIR effectiveness: Before CBIR, raise query generality by reducing collection size via filtering. Assumptions: Query is expressed in the primary medium i.e. image, accompanied by a query in a secondary medium, e.g. text. Two stage retrieval: Rank the collection by the secondary medium/query, e.g. text. Draw a rank threshold K. Re-rank only the top-K items with CBIR. Using a ‘cheaper’ secondary medium, we also improve speed.
  • 10. Previous Related Work
       Best-results re-ranking by visual content has been seen before, but
           for other purposes, e.g. result clustering, diversity, ...
           using external info (e.g. sets of images), or training data, ...
       They all used
           global features
           a static predefined threshold for all queries
       Effectiveness results have been mixed, while some didn’t provide a comparative evaluation.
  • 11. Main Contributions
       In view of the related literature, our main contributions are:
           dynamic thresholds per query
           no external information
           no training data
           extensive evaluation of thresholding types and levels
  • 12. The ImageCLEF 2010 Wikipedia Test Collection
       237,434 items
           Primary medium: image
            ¦ Heterogeneous: color natural images, graphics, greyscale images, etc.
            ¦ Variety of image sizes
            ¦ It is a large benchmark image database by today’s standards.
           + noisy and incomplete user-supplied annotations
           + Wikipedia articles containing the images
           Annotations and articles come in 3 languages: En, Fr, De.
       70 test topics
           Visual part:
            ¦ 1 or more example images
           Textual part:
            ¦ 3 title fields, one per language (En, Fr, De)
           Topics are assessed by visual similarity.
  • 13. Indexing and Retrieval
       text: only English annotations + English query
           Lemur Toolkit V4.11 / Indri V2.11
            ¦ tf.idf retrieval model (works better with our thresholding methods)
            ¦ default settings + Krovetz stemmer
       image:
           Joint Composite Descriptor (JCD)
            ¦ captures color and texture information
            ¦ developed for color natural images
           Spatial Color Distribution (SpCD)
            ¦ captures color and its spatial distribution
            ¦ more suitable for colored graphics (fewer colors, less texture)
           JCD and SpCD are found to be more effective than MPEG-7 descriptors.
  • 14. Thresholding and Re-ranking
       Static Thresholding: a fixed pre-selected rank threshold K for all topics
           K ∈ {25, 50, 100, 250, 500, 1000}
       Dynamic Thresholding: a variable rank threshold per topic
           Score-Distributional Threshold Optimization (SDTO) [Arampatzis et al., “Where to Stop Reading a Ranked-list?”, SIGIR 2009]
           Two types:
            ¦ Threshold on Precision: g ∈ {0.990, 0.950, 0.800, 0.500, 0.330, 0.100}
            ¦ Threshold on prel: θ ∈ {0.990, 0.950, 0.800, 0.500, 0.330, 0.100}
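The difference between the thresholding types can be sketched as follows. This is an illustration under assumptions, not SDTO itself: it takes per-rank probability-of-relevance estimates (`prel`) as given, whereas SDTO derives such estimates from the score distribution; it assumes `prel` is non-increasing down the ranking, and it estimates precision at a cutoff as the mean `prel` of the items above it. All function names and the toy values are hypothetical.

```python
from typing import Sequence


def static_cutoff(k: int, num_items: int) -> int:
    """Same rank threshold for every topic, capped at the list length."""
    return min(k, num_items)


def cutoff_on_prel(prel: Sequence[float], theta: float) -> int:
    """Number of leading items whose estimated prel is at least theta."""
    k = 0
    for p in prel:
        if p < theta:
            break
        k += 1
    return k


def cutoff_on_precision(prel: Sequence[float], g: float) -> int:
    """Largest K whose estimated precision (mean prel of the top-K) is >= g."""
    best_k, total = 0, 0.0
    for k, p in enumerate(prel, start=1):
        total += p
        if total / k >= g:
            best_k = k
    return best_k


# Toy per-rank estimates for one topic (hypothetical values).
prel = [0.99, 0.90, 0.60, 0.30, 0.10]
print(static_cutoff(3, len(prel)))
print(cutoff_on_prel(prel, theta=0.5))
print(cutoff_on_precision(prel, g=0.8))
print(cutoff_on_precision(prel, g=0.5))
```

Both dynamic variants return a different K for each topic, depending on how quickly that topic's estimates decay, while the static variant ignores the topic altogether.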
  • 15. Initial Experiments: Setting a Baseline
       item scoring by               MAP     P@10    P@20    bpref
       JCD_1                         .0058   .0486   .0479   .0352
       max_i JCD_i                   .0072   .0614   .0614   .0387
       max_i JCD_i + max_i SpCD_i    .0112   .0871   .0886   .0415
       tf.idf (text-only)            .1293   .3614   .3314   .1806
       Image-only runs provide very weak baselines.
       We chose the text-only run as a baseline for the statistical significance testing.
       This makes sense also from an efficiency point of view: if using a secondary text modality for image retrieval is more effective than current CBIR methods, then there is no reason at all for using computationally costly CBIR methods.
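The three image-only scoring rules in the table can be read as follows. This is a sketch of one reading of the row labels, assuming that the index i runs over the example images of a topic and that JCD_1 uses only the first example image; the function names and similarity values are hypothetical, and any score normalization the authors may have applied is not shown.

```python
from typing import Sequence


def score_first_example(jcd_sims: Sequence[float]) -> float:
    """JCD_1: similarity to the first example image only."""
    return jcd_sims[0]


def score_max_jcd(jcd_sims: Sequence[float]) -> float:
    """max_i JCD_i: best JCD similarity over all example images."""
    return max(jcd_sims)


def score_max_jcd_plus_max_spcd(jcd_sims: Sequence[float],
                                spcd_sims: Sequence[float]) -> float:
    """max_i JCD_i + max_i SpCD_i: sum of the best similarity per descriptor."""
    return max(jcd_sims) + max(spcd_sims)


# One collection item compared against two example images (toy values).
jcd = [0.2, 0.7]
spcd = [0.5, 0.1]
print(score_first_example(jcd))
print(score_max_jcd(jcd))
print(score_max_jcd_plus_max_spcd(jcd, spcd))
```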
  • 16. Results
       Markers after the values are the significance-test outcomes against the text-only baseline.
                                  JCD_1                                     max_i JCD_i + max_i SpCD_i
       threshold    K̄      MAP       P@10      P@20      bpref     MAP       P@10      P@20      bpref
       text-only    n/a    .1293     .3614     .3314     .1806     .1293     .3614     .3314     .1806
       K = 25       25     .1162™    .3957 -   .3457˜    .1641™    .1168 -   .3943 -   .3436˜    .1659™
       K = 50       50     .1144™    .3829 -   .3579˜    .1608™    .1154 -   .3986 -   .3557 -   .1648™
       K = 100      100    .1138 -   .3786 -   .3471 -   .1609™    .1133 -   .3900 -   .3486 -   .1623 -
       K = 250      250    .1081™    .3414 -   .3164 -   .1644 -   .1092™    .3771 -   .3564 -   .1664 -
       K = 500      500    .0968™    .3200 -   .3007 -   .1575™    .0999™    .3557 -   .3250 -   .1590™
       K = 1000     1000   .0865™    .2871™    .2729™    .1493™    .0909™    .3329 -   .3064 -   .1511™
       g = .9900    49     .1364 -   .4214˜    .3550 -   .1902˜    .1385˜    .4371˜œ   .3743˜    .1921˜
       g = .9500    68     .1352 -   .4171˜    .3586 -   .1912˜    .1386˜    .4500˜œ   .3836˜    .1932˜
       g = .8000    95     .1318 -   .4000 -   .3536 -   .1892 -   .1365 -   .4443˜œ   .3871˜    .1924 -
       g = .5000    151    .1196 -   .3814 -   .3393 -   .1808 -   .1226 -   .4043 -   .3550 -   .1813 -
       g = .3333    237    .1085™    .3500 -   .3000 -   .1707 -   .1121™    .3857 -   .3364 -   .1734 -
       g = .1000    711    .0864™    .2871™    .2621™    .1461™    .0909™    .3357 -   .2964 -   .1487™
       θ = .9900    42     .1342 -   .4043 -   .3414 -   .1865 -   .1375˜    .4371˜œ   .3700˜    .1897˜
       θ = .9500    51     .1371 -   .4214˜    .3586 -   .1903˜    .1417˜œ   .4500˜œ   .3864˜œ   .1924˜œ
       θ = .8000    81     .1384˜    .4229˜    .3614 -   .1921˜    .1427˜œ   .4629˜œ   .3871˜œ   .1961˜œ
       θ = .5000    91     .1367 -   .4057 -   .3571 -   .1919˜    .1397˜    .4400˜œ   .3829˜    .1937˜
       θ = .3333    109    .1375 -   .4129 -   .3636 -   .1933˜    .1404˜    .4500˜œ   .3907˜œ   .1949˜œ
       θ = .1000    130    .1314 -   .4100 -   .3629 -   .1866 -   .1370 -   .4371˜    .3843˜    .1922˜
       image-only   n/a    .0058™    .0486™    .0479™    .0352™    .0112™    .0871™    .0886™    .0415™
  • 17. Results
       Static thresholding improves initial Precision at the cost of MAP and bpref. Dynamic thresholding on Precision or prel does not have this drawback.
       The level of static and precision thresholds greatly influences effectiveness; unsuitable choices (e.g. too loose) lead to degraded performance. Prel thresholds are much more robust in this respect.
       Better CBIR at the second stage leads to overall improvements, but the thresholding type seems more important:
           While the two CBIR methods vary greatly in performance (the best has almost double the effectiveness of the other), static thresholding is not influenced much by this choice.
           Dynamic methods benefit more from improved CBIR.
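A ranking can gain early precision while losing average precision because the two measures look at different parts of the list. The sketch below uses the standard definitions of P@k and average precision; the toy rankings are hypothetical and unrelated to the experimental runs.

```python
from typing import Sequence, Set


def precision_at(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranking[:k] if item in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at the ranks of the relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


relevant = {"a", "b", "c"}
# Run A: two relevant items on top, the third pushed far down the list.
run_a = ["a", "b", "n1", "n2", "n3", "n4", "n5", "n6", "n7", "c"]
# Run B: weaker top-2, but all relevant items within the first four ranks.
run_b = ["a", "n1", "b", "c", "n2", "n3", "n4", "n5", "n6", "n7"]

print(precision_at(run_a, relevant, 2), average_precision(run_a, relevant))
print(precision_at(run_b, relevant, 2), average_precision(run_b, relevant))
```

Run A wins on P@2 and Run B wins on average precision, the same kind of trade-off the slide reports for static thresholding. MAP is this average precision averaged over all topics.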
  • 18. Results
  • 20. Conclusions
       Two-stage retrieval with dynamic thresholding:
           more effective and robust than static thresholding
           practically insensitive to a wide range of choices for the optimization measure
           significantly beats the text-only and several image-only baselines
       Two-stage retrieval, irrespective of thresholding type:
           efficiency benefit:
            ¦ it cuts down greatly on expensive image operations
            ¦ on average, only 2 to 5 in 10,000 images had to be scored
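The "2 to 5 in 10,000" figure appears consistent with numbers given earlier in the deck: the collection holds 237,434 items, and the second column of the slide-16 results table lists values between 42 and 130 for the prel thresholds. The following sketch only redoes that arithmetic; linking the claim to those particular rows is an assumption, and the helper name is hypothetical.

```python
COLLECTION_SIZE = 237_434  # items in the ImageCLEF 2010 Wikipedia collection


def scored_per_10k(k: float, collection_size: int = COLLECTION_SIZE) -> float:
    """Images scored by CBIR per 10,000 collection items for a cutoff k."""
    return k / collection_size * 10_000


# Smallest and largest cutoffs listed for the prel thresholds on slide 16.
low = scored_per_10k(42)
high = scored_per_10k(130)
print(round(low, 1), round(high, 1))
```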
  • 21. Further Research
       Compare two-stage retrieval to fusion. [Did this; check ECIR11 poster]
           Both methods work better than text-only and image-only; two-stage is slightly better than fusion, but the difference is non-significant.
           Both methods are robust in different ways:
            ¦ Fusion provides less variability across topics but it is sensitive to the weighing parameter of the contributing media.
            ¦ Two-stage provides a much lower sensitivity to its thresholding parameter but has a higher variability across topics.
       Try two-stage retrieval with other types of image descriptors, e.g. the visual codebook approach (TOP-SURF).
       Generalization to multi-stage retrieval, where rankings for the media are successively thresholded and re-ranked with respect to a media hierarchy.
  • 22. Thank you! avi@ee.duth.gr