Advances in Image Search and Retrieval

Advances in
Image Search and Retrieval

Oge Marques

Florida Atlantic University

Boca Raton, FL - USA

Take-home message

•  Visual Information Retrieval (VIR) is a fascinating
research ﬁeld with many open challenges and
opportunities which have the potential to impact
the way we organize, annotate, and retrieve visual
data (images and videos).

•  In this tutorial we present some of the latest and
most representative advances in image search and
retrieval.

Disclaimer #1

•  Visual Information Retrieval (VIR) is a highly
interdisciplinary ﬁeld, but …

Visual
Information
Retrieval
Image and
Video
Processing
(Multimedia)
Database
Systems
Information
Retrieval
Machine
Learning
Computer
Vision
Data Mining
Visual data
modeling and
representation
Human Visual
Perception

Disclaimer #2

•  There are many things that I believe…

•  … but cannot prove

Background and Motivation

What is it that we’re trying to do
and
why is it so difﬁcult?

– Taking pictures and storing, sharing, and publishing
them has never been so easy and inexpensive.

– If only we could say the same about ﬁnding the images
we want and retrieving them…


The “big mismatch”


•  Q: What do you do when you need to find an image
(on the Web)?
•  A1: Google (image search), of course!


Google image search results for “sydney opera house”


Google image search results for “opera”


•  Q: What do you do when you need to find an
image (on the Web)?
•  A2: Other (so-called specialized) image search
engines
•  http://images.search.yahoo.com/
•  http://pictures.ask.com
•  http://www.bing.com/images
•  http://pixsy.com/


•  Q:What do you do when you need to ﬁnd an
image (on the Web)?

•  A3: Search directly on large photo repositories:

– Flickr

– Webshots

– Shutterstock


Flickr image search results for “opera”


Webshots image search results for “opera”


Shutterstock image search results for “opera”


•  Are you happy with the results so far?


•  Back to our original (two-part) question:

– What is it that we’re trying to do?

– We are trying to create
automated solutions to the problem of
finding and retrieving visual information,
from (large, unstructured) repositories,
in a way that satisfies search criteria specified by users,
relying (primarily) on the visual contents of the media.


•  Why is it so difﬁcult?

•  There are many challenges, among them:

– The elusive notion of similarity

– The semantic gap

– Large datasets and broad domains

– Combination of visual and textual information

– The users (and how to make them happy)

Outline

•  Part I – Core concepts, techniques, and tools

– Design, implementation, and evaluation aspects

•  Part II – Medical image retrieval

– Challenges, resources, and opportunities

•  Part III – Applications and related areas

– Mobile visual search, social networks, and more

•  Part IV – Where is image search headed?

– Advice for young researchers

Part I

Core concepts, techniques, and
tools

Core concepts, techniques, and tools

•  Design

– Challenges

– Principles

– Concepts

•  Implementation

– Languages and tools

•  Evaluation

– Datasets

– Benchmarks

Design challenges

•  Capturing and measuring similarity

•  Semantic gap (and other gaps)

•  Large datasets and broad domains

•  Users’ needs and intentions

•  Growing up (as a ﬁeld)

The elusive notion of similarity

•  Are these two images similar?


•  Is the second or the third image more similar to
the ﬁrst?


•  Which image ﬁts better to the ﬁrst two: the third
or the fourth?

The semantic gap

•  The semantic gap is the lack of coincidence
between the information that one can extract
from the visual data and the interpretation that
the same data have for a user in a given situation.

•  The pivotal point in content-based retrieval is that the user
seeks semantic similarity, but the database can only provide
similarity by data processing.This is what we called the
semantic gap. [Smeulders et al., 2000]

Google sort by subject

http://www.google.com/landing/imagesorting/

Google image swirl

http://image-swirl.googlelabs.com/

How I see it…

•  The semantic gap problem has not been solved (and
maybe will never be…)

•  What are the alternatives?

–  Treat visual similarity and semantic relatedness differently

•  Examples:Alipr, Google (or Bing) similarity search, etc.

–  Improve both (text-based and visual) search methods
independently

–  Combine visual and textual information in a meaningful
way

–  Trust the user

•  Collaborative ﬁltering, crowdsourcing, games.

•  But, wait… There
are other gaps!

– Just when you
thought the
semantic gap was
your only
problem…

Source: [Deserno, Antani, and Long, 2009]

Large datasets and broad domains

•  Large datasets bring additional challenges in all
aspects of the system:

– Storage requirements: images, metadata, and “visual
signatures”

– Computational cost of indexing, searching, retrieving,
and displaying images

– Network and latency issues

Large datasets and broad domains

Challenge: users’ needs and intentions

•  Users and developers have quite different views

•  Cultural and contextual information should be
taken into account

•  User intentions are hard to infer

– Privacy issues

– Users themselves don’t always know what they want

– Who misses the MS Ofﬁce paper clip?


•  The user’s
perspective

– What do they
want?

– Where do
they want to
search?

– In what form
do they
express their
query?


•  The image
retrieval system
should be able to
be mindful of:

–  How users wish
the results to be
presented

–  Where users
desire to search

–  The nature of
user input/
interaction.


•  Each application has
different users (with
different intent, needs,
background, cultural bias,
etc.) and different visual
assets.

Challenge: growing up (as a ﬁeld)

•  It’s been 10 years since the “end of the early years”

–  Are the challenges from 2000 still relevant?

–  Are the directions and guidelines from 2000 still
appropriate?

–  Have we grown up (at all)?

–  Let’s revisit the ‘Concluding Remarks’ from that paper…

Revisiting [Smeulders et al. 2000]

What they said

•  Driving forces

–  “[…] content-based image
retrieval (CBIR) will continue
to grow in every direction:
new audiences, new purposes,
new styles of use, new modes
of interaction, larger data sets,
and new methods to solve
the problems.”

How I see it

•  Yes, we have seen many new
audiences, new purposes, new
styles of use, and new modes
of interaction emerge.

•  Each of these usually requires
new methods to solve the
problems that they bring.

•  However, not too many
researchers see them as a
driving force (as they should).


What they said

•  Heritage of computer vision

–  “An important obstacle to
overcome […] is to realize
that image retrieval does not
entail solving the general
image understanding
problem.”

How I see it

•  I’m afraid I have bad news…

–  Computer vision hasn’t made
so much progress during the
past 10 years.

–  Some classical problems
(including image
understanding)
remain unresolved.

–  Similarly, CBIR from a
pure computer vision
perspective didn’t work
too well either.


What they said

•  Inﬂuence on computer
vision

–  “[…] CBIR offers a different
look at traditional computer
vision problems: large data
sets, no reliance on strong
segmentation, and revitalized
interest in color image
processing and invariance.”

How I see it

•  The adoption of large data sets
became standard practice in
computer vision.

•  No reliance on strong
segmentation (still unresolved) led
to new areas of research, e.g.,
automatic ROI extraction and RBIR.

•  Color image processing and color
descriptors became incredibly
popular, useful, and (to some
degree) effective.

•  Invariance still a huge problem

–  But it’s cheaper than ever to have
multiple views.


What they said

•  Similarity and learning

–  “We make a pledge for the
importance of human- based
similarity rather than general
similarity.Also, the connection
between image semantics,
image data, and query context
will have to be made clearer
in the future.”

–  “[…] in order to bring
semantics to the user, learning
is inevitable.”

How I see it

•  The authors were pointing in the
right direction (human in the
loop, role of context, beneﬁts
from learning,…)

•  However:

–  Similarity is a tough problem to
crack and model.

•  Even the understanding of how
humans judge image similarity is
very limited.

–  Machine learning is almost
inevitable…

•  … but sometimes it can be
abused.


What they said

•  Interaction

–  Better visualization options,
more control to the user,
ability to provide feedback
[…]

How I see it

•  Signiﬁcant progress on
visualization interfaces and
devices.

•  Relevance Feedback: still a
very tricky tradeoff (effort
vs. perceived beneﬁt), but
more popular than ever
(rating, thumbs up/down,
etc.)


What they said

•  Need for databases

–  “The connection between
CBIR and database research is
likely to increase in the
future. […] problems like the
definition of suitable query
languages, efficient search in
high dimensional feature
space, search in the presence
of changing similarity
measures are largely unsolved
[…]”

How I see it

•  Very little progress

–  Image search and retrieval has
benefited much more from
document information
retrieval than from database
research.


What they said

•  The problem of evaluation

–  CBIR could use a reference
standard against which new
algorithms could be evaluated
(similar to TREC in the ﬁeld of
text recognition).

–  “A comprehensive and publicly
available collection of images,
sorted by class and retrieval
purposes, together with a
protocol to standardize
experimental practices, will be
instrumental in the next phase
of CBIR.”

How I see it

•  Signiﬁcant progress on
benchmarks, standardized
datasets, etc.

–  ImageCLEF

–  PascalVOC Challenge

–  MSRA dataset

–  Simplicity dataset

–  UCID dataset and ground truth
(GT)

–  Accio / SIVAL dataset and GT

–  Caltech 101, Caltech 256

–  LabelMe


What they said

•  Semantic gap and other
sources

–  “A critical point in the
advancement of CBIR is the
semantic gap, where the
meaning of an image is rarely
self-evident. […] One way to
resolve the semantic gap
comes from sources outside
the image by integrating other
sources of information about the
image in the query.”

How I see it

•  The semantic gap problem
has not been solved (and
maybe will never be…)

•  But the idea about using
other sources was right on
the spot!

–  Geographical context

–  Social networks

–  Tags

Visual Information Retrieval (VIR)

Query / Search
Engine
User
User interface (Querying, Browsing,
Viewing)
Digital Image and
Video Archive
Visual summaries Indexes
Digitization +
Compression
Cataloguing / Feature
extraction
Image or Video

Designing aVIR system: a mind map

Tools and resources

•  Visual descriptors and machine learning algorithms
have become commodities.

•  Examples of publicly available implementation and
tools:

– Visual descriptors:

•  img(Rummager) by Savvas Chatzichristoﬁs

•  Caliph Emir and Lire by Mathias Lux

– Machine Learning:

•  Weka

Part II

Medical Image Retrieval

Medical image retrieval

•  Challenges

– We’re entering a new country…

•  How much can we bring?

•  Do we speak the language?

•  Do we know their culture?

•  Do they understand us and where we come from?

•  Opportunities

– They use images (extensively)

– They have expert knowledge

– Domains are narrow (almost by deﬁnition)

– Fewer clients, but potentially more $$

Medical image retrieval

•  Selected challenges:

– Different terminology

– Standards

– Modality dependencies

•  Other challenges:

– Equipment dependencies

– Privacy issues

– Proprietary data

Different terminology

•  Be prepared for:

– New acronyms

•  CBMIR (Content-Based Medical Image Retrieval)

•  PACS (Picture Archiving and Communication System)

•  DICOM (Digital Imaging and COmmunication in Medicine)

•  Hospital Information Systems (HIS)

•  Radiological Information Systems (RIS)

– New phrases

•  Imaging informatics

– Lots of technical medical terms

Standards

•  DICOM (http://medical.nema.org/)

–  Global IT standard, created in 1993, used in virtually all
hospitals worldwide.

–  Designed to ensure the interoperability of different
systems and manage related workﬂow.

–  Will be required by all EHR systems that include imaging
information as an integral part of the patient record.

–  750+ technical and medical experts participate in 20+
active DICOM working groups.

–  Standard is updated 4-5 times per year.

–  Many available tools! (see http://www.idoimaging.com/)

Medical image modalities

•  The IRMA code [Lehmann et al., 2003]

–  4 axes with 3 to 4 positions, each in {0,...9,a,...,z}, where 0
denotes unspeciﬁed to determine the end of a path along an
axis.

•  Technical code (T) describes the imaging modality

•  Directional code (D) models body orientations

•  Anatomical code (A) refers to the body region examined

•  Biological code (B) describes the biological system
examined.


•  The IRMA code [Lehmann et al., 2003]

–  The entire code results in a character string of 14
characters (IRMA:TTTT – DDD – AAA – BBB).

Example: “x-ray, projection radiography,
analog, high energy – sagittal, left lateral
decubitus, inspiration – chest, lung –
respiratory system, lung”
Source: [Lehmann et al., 2003]


•  The IRMA code
[Lehmann et al.,
2003]

–  The companion
tool…

Source: [Lehmann et al., 2004]

CBMIR vs. text-based MIR

•  Most current retrieval systems in clinical use rely on
text keywords such as DICOM header information to
perform retrieval.

•  CBIR has been widely researched in a variety of
domains and provides an intuitive and expressive
method for querying visual data using features, e.g.
color, shape, and texture.

•  However, current CBIR systems:

–  are not easily integrated into the healthcare environment;

–  have not been widely evaluated using a large dataset; and

–  lack the ability to perform relevance feedback to reﬁne
retrieval results.

Source: [Hsu et al., 2009]

Who are the main players?

•  USA

– NIH (National Institutes of Health)

•  NIBIB - National Institute of Biomedical Imaging and
Bioengineering

•  NCI - National Cancer Institute

•  NLM – National Libraries of Medicine

– Several universities and hospitals

•  Europe

– Aachen University (Germany)

– Geneva University (Switzerland)

•  Big companies (Siemens, GE, etc.)

Medical image retrieval systems: examples

•  IRMA (Image Retrieval in Medical Applications)

–  Aachen University (Germany)

•  http://ganymed.imib.rwth-aachen.de/irma/

–  3 online demos:

•  IRMA Query demo: allows the evaluation of CBIR on several
databases.

•  IRMA Extended Query Reﬁnement demo: CBIR from the IRMA
database (a subset of 10,000 images).

•  Spine Pathology and Image Retrieval Systems (SPIRS) designed by the
NLM/NIH (USA): holds information of ~17,000 spine x-rays.


•  MedGIFT (GNU Image Finding Tool)

– Geneva University (Switzerland)

•  http://www.sim.hcuge.ch/medgift/

– Large effort, including projects such as:

•  Talisman (lung image retrieval)

•  Case-based fracture image retrieval system

•  Onco-Media: medical image retrieval + grid computing

•  ImageCLEF: evaluation and validation

•  medSearch


•  WebMIRS

– NIH / NLM (USA)

•  http://archive.nlm.nih.gov/proj/webmirs/index.php

– Query by text + navigation by categories

– Uses datasets and related x-ray images from the
National Health and Nutrition Examination Survey
(NHANES)


•  SPIRS (Spine Pathology Image Retrieval System):
Web-based image retrieval system for large
biomedical databases

– NIH / UCLA (USA)

– Representative case study on highly specialized CBMIR

Source: [Hsu et al., 2009]


•  National Biomedical Imaging Archive (NBIA)

– NCI / NIH (USA)

•  https://imaging.nci.nih.gov/

– Search based on metadata (DICOM ﬁelds)

– 3 search options:

•  Simple

•  Advanced

•  Dynamic


•  ARSS Goldminer

– American Roentgen Ray Society (USA)

•  http://goldminer.arrs.org/

– Query by text

– Results can be ﬁltered by:

•  Modality

•  Age

•  Sex


•  Yottalook Images

–  iVirtuoso (USA)

•  http://www.yottalook.com/

–  Developed and maintained by four radiologists

–  Query by text

–  Claims to use 4 “core technologies”:

•  natural query analysis”

•  semantic ontology”

•  “relevance algorithm”

•  a specialized content delivery system that provides high yield
content based on the search term.

Evaluation: ImageCLEF Medical Image Retrieval

•  ImageCLEF Medical Image
Retrieval

•  http://www.imageclef.org/2011/medical

– Dataset: 77,000+ images from articles published in
medical journals including text of the captions and link
to the html of the full text articles.

– 3 types of tasks:

•  Modality Classiﬁcation: given an image, return its modality

•  Ad-hoc retrieval: classic medical retrieval task, with 3
“ﬂavors”: textual, mixed and semantic queries

•  Case-based retrieval: retrieve cases including images that
might best suit the provided case description.


•  ImageCLEF Medical Image Retrieval 2011

– Modality Classiﬁcation



– Modality Classification – FAU Team

•  Personnel: 4 grad students + 2 undergrads + advisor

•  Strategy:

–  Textual classification using Lucene and associated tools and libraries

–  Visual classification using 8 contemporary descriptors and 3
different families of classifiers, implemented using Weka and
associated tools and libraries

•  Supporting tools:

–  Manual annotation tool (http://imageclef.mlab.ceecs.fau.edu/)

–  Training set visualization tool
(http://imageclef.mlab.ceecs.fau.edu/classification/)


•  ImageCLEF
Medical Image
Retrieval 2011

– Modality
Classiﬁcation
Results – FAU
Team (textual)



– Modality Classiﬁcation Results – FAU Team (visual)


•  ImageCLEF
Medical
Image
Retrieval
2011

•  Modality
Classiﬁcation
Results –
FAU Team
(visual)

Medical Image Retrieval: promising directions

•  Better user interfaces (responsive, highly interactive,
and capable of supporting relevance feedback)

•  New applications of CBMIR, including:

–  Teaching

–  Research

–  Diagnosis

–  PACS and Electronic Patient Records

•  CBMIR evaluation using medical experts

•  Integration of local and global features

•  New visual descriptors

Medical Image Retrieval: promising directions

•  New devices

Part III

Applications and related areas

Applications and related areas

•  New devices and services

•  Mobile visual search

•  Image search and retrieval in the age of social
networks

•  Games!

•  Other related areas

•  Our recent work (highlights)

New devices and services

•  Flickr (b. 2004)

•  YouTube (b. 2005)

•  Flip video cameras (b. 2006)

•  iPhone (b. 2007)

•  iPad (b. 2010)

Mobile visual search

•  Driving factors

– Capable devices

Source: http://www.apple.com/iphone/specs.html

1 GHz ARM
Cortex-A8
processor,
PowerVR
SGX535GPU,
Apple A4 chipset



– Motivated users: image taking and image sharing are
huge!

–  Source: http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html


•  Facebook for iPhone

–  Source: http://statistics.allfacebook.com/applications/single/facebook-for-iphone/6628568379/


•  Instagram: 2 million registered (although not necessarily
active) users, who upload ~300,000 photos per day

•  Several apps based on it!

–  http://iphone.appstorm.net/roundups/photography/5-cool-apps-for-getting-the-most-out-of-instagram/


•  Food photo
sharing!



– Legitimate (or not quite…) needs and use cases

–  Source: http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles



– Smart phone market


•  Smart phone market

Source: http://www.cellular-news.com/story/48647.php?s=h


•  Examples of applications

– Google Goggles

– oMoby (and the IQ Engines API)

– Others (kooaba, Fetch!, Gazopa, etc.)


•  Google Goggles

– Android and iPhone

– Narrow-domain search and retrieval


•  oMoby (and the IQ Engines API)

– iPhone


•  oMoby (and the IQ Engines API)

Image search and retrieval social networks

•  The [so-called] Web 2.0 has brought about:

–  New data sources

–  New usage patterns

–  New understanding about the users, their needs,
habits, preferences

–  New opportunities

–  Lots of metadata!

–  A chance to experience a true paradigm shift

•  Before: image annotation is tedious, labor-intensive,
expensive

•  After: image annotation is fun!

Games!

◦  Google Image Labeler

◦  Games with a purpose (GWAP)

  The ESP Game

  Squigl

  Matchin

Other related areas

•  Semi-automatic image annotation

•  Tag recommendation systems

•  Story annotation engines

•  Content-based image ﬁltering

•  Copyright detection

•  Watermark detection

– and many more

Our recent work (highlights)

•  PRISM

– Image Genius

•  Unsupervised ROI extraction from an image

– Crazy Collage

•  MEDIX and associated tools

•  Callisto: a content-based tag recommendation
tool

PRISM

With Liam Mayron, Harris Corp., USA

Image Genius

With Asif Rahman, FAU, USA

Unsupervised ROI extraction

With
Gustavo B.
Borba and
Humberto
R. Gamba,
UTFPR,
Brazil

Crazy Collage

Gustavo B.
Borba et
al., UTFPR,
Brazil

MEDIX

•  Medical image retrieval system with DICOM
capabilities

With Asif Rahman, FAU, USA

Callisto

With Mathias Lux and
Arthur Pitman, Klagenfurt
University,Austria

Part IV

Where is image search headed?

Where is image search headed?

•  Advice for [young] researchers

– In this last part, I’ve compiled pieces and bits of advice
that I believe might help researchers who are entering
the ﬁeld.

– They focus on research avenues that I personally
consider to be the most promising.

Advice for [young] researchers

• LOOK

• THINK

• UNDERSTAND

• CREATE


• LOOK…

– at yourself (how do you search for images and videos?)

– around (related areas and how they have grown)

– at Google (and other major players)


• THINK…

– mobile devices

– new devices and services

– social networks

– games


• UNDERSTAND…

– human intentions and emotions

– the context of the search

– user’s preferences and needs


• CREATE…

– better interfaces

– better user experience

– new business opportunities (added value)

Concluding thoughts

–  I believe (but cannot prove…) that successfulVIR
solutions will:

•  combine content-based image retrieval (CBIR) with
metadata (high-level semantic-based image retrieval)

•  only be truly successful in narrow domains

•  include the user in the loop

– Relevance Feedback (RF)

– Collaborative efforts (tagging, rating, annotating)

•  provide friendly, intuitive interfaces

•  incorporate results and insights from cognitive science,
particularly human visual attention, perception, and
memory

Concluding thoughts

•  “Image search and retrieval” is not a problem, but
rather a collection of related problems that look like
one.

•  There is a great need for good solutions to speciﬁc
problems.

•  10 years after “the end of the early years”, research in
visual information retrieval still has many open
problems, challenges, and opportunities.

Thanks!

•  Questions?

•  For additional information: omarques@fau.edu

Advances in Image Search and Retrieval

More Related Content

What's hot (7)

Viewers also liked (20)

Similar to Advances in Image Search and Retrieval (20)

Recently uploaded (20)

Advances in Image Search and Retrieval