Digital shapes content-based
searching & retrieval
Web Science Course (Fall 2011)
Laura Papaleo
https://www.linkedin.com/in/laurapapaleo/
laura.papaleo@gmail.com
Outline
 Digital shapes definition
 Content-based retrieval basics
 Image retrieval
 Video retrieval
 3D model retrieval
2
Multimedia content
short introduction
Laura Papaleo | laura.papaleo@gmail.com
Image and Digital Image
 An image is an artifact that has an appearance similar to some subject, usually a physical object or person (Wikipedia).
 Images may be two-dimensional (e.g.
photograph) or three-dimensional (statue,
hologram, …).
 2D Digital Image:
 Numeric representation of a two-dimensional
image. Without qualifications, the term "digital
image" usually refers to raster images also called
bitmap images
 3D Digital image (3D model):
 a mathematical representation of any three-
dimensional surface of an object (either inanimate or
living)
4
Video and Digital Video
 Video is the technology of electronically capturing and reproducing a sequence of still images representing scenes in motion.
 Digital video comprises a series of orthogonal bitmap
digital images (frames) displayed in rapid succession
at a constant rate.
5
In a more general sense: Digital Shapes
6
 Multidimensional media
characterized by a visual
appearance in a space of 2,
3, or more dimensions.
 Examples:
images, 3D models, videos,
animations, and so on.
 they can be acquired from
real environments/objects or
synthetically created.
How to describe a shape?
7
 Geometry
 Detect relevant local
features
 Structure
 Organize them in a
structure
 Semantics
 Use the structure to detect
high-level features
(semantics)
(from perception to understanding)
From the AIM@SHAPE FP7 NoE
What do we need to describe a shape?
8
 Geometry
 shape descriptors based on
geometric representations (e.g.,
shape distributions, PCA, ..)
 Structure
 shape descriptors based on the
configuration of features (e.g.,
skeletons, Reeb graphs)
 Semantics
 shape ontologies and domain
conceptualization (e.g., metadata,
ontology, reasoners and inference)
From the AIM@SHAPE FP7 NoE
Digital shapes searching
Basics
Laura Papaleo | laura.papaleo@gmail.com
Content-based retrieval (CBR)
 CBR addresses the problem of searching for digital shapes in large databases (such as the web) using their actual content.
 First defined in 1992 by Kato et al. for
images (A sketch retrieval method for full
color image database-query by visual
example - Pattern Recognition).
 Known also as query by content (QBC)
and content-based visual information
retrieval (CBVIR)
 Techniques, tools and algorithms used
originate from statistics, pattern recognition, signal processing, computer vision, computer graphics, geometric modeling, and so on.
(e.g., for images)
10
Content-based retrieval (CBR)
 Content-based: the search relies on the content of the digital shapes themselves rather than on associated metadata (keywords, tags, and/or descriptions).
 The term 'content' is itself hard to define:
 it might refer to colors, shapes, textures, or any other information that can be derived from the shape itself;
 it is context-dependent.
(Example: similar “shape”, different color, different “semantics”)
11
Why do we need efficient CBR systems?
 Filtering digital shapes based on their actual content:
 could provide better indexing
 could return more accurate results
 could help avoid ambiguity
 could fill the gap between content providers and user needs
 could support multimodal indexing and searching (text-based + content-based + different heuristics)
(Diagram: color features, texture features, shape features, and spatial layout all feed content retrieval)
12
Why do we need efficient CBR systems?
 Text- or keyword-based techniques can be applied to digital shapes (the standard approach):
 (+) good results (as in many existing online systems)
 (-) requires humans to describe every data item
 Human description can be context-dependent, skill-dependent, personal, non-objective.
 Manual “annotation” is impractical for very large repositories, such as those of automatically generated digital shapes.
(Example annotation: Lion::BackRightLeg::Foot)
13
Content-based Querying: by example
 Visual understanding is powerful
 Users ask to query using visual information
(Pipeline: features are extracted from the digital shape repository and from the user query; similarity is computed between them and ranked results are returned; a minimal sketch follows below)
14
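To make the loop above concrete, here is a minimal query-by-example sketch in Python (an assumption-laden toy: NumPy only, random vectors standing in for real descriptors, and Euclidean distance as the similarity; the actual feature extractor is application-specific and not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy repository: 1,000 digital shapes, each already reduced offline
# to a 64-dimensional feature vector by some extractor (not shown here).
repo_features = rng.random((1000, 64))

def rank_by_similarity(query_features, repo_features, k=10):
    """Rank repository items by Euclidean distance to the query in feature space."""
    dists = np.linalg.norm(repo_features - query_features, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Online: extract features from the user query (here just a random stand-in)
# and return the ids of the k most similar repository items.
query_features = rng.random(64)
top_ids, top_dists = rank_by_similarity(query_features, repo_features)
print(top_ids, top_dists)
```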
Visual features, similarity, ranking…
15
 Visual features try to capture the visual appearance of the digital shape (e.g., color distribution, geometric primitives, and so on).
 Features need to be extracted from all items in the repository as well as from the user query; appropriate indexing is necessary.
 Similarity: all digital shapes are transformed from the object space to a high-dimensional feature space.
 For each feature, choose an appropriate function to measure similarity.
 Using a distance function, similarity search between objects can be provided by a nearest-neighbor search in the feature space.
 Ranking: assign a weighting function to the results and collect feedback.
(Figure: images as points in an RGB feature space)
Sample CBR architecture
(Architecture sketch: the data layer holds the digital shape collection with its visual features and text annotations under a multi-dimensional index; the retrieval engine performs feature extraction and query processing behind a query interface; feature extraction runs both offline over the collection and online on the query)
16
Other query methods
 Browsing by examples (multiple inputs)
 Browsing categories (customized/hierarchical)
 Querying by region (rather than the entire digital
shape)
 Querying by visual sketch
 Querying by specific features
 Multimodal queries (e.g. combining touch, voice,
etc.)
17
Image Searching & Retrieval Basics
Laura Papaleo | laura.papaleo@gmail.com
Content-based Querying: by example
 Example for images
(Pipeline: features are extracted from the input image query and from the image database; similarity is computed and ranked images are returned)
19
Similarity measures for images
 Measures must be based solely on the information contained in the digital representation of the images.
 Common technique: extract a set of visual features.
 Visual features fall into one of the following categories:
 Colour
 Texture
 Shape
(Del Bimbo A., Visual Information Retrieval, Morgan Kaufmann, 1999)
20
Similarity measures for images
 All images are transformed from the object space to a high-dimensional feature space.
 In this space every image is a point whose coordinates represent its feature characteristics; similar images are “near” in this space.
 The definition of an appropriate distance function is crucial for the success of the feature transformation.
 Some example distance metrics (a small numeric sketch of the first three follows below):
 The Euclidean distance [Niblack 1993]
 The Manhattan distance [Stricker and Orengo 1995]: the distance between two points measured along axes at right angles
 The maximum norm [Stricker and Orengo 1995]
 The quadratic form distance [Hafner et al. 1995]
 Earth Mover's Distance [Rubner, Tomasi, and Guibas 2000]
 Deformation Models [Keysers et al. 2007b]
21
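As a small numeric illustration of the first three metrics (toy 4-dimensional feature vectors, not taken from any of the cited papers):

```python
import numpy as np

a = np.array([0.2, 0.5, 0.1, 0.9])   # feature vector of image A (made-up values)
b = np.array([0.3, 0.4, 0.3, 0.8])   # feature vector of image B

euclidean = np.sqrt(np.sum((a - b) ** 2))   # L2 norm of the difference
manhattan = np.sum(np.abs(a - b))           # L1: distance along axes at right angles
max_norm  = np.max(np.abs(a - b))           # L-infinity: largest per-feature difference

print(euclidean, manhattan, max_norm)       # approx. 0.265, 0.5, 0.2
```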
Visual Features Extraction
 What are relevant visual features for images?
 Primitive features
 Mean color (RGB)
 Color Histogram
 Semantic features
 Color Layout, texture etc…
 Domain specific features
 Face recognition,
 fingerprint matching
 etc…
(Primitive and semantic features together are referred to as general features)
22
Color: Distance measures
 Based on color similarity: compute a color histogram for each image, then compute the difference between the histograms (see the sketch below).
 Current research (color layout): segment color proportions by region and by spatial relationships among several color regions.
 NOTE: comparing images by color is the most widely used technique because it does not depend on image size or orientation.
23
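A rough sketch of the basic histogram comparison (the 8-bins-per-channel quantization and the L1 difference are illustrative choices, not a specific system's settings):

```python
import numpy as np

def color_histogram(rgb_image, bins=8):
    """Normalized 3-D RGB histogram; independent of image size and orientation."""
    hist, _ = np.histogramdd(rgb_image.reshape(-1, 3).astype(float),
                             bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def histogram_difference(h1, h2):
    """L1 difference between two normalized color histograms (0 = identical)."""
    return np.abs(h1 - h2).sum()

img_a = np.random.randint(0, 256, (64, 64, 3))
img_b = np.random.randint(0, 256, (64, 64, 3))
print(histogram_difference(color_histogram(img_a), color_histogram(img_b)))
```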
Color Layout
 Need for Color Layout
 Global color features give too many false positives
 How it works:
 Divide whole image into sub-blocks
 Extract features from each sub-block
 Can we go one step further?
 Divide into regions based on color feature concentration
 This process is called segmentation.
24
http://april.eecs.umich.edu/
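A minimal sketch of the sub-block idea (a hypothetical 4x4 grid keeping each block's mean RGB color; standardized color-layout descriptors use finer machinery, so treat this purely as an illustration):

```python
import numpy as np

def color_layout_features(rgb_image, grid=(4, 4)):
    """Split the image into grid sub-blocks and keep each block's mean RGB color."""
    h, w, _ = rgb_image.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            block = rgb_image[i * h // gh:(i + 1) * h // gh,
                              j * w // gw:(j + 1) * w // gw]
            feats.append(block.reshape(-1, 3).mean(axis=0))
    return np.concatenate(feats)        # one vector of length gh * gw * 3

img = np.random.randint(0, 256, (128, 128, 3))
print(color_layout_features(img).shape)  # (48,)
```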
Example: Color layout
Smith & Chang, Single Color Extraction and Image Query, 1995
25
Texture measures
 Texture measures look for visual patterns in images.
 Texture is a difficult concept to represent.
 Identification in images is achieved by modeling texture as two-dimensional gray-level variation.
 The relative brightness of pairs of pixels is computed so that the degree of contrast, regularity, coarseness, and directionality can be estimated (see the sketch below).
26
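One common way to model this "relative brightness of pairs of pixels" is a gray-level co-occurrence matrix; the sketch below derives a simple contrast measure from it (the 8-level quantization and horizontal pixel offset are arbitrary assumptions):

```python
import numpy as np

def glcm_contrast(gray, levels=8):
    """Contrast from the co-occurrence matrix of horizontally adjacent pixel pairs."""
    q = np.clip((gray.astype(float) / 256 * levels).astype(int), 0, levels - 1)
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1                        # count the pair (left level, right level)
    glcm /= glcm.sum()
    i, j = np.indices((levels, levels))
    return float(np.sum(glcm * (i - j) ** 2))  # large when neighboring pixels differ strongly

gray = np.random.randint(0, 256, (64, 64))
print(glcm_contrast(gray))
```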
Texture classification
 The most widely accepted classification of textures is based on psychological studies: the Tamura representation.
 Coarseness: relates to the distances of notable spatial variations of grey levels, that is, implicitly, to the size of the primitive elements (texels) forming the texture.
 Contrast: measures how the grey levels q, q = 0, 1, ..., q_max, vary in the image g and to what extent their distribution is biased towards black or white (a small sketch follows below).
 Degree of directionality: measured using the frequency distribution of oriented local edges against their directional angles.
 Linelikeness, regularity & roughness: combinations of the above three.
 http://www.cs.auckland.ac.nz/compsci708s1c/lectures/Glect-html/topic4c708FSC.htm#tamura
H. Tamura et al., Texture features corresponding to visual perception, IEEE Transactions on Systems, Man, and Cybernetics, 1978
27
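As a flavour of the Tamura features, the contrast measure has a simple closed form: the gray-level standard deviation divided by the fourth root of the kurtosis, F_con = sigma / alpha4^(1/4) with alpha4 = mu4 / sigma^4. A small sketch (assuming a grayscale image stored as a NumPy array):

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast: std. deviation divided by the 4th root of the kurtosis."""
    g = gray.astype(float)
    mu = g.mean()
    sigma2 = ((g - mu) ** 2).mean()                  # variance
    alpha4 = ((g - mu) ** 4).mean() / sigma2 ** 2    # kurtosis
    return np.sqrt(sigma2) / alpha4 ** 0.25

gray = np.random.randint(0, 256, (64, 64))
print(tamura_contrast(gray))
```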
Shape-based measures
 Shape refers to the shape of a particular region in an image.
 Shapes are often determined by applying segmentation or edge detection to an image.
 In some cases accurate shape detection requires human intervention, because methods like segmentation are very difficult to automate completely.
28
Shape features
 Segment images into visual segments (e.g., Blobworld, the Normalized-cuts algorithm, and so on).
 Extract features from the segments.
 Cluster similar segments (k-means); a sketch of this step follows below.
(Diagram: image segments are mapped to cluster labels V1…V6, called visterms or blob-tokens)
29
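A minimal sketch of the clustering step (random stand-ins for the per-segment descriptors and scikit-learn's k-means; the cluster count is an arbitrary assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in descriptors: one row per image segment (e.g. color/texture of a blob).
segment_features = rng.random((500, 16))

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(segment_features)
visterms = kmeans.labels_     # cluster id = visterm ("blob-token") of each segment
print(np.bincount(visterms))  # how many segments fall into each visterm
```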
Segmentation
 Segment images into parts (tiles or regions).
 Tiling: break the image down into simple geometric shapes.
 Regioning: break the image down into visually coherent areas.
(Figure: (a) 5 tiles, (b) 9 tiles, (c) 5 regions, (d) 9 regions)
30
Image Indexing and Ranking
 It is important to determine the most similar items efficiently.
 The problem is usually solved by using some kind of index structure for the content descriptors (feature vectors) of the images (1); a k-d tree sketch follows below.
 Thus:
 the similarity metric influences the effectiveness of the retrieval
 the index structure affects the efficiency of the retrieval
 Efficiency can also be improved using algorithmic optimization during query execution (2).
1. Witten, Moffat & Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann, 1999
2. Speeding Up IDM without Degradation of Retrieval Quality, CLEF 2007
31
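To make point (1) concrete, here is a sketch of one possible index structure, a k-d tree over the feature vectors (using SciPy; this is an illustration, not the structure used by the cited works, and high-dimensional descriptors often call for approximate indexes instead):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
features = rng.random((10_000, 32))   # one content descriptor per indexed image

index = cKDTree(features)             # built once, offline
query = rng.random(32)                # descriptor of the query image
dists, ids = index.query(query, k=5)  # the 5 nearest neighbors in feature space
print(ids, dists)
```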
Examples
Hermitage Museum (domain-oriented)
 Hermitage (http://www.hermitagemuseum.org)
 The QBIC Colour Search locates two-dimensional artwork in the Digital Collection that matches the colours specified.
 The QBIC Layout Search: using geometric shapes, the user can approximate the visual organisation of the work of art for which she is searching.
33
Google image searching (general purpose)
 “Image-based” functionalities:
 drag and drop an image
 input the URL of an image
 use pre-defined images on the web
 “Text-based” functionalities:
 automatic “best guess” for a text description of the input image, when possible
 add additional text description to refine the search
 sort by relevance, “sort by subject” (new)
 Google uses computer vision techniques to match your image to other images in the Google Images index and additional image collections: color, shapes, spatial distribution…
(as of June 2011)
34
Google (Cont.)
 The search results page can show results for a text description as well as related images.
 Works for the “web” and not for a specific application…
 Still at an initial stage
 Works well with standard images: famous people, places, and so on…
 Some results are not OK
 No facial recognition, due to privacy issues
 (but Picasa uses facial recognition algorithms, as do Facebook and others…)
35
Content-Based Video Retrieval
Basics
Motivation
 The amount of digital video data has grown enormously in recent years.
 There is a lack of tools to classify and retrieve video content.
 There is a gap between low-level features and high-level semantic content.
 Letting machines understand video is important and challenging.
37
Video retrieval methods
 Video consists of:
 Text
 Audio
 Images
 + All change over time
 Searching and Retrieval methods can
be based on :
 Metadata
 Text
 Audio
 Content
 + a combination of the above …
(Diagram: video searching can be based on metadata, text, audio, and content)
38
Metadata, Text & Audio-based Methods
 Metadata-based
 Video is indexed and retrieved based on structured metadata
information by using a traditional DBMS
 Metadata examples are the title, author, producer, director,
date, and type of video.
 Text-based
 Video is indexed and retrieved based on associated subtitles
(text) using traditional IR techniques for text documents.
 Transcripts and subtitles already exist in many types of video, such as news and movies, eliminating the need for manual annotation.
 Audio-based
 Video indexed and retrieved based on associated soundtracks
using the methods for audio indexing and retrieval.
 Speech recognition is applied if necessary.
39
Content-Based Video Retrieval (CBVR)
 There are two approaches for content-based video retrieval:
 treat video as a collection of images
 divide video sequences into groups of similar frames
 In both cases, they rely on temporal analysis.
(Diagram: a video decomposes into scenes, shots, and frames; shot boundary analysis detects the obvious cuts and key frame analysis picks representative frames; see the sketch below)
40
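A minimal sketch of shot-boundary analysis for obvious cuts (comparing normalized gray-level histograms of consecutive frames; the bin count and the 0.4 threshold are arbitrary assumptions):

```python
import numpy as np

def shot_boundaries(frames, threshold=0.4):
    """Flag a cut where consecutive frames' normalized gray histograms differ strongly."""
    cuts, prev = [], None
    for t, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=32, range=(0, 256))
        hist = hist / hist.sum()
        if prev is not None and np.abs(hist - prev).sum() / 2 > threshold:
            cuts.append(t)                  # frame t starts a new shot
        prev = hist
    return cuts

# Toy clip: 20 dark frames then 20 bright frames -> one obvious cut at frame 20.
frames = [np.full((48, 64), 30)] * 20 + [np.full((48, 64), 220)] * 20
print(shot_boundaries(frames))              # [20]
```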
Query by example for video
41
 Image query input:
 feature extraction according to the repository
 if the video is treated as a sequence of images, search for “similar images” according to the extracted features
 if the video is treated as groups of similar frames, search for “similar” among the representatives of each frame group
 rank and return the results
 Video query input:
 analyse and extract feature characteristics
 for each representative image proceed as before
An example (research paper)
 Extracts keyframes through the semantic content.
 Matching is done via low-level visual content using the concept of Color Coherence Vectors (CCV).
 Feature Extractor (DB creator): a real-time system that preprocesses all the videos in the database and stores the unique features of every video, i.e. the CCVs of all its keyframes.
 Video Search Engine via Image or Video Query.
Rao et al., “Real Time Retrieval of Similar Videos in Large Databases”, 2009
42
3D models searching & retrieval
Basics
Laura Papaleo | laura.papaleo@gmail.com
3D Model retrieval: Conceptual framework
44
Tangelder & Veltkamp, A survey of content-based 3D shape retrieval methods, 2008
(Framework diagram; offline: descriptors are extracted from the 3D models DB and an index structure is constructed over them. Online: the user formulates a query, e.g. by sketch or by example; its descriptor is extracted and matched against the index, and the IDs of the matching 3D models are fetched and the results visualized.)
3D models matching methods
 Three broad categories:
 feature-based methods
 graph-based methods
 other methods
 Note that these classes of methods are not completely disjoint.
45
Feature-based methods
 Work on geometric and topological properties of 3D shapes.
 Can be divided into four categories according to the type of shape features used:
 global features
 global feature distributions (e.g., shape distributions; a sketch follows below)
 spatial maps
 local features
46
(Figure: spectral distance example)
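As an example of a global feature distribution, here is a sketch of the D2 shape distribution: a histogram of distances between random pairs of surface points (the spherical toy point set, pair count, and bin count are illustrative assumptions):

```python
import numpy as np

def d2_shape_distribution(points, n_pairs=10_000, bins=32):
    """D2 descriptor: histogram of distances between random pairs of surface points."""
    rng = np.random.default_rng(0)
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0, d.max()), density=True)
    return hist          # two models are compared by a distance between histograms

# Toy "model": points sampled on a unit sphere (a real system samples the mesh surface).
rng = np.random.default_rng(1)
p = rng.normal(size=(2_000, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)
print(d2_shape_distribution(p)[:5])
```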
Graph-based methods
 Extract a geometric meaning from a 3D shape.
 Capture and maintain how shape components are linked together.
 They can be divided into 3 categories:
 model graphs
 Reeb graphs
 skeletons
 OPEN ISSUE: efficient computation of existing graph metrics for general graphs is not possible:
 computing the edit distance is NP-hard
 computing the maximal common subgraph is even NP-complete
47
Chao et al., A Graph-based Shape Matching Scheme for 3D Articulated Objects, Computer Animation and Virtual Worlds, 2011 (image: visimp.org)
Princeton Shape Repository
 http://shape.cs.princeton.edu/search.html
48
McGill 3D Shape Benchmark
49
 http://www.cim.mcgill.ca/~shape/benchMark/
 A repository for testing 3D shape retrieval algorithms.
 Emphasis on models with articulating parts.
Observations & OPEN ISSUES
50
 Good literature for images.
 Open research for video and 3D models.
 CBS is “usable” in domain-specific applications.
 Open research for general-purpose CBS (on the web).
 Open research for multimodal searching.
 Ranking and feedback: new frontiers with the advent of Web 2.0 and Web 3.0.
 Cooperative environments could support the creation of a global “well annotated digital world”.
 Accountability problems:
 trust
 history and provenance are important…
Observations & OPEN ISSUES
51
 Open research: adaptive visualization of the results according to the user's needs.
 Images and abstracts could be useful in specific conditions.
 3D model online browsing could be important in other conditions.
 Video preview? Or?
 The same holds for the querying interface… HCI issues…
 Web searching performance: open research on on-the-fly indexing of videos and 3D models.
 Open issue: relevant portions of retrieved digital shapes should be usable as a new query simply by selecting a portion (and then “find similar items”).
 Interactive selection of portions of images, videos, and 3D models.

  • 51. Observations & OPEN ISSUES 51  Open research: Adaptive visualization of the results according to the user’ needs  Image and abstract could be useful in specific conditions  3D model online browsing could be important in other conditions  Video preview? Or?  The same for the querying interface… HCI issues…  Web searching performances: open research in on-the- fly indexing of videos and 3D models  Open issue: relevant portions of result digital shapes should be usable as new query simply by selecting a portion (and then “find similar items”)  Interactive selection of portions of images, video and 3D models