Assessing Subject Metadata for Images

Hannah Marie Marshall, hmm88@cornell.edu
Metadata Librarian for Image Collections
Cornell University Library
ARLIS/NA+VRA 2016
March 11, 2016
Seattle, Washington

Assessment Goals
• Determine retrieval rates
• Determine the search utility
• Primary Terms
• “What is the image of?”
• Secondary Terms
• “What is the image about?”
• Tertiary Terms
• “How does the image communicate to the viewer?”

Challenges of subject analysis for images
• "Image indexing is a complex socio-cognitive process that
involves processing sensory input through classifying,
abstracting, and mapping sensory data into concepts and
entities often expressed through socially-defined and
culturally-justified linguistic labels and identifiers"
(Heidorn, 1999)
• "Concept-based indexing has the advantage of providing
higher-level analysis of the image content but is expensive to
implement and suffers from a lack of inter-indexer
consistency due to the subjective nature of image
interpretation" (Chen, Rasmussen, 1999)

Findings – types of terms
Search Utility
• Primary Terms
• Secondary Terms
• Tertiary Terms
• Non-subject Terms
• Descriptive terms that don’t address the subject matter of the work (e.g. worktype, materials/techniques, style/period)
Types of terms
                      Existing data   Users
Primary Terms         64%             34%
Secondary Terms       12%             13%
Tertiary Terms        19%             16%
Non-subject Terms     5%              37%

Search Utility
• Higher levels of correspondence
for images of two-dimensional
works
• Higher retrieval rates
• Higher search utility
• Users were 2.5 times more likely
to use non-subject terms to
describe and search for images
of three-dimensional works (and
non-representational/abstract
works)
• Pottery, jewelry, sculpture
2D works vs. 3D works
                      2D works                  3D works
                      Existing data   Users     Existing data   Users
Primary Terms         71.7%           45.3%     47.2%           26.4%
Secondary Terms       15.3%           16%       5%              8.2%
Tertiary Terms        13%             19%       32%             16.8%
Non-subject Terms     0%              19.7%     15.8%           48.6%
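
The 2.5 figure above can be checked against the chart values. The sketch below assumes the chart's numbers read, for each term type in turn, as 2D existing data, 2D users, 3D existing data, 3D users. That column assignment is an inference from the chart layout, not something the slide states, although each column then sums to 100%. The variable names are illustrative.

```python
# Shares of term types (percent), as read from the 2D vs. 3D chart.
# Assumption: the column assignment below is inferred from the chart layout.
shares = {
    "2D existing data": {"primary": 71.7, "secondary": 15.3, "tertiary": 13.0, "non_subject": 0.0},
    "2D users":         {"primary": 45.3, "secondary": 16.0, "tertiary": 19.0, "non_subject": 19.7},
    "3D existing data": {"primary": 47.2, "secondary": 5.0,  "tertiary": 32.0, "non_subject": 15.8},
    "3D users":         {"primary": 26.4, "secondary": 8.2,  "tertiary": 16.8, "non_subject": 48.6},
}

# Sanity check: each column should account for all terms in that group.
column_totals = {group: round(sum(values.values()), 1) for group, values in shares.items()}

# How much more often users reached for non-subject terms on 3D works than on 2D works.
ratio = shares["3D users"]["non_subject"] / shares["2D users"]["non_subject"]

print(column_totals)
print(f"3D vs. 2D non-subject usage by users: {ratio:.2f}x")
```

Under this reading, the ratio of 48.6% to 19.7% comes out just under 2.5, which agrees with the figure quoted on the slide.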

Search Utility
[Chart: Most common types of non-subject access points (Worktype, Style/Period, Materials/Techniques, Culture), on a scale of 0% to 60%]

Findings – literal terms
Retrieval Rates
• Literal matches = successful
image retrieval
• Non-matches = unsuccessful
image retrieval
• Successful retrieval = 8.5%
• Unsuccessful retrieval =
91.5%
[Chart: Correspondence between existing metadata and users’ search terms (non-matches vs. literal matches)]
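
A retrieval rate of this kind could be computed roughly as follows. This is a minimal sketch, not the study's actual procedure: the slides do not say how terms were compared, so the normalization step (lowercasing, collapsing whitespace) is an assumption, and the function names and sample terms are hypothetical.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(term.lower().split())


def literal_match_rate(user_terms, metadata_terms):
    """Return the share of user terms that literally appear in the existing metadata."""
    if not user_terms:
        return 0.0
    existing = {normalize(t) for t in metadata_terms}
    matches = sum(1 for t in user_terms if normalize(t) in existing)
    return matches / len(user_terms)


# Hypothetical example data, not taken from the study.
metadata_terms = ["Portraits", "Women", "Hats"]
user_terms = ["woman", "hats", "oil painting", "sadness"]

rate = literal_match_rate(user_terms, metadata_terms)
print(f"Successful retrieval: {rate:.1%}")
```

Note that "woman" does not literally match "Women" here. A strict literal comparison like this one counts near-misses as failures, which is one reason literal match rates can be low.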

Findings – literal terms
Retrieval Rates
• Of that 8.5%...
• Primary Terms (75%)
• Secondary Terms (3%)
• Tertiary Terms (16%)
• Non-subject Terms (6%)
• Other descriptive metadata that does not address subject meaning (e.g. materials and techniques)
[Chart: Corresponding literal terms broken down by type]
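
Because these percentages are shares of the 8.5% of terms that matched, it can help to restate them as shares of all user search terms. The short sketch below does that arithmetic using only the figures on the slides. The variable names are illustrative.

```python
# Share of all user search terms that literally matched existing metadata.
match_rate = 0.085

# Breakdown of those matches by term type, from the slide.
breakdown = {"primary": 0.75, "secondary": 0.03, "tertiary": 0.16, "non_subject": 0.06}

# Share of ALL user terms that matched, by term type (in percent).
overall = {kind: match_rate * share * 100 for kind, share in breakdown.items()}

for kind, pct in overall.items():
    print(f"{kind}: {pct:.2f}% of all user terms")
```

On this reading, matching primary terms account for roughly 6.4% of all user search terms, and every other type for well under 2%.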

Conclusions
• Primary terms yield the greatest search utility and the highest levels of successful image retrieval.
• High numbers of non-subject terms applied to images of three-dimensional and non-representational works suggest that subject metadata is a weak access point for them.
