Building Enterprise Taxonomies 2nd Edition Darin L Stewart
Building
Enterprise
Taxonomies
A Controlled Vocabulary Primer
SECOND EDITION
Darin L. Stewart, Ph.D.
Mokita Press
To Laura,
who taught me the importance of details.
Copyright © 2011 by Darin L. Stewart
Published by Mokita Press.
All rights reserved. Printed in the United States of
America. No part of this book may be reproduced in
any manner whatsoever without written permission
except in the case of brief quotations embodied in
critical articles and reviews. For information contact
clearance@mokitapress.com
SECOND EDITION
Contents
1. Findability
Infoglut
The Problem with Search
Teleporting and Orienteering
2. Metadata
Types of Metadata
Descriptive Metadata
Administrative Metadata
Structural Metadata
Metadata Schemas
Where Do I Put It?
Where Does It Come From?
Metadata and Authority Control
3. Taxonomy
Linnaean Taxonomy
Controlled Vocabularies
Faceted Classification
4. Preparations
The Taxonomy Development Cycle
Research
Performing a Content Audit
Creating a Governance Document
5. Terms
Internal Term Sources
Intranets and Websites
External Term Sources
Existing Taxonomies
Refining Terms
Basic Hygiene
Compound and Precoordinated Terms
Disambiguation
6. Structure
Card Sorting
Categories and Facets
7. Interoperability
Basic XML Concepts
Representing Hierarchy
Fear of Baggage Handling
XSLT
Zthes
8. Ontology
What Is An Ontology?
Class Hierarchy, Slots and Facets
Resource Description Framework
RDF/XML
RDF Schema
Web Ontology Language (OWL)
9. Folksonomy
Tagging
Folksonomy
Tag Clouds
Pace Layering
Glossary
Notes
Index
1
Findability
"But the plans were on display..."
"On display? I had to go down to the cellar to find them."
"That's the display department."
"With a flashlight."
"Ah, well, the lights had probably gone."
"So had the stairs."
"But look, you found the notice didn't you?"
"Yes," said Arthur, "yes I did. It was on display in the
bottom of a locked filing cabinet stuck in a disused lavatory
with a sign on the door saying 'Beware of the Leopard.'"
From The Hitchhiker's Guide to the Galaxy
Finding good information is hard, much harder than it should be.
Your first encounter with a new website often feels like entering a
strange land with its own language, laws, customs and culture. You
have business to conduct there, but must do so without the benefit
of an interpreter or guide. As you begin to explore the homepage,
you must quickly orient yourself to its unique approach to navigation,
interpret bizarre labels and menus, guess at search terms and wade
through propaganda in search of useful information. And these are
just the public pages.
Things get much more dangerous if you venture out of the tourist
areas and onto an intranet or, heaven help you, a file system. Once
you enter the realm of the enterprise information system, all bets are
off. The seemingly unified front of the corporate website dissolves
into a collection of fiefdoms, each with its own local dialect and
jealously guarded borders passable only with the right permissions
and passwords. There also seems to be a civil war underway.
Despite our best efforts, most websites, portals, intranets and file
systems are hostile environments for information seekers. We hire
consultants, hold focus groups and conduct usability studies to
understand our users' needs. We build site maps, add search boxes,
and tag our content, and users still get lost. According to surveys
conducted by Gartner, IDC and others, knowledge workers spend
from thirty to as much as forty percent of their work day searching
for information and yet find what they need less than half the
time.1 This means we spend more time looking for documents than
actually reading them. This situation is not just embarrassing, it's
expensive.
A third of a senior knowledge worker's time, the time they spend
chasing information, works out to be roughly $26,000 a year in salary
and benefits on average. When those searches are successful, this is a
legitimate cost of doing business. When they fail, that fruitless search
time is a drain on resources. Yet as expensive as this may seem,
search time is a minor component of the cost of hidden information.
Even the tens of thousands of dollars spent on redesigning and
maintaining an improved website are trivial if it gets users to the
content they need. The true cost comes when users throw up their
hands and abandon their search. Studies have suggested that this
happens after about twelve minutes at the outside.
This phenomenon is not restricted to complex searches and obscure
facts. Information as mundane as the contact information for the
director of human resources cannot be located by employees on their
own Intranet fifty-seven percent of the time. Those intrepid few who
can find the information usually must troll through multiple Web
pages and documents looking for an org chart (which is probably out
of date) that might have the director's name. They must then look up
the director in an employee directory located elsewhere on the
Intranet, hoping they spelled the name right. One study found this to
be the case in five out of six corporate Intranets.2
When people can't find what they need, they don't just give up. They
go elsewhere. When a consumer doesn't find the right product, they
go to a competitor, which in aggregate costs your company half of its
potential sales. If they are already a customer, they pick up the phone.
This costs you an average of seventeen dollars for each call that your
self-service website was supposed to eliminate.3 When an employee
can't find what they need, they go to a co-worker, doubling costs
while halving productivity and often yielding no better results. In a
2002 research note, Regina Casonato and Kathy Harris of Gartner
estimated that an employee will get fifty to seventy-five percent of
the information they need directly from other people, effectively
erasing the benefits of a corporate Intranet.4
When a knowledge worker reaches this dead-end, they have little
choice but to set about creating the information they need from
scratch. This may be as simple as running a report and stitching a few
documents together, but more often it involves considerable
research, an additional information chase and consultation with
multiple colleagues. Unfortunately, all this effort is not being
expended to create information, but to recreate it. As much as ninety
percent of the time spent creating information for a specific need is
actually recreating information that already exists but could not be
located.5 According to Kit Sims Taylor, this is because it is simply too
hard to find what you need.
At present it is easier to write that contract clause,
exam question, insurance policy clause, etc., ourselves
than to find something close enough to what we want
from elsewhere.... While most of us do not like to
admit that much of our creative work involves
reinventing the wheel, an honest assessment of our
work would indicate that we do far more 'recreating'
than creating.6
Taylor has found that in addition to the amount of time spent
looking for information, an additional thirty percent is spent
reinventing the wheel. When you account for communication and
collaboration overhead, only ten percent of our time, effort and
energies is actually spent in the creation of new knowledge and
information. In a separate study, IDC found that this "knowledge
work deficit" costs Fortune 500 companies over twelve billion dollars
annually.7
These are just the purely quantifiable costs. Consider the impact poor
findability has on decision making when there simply isn't time to re-
research and recreate the needed intelligence. Critical decisions may
be delayed because the information we can find, if any, is either
incomplete or conflicting. Worse, bad decisions may be enacted when
they wouldn't have even been considered had a fuller, more accurate
picture been available. In this age of compliance, the ability to locate
and produce information on demand can mean the difference
between passing an audit and dissolving the company.
Infoglut
So how did we get into this mess? We have spent literally trillions of
dollars on information technology, and yet our access to information
seems to get worse in direct proportion to the amount of money and
effort expended to improve it. Some pundits point to the sheer
volume of information with which we are inundated and resign
themselves to this inevitable consequence of life in the information
age. As Britton Hadden of TIME magazine put it:
Everyday living is too fast, too busy, too complicated.
More than at any other time in history, it's important to
have good information on just about every aspect of
life. And there is more information available than ever
before. Too much in fact. There is simply no time for
people to gather and absorb the information they need.
Hadden made this observation in 1929, shortly before founding the
magazine. Infoglut is not a new problem, but until recently it was at
least somewhat manageable. Today we are discovering that the only
thing worse and more dangerous than trying to run an organization
with too little information is trying to manage one with too much.
Everyone understands intuitively that infoglut is a problem, but few
have a clear sense of how much of a problem it really is. Experts
have long proclaimed the dangers of information overload. While
hyperbole is the lifeblood of consultants, in this case they seem to be
right on the money.
Each year the world produces roughly five exabytes (10^18 bytes) of
new information. To put that in more familiar terms, if the seventeen
million books in the Library of Congress were fully digitized, five
exabytes would be the equivalent of 37,000 new libraries each year.
While this is staggering in and of itself, consider that in 1999 it is
estimated that only two exabytes of new information was created,
meaning that the rate of information growth is accelerating by 30% a
year. 92% of that information is stored on digital media and 40% is
generated by the United States alone. We create 1,397 terabytes of
office documents each year. Each day we send thirty-one billion
emails.8 It is no wonder that we are, as John Naisbitt famously put it,
"drowning in information, but starved for knowledge."
The deluge has not caught us by surprise. On the contrary, we have
attacked it with a vengeance, pouring billions into data warehouses,
CRM, ERP, business intelligence and other data management and
reporting systems. These efforts and investments have bought us
great insight into our structured content: that highly organized
information structured according to a well defined schema or
framework. These are the records found in relational databases and
that slot so nicely into spreadsheets and reports. The information
contained in these records can easily be located, manipulated and
retrieved by means of standard query languages such as SQL.
Unfortunately, this type of domesticated data makes up only fifteen
percent of the total information with which we must cope.9 The
remaining eighty-five percent is made up of Web pages, emails,
memos, PowerPoint presentations, invoices, product literature,
procedure manuals, take-out menus and anything else that doesn't fit
neatly into a row in a database. The common factor among all of
these different forms of unstructured content is that they are all designed
for human consumption rather than machine processing. As a result,
all of the tried and true methods of data management we have
worked so hard to master fail miserably when asked to bring a
company picnic announcement to heel. So while quarterly sales
forecasts across four continents may be readily available, knowing
whether you are supposed to bring a salad or a dessert may be out of
reach.
The Problem with Search
This aspect of the information onslaught has in fact taken us by
surprise. Many of us are still in denial. After all, with fully indexed,
electronic information sources, full-text searching should allow us to
specify all the terms and subjects in which we are interested and have
the information retrieved and delivered to our desktop. As any user
of Google, A9, or countless other search and retrieval engines has
learned through painful experience, things rarely work out that neatly.
Rather than returning a nice, neat set of targeted documents, search
engines generally present us with long lists of Web pages that merely
contain the words on which we searched. Whether or not those
words are used in the manner and context we intended (did you
mean Mercury the planet, the car, the Roman god or the element?)
isn't part of the equation. We are left to sort through page after page
of links looking for something that might be relevant.
Part of this problem is self-inflicted. People just don't write good
queries. One third of the time, search engine users specify only a
single word as their query and on average use only two or three.10
This is what leads to so many irrelevant documents being returned.
We don't give enough context to our subject to eliminate documents
that are not of interest. If you query just on the term "Washington"
you will receive links to information on the state, the president, the
capital, a type of apple, a movie star, a university and so forth. In all,
Google returns 1,180,000,000 "hits." If you add the term "Denzel"
the number of links drops to 3,520,000, and we are reasonably
focused on the actor. If we add the phrase "Academy Award" we
finally get to 107,000 documents reasonably focused on the actor's
accolades. So the more specific and verbose we are with our queries
the more relevant the results.
But what happens if you use the Academy Award's common
nickname "Oscar" in your query? The number of hits jumps to
593,000. This is the risk of getting too specific with search terms. By
using the proper name of the award rather than its popular name, we
may have missed 486,000 potentially relevant documents. Guessing
the wrong search term can have a dramatic impact on what you do
and don't find.
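
The narrowing effect described above can be sketched with a toy full-text index. This is only an illustration under stated assumptions: the four documents, the `build_index` and `search` helpers, and the all-terms-must-match rule are invented for the example and are not how any real search engine is implemented.

```python
from collections import defaultdict

# Hypothetical four-document collection; the texts are invented for illustration.
DOCS = {
    1: "washington state apple harvest",
    2: "denzel washington wins academy award",
    3: "denzel washington oscar speech",
    4: "george washington first president",
}

def build_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(DOCS)
print(sorted(search(INDEX, "washington")))                       # [1, 2, 3, 4]
print(sorted(search(INDEX, "denzel washington")))                # [2, 3]
print(sorted(search(INDEX, "denzel washington academy award")))  # [2]
print(sorted(search(INDEX, "denzel washington oscar")))          # [3]
```

Each added term shrinks the result set, and the last two queries return different documents even though a human reader would treat "Academy Award" and "Oscar" as the same subject.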
Information scientists have long been aware that there are tradeoffs
between depth and coverage whenever a search is conducted. The
broader the search is, the more documents are retrieved,
including those that are not relevant to the actual information need.
Conversely, the deeper or narrower the search, the more likely
retrieved documents are to be relevant. The cost, of course, is that it
is also more likely that documents of interest will be missed in the
search. The difficulty arises from the fine balance of precision and
recall.
Precision is usually described as a ratio: the number of relevant
documents retrieved divided by the total number of documents
retrieved. In other words, what percentage of the total number of
documents retrieved are actually related to the topic being
investigated? For example, a Google search on the terms "precision"
and "recall" returns approximately 970,000 documents. The first few
documents in the list do indeed prove to be related to measures of
search performance. However, a few links into the list a news item
appears: "Vermont Precision Woodworks Announce Recall of
Cribs." From the search engine's perspective, this is a perfectly valid
document. It contains both of the search terms in its title. In fact,
one search term appears in the title of the website itself,
www.recall-warnings.com, thus causing it to receive a high relevancy
ranking. Out of 970,000 documents, it is safe to assume that many, if not
most, of the retrieved documents will have this level of relevancy to
our query. This indicates low precision, but high recall.
Recall is also a ratio and is defined as the number of relevant
documents retrieved divided by the total number of relevant
documents in the collection being searched. The example above
probably has a high recall due to the large number of documents returned.
$$\text{Precision} = \frac{\text{Relevant Documents Retrieved}}{\text{Total Documents Retrieved}}$$

$$\text{Recall} = \frac{\text{Relevant Documents Retrieved}}{\text{Total Relevant Documents in Collection}}$$
These two measures are inversely related: as recall increases, precision
decreases. A balance must be found between the two, retrieving
enough documents to get an individual the information they need
without returning so many that wading through irrelevant
information becomes burdensome. This balance is the heart of
information retrieval, but it is difficult to measure precision and recall
precisely. This is because we rarely know what is contained in the
collection we are searching, in this case the Internet itself, and also
because the notion of relevance is very subjective. At best we can
estimate recall and precision based on feedback from users of the
search engine in question and make adjustments as appropriate.
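
As a minimal sketch, the two ratios can be computed directly once we pretend to know which documents are relevant. The document ids and the "narrow" and "broad" result sets below are made up for illustration; in practice, as noted above, the full set of relevant documents is rarely known.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved divided by total documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Relevant documents retrieved divided by total relevant documents."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

# Hypothetical collection: documents 1-4 are the only relevant ones.
relevant = {1, 2, 3, 4}
narrow = {1, 2}                    # a specific query: few results, all on topic
broad = {1, 2, 3, 5, 6, 7, 8, 9}   # a loose query: more hits, more noise

print(precision(narrow, relevant), recall(narrow, relevant))  # 1.0 0.5
print(precision(broad, relevant), recall(broad, relevant))    # 0.375 0.75
```

In this toy case the broader query raises recall from 0.5 to 0.75 while precision falls from 1.0 to 0.375, which is the tradeoff the text describes.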
Taking our Google search on "precision" and "recall" as a test case, it
may seem that the problem isn't so bad. After all, the first several
documents in the list were on the exact topic we were seeking: search
performance measures. We can just disregard the other 3.5 million
documents offered. We got what we needed from the top ten or
twenty.
This ability to rank pertinent documents near the top of a result set is
what has made Google the clear winner of the search engine wars.
Their PageRank algorithm is a key ingredient in the Google secret
sauce.11 Rather than just counting how many times a certain word
occurs in a document or where it occurs, Google also looks at who
links to that document. If a lot of pages reference a particular
website, chances are that it is a pretty important source of
information on the topic at hand. If the pages linking in are
themselves important, then that likelihood increases and the
document's relevancy rank improves accordingly. This variation on
"citation analysis," which is traditionally used to determine the
importance of scholarly publications, has radically changed Internet
search for the better. Google even offers a free tool that I can add to
my website to search my own content with just a few lines of code.
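
The link-based idea can be sketched as a simple iterative calculation in which each page repeatedly passes a share of its score to the pages it links to. This is a textbook-style simplification, not Google's production algorithm: the `rank_pages` function, the four-page link graph, the damping value of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Estimate page importance from who links to whom.

    links maps each page name to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks

# Hypothetical link graph: three pages point at "hub", two point at "news".
LINKS = {
    "news": ["hub"],
    "blog": ["hub", "news"],
    "hub": ["news"],
    "orphan": ["hub"],
}
RANKS = rank_pages(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

In this toy graph "hub" ends up with the highest score because it has the most inbound links, and "news" comes second largely because the important "hub" page links to it. Pages that nobody links to, such as "orphan", stay near the baseline.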
So, problem solved? Not quite. There are several caveats to applying
a Google-like tool to your findability challenges. First, Google free
site search is really only searching a subset of the entire Google index,
that part representing just your website. As a result, only those Web
pages that are open and available to the public will be included in a
search. Anything on the Intranet is invisible to the Google spiders,
the programs that find and index Web pages and build up the search
index. Even those pages and documents that are open to the Internet
at large may be missed. Indexing programs only go so deep when
looking over a website. If your content is more than a link or two
away from the main page, it will probably be missed. Any new
content you add will likewise be invisible until the next time an
indexing spider happens by, a process completely outside of your
control. As Google explains:
There are a number of reasons a page might not appear
in the results of your Google free site search. It could
be that Google hasn't crawled that particular page yet.
Google refreshes its index frequently, but some pages
are inevitably missed. Or, the page may have
Javascript, frames, or store information in a database.
Pages like these are difficult or impossible for the
Google crawler to visit and index.12
Finally, Google's greatest strength, the PageRank algorithm, is also its
greatest weakness when applied to a single website. It is unlikely that
CNN.com or eBay will reference your org chart. In fact, very few
websites outside of your organization will link to your internal
documents. Yet the rankings applied to your documents are
determined in the context of rankings of the Internet as a whole. This
effectively renders the relevancy judgments made on your content
meaningless when the search is restricted to your own site.13
Aside from the arcane nature of indexing, the very act of searching
can be a struggle in most organizations. Documents and content are
spread out across multiple locations and repositories. Policies may be
on the Intranet, quarterly reports on the file system, resumes in a
departmental directory and price lists on the company homepage.
Finding information is no longer an exercise in finding a needle in a
haystack. First you must choose which haystacks to search, in what
order, and for how long. In most organizations, less than half of their
documents are centrally indexed.14 This means that it is impossible to
look for information in all potential locations with a single query or
even a single search tool. This dispersal of information across an
organization leads to another search challenge: choosing the correct
query terms.
<!-- SiteSearch Google -->
<FORM method=GET action="http://www.google.com/search">
<input type=hidden name=ie value=UTF-8>
<input type=hidden name=oe value=UTF-8>
<TABLE bgcolor="#FFFFFF"><tr><td>
<A HREF="http://www.google.com/">
<IMG SRC="http://www.google.com/logos/Logo_40wht.gif"
border="0" ALT="Google"></A>
</td><td>
<INPUT TYPE=text name=q size=31 maxlength=255 value="">
<INPUT type=submit name=btnG VALUE="Google Search">
<font size=-1>
<input type=hidden name=domains value="YOUR DOMAIN NAME"><br>
<input type=radio name=sitesearch value=""> WWW <input type=radio
name=sitesearch value="YOUR DOMAIN NAME" checked> YOUR DOMAIN NAME
<br></font></td></tr></TABLE>
</FORM>
<!-- SiteSearch Google -->
Figure 2. Just cut, paste and you've got search. Not quite.
Most search engines create their indexes by extracting terms from the
full text of documents. As a result, content creators and authors
become de facto indexers and catalogers. The words they choose in
authoring their documents become the search terms available to their
readers. This becomes a problem if they don't speak the same
language. This goes back to the Mercury (planet, car, god) and actor
(Academy Award or Oscar) problem.
Figure 3. The ideal relationship
between author and searcher.
Unless there is a company standard for terminology, and these are
rare, each area of an enterprise is going to have its own language. A
customer in one area may be a client in another and a patron somewhere
else. This lack of consistency in search and indexing terms has
proven to be the single greatest challenge to the effectiveness of
search and findability in general.15
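
One way to picture a company standard for terminology is a lookup that maps every variant term to a single preferred term before documents are indexed or queries are run. The sketch below is a hypothetical illustration: the `PREFERRED_TERM` mapping, the document names and the helper functions are invented for the example.

```python
# Hypothetical term standard: each variant maps to one preferred term.
PREFERRED_TERM = {
    "customer": "customer",
    "client": "customer",
    "patron": "customer",
    "academy award": "academy award",
    "oscar": "academy award",
}

def normalize(term):
    """Return the preferred term for a variant, or the cleaned term if unknown."""
    key = term.strip().lower()
    return PREFERRED_TERM.get(key, key)

def tag_documents(docs):
    """Index each document under the preferred form of its author's term."""
    index = {}
    for doc_id, author_term in docs.items():
        index.setdefault(normalize(author_term), set()).add(doc_id)
    return index

def lookup(index, term):
    """Find documents for a searcher's term, whichever variant they typed."""
    return index.get(normalize(term), set())

# Three departments describe the same concept with three different words.
DOCS = {"sales-memo": "Client", "library-policy": "patron", "support-faq": "customer"}
INDEX = tag_documents(DOCS)
print(sorted(lookup(INDEX, "Patron")))
```

Because both the author's word and the searcher's word pass through the same mapping, a search on any one of the three variants finds all three documents.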
Ultimately, any search consists of, at minimum, four hurdles that
must be cleared. First, the information seeker must be able to
articulate what they are looking for with the right syntax for the
specific search tool being used. Next, they must guess what words an
author may have used to express the concept of interest. Then, with
the query in mind, they must figure out the most likely place to
search. Finally, they must sift through the results of their search,
separating the potentially relevant from the clearly irrelevant and
hope what they end up with is complete, representing all that is
available. Really, it's a wonder that we ever find anything at all.
Teleporting and Orienteering
A keyword search is most often an attempt (usually several attempts,
actually) to go directly and instantaneously to the exact location of
desired information. If we search the Web on the terms "Aladdin
Theater Box-Office" we hope to land where we can purchase tickets
for concerts at this small venue in Portland, Oregon, without having
to sift through irrelevant information. The academic community has
labeled this sort of information seeking behavior teleporting.16
Teleporting is one strategy for finding information and can be
executed in various ways with a number of search tactics. In addition
to keyword search, an information seeker may attempt to teleport by
entering a specific URL, opening a certain email, or typing in a
directory path to a particular document. Perfect teleporting (hitting
your target on the first attempt) is a rare accomplishment; so rare in
fact that a game, "Google Whacking," has sprung up around the
challenge.17 Yet despite the difficulty in finding just the right
information with search alone, most websites and information portals
seem designed to encourage the attempt as evidenced by the
ubiquitous search box.
A more realistic scenario is to teleport into the general vicinity of the
information you are seeking, using search or some other tactic, and
then zero in on your target with a succession of small steps. To buy
our concert tickets for a show at the Aladdin, for example, we might
teleport by typing in the URL for the theater:
www.aladdin-theater.com. We know we are close, but still can't buy our
tickets so we may follow the link to the "Upcoming Shows" page. Here we find
the performer we are looking for listed with a link to "show details"
so we click through to that page. Finally we see a banner for "Local
Ticket Outlet Information," which leads us to a link for the "Aladdin
Theater Online Ticketing Page" where we can order our tickets.
This strategy of locating information by continually narrowing our
search through incremental steps has been dubbed orienteering
(though most people simply call it browsing) and has proven to be
the preferred approach to finding information. Studies conducted at
the MIT Artificial Intelligence lab have found that information
seekers use keyword search less than forty percent of the time.
Surprisingly, this holds true even when searchers know exactly what
they are looking for and even where to find it (see table 1).18
               Specific      General       Specific
               Information   Information   Document   Total
Orienteering   47            19            41         120
Teleporting    34            23            17         80
Total          81            42            58         200
Table 1. Information need by search strategy (19 unknowns removed).
There are circumstances where keyword search yields nominally
better results than navigation. In one study, information seekers were
more successful at locating information on a well indexed medical
information site by using search rather than browsing. Interestingly,
those most successful at finding what they were looking for were
those individuals who turned to search only after browsing failed (see
figure 4). Even when individuals abandon browse on a given
information hunt and succeed with search, they invariably return to
orienteering on their next task.19
M.I.T. researchers have found several reasons why people prefer to
zero in on information rather than attempting to pounce on it in a
single great leap. First, it can be difficult to clearly articulate exactly
what it is you are seeking. This is the case even when trying to
retrieve familiar information and documents. Think of the last time
you were asked for directions to a familiar destination. Even though
you may be able to drive there without thinking, you may have a hard
time giving step by step instructions on how to get to that same
location. Browsing reduces the cognitive demand on information
seekers by allowing them to follow familiar paths to the general area
of the information they are seeking, quickly and easily reducing the
size of the area they must explore. This also allows searchers to draw
on a broad range of "meta-information" about the target of their
search.
[Figure: two panels. "Search versus browse success rates" plots Success Rate (0% to 100%) against Information Seeking Strategy (Browse, Search, Search after Browse failure). "First choice of strategy" plots Attempts (0% to 100%) against Information Seeking Strategy (Browse, Search).]
Figure 4. Information seeking behaviors.
For example, say you need to locate a company memo that was
circulated six months ago and has since disappeared into the bowels
of the company Intranet. Even though you have no idea where to
find the memo itself, you recall seeing it referred to in an email from
a colleague. You may not know exactly where to find that email
either, but you likely will recall who it was from and roughly when
you received it, along with some idea of the subject line and general
content. This will allow you to find the email that will in turn point
you toward the actual target of your search: the company memo.
Even though you can't teleport even into the general vicinity of the
memo, you can start from a known frame of reference (the email)
and follow clues along the way until you arrive at your goal.
The small steps of orienteering and the clues found along the way
also provide information seekers with a strong sense of location
throughout their search. The importance of the "you are here" factor
should not be underestimated. When users feel in control and that
they are heading in the right direction and are able to backtrack if
they take a wrong turn, they are less likely to abandon a search
prematurely. When people drop into the middle of an information
space as a result of a keyword search, they have no context and little
indication of how to proceed. This sense of disorientation can cause
both knowledge workers and potential customers to leave a website
as quickly as they arrive.
By contrast, navigating through an information space allows the user
to become acclimated to the environment at their own pace, much
like easing into a hot bath rather than plunging into scalding water.
This process of guided exploration also has the dual benefit of
building context for interpreting the target information once it is
found and allowing for serendipitous discoveries along the way. Most
importantly, information seekers are more likely to continue their
search if they are confident that they are on the right path and that
their efforts will pay off.
[Figure 5. Breadcrumb trails are often used to give users a sense of control
over their exploration of a new information space. Screenshot: the
Nielsen Norman Group site, with the breadcrumb trail
NN/g Home > Services > Training > Intranet usability.]
It's interesting to note that the word browse derives from an
antiquated French term brost meaning "young shoot" and referring to
the way that animals feed on the young shoots of trees and shrubs.
As animals seek nourishment, they must balance the nutrition to
be gained against the energy expended obtaining it. This behavior is
fundamentally the same for information seekers. Visitors to an
information space, whether it be a website, Intranet, database, file
system or what have you, are continually balancing cost and benefit:
"Will the information I find here be worth the time and effort it is
costing me to track it down?" As they browse a website, they will be
repeatedly assessing the likelihood of finding what they need in the
current environment and determining when it's time to move on to
more promising pastures.
This metaphor has become the basis of information foraging
theory, a model of information-seeking behavior developed by Peter
Pirolli and Stuart Card of the Xerox Palo Alto Research Center.20
According to this model, we search for information across the
Internet using essentially the same strategies hunter-gatherers use to
search for food across the savannah. The nature of the prey may be
new, but the fundamental approach hasn't changed for millennia.
Both animals and humans attempt to maximize their "benefit per unit
cost." When the benefit, in terms of likelihood of finding the
necessary food or information with an acceptable investment of time
and energy, falls below a certain threshold, the current website or
watering hole will be labeled sterile and the forager moves on to a
more fertile patch. Steps can be taken to reduce the likelihood of
users leaving our information patches prematurely. One of the most
effective strategies is to increase the strength of the information scent
present in our systems.
The notion of information scent is central to information foraging.
The basic idea is that just like a game animal, information leaves
behind spoor that can be detected and tracked.
Associated concepts "rub off" on one another, leaving
detectable traces, just as a watering hole frequented
by woolly mammoths will smell of woolly mammoths. A
hunter-gatherer seeking mammoths is likely to be
drawn to the watering hole, if only to look for spoor.
Information foragers do the same. Imagine you're
looking for texts about foraging theory. If [a search]
throws up a box containing the keyword "hunter-
gatherer", you're likely to select that box. It just smells
right.21
Consider our ticket purchasing example. When we first arrive at the
theater's homepage, we see labels such as "Artist of the Month" and
"Show Listings," which may even include the concert we are seeking.
Even though we don't see that we can purchase tickets here, the page
smells like concert tickets so we continue our search by clicking on
"Upcoming Shows." Here the scent gets stronger when we find the
right show along with a link to "Show Details," which finally gets us
to "Buy Tickets Online." Throughout the process of browsing, the
scent of concert tickets is strong and gets stronger the closer we get
to our goal. This continual positive feedback can keep information
seekers happy with the current information patch and prevent them
from jumping to a competitor or colleague to meet their needs.
Strong information scent can be a double-edged sword if mishandled.
The most common pitfall occurs when a strong scent points toward
what should be the right answer but isn't. Jakob Nielsen
demonstrated this phenomenon in a study of a health information
website for teens.22 Users were asked to find out how much they could
weigh without being considered overweight. Most users quickly
gravitated toward an area of the site labeled "Food & Fitness." This
clear, concise label had strong information scent for the question at
hand. Featured prominently within that area of the site was a lengthy
article entitled "What's the right weight for my height?" that was also
ranked highly by a search on the term "weight."
This would seem to be a bull's-eye except for the fact that the article
does not contain the answer to the question. Because the information
scent leading to this article was so strong, users were convinced they
were looking in the right place. When the information wasn't there,
they naturally concluded that because it wasn't where it should be, it
must not exist anywhere on the site and abandoned their search. This
is an unfortunate result since the answer was in fact available on the
site. It was buried in an article titled "Body Mass Index (BMI)." The
information scent of this title for answering the target question is
almost non-existent. First, the title is a bit academic and maybe even
intimidating for the website's teenage audience. Worse, the title gives
no indication of the article's content, which includes a straightforward
calculation of optimal weight using height, weight and age. In a
nutshell, the container of the information was mislabeled.
[Figure 6. A website with good information scent. Screenshot: a concert
listing for Pink Martini at the Oregon Zoo on Thursday, August 26,
with ticket price and event details.]
The problem of bad labels strikes at the heart of findability. If
information seekers cannot recognize the content they are searching
for even when they find it, it may as well not exist. Even when an
information producer gives careful consideration to labeling and
categorization, the result may have no meaning to information
consumers. A physician, wanting to be precise, may label a document
on treating a particular respiratory condition with the terms
laryngotracheobronchitis, inspiratory stridor and dexamethasone.
While this may be perfectly appropriate for other doctors, it is of
little use to a mother searching the Web for information on how to
alleviate the wheezing cough of her daughter with croup.
Most information systems today are organized much like libraries
before Melvil Dewey created his decimal system for classification.
Patrons were left to wander stacks of untitled or oddly titled books
piled on shelves according to some idiosyncratic organizational
scheme comprehended only by an arcane priesthood of local
librarians.
Overcoming this barrier to discovery is the role of controlled
vocabularies and taxonomies. By developing a structured collection
of terms and guidelines around how they are to be applied,
information can be managed in a manner that facilitates its discovery,
interpretation and use to the greatest extent possible.
Beyond just finding information, the hierarchical nature of a
taxonomy can help educate an information seeker by guiding them
through a subject. The mother searching for information about her
daughter's illness will not only discover that dexamethasone is a
steroidal treatment for the condition, but that humidified air may also
alleviate her discomfort. Continuing through the structure she will
discover additional treatments and potential complications. Finally,
she will learn that the proper name for "croup" is in fact
laryngotracheobronchitis, giving her a new term to search on and
expanding the potential information sources available to her.
The parent/child relationships inherent in the tree structure of a
taxonomy are powerful tools in guiding a seeker through what may
be an unfamiliar subject. Because the taxonomy explicitly shows how
terms and concepts are related, a searcher will discover associations
that they didn't know existed. Most importantly, they can define and
refine their information need as they explore rather than having to precisely
articulate it up front when they may not know exactly what it is they
are seeking.
Organizing information according to a well defined structure, such as
a taxonomy, also provides stability to an information environment.
Information changes continually. Delphi Group has estimated that at
least ten percent of enterprise information changes monthly in an
average organization.23 Without some means of governance, relevant
information becomes a moving target. Today a search on
"taxonomies" may yield 1,900,000 matches. Tomorrow or next week
that same query could return 1,985,000 hits with completely different
rankings. That article I found last week that was so useful but that I
didn't bookmark could now be anywhere.
A taxonomy can act as a dynamic bookmark. As new documents and
information become available, they can be classified, labeled and
published in accordance with the taxonomy without changing its
structure. When a knowledge worker needs to return to an area of
interest, he will still find it where he left it. The only difference will be
that there is now more information available there. In addition, the
new information will be in context with relationships and potential
avenues of exploration clearly visible.
Managing terms and keywords can also enhance search by bridging
the vocabulary gap between information producer and consumer. A
search engine integrated with a taxonomy would know that a search
on croup should also look for laryngotracheobronchitis and that in certain
contexts "Oscar" is another way of saying "Academy Award." It can
also compensate for common spelling errors and variants (e.g., theatre
or theater) and synonyms (fall or plunge or spill or tumble). These
expansions may seem trivial, but they can dramatically improve the
effectiveness and efficiency of search.
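To make the expansion concrete, here is a minimal sketch in Python of how a search front end might widen a query using synonym rings drawn from a taxonomy. The ring contents and the names SYNONYM_RINGS, build_lookup and expand_query are illustrative assumptions, not part of any particular search engine, and the sketch ignores context (it treats "Oscar" and "Academy Award" as always equivalent).

```python
# Hypothetical synonym rings: each preferred term maps to its variants.
SYNONYM_RINGS = {
    "croup": {"laryngotracheobronchitis"},
    "theater": {"theatre"},
    "academy award": {"oscar"},
}


def build_lookup(rings):
    """Map every term, preferred or variant, to its full set of equivalents."""
    lookup = {}
    for preferred, variants in rings.items():
        ring = {preferred} | variants
        for term in ring:
            lookup[term] = ring
    return lookup


def expand_query(query, lookup):
    """Return the set of terms to search for when the user types `query`."""
    term = query.strip().lower()
    # Terms the taxonomy does not know are searched as typed.
    return set(lookup.get(term, {term}))


LOOKUP = build_lookup(SYNONYM_RINGS)
print(sorted(expand_query("croup", LOOKUP)))
```

A search for either "croup" or "laryngotracheobronchitis" expands to the same pair of terms, which is how the vocabulary gap between producer and consumer gets bridged.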
A sample hierarchy of respiratory illnesses
CROUP
(USE FOR laryngotracheobronchitis)
    Symptoms
        fever
        wheezing
        (USE FOR inspiratory stridor)
        swollen lymph glands
        decreased appetite
    Treatment
        humidified air
        fever reducer
            acetaminophen
            ibuprofen
        steroid
            dexamethasone
            prelone
            orapred
            pulmicort
        breathing treatment
            racemic epinephrine
    Complication
        kidney inflammation
        (USE FOR glomerulonephritis)
        rheumatic fever
STREP THROAT
    Symptoms
        fever
        swollen lymph glands
        rash
    Treatment
        antibiotic
            amoxicillin
            erythromycin
    Complication
        rheumatic fever
RESPIRATORY SYNCYTIAL VIRUS
(USE RSV)
    Symptoms
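A hierarchy like the sample above can be held in a very simple data structure. The following Python sketch stores a fragment of it as nested dictionaries, keeps the USE FOR entries in a separate table, and reports the path from the top of the tree to any term. The names TAXONOMY, USE_FOR, find_paths and locate are hypothetical, chosen only for illustration.

```python
# A fragment of the sample hierarchy as nested dicts; leaves are empty dicts.
TAXONOMY = {
    "croup": {
        "symptoms": {"fever": {}, "wheezing": {}},
        "treatment": {
            "humidified air": {},
            "steroid": {"dexamethasone": {}},
        },
    },
    "strep throat": {
        "symptoms": {"fever": {}, "rash": {}},
        "treatment": {"antibiotic": {"amoxicillin": {}, "erythromycin": {}}},
    },
}

# USE FOR entries: non-preferred term -> preferred term.
USE_FOR = {
    "laryngotracheobronchitis": "croup",
    "inspiratory stridor": "wheezing",
}


def find_paths(tree, term, trail=()):
    """Yield every path from the root of `tree` to a node named `term`."""
    for name, children in tree.items():
        path = trail + (name,)
        if name == term:
            yield path
        yield from find_paths(children, term, path)


def locate(term):
    """Resolve a term to its preferred form, then list where it sits."""
    preferred = USE_FOR.get(term.lower(), term.lower())
    return list(find_paths(TAXONOMY, preferred))


print(locate("dexamethasone"))
```

The path returned for a term is the same context a browsing user would see: dexamethasone sits under steroid, under treatment, under croup.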
Controlled vocabularies, like taxonomy and its relatives, are not silver
bullets and will not magically cure all information management
problems, but they are a critical component of findability. If properly
constructed, applied and maintained, a taxonomy can radically
increase the value of information by making it more available,
understandable and actionable. The remainder of this book will
demonstrate how this can be achieved. Before we can delve into the
mysteries and wonders of taxonomies, however, we must take a brief
detour into the world of metadata.
2
Metadata
If we fail to anticipate the unforeseen or expect the
unexpected in a universe of infinite possibilities, we may
find ourselves at the mercy of anyone or anything that
cannot be programmed, categorized or easily
referenced.
Fox Mulder, "The X-Files"
Art collecting is a tricky business. The value of a painting, sculpture
or even a rare book can vary wildly depending on the circumstances
of a purchase. Two similar works by Monet may go on the auction
block together; one sells for thousands, the other for millions. The
only substantive difference between the two is the existence of
provenance information. A clear record of a painting's history, who
has owned it, when and where it has previously sold and for how
much is essential to determining whether or not it is a wise
investment. Without such information we have no context for our
decision. Is it overpriced or undervalued? Is it stolen? Is it a verified
Monet or just a suspected Monet? Even though it is the painting
itself that holds our interest, we need information about the painting
to qualify our interest. This same principle applies to less tangible
assets, namely information.
When we first locate new information we tend to be suspicious. Can
I trust these numbers? Is this the current version of the document? Is
this image copyright cleared? This is especially true if the source of
that information is not familiar to us. Before we trust a document or
a Web page, we need to know a little more about it. Some of these
questions may be answered by the search itself. When we look for
information, we usually try to specify parameters to limit the scope of
the search. Specifying the author of a document, the date of its
publication, whether it is a report, invoice, form or memo will not
only enhance our chances of locating what we are looking for but can
pre-qualify the content as it is found. This kind of reference
information is generally not indicated explicitly in the content itself,
but rather is supplementary to it. It is metadata.
The standard definition of metadata is usually given as "data about
data." This gets at the general idea, but is not quite adequate. The
term "meta" comes from the Greek root meaning something that follows
another and takes it into account. Thus, metadata is generally developed
from associated source data and as a function of the information it
describes. The Greek term also means among, alongside, or with, so it
follows that metadata can take several complementary forms in
relationship to its parent information. Finally, if the Latin derivation
is taken into account, meta can mean transcendent, so metadata should
be expected to add value above and beyond the content it describes.
To complicate matters, the distinction between data and metadata
can be fluid. What is metadata in one context may be pure data in
another. For example, if you are looking for an article on a certain
topic by a certain author, then the writer's name and the subject
keywords are metadata and the content of the article is data. By
contrast, say you are trying to remember the name of the author who
wrote a particular article in the 1940s and can't remember the title.
You do remember that it contained the phrase: "Man cannot hope
fully to duplicate this mental process artificially, but he certainly
ought to be able to learn from it." In this case the publication date
range, 1940-1949, and the content of the article itself are the
metadata and the author's name is the data.1
The Value of Metadata
In late 1988, a non-descript van pulled up in front of Christie's
East, the purchasing office of the renowned auction house in
New York City. Tied to its top with several lengths of rope was a
six by seven foot canvas. The driver had found it at a warehouse
sale of unclaimed property and purchased it on a whim for
$1,000. The painting was in bad shape and nothing was known
about it, but it was large and old and ought to be worth
something. He offered it to Christie's for $1,500. Ian Kennedy, a
resident expert of Old Masters for Christie's, examined the
painting and instantly recognized it as a work of the Italian Master
Dosso Dossi. With this new bit of information, the asking price
rose from $1,500 to $800,000. It was purchased by the London
art dealers Hazlitt, Gooden & Fox for $4 million, dirt, rips and
all. Two months later it was sold to the Getty Museum for an
even higher price.
"Allegory of Fortune," Dosso Dossi
The defining characteristic of metadata is that whatever form it takes,
it facilitates the identification and discovery of a discrete package of
information. The classic example of this is the library catalog card.
Independent of any actual content from the item being described, a
simple 3" x 5" card can provide a wealth of information that is useful
in locating and managing an information resource, in this case a
book. At a glance, we can determine the title, author, publisher,
length, topic and even location of the book. This quick access is by
design.
973.4
B21    McCullough, David C.
       John Adams / [by] David McCullough
       New York : Simon & Schuster, c2001
       751 p., (40) p. of plates : ill. (some col.),
       maps ; 25 cm.
       Includes bibliographical references (p. 703-726)
       and index.
       ISBN 0-7432-2313-6
       1. Adams, John, 1735-1826. 2. Presidents - United
       States - Biography. 3. United States - Politics
       and government - 1783-1809. I. Title.
       E322.M38 2001
       973.4'4'092 [B] 2001027010
Figure 1. Metadata in a traditional card catalog.
An often overlooked feature of the humble card catalog is that the
cards are organized to facilitate this at-a-glance utility. Each card has
a consistent location and format for each piece of information it
contains. When looking at an author card, we know the first line
indicates the author of the work and the second line is the book's
title. The structure of the card tells us that a book is a biography of
John Adams written by David McCullough rather than the other way
around. The same principle applies to electronic resources. To be
useful, metadata must be structured to facilitate both discovery and
interpretation.
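The electronic equivalent of the card's fixed layout is a record with named fields. As a rough sketch, assuming a Python data class whose field names (call_number, author, title and so on) are our own invention rather than any cataloging standard, the card in Figure 1 might be represented like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogCard:
    """One catalog record; every value lives in a named, predictable slot."""
    call_number: str
    author: str
    title: str
    publisher: str
    year: int
    pages: int
    subjects: tuple


card = CatalogCard(
    call_number="973.4 B21",
    author="McCullough, David",
    title="John Adams",
    publisher="Simon & Schuster",
    year=2001,
    pages=751,
    subjects=(
        "Adams, John, 1735-1826",
        "Presidents - United States - Biography",
    ),
)


def describe(card):
    """Build a one-line citation from the structured fields."""
    return f"{card.title}, by {card.author} ({card.publisher}, {card.year})"


print(describe(card))
```

Because author and title occupy separate named fields, software reading the record cannot mistake the biographer for the subject, which is the same job the card's layout does for a human reader.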
Most major newspapers now provide online editions with searchable
full-text archives. If we type in a few well chosen key words, we have
a chance of finding something of interest. The newspaper's search
engine will match our query terms against every word of every article
of every edition contained in the archive. This is searching the data,
the actual content of the newspapers. This type of search is subject to
all of the pitfalls of unconstrained search as discussed in the prior
chapter. If we instead search the metadata, we can dramatically
improve the effectiveness of our search.
[Figure 2. The advanced search page of the LA Times. The form offers
a search box, content options (1985 to present as text, 1881 to 1984
as historic article images), sort order (most recent first, oldest
first, relevance), date options, and fields for author, headline,
article type and section.]
If we would like to research the position of former president Jimmy
Carter on U.S. trade with China, a reasonable place to start is the
archives of the Los Angeles Times (www.latimes.com). As we would
expect, searching on the keywords "Carter," "Policy," and "China"
returns an assortment of documents ranging from an analysis of the
conflict between China and Taiwan to an obituary of Stanford
University professor Michael Oksenberg. Fortunately, the Times
archive provides an advanced search mechanism utilizing extensive
metadata. Rather than a blind search where all words are treated
equally, the Times enables users to restrict certain terms to certain
areas. We can specify that "Jimmy Carter" only be matched against
authors and that only articles of the type "opinion piece" with the
word "China" in the headline be retrieved. Even though we are no
longer looking at any of the archive's actual data or article text and
are instead searching only metadata, we receive a precise set of
documents with a strong likelihood of being relevant to our interest.
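The difference between the two kinds of search can be sketched in a few lines of Python. The three records below are made-up stand-ins, not real articles, and the functions full_text_search and metadata_search are hypothetical; the sketch says nothing about how the Times archive is actually implemented.

```python
# Made-up sample records; none of these are real articles.
ARTICLES = [
    {"id": 1, "author": "Jimmy Carter", "type": "opinion piece",
     "headline": "Trade With China",
     "text": "Carter argues for a trade policy ..."},
    {"id": 2, "author": "Staff Writer", "type": "news",
     "headline": "China and Taiwan Tensions",
     "text": "Carter-era policy toward China ..."},
    {"id": 3, "author": "Staff Writer", "type": "obituary",
     "headline": "Stanford Professor Dies",
     "text": "Advised Carter on China policy ..."},
]


def full_text_search(articles, keywords):
    """Blind search: each keyword may match anywhere in headline or text."""
    hits = []
    for article in articles:
        haystack = (article["headline"] + " " + article["text"]).lower()
        if all(keyword.lower() in haystack for keyword in keywords):
            hits.append(article["id"])
    return hits


def metadata_search(articles, author, article_type, headline_word):
    """Fielded search: each term is matched only against its own field."""
    return [
        article["id"]
        for article in articles
        if article["author"].lower() == author.lower()
        and article["type"] == article_type
        and headline_word.lower() in article["headline"].lower()
    ]


print(full_text_search(ARTICLES, ["Carter", "policy", "China"]))
print(metadata_search(ARTICLES, "Jimmy Carter", "opinion piece", "China"))
```

In this toy archive the keyword search returns all three records, while the fielded search returns only the opinion piece written by Carter.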
Types of Metadata
The advantages metadata affords to searching electronic versions of
traditional textual resources are straightforward. However, the digital
world isn't as simple a place as it once was, and newspapers,
magazine articles and the like are rapidly becoming a minority among
the milieu of online information. New types of information objects
and artifacts seem to emerge daily. In order to manage this deluge of
new forms of information, we must be able to describe them in ways
that are specific to each unique type and the tasks utilizing them. To
this end, several different forms of metadata (descriptive, technical,
and administrative) may be developed for any given information
object.
Descriptive Metadata
Descriptive metadata is by far the most common form of metadata
in use today and is usually what you will encounter as an information
seeker. This type of metadata comprises what is explicitly added to
content to make it easier to find. In a nutshell, descriptive metadata is
the who, what, when, and where of an information resource. While it
found its first broad application with textual resources such as the
LA Times archives, it is rapidly coming to permeate every aspect of
the online world.
Take for example, Apple Computer's popular iTunes online music
service. Since the content offered by iTunes is non-textual (e.g., the
strains of a Bach concerto or a John Coltrane solo), full-text search of
the content itself is ill-suited to retrieval. Rather, you search the
textual information associated with the audio or video file you are
trying to find. Most files have been extensively tagged with
descriptive metadata. This includes the basics, such as artist, album,
and song title as well as more advanced categories such as genre,
sub-genre, release date and publisher. Each piece of metadata associated
with a particular song increases the probability that it will be found,
either by searching or browsing, and subsequently sold.
[Figure 3. Metadata in iTunes. Screenshot: the iTunes page for David
Bowie's album "The Rise and Fall of Ziggy Stardust and the Spiders
from Mars," showing an iTunes review, a track listing with times,
artist, album and price, and customer reviews.]
The value of descriptive metadata doesn't rest solely in discovery and
retrieval. It also facilitates the second part of the e-commerce
equation: making the sale. Once a user browses through genres,
sub-genres, and artists to a particular album of interest they can check
reviews, ratings, song length, and even beats per minute. All of this is
descriptive metadata that will help the information seeker make a
value judgment of the content they are considering. The principle is
as valid for corporate earnings reports as it is for Mariah Carey
videos.
Administrative Metadata
If descriptive metadata is intended primarily for the information
seeker, administrative metadata is mainly for the benefit of the
information owner or steward. Metadata elements specifying from
where a file or document came, where it is to be hosted, who is
authorized to modify it, when it is to be archived, in what form and
for how long are all administrative metadata. It is created for the
purposes of management, decision making and record keeping.3
Administrative metadata is the lifeblood of modern content,
document and records management systems. It allows content to
move through its lifecycle in a largely automated fashion. For
example, companies try to keep their websites interesting by
continually changing their content. New stories are posted to the
homepage and older content is moved to less prominent locations. A
few well chosen pieces of metadata, such as publish date, run length,
and archive page ID can combine with business rules in a content
management system to automate for the most part the entire process
of updating a website. This frees the Web team to focus on creating
compelling content rather than shuffling files around the server. It
also allows the website to be updated in the middle of the night
without disturbing the webmaster's sleep.
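As an illustration of how little metadata such a rule needs, the Python sketch below decides where a story belongs on a given day from three administrative fields. The story records, the field names and the placement function are all hypothetical and do not reflect any particular content management system.

```python
from datetime import date, timedelta

# Hypothetical story records; the field names are illustrative only.
STORIES = [
    {"title": "Quarterly results", "publish_date": date(2024, 3, 1),
     "run_length_days": 7, "archive_page_id": "news-archive"},
    {"title": "New office opens", "publish_date": date(2024, 3, 10),
     "run_length_days": 14, "archive_page_id": "press-archive"},
]


def placement(story, today):
    """Business rule: homepage during the run, archive page afterwards."""
    start = story["publish_date"]
    end = start + timedelta(days=story["run_length_days"])
    if today < start:
        return "unpublished"
    if today < end:
        return "homepage"
    return story["archive_page_id"]


for story in STORIES:
    print(story["title"], "->", placement(story, date(2024, 3, 12)))
```

A scheduled job could evaluate this rule for every story each night and move content accordingly, with no one awake to do it.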
Recently, administrative metadata has found a new niche in the form
of Digital Rights Management (DRM). Once the province of
military intelligence and industrial secrets, DRM has recently moved
into the mainstream. As distribution of intellectual property across
the Internet and corporate Intranets has become the norm, having a
reliable means to track that content and control who can access it has
become essential. DRM secures digital materials and limits access to
only those with the proper authorization. In addition, a complete
DRM solution facilitates and tracks any transactions involving the
content you wish to protect. For example, allowing copying or
limiting the period of access or the number of times content may be
viewed must all be supported.4 DRM technologies and techniques are
driven by administrative metadata.
Structural Metadata
As we have noted, information comes in many forms and from many
sources, usually bundled into packages that are largely black boxes to
us. How are we, or more importantly the tools we use, to know how
the information is to be read, manipulated and displayed? How does
an application know the technical requirements for integrating the
contents of some strange new file into its world so that we may have
access to its contents? This is the role of structural metadata.
Structural metadata, sometimes referred to as technical metadata,
display metadata or use metadata, describes how an information
object, usually a file or set of related files, is put together. This can
range from technical details such as file size, compression scheme,
and scanning resolution to display and navigation information such
as presentation order, typographic instructions, and search
mechanisms.
The most common application of structural metadata is defining how
information is to be organized in databases and data warehouses.
Every piece of information housed in a database must be grouped
into records and described in terms of type, size, and relationships.
The structural metadata governing this organization is in fact what
makes up a database and turns unorganized data into a usable
collection of structured information.
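A database schema is perhaps the easiest place to see structural metadata at work. The sketch below uses Python's built-in sqlite3 module to define two tables and then asks the database to describe itself. The table and column names are invented for the example, and describe_table is a hypothetical helper.

```python
import sqlite3

# An in-memory database; the schema itself is the structural metadata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        pages     INTEGER,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
""")


def describe_table(conn, table):
    """Return (column name, declared type) pairs for `table`."""
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


print(describe_table(conn, "book"))
```

No book has been stored yet, but the database already knows what a book record looks like, what type each field holds and how books relate to authors.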
Another way of looking at structural metadata is the page-turner
model. In this model, structural metadata specifies how individual
information objects are bound together to make up a single
information package that is presented in a specific order, like the
pages and chapters of a book. This allows text, images, and other
content to be presented in sequence, but enables the user to navigate
it at will, jumping from section to section, while preserving the
organization and structure originally intended by the creator.
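The page-turner model can be sketched as a small structural map. In the Python example below, the section labels, file names and the functions reading_order and jump_to are all hypothetical. The point is only that a few fields are enough to both sequence the pieces and let a reader jump between them.

```python
# Hypothetical structural map for a scanned book, stored out of order.
STRUCT_MAP = [
    {"order": 2, "label": "Chapter 1", "files": ["p003.tif", "p004.tif"]},
    {"order": 1, "label": "Preface", "files": ["p001.tif", "p002.tif"]},
    {"order": 3, "label": "Chapter 2", "files": ["p005.tif"]},
]


def reading_order(struct_map):
    """List every page file in the sequence the creator intended."""
    files = []
    for section in sorted(struct_map, key=lambda s: s["order"]):
        files.extend(section["files"])
    return files


def jump_to(struct_map, label):
    """Return the first page of the named section."""
    for section in struct_map:
        if section["label"] == label:
            return section["files"][0]
    raise KeyError(label)


print(reading_order(STRUCT_MAP))
print(jump_to(STRUCT_MAP, "Chapter 2"))
```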
Metadata Schemas
Regardless of its type (descriptive, administrative or structural) and
the purpose to which it is applied, all metadata share certain
characteristics. At a minimum metadata must possess semantics,
syntax, and structure.5
Semantics refers to the meaning of metadata within a particular
community or domain. It is important to note that any given metadata
field can have different interpretations depending on the context in
which it is being used. For example, the administrative field sample
source could refer to a medical procedure or even a particular patient
in a medical context, or it could refer to a certain musical instrument
or recording in the context of audio production. It could just as easily
be a technical field referencing a particular device or encoding
scheme. The point is that without clearly defined semantics, it is
nearly impossible to accurately interpret metadata.
Just as people cannot interpret metadata without an understanding of
its semantics, computers can't make sense of it without syntax and
structure. Syntax is the systematic arrangement of metadata elements
and their values according to well defined rules. The most common
form of syntax currently is the name-value pair, in which the name of
the metadata element is simply matched with its value, such as:
<author = Arturo Perez-Reverte>
<title = The Club Dumas>
<genre = Fiction>
Structure defines how metadata is to be organized to ensure
consistent representation and interpretation in line with its syntax and
semantics. The structure specifies which metadata elements are
allowed where, in what order, and how often. For example, a record
describing a "book" might be required to start with one or more
authors, followed by a single title, a single genre, an optional
sub-genre, a single publisher, and so forth.
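The structural rules for the "book" record can be sketched in a few lines of Python. This is only an illustration of the idea, not part of any published schema: the BOOK_STRUCTURE table, the validate_structure function, and the choice to hold a record as an ordered list of name-value pairs are all assumptions made for the example, and the publisher value is a placeholder.

```python
# Hypothetical structure for a "book" record: (element, min, max), listed in
# the required order. max=None means "no upper limit". These rules mirror the
# prose example only; they are not taken from a real schema.
BOOK_STRUCTURE = [
    ("author", 1, None),
    ("title", 1, 1),
    ("genre", 1, 1),
    ("sub-genre", 0, 1),
    ("publisher", 1, 1),
]


def validate_structure(record, structure=BOOK_STRUCTURE):
    """Return a list of problems found in a list of (name, value) pairs."""
    problems = []
    allowed = [name for name, _, _ in structure]

    # Which elements are allowed at all?
    for name, _ in record:
        if name not in allowed:
            problems.append(f"unknown element: {name}")

    # How often may each element appear?
    for name, minimum, maximum in structure:
        count = sum(1 for n, _ in record if n == name)
        if count < minimum:
            problems.append(f"missing element: {name}")
        if maximum is not None and count > maximum:
            problems.append(f"too many occurrences of: {name}")

    # In what order must the elements appear?
    positions = [allowed.index(n) for n, _ in record if n in allowed]
    if positions != sorted(positions):
        problems.append("elements are out of order")

    return problems


record = [
    ("author", "Arturo Perez-Reverte"),
    ("title", "The Club Dumas"),
    ("genre", "Fiction"),
    ("publisher", "Example Press"),  # placeholder value, not from the text
]
print(validate_structure(record))  # an empty list means the record conforms
```

A record that omits the publisher, repeats the genre, or places the title ahead of the author would come back with one problem reported for each violation.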
Taken together, semantics, syntax, and structure form a type of
grammar, called a schema, that specifies the rules governing the
metadata of any given domain or application. At the most basic level,
a schema specifies a list of attributes that are valid for describing an
information package. A more sophisticated schema will often detail
every aspect of how metadata is to be encoded and represented.
In all cases the overarching goal of defining a rich schema is to make
metadata as useful as possible in terms of interoperability,
extensibility, and flexibility.
Interoperability is the ability of information systems to exchange
metadata and interact in a useful way over communication networks
such as the Internet.6 This is what allows the computers at
Amazon.com to talk to your bank or credit card company and receive
payment for the book you ordered. Extensibility means that the
original definition of the schema isn't the final word. It should always
be possible to add metadata elements (albeit in an organized and
controlled manner) to any schema in order to accommodate specific
and often unforeseen user needs.
Above all, metadata users demand flexibility from their metadata
schemes and systems. They do not want to be compelled to add
information that they deem irrelevant or too cumbersome. As a
result, most metadata schemas allow authors to include as much or as
little detail as they desire in a metadata record. This makes authors
happy, but tends to make life difficult for information and metadata
administrators, since the more flexible metadata is, the less
interoperable it becomes. Two information systems may depend on a
particular metadata element in order to communicate, and if an
author fails to provide it, interaction between the two systems
becomes impossible. Imagine if Amazon.com neglected to include
the price of a book when it tried to charge your credit card. Schemas
serve to mitigate these problems while preserving as much flexibility
as possible.
The number of publicly available schemas has exploded in recent
years, and there now seem to be metadata standards (official, de
facto, and even competing) for nearly every domain imaginable. One
of the earliest and most broadly applied is the Dublin Core (DC).
Named after the Ohio city in which it was first drafted, the Dublin
Core was originally developed with an eye to describing document-like
objects. More recently, DC metadata has begun to be applied
to a broad range of other types of resources as well.
One of the strengths of DC and a prime reason for its popularity is
its simplicity. The DC schema captures the fundamental characteristics
of an information resource in a manner that is easy to create and
comprehend. Thomas Baker of the German National Research
Center for Information Technology has referred to it as "metadata
pidgin for digital tourists."7
In its current form, DC consists of fifteen elements covering the
basic descriptive, administrative, and structural needs of an
information object. For each element the schema supplies both an
official label and a concise definition. For example, creator is defined
as: "an entity primarily responsible for making the content of the
resource." Just as with a well defined structure, clear definitions of
labels and terms are essential to ensuring the appropriate
interpretation and application of metadata.
The Dublin Core is an example of a simple schema that can mediate
between the extremes of full indexing of raw text and highly
structured content. It provides a mechanism for capturing the
fundamental information necessary to describe an information
resource without the burden of elements that may be irrelevant to a
particular community or application.
Some have perceived the spare nature of the DC schema as a weakness.
While its basic nature allows it to describe many different types of
resources, it limits the detail you can capture about each resource. For
example, the creator element, described above, makes no distinction
between a person, an organization, or a service. This could be
essential information to a particular application. Perhaps even more
troublesome is the fact that there are no constraints placed on the
values a given element may take. For example, the subject element
can be filled with a keyword, a Library of Congress Subject Heading,
or a free text description. This lack of standard terms and values is
critical, as we shall see shortly.
Descriptive    Administrative    Structural
Title          Creator           Date
Subject        Publisher         Type
Description    Contributor       Format
Source         Rights            Identifier
Language
Relation
Coverage
Figure 4. The current Dublin Core element set.
These shortcomings are common to most metadata schemas. The
Dublin Core is a good example of how limitations can be overcome
through extensibility. The DC supports two types of qualifiers,
schemes and types, which refine the base schema.
Schemes allow you to specify the standard syntax or vocabulary that
is allowable for element values. The DC element subject may be
qualified with MESH to indicate that all values must be drawn from the
Medical Subject Headings vocabulary or LCSH to require Library of
Congress terms. Likewise the language element may be qualified with
ISO 639-2 or RFC 3066 to ensure that any value applied to that field
conforms to the named standard.
DC types refine the definition of the core element itself. The basic
DC element date, defined as "a date associated with an event in the
life cycle of the resource," is too generic to be useful. By applying a
type, the basic date element can be transformed into date created,
issued, accepted, available, or acquired, among other possibilities.
This ability to refine and enhance the schema without corrupting its
fundamental nature and structure is the key to metadata extensibility.
Without it, any metadata system will quickly become obsolete
regardless of how well conceived and executed initially.
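The two kinds of qualifier can be pictured with a short Python sketch. Everything named here is an assumption made for illustration: the SCHEMES table holds only a handful of sample terms rather than a real vocabulary, check_scheme and refine are hypothetical helpers, and the dotted label that refine returns is simply a convenient notation for this example.

```python
# Tiny illustrative vocabularies; real schemes such as MeSH contain many
# thousands of terms. These sample sets are assumptions for the sketch.
SCHEMES = {
    "MESH": {"Neoplasms", "Hypertension"},
    "ISO 639-2": {"eng", "ita", "jpn", "epo"},
}

# Type qualifiers named in the text as refinements of the base date element.
DATE_TYPES = {"created", "issued", "accepted", "available", "acquired"}


def check_scheme(value, scheme):
    """A scheme qualifier restricts a value to the named vocabulary."""
    return value in SCHEMES[scheme]


def refine(element, type_qualifier, allowed_types):
    """A type qualifier narrows the meaning of a base element."""
    if type_qualifier not in allowed_types:
        raise ValueError(f"{type_qualifier!r} is not a known refinement of {element}")
    return f"{element}.{type_qualifier}"


print(check_scheme("Hypertension", "MESH"))         # drawn from the vocabulary
print(check_scheme("high blood pressure", "MESH"))  # free text is rejected
print(refine("date", "created", DATE_TYPES))
```

The base elements are untouched in both cases. A scheme narrows what a value may be, and a type narrows what an element means, which is why neither disturbs systems that understand only the unqualified schema.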
Where Do I Put It?
Metadata can live in several different places. Traditionally, as with the
card catalog, it has been recorded and stored separately from the
object it describes with a pointer of some sort to the location of the
information resource itself. This is often the case in content
management and data warehouse systems. Information resources will
be given a unique identifier and stored in whatever form and on
whatever system is most appropriate. The metadata describing that
resource may be hosted in a separate database dedicated to that
purpose. The metadata and the object it describes remain linked by
means of the resource's identifier.
This approach has the advantage of making it simple to update the
metadata of any given information resource. If a new manager takes
over responsibility for a large number of documents, you can simply
update the database with the new information rather than tracking
down and retagging the documents themselves. The disadvantage of
this approach is that the metadata doesn't travel with the document if
it is shared. If a file with externally managed metadata is emailed to a
colleague at another organization, they will receive the content but
not the descriptive information. This can become a problem if that
additional information is critical to making the document usable.
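A small sketch may make the external approach concrete. It uses Python's built-in sqlite3 module with an in-memory database standing in for a dedicated metadata repository. The table layout, identifiers, owner names, and file paths are all invented for the example.

```python
import sqlite3

# In-memory database standing in for a dedicated metadata repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "resource_id TEXT PRIMARY KEY, title TEXT, owner TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        # Hypothetical records; identifiers and paths are placeholders.
        ("doc-001", "Q1 Budget", "old.manager", "/share/finance/q1.xlsx"),
        ("doc-002", "Q2 Budget", "old.manager", "/share/finance/q2.xlsx"),
        ("doc-003", "Style Guide", "editor", "/share/docs/style.docx"),
    ],
)

# A new manager takes over: one statement updates every affected record,
# and the documents themselves are never opened or retagged.
cursor = conn.execute(
    "UPDATE metadata SET owner = ? WHERE owner = ?",
    ("new.manager", "old.manager"),
)
conn.commit()
print(cursor.rowcount, "records updated")

# The identifier is the only link between a record and the stored resource.
row = conn.execute(
    "SELECT owner, location FROM metadata WHERE resource_id = ?", ("doc-001",)
).fetchone()
print(row)
```

The same sketch also shows the weakness described above: nothing in the file at the stored location carries the title or owner, so a copy sent outside the organization arrives without them.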
An alternative to external management is to make the metadata a part
of the information resource itself. Most applications supporting this
approach store metadata as properties of the file they describe.
Microsoft Windows, for example, allows an author to add summary
metadata to any file, which may then be used to organize, locate, and
retrieve the information resource. In addition to traveling with the
file, internal metadata has the advantage of being somewhat
self-maintaining. In the case of Windows metadata, some information is
extracted directly and automatically from the document itself. The
organization of the file is automatically extracted from heading styles
in a Word document, Excel worksheet titles, or slide titles in a
PowerPoint presentation. If the file changes, the new structure is
automatically reflected in the metadata. Usage statistics are also
automatically updated throughout the life of the document. At first
blush, semi-automatic maintenance and close coupling with the
information it describes make internal metadata a very attractive
option, but it does come at a cost.
First, while some of the descriptive metadata (title, author, company) can
be automatically generated, the fields that are most useful to retrieval
(subject, category, keywords) must be manually selected, keyed, and
maintained. If the owner of the document changes, as mentioned
earlier, not only does that field need to be updated in each impacted
document, but there will also be no history of ownership. Once an internal
field is updated, all previous values are lost. This can become critical
if an explanation of something in the document is needed and no one
remembers who originally wrote it.
[Figure: a Windows file Properties dialog listing properties and their values.
Description: Title = Introduction to the Semantic Web; Subject = Semantic Web;
Category = Lectures; Keywords = Semantic Web, RDF, Ontology;
Comments = Draft of Lecture 1.
Origin: Source = (blank); Author = Darin L. Stewart; Revision Number = 2.]
Figure 5. Metadata in Microsoft Windows.
Another hazard is shifting terminology. The vocabulary of any
organization or community inevitably changes over time. Keywords,
subject headings, and even category labels need to be updated to
reflect these changes. Otherwise a search engine will not be able to
match a relevant document tagged with obsolete terms with a query
from a user searching with the latest buzzwords. Additionally, while
deliberate keywords are essential to effective retrieval, as discussed in
the prior chapter, the burden of selecting, assigning, and maintaining
them falls primarily on the author (who is invariably overworked
already). This often leads to sporadic metadata and idiosyncratic
tags and terms. This becomes an even greater problem
in the context of authority control, which we will discuss shortly.
Where Does It Come From?
The potential sources of metadata and the means of creating it are as
varied as the information resources they describe. Systems for
automatic generation exist but rarely reach an acceptable level of
quality without human assistance. Conversely, a broad application of
metadata across an enterprise of any size is generally too tedious for
human beings working without the help of scripts, term extractors
and tagging tools. As a result, most successful metadata endeavors
draw on a range of sources, tools and techniques depending on the
nature of the information under consideration and the purposes for
which it is intended.
The same principle is just as applicable to creating the metadata for a
single information resource as it is to an entire collection. In most
cases, the descriptive metadata will be assigned by the creator or
author of the information. This has the advantage of terms coming
from the person most familiar with the content and its original intent.
It has the disadvantage of the metadata reflecting the biases and
idiosyncrasies of the author, whose vocabulary may not necessarily
reflect that of her audience.
The readers may also place the information in a different context
from that originally conceived by the author. As a result, it is often
advantageous to leave the creation of descriptive metadata to the
professionals. The National Information Standards Organization
(NISO) has noted that it is often more efficient to have indexers or
other information professionals create this metadata, because the
authors rarely have the time or necessary skills.8 This is, of course, an
additional line item cost, but when lifetime cost of ownership
(especially in terms of findability) is taken into account, leaving it to
the professionals is often cheaper in the long run.
Administrative and structural metadata will often be generated by the
technical staff that prepares an information resource to be published
and distributed. The individual scanning an image or creating a digital
recording is in the best position to supply details about resolution, bit
rates and encoding schemes. The individual adding the resource to
the content management system will know when it is to be posted to
the website, for how long, and where it is to be archived at the end of
its run.
As with any budding field, there is an abundance of tools available
to assist in the creation of metadata. The most common (and
cheapest) is the application of templates such as those available in
most word processing applications. In addition to providing
standardized formatting of common document types, templates can
also guide the author in providing basic descriptive metadata. Even if
professional indexers are utilized to create the final metadata, it is
often effective for the author to create a "first draft" of the metadata
to serve as a guide. A well conceived document template can simplify
this task and improve the quality of the metadata.
One of the challenges of high quality metadata is ensuring that it
conforms to the appropriate schema. Mark-up and tagging tools can
prompt the user for the appropriate fields, requiring those that are
mandatory for compliance with the designated schema. Once the
metadata is complete, the tool can either embed the metadata in the
information resource itself or export it to an external metadata
repository or database.
Extraction tools will analyze the content of an information resource
and attempt to extract appropriate terms and values for certain
metadata fields. For structural metadata, this is often straightforward
and quite effective. For more conceptual elements such as subject,
category, or keyword, it gets a bit trickier. Most tools rely on a mixture
of statistical and computational techniques to make a best guess at
appropriate descriptive metadata. In most cases these tools require a
great deal of training in terms of sample documents and target
vocabularies, and still depend on human intervention and revision.
However, much like having authors take a first pass at assigning
metadata, automated extraction tools can dramatically reduce the full
metadata burden to a more manageable one of cleanup and
refinement.
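To give a feel for the statistical side of such tools, here is a deliberately naive Python sketch that proposes candidate keywords by counting word frequency. It is not how any particular product works. The stop list, the suggest_keywords function, and the sample passage are assumptions made for illustration, and real extractors add training data and target vocabularies on top of counts like these.

```python
import re
from collections import Counter

# A very small stop list; real tools use far larger ones.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with",
    "that", "it",
}


def suggest_keywords(text, limit=3):
    """Return the most frequent non-stop words as candidate keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


sample = (
    "The Semantic Web extends the Web with metadata. "
    "Ontology languages describe metadata for the Semantic Web, "
    "and an ontology gives that metadata shared meaning."
)
# The result is a list of suggestions for a person to review, not final metadata.
print(suggest_keywords(sample))
```

Even on this tiny sample the limits are visible. The counter proposes single words and cannot tell that "Semantic Web" is one concept, which is exactly the kind of cleanup left to a human reviewer.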
Figure 6. A metadata creation aid: Meta-X.
Metadata and Authority Control
Metadata is a hard sell. It is expensive to create and difficult to
maintain. Executives have a tough time understanding how the
problem of having too much information to manage can be solved by
adding on yet more information. Metadata is a bit of a "hair of the
dog" solution. We add a little extra information to make a lot of
information more usable. As to the expense, the answer is, of course,
pay now or pay more later; sometimes a lot more. As discussed in the
prior chapter, a few moments tagging a document can save hours
hunting for it later. When done properly, metadata initiatives nearly
always generate a positive return on investment. Unfortunately, few
are done properly and most fail. A prime reason for this is a lack of
authority control.
The notion of authority control boils down to making sure everyone
involved in the creation and management of an information resource
is speaking the same language. It is the mechanism by which
consistency in online systems is created and maintained. When
applied to search and even navigation, it promotes greater precision
by providing official or "authorized" forms of names, labels, and
values. As part of this system, references to equivalent terms,
synonyms, and variants are created, which dramatically improve
recall.9
For example, if the authorized term for a "non-rigid, buoyant
airship" is blimp there will be cross references to zeppelin and
dirigible. An information seeker searching on any of these equivalent
terms would receive information for all of them.
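The cross-reference mechanism can be sketched in Python as a lookup table that resolves every variant to its authorized form before matching. The airship entry comes from the example above. The AUTHORITY structure, the search function, and the sample documents are assumptions made for illustration.

```python
# Authorized term -> variants that should lead to it. The entry mirrors the
# airship example in the text; the data structure itself is an assumption.
AUTHORITY = {
    "blimp": ["zeppelin", "dirigible"],
}

# Invert the table so any variant (or the authorized term) resolves to one form.
LOOKUP = {}
for authorized, variants in AUTHORITY.items():
    LOOKUP[authorized] = authorized
    for variant in variants:
        LOOKUP[variant] = authorized


def search(query, documents):
    """Return titles whose tag resolves to the same authorized term as the query."""
    target = LOOKUP.get(query.lower(), query.lower())
    return [
        title
        for title, tag in documents
        if LOOKUP.get(tag.lower(), tag.lower()) == target
    ]


# Hypothetical (title, tag) pairs as different authors might have tagged them.
documents = [
    ("Airship maintenance manual", "Blimp"),
    ("History of the Hindenburg", "zeppelin"),
    ("Hot air balloon festival", "balloon"),
]
print(search("dirigible", documents))
```

A search on dirigible returns both airship documents even though neither was tagged with that word, which is the improvement in recall that cross references provide.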
The value of authority control to metadata should be obvious. While
schemas provide structure, syntax, and semantics to our metadata, they
do nothing to ensure consistency in the values assigned to the elements of the
schema. The Dublin Core may specify an element called language and
define it as "the language of the intellectual content of the resource,"
but it does nothing to limit the potential values that can be assigned
to that field. If DC metadata is being created for an international
news story, its language could be tagged as English, Eng., En, American
English, British English, or any number of variants. Each is potentially
valid, but the lack of consistency turns retrieval into a crapshoot. If
an information seeker searches on English they will receive only those
information resources labeled with that exact term. Anything tagged
with another term for English will be ignored.
The solution is to restrict potential metadata values to an agreed
upon list of terms, so that both information creators and seekers are
speaking the same language. In many cases, an authoritative
vocabulary already exists and can be adopted wholesale. In the case
of the DC language element, the International Organization for
Standardization (ISO) Language Codes standard (ISO 639-2)
provides authoritative names and codes for languages. English would
then be consistently represented as eng, Italian as ita, Japanese as jpn,
and Esperanto as epo.
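As a rough illustration, the variant labels for English listed earlier can be folded into a single authorized code with a simple table. The four ISO 639-2 codes are the ones given above. The LANGUAGE_CODES table and the normalize_language function are hypothetical, and a production system would load the full code list rather than a handful of entries.

```python
# Variant labels mapped to ISO 639-2 codes. The four codes come from the text;
# the table itself is a small sample, not the full standard.
LANGUAGE_CODES = {
    "english": "eng", "eng.": "eng", "eng": "eng", "en": "eng",
    "american english": "eng", "british english": "eng",
    "italian": "ita", "japanese": "jpn", "esperanto": "epo",
}


def normalize_language(value):
    """Return the authorized code, or None when the value needs human review."""
    return LANGUAGE_CODES.get(value.strip().lower())


for raw in ["English", "Eng.", "En", "British English", "Italian", "Klingon"]:
    print(raw, "->", normalize_language(raw))
```

Returning None for an unrecognized value, rather than guessing, keeps the decision about new terms with whoever maintains the authorized list.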
If the desired granularity does not exist in the standard, it can be
expanded. DCMI actually recommends this as a best practice in the
case of languages.10 The ISO standard can be used in conjunction
with the Internet Society's proposal for language codes (RFC 3066),
which includes the more specific labels of en-US for American
English, en-AU for English as used in Australia, en-GB for English in
the United Kingdom, or even en-GB-oed for British English using
spelling from the Oxford English Dictionary. The additional
advantage of adopting authoritative terms is the possibility of
structuring the labels to reflect relationships.
eng (UseFor English, en)
  en-AU (UseFor Australian English)
  en-GB (UseFor British English)
    en-GB-oed (UseFor British English OED spelling)
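Because tags such as en-GB-oed are built up from left to right, the relationships between labels can be recovered from the tags themselves. The Python sketch below shows one way to do that. The broader_tags and matches functions are hypothetical helpers, and the sketch assumes the tags have already been normalized to a consistent case.

```python
def broader_tags(tag):
    """List a tag followed by each broader tag, most specific first."""
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def matches(query_tag, resource_tag):
    """A resource matches when the query is its own tag or one of its broader tags."""
    return query_tag in broader_tags(resource_tag)


print(broader_tags("en-GB-oed"))
print(matches("en", "en-AU"))     # a search on English finds Australian English
print(matches("en-GB", "en-AU"))  # a search on British English does not
```

Structured labels of this kind let an author tag as specifically as the content warrants while a seeker searching on the broader term still retrieves the resource.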
Despite the advantages it offers, authority control is a difficult pill to
swallow for most organizations. The prospect of giving up ownership
of terms and labels is often enough to incite turf battles in even the
most collegial of environments. Authors feel that it is unnecessary
and even inadvisable to constrain their vocabulary in any way (though
they invariably recognize the need for such constraints among their
colleagues). Deciding who and what is the "authority" and who and
what is governed by its dictates are among the most contentious
issues in information management. If metadata is a hard sell,
authority control can turn into a shotgun wedding.
Fortunately, it needn't be so. A balance can be struck between the
expressive needs of content authors and the findability needs of
information seekers. Doing so depends on the proper definition,
creation, and management of the information resources provided to
both groups. Taxonomies are the linchpin of this process.
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for
the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.
1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,
the trademark owner, any agent or employee of the Foundation,
anyone providing copies of Project Gutenberg™ electronic works in
accordance with this agreement, and any volunteers associated with
the production, promotion and distribution of Project Gutenberg™
electronic works, harmless from all liability, costs and expenses,
including legal fees, that arise directly or indirectly from any of the
following which you do or cause to occur: (a) distribution of this or
any Project Gutenberg™ work, (b) alteration, modification, or
additions or deletions to any Project Gutenberg™ work, and (c) any
Defect you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.
The Foundation’s business office is located at 809 North 1500 West,
Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many
small donations ($1 to $5,000) are particularly important to
maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws regulating
charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states where
we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.
Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.


Building Enterprise Taxonomies 2nd Edition Darin L Stewart

  • 6. Building Enterprise Taxonomies: A Controlled Vocabulary Primer. SECOND EDITION. Darin L. Stewart, Ph.D. Mokita Press
  • 7. To Laura, who taught me the importance of details.
  • 8. Copyright © 2011 by Darin L. Stewart. Published by Mokita Press. All rights reserved. Printed in the United States of America. No part of this book may be reproduced in any manner whatsoever without written permission except in the case of brief quotations embodied in critical articles and reviews. For information contact clearance@mokitapress.com SECOND EDITION
  • 9. Contents. 1. Findability: Infoglut; The Problem with Search; Teleporting and Orienteering. 2. Metadata: Types of Metadata; Descriptive Metadata; Administrative Metadata; Structural Metadata; Metadata Schemas; Where Do I Put It?; Where Does It Come From?; Metadata and Authority Control. 3. Taxonomy: Linnaean Taxonomy; Controlled Vocabularies; Faceted Classification. 4. Preparations: The Taxonomy Development Cycle; Research;
  • 10. Performing a Content Audit; Creating a Governance Document. 5. Terms: Internal Term Sources; Intranets and Websites; External Term Sources; Existing Taxonomies; Refining Terms; Basic Hygiene; Compound and Precoordinated Terms; Disambiguation. 6. Structure: Card Sorting; Categories and Facets. 7. Interoperability: Basic XML Concepts; Representing Hierarchy; Fear of Baggage Handling; XSLT; Zthes. 8. Ontology: What Is An Ontology?; Class Hierarchy, Slots and Facets; Resource Description Framework;
  • 11. RDF/XML; RDF Schema; Web Ontology Language (OWL). 9. Folksonomy: Tagging; Folksonomy; Tag Clouds; Pace Layering. Glossary. Notes. Index.
  • 12. 1 Findability "But the plans were on display..." "On display? I had to go down to the cellar to find them." "That's the display department." "With a flashlight." "Ah, well, the lights had probably gone." "So had the stairs." "But look, you found the notice didn't you?" "Yes," said Arthur, "yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard.'" From The Hitchhiker's Guide to the Galaxy. Finding good information is hard, much harder than it should be. Your first encounter with a new website often feels like entering a strange land with its own language, laws, customs and culture. You have business to conduct there, but must do so without the benefit of an interpreter or guide. As you begin to explore the homepage, you must quickly orient yourself to its unique approach to navigation, interpret bizarre labels and menus, guess at search terms and wade through propaganda in search of useful information. And these are just the public pages. Things get much more dangerous if you venture out of the tourist areas and onto an intranet or, heaven help you, a file system. Once you enter the realm of the enterprise information system, all bets are off. The seemingly unified front of the corporate website dissolves into a collection of fiefdoms, each with its own local dialect and
  • 13. jealously guarded borders passable only with the right permissions and passwords. There also seems to be a civil war underway. Despite our best efforts, most websites, portals, intranets and file systems are hostile environments for information seekers. We hire consultants, hold focus groups and conduct usability studies to understand our users' needs. We build site maps, add search boxes, and tag our content, and users still get lost. According to surveys conducted by Gartner, IDC and others, knowledge workers spend from thirty to as much as forty percent of their work day searching for information and yet only find what they need less than half the time.1 This means we spend more time looking for documents than actually reading them. This situation is not just embarrassing, it's expensive. A third of a senior knowledge worker's time, the time they spend chasing information, works out to be roughly $26,000 a year in salary and benefits on average. When those searches are successful, this is a legitimate cost of doing business. When they fail, that fruitless search time is a drain on resources. Yet as expensive as this may seem, search time is a minor component of the cost of hidden information. Even the tens of thousands of dollars spent on redesigning and maintaining an improved website is trivial if it gets users to the content they need. The true cost comes when users throw up their hands and abandon their search. Studies have suggested that this happens after about twelve minutes at the outside. This phenomenon is not restricted to complex searches and obscure facts. Information as mundane as the contact information for the director of human resources cannot be located by employees on their own Intranet fifty-seven percent of the time.
Those intrepid few who can find the information usually must troll through multiple Web pages and documents looking for an org chart (which is probably out of date) that might have the director's name. They must then look up the director in an employee directory located elsewhere on the Intranet, hoping they spelled the name right. One study found this to be the case in five out of six corporate Intranets.2
  • 14. When people can't find what they need, they don't just give up. They go elsewhere. When a consumer doesn't find the right product, they go to a competitor, which in aggregate costs your company half of its potential sales. If they are already a customer, they pick up the phone. This costs you an average of seventeen dollars for each call that your self-service website was supposed to eliminate.3 When an employee can't find what they need, they go to a co-worker, doubling costs while halving productivity and often yielding no better results. In a 2002 research note, Regina Casonato and Kathy Harris of Gartner estimated that an employee will get fifty to seventy-five percent of the information they need directly from other people, effectively erasing the benefits of a corporate Intranet.4 When a knowledge worker reaches this dead-end, they have little choice but to set about creating the information they need from scratch. This may be as simple as running a report and stitching a few documents together, but more often it involves considerable research, an additional information chase and consultation with multiple colleagues. Unfortunately, all this effort is not being expended to create information, but to recreate it. As much as ninety percent of the time spent creating information for a specific need is actually recreating information that already exists but could not be located.5 According to Kit Sims Taylor, this is because it is simply too hard to find what you need. "At present it is easier to write that contract clause, exam question, insurance policy clause, etc., ourselves than to find something close enough to what we want from elsewhere.... While most of us do not like to admit that much of our creative work involves reinventing the wheel, an honest assessment of our work would indicate that we do far more 'recreating' than creating."6
Taylor has found that in addition to the amount of time spent looking for information, an additional thirty percent is spent reinventing the wheel. When you account for communication and collaboration overhead, only ten percent of our time, effort and
  • 15. energies is actually spent in the creation of new knowledge and information. In a separate study, IDC found that this "knowledge work deficit" costs Fortune 500 companies over twelve billion dollars annually.7 These are just the purely quantifiable costs. Consider the impact poor findability has on decision making when there simply isn't time to re-research and recreate the needed intelligence. Critical decisions may be delayed because the information we can find, if any, is either incomplete or conflicting. Worse, bad decisions may be enacted when they wouldn't have even been considered had a fuller, more accurate picture been available. In this age of compliance, the ability to locate and produce information on demand can mean the difference between passing an audit and dissolving the company. Infoglut So how did we get into this mess? We have spent literally trillions of dollars on information technology, and yet our access to information seems to get worse in direct proportion to the amount of money and effort expended to improve it. Some pundits point to the sheer volume of information with which we are inundated and resign themselves to this inevitable consequence of life in the information age. As Britton Hadden of TIME magazine put it: "Everyday living is too fast, too busy, too complicated. More than at any other time in history, it's important to have good information on just about every aspect of life. And there is more information available than ever before. Too much in fact. There is simply no time for people to gather and absorb the information they need." Hadden made this observation in 1929, shortly before founding the magazine. Infoglut is not a new problem, but until recently it was at least somewhat manageable. Today we are discovering that the only
  • 16. thing worse and more dangerous than trying to run an organization with too little information is trying to manage one with too much. Everyone understands intuitively that infoglut is a problem, but few have a clear sense of how much of a problem it really is. Experts have long proclaimed the dangers of information overload. While hyperbole is the lifeblood of consultants, in this case they seem to be right on the money. Each year the world produces roughly five exabytes (10^18 bytes) of new information. To put that in more familiar terms, if the seventeen million books in the Library of Congress were fully digitized, five exabytes would be the equivalent of 37,000 new libraries each year. While this is staggering in and of itself, consider that it is estimated that in 1999 only two exabytes of new information were created, meaning that the rate of information growth is accelerating by 30% a year. 92% of that information is stored on digital media and 40% is generated by the United States alone. We create 1,397 terabytes of office documents each year. Each day we send thirty-one billion emails.8 It is no wonder that we are, as John Naisbitt famously put it, "drowning in information, but starved for knowledge." The deluge has not caught us by surprise. On the contrary, we have attacked it with a vengeance, pouring billions into data warehouses, CRM, ERP, business intelligence and other data management and reporting systems. These efforts and investments have bought us great insight into our structured content: that highly organized information structured according to a well defined schema or framework. These are the records found in relational databases and that slot so nicely into spreadsheets and reports. The information contained in these records can easily be located, manipulated and retrieved by means of standard query languages such as SQL.
Unfortunately, this type of domesticated data makes up only fifteen percent of the total information with which we must cope.9 The remaining eighty-five percent is made up of Web pages, emails, memos, PowerPoint presentations, invoices, product literature, procedure manuals, take-out menus and anything else that doesn't fit neatly into a row in a database. The common factor among all of
  • 17. 6 Cbap terOne these different forms of 1111struct11red co11/e11! is that they arc all designed for human consumption rather than machine processing. As a result, all of the tried and true methods of data management we have worked so hard to master fail miserably when asked to bring a company picnic announcement to heel. So while quarterly sales forecasts across four continents may be readily available, knowing whether you are supposed to bring a salad or a dessert may be out o f reach. The Problem with Search This aspect of the information onslaught bas in fact taken us by surprise. Many of us arc still in denial. A fter all, with fully indexed, electronic in formation sources, full-text searching should allow us to specify all the terms and subjects in which we are interested and have the information retrieved and delivered to our desktop. As any user of Google, A9, or countless other search and retrieval engines has learned tlirough painful experience, things rarely work out that neatly. Rather than receiving a nice, neat set of t,'l.rgeted documents, search engines generally present us with long lists of Web pages that merely contain the words on which we searched. Whether or not those words a.re used in the manner and context we intended (did you mean Mercury the planet, tl1e car, ilie Roman God or the element?) isn't pa.rt of the equation. We a.re left to sort through page after page of links looking for something that might be relevant. Part of th.is problem is self-inflicted. People just don't write good queries. O ne third of the time, search engine users only specify a single word as tl1e.i.r query and on average use only two or thrcc. 10 This is what leads to so many irrelevant documents being returned. We don't give enough context to our subject to eliminate documents that arc not o f interest. 
If you query just on the term "Washington" you will receive links to information on the state, the president, the capital, a type of apple, a movie star, a university and so forth. In all, Google returns 1,180,000,000 "hits." If you add the term "Denzel"
the number of links drops to 3,520,000, and we are reasonably focused on the actor. If we add the phrase "Academy Award" we finally get to 107,000 documents reasonably focused on the actor's accolades. So the more specific and verbose we are with our queries, the more relevant the results. But what happens if you use the Academy Award's common nickname "Oscar" in your query? The number of hits jumps to 593,000. This is the risk of getting too specific with search terms. By using the proper name of the award rather than its popular name, we may have missed 486,000 potentially relevant documents. Guessing the wrong search term can have a dramatic impact on what you do and don't find.

Information scientists have long been aware that there are tradeoffs between depth and coverage whenever a search is conducted. The broader the search is, the more documents are retrieved, including those that are not relevant to the actual information need. Conversely, the deeper or narrower the search, the more likely retrieved documents are to be relevant. The cost, of course, is that it is also more likely that documents of interest will be missed in the search.

The difficulty arises from the fine balance of precision and recall. Precision is usually described as a ratio: the number of relevant documents retrieved divided by the total number of documents retrieved. In other words, what percentage of the total number of documents retrieved are actually related to the topic being investigated? For example, a Google search on the terms "precision" and "recall" returns approximately 970,000 documents. The first few documents in the list do indeed prove to be related to measures of search performance. However, a few links into the list a news item appears: "Vermont Precision Woodworks Announce Recall of Cribs." From the search engine's perspective, this is a perfectly valid document. It contains both of the search terms in its title.
In fact, one search term appears in the title of the website itself, www.recall-warnings.com, thus causing it to receive a high relevancy ranking. Out of 970,000 documents, it is safe to assume that many, if not
most, of the retrieved documents will have this level of relevancy to our query. This indicates low precision, but high recall. Recall is also a ratio and is defined as the number of relevant documents retrieved divided by the total number of relevant documents in the collection being searched. The example above probably has a high recall due to the large number of documents returned.

$$\text{Precision} = \frac{\text{Relevant Documents Retrieved}}{\text{Total Documents Retrieved}}$$

$$\text{Recall} = \frac{\text{Relevant Documents Retrieved}}{\text{Total Relevant Documents in Collection}}$$

These two measures are generally inversely related: as recall increases, precision decreases. A balance must be found between the two, retrieving enough documents to get an individual the information they need without returning so many that wading through irrelevant information becomes burdensome. This balance is the heart of information retrieval, but it is difficult to measure precision and recall precisely. This is because we rarely know what is contained in the collection we are searching, in this case the Internet itself, and also because the notion of relevance is very subjective. At best we can estimate recall and precision based on feedback from users of the search engine in question and make adjustments as appropriate.

Taking our Google search on "precision" and "recall" as a test case, it may seem that the problem isn't so bad. After all, the first several documents in the list were on the exact topic we were seeking: search performance measures. We can just disregard the other 970,000 or so documents offered. We got what we needed from the top ten or twenty.
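The two ratios can be made concrete with a few lines of Python. This is only an illustrative sketch: the document ids and the relevance judgments are invented for the example, and in a real collection the set of relevant documents is rarely known in full, which is exactly the measurement difficulty described above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids of the documents the search engine returned
    relevant:  ids of the documents a human judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection in which four documents are relevant.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A narrow query returns two documents, both relevant.
narrow = precision_recall(["d1", "d2"], relevant_docs)

# A broad query returns eight documents, three of them relevant.
broad = precision_recall(
    ["d1", "d2", "d3", "d5", "d6", "d7", "d8", "d9"], relevant_docs
)

print("narrow:", narrow)
print("broad: ", broad)
```

In this toy case the broad query finds more of the relevant documents (higher recall) at the price of burying them among irrelevant ones (lower precision), which is the tradeoff the text describes.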
This ability to rank pertinent documents near the top of a result set is what has made Google the clear winner of the search engine wars. Their PageRank algorithm is a key ingredient in the Google secret sauce.[11] Rather than just counting how many times a certain word occurs in a document or where it occurs, Google also looks at who links to that document. If a lot of pages reference a particular website, chances are that it is a pretty important source of information on the topic at hand. If the pages linking in are themselves important, then that likelihood increases and the document's relevancy rank improves accordingly. This variation on "citation analysis," which is traditionally used to determine the importance of scholarly publications, has radically changed Internet search for the better. Google even offers a free tool that I can add to my website to search my own content with just a few lines of code.

So, problem solved? Not quite. There are several caveats to applying a Google-like tool to your findability challenges. First, Google free site search is really only searching a subset of the entire Google index, that part representing just your website. As a result, only those Web pages that are open and available to the public will be included in a search. Anything on the Intranet is invisible to the Google spiders, the programs that find and index Web pages and build up the search index. Even those pages and documents that are open to the Internet at large may be missed. Indexing programs only go so deep when looking over a website. If your content is more than a link or two away from the main page, it will probably be missed. Any new content you add will likewise be invisible until the next time an indexing spider happens by, a process completely outside of your control. As Google explains:

There are a number of reasons a page might not appear in the results of your Google free site search.
It could be that Google hasn't crawled that particular page yet. Google refreshes its index frequently, but some pages are inevitably missed. Or, the page may have Javascript, frames, or store information in a database. Pages like these are difficult or impossible for the Google crawler to visit and index.[12]
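The link-based ranking idea described earlier (a page matters if important pages link to it) can be sketched as a short iterative computation. What follows is the simplified textbook formulation, not Google's production system: the page names are hypothetical, and the damping factor of 0.85 and the fixed iteration count are conventional assumptions rather than anything stated in the text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical miniature web: nobody links to the internal org chart.
web = {
    "news": ["portal"],
    "blog": ["portal", "news"],
    "portal": ["news"],
    "orgchart": [],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

Pages that attract links from well-ranked pages rise to the top, while a page with no inbound links, such as the org chart here, stays near the baseline.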
Finally, Google's greatest strength, the PageRank algorithm, is also its greatest weakness when applied to a single website. It is unlikely that CNN.com or eBay will reference your org chart. In fact, very few websites outside of your organization will link to your internal documents. Yet the rankings applied to your documents are determined in the context of rankings of the Internet as a whole. This effectively renders the relevancy judgments made on your content meaningless when the search is restricted to your own site.[13]

Aside from the arcane nature of indexing, the very act of searching can be a struggle in most organizations. Documents and content are spread out across multiple locations and repositories. Policies may be on the Intranet, quarterly reports on the file system, resumes in a departmental directory and price lists on the company homepage. Finding information is no longer an exercise in finding a needle in a haystack. First you must choose which haystacks to search, in what order, and for how long. In most organizations, less than half of their documents are centrally indexed.[14] This means that it is impossible to look for information in all potential locations with a single query or even a single search tool. This dispersal of information across an organization leads to another search challenge: choosing the correct query terms.

    <!-- SiteSearch Google -->
    <FORM method=GET action="http://www.google.com/search">
    <input type=hidden name=ie value=UTF-8>
    <input type=hidden name=oe value=UTF-8>
    <TABLE bgcolor="#FFFFFF"><tr><td>
    <A HREF="http://www.google.com/">
    <IMG SRC="http://www.google.com/logos/Logo_40wht.gif" border="0" ALT="Google"></A>
    </td><td>
    <INPUT TYPE=text name=q size=31 maxlength=255 value="">
    <INPUT type=submit name=btnG VALUE="Google Search">
    <font size=-1>
    <input type=hidden name=domains value="YOUR DOMAIN NAME"><br>
    <input type=radio name=sitesearch value=""> WWW
    <input type=radio name=sitesearch value="YOUR DOMAIN NAME" checked> YOUR DOMAIN NAME
    <br></font></td></tr></TABLE>
    </FORM>
    <!-- SiteSearch Google -->

Figure 2. Just cut, paste and you've got search. Not quite.
Most search engines create their indexes by extracting terms from the full text of documents. As a result, content creators and authors become de facto indexers and catalogers. The words they choose in authoring their documents become the search terms available to their readers. This becomes a problem if they don't speak the same language. This goes back to the Mercury (planet, car, god) and actor (Academy Award or Oscar) problem.

Figure 3. The ideal relationship between author and searcher.

Unless there is a company standard for terminology, and these are rare, each area of an enterprise is going to have its own language. A customer in one area may be a client in another and a patron somewhere else. This lack of consistency in search and indexing terms has proven to be the single greatest challenge to the effectiveness of search and findability in general.[15]

Ultimately, any search consists of, at minimum, four hurdles that must be cleared. First, the information seeker must be able to articulate what they are looking for with the right syntax for the specific search tool being used. Next, they must guess what words an author may have used to express the concept of interest. Then, with the query in mind, they must figure out the most likely place to search. Finally, they must sift through the results of their search, separating the potentially relevant from the clearly irrelevant and hope what they end up with is complete, representing all that is available. Really, it's a wonder that we ever find anything at all.
Teleporting and Orienteering

A keyword search is most often an attempt (usually several attempts, actually) to go directly and instantaneously to the exact location of desired information. If we search the Web on the terms "Aladdin Theater Box-Office" we hope to land where we can purchase tickets for concerts at this small venue in Portland, Oregon, without having to sift through irrelevant information. The academic community has labeled this sort of information seeking behavior teleporting.[16] Teleporting is one strategy for finding information and can be executed in various ways with a number of search tactics. In addition to keyword search, an information seeker may attempt to teleport by entering a specific URL, opening a certain email, or typing in a directory path to a particular document. Perfect teleporting (hitting your target on the first attempt) is a rare accomplishment; so rare in fact that a game, "Google Whacking," has sprung up around the challenge.[17] Yet despite the difficulty in finding just the right information with search alone, most websites and information portals seem designed to encourage the attempt, as evidenced by the ubiquitous search box.

A more realistic scenario is to teleport into the general vicinity of the information you are seeking, using search or some other tactic, and then zero in on your target with a succession of small steps. To buy our concert tickets for a show at the Aladdin, for example, we might teleport by typing in the URL for the theater: www.aladdin-theater.com. We know we are close, but still can't buy our tickets, so we may follow the link to the "Upcoming Shows" page. Here we find the performer we are looking for listed with a link to "show details," so we click through to that page. Finally we see a banner for "Local Ticket Outlet Information," which leads us to a link for the "Aladdin Theater Online Ticketing Page" where we can order our tickets.
This strategy of locating information by continually narrowing our search through incremental steps has been dubbed orienteering (though most people simply call it browsing) and has proven to be
the preferred approach to finding information. Studies conducted at the MIT Artificial Intelligence lab have found that information seekers use keyword search less than forty percent of the time. Surprisingly, this holds true even when searchers know exactly what they are looking for and even where to find it (see table 1).[18]

                Specific       General        Specific
                Information    Information    Document    Total
Orienteering    47             19             41          120
Teleporting     34             23             17           80
Total           81             42             58          200

Table 1. Information need by search strategy (19 unknowns removed).

There are circumstances where keyword search yields nominally better results than navigation. In one study, information seekers were more successful at locating information on a well indexed medical information site by using search rather than browsing. Interestingly, those most successful at finding what they were looking for were those individuals who turned to search only after browsing failed (see figure 4). Even when individuals abandon browse on a given information hunt and succeed with search, they invariably return to orienteering on their next task.[19]

Figure 4. Information seeking behaviors. (Two charts: "Search versus browse success rates," comparing the strategies Browse, Search, and Search after Browse failure, and "First choice of strategy," comparing Browse and Search.)

M.I.T. researchers have found several reasons why people prefer to zero in on information rather than attempting to pounce on it in a single great leap. First, it can be difficult to clearly articulate exactly what it is you are seeking. This is the case even when trying to retrieve familiar information and documents. Think of the last time you were asked for directions to a familiar destination. Even though you may be able to drive there without thinking, you may have a hard time giving step by step instructions on how to get to that same location. Browsing reduces the cognitive demand on information seekers by allowing them to follow familiar paths to the general area of the information they are seeking, quickly and easily reducing the size of the area they must explore.

This also allows searchers to draw on a broad range of "meta-information" about the target of their search. For example, say you need to locate a company memo that was circulated six months ago and has since disappeared into the bowels of the company Intranet. Even though you have no idea where to find the memo itself, you recall seeing it referred to in an email from a colleague. You may not know exactly where to find that email either, but you likely will recall who it was from and roughly when you received it, along with some idea of the subject line and general content. This will allow you to find the email that will in turn point you toward the actual target of your search: the company memo. Even though you can't teleport even into the general vicinity of the memo, you can start from a known frame of reference (the email) and follow clues along the way until you arrive at your goal.

The small steps of orienteering and the clues found along the way also provide information seekers with a strong sense of location throughout their search. The importance of the "you are here" factor should not be underestimated. When users feel in control, sense that they are heading in the right direction and are able to backtrack if they take a wrong turn, they are less likely to abandon a search prematurely. When people drop into the middle of an information space as a result of a keyword search, they have no context and little indication of how to proceed. This sense of disorientation can cause both knowledge workers and potential customers to leave a website as quickly as they arrive. By contrast, navigating through an information space allows the user to become acclimated to the environment at their own pace, much like easing into a hot bath rather than plunging into scalding water. This process of guided exploration also has the dual benefit of building context for interpreting the target information once it is found and allowing for serendipitous discoveries along the way.
Most importantly, information seekers are more likely to continue their search if they are confident that they are on the right path and that their efforts will pay off.
NN/g Home > Services > Training > Intranet usability

Figure 5. Breadcrumb trails are often used to give users a sense of control over their exploration of a new information space.

It's interesting to note that the word browse derives from an antiquated French term brost meaning "young shoot" and referring to the way that animals feed on the young shoots of trees and shrubs. As animals seek nourishment, they must balance the nutrition to be gained against the energy expended obtaining it. This behavior is fundamentally the same for information seekers. Visitors to an information space, whether it be a website, Intranet, database, file system or what have you, are continually balancing cost and benefit: "Will the information I find here be worth the time and effort it is costing me to track it down?" As they browse a website, they will be repeatedly assessing the likelihood of finding what they need in the current environment and determining when it's time to move on to more promising pastures.

This metaphor has become the basis of information foraging theory, a model of information-seeking behavior developed by Peter Pirolli and Stuart Card of the Xerox Palo Alto Research Center.[20] According to this model, we search for information across the Internet using essentially the same strategies hunter-gatherers use to search for food across the savannah. The nature of the prey may be new, but the fundamental approach hasn't changed for millennia. Both animals and humans attempt to maximize their "benefit per unit cost."
When the benefit, in terms of likelihood of finding the necessary food or information with an acceptable investment of time and energy, falls below a certain threshold, the current website or watering hole will be labeled sterile and the forager moves on to a more fertile patch.

Steps can be taken to reduce the likelihood of users leaving our information patches prematurely. One of the most
effective strategies is to increase the strength of the information scent present in our systems. The notion of information scent is central to information foraging.

The basic idea is that just like a game animal, information leaves behind spoor that can be detected and tracked. Associated concepts "rub off" on one another, leaving detectable traces, just as a watering hole frequented by woolly mammoths will smell of woolly mammoths. A hunter-gatherer seeking mammoths is likely to be drawn to the watering hole, if only to look for spoor. Information foragers do the same. Imagine you're looking for texts about foraging theory. If [a search] throws up a box containing the keyword "hunter-gatherer", you're likely to select that box. It just smells right.[21]

Consider our ticket purchasing example. When we first arrive at the theater's homepage, we see labels such as "Artist of the Month" and "Show Listings," which may even include the concert we are seeking. Even though we don't see that we can purchase tickets here, the page smells like concert tickets so we continue our search by clicking on "Upcoming Shows." Here the scent gets stronger when we find the right show along with a link to "Show Details," which finally gets us to "Buy Tickets Online." Throughout the process of browsing, the scent of concert tickets is strong and gets stronger the closer we get to our goal. This continual positive feedback can keep information seekers happy with the current information patch and prevent them from jumping to a competitor or colleague to meet their needs.

Strong information scent can be a double-edged sword if mishandled. The most common pitfall occurs when a strong scent points toward what should be the right answer but isn't. Jakob Nielsen demonstrated this phenomenon in a study of a health information website for teens.[22] Users were asked to find out how much they could weigh without being considered overweight. Most users quickly
gravitated toward an area of the site labeled "Food & Fitness." This clear, concise label had strong information scent for the question at hand. Featured prominently within that area of the site was a lengthy article entitled "What's the right weight for my height?" that was also ranked highly by a search on the term "weight." This would seem to be a bull's-eye except for the fact that the article does not contain the answer to the question. Because the information scent leading to this article was so strong, users were convinced they were looking in the right place. When the information wasn't there, they naturally concluded that because it wasn't where it should be, it must not exist anywhere on the site and abandoned their search. This is an unfortunate result since the answer was in fact available on the site. It was buried in an article titled "Body Mass Index (BMI)." The information scent of this title for answering the target question is almost non-existent. First, the title is a bit academic and maybe even intimidating for the website's teenage audience. Worse, the title gives no indication of the article's content, which includes a straightforward calculation of optimal weight using height, weight and age. In a nutshell, the container of the information was mislabeled.

Figure 6. A website with good information scent. (Screenshot of a concert listing: Pink Martini at the Oregon Zoo, Thursday, August 26, with ticket price, door times and box office information.)

The problem of bad labels strikes at the heart of findability. If information seekers cannot recognize the content they are searching for even when they find it, it may as well not exist. Even when an information producer gives careful consideration to labeling and categorization, the result may have no meaning to information consumers. A physician, wanting to be precise, may label a document on treating a particular respiratory condition with the terms laryngotracheobronchitis, inspiratory stridor and dexamethasone. While this may be perfectly appropriate for other doctors, it is of little use to a mother searching the Web for information on how to alleviate the wheezing cough of her daughter with croup.

Most information systems today are organized much like libraries before Melvil Dewey created his decimal system for classification. Patrons were left to wander stacks of untitled or oddly titled books piled on shelves according to some idiosyncratic organizational scheme comprehended only by an arcane priesthood of local librarians. Overcoming this barrier to discovery is the role of controlled vocabularies and taxonomies. By developing a structured collection of terms and guidelines around how they are to be applied, information can be managed in a manner that facilitates its discovery, interpretation and use to the greatest extent possible.

Beyond just finding information, the hierarchical nature of a taxonomy can help educate an information seeker by guiding them through a subject. The mother searching for information about her daughter's illness will not only discover that dexamethasone is a steroidal treatment for the condition, but that humidified air may also alleviate her daughter's discomfort. Continuing through the structure she will discover additional treatments and potential complications.
Finally, she will learn that the proper name for "croup" is in fact laryngotracheobronchitis, giving her a new term to search on and expanding the potential information sources available to her.
The parent/child relationships inherent in the tree structure of a taxonomy are powerful tools in guiding a seeker through what may be an unfamiliar subject. By explicitly showing how terms and concepts are related, a taxonomy helps a searcher discover associations that they didn't know existed. Most importantly, they can define and refine their information need as they explore rather than having to precisely articulate it up front when they may not know exactly what it is they are seeking.

Organizing information according to a well defined structure, such as a taxonomy, also provides stability to an information environment. Information changes continually. Delphi Group has estimated that at least ten percent of enterprise information changes monthly in an average organization.[23] Without some means of governance, relevant information becomes a moving target. Today a search on "taxonomies" may yield 1,900,000 matches. Tomorrow or next week that same query could return 1,985,000 hits with completely different rankings. That article I found last week that was so useful but that I didn't bookmark could now be anywhere. A taxonomy can act as a dynamic bookmark. As new documents and information become available, they can be classified, labeled and published in accordance with the taxonomy without changing its structure. When a knowledge worker needs to return to an area of interest, he will still find it where he left it. The only difference will be that there is now more information available there. In addition, the new information will be in context with relationships and potential avenues of exploration clearly visible.

Managing terms and keywords can also enhance search by bridging the vocabulary gap between information producer and consumer. A search engine integrated with a taxonomy would know that a search on croup should also look for laryngotracheobronchitis and that in certain contexts "Oscar" is another way of saying "Academy Award."
It can also compensate for common spelling errors and variants (e.g., theatre or theater) and synonyms (fall or plunge or spill or tumble). These expansions may seem trivial, but they can dramatically improve the effectiveness and efficiency of search.
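The kind of expansion just described can be sketched as a lookup against a controlled vocabulary. The example below is a minimal illustration, not a description of how any particular search engine works: the vocabulary, the function names and the choice to treat each group of terms as fully interchangeable are all assumptions made for the sketch. In particular it ignores context, so it would always equate "Oscar" with "Academy Award," whereas the text notes that the equivalence only holds in certain contexts.

```python
# Hypothetical controlled vocabulary: preferred term -> variant terms.
VOCABULARY = {
    "croup": ["laryngotracheobronchitis"],
    "academy award": ["oscar"],
    "theater": ["theatre"],
    "fall": ["plunge", "spill", "tumble"],
}


def build_lookup(vocabulary):
    """Map every term, preferred or variant, to its whole group of equivalents."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        group = {preferred, *variants}
        for term in group:
            lookup[term] = group
    return lookup


def expand_query(query, lookup):
    """Return the sorted list of terms a search should look for."""
    term = query.strip().lower()
    # Terms outside the vocabulary are searched as typed.
    return sorted(lookup.get(term, {term}))


LOOKUP = build_lookup(VOCABULARY)

print(expand_query("Croup", LOOKUP))
print(expand_query("Oscar", LOOKUP))
print(expand_query("mammoth", LOOKUP))
```

Because the lookup is built from every term in a group, the expansion works in both directions: a parent who types "croup" and a physician who types "laryngotracheobronchitis" end up searching for the same set of terms.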
A sample hierarchy of respiratory illnesses

CROUP (USE FOR laryngotracheobronchitis)
    Symptoms
        fever
        wheezing (USE FOR inspiratory stridor)
        swollen lymph glands
        decreased appetite
    Treatment
        humidified air
        fever reducer
            acetaminophen
            ibuprofen
        steroid
            dexamethasone
            prelone
            orapred
            pulmicort
        breathing treatment
            racemic epinephrine
    Complication
        kidney inflammation (USE FOR glomerulonephritis)
        rheumatic fever
STREP THROAT
    Symptoms
        fever
        swollen lymph glands
        rash
    Treatment
        antibiotic
            amoxicillin
            erythromycin
    Complication
        rheumatic fever
RESPIRATORY SYNCYTIAL VIRUS (USE RSV)
    Symptoms
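A hierarchy like this one can be held in a simple nested structure and walked programmatically. The sketch below uses an abridged copy of the sample hierarchy; the nested-dictionary representation and the helper names `find_paths` and `siblings` are illustrative choices, not part of any taxonomy standard or tool.

```python
# Abridged, illustrative copy of the sample hierarchy as nested dictionaries.
TAXONOMY = {
    "Croup": {
        "Symptoms": {"fever": {}, "wheezing": {}},
        "Treatment": {
            "humidified air": {},
            "steroid": {"dexamethasone": {}},
        },
    },
    "Strep throat": {
        "Symptoms": {"fever": {}, "rash": {}},
        "Treatment": {"antibiotic": {"amoxicillin": {}}},
    },
}


def find_paths(tree, term, trail=()):
    """Yield every path from the top of the tree down to a node named `term`."""
    for name, children in tree.items():
        path = trail + (name,)
        if name == term:
            yield path
        yield from find_paths(children, term, path)


def siblings(tree, path):
    """Return the other terms filed under the same parent as the last step of `path`."""
    node = tree
    for name in path[:-1]:
        node = node[name]
    return [name for name in node if name != path[-1]]


# Where does a term sit, and what surrounds it?
for path in find_paths(TAXONOMY, "dexamethasone"):
    print(" > ".join(path))
print(siblings(TAXONOMY, ("Croup", "Treatment", "steroid")))
```

The path to a term works like a breadcrumb trail, and the sibling list is how a seeker who arrives at "steroid" discovers that "humidified air" is filed alongside it as another treatment.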
Controlled vocabularies, like taxonomy and its relatives, are not silver bullets and will not magically cure all information management problems, but they are a critical component of findability. If properly constructed, applied and maintained, a taxonomy can radically increase the value of information by making it more available, understandable and actionable. The remainder of this book will demonstrate how this can be achieved. Before we can delve into the mysteries and wonders of taxonomies, however, we must take a brief detour into the world of metadata.
2 Metadata

If we fail to anticipate the unforeseen or expect the unexpected in a universe of infinite possibilities, we may find ourselves at the mercy of anyone or anything that cannot be programmed, categorized or easily referenced.
Fox Mulder, "The X-Files"

Art collecting is a tricky business. The value of a painting, sculpture or even a rare book can vary wildly depending on the circumstances of a purchase. Two similar works by Monet may go on the auction block together; one sells for thousands, the other for millions. The only substantive difference between the two is the existence of provenance information. A clear record of a painting's history, who has owned it, when and where it has previously sold and for how much is essential to determining whether or not it is a wise investment. Without such information we have no context for our decision. Is it overpriced or undervalued? Is it stolen? Is it a verified Monet or just a suspected Monet? Even though it is the painting itself that holds our interest, we need information about the painting to qualify our interest.

This same principle applies to less tangible assets, namely information. When we first locate new information we tend to be suspicious. Can I trust these numbers? Is this the current version of the document? Is this image copyright cleared? This is especially true if the source of
that information is not familiar to us. Before we trust a document or a Web page, we need to know a little more about it. Some of these questions may be answered by the search itself. When we look for information, we usually try to specify parameters to limit the scope of the search. Specifying the author of a document, the date of its publication, whether it is a report, invoice, form or memo will not only enhance our chances of locating what we are looking for but can pre-qualify the content as it is found. This kind of reference information is generally not indicated explicitly in the content itself, but rather is supplementary to it. It is metadata.

The standard definition of metadata is usually given as "data about data." This gets at the general idea, but is not quite adequate. The term "meta" comes from the Greek root meaning something that follows another and takes it into account. Thus, metadata is generally developed from associated source data and as a function of the information it describes. The Greek term also means among, alongside, or with, so it follows that metadata can take several complementary forms in relationship to its parent information. Finally, if the Latin derivation is taken into account, meta can mean transcendent, so metadata should be expected to add value above and beyond the content it describes.

To complicate matters, the distinction between data and metadata can be fluid. What is metadata in one context may be pure data in another. For example, if you are looking for an article on a certain topic by a certain author, then the writer's name and the subject keywords are metadata and the content of the article is data. By contrast, say you are trying to remember the name of the author who wrote a particular article in the 1940s and can't remember the title.
You do remember that it contained the phrase: "Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it." In this case the publication date range, 1940-1949, and the content of the article itself are the metadata and the author's name is the data.1
  • 36. The Value of Metadata In late 1988, a nondescript van pulled up in front of Christie's East, the purchasing office of the renowned auction house in New York City. Tied to its top with several lengths of rope was a six by seven foot canvas. The driver had found it at a warehouse sale of unclaimed property and purchased it on a whim for $1,000. The painting was in bad shape and nothing was known about it, but it was large and old and ought to be worth something. He offered it to Christie's for $1,500. Ian Kennedy, a resident expert of Old Masters for Christie's, examined the painting and instantly recognized it as a work of the Italian Master Dosso Dossi. With this new bit of information, the asking price rose from $1,500 to $800,000. It was purchased by the London art dealers Hazlitt, Gooden & Fox for $4 million, dirt, rips and all. Two months later it was sold to the Getty Museum for an even higher price. "Allegory of Fortune," Dosso Dossi
  • 37. The defining characteristic of metadata is that whatever form it takes, it facilitates the identification and discovery of a discrete package of information. The classic example of this is the library catalog card. Independent of any actual content from the item being described, a simple 3" x 5" card can provide a wealth of information that is useful in locating and managing an information resource, in this case a book. At a glance, we can determine the title, author, publisher, length, topic and even location of the book. This quick access is by design.
    973.4 B21
    McCullough, David C.
      John Adams / [by] David McCullough
      New York : Simon & Schuster, c2001
      751 p., [40] p. of plates : ill. (some col.), maps ; 25 cm.
      Includes bibliographical references (p. 703-726) and index.
      ISBN 0-7432-2313-6
      1. Adams, John, 1735-1826. 2. Presidents - United States - Biography. 3. United States - Politics and government - 1783-1809. I. Title.
      E322.M38 2001    973.4'4'092 [B]    2001027010
    Figure 1. Metadata in a traditional card catalog.
    An often overlooked feature of the humble card catalog is that the cards are organized to facilitate this at-a-glance utility. Each card has a consistent location and format for each piece of information it contains. When looking at an author card, we know the first line indicates the author of the work and the second line is the book's title. The structure of the card tells us that a book is a biography of John Adams written by David McCullough rather than the other way around. The same principle applies to electronic resources. To be useful, metadata must be structured to facilitate both discovery and interpretation.
  • 38. Most major newspapers now provide online editions with searchable full-text archives. If we type in a few well chosen key words, we have a chance of finding something of interest. The newspaper's search engine will match our query terms against every word of every article of every edition contained in the archive. This is searching the data, the actual content of the newspapers. This type of search is subject to all of the pitfalls of unconstrained search as discussed in the prior chapter. If we instead search the metadata, we can dramatically improve the effectiveness of our search. Figure 2. The advanced search page of the LA Times. (Search fields shown include author, headline, article type, section, and date options.)
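The difference between searching the data and searching the metadata can be sketched in a few lines of Python. The three-article archive, its field names (author, headline, article_type, body) and both search functions are hypothetical and invented for illustration; this is not a description of how any newspaper's search engine actually works.

```python
from dataclasses import dataclass


@dataclass
class Article:
    author: str
    headline: str
    article_type: str
    body: str


# Hypothetical archive, invented for illustration only.
ARCHIVE = [
    Article("Staff Writer", "Tensions rise across the strait", "news",
            "Analysts recalled Carter era policy toward China and Taiwan."),
    Article("Staff Writer", "Scholar of Asian politics dies", "obituary",
            "He advised the Carter administration on China policy."),
    Article("Jimmy Carter", "Trading with China", "opinion",
            "Open markets serve both nations."),
]


def full_text_search(archive, terms):
    """Match every term against the article text itself (the data)."""
    hits = []
    for art in archive:
        text = f"{art.headline} {art.body}".lower()
        if all(term.lower() in text for term in terms):
            hits.append(art)
    return hits


def fielded_search(archive, author=None, article_type=None, headline_word=None):
    """Match each criterion only against its own metadata field."""
    hits = []
    for art in archive:
        if author and art.author.lower() != author.lower():
            continue
        if article_type and art.article_type != article_type:
            continue
        if headline_word and headline_word.lower() not in art.headline.lower():
            continue
        hits.append(art)
    return hits


broad = full_text_search(ARCHIVE, ["Carter", "policy", "China"])
precise = fielded_search(ARCHIVE, author="Jimmy Carter",
                         article_type="opinion", headline_word="China")
print([a.headline for a in broad])
print([a.headline for a in precise])
```

In this toy archive the full-text query returns the two articles that merely mention the terms, while the fielded query returns only the piece actually written by the named author.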
  • 39. If we would like to research the position of former president Jimmy Carter on U.S. trade with China, a reasonable place to start is the archives of the Los Angeles Times (www.latimes.com). As we would expect, searching on the keywords "Carter," "Policy," and "China" returns an assortment of documents ranging from an analysis of the conflict between China and Taiwan to an obituary of Stanford University professor Michael Oksenberg. Fortunately, the Times archive provides an advanced search mechanism utilizing extensive metadata. Rather than a blind search where all words are treated equally, the Times enables users to restrict certain terms to certain areas. We can specify that "Jimmy Carter" only be matched against authors and that only articles of the type "opinion piece" with the word "China" in the headline be retrieved. Even though we are no longer looking at any of the archive's actual data or article text and are instead searching only metadata, we receive a precise set of documents with a strong likelihood of being relevant to our interest. Types of Metadata The advantages metadata affords to searching electronic versions of traditional textual resources are straightforward. However, the digital world isn't as simple a place as it once was, and newspapers, magazine articles and the like are rapidly becoming a minority among the milieu of online information. New types of information objects and artifacts seem to emerge daily. In order to manage this deluge of new forms of information, we must be able to describe them in ways that are specific to each unique type and the tasks utilizing them. To this end, several different forms of metadata (descriptive, technical, and administrative) may be developed for any given information object. Descriptive Metadata Descriptive metadata is by far the most common form of metadata in use today and is usually what you will encounter as an information
  • 40. seeker. This type of metadata comprises what is explicitly added to content to make it easier to find. In a nutshell, descriptive metadata is the who, what, when, and where of an information resource. While it found its first broad application with textual resources such as the LA Times archives, it is rapidly coming to permeate every aspect of the online world. Take, for example, Apple Computer's popular iTunes online music service. Since the content offered by iTunes is non-textual (i.e., the strains of a Bach concerto or a John Coltrane solo), full-text search of the content itself is ill-suited to retrieval. Rather, you search the textual information associated with the audio or video file you are trying to find. Most files have been extensively tagged with descriptive metadata. This includes the basics, such as artist, album, and song title as well as more advanced categories such as genre, sub-genre, release date and publisher. Figure 3. Metadata in iTunes. Each piece of metadata associated
  • 41. with a particular song increases the probability that it will be found, either by searching or browsing, and subsequently sold. The value of descriptive metadata doesn't rest solely in discovery and retrieval. It also facilitates the second part of the e-commerce equation: making the sale. Once a user browses through genres, sub-genres, and artists to a particular album of interest they can read reviews, ratings, song length, and even beats per minute. All of this is descriptive metadata that will help the information seeker make a value judgment of the content they are considering. The principle is as valid for corporate earnings reports as it is for Mariah Carey videos. Administrative Metadata If descriptive metadata is intended primarily for the information seeker, administrative metadata is mainly for the benefit of the information owner or steward. Metadata elements specifying from where a file or document came, where it is to be hosted, who is authorized to modify it, when it is to be archived, in what form and for how long are all administrative metadata. It is created for the purposes of management, decision making and record keeping.3 Administrative metadata is the lifeblood of modern content, document and records management systems. It allows content to move through its lifecycle in a largely automated fashion. For example, companies try to keep their websites interesting by continually changing their content. New stories are posted to the homepage and older content is moved to less prominent locations. A few well chosen pieces of metadata, such as publish date, run length, and archive page ID can combine with business rules in a content management system to automate for the most part the entire process of updating a website. This frees the Web team to focus on creating compelling content rather than shuffling files around the server.
It also allows the website to be updated in the middle of the night without disturbing the webmaster's sleep.
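As a rough illustration of how a few administrative fields can drive this kind of automation, the sketch below applies a single business rule: an item stays on the homepage for its run length after the publish date, then moves to its archive page. The field names mirror the elements named above, but the record layout, the rule itself and the placement function are assumptions made for this example, not the API of any particular content management system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AdminMetadata:
    publish_date: date
    run_length_days: int
    archive_page_id: str


def placement(meta: AdminMetadata, today: date) -> str:
    """Return where the item belongs on the given day (hypothetical rule)."""
    if today < meta.publish_date:
        return "unpublished"
    end_of_run = meta.publish_date + timedelta(days=meta.run_length_days)
    if today < end_of_run:
        return "homepage"
    return f"archive:{meta.archive_page_id}"


# Invented sample values for illustration.
story = AdminMetadata(date(2024, 3, 1), 7, "news-2024-03")
for day in (date(2024, 2, 28), date(2024, 3, 4), date(2024, 3, 8)):
    print(day.isoformat(), placement(story, day))
```

A scheduled job could call a function like this for every item each night, which is the sense in which the metadata, rather than a person, decides where content appears.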
  • 42. Recently, administrative metadata has found a new niche in the form of Digital Rights Management (DRM). Once the province of military intelligence and industrial secrets, DRM has recently moved into the mainstream. As distribution of intellectual property across the Internet and corporate Intranets has become the norm, having a reliable means to track that content and control who can access it has become essential. DRM secures digital materials and limits access to only those with the proper authorization. In addition, a complete DRM solution facilitates and tracks any transactions involving the content you wish to protect. For example, allowing copying or limiting the period of access or the number of times content may be viewed must all be supported.4 DRM technologies and techniques are driven by administrative metadata. Structural Metadata As we have noted, information comes in many forms and from many sources, usually bundled into packages that are largely black boxes to us. How are we, or more importantly the tools we use, to know how the information is to be read, manipulated and displayed? How does an application know the technical requirements for integrating the contents of some strange new file into its world so that we may have access to its contents? This is the role of structural metadata. Structural metadata, sometimes referred to as technical metadata, display metadata or use metadata, describes how an information object, usually a file or set of related files, is put together. This can range from technical details such as file size, compression scheme, and scanning resolution to display and navigation information such as presentation order, typographic instructions, and search mechanisms. The most common application of structural metadata is defining how information is to be organized in databases and data warehouses.
Every piece of information housed in a database must be grouped into records and described in terms of type, size, and relationships.
  • 43. The structural metadata governing this organization is in fact what makes up a database and turns unorganized data into a usable collection of structured information. Another way of looking at structural metadata is the page-turner model. In this model, structural metadata specifies how individual information objects are bound together to make up a single information package that is presented in a specific order, like the pages and chapters of a book. This allows text, images, and other content to be presented in sequence, but enables the user to navigate it at will, jumping from section to section, while preserving the organization and structure originally intended by the creator. Metadata Schemas Regardless of its type (descriptive, administrative or structural) and the purpose to which it is applied, all metadata share certain characteristics. At a minimum metadata must possess semantics, syntax, and structure.5 Semantics refers to the meaning of metadata within a particular community or domain. It is important to note that any given metadata field can have different interpretations depending on the context in which it is being used. For example, the administrative field sample source could refer to a medical procedure or even a particular patient in a medical context, or it could refer to a certain musical instrument or recording in the context of audio production. It could just as easily be a technical field referencing a particular device or encoding scheme. The point is that without clearly defined semantics, it is nearly impossible to accurately interpret metadata. Just as people cannot interpret metadata without an understanding of its semantics, computers can't make sense of it without syntax and structure. Syntax is the systematic arrangement of metadata elements and their values according to well defined rules. The most common
  • 44. form of syntax currently is the name-value pair in which the name of the metadata element is simply matched with its value, such as:
    <author = Arturo Perez-Reverte>
    <title = The Club Dumas>
    <genre = Fiction>
    Structure defines how metadata is to be organized to ensure consistent representation and interpretation in line with its syntax and semantics. The structure specifies which metadata elements are allowed where, in what order and how often. A record describing a "book" must start with one or more authors, followed by a single title, a single genre, an optional sub-genre, a single publisher and so forth. Taken together, semantics, syntax, and structure form a type of grammar, called a schema, that specifies the rules governing the metadata of any given domain or application. At the most basic level, a schema specifies a list of attributes that are valid for describing an information package. A more sophisticated schema will often detail every aspect of how metadata is to be encoded and represented. In all cases the overarching goal of defining a rich schema is to make metadata as useful as possible in terms of interoperability, extensibility and flexibility. Interoperability is the ability of information systems to exchange metadata and interact in a useful way over communication networks such as the Internet.6 This is what allows the computers at Amazon.com to talk to your bank or credit card company and receive payment for the book you ordered. Extensibility means that the original definition of the schema isn't the final word. It should always be possible to add additional metadata elements (albeit in an organized and controlled manner) to any schema in order to accommodate specific and often unforeseen user needs.
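To make the distinction between syntax and structure concrete, here is a minimal Python sketch. It parses name-value pairs written in the angle-bracket notation shown above (syntax) and then checks them against the "book" ordering and cardinality rules described in the text (structure). The BOOK_STRUCTURE table, the parse_pairs and validate functions and the error messages are all hypothetical; real schema languages express these rules quite differently.

```python
import re

# Syntax: one '<name = value>' pair per element.
PAIR = re.compile(r"<\s*([\w-]+)\s*=\s*([^>]*?)\s*>")


def parse_pairs(text):
    """Turn '<name = value>' pairs into an ordered list of (name, value)."""
    return [(name.lower(), value) for name, value in PAIR.findall(text)]


# Structure (assumed for illustration): element, minimum, maximum, in order.
BOOK_STRUCTURE = [
    ("author", 1, None),   # one or more
    ("title", 1, 1),
    ("genre", 1, 1),
    ("sub-genre", 0, 1),   # optional
    ("publisher", 1, 1),
]


def validate(pairs, structure):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    pos = 0
    for name, lo, hi in structure:
        count = 0
        while pos < len(pairs) and pairs[pos][0] == name:
            count += 1
            pos += 1
        if count < lo:
            problems.append(f"missing {name}")
        if hi is not None and count > hi:
            problems.append(f"too many {name} elements")
    if pos < len(pairs):
        problems.append(f"unexpected or out-of-order element: {pairs[pos][0]}")
    return problems


record = parse_pairs(
    "<author = Arturo Perez-Reverte> <title = The Club Dumas> <genre = Fiction>"
)
print(record)
print(validate(record, BOOK_STRUCTURE))
```

The three pairs from the text are syntactically fine, yet the structural check still reports a missing publisher, which is exactly the kind of gap a schema is meant to catch.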
  • 45. Above all, metadata users demand flexibility from their metadata schemes and systems. They do not want to be compelled to add information that they deem irrelevant or too cumbersome. As a result, most metadata schemas allow authors to include as much or as little detail as they desire in a metadata record. This makes authors happy, but tends to make life difficult for information and metadata administrators, since the more flexible metadata is, the less interoperable it becomes. Two information systems may depend on a particular metadata element in order to communicate, and if an author fails to provide it, interaction between the two systems becomes impossible. Imagine if Amazon.com neglected to include the price of a book when it tried to charge your credit card. Schemas serve to mitigate these problems while preserving as much flexibility as possible. The number of publicly available schemas has exploded in recent years, and there now seem to be metadata standards (official, de facto, and even competing) for nearly every domain imaginable. One of the earliest and most broadly applied is the Dublin Core (DC). Named after the Ohio city in which it was first drafted, the Dublin Core was originally developed with an eye to describing document-like objects. More recently, DC metadata is beginning to be applied to a broad range of other types of resources as well. One of the strengths of DC and a prime reason for its popularity is its simplicity. The DC schema captures the fundamental characteristics of an information resource in a manner that is easy to create and comprehend. Thomas Baker of the German National Research Center for Information Technology has referred to it as "metadata pidgin for digital tourists."7 In its current form, DC consists of fifteen elements covering the basic descriptive, administrative and structural needs of an information object.
For each element the schema supplies both an official label and a concise definition. For example, creator is defined as: "an entity primarily responsible for making the content of the resource." Just as with a well defined structure, clear definitions of
  • 46. labels and terms are essential to ensuring the appropriate interpretation and application of metadata. The Dublin Core is an example of a simple schema that can mediate between the extremes of full indexing of raw text and highly structured content. It provides a mechanism for capturing the fundamental information necessary to describe an information resource without the burden of elements that may be irrelevant to a particular community or application. Some have perceived the spare nature of the DC schema as a weakness. While its basic nature allows it to describe many different types of resources, it limits the detail you can capture about that resource. For example, the creator element, described above, makes no distinction between a person, an organization, or a service. This could be essential information to a particular application. Perhaps even more troublesome is the fact that there are no constraints placed on the values a given element may take. For example, the subject element can be filled with a keyword, a Library of Congress Subject Heading or a free text description. This lack of standard terms and values is critical, as we shall see shortly.
    Descriptive: Title, Subject, Description, Source, Language, Relation, Coverage
    Administrative: Creator, Publisher, Contributor, Rights
    Structural: Date, Type, Format, Identifier
    Figure 4. The current Dublin Core element set.
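A small Python sketch may help show both the simplicity of the element set and the looseness the text warns about. It assumes, as in simple Dublin Core, that every element is optional and may repeat, so each value is stored as a list. The grouping follows Figure 4; the helper functions and the sample record (built from the catalog card earlier in the chapter) are illustrative only.

```python
# The fifteen element names, grouped as in Figure 4.
DUBLIN_CORE = {
    "descriptive": ["title", "subject", "description", "source",
                    "language", "relation", "coverage"],
    "administrative": ["creator", "publisher", "contributor", "rights"],
    "structural": ["date", "type", "format", "identifier"],
}
ELEMENTS = {name for group in DUBLIN_CORE.values() for name in group}


def unknown_elements(record):
    """List field names in the record that are not Dublin Core elements."""
    return sorted(set(record) - ELEMENTS)


def by_category(record):
    """Group a record's elements under the three categories of Figure 4."""
    grouped = {}
    for category, names in DUBLIN_CORE.items():
        present = {n: record[n] for n in names if n in record}
        if present:
            grouped[category] = present
    return grouped


# Illustrative record; nothing in the schema constrains these values.
record = {
    "title": ["John Adams"],
    "creator": ["McCullough, David"],
    "subject": [
        "Adams",                                   # a bare keyword
        "Presidents - United States - Biography",  # a controlled heading
        "a life of the second U.S. president",     # free text
    ],
    "date": ["2001"],
    "language": ["English"],
}
print(unknown_elements(record))
print(list(by_category(record)))
```

The record passes the only check the bare schema can offer (are the element names valid?), even though its three subject values follow three different conventions.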
  • 47. These shortcomings are common to most metadata schemas. The Dublin Core is a good example of how limitations can be overcome through extensibility. The DC supports two types of qualifiers, schemes and types, which refine the base schema. Schemes allow you to specify the standard syntax or vocabulary that is allowable for element values. The DC element subject may be qualified with MESH to indicate that all values must be drawn from the Medical Subject Headings vocabulary or LCSH to require Library of Congress terms. Likewise the language element may be qualified with ISO 639-2/RFC 3066 to ensure that any value applied to that field conforms to the ISO standard. DC types refine the definition of the core element itself. The basic DC element date, defined as "a date associated with an event in the life cycle of the resource," is too generic to be useful. By applying a type, the basic date element can be transformed into date created, issued, accepted, available, or acquired, among other possibilities. This ability to refine and enhance the schema without corrupting its fundamental nature and structure is the key to metadata extensibility. Without it, any metadata system will quickly become obsolete regardless of how well conceived and executed initially. Where Do I Put It? Metadata can live in several different places. Traditionally, as with the card catalog, it has been recorded and stored separately from the object it describes with a pointer of some sort to the location of the information resource itself. This is often the case in content management and data warehouse systems. Information resources will be given a unique identifier and stored in whatever form and on whatever system is most appropriate. The metadata describing that resource may be hosted in a separate database dedicated to that purpose. The metadata and the object it describes remain linked by means of the resource's identifier.
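The arrangement just described, metadata held in its own database and tied to the resource by an identifier, can be sketched with Python's built-in sqlite3 module. The table layout, identifiers, file paths and owner names below are invented for illustration and do not reflect any particular content management product.

```python
import sqlite3

# An in-memory database stands in for the separate metadata repository.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    " resource_id TEXT PRIMARY KEY,"   # the link to the resource
    " location TEXT NOT NULL,"         # pointer to where the resource lives
    " title TEXT,"
    " owner TEXT)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("doc-001", "/files/reports/q1.pdf", "Q1 Earnings Report", "A. Jones"),
        ("doc-002", "/files/reports/q2.pdf", "Q2 Earnings Report", "A. Jones"),
        ("img-001", "/files/images/logo.png", "Company Logo", "B. Smith"),
    ],
)


def describe(resource_id):
    """Look up a resource's metadata by its identifier."""
    row = conn.execute(
        "SELECT location, title, owner FROM metadata WHERE resource_id = ?",
        (resource_id,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("location", "title", "owner"), row))


# Reassigning every resource owned by one person is a single statement;
# the files themselves are never opened.
changed = conn.execute(
    "UPDATE metadata SET owner = ? WHERE owner = ?", ("C. Lee", "A. Jones")
).rowcount
print(changed, describe("doc-001"))
```

Because the files are only pointed to, a copy of q1.pdf sent outside the organization would carry none of this information with it.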
  • 48. This approach has the advantage of making it simple to update the metadata of any given information resource. If a new manager takes over responsibility for a large number of documents, you can simply update the database with the new information rather than tracking down and retagging the documents themselves. The disadvantage of this approach is that the metadata doesn't travel with the document if it is shared. If a file with externally managed metadata is emailed to a colleague at another organization, they will receive the content but not the descriptive information. This can become a problem if that additional information is critical to making the document usable. An alternative to external management is to make the metadata a part of the information resource itself. Most applications supporting this approach store metadata as properties of the file they describe. Microsoft Windows, for example, allows an author to add summary metadata to any file, which may then be used to organize, locate, and retrieve the information resource. In addition to traveling with the file, internal metadata has the advantage of being somewhat self-maintaining. In the case of Windows metadata, some information is extracted directly and automatically from the document itself. The organization of the file is automatically extracted from heading styles in a Word document, Excel worksheet titles, or slide titles in a PowerPoint presentation. If the file changes, the new structure is automatically reflected in the metadata. Usage statistics are also automatically updated throughout the life of the document. At first blush, semi-automatic maintenance and close coupling with the information it describes make internal metadata a very attractive option, but it does come at a cost.
First, while some of the descriptive metadata (title, author, company) can be automatically generated, the fields that are most useful to retrieval (subject, category, keywords) must be manually selected, keyed, and maintained. If the owner of the document changes, as mentioned earlier, not only does that field need to be updated in each impacted document, there will be no history of ownership. Once an internal field is updated, all previous values are lost. This can become critical if an explanation of something in the document is needed and no one remembers who originally wrote it.
  • 49. Figure 5. Metadata in Microsoft Windows. (Properties shown: Title, Subject, Category, Keywords, Comments, Source, Author, Revision Number.) Another hazard is shifting terminology. The vocabulary of any organization or community inevitably changes over time. Keywords, subject headings, and even category labels need to be updated to reflect these changes. Otherwise a search engine will not be able to match a relevant document tagged with obsolete terms with a query from a user searching with the latest buzzwords. Additionally, while deliberate keywords are essential to effective retrieval, as discussed in the prior chapter, the burden of selecting, assigning and maintaining them falls primarily on the author (who is invariably overworked already). This often leads to sporadic metadata and idiosyncratic tags and terms. This becomes an even greater problem in the context of authority control, which we will discuss shortly.
  • 50. Where Does It Come From? The potential sources of metadata and the means of creating it are as varied as the information resources they describe. Systems for automatic generation exist but rarely reach an acceptable level of quality without human assistance. Conversely, a broad application of metadata across an enterprise of any size is generally too tedious for human beings working without the help of scripts, term extractors and tagging tools. As a result, most successful metadata endeavors draw on a range of sources, tools and techniques depending on the nature of the information under consideration and the purposes for which it is intended. The same principle is just as applicable to creating the metadata for a single information resource as it is to an entire collection. In most cases, the descriptive metadata will be assigned by the creator or author of the information. This has the advantage of terms coming from the person most familiar with the content and its original intent. It has the disadvantage of the metadata reflecting the biases and idiosyncrasies of the author, whose vocabulary may not necessarily reflect that of her audience. The readers may also place the information in a different context from that originally conceived by the author. As a result, it is often advantageous to leave the creation of descriptive metadata to the professionals. The National Information Standards Organization (NISO) has noted that it is often more efficient to have indexers or other information professionals create this metadata, because the authors rarely have the time or necessary skills.8 This is, of course, an additional line item cost, but when lifetime cost of ownership (especially in terms of findability) is taken into account, leaving it to the professionals is often cheaper in the long run.
Administrative and structural metadata will often be generated by the technical staff that prepares an information resource to be published and distributed. The individual scanning an image or creating a digital recording is in the best position to supply details about resolution, bit
  • 51. rates and encoding schemes. The individual adding the resource to the content management system will know when it is to be posted to the website, for how long, and where it is to be archived at the end of its run. As with any budding field, there is an abundance of tools available to assist in the creation of metadata. The most common (and cheapest) is the application of templates such as those available in most word processing applications. In addition to providing standardized formatting of common document types, templates can also guide the author in providing basic descriptive metadata. Even if professional indexers are utilized to create the final metadata, it is often effective for the author to create a "first draft" of the metadata to serve as a guide. A well conceived document template can simplify this task and improve the quality of the metadata. One of the challenges of high quality metadata is ensuring that it conforms to the appropriate schema. Mark-up and tagging tools can prompt the user for the appropriate fields, requiring those that are mandatory for compliance with the designated schema. Once the metadata is complete, the tool can either embed the metadata in the information resource itself or export it to an external metadata repository or database. Extraction tools will analyze the content of an information resource and attempt to extract appropriate terms and values for certain metadata fields. For structural metadata, this is often straightforward and quite effective. For more conceptual elements such as subject, category or keyword, it gets a bit trickier. Most tools rely on a mixture of statistical and computational techniques to make a best guess at appropriate descriptive metadata. In most cases these tools require a great deal of training in terms of sample documents and target vocabularies, and still depend on human intervention and revision.
However, much like having authors take a first pass at assigning metadata, automated extraction tools can dramatically reduce the full metadata burden to a more manageable one of cleanup and refinement.
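To give a feel for what a "best guess" from an extraction tool might look like, the following Python sketch suggests keywords by simple word frequency, optionally restricted to a target vocabulary. It is a deliberately naive stand-in for the statistical techniques mentioned above: the stopword list, the sample text, the vocabulary and the suggest_keywords function are all assumptions made for illustration, and real tools are considerably more sophisticated.

```python
import re
from collections import Counter

# A tiny stopword list, assumed for this example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for",
             "on", "that", "with", "as", "are", "be", "this", "by"}


def suggest_keywords(text, vocabulary=None, limit=3):
    """Suggest up to `limit` keywords, most frequent first (ties alphabetical)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    if vocabulary is not None:
        # Keep only terms that appear in the target vocabulary.
        counts = Counter({w: c for w, c in counts.items() if w in vocabulary})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


sample = ("Metadata schemas describe metadata elements. "
          "A schema defines which metadata elements are valid.")
print(suggest_keywords(sample))
print(suggest_keywords(sample, vocabulary={"metadata", "schema", "taxonomy"}))
```

Even on two sentences the limits show: without a vocabulary the third suggestion is the uninformative word "defines", and with one the tool counts "schema" but misses the plural "schemas". That residue is the cleanup and refinement left to a person.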
  • 52. Figure 6. A metadata creation aid: Meta-X. Metadata and Authority Control Metadata is a hard sell. It is expensive to create and difficult to maintain. Executives have a tough time understanding how the problem of having too much information to manage can be solved by adding on yet more information. Metadata is a bit of a "hair of the dog" solution. We add a little extra information to make a lot of information more usable. As to the expense, the answer is, of course, pay now or pay more later; sometimes a lot more. As discussed in the prior chapter, a few moments tagging a document can save hours hunting for it later. When done properly, metadata initiatives nearly always generate a positive return on investment. Unfortunately, few are done properly and most fail. A prime reason for this is a lack of authority control. The notion of authority control boils down to making sure everyone involved in the creation and management of an information resource
  • 53. 42 Metadata is speaking the same language. It is the mechanism by which consistency in onl.i.ne systems is created and maintained. When applied to search and even navigation, it promotes greater precision by providing official or "authorized" forms of names, labels and values. As part of this system, references to equivalent terms and synonyms and variants are created which dramatically improve recall. 9 recall.9 For example, if the authorized term for a "non-rigid, buoyant airsrup" is blimp there will be cross references to zeppelin and dirigible. An information seeker searching on any of these equivalent terms would receive information for all of them. The value of authority control to metadata should be obvious. While schemas provide structure, syntax and semantics to ow: mctadata, thry do nothing to ensttre comistenry i11 the values assigned to the elements of the schema. The Dublin Core may specify an element called language and define it as, "the language of the intellectual content of the resource," but it does nothing to limit the potential values that can be assigned to that field. If DC metadata is being created for a□ international news story, its language could be tagged as English, Eng., En, American English, British English, or any number of variants. Each is potentially valid, but the lack of consistency turns retrieval into a crap shoot. If an information seeker searches on English they will receive only those information resources labeled with that exact term. Anything tagged with another term for English wilJ be ignored. The solution is to restrict potential metaclata values to an agreed upon list of terms, so that both information creators and seekers are speaking the same language. Io many cases, an authoritative vocabulary already exists and ca□ be adopted wholesale. 
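The retrieval problem described above can be made concrete with a small sketch. It assumes a hypothetical in-memory authority file that maps each authorized term to its variants; the names `AUTHORITY`, `authorize` and `search`, and the choice of "English" as the authorized label, are illustrative assumptions and do not come from any particular metadata tool.

```python
# A minimal sketch of authority control, assuming a tiny hand-built
# authority file. All names and the sample data here are hypothetical.

# Authorized term -> variant forms that should resolve to it.
AUTHORITY = {
    "blimp": {"zeppelin", "dirigible"},
    "English": {"Eng.", "En", "American English", "British English"},
}


def build_lookup(authority):
    """Map every authorized term and variant (case-insensitively) to the authorized term."""
    lookup = {}
    for authorized, variants in authority.items():
        lookup[authorized.casefold()] = authorized
        for variant in variants:
            lookup[variant.casefold()] = authorized
    return lookup


LOOKUP = build_lookup(AUTHORITY)


def authorize(term):
    """Return the authorized form of a term, or None if the term is not in the authority file."""
    return LOOKUP.get(term.strip().casefold())


def search(query, records):
    """Return titles whose tag resolves to the same authorized term as the query.

    records is a list of (title, tag) pairs, where tag is whatever the author typed.
    """
    target = authorize(query)
    if target is None:
        return []
    return [title for title, tag in records if authorize(tag) == target]


records = [
    ("Story A", "English"),
    ("Story B", "Eng."),
    ("Story C", "British English"),
    ("Story D", "Italian"),
]

# Exact matching on the raw tag finds only one of the three English stories.
exact = [title for title, tag in records if tag == "English"]

# Matching through the authority file finds all three.
controlled = search("En", records)
```

In this sketch the variants are resolved at search time. The approach the text recommends, restricting the values that can be entered in the first place, would instead call something like `authorize` when the metadata is created and reject or correct any tag that returns `None`.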
In the case of the DC language element, the International Organization for Standardization (ISO) Language Codes standard (ISO 639-2) provides authoritative names and codes for languages. English would then be consistently represented as eng, Italian as ita, Japanese as jpn and Esperanto as epo. If the desired granularity does not exist in the standard, it can be expanded. DCMI actually recommends this as a best practice in the case of languages.10 The ISO standard can be used in conjunction with the Internet Society's proposal for language codes (RFC 3066), which includes the more specific labels of en-US for American English, en-AU for English as used in Australia, en-GB for English in the United Kingdom, or even en-GB-oed for British English using spelling from the Oxford English Dictionary.
  • 54. The additional advantage of adopting authoritative terms is the possibility of structuring the labels to reflect relationships.
Eng (UseFor English, en)
    En-AU (UseFor Australian English)
    En-GB (UseFor British English)
        En-GB-oed (UseFor British English OED spelling)
Despite the advantages it offers, authority control is a difficult pill to swallow for most organizations. The prospect of giving up ownership of terms and labels is often enough to incite turf battles in even the most collegial of environments. Authors feel that it is unnecessary and even inadvisable to constrain their vocabulary in any way (though they invariably recognize the need for such constraints among their colleagues). Deciding who and what is the "authority" and who and what is governed by its dictates are among the most contentious issues in information management. If metadata is a hard sell, authority control can turn into a shotgun wedding.
Fortunately, it needn't be so. A balance can be struck between the expressive needs of content authors and the findability needs of information seekers. Doing so depends on the proper definition, creation and management of the information resources provided to both groups. Taxonomies are the linchpin of this process.
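The structured language labels discussed in this section (eng, en-AU, en-GB, en-GB-oed) lend themselves to a short illustration of how a label's structure can carry relationships. The sketch below assumes that a broader label is obtained by dropping subtags from the right of a hyphenated tag; that rule, the function names, and the four-entry code table are assumptions made for illustration, not a complete implementation of either standard.

```python
# A minimal sketch, assuming broader language labels are found by
# truncating hyphen-separated subtags from the right. The helper names
# and the tiny code table are hypothetical.

# Two-letter primary subtags mapped to ISO 639-2 three-letter codes
# (only the four languages named in the text).
TWO_TO_THREE_LETTER = {"en": "eng", "it": "ita", "ja": "jpn", "eo": "epo"}


def broader_tags(tag):
    """Return the tag followed by each broader tag, most specific first."""
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def matches(query_tag, resource_tag):
    """True if the resource's tag equals the query tag or is a narrower form of it."""
    candidates = [t.lower() for t in broader_tags(resource_tag)]
    return query_tag.lower() in candidates


def to_three_letter(tag):
    """Return the three-letter code for the tag's primary language, or None if not in the table."""
    primary = tag.split("-")[0].lower()
    return TWO_TO_THREE_LETTER.get(primary)


chain = broader_tags("en-GB-oed")
```

With labels structured this way, a search for en-GB can also retrieve resources tagged en-GB-oed, while a search for en-AU does not, so authors keep the more expressive label and seekers still find the resource under the broader one.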