Building Enterprise Taxonomies 2nd Edition Darin L Stewart
Contents
1. Findability
Infoglut
The Problem with Search
Teleporting and Orienteering
2. Metadata
Types of Metadata
Descriptive Metadata
Administrative Metadata
Structural Metadata
Metadata Schemas
Where Do I Put It?
Where Does It Come From?
Metadata and Authority Control
3. Taxonomy
Linnaean Taxonomy
Controlled Vocabularies
Faceted Classification
4. Preparations
The Taxonomy Development Cycle
Research
Performing a Content Audit
Creating a Governance Document
5. Terms
Internal Term Sources
Intranets and Websites
External Term Sources
Existing Taxonomies
Refining Terms
Basic Hygiene
Compound and Precoordinated Terms
Disambiguation
6. Structure
Card Sorting
Categories and Facets
7. Interoperability
Basic XML Concepts
Representing Hierarchy
l;ear of Baggage Handling
XSLT
Zthes
8. Ontology
What Is An Ontology?
Class Hierarchy, Slots and Facets
Resource Description Framework
RDF/XML
RDF Schema
Web Ontology Language (OWL)
9. Folksonomy
Tagging
Folksonomy
Tag Clouds
Pace Layering
Glossary
Notes
Index
1
Findability
"But the plans were on display..."
"On display? I had to go down to the cellar to find them."
"That's the display department."
"With a flashlight."
"Ah, well, the lights had probably gone."
"So had the stairs."
"But look, you found the notice didn't you?"
"Yes," said Arthur, "yes I did. It was on display in the
bottom of a locked filing cabinet stuck in a disused lavatory
with a sign on the door saying 'Beware of the Leopard.'"
From The Hitchhiker's Guide to the Galaxy
Finding good information is hard, much harder than it should be.
Your first encounter with a new website often feels like entering a
strange land with its own language, laws, customs and culture. You
have business to conduct there, but must do so without the benefit
of an interpreter or guide. As you begin to explore the homepage,
you must quickly orient yourself to its unique approach to navigation,
interpret bizarre labels and menus, guess at search terms and wade
through propaganda in search of useful information. And these are
just the public pages.
Things get much more dangerous if you venture out of the tourist
areas and onto an intranet or, heaven help you, a file system. Once
you enter the realm of the enterprise information system, all bets are
off. The seemingly unified front of the corporate website dissolves
into a collection of fiefdoms, each with its own local dialect and
jealously guarded borders passable only with the right permissions
and passwords. There also seems to be a civil war underway.
Despite our best efforts, most websites, portals, intranets and file
systems are hostile environments for information seekers. We hire
consultants, hold focus groups and conduct usability studies to
understand our users' needs. We build site maps, add search boxes,
and tag our content, and users still get lost. According to surveys
conducted by Gartner, IDC and others, knowledge workers spend
from thirty to as much as forty percent of their work day searching
for information and yet find what they need less than half the
time.1 This means we spend more time looking for documents than
actually reading them. This situation is not just embarrassing, it's
expensive.
A third of a senior knowledge worker's time, the time they spend
chasing information, works out to be roughly $26,000 a year in salary
and benefits on average. When those searches are successful, this is a
legitimate cost of doing business. When they fail, that fruitless search
time is a drain on resources. Yet as expensive as this may seem,
search time is a minor component of the cost of hidden information.
Even the tens of thousands of dollars spent on redesigning and
maintaining an improved website is trivial if it gets users to the
content they need. The true cost comes when users throw up their
hands and abandon their search. Studies have suggested that this
happens after about twelve minutes at the outside.
This phenomenon is not restricted to complex searches and obscure
facts. Information as mundane as the contact information for the
director of human resources cannot be located by employees on their
own Intranet fifty-seven percent of the time. Those intrepid few who
can find the information usually must troll through multiple Web
pages and documents looking for an org chart (which is probably out
of date) that might have the director's name. They must then look up
the director in an employee directory located elsewhere on the
Intranet, hoping they spelled the name right. One study found this to
be the case in five out of six corporate Intranets.2
When people can't find what they need, they don't just give up. They
go elsewhere. When a consumer doesn't find the right product, they
go to a competitor, which in aggregate costs your company half of its
potential sales. If they are already a customer, they pick up the phone.
This costs you an average of seventeen dollars for each call that your
self-service website was supposed to eliminate.3 When an employee
can't find what they need, they go to a co-worker, doubling costs
while halving productivity and often yielding no better results. In a
2002 research note, Regina Casonato and Kathy Harris of Gartner
estimated that an employee will get fifty to seventy-five percent of
the information they need directly from other people, effectively
erasing the benefits of a corporate Intranet.4
When a knowledge worker reaches this dead-end, they have little
choice but to set about creating the information they need from
scratch. This may be as simple as running a report and stitching a few
documents together, but more often it involves considerable
research, an additional information chase and consultation with
multiple colleagues. Unfortunately, all this effort is not being
expended to create information, but to recreate it. As much as ninety
percent of the time spent creating information for a specific need is
actually recreating information that already exists but could not be
located.5 According to Kit Sims Taylor, this is because it is simply too
hard to find what you need.
At present it is easier to write that contract clause,
exam question, insurance policy clause, etc., ourselves
than to find something close enough to what we want
from elsewhere.... While most of us do not like to
admit that much of our creative work involves
reinventing the wheel, an honest assessment of our
work would indicate that we do far more 'recreating'
than creating. 6
Taylor has found that in addition to the amount of time spent
looking for information, an additional thirty percent is spent
reinventing the wheel. When you account for communication and
collaboration overhead, only ten percent of our time, effort and
energies is actually spent in the creation of new knowledge and
information. In a separate study, IDC found that this "knowledge
work deficit" costs Fortune 500 companies over twelve billion dollars
annually.7
These are just the purely quantifiable costs. Consider the impact poor
findability has on decision making when there simply isn't time to
re-research and recreate the needed intelligence. Critical decisions may
be delayed because the information we can find, if any, is either
incomplete or conflicting. Worse, bad decisions may be enacted when
they wouldn't have even been considered had a fuller, more accurate
picture been available. In this age of compliance, the ability to locate
and produce information on demand can mean the difference
between passing an audit and dissolving the company.
Infoglut
So how did we get into this mess? We have spent literally trillions of
dollars on information technology, and yet our access to information
seems to get worse in direct proportion to the amount of money and
effort expended to improve it. Some pundits point to the sheer
volume of information with which we are inundated and resign
themselves to this inevitable consequence of life in the information
age. As Briton Hadden of TIME magazine put it:
Everyday living is too fast, too busy, too complicated.
More than at any other time in history, it's important to
have good information on just about every aspect of
life. And there is more information available than ever
before. Too much in fact. There is simply no time for
people to gather and absorb the information they need.
Hadden made this observation in 1929, shortly before founding the
magazine. Infoglut is not a new problem, but until recently it was at
least somewhat manageable. Today we are discovering that the only
thing worse and more dangerous than trying to run an organization
with too little information is trying to manage one with too much.
Everyone understands intuitively that infoglut is a problem, but few
have a clear sense of how much of a problem it really is. Experts
have long proclaimed the dangers of information overload. While
hyperbole is the lifeblood of consultants, in this case they seem to be
right on the money.
Each year the world produces roughly five exabytes (10^18 bytes) of
new information. To put that in more familiar terms, if the seventeen
million books in the Library of Congress were fully digitized, five
exabytes would be the equivalent of 37,000 new libraries each year.
While this is staggering in and of itself, consider that in 1999 it is
estimated that only two exabytes of new information was created,
meaning that the rate of information growth is accelerating by 30% a
year. 92% of that information is stored on digital media and 40% is
generated by the United States alone. We create 1,397 terabytes of
office documents each year. Each day we send thirty-one billion
emails.8 It is no wonder that we are, as John Naisbitt famously put it,
"drowning in information, but starved for knowledge."
The deluge has not caught us by surprise. On the contrary, we have
attacked it with a vengeance, pouring billions into data warehouses,
CRM, ERP, business intelligence and other data management and
reporting systems. These efforts and investments have bought us
great insight into our structured content: that highly organized
information structured according to a well defined schema or
framework. These are the records found in relational databases and
that slot so nicely into spreadsheets and reports. The information
contained in these records can easily be located, manipulated and
retrieved by means of standard query languages such as SQL.
Unfortunately, this type of domesticated data makes up only fifteen
percent of the total information with which we must cope.9 The
remaining eighty-five percent is made up of Web pages, emails,
memos, PowerPoint presentations, invoices, product literature,
procedure manuals, take-out menus and anything else that doesn't fit
neatly into a row in a database. The common factor among all of
these different forms of unstructured content is that they are all designed
for human consumption rather than machine processing. As a result,
all of the tried and true methods of data management we have
worked so hard to master fail miserably when asked to bring a
company picnic announcement to heel. So while quarterly sales
forecasts across four continents may be readily available, knowing
whether you are supposed to bring a salad or a dessert may be out of
reach.
The Problem with Search
This aspect of the information onslaught has in fact taken us by
surprise. Many of us are still in denial. After all, with fully indexed,
electronic information sources, full-text searching should allow us to
specify all the terms and subjects in which we are interested and have
the information retrieved and delivered to our desktop. As any user
of Google, A9, or countless other search and retrieval engines has
learned through painful experience, things rarely work out that neatly.
Rather than returning a nice, neat set of targeted documents, search
engines generally present us with long lists of Web pages that merely
contain the words on which we searched. Whether or not those
words are used in the manner and context we intended (did you
mean Mercury the planet, the car, the Roman god or the element?)
isn't part of the equation. We are left to sort through page after page
of links looking for something that might be relevant.
Part of this problem is self-inflicted. People just don't write good
queries. One third of the time, search engine users only specify a
single word as their query and on average use only two or three.10
This is what leads to so many irrelevant documents being returned.
We don't give enough context to our subject to eliminate documents
that are not of interest. If you query just on the term "Washington"
you will receive links to information on the state, the president, the
capital, a type of apple, a movie star, a university and so forth. In all,
Google returns 1,180,000,000 "hits." If you add the term "Denzel"
the number of links drops to 3,520,000, and we are reasonably
focused on the actor. If we add the phrase "Academy Award" we
finally get to 107,000 documents reasonably focused on the actor's
accolades. So the more specific and verbose we are with our queries
the more relevant the results.
But what happens if you use the Academy Award's common
nickname "Oscar" in your query? The number of hits jumps to
593,000. This is the risk of getting too specific with search terms. By
using the proper name of the award rather than its popular name, we
may have missed 486,000 potentially relevant documents. Guessing
the wrong search term can have a dramatic impact on what you do
and don't find.
Information scientists have long been aware that there are tradeoffs
between depth and coverage whenever a search is conducted. The
broader the search is, the more documents are retrieved,
including those that are not relevant to the actual information need.
Conversely, the deeper or narrower the search, the more likely
retrieved documents are to be relevant. The cost, of course, is that it
is also more likely that documents of interest will be missed in the
search. The difficulty arises from the fine balance of precision and
recall.
Precision is usually described as a ratio: the number of relevant
documents retrieved divided by the total number of documents
retrieved. In other words, what percentage of the total number of
documents retrieved are actually related to the topic being
investigated? For example, a Google search on the terms "precision"
and "recall" returns approximately 970,000 documents. The first few
documents in the list do indeed prove to be related to measures of
search performance. However, a few links into the list a news item
appears: "Vermont Precision Woodworks Announce Recall of
Cribs." From the search engine's perspective, this is a perfectly valid
document. It contains both of the search terms in its title. In fact,
one search term appears in the title of the website itself,
www.recall-warnings.com, thus causing it to receive a high relevancy
ranking. Out of 970,000 documents, it is safe to assume that many, if
not most, of the retrieved documents will have this level of relevancy to
our query. This indicates low precision, but high recall.
Recall is also a ratio and is defined as the number of relevant
documents retrieved divided by the total number of relevant documents
in the collection being searched. The example above probably has a high
recall due to the large number of documents returned.
$$\text{Precision} = \frac{\text{Relevant Documents Retrieved}}{\text{Total Documents Retrieved}}$$

$$\text{Recall} = \frac{\text{Relevant Documents Retrieved}}{\text{Total Relevant Documents in Collection}}$$
These two measures tend to be inversely related: as recall increases,
precision usually decreases. A balance must be found between the two,
retrieving enough documents to get an individual the information they
need without returning so many that wading through irrelevant
information becomes burdensome. This balance is the heart of
information retrieval, but it is difficult to measure precision and recall
precisely. This is because we rarely know what is contained in the
collection we are searching, in this case the Internet itself, and also
because the notion of relevance is very subjective. At best we can
estimate recall and precision based on feedback from users of the
search engine in question and make adjustments as appropriate.
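The two ratios defined above can be sketched in a few lines of Python. This is only an illustration under a strong assumption: that we already know which documents in the collection are relevant, which, as noted, is rarely true in practice. The function name `precision_recall` and the toy document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids of the documents the search engine returned
    relevant:  ids of the documents judged relevant (assumed known here)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection of ten documents; documents 1-4 are the relevant ones.
relevant_docs = {1, 2, 3, 4}
broad_query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}  # a broad search returns everything
narrow_query = {1, 2}                           # a narrow search returns two documents

broad = precision_recall(broad_query, relevant_docs)    # (0.4, 1.0)
narrow = precision_recall(narrow_query, relevant_docs)  # (1.0, 0.5)
```

The broad query finds every relevant document but buries them among irrelevant ones, while the narrow query returns only relevant documents but misses half of them, which is the tradeoff described above.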
Taking our Google search on "precision" and "recall" as a test case, it
may seem that the problem isn't so bad. After all, the first several
documents in the list were on the exact topic we were seeking: search
performance measures. We can just disregard the other 3.5 million
documents offered. We got what we needed from the top ten or
twenty.
This ability to rank pertinent documents near the top of a result set is
what has made Google the clear winner of the search engine wars.
Their PageRank algorithm is a key ingredient in the Google secret
sauce.11 Rather than just counting how many times a certain word
occurs in a document or where it occurs, Google also looks at who
links to that document. If a lot of pages reference a particular
website, chances are that it is a pretty important source of
information on the topic at hand. If the pages linking in are
themselves important, then that likelihood increases and the
document's relevancy rank improves accordingly. This variation on
"citation analysis," which is traditionally used to determine the
importance of scholarly publications, has radically changed Internet
search for the better. Google even offers a free tool that I can add to
my website to search my own content with just a few lines of code.
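The link-based idea described above can be sketched as a small iteration in which each page repeatedly passes a share of its score to the pages it links to. This is a simplified textbook-style illustration, not Google's production algorithm; the damping value of 0.85, the iteration count and the four page names are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking.

    links: dict mapping each page to the list of pages it links to.
    Every link target is assumed to also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site: three pages link to the handbook.
web = {
    "policy": ["handbook"],
    "news": ["handbook"],
    "blog": ["handbook", "news"],
    "handbook": ["policy"],
}
ranks = pagerank(web)
best = max(ranks, key=ranks.get)  # the page with the highest score
```

In this toy graph the handbook, which every other page links to, ends up with the highest score, and the policy page ranks well mainly because the important handbook links to it.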
So, problem solved? Not quite. There are several caveats to applying
a Google-like tool to your findability challenges. First, Google free
site search is really only searching a subset of the entire Google index,
that part representing just your website. As a result, only those Web
pages that are open and available to the public will be included in a
search. Anything on the Intranet is invisible to the Google spiders,
the programs that find and index Web pages and build up the search
index. Even those pages and documents that are open to the Internet
at large may be missed. Indexing programs only go so deep when
looking over a website. If your content is more than a link or two
away from the main page, it will probably be missed. Any new
content you add will likewise be invisible until the next time an
indexing spider happens by, a process completely outside of your
control. As Google explains:
There are a number of reasons a page might not appear
in the results of your Google free site search. It could
be that Google hasn't crawled that particular page yet.
Google refreshes its index frequently, but some pages
are inevitably missed. Or, the page may have
Javascript, frames, or store information in a database.
Pages like these are difficult or impossible for the
Google crawler to visit and index.12
Finally, Google's greatest strength, the PageRank algorithm, is also its
greatest weakness when applied to a single website. It is unlikely that
CNN.com or eBay will reference your org chart. In fact, very few
websites outside of your organization will link to your internal
documents. Yet the rankings applied to your documents are
determined in the context of rankings of the Internet as a whole. This
effectively renders the relevancy judgments made on your content
meaningless when the search is restricted to your own site.13
Aside from the arcane nature of indexing, the very act of searching
can be a struggle in most organizations. Documents and content are
spread out across multiple locations and repositories. Policies may be
on the Intranet, quarterly reports on the file system, resumes in a
departmental directory and price lists on the company homepage.
Finding information is no longer an exercise in finding a needle in a
haystack. First you must choose which haystacks to search, in what
order, and for how long. In most organizations, less than half of their
documents are centrally indexed.14 This means that it is impossible to
look for information in all potential locations with a single query or
even a single search tool. This dispersal of information across an
organization leads to another search challenge: choosing the correct
query terms.
<!-- SiteSearch Google -->
<FORM method=GET action="http://www.google.com/search">
<input type=hidden name=ie value=UTF-8>
<input type=hidden name=oe value=UTF-8>
<TABLE bgcolor="#FFFFFF"><tr><td>
<A HREF="http://www.google.com/">
<IMG SRC="http://www.google.com/logos/Logo_40wht.gif"
border="0" ALT="Google"></A>
</td><td>
<INPUT TYPE=text name=q size=31 maxlength=255 value="">
<INPUT type=submit name=btnG VALUE="Google Search">
<font size=-1>
<input type=hidden name=domains value="YOUR DOMAIN NAME"><br>
<input type=radio name=sitesearch value=""> WWW <input type=radio
name=sitesearch value="YOUR DOMAIN NAME" checked> YOUR DOMAIN NAME
<br></font></td></tr></TABLE>
</FORM>
<!-- SiteSearch Google -->
Figure 2. Just cut, paste and you've got search. Not quite.
Most search engines create their indexes by extracting terms from the
full text of documents. As a result, content creators and authors
become de facto indexers and catalogers. The words they choose in
authoring their documents become the search terms available to their
readers. This becomes a problem if they don't speak the same language.
This goes back to the Mercury (planet, car, god) and actor (Academy
Award or Oscar) problem.
Figure 3. The ideal relationship between author and searcher.
Unless there is a company standard for terminology, and these are
rare, each area of an enterprise is going to have its own language. A
customer in one area may be a client in another and a patron somewhere
else. This lack of consistency in search and indexing terms has
proven to be the single greatest challenge to the effectiveness of
search and findability in general.15
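The vocabulary mismatch described above can be illustrated with a toy full-text index. The sketch below is an assumption-laden example rather than how any particular search engine works: the three documents, the `build_index` and `search` helpers, and the synonym group are all hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word.strip('.,;:!?"()')].add(name)
    return index


def search(index, query, synonyms=None):
    """Return documents containing the query term or any listed synonym."""
    terms = {query.lower()}
    for group in synonyms or []:
        if query.lower() in group:
            terms |= group  # expand the query with the agreed equivalent terms
    results = set()
    for term in terms:
        results |= index.get(term, set())
    return results


# Three departments describe the same concept with three different words.
documents = {
    "sales-memo": "Each customer receives a quarterly statement.",
    "legal-brief": "The client must sign the agreement.",
    "library-faq": "A patron may borrow ten items.",
}
index = build_index(documents)

literal = search(index, "customer")
controlled = search(index, "customer", synonyms=[{"customer", "client", "patron"}])
```

A literal search on "customer" finds only the sales memo, because the authors of the other two documents chose different words. Once the three terms are declared equivalent, the same query reaches all three documents.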
Ultimately, any search consists of, at minimum, four hurdles that
must be cleared. First, the information seeker must be able to
articulate what they are looking for with the right syntax for the
specific search tool being used. Next, they must guess what words an
author may have used to express the concept of interest. Then, with
the query in mind, they must figure out the most likely place to
search. Finally, they must sift through the results of their search,
separating the potentially relevant from the clearly irrelevant and
hope what they end up with is complete, representing all that is
available. Really, it's a wonder that we ever find anything at all.
Teleporting and Orienteering
A keyword search is most often an attempt (usually several attempts,
actually) to go directly and instantaneously to the exact location of
desired information. If we search the Web on the terms "Aladdin
Theater Box-Office" we hope to land where we can purchase tickets
for concerts at this small venue in Portland, Oregon, without having
to sift through irrelevant information. The academic community has
labeled this sort of information seeking behavior teleporting.16
Teleporting is one strategy for finding information and can be
executed in various ways with a number of search tactics. In addition
to keyword search, an information seeker may attempt to teleport by
specifying a specific URL, opening a certain email, or typing in a
directory path to a particular document. Perfect teleporting (hitting
your target on the first attempt) is a rare accomplishment; so rare in
fact that a game, "Google Whacking," has sprung up around the
challenge.17 Yet despite the difficulty in finding just the right
information with search alone, most websites and information portals
seem designed to encourage the attempt, as evidenced by the
ubiquitous search box.
A more realistic scenario is to teleport into the general vicinity of the
information you are seeking, using search or some other tactic, and
then zero in on your target with a succession of small steps. To buy
our concert tickets for a show at the Aladdin, for example, we might
teleport by typing in the URL for the theater:
www.aladdin-theater.com. We know we are close, but still can't buy our
tickets, so we may follow the link to the "Upcoming Shows" page. Here
we find the performer we are looking for listed with a link to "show
details" so we click through to that page. Finally we see a banner for
"Local Ticket Outlet Information," which leads us to a link for the
"Aladdin Theater Online Ticketing Page" where we can order our tickets.
This strategy of locating information by continually narrowing our
search through incremental steps has been dubbed orienteering
(though most people simply call it browsing) and has proven to be
the preferred approach to finding information. Studies conducted at
the MIT Artificial Intelligence lab have found that information
seekers use keyword search less than forty percent of the time.
Surprisingly, this holds true even when searchers know exactly what
they are looking for and even where to find it (see table 1).18
              Specific      General       Specific
              Information   Information   Document   Total
Orienteering  47            19            41         120
Teleporting   34            23            17          80
Total         81            42            58         200
Table 1. Information need by search strategy (19 unknowns removed).
There are circumstances where keyword search yields nominally
better results than navigation. In one study, information seekers were
more successful at locating information on a well indexed medical
information site by using search rather than browsing. Interestingly,
those most successful at finding what they were looking for were
those individuals who turned to search only after browsing failed (see
figure 4). Even when individuals abandon browsing on a given
information hunt and succeed with search, they invariably return to
orienteering on their next task.19
M.I.T. researchers have found several reasons why people prefer to
zero in on information rather than attempting to pounce on it in a
single great leap. First, it can be difficult to clearly articulate exactly
what it is you are seeking. This is the case even when trying to
retrieve familiar information and documents. Think of the last time
you were asked for directions to a familiar destination. Even though
you may be able to drive there without thinking, you may have a hard
time giving step by step instructions on how to get to that same
location. Browsing reduces the cognitive demand on information
seekers by allowing them to follow familiar paths to the general area
of the information they are seeking, quickly and easily reducing the
size of the area they must explore. This also allows searchers to draw
on a broad range of "meta-information" about the target of their
search.
Figure 4. Information seeking behaviors. Two panels, "Search versus
browse success rates" (browse, search, search after browse failure)
and "First choice of strategy" (browse, search), each with
"Information Seeking Strategy" on the horizontal axis.
For example, say you need to locate a company memo that was
circulated six months ago and has since disappeared into the bowels
of the company Intranet. Even though you have no idea where to
find the memo itself, you recall seeing it referred to in an email from
a colleague. You may not know exactly where to find that email
either, but you likely will recall who it was from and roughly when
you received it, along with some idea of the subject line and general
content. This will allow you to find the email that will in turn point
you toward the actual target of your search: the company memo.
Even though you can't teleport even into the general vicinity of the
memo, you can start from a known frame of reference (the email)
and follow clues along the way until you arrive at your goal.
The small steps of orienteering and the clues found along the way
also provide information seekers with a strong sense of location
throughout their search. The importance of the "you are here" factor
should not be underestimated. When users feel in control, believe
they are heading in the right direction and are able to backtrack if
they take a wrong turn, they are less likely to abandon a search
prematurely. When people drop into the middle of an information
space as a result of a keyword search, they have no context and little
indication of how to proceed. This sense of disorientation can cause
both knowledge workers and potential customers to leave a website
as quickly as they arrive.
By contrast, navigating through an information space allows the user
to become acclimated to the environment at their own pace, much
like easing into a hot bath rather than plunging into scalding water.
This process of guided exploration also has the dual benefit of
building context for interpreting the target information once it is
found and allowing for serendipitous discoveries along the way. Most
importantly, information seekers are more likely to continue their
search if they are confident that they are on the right path and that
their efforts will pay off.
[Figure 5. Breadcrumb trails are often used to give users a sense of control over their exploration of a new information space. Example from the Nielsen Norman Group site: NN/g Home > Services > Training > Intranet usability.]
It's interesting to note that the word browse derives from an
antiquated French term brost meaning "young shoot" and referring to
the way that animals feed on the young shoots of trees and shrubs.
As animals seek nourishment, they must balance the nutrition to
be gained against the energy expended obtaining it. This behavior is
fundamentally the same for information seekers. Visitors to an
information space, whether it be a website, Intranet, database, file
system or what have you, are continually balancing cost and benefit:
"Will the information I find here be worth the time and effort it is
costing me to track it down?" As they browse a website, they will be
repeatedly assessing the likelihood of finding what they need in the
current environment and determining when it's time to move on to
more promising pastures.
This metaphor has become the basis of information foraging
theory, a model of information-seeking behavior developed by Peter
Pirolli and Stuart Card of the Xerox Palo Alto Research Center.20
According to this model, we search for information across the
Internet using essentially the same strategies hunter-gatherers use to
search for food across the savannah. The nature of the prey may be
new, but the fundamental approach hasn't changed for millennia.
Both animals and humans attempt to maximize their "benefit per unit
cost." When the benefit, in terms of likelihood of finding the
necessary food or information with an acceptable investment of time
and energy, falls below a certain threshold, the current website or
watering hole will be labeled sterile and the forager moves on to a
more fertile patch. Steps can be taken to reduce the likelihood of
users leaving our information patches prematurely. One of the most
effective strategies is to increase the strength of the information scent
present in our systems.
The notion of information scent is central to information foraging.
The basic idea is that just like a game animal, information leaves
behind spoor that can be detected and tracked.
Associated concepts "rub off" on one another, leaving
detectable traces, just as a watering hole frequented
by woolly mammoths will smell of woolly mammoths. A
hunter-gatherer seeking mammoths is likely to be
drawn to the watering hole, if only to look for spoor.
Information foragers do the same. Imagine you're
looking for texts about foraging theory. If [a search]
throws up a box containing the keyword "hunter-
gatherer", you're likely to select that box. It just smells
right.21
Consider our ticket purchasing example. When we first arrive at the
theater's homepage, we see labels such as "Artist of the Month" and
"Show Listings," which may even include the concert we are seeking.
Even though we don't see that we can purchase tickets here, the page
smells like concert tickets so we continue our search by clicking on
"Upcoming Shows." Here the scent gets stronger when we find the
right show along with a link to "Show Details," which finally gets us
to "Buy Tickets Online." Throughout the process of browsing, the
scent of concert tickets is strong and gets stronger the closer we get
to our goal. This continual positive feedback can keep information
seekers happy with the current information patch and prevent them
from jumping to a competitor or colleague to meet their needs.
Strong information scent can be a double-edged sword if mishandled.
The most common pitfall occurs when a strong scent points toward
what should be the right answer but isn't. Jakob Nielsen
demonstrated this phenomenon in a study of a health information
website for teens.22
Users were asked to find out how much they could
weigh without being considered overweight. Most users quickly
gravitated toward an area of the site labeled "Food & Fitness." This
clear, concise label had strong information scent for the question at
hand. Featured prominently within that area of the site was a lengthy
article entitled "What's the right weight for my height?" that was also
ranked highly by a search on the term "weight."
This would seem to be a bull's-eye except for the fact that the article
does not contain the answer to the question. Because the information
scent leading to this article was so strong, users were convinced they
were looking in the right place. When the information wasn't there,
they naturally concluded that because it wasn't where it should be, it
must not exist anywhere on the site and abandoned their search. This
is an unfortunate result since the answer was in fact available on the
site. It was buried in an article titled "Body Mass Index (BMI)." The
information scent of this title for answering the target question is
almost non-existent. First, the title is a bit academic and maybe even
intimidating for the website's teenage audience. Worse, the title gives
no indication of the article's content, which includes a straightforward
calculation of optimal weight using height, weight and age. In a
nutshell, the container of the information was mislabeled.
[Figure 6. A website with good information scent: a concert listing for Pink Martini at the Oregon Zoo.]
The problem of bad labels strikes at the heart of findability. If
information seekers cannot recognize the content they are searching
for even when they find it, it may as well not exist. Even when an
information producer gives careful consideration to labeling and
categorization, the result may have no meaning to information
consumers. A physician, wanting to be precise, may label a document
on treating a particular respiratory condition with the terms
laryngotracheobronchitis, inspiratory stridor and dexamethasone.
While this may be perfectly appropriate for other doctors, it is of
little use to a mother searching the Web for information on how to
alleviate the wheezing cough of her daughter with croup.
Most information systems today are organized much like libraries
before Melvil Dewey created his decimal system for classification.
Patrons were left to wander stacks of untitled or oddly titled books
piled on shelves according to some idiosyncratic organizational
scheme comprehended only by an arcane priesthood of local
librarians.
Overcoming this barrier to discovery is the role of controlled
vocabularies and taxonomies. By developing a structured collection
of terms and guidelines around how they are to be applied,
information can be managed in a manner that facilitates its discovery,
interpretation and use to the greatest extent possible.
Beyond just finding information, the hierarchical nature of a
taxonomy can help educate an information seeker by guiding them
through a subject. The mother searching for information about her
daughter's illness will not only discover that dexamethasone is a
steroidal treatment for the condition, but that humidified air may also
alleviate her discomfort. Continuing through the structure she will
discover additional treatments and potential complications. Finally,
she will learn that the proper name for "croup" is in fact
laryngotracheobronchitis, giving her a new term to search on and
expanding the potential information sources available to her.
The parent/child relationships inherent in the tree structure of a
taxonomy are powerful tools in guiding a seeker through what may
be an unfamiliar subject. Because a taxonomy explicitly shows how terms and
concepts are related, a searcher will discover associations that they
didn't know existed. Most importantly, they can define and refine
their information need as they explore rather than having to precisely
articulate it up front when they may not know exactly what it is they
are seeking.
Organizing information according to a well defined structure, such as
a taxonomy, also provides stability to an information environment.
Information changes continually. Delphi Group has estimated that at
least ten percent of enterprise information changes monthly in an
average organization.23 Without some means of governance, relevant
information becomes a moving target. Today a search on
"taxonomies" may yield 1,900,000 matches. Tomorrow or next week
that same query could return 1,985,000 hits with completely different
rankings. That article I found last week that was so useful but that I
didn't bookmark could now be anywhere.
A taxonomy can act as a dynamic bookmark. As new documents and
information become available, they can be classified, labeled and
published in accordance with the taxonomy without changing its
structure. When a knowledge worker needs to return to an area of
interest, he will still find it where he left it. The only difference will be
that there is now more information available there. In addition, the
new information will be in context with relationships and potential
avenues of exploration clearly visible.
Managing terms and keywords can also enhance search by bridging
the vocabulary gap between information producer and consumer. A
search engine integrated with a taxonomy would know that a search
on croup should also look for laryngotracheobronchitis and that in certain
contexts "Oscar" is another way of saying "Academy Award." It can
also compensate for common spelling errors and variants (e.g., theatre
or theater) and synonyms (fall or plunge or spill or tumble). These
expansions may seem trivial, but they can dramatically improve the
effectiveness and efficiency of search.
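The kind of expansion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SYNONYM_RINGS` table, the function names and the simple substring matching are all hypothetical, and a production search engine would draw its equivalent terms from a managed taxonomy rather than a hard-coded list.

```python
# Hypothetical synonym rings; a real system would load these from a taxonomy.
SYNONYM_RINGS = [
    {"croup", "laryngotracheobronchitis"},
    {"oscar", "academy award"},
    {"theatre", "theater"},
    {"fall", "plunge", "spill", "tumble"},
]


def expand_query(term):
    """Return the term plus every equivalent term from its synonym ring(s)."""
    key = term.strip().lower()
    expanded = {key}
    for ring in SYNONYM_RINGS:
        if key in ring:
            expanded |= ring
    return sorted(expanded)


def search(documents, term):
    """Return titles of documents whose text contains any expanded term.

    Matching is naive substring matching, which is enough to show the idea
    but would over-match in practice (for example "fall" inside "waterfall").
    """
    terms = expand_query(term)
    return [
        title
        for title, text in documents.items()
        if any(t in text.lower() for t in terms)
    ]
```

With this sketch, a search on "croup" also retrieves a document that mentions only laryngotracheobronchitis, which is the vocabulary gap the taxonomy is meant to bridge.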
Controlled vocabularies, like taxonomy and its relatives, are not silver
bullets and will not magically cure all information management
problems, but they are a critical component of findability. If properly
constructed, applied and maintained, a taxonomy can radically
increase the value of information by making it more available,
understandable and actionable. The remainder of this book will
demonstrate how this can be achieved. Before we can delve into the
mysteries and wonders of taxonomies, however, we must take a brief
detour into the world of metadata.
2
Metadata
If we fail to anticipate the unforeseen or expect the
unexpected in a universe of infinite possibilities, we may
find ourselves at the mercy of anyone or anything that
cannot be programmed, categorized or easily
referenced.
Fox Mulder, "The X-Files"
Art collecting is a tricky business. The value of a painting, sculpture
or even a rare book can vary wildly depending on the circumstances
of a purchase. Two similar works by Monet may go on the auction
block together; one sells for thousands, the other for millions. The
only substantive difference between the two is the existence of
provenance information. A clear record of a painting's history, who
has owned it, when and where it has previously sold and for how
much is essential to determining whether or not it is a wise
investment. Without such information we have no context for our
decision. Is it overpriced or undervalued? Is it stolen? Is it a verified
Monet or just a suspected Monet? Even though it is the painting
itself that holds our interest, we need information about the painting
to qualify our interest. This same principle applies to less tangible
assets, namely information.
When we first locate new information we tend to be suspicious. Can
I trust these numbers? Is this the current version of the document? Is
this image copyright cleared? This is especially true if the source of
that information is not familiar to us. Before we trust a document or
a Web page, we need to know a little more about it. Some of these
questions may be answered by the search itself. When we look for
information, we usually try to specify parameters to limit the scope of
the search. Specifying the author of a document, the date of its
publication, whether it is a report, invoice, form or memo will not
only enhance our chances of locating what we are looking for but can
pre-qualify the content as it is found. This kind of reference
information is generally not indicated explicitly in the content itself,
but rather is supplementary to it. It is metadata.
The standard definition of metadata is usually given as "data about
data." This gets at the general idea, but is not quite adequate. The
term "meta" comes from the Greek root meaning something that follows
another and takes it into account. Thus, metadata is generally developed
from associated source data and as a function of the information it
describes. The Greek term also means among, alongside, or with, so it
follows that metadata can take several complementary forms in
relationship to its parent information. Finally, if the Latin derivation
is taken into account, meta can mean transcendent, so metadata should
be expected to add value above and beyond the content it describes.
To complicate matters, the distinction between data and metadata
can be fluid. What is metadata in one context may be pure data in
another. For example, if you are looking for an article on a certain
topic by a certain author, then the writer's name and the subject
keywords are metadata and the content of the article is data. By
contrast, say you are trying to remember the name of the author who
wrote a particular article in the 1940s and can't remember the title.
You do remember that it contained the phrase: "Man cannot hope
fully to duplicate this mental process artificially, but he certainly
ought to be able to learn from it." In this case the publication date
range, 1940-1949, and the content of the article itself are the
metadata and the author's name is the data.1
The Value of Metadata
In late 1988, a nondescript van pulled up in front of Christie's
East, the purchasing office of the renowned auction house in
New York City. Tied to its top with several lengths of rope was a
six by seven foot canvas. The driver had found it at a warehouse
sale of unclaimed property and purchased it on a whim for
$1,000. The painting was in bad shape and nothing was known
about it, but it was large and old and ought to be worth
something. He offered it to Christie's for $1,500. Ian Kennedy, a
resident expert on Old Masters for Christie's, examined the
painting and instantly recognized it as a work of the Italian master
Dosso Dossi. With this new bit of information, the asking price
rose from $1,500 to $800,000. It was purchased by the London
art dealers Hazlitt, Gooden & Fox for $4 million, dirt, rips and
all. Two months later it was sold to the Getty Museum for an
even higher price.
"Allegory of Fortune," Dosso Dossi
The defining characteristic of metadata is that whatever form it takes,
it facilitates the identification and discovery of a discrete package of
information. The classic example of this is the library catalog card.
Independent of any actual content from the item being described, a
simple 3" x 5" card can provide a wealth of information that is useful
in locating and managing an information resource, in this case a
book. At a glance, we can determine the title, author, publisher,
length, topic and even location of the book. This quick access is by
design.
973.4
B21     McCullough, David C.
           John Adams / [by] David McCullough
           New York : Simon & Schuster, c2001
           751 p., (40) p. of plates : ill. (some col.),
           maps ; 25 cm.
           Includes bibliographical references (p. 703-726)
           and index.
           ISBN 0-7432-2313-6
           1. Adams, John, 1735-1826. 2. Presidents - United
           States - Biography. 3. United States - Politics
           and government - 1783-1809. I. Title.
E322.M38 2001
973.4'4'092 [B]          2001027010
Figure 1. Metadata in a traditional card catalog.
An often overlooked feature of the humble card catalog is that the
cards are organized to facilitate this at-a-glance utility. Each card has
a consistent location and format for each piece of information it
contains. When looking at an author card, we know the first line
indicates the author of the work and the second line is the book's
title. The structure of the card tells us that a book is a biography of
John Adams written by David McCullough rather than the other way
around. The same principle applies to electronic resources. To be
useful, metadata must be structured to facilitate both discovery and
interpretation.
Most major newspapers now provide online editions with searchable
full-text archives. If we type in a few well chosen keywords, we have
a chance of finding something of interest. The newspaper's search
engine will match our query terms against every word of every article
of every edition contained in the archive. This is searching the data,
the actual content of the newspapers. This type of search is subject to
all of the pitfalls of unconstrained search as discussed in the prior
chapter. If we instead search the metadata, we can dramatically
improve the effectiveness of our search.
[Figure 2. The advanced search page of the LA Times, with fields including search terms, content options, sort order, date options, author, headline, article type and section.]
If we would like to research the position of former president Jimmy
Carter on U.S. trade with China, a reasonable place to start is the
archives of the Los Angeles Times (www.latimes.com). As we would
expect, searching on the keywords "Carter," "Policy," and "China"
returns an assortment of documents ranging from an analysis of the
conflict between China and Taiwan to an obituary of Stanford
University professor Michael Oksenberg. Fortunately, the Times
archive provides an advanced search mechanism utilizing extensive
metadata. Rather than a blind search where all words are treated
equally, the Times enables users to restrict certain terms to certain
areas. We can specify that "Jimmy Carter" only be matched against
authors and that only articles of the type "opinion piece" with the
word "China" in the headline be retrieved. Even though we are no
longer looking at any of the archive's actual data or article text and
are instead searching only metadata, we receive a precise set of
documents with a strong likelihood of being relevant to our interest.
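The difference between a blind keyword search and a search restricted to metadata fields can be sketched as follows. This is a minimal illustration, not a description of how the Times archive is implemented: the `Article` record and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """A hypothetical archive record: three metadata fields plus the body text."""
    author: str
    headline: str
    article_type: str
    body: str


def full_text_search(articles, keywords):
    """Blind search: every keyword may match anywhere in the record."""
    hits = []
    for article in articles:
        text = " ".join([article.author, article.headline, article.body]).lower()
        if all(keyword.lower() in text for keyword in keywords):
            hits.append(article)
    return hits


def metadata_search(articles, author=None, headline_word=None, article_type=None):
    """Fielded search: each criterion is matched only against its own field."""
    hits = []
    for article in articles:
        if author and author.lower() != article.author.lower():
            continue
        if headline_word and headline_word.lower() not in article.headline.lower():
            continue
        if article_type and article_type.lower() != article.article_type.lower():
            continue
        hits.append(article)
    return hits
```

In the fielded version the body text is never consulted, so an obituary that merely mentions Carter and China cannot match a query for opinion pieces written by Jimmy Carter.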
Types of Metadata
The advantages metadata affords to searching electronic versions of
traditional textual resources are straightforward. However, the digital
world isn't as simple a place as it once was, and newspapers,
magazine articles and the like are rapidly becoming a minority among
the milieu of online information. New types of information objects
and artifacts seem to emerge daily. In order to manage this deluge of
new forms of information, we must be able to describe them in ways
that are specific to each unique type and the tasks utilizing them. To
this end, several different forms of metadata (descriptive, technical,
and administrative) may be developed for any given information
object.
Descriptive Metadata
Descriptive metadata is by far the most common form of metadata
in use today and is usually what you will encounter as an information
seeker. This type of metadata comprises what is explicitly added to
content to make it easier to find. In a nutshell, descriptive metadata is
the who, what, when, and where of an information resource. While it
found its first broad application with textual resources such as the
LA Times archives, it is rapidly coming to permeate every aspect of
the online world.
Take, for example, Apple Computer's popular iTunes online music
service. Since the content offered by iTunes is non-textual (e.g., the
strains of a Bach concerto or a John Coltrane solo), full-text search of
the content itself is ill-suited to retrieval. Rather, you search the
textual information associated with the audio or video file you are
trying to find. Most files have been extensively tagged with
descriptive metadata. This includes the basics, such as artist, album,
and song title as well as more advanced categories such as genre,
sub-genre, release date and publisher. Each piece of metadata associated
with a particular song increases the probability that it will be found,
either by searching or browsing, and subsequently sold.
[Figure 3. Metadata in iTunes: the album page for David Bowie's "The Rise and Fall of Ziggy Stardust and the Spiders from Mars."]
The value of descriptive metadata doesn't rest solely in discovery and
retrieval. It also facilitates the second part of the e-commerce
equation: making the sale. Once a user browses through genres,
sub-genres, and artists to a particular album of interest they can read
reviews and check ratings, song length, and even beats per minute. All of this is
descriptive metadata that will help the information seeker make a
value judgment of the content they are considering. The principle is
as valid for corporate earnings reports as it is for Mariah Carey
videos.
Administrative Metadata
If descriptive metadata is intended primarily for the information
seeker, administrative metadata is mainly for the benefit of the
information owner or steward. Metadata elements specifying from
where a file or document came, where it is to be hosted, who is
authorized to modify it, when it is to be archived, in what form and
for how long are all administrative metadata. It is created for the
purposes of management, decision making and record keeping.3
Administrative metadata is the lifeblood of modern content,
document and records management systems. It allows content to
move through its lifecycle in a largely automated fashion. For
example, companies try to keep their websites interesting by
continually changing their content. New stories are posted to the
homepage and older content is moved to less prominent locations. A
few well chosen pieces of metadata, such as publish date, run length,
and archive page ID, can combine with business rules in a content
management system to automate most of the process
of updating a website. This frees the Web team to focus on creating
compelling content rather than shuffling files around the server. It
also allows the website to be updated in the middle of the night
without disturbing the webmaster's sleep.
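A business rule of this kind might look something like the following sketch. The `Story` record, its field names and the `placement` function are assumptions made for illustration; they do not correspond to any particular content management system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Story:
    """A hypothetical story carrying three pieces of administrative metadata."""
    title: str
    publish_date: date       # first day the story appears on the homepage
    run_length_days: int     # number of days it stays on the homepage
    archive_page_id: str     # page it moves to once its run is over


def placement(story, today):
    """Decide where a story belongs today using only its administrative metadata."""
    if today < story.publish_date:
        return "unpublished"
    run_ends = story.publish_date + timedelta(days=story.run_length_days)
    if today < run_ends:
        return "homepage"
    return story.archive_page_id
```

A scheduled job could call `placement` for every story each night and move content accordingly, which is how a handful of metadata fields can stand in for manual file shuffling.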
Recently, administrative metadata has found a new niche in the form
of Digital Rights Management (DRM). Once the province of
military intelligence and industrial secrets, DRM has moved
into the mainstream. As distribution of intellectual property across
the Internet and corporate Intranets has become the norm, having a
reliable means to track that content and control who can access it has
become essential. DRM secures digital materials and limits access to
only those with the proper authorization. In addition, a complete
DRM solution facilitates and tracks any transactions involving the
content you wish to protect. For example, allowing copying or
limiting the period of access or the number of times content may be
viewed must all be supported.4 DRM technologies and techniques are
driven by administrative metadata.
Structural Metadata
As we have noted, information comes in many forms and from many
sources, usually bundled into packages that are largely black boxes to
us. How are we, or more importantly the tools we use, to know how
the information is to be read, manipulated and displayed? How does
an application know the technical requirements for integrating the
contents of some strange new file into its world so that we may have
access to its contents? This is the role of structural metadata.
Structural metadata, sometimes referred to as technical metadata,
display metadata or use metadata, describes how an information
object, usually a file or set of related files, is put together. This can
range from technical details such as file size, compression scheme,
and scanning resolution to display and navigation information such
as presentation order, typographic instructions, and search
mechanisms.
The most common application of structural metadata is defining how
information is to be organized in databases and data warehouses.
Every piece of information housed in a database must be grouped
into records and described in terms of type, size, and relationships.
The structural metadata governing this organization is in fact what
makes up a database and turns unorganized data into a usable
collection of structured information.
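As a small illustration of structural metadata in a database, the sketch below defines two tables in an in-memory SQLite database and then reads the structure back from the database itself. The table and column names are hypothetical, and SQLite is used only because it ships with Python; note that SQLite records a declared size such as VARCHAR(200) but does not enforce it.

```python
import sqlite3

# An in-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     VARCHAR(200) NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
    """
)


def describe(table):
    """Return (column name, declared type) pairs as recorded by the database."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(row[1], row[2]) for row in rows]


def relationships(table):
    """Return (local column, referenced table, referenced column) per foreign key."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(row[3], row[2], row[4]) for row in rows]
```

Neither function touches a single row of content. Everything they report, from types and sizes to the link between books and authors, is structural metadata.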
Another way of looking at structural metadata is the page-turner
model. In this model, structural metadata specifies how individual
information objects are bound together to make up a single
information package that is presented in a specific order, like the
pages and chapters of a book. This allows text, images, and other
content to be presented in sequence, but enables the user to navigate
it at will, jumping from section to section, while preserving the
organization and structure originally intended by the creator.
Metadata Schemas
Regardless of its type (descriptive, administrative or structural) and
the purpose to which it is applied, all metadata share certain
characteristics. At a minimum, metadata must possess semantics,
syntax, and structure.5
Semantics refers to the meaning of metadata within a particular
community or domain. It is important to note that any given metadata
field can have different interpretations depending on the context in
which it is being used. For example, the administrative field "sample
source" could refer to a medical procedure or even a particular patient
in a medical context, or it could refer to a certain musical instrument
or recording in the context of audio production. It could just as easily
be a technical field referencing a particular device or encoding
scheme. The point is that without clearly defined semantics, it is
nearly impossible to accurately interpret metadata.
Just as people cannot interpret metadata without an understanding of
its semantics, computers can't make sense of it without syntax and
structure. Syntax is the systematic arrangement of metadata elements
and their values according to well defined rules. The most common
form of syntax currently is the name-value pair in which the name of
the metadata element is simply matched with its value, such as:
<author = Arturo Perez-Reverte>
<title = The Club Dumas>
<genre = Fiction>
Structure defines how metadata is to be organized to ensure
consistent representation and interpretation in line with its syntax and
semantics. The structure specifies which metadata elements are
allowed where, in what order and how often. A record describing a
"book" must start with one or more authors, followed by a single
title, a single genre, an optional sub-genre, a single publisher and so
forth.
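Syntax and structure can be sketched together in Python: one function reads the name-value syntax shown above, and another checks the resulting record against the ordering and repetition rules just described for a "book." Both function names and the `BOOK_STRUCTURE` table are assumptions made for this example, and the table stops at publisher because the text does not list the remaining elements.

```python
import re

# One <name = value> pair per line, following the syntax shown above.
PAIR = re.compile(r"^<\s*([^=<>]+?)\s*=\s*([^<>]*?)\s*>$")

# Hypothetical structure for a "book" record, in required order:
# (element name, minimum occurrences, maximum occurrences or None for no limit).
BOOK_STRUCTURE = [
    ("author", 1, None),
    ("title", 1, 1),
    ("genre", 1, 1),
    ("sub-genre", 0, 1),
    ("publisher", 1, 1),
]


def parse_pairs(text):
    """Syntax: turn lines of <name = value> into ordered (name, value) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PAIR.match(line)
        if match is None:
            raise ValueError(f"not a name-value pair: {line!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def validate(pairs, structure=BOOK_STRUCTURE):
    """Structure: return a list of problems; an empty list means the record conforms."""
    problems = []
    position = 0
    for name, minimum, maximum in structure:
        count = 0
        while position < len(pairs) and pairs[position][0] == name:
            count += 1
            position += 1
        if count < minimum:
            problems.append(f"expected at least {minimum} {name!r}, found {count}")
        if maximum is not None and count > maximum:
            problems.append(f"expected at most {maximum} {name!r}, found {count}")
    if position < len(pairs):
        problems.append(f"unexpected or out-of-order element {pairs[position][0]!r}")
    return problems
```

Under these rules the three example pairs are syntactically valid but structurally incomplete, since the record has no publisher; `validate` reports that as a problem rather than guessing a value.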
Taken together, semantics, syntax, and structure form a type of
grammar, called a schema, that specifies the rules governing the
metadata of any given domain or application. At the most basic level,
a schema specifies a list of attributes that are valid for describing an
information package. A more sophisticated schema will often detail
every aspect of how metadata is to be encoded and represented.
In all cases the overarching goal of defining a rich schema is to make
metadata as useful as possible in terms of interoperability,
extensibility and flexibility.
Interoperability is the ability of information systems to exchange
metadata and interact in a useful way over communication networks
such as the Internet.6 This is what allows the computers at
Amazon.com to talk to your bank or credit card company and receive
payment for the book you ordered. Extensibility means that the
original definition of the schema isn't the final word. It should always
be possible to add metadata elements (albeit in an
organized and controlled manner) to any schema in order to
accommodate specific and often unforeseen user needs.
Above all, metadata users demand flexibility from their metadata
schemes and systems. They do not want to be compelled to add
information that they deem irrelevant or too cumbersome. As a
result, most metadata schemas allow authors to include as much or as
little detail as they desire in a metadata record. This makes authors
happy, but tends to make life difficult for information and metadata
administrators, since the more flexible metadata is, the less
interoperable it becomes. Two information systems may depend on a
particular metadata element in order to communicate, and if an
author fails to provide it, interaction between the two systems
becomes impossible. Imagine if Amazon.com neglected to include
the price of a book when it tried to charge your credit card. Schemas
serve to mitigate these problems while preserving as much flexibility
as possible.
The number of publicly available schemas has exploded in recent
years, and there now seem to be metadata standards (official, de
facto, and even competing) for nearly every domain imaginable. One
of the earliest and most broadly applied is the Dublin Core (DC).
Named after the Ohio city in which it was first drafted, the Dublin
Core was originally developed with an eye to describing
document-like objects. More recently, DC metadata is beginning to be
applied to a broad range of other types of resources as well.
One of the strengths of DC and a prime reason for its popularity is
its simplicity. The DC schema captures the fundamental characteristics
of an information resource in a manner that is easy to create and
comprehend. Thomas Baker of the German National Research
Center for Information Technology has referred to it as "metadata
pidgin for digital tourists."7
In its current form, DC consists of fifteen elements covering the
basic descriptive, administrative and structural needs of an
information object. For each element the schema supplies both an
official label and a concise definition. For example, creator is defined
as: "an entity primarily responsible for making the content of the
resource." Just as with a well-defined structure, clear definitions of
labels and terms are essential to ensuring the appropriate
interpretation and application of metadata.
The Dublin Core is an example of a simple schema that can mediate
between the extremes of full indexing of raw text and highly
structured content. It provides a mechanism for capturing the
fundamental information necessary to describe an information
resource without the burden of elements that may be irrelevant to a
particular community or application.
Some have perceived the spare nature of the DC schema as a weakness.
While its basic nature allows it to describe many different types of
resources, it limits the detail you can capture about a resource. For
example, the creator element, described above, makes no distinction
between a person, an organization, or a service. This could be
essential information to a particular application. Perhaps even more
troublesome is the fact that there are no constraints placed on the
values a given element may take. For example, the subject element
can be filled with a keyword, a Library of Congress Subject Heading
or a free text description. This lack of standard terms and values is
critical, as we shall see shortly.
Descriptive      Administrative   Structural
Title            Creator          Date
Subject          Publisher        Type
Description      Contributor      Format
Source           Rights           Identifier
Language
Relation
Coverage
Figure 4. The current Dublin Core element set.
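The fifteen elements and their three groupings are small enough to hold in a simple lookup structure. The sketch below is illustrative only: the lowercase spellings, the DUBLIN_CORE dictionary and the unknown_elements helper are assumptions of this example rather than an official encoding.

```python
# The fifteen elements, grouped as in Figure 4. Lowercase keys are an
# assumption of this sketch, not an official encoding.
DUBLIN_CORE = {
    "descriptive": ["title", "subject", "description", "source",
                    "language", "relation", "coverage"],
    "administrative": ["creator", "publisher", "contributor", "rights"],
    "structural": ["date", "type", "format", "identifier"],
}
DC_ELEMENTS = {name for group in DUBLIN_CORE.values() for name in group}

def unknown_elements(record):
    """Return the keys of `record` that are not among the fifteen elements."""
    return sorted(set(record) - DC_ELEMENTS)

# A sparse record: authors may supply as much or as little as they wish.
record = {"title": "The Club Dumas",
          "creator": "Arturo Perez-Reverte",
          "language": "English"}

print(len(DC_ELEMENTS))
print(unknown_elements(record))
print(unknown_elements({"title": "x", "price": "9.99"}))
```

A check like this catches misspelled or invented element names, but it says nothing about whether the values themselves are consistent.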
These shortcomings are common to most metadata schemas. The
Dublin Core is a good example of how limitations can be overcome
through extensibility. The DC supports two types of qualifiers,
schemes and types, which refine the base schema.
Schemes allow you to specify the standard syntax or vocabulary that
is allowable for element values. The DC element subject may be
qualified with MESH to indicate that all values must be drawn from the
Medical Subject Headings vocabulary or LCSH to require Library of
Congress terms. Likewise the language element may be qualified with
ISO 639-2 or RFC 3066 to ensure that any value applied to that field
conforms to the corresponding standard.
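One way to picture a scheme qualifier is as the name of a vocabulary against which a value is checked. In the sketch below the vocabularies are tiny stand-ins with placeholder terms, and the SCHEMES table and check_qualified_value function are hypothetical names chosen for this example.

```python
# Tiny stand-in vocabularies; the real MeSH and LCSH lists are far larger.
# The sample terms are illustrative placeholders only.
SCHEMES = {
    "MESH": {"Neoplasms", "Hypertension"},
    "LCSH": {"Semantic Web", "Metadata"},
}

def check_qualified_value(scheme, value):
    """Return True if `value` belongs to the vocabulary named by `scheme`."""
    if scheme not in SCHEMES:
        raise KeyError(f"unknown scheme: {scheme}")
    return value in SCHEMES[scheme]

print(check_qualified_value("MESH", "Neoplasms"))  # term is in the list
print(check_qualified_value("LCSH", "cancer"))     # free-text term is not
```

The element itself is unchanged; the qualifier only narrows which values are acceptable for it.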
DC types refine the definition of the core element itself. The basic
DC element date, defined as "a date associated with an event in the
life cycle of the resource" is too generic to be useful. By applying a
type, the basic date element can be transformed into date created,
issued, accepted, available, or acquired, among other possibilities.
This ability to refine and enhance the schema without corrupting its
fundamental nature and structure is the key to metadata extensibility.
Without it, any metadata system will quickly become obsolete
regardless of how well conceived and executed initially.
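A type qualifier can be thought of as a suffix that narrows an element while leaving the base element recoverable. The following sketch assumes a dotted spelling such as date.created, which is a convention chosen for this example; the sample dates are invented, and the function names are hypothetical.

```python
# Refinements of the base `date` element named in the text. The dotted
# "date.created" spelling is an assumption made for this sketch.
DATE_TYPES = {"created", "issued", "accepted", "available", "acquired"}

def base_element(qualified_name):
    """Strip a type qualifier, returning the unrefined element name."""
    element, _, refinement = qualified_name.partition(".")
    if refinement and element == "date" and refinement not in DATE_TYPES:
        raise ValueError(f"unknown date type: {refinement}")
    return element

def simplify(record):
    """Collapse refined elements back to base elements, keeping all values."""
    simple = {}
    for name, value in record.items():
        simple.setdefault(base_element(name), []).append(value)
    return simple

record = {"title": "Draft of Lecture 1",
          "date.created": "2009-10-02",   # invented sample value
          "date.issued": "2009-11-15"}    # invented sample value

print(simplify(record))
```

A system that understands only the base schema can still read the simplified record, which is one way to see how refinement leaves the fundamental structure intact.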
Where Do I Put It?
Metadata can live in several different places. Traditionally, as with the
card catalog, it has been recorded and stored separately from the
object it describes with a pointer of some sort to the location of the
information resource itself. This is often the case in content
management and data warehouse systems. Information resources will
be given a unique identifier and stored in whatever form and on
whatever system is most appropriate. The metadata describing that
resource may be hosted in a separate database dedicated to that
purpose. The metadata and the object it describes remain linked by
means of the resource's identifier.
This approach has the advantage of making it simple to update the
metadata of any given information resource. If a new manager takes
over responsibility for a large number of documents, you can simply
update the database with the new information rather than tracking
down and retagging the documents themselves. The disadvantage of
this approach is that the metadata doesn't travel with the document if
it is shared. If a file with externally managed metadata is emailed to a
colleague at another organization, they will receive the content but
not the descriptive information. This can become a problem if that
additional information is critical to making the document usable.
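The trade-off of external management can be sketched with an in-memory table standing in for the metadata database. Everything here is hypothetical: the identifiers, names, the repository dictionary and both functions are invented for illustration.

```python
# Hypothetical external repository: metadata keyed by resource identifier,
# stored apart from the documents themselves.
repository = {
    "doc-001": {"title": "Q3 Forecast", "manager": "A. Jones"},
    "doc-002": {"title": "Q4 Forecast", "manager": "A. Jones"},
    "doc-003": {"title": "Style Guide", "manager": "B. Smith"},
}

def reassign_manager(repo, old, new):
    """Update the manager field in place; return the affected identifiers."""
    changed = [rid for rid, meta in repo.items() if meta.get("manager") == old]
    for rid in changed:
        repo[rid]["manager"] = new
    return changed

def export_file(resource_id):
    """Simulate sharing a file: only the content leaves, not the metadata."""
    return {"id": resource_id, "content": f"<bytes of {resource_id}>"}

# One update covers every affected document; no file is opened or retagged.
changed = reassign_manager(repository, "A. Jones", "C. Lee")
print(changed)

# The shared copy carries no descriptive fields at all.
print(export_file("doc-001"))
```

The update is a single pass over the table, while the exported file shows the cost: the recipient gets an identifier and content but none of the description.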
An alternative to external management is to make the metadata a part
of the information resource itself. Most applications supporting this
approach store metadata as properties of the file they describe.
Microsoft Windows, for example, allows an author to add summary
metadata to any file, which may then be used to organize, locate, and
retrieve the information resource. In addition to traveling with the
file, internal metadata has the advantage of being somewhat
self-maintaining. In the case of Windows metadata, some information is
extracted directly and automatically from the document itself. The
organization of the file is automatically extracted from heading styles
in a Word document, Excel worksheet titles, or slide titles in a
PowerPoint presentation. If the file changes, the new structure is
automatically reflected in the metadata. Usage statistics are also
automatically updated throughout the life of the document. At first
blush, semi-automatic maintenance and close coupling with the
information it describes make internal metadata a very attractive
option, but it does come at a cost.
First, while some of the descriptive metadata (title, author, company) can
be automatically generated, the fields that are most useful to retrieval
(subject, category, keywords) must be manually selected, keyed, and
maintained. If the owner of the document changes, as mentioned
earlier, not only does that field need to be updated in each impacted
document, there will be no history of ownership. Once an internal
field is updated, all previous values are lost. This can become critical
if an explanation of something in the document is needed and no one
remembers who originally wrote it.
Property            Value
Description
  Title             Introduction to the Semantic Web
  Subject           Semantic Web
  Category          Lectures
  Keywords          Semantic Web, RDF, Ontology
  Comments          Draft of Lecture 1
Origin
  Source
  Author            Darin L. Stewart
  Revision Number   2
Figure 5. Metadata in Microsoft Windows.
Another hazard is shifting terminology. The vocabulary of any
organization or community inevitably changes over time. Keywords,
subject headings, and even category labels need to be updated to
reflect these changes. Otherwise a search engine will not be able to
match a relevant document tagged with obsolete terms with a query
from a user searching with the latest buzzwords. Additionally, while
deliberate keywords are essential to effective retrieval, as discussed in
the prior chapter, the burden of selecting, assigning and maintaining
them falls primarily on the author (who is invariably overworked
already). This often leads to sporadic metadata and
idiosyncratic tags and terms. This becomes an even greater problem
in the context of authority control, which we will discuss shortly.
Where Does It Come From?
The potential sources of metadata and the means of creating it are as
varied as the information resources they describe. Systems for
automatic generation exist but rarely reach an acceptable level of
quality without human assistance. Conversely, a broad application of
metadata across an enterprise of any size is generally too tedious for
human beings working without the help of scripts, term extractors
and tagging tools. As a result, most successful metadata endeavors
draw on a range of sources, tools and techniques depending on the
nature of the information under consideration and the purposes for
which it is intended.
The same principle is just as applicable to creating the metadata for a
single information resource as it is to an entire collection. In most
cases, the descriptive metadata will be assigned by the creator or
author of the information. This has the advantage of terms coming
from the person most familiar with the content and its original intent.
It has the disadvantage of the metadata reflecting the biases and
idiosyncrasies of the author, whose vocabulary may not necessarily
reflect that of her audience.
The readers may also place the information in a different context
from that originally conceived by the author. As a result, it is often
advantageous to leave the creation of descriptive metadata to the
professionals. The National Information Standards Organization
(NISO) has noted that it is often more efficient to have indexers or
other information professionals create this metadata, because the
authors rarely have the time or necessary skills.8 This is, of course, an
additional line item cost, but when lifetime cost of ownership
(especially in terms of findability) is taken into account, leaving it to
the professionals is often cheaper in the long run.
Administrative and structural metadata will often be generated by the
technical staff that prepares an information resource to be published
and distributed. The individual scanning an image or creating a digital
recording is in the best position to supply details about resolution, bit
rates and encoding schemes. The individual adding the resource to
the content management system will know when it is to be posted to
the website, for how long, and where it is to be archived at the end of
its run.
As with any budding field, there is an abundance of tools available
to assist in the creation of metadata. The most common (and
cheapest) is the application of templates such as those available in
most word processing applications. In addition to providing
standardized formatting of common document types, templates can
also guide the author in providing basic descriptive metadata. Even if
professional indexers are utilized to create the final metadata, it is
often effective for the author to create a "first draft" of the metadata
to serve as a guide. A well-conceived document template can simplify
this task and improve the quality of the metadata.
One of the challenges of high-quality metadata is ensuring that it
conforms to the appropriate schema. Mark-up and tagging tools can
prompt the user for the appropriate fields, requiring those that are
mandatory for compliance with the designated schema. Once the
metadata is complete, the tool can either embed the metadata in the
information resource itself or export it to an external metadata
repository or database.
Extraction tools will analyze the content of an information resource
and attempt to extract appropriate terms and values for certain
metadata fields. For structural metadata, this is often straightforward
and quite effective. For more conceptual elements such as subject,
category or keyword, it gets a bit trickier. Most tools rely on a mixture
of statistical and computational techniques to make a best guess at
appropriate descriptive metadata. In most cases these tools require a
great deal of training in terms of sample documents and target
vocabularies, and still depend on human intervention and revision.
However, much like having authors take a first pass at assigning
metadata, automated extraction tools can dramatically reduce the full
metadata burden to a more manageable one of cleanup and
refinement.
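To give a feel for the statistical side of extraction, the sketch below ranks words by raw frequency and optionally restricts suggestions to a target vocabulary. It is a deliberately naive stand-in, not a description of how any particular tool works; the stop-word list, the sample text and the suggest_keywords function are all assumptions of this example.

```python
import re
from collections import Counter

# A deliberately small stop-word list; real tools use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "for", "on", "with", "as", "by", "that", "this", "are"}

def suggest_keywords(text, vocabulary=None, limit=3):
    """Rank candidate keywords by raw frequency.

    If `vocabulary` is given, only terms from that target vocabulary are
    suggested. The result is a draft for a human to review.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words
                     if w not in STOP_WORDS and len(w) > 2)
    if vocabulary is not None:
        counts = Counter({w: n for w, n in counts.items()
                          if w in vocabulary})
    return [word for word, _ in counts.most_common(limit)]

sample = ("Metadata describes a resource. Good metadata makes the resource "
          "easy to find, and a taxonomy keeps metadata consistent.")

print(suggest_keywords(sample, limit=2))
print(suggest_keywords(sample, vocabulary={"taxonomy", "metadata"}))
```

Even this crude counter produces a usable first pass, which is the point of the passage: the tool narrows the job to cleanup, and a person still decides what stays.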
Figure 6. A metadata creation aid: Meta-X.
Metadata and Authority Control
Metadata is a hard sell. It is expensive to create and difficult to
maintain. Executives have a tough time understanding how the
problem of having too much information to manage can be solved by
adding on yet more information. Metadata is a bit of a "hair of the
dog" solution. We add a little extra information to make a lot of
information more usable. As to the expense, the answer is, of course,
pay now or pay more later; sometimes a lot more. As discussed in the
prior chapter, a few moments tagging a document can save hours
hunting for it later. When done properly, metadata initiatives nearly
always generate a positive return on investment. Unfortunately, few
are done properly and most fail. A prime reason for this is a lack of
authority control.
The notion of authority control boils down to making sure everyone
involved in the creation and management of an information resource
is speaking the same language. It is the mechanism by which
consistency in online systems is created and maintained. When
applied to search and even navigation, it promotes greater precision
by providing official or "authorized" forms of names, labels and
values. As part of this system, references to equivalent terms,
synonyms and variants are created, which dramatically improve
recall.9 For example, if the authorized term for a "non-rigid, buoyant
airship" is blimp there will be cross references to zeppelin and
dirigible. An information seeker searching on any of these equivalent
terms would receive information for all of them.
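The blimp example can be sketched as a small authority file in which every variant points to one authorized form. The tables, document identifiers and function names below are hypothetical and exist only to show the mechanism.

```python
# Hypothetical authority file: each variant points to its authorized term.
AUTHORITY = {
    "blimp": "blimp",
    "zeppelin": "blimp",
    "dirigible": "blimp",
}

# Sample documents tagged by different authors using different terms.
DOCUMENTS = {
    "doc-1": "zeppelin",
    "doc-2": "blimp",
    "doc-3": "dirigible",
    "doc-4": "balloon",
}

def authorized(term):
    """Map a term to its authorized form; unknown terms pass through."""
    return AUTHORITY.get(term.lower(), term.lower())

def search(query, documents=DOCUMENTS):
    """Return ids of documents whose tag shares the query's authorized form."""
    target = authorized(query)
    return sorted(doc_id for doc_id, tag in documents.items()
                  if authorized(tag) == target)

print(search("dirigible"))  # finds all three equivalent tags
print(search("balloon"))    # no cross reference, so only the exact tag
```

Both the query and the stored tags pass through the same mapping, so a seeker using any of the three terms retrieves the same set of documents.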
The value of authority control to metadata should be obvious. While
schemas provide structure, syntax and semantics to our metadata, they
do nothing to ensure consistency in the values assigned to the elements of the
schema. The Dublin Core may specify an element called language and
define it as, "the language of the intellectual content of the resource,"
but it does nothing to limit the potential values that can be assigned
to that field. If DC metadata is being created for an international
news story, its language could be tagged as English, Eng., En, American
English, British English, or any number of variants. Each is potentially
valid, but the lack of consistency turns retrieval into a crap shoot. If
an information seeker searches on English they will receive only those
information resources labeled with that exact term. Anything tagged
with another term for English will be ignored.
The solution is to restrict potential metadata values to an agreed
upon list of terms, so that both information creators and seekers are
speaking the same language. In many cases, an authoritative
vocabulary already exists and can be adopted wholesale. In the case
of the DC language element, the International Organization for
Standardization (ISO) Language Codes standard (ISO 639-2)
provides authoritative names and codes for languages. English would
then be consistently represented as eng, Italian as ita, Japanese as jpn
and Esperanto as epo.
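Restricting a field to a controlled list amounts to normalizing whatever the author typed into one agreed code. The sketch below uses the variants and the ISO 639-2 codes named in the text; the LANGUAGE_VARIANTS table and the normalize_language function are assumptions of this example, as is the choice to fold regional variants into plain eng.

```python
# Variant spellings taken from the text, mapped to ISO 639-2 codes.
# Folding "American English" and "British English" into plain "eng"
# is a simplification made for this sketch.
LANGUAGE_VARIANTS = {
    "english": "eng", "eng.": "eng", "eng": "eng", "en": "eng",
    "american english": "eng", "british english": "eng",
    "italian": "ita", "japanese": "jpn", "esperanto": "epo",
}

def normalize_language(value):
    """Return the ISO 639-2 code for a free-text language value.

    Raises ValueError for values outside the controlled list, so that
    unrecognized tags are flagged rather than silently stored.
    """
    key = value.strip().lower()
    if key not in LANGUAGE_VARIANTS:
        raise ValueError(f"unrecognized language value: {value!r}")
    return LANGUAGE_VARIANTS[key]

tags = ["English", "Eng.", "En", "American English", "British English"]
codes = {normalize_language(t) for t in tags}
print(codes)  # every variant collapses to a single code
```

Rejecting unknown values outright is one possible policy; a gentler system might queue them for review instead.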
If the desired granularity does not exist in the standard, it can be
expanded. DCMI actually recommends this as a best practice in the
case of languages.10 The ISO standard can be used in conjunction
with the Internet Society's proposal for language codes (RFC 3066),
which includes the more specific labels of en-US for American
English, en-AU for English as used in Australia, en-GB for English in
the United Kingdom, or even en-GB-oed for British English using
spelling from the Oxford English Dictionary. The additional
advantage of adopting authoritative terms is the possibility of
structuring the labels to reflect relationships.
Eng (UseFor English, en)
  En-AU (UseFor Australian English)
  En-GB (UseFor British English)
    En-GB-oed (UseFor British English OED spelling)
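Because the hyphenated labels encode their own hierarchy, the broader forms of a tag can be derived by dropping trailing subtags. The sketch below illustrates that idea only; it uses the two-letter en prefix from the RFC 3066 style labels rather than the three-letter eng, and the function names are hypothetical.

```python
def broader_tags(tag):
    """List a hyphenated language tag followed by its broader forms.

    Broader forms are obtained by dropping trailing subtags one at a time.
    """
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]

def matches(query_tag, record_tag):
    """A record matches if the query is the record's tag or a broader one."""
    return query_tag.lower() in (t.lower() for t in broader_tags(record_tag))

print(broader_tags("en-GB-oed"))
print(matches("en", "en-GB-oed"))   # a general query finds specific records
print(matches("en-GB", "en-US"))    # sibling variants do not match
```

A search on the general label can therefore retrieve records tagged with any of its narrower variants, while a search on a specific variant stays specific.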
Despite the advantages it offers, authority control is a difficult pill to
swallow for most organizations. The prospect of giving up ownership
of terms and labels is often enough to incite turf battles in even the
most collegial of environments. Authors feel that it is unnecessary
and even inadvisable to constrain their vocabulary in any way (though
they invariably recognize the need for such constraints among their
colleagues). Deciding who and what is the "authority" and who and
what is governed by its dictates are among the most contentious
issues in information management. If metadata is a hard sell,
authority control can turn into a shotgun wedding.
Fortunately, it needn't be so. A balance can be struck between the
expressive needs of content authors and the findability needs of
information seekers. Doing so depends on the proper definition,
creation and management of the information resources provided to
both groups. Taxonomies are the linchpin of this process.
60. *** END OF THE PROJECT GUTENBERG EBOOK FOREIGN
BUTTERFLIES ***
Updated editions will replace the previous one—the old editions will
be renamed.
Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States copyright in
these works, so the Foundation (and you!) can copy and distribute it
in the United States without permission and without paying
copyright royalties. Special rules, set forth in the General Terms of
Use part of this license, apply to copying and distributing Project
Gutenberg™ electronic works to protect the PROJECT GUTENBERG™
concept and trademark. Project Gutenberg is a registered trademark,
and may not be used if you charge for an eBook, except by following
the terms of the trademark license, including paying royalties for use
of the Project Gutenberg trademark. If you do not charge anything
for copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such as
creation of derivative works, reports, performances and research.
Project Gutenberg eBooks may be modified and printed and given
away—you may do practically ANYTHING in the United States with
eBooks not protected by U.S. copyright law. Redistribution is subject
to the trademark license, especially commercial redistribution.
START: FULL LICENSE
62. PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK
To protect the Project Gutenberg™ mission of promoting the free
distribution of electronic works, by using or distributing this work (or
any other work associated in any way with the phrase “Project
Gutenberg”), you agree to comply with all the terms of the Full
Project Gutenberg™ License available with this file or online at
www.gutenberg.org/license.
Section 1. General Terms of Use and
Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand, agree
to and accept all the terms of this license and intellectual property
(trademark/copyright) agreement. If you do not agree to abide by all
the terms of this agreement, you must cease using and return or
destroy all copies of Project Gutenberg™ electronic works in your
possession. If you paid a fee for obtaining a copy of or access to a
Project Gutenberg™ electronic work and you do not agree to be
bound by the terms of this agreement, you may obtain a refund
from the person or entity to whom you paid the fee as set forth in
paragraph 1.E.8.
1.B. “Project Gutenberg” is a registered trademark. It may only be
used on or associated in any way with an electronic work by people
who agree to be bound by the terms of this agreement. There are a
few things that you can do with most Project Gutenberg™ electronic
works even without complying with the full terms of this agreement.
See paragraph 1.C below. There are a lot of things you can do with
Project Gutenberg™ electronic works if you follow the terms of this
agreement and help preserve free future access to Project
Gutenberg™ electronic works. See paragraph 1.E below.
63. 1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright law
in the United States and you are located in the United States, we do
not claim a right to prevent you from copying, distributing,
performing, displaying or creating derivative works based on the
work as long as all references to Project Gutenberg are removed. Of
course, we hope that you will support the Project Gutenberg™
mission of promoting free access to electronic works by freely
sharing Project Gutenberg™ works in compliance with the terms of
this agreement for keeping the Project Gutenberg™ name associated
with the work. You can easily comply with the terms of this
agreement by keeping this work in the same format with its attached
full Project Gutenberg™ License when you share it without charge
with others.
1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the
terms of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.
1.E. Unless you have removed all references to Project Gutenberg:
1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project Gutenberg™
work (any work on which the phrase “Project Gutenberg” appears,
or with which the phrase “Project Gutenberg” is associated) is
accessed, displayed, performed, viewed, copied or distributed:
64. This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this eBook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.
1.E.2. If an individual Project Gutenberg™ electronic work is derived
from texts not protected by U.S. copyright law (does not contain a
notice indicating that it is posted with permission of the copyright
holder), the work can be copied and distributed to anyone in the
United States without paying any fees or charges. If you are
redistributing or providing access to a work with the phrase “Project
Gutenberg” associated with or appearing on the work, you must
comply either with the requirements of paragraphs 1.E.1 through
1.E.7 or obtain permission for the use of the work and the Project
Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.
1.E.3. If an individual Project Gutenberg™ electronic work is posted
with the permission of the copyright holder, your use and distribution
must comply with both paragraphs 1.E.1 through 1.E.7 and any
additional terms imposed by the copyright holder. Additional terms
will be linked to the Project Gutenberg™ License for all works posted
with the permission of the copyright holder found at the beginning
of this work.
1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files containing a
part of this work or any other work associated with Project
Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute this
electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
65. with active links or immediate access to the full terms of the Project
Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or
expense to the user, provide a copy, a means of exporting a copy, or
a means of obtaining a copy upon request, of the work in its original
“Plain Vanilla ASCII” or other form. Any alternate format must
include the full Project Gutenberg™ License as specified in
paragraph 1.E.1.
1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.
1.E.8. You may charge a reasonable fee for copies of or providing
access to or distributing Project Gutenberg™ electronic works
provided that:
• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
66. about donations to the Project Gutenberg Literary Archive
Foundation.”
• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™
electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for
the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.
1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,
the trademark owner, any agent or employee of the Foundation,
anyone providing copies of Project Gutenberg™ electronic works in
accordance with this agreement, and any volunteers associated with
the production, promotion and distribution of Project Gutenberg™
electronic works, harmless from all liability, costs and expenses,
including legal fees, that arise directly or indirectly from any of the
following which you do or cause to occur: (a) distribution of this or
any Project Gutenberg™ work, (b) alteration, modification, or
additions or deletions to any Project Gutenberg™ work, and (c) any
Defect you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.
The Foundation’s business office is located at 809 North 1500 West,
Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many
small donations ($1 to $5,000) are particularly important to
maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws regulating
charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states where
we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.
Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.