Transcript: Improving your metadata: Common issues and how to fix them - Tech Forum 2024

Nataly Alarcón: Hello, everyone, thank you for joining us for this Tech Forum session. I am
Nataly Alarcón, Marketing and Events Manager at BookNet Canada. Welcome to
"Improving your metadata: Common issues and how to fix them."
Before we dive in, BookNet Canada acknowledges that its operations are remote, and our
colleagues contribute their work from the traditional territories of the Mississaugas of the
Credit, the Anishinabe, the Haudenosaunee, the Wyandot, the Mi'kmaq, the Ojibwa of Fort
William First Nation, the Three Fires Confederacy of First Nations, which includes the
Ojibwa, the Odawa, the Potawatomi, and the Métis, the original nations and people of the
lands we now call Beeton, Brampton, Guelph, Halifax, Thunder Bay, Toronto, Vaughan, and
Windsor. We endorse the Calls to Action from the Truth and Reconciliation Commission of
Canada and support an ongoing shift from gatekeeping to spacemaking in the book industry.
As this is a prerecorded session, there won't be a live Q&A, but we would love to hear from
you. Feel free to reach out with any questions or comments at techforum@booknetcanada.ca.
Now, I will hand things over to my colleagues who will guide you through some of the
common metadata issues we encounter and, more importantly, how to fix them. Enjoy the
session.
Carol Gordon: Hello, my name is Carol Gordon and I'm the Director of Product
Development at BookNet Canada. Thanks for joining us today for this webinar. So, let's talk
about our common issue number one, and that is not providing enough data for people to
really understand your products. So, we want you to take away from this that great metadata
can really give you a competitive advantage especially in online marketplaces.
So, in general, the industry and consumers, they really want and need more data about your
books. Publishers have such an incredible depth of knowledge about their authors and other
contributors and the books that they produce, and the real challenge is to communicate all of
that information across the book supply chain, to the sales reps who sell your books, to
booksellers, and to wholesalers who supply your books to libraries, in a way that helps people
understand the merits of the books that you're publishing.
So, the more steps away from your publishing house, the more your data has to stand on its
own. Sales reps may be in-house and have an extensive knowledge of your books and all of
their key selling points. External reps may attend a sales conference to learn about your
books. Booksellers may have meetings with sales reps, but at each step, there's, kind of, less
available time to devote to the in-depth understanding of the books, and your data can really
provide that extra level of knowledge.
As you get to the consumer level, there are so many books available, and a consumer may
not even speak to a bookseller. They may simply be browsing or searching online. We
continue to see a proliferation of online booksellers in the Canadian market. And in an online
shop, your store listings will only be as useful and as detailed as the file
that the store receives.

So, you're going to hear today about some of the specific data points that the industry is
really clamoring for, but in general, we hope to encourage publishers to just share as much as
they can in their metadata feeds to all levels of the supply chain.
So, you will hear a lot about standards today. So, let's just start by talking about what
standards are, why they're important, and why we use them. So, standards are what allow all
of the supply chain partners—publishers, wholesalers, distributors, retailers—to use a
common language to describe books. The structure of standards also allows systems like
catalogues and retailer shops to organize titles in logical ways and support search and filter
mechanisms. So, that's where we get back to that online bookstore scenario.
So, in this webinar, we will mostly be talking about the ONIX for Books metadata standard.
At its most basic level, ONIX is simply a file that is formatted in a way agreed upon by the
whole industry internationally so that any company in the supply chain can create and send
or receive and extract information about any book.
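To make "create and send or receive and extract" a little more concrete, here is a minimal Python sketch that reads the ISBN-13 and title out of a tiny ONIX 3.0 message using only the standard library. The sample record, its ISBN, and the helper name `extract_isbn_and_title` are made up for illustration, and the sample omits the message header and many required product elements, so it is not a schema-valid file; a real feed carries far more data.

```python
import xml.etree.ElementTree as ET

# Namespace used by ONIX 3.0 files with reference (long) tag names.
NS = {"o": "http://ns.editeur.org/onix/3.0/reference"}

# A deliberately tiny, made-up ONIX 3.0 message; the <Header> is omitted.
SAMPLE = """<ONIXMessage release="3.0" xmlns="http://ns.editeur.org/onix/3.0/reference">
  <Product>
    <RecordReference>example-0001</RecordReference>
    <NotificationType>03</NotificationType>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Title</TitleText>
        </TitleElement>
      </TitleDetail>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>"""


def extract_isbn_and_title(onix_xml: str) -> list[tuple]:
    """Return an (ISBN-13, title) pair for every <Product> in an ONIX 3.0 message."""
    root = ET.fromstring(onix_xml)
    results = []
    for product in root.findall("o:Product", NS):
        isbn = None
        for identifier in product.findall("o:ProductIdentifier", NS):
            # ProductIDType 15 = ISBN-13 (ONIX Codelist 5).
            if identifier.findtext("o:ProductIDType", namespaces=NS) == "15":
                isbn = identifier.findtext("o:IDValue", namespaces=NS)
        title = product.findtext(
            "o:DescriptiveDetail/o:TitleDetail/o:TitleElement/o:TitleText",
            namespaces=NS,
        )
        results.append((isbn, title))
    return results


print(extract_isbn_and_title(SAMPLE))
```

Because every sender uses the same agreed element names and codes, the same few lines work on any conforming file, which is the point of the standard.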
There are other standards in the industry. And I think most people will recognize this list:
ISBN is a standard, as are BISAC, Thema, and Dewey subject codes, as well as newer standards
like ISNI, the International Standard Name Identifier, which is gaining adoption for
identifying creators and entities like publishers and imprints in a standard way. So, it's not
scary. It's just that we have all agreed to work on things and send information in the same
way so we all know what we're talking about, and we can all extract and use the information.
And so, as the needs of the industry change, these standards can, of course, evolve and change
to encompass new data points that are needed or reflect changing understandings. So, for
example, ONIX has recently added better support for communicating data about a book's
accessibility features. And BISAC has been working to address some identified gaps in
terminology and representation of Indigenous communities and world views.
BookNet represents the Canadian perspective on many book industry standards groups. So, a
great way to keep up to date on new standards in the industry and changes to existing ones
would be to sign up for the weekly eNews mailing list.
How to fix this? The first tip would be to take control of your data. You should know who is
creating the information about your books and how it's being sent to your data trading
partners. So, if you aren't managing your ONIX files yourself, you can ask questions about
what fields your distributor or your data provider supports and ensure that you're providing
as much of that data to them as you can.
If you have the capacity, you can explore options for sending your own data feeds to
supplement those distributor data feeds. If you have a relatively small number of titles or you
don't have a distributor, there are some inexpensive tools available that can help you create
ONIX feeds. BookNet's Webform is one option that we have available, but there are other
tools out there. You can also supply additional non-ONIX content like interior images,
excerpts, teacher's guides, and reading guides directly to your data trading partners.
BiblioShare, for example, accepts all of those, and that can increase the availability of that
content generally in the market.

Make use of the new data points and standards, and keep up to date with them. If
you're able to provide that information, that's great. We understand that that takes effort. So,
if you have an extensive backlist, it's okay to focus on adding support for those new fields on
your forthcoming lists or increasing support generally on what's available for forthcoming
titles and work through your backlist as much as it makes sense to do. Focus on core backlist
as you're able to do it. There may be tools available to help implement support for new
standards. For example, you can use a conversion tool like the BISAC to Thema translator.
We'll have a link to that at the end of this presentation, which takes your BISAC codes and
gives you a suggested Thema code, so Thema being the newest subject standard. It's not
replacing BISAC at this point. Those two standards are being supported together, but you can
learn more about Thema. And if you're interested in providing that data, there are tools that
will help you get into that and start providing that data a little more quickly than a manual
process.
And if you have specific questions, get in touch with the BookNet team. We're a good
resource that's here for you, and we can probably point you in the right direction.
Stephanie Small: Hi, everyone. I'm Stephanie Small, and I'm one of the product
coordinators for the CataList and BiblioShare teams here at BookNet. I think we can all
agree that all of us in the publishing industry in this country have a vested interest in making
sure that Canadian voices are heard.
We know that booksellers, librarians, and readers themselves want to read Canadian-
authored books. So, it's important to have a way to identify those books to help them stand
out in the marketplace. Now, because of all this, it's not uncommon for a publisher to get in
touch with us and ask, "Why isn't my Canadian author being identified as Canadian?"
This question stems from the fact that many online sources will recognize and highlight
Canadian authors, usually with some kind of a visual indicator. For example, BookNet
products like CataList, SalesData, and Bibli-O-Matic display a maple leaf icon next to
contributor names. That lets everyone know that a particular contributor is Canadian.
And we aren't the only ones doing this. Some retailers, even tools of the trade, also convey
this information with a maple leaf, or a Canadian flag, or a badge of some sort to call out
Canadian content. This information is also useful in the background for things like marketing
initiatives, research, and reporting. Canadian bestseller lists rely on Canadian contributor
information, and someone putting together a bookstore feature table can use it to find
qualifying titles. Another example, if you're familiar with the website 49th Shelf, they import
and display listings only for titles with Canadian authors.
So, how do you make sure that your Canadian-authored book isn't getting lost in the shuffle?
Well, first of all, let's start with the definition. BookNet and the Canadian Bibliographic
Committee define a Canadian contributor as an author, illustrator, translator, or editor who is
a Canadian citizen or a permanent resident of Canada.
Of course, there are many other contributor roles that are named in ONIX outside of those
core four, so your best bet is to provide location information for all contributor roles. Just
keep in mind that those who use your data, like retailers, for example, will make their own
choices on how to display it.
If you want more guidance on this topic, I highly recommend you read Identifying Canadian
Authorship, which is part of our BookNet Canada user documentation. We've linked it here
in our slide deck so you can check it out if you need to.
So, that's the standard. How do we fix it? Generally speaking, you'll need to provide location
information for your authors in your ONIX metadata. And the good news is whether you're
still using ONIX 2.1 or you've made the leap to ONIX 3.0, you can really easily update your
information to include this data.
If you're using ONIX 2.1, the process is pretty straightforward. The contributor composite
includes an optional data element, "country code," where you can input the value "CA," the
code for Canada, to say that your contributor is Canadian. Easy peasy.
If you're using ONIX 3, the process is still straightforward, but there's one added detail that
you want to get right in order to make sure you're adhering to Canadian best practices. You'll
still be adding to the contributor composite, but within that is a new composite called
"contributor place," and that contains two elements. The first is "country code," which we
just saw in our previous example, and the second is "contributor place relator." Adding a
contributor place relator allows you to be much more specific about the relationship that the
contributor has to the place that you've identified. In our previous 2.1 example, we could tell
you that an author is somehow associated with Canada. In ONIX 3, now we can tell you
precisely what that association is.
Remember that the standard endorsed by the Canadian Bibliographic Committee is to
consider a contributor to be Canadian if they're a citizen of the country. So, you'll convey
that relationship, "citizen of," using the place relator "08." This is the code, for example, that
CataList needs in order to flag an author as Canadian in the system.
Now, we don't recommend that you use this code, "citizen of," willy-nilly just to make sure
you get that coveted maple leaf icon. Make sure you're using this code when you can make a
reasonable assumption that the author is Canadian according to the definition we provided
earlier. Otherwise, it's best to use a more appropriate place relator code or omit it altogether if
you have to, even if it means the indicator you want doesn't show up. What we're striving for
is accuracy, and hacking the data to do what you want it to do doesn't really help us achieve
that goal.
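For anyone generating ONIX programmatically, here is a minimal Python sketch of both versions: an ONIX 2.1 contributor with a bare <CountryCode>, and an ONIX 3.0 contributor whose <ContributorPlace> pairs a relator with the country. It then applies a simplified version of the rule described above (relator "08", "Citizen of", plus country code "CA"). The helper names and contributor names are hypothetical, only the minimal elements are emitted (check element order and requirements against the spec your partners validate against), and `is_flagged_canadian` only illustrates the logic; it is not CataList's actual implementation. The `resident` example shows the more accurate choice for someone you know lives in Canada but can't confirm is a citizen: relator "04", "Currently resides in".

```python
import xml.etree.ElementTree as ET
from typing import Optional

CITIZEN_OF = "08"            # ONIX Codelist 151: "Citizen of"
CURRENTLY_RESIDES_IN = "04"  # ONIX Codelist 151: "Currently resides in"


def _base_contributor(sequence: int, name: str, role: str) -> ET.Element:
    contributor = ET.Element("Contributor")
    ET.SubElement(contributor, "SequenceNumber").text = str(sequence)
    # A01 = "By (author)" in the ONIX contributor role codelist.
    ET.SubElement(contributor, "ContributorRole").text = role
    ET.SubElement(contributor, "PersonName").text = name
    return contributor


def contributor_21(sequence: int, name: str, country_code: Optional[str] = None,
                   role: str = "A01") -> ET.Element:
    """ONIX 2.1: the country goes straight into <CountryCode> ("CA", not "Canada")."""
    contributor = _base_contributor(sequence, name, role)
    if country_code:
        ET.SubElement(contributor, "CountryCode").text = country_code
    return contributor


def contributor_30(sequence: int, name: str, places: list[tuple[str, str]],
                   role: str = "A01") -> ET.Element:
    """ONIX 3.0: one <ContributorPlace> per (relator, country code) pair."""
    contributor = _base_contributor(sequence, name, role)
    for relator, country_code in places:
        place = ET.SubElement(contributor, "ContributorPlace")
        ET.SubElement(place, "ContributorPlaceRelator").text = relator
        ET.SubElement(place, "CountryCode").text = country_code
    return contributor


def is_flagged_canadian(contributor: ET.Element) -> bool:
    """Simplified rule from the talk: relator 08 ("Citizen of") with country CA."""
    return any(
        place.findtext("ContributorPlaceRelator") == CITIZEN_OF
        and place.findtext("CountryCode") == "CA"
        for place in contributor.findall("ContributorPlace")
    )


print(ET.tostring(contributor_21(1, "Jane Example", "CA"), encoding="unicode"))
author = contributor_30(1, "Jane Example", [(CITIZEN_OF, "CA")])
resident = contributor_30(2, "Sam Example", [(CURRENTLY_RESIDES_IN, "CA")])
print(is_flagged_canadian(author), is_flagged_canadian(resident))
```

The resident record is still accurate and useful data; it simply doesn't claim citizenship it can't support.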
I want to wrap up by recognizing that how you communicate these details may differ
depending on how you're generating your ONIX. You may be using software that generates
the code for you, or you may be providing a spreadsheet to a data partner like a distributor.
Maybe you have some other method, but generally speaking, no matter which way you go
about it, it's always better to fix the problem at the source—in your metadata. Playing whack-
a-mole and submitting a hundred different bibliographic corrections on different platforms is
a lot more work and can mean a lot of inconsistencies in the future.

The takeaway is that if you don't have access to the data to take these steps we've outlined, or
if you've already gone through these steps and you're still not seeing results, talk to your data
partners. Find out how they expect to receive information on Canadian contributors and do
your best to follow their guidelines. That way, you'll be able to make sure that your Canadian
authors are being recognized in the market and that readers hungry for Canadian content can
find your books.
Vivian Luu: Hi, I am Vivian Luu, an Associate Product Manager at BookNet Canada. For
the next five or so minutes, let's talk about describing the ways a product can link to other
products. One thing that everyone is interested in is making book discovery easier. A reader
might want to find books comparable to the one that they just finished, or they might want to
look for other formats of a book. Finding related books, however, can be a challenge if those
books aren't identified in the metadata as being related.
There are a lot of relationships that can be described through the metadata, and here are a few
examples of the kinds of relationships that you might want to consider calling out in your
data. You can link the main product to other formats of the book, such as the ebook or the
paperback format. You can link to titles by the same author. You can link to comparable
titles, or you can link to previous editions.
Readers and book buyers can usually find this information on a book's product page. On this
slide, we have a screenshot of how a site can use related product data to, for example,
populate a drop-down list that shows the available formats of a book. There can also be a
section for the book's comparable titles, such as in this example from CataList. These are just
a few of the many ways that websites can display related products, but they can't do that if
the data is not available.
In both ONIX 2.1 and ONIX 3, a related product is identified using the related product
composite and the relevant relation code. The related product composite is repeatable so that
you can include different related products that are linked to the one that you're describing.
Within each related product composite, you must include a relation code to specify the
relationship that the related product has to the main product. In ONIX 2.1, the relation code
is non-repeatable, but in ONIX 3, the relation code is repeatable. That means the best
practice in ONIX 3 is to provide one related product composite per ISBN, and in each one,
you can include all the types of relationships that this related product has to the main
product. But as with any advanced concept for ONIX 3, you should
confirm with major trading partners that they're prepared to accept data presented this way.
You can find relation codes in the ONIX Codelist 51, and here are a few common examples.
Relation code 03 is used if the related product is the previous edition of the book. Relation
code 22 is used if the related product is a book by the same author. Relation code 23 is used
if the related product is a similar book, for example, if it's a comparable title.
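Here is a minimal Python sketch of the ONIX 3.0 approach described above: it takes (ISBN, relation code) pairs and emits one <RelatedProduct> per ISBN, repeating <ProductRelationCode> for each relationship. The ISBNs are dummy values and the helper `related_material` is hypothetical. In ONIX 2.1, where the relation code is not repeatable, you would instead emit one composite per pair, and as noted, it's worth confirming your trading partners accept the repeated-code form before sending it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def related_material(links: list[tuple[str, str]]) -> ET.Element:
    """Build ONIX 3.0 <RelatedMaterial>: one <RelatedProduct> per ISBN,
    carrying every relation code that ISBN has to the main product."""
    codes_by_isbn = defaultdict(list)  # keeps first-seen ISBN order
    for isbn, code in links:
        if code not in codes_by_isbn[isbn]:
            codes_by_isbn[isbn].append(code)
    material = ET.Element("RelatedMaterial")
    for isbn, codes in codes_by_isbn.items():
        product = ET.SubElement(material, "RelatedProduct")
        for code in codes:  # ProductRelationCode is repeatable in ONIX 3.0
            ET.SubElement(product, "ProductRelationCode").text = code
        identifier = ET.SubElement(product, "ProductIdentifier")
        ET.SubElement(identifier, "ProductIDType").text = "15"  # ISBN-13
        ET.SubElement(identifier, "IDValue").text = isbn
    return material


links = [
    ("9780000000019", "22"),  # book by the same author...
    ("9780000000019", "23"),  # ...that is also a comparable title
    ("9780000000026", "03"),  # previous edition
]
material = related_material(links)
print(ET.tostring(material, encoding="unicode"))
```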
So, how do you make sure your related products are included in your metadata and that
they're used? First of all, connect with the websites that display your product information.
Ask them if they pull in related product data, and if so, which relation codes they support.
Then you can look into supplying the codes in your metadata. You would add the related
products like you would any other composite in ONIX. And depending on how you
send your data to data recipients, you might not have to handle the metadata directly in its
XML format. If you're using an ONIX editing tool, for example, you likely have an interface
where you can enter the related product's ISBN and specify the relationship it has to the
product that you're updating.
This screenshot is an example of how you can add related products to a record using
Webform. When adding the related products, consider how many products you're providing.
Websites might have a limit on how many related products they display or accept for a single
book. If you find yourself adding 40 or 50 related products per record, think about the
marketing needs for your book and business and use that to figure out the most relevant
related products for the book.
Once you've added the related products to the related product composite and have sent that
metadata to your data recipients, websites that are built to show related products can use
them for display.
Kalpna Patel: Hello, I'm Kalpna Patel, Product Coordinator for SalesData and LibraryData
at BookNet Canada. They say age ain't nothing but a number, but I'm here to talk about how
important that number is to your metadata. Age range data refers to the specific range in
years or school grades of the intended audience of products aimed at children. This data is
expected by retailers for all children's and young adult books.
Without this information, consumers, booksellers, and ultimately publishers are at a great
disadvantage. Shoppers would not be able to locate and buy the books that are appropriate
for the ages and developmental stages of the children for whom they are shopping, and
booksellers wouldn't know which sections of the store the titles should be displayed in and
may end up losing sales due to poor product placement.
Providing specific age range data arms booksellers with the product knowledge to better
position and sell your books and allows consumers to find exactly what they're looking for.
Age range data for children's books, when it is included in the metadata, appears alongside
all of the title's bibliographic details and descriptions on retailers' websites, catalogues,
marketing materials, and on BookNet Canada products like CataList and SalesData.
The North American Best Practices Guide states that age range data must be supplied for all
trade products aimed at children and young adults. In addition to ONIX standards, data
suppliers should be sure to refer to this guide as it is full of excellent information about how
companies are actually using metadata in the Canadian market. When providing audience age
or grade ranges, data suppliers should be precise. Ranges on children's books should rarely
exceed two years at the lower end of the range, reflecting the core appeal or purpose of the
content of the product. The range can be larger, perhaps three or four years, at the upper end
of the children's age range.
An overly broad range, for example ages 6 to 11 or grades 2 to 7, or open-ended ranges such
as 6 plus or up to grade 7, are much less realistic than a narrow range such as ages 8 to 9,
even if the book might be suitable for some 6- or 11-year-olds. However, there are some
common-sense exceptions to the rule. An open-ended range such as 12 plus would indicate
that the work is suitable for young adults. Zero to 99, however, while very cute, just isn't
very useful.
To indicate age range in your ONIX data, refer to Codelist 30 to see which ranges are
accommodated. Age, grade, reading age, and interest age are the ones most commonly used and
are expressed in years, with the exception of infants, in which case age can be expressed in
months. While the values depend on the code used, they are expected to be numbers, with the
codelist defining what's being measured. You can provide a range that includes a specific
lower and upper age, like 10 to 14 years, or you can use an open-ended range, with "from" to
give only a lower age or "to" to give only an upper age. Exact values are often used to specify
grades.
Let's take a look at an example. Here's the ONIX data for a book intended for 2- to 5-year-olds.
From Codelist 30, we have used 18, which refers to reading age in years, for the
<AudienceRangeQualifier>. Next, for the <AudienceRangePrecision>, we're using the codes
for "from" and "to," 03 and 04 respectively, to indicate a bounded age range, as opposed to an
exact value or an open-ended range. Finally, the lower and upper ages are indicated by the values
2 and 5 for the <AudienceRangeValue>. With this data, booksellers will know where to
shelve this title, consumers will know it's suitable for their toddler or preschooler, and
publishers will know that their books are being easily found and purchased.
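That example can be generated with a short Python sketch. The helper name `audience_range` is hypothetical; it covers only the "from" and "to" precisions (03 and 04) discussed here, not exact values, and it assumes the qualifier code from Codelist 30 is passed in as a string.

```python
import xml.etree.ElementTree as ET
from typing import Optional

FROM, TO = "03", "04"  # AudienceRangePrecision codes for "from" and "to"


def audience_range(qualifier: str, lower: Optional[int] = None,
                   upper: Optional[int] = None) -> ET.Element:
    """Build an ONIX <AudienceRange> with a "from" bound, a "to" bound, or both."""
    if lower is None and upper is None:
        raise ValueError("give at least a lower or an upper bound")
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower bound is greater than upper bound")
    element = ET.Element("AudienceRange")
    ET.SubElement(element, "AudienceRangeQualifier").text = qualifier
    # Precision/value pairs: "from" first, then "to".
    for precision, value in ((FROM, lower), (TO, upper)):
        if value is not None:
            ET.SubElement(element, "AudienceRangePrecision").text = precision
            ET.SubElement(element, "AudienceRangeValue").text = str(value)
    return element


# Reading age (qualifier 18), from 2 to 5.
print(ET.tostring(audience_range("18", 2, 5), encoding="unicode"))
```

Passing only `lower=12` gives the open-ended "12 plus" style range mentioned earlier.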
Lily Dwyer: Hi, everyone, my name is Lily Dwyer, and I'm the Product Manager for
SalesData and LibraryData here at BookNet Canada. Today I'll be talking to you about
keywords and how to use them effectively within your ONIX metadata.
So, what's a keyword? In the Revised Best Practices for Keywords in Metadata, BISG
defines a keyword as a consumer-oriented word or phrase that describes the content, theme,
or other relevant aspects of a book product that, one, is used to supplement but not repeat
publicly displayed data such as title and, two, will assist with discoverability,
including differentiating among books with similar subjects and themes.
So, when used effectively, keywords should use natural language to enhance a book's
customer-facing metadata that in turn helps potential book buyers find books on online
bookstores. So, for example, they can help surface books when a reader doesn't really know
the title or the author of a book but they do know what it's about or who the main characters
are. They can also help expand search results if someone is looking for all of the books that
are available on a specific topic or winners of a certain award. And then they can also help
expand on a BISAC subject category if you want to make the subject even more specific.
However, keywords become a problem when you use words that have no direct relation to
the content of a book with the intention of improving its discoverability and therefore
manipulating search results. So, this is particularly relevant in cases like comparable titles
and authors. So, for example, you shouldn't use a keyword like Harry Potter for a non-Harry
Potter book or Dan Brown for a title that wasn't actually written by Dan Brown. In cases like
this, oversaturation of certain keywords results in pages and pages of search results that
readers then have to click through to find the books that they're actually looking for.

So, in terms of where the data shows up, keywords are a little different from some of the
other metadata points that we've discussed in today's session as they're pieces of
bibliographic data that are typically not displayed to the consumer. So, this means that they
don't usually appear on a book's product page in the same way that the title, author, and
description are available to a consumer when browsing books on a retailer's website. Instead,
they work in a, sort of, behind-the-scenes manner to affect search results.
So, ideally, you want readers to find the book they're looking for via search on page 1 of their
returned search results rather than, say, page 15. That being said, keywords are displayed on
BookNet Canada products like CataList and Bibli-O-Matic, as you can see here. With regard to
ONIX, keywords are supported in four distinct keyword lists: normal keywords, keywords not
for display, character or name keywords, and, finally, one dedicated to place names.
So, I'm restricting myself to describing the one expected by Amazon, which is the normal
one, using <SubjectSchemeIdentifier> 20, and applicable in both ONIX 2.1 and 3.0 data. The
others are used and recommended for their focused purposes, but be aware that place-name
keywords are restricted to ONIX 3.0 use only. If you have any questions about the extended list,
that's a great reason to contact BookNet for more information. And ask your trading partners if
you should be using them.
But as an example, here's how, let's say, a book about World War I would appear in your
metadata. One thing to keep in mind is that keywords are a little bit different from, say,
BISAC or Thema codes in ONIX. So, while subject composites for BISAC or Thema codes
each have their own separate entries, all of the keywords of a single product are provided in
the same subject composite and are separated by a semicolon, as you can see outlined here.
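As a sketch of that single-composite layout, the Python below joins a keyword list into one <Subject> with scheme identifier 20, trimming blanks and dropping case-insensitive duplicates along the way. The helper name and keywords are illustrative, and whether to put a space after each semicolon is an assumption worth checking with your trading partners; this version uses a bare semicolon.

```python
import xml.etree.ElementTree as ET


def keyword_subject(keywords: list[str]) -> ET.Element:
    """Put all keywords for one product into a single ONIX <Subject> (scheme 20)."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip()
        # Skip empty entries and repeats that differ only in case.
        if keyword and keyword.casefold() not in seen:
            seen.add(keyword.casefold())
            cleaned.append(keyword)
    subject = ET.Element("Subject")
    ET.SubElement(subject, "SubjectSchemeIdentifier").text = "20"  # Keywords
    ET.SubElement(subject, "SubjectHeadingText").text = ";".join(cleaned)
    return subject


ww1 = keyword_subject(["World War I", "WWI", "First World War", "Great War", "wwi", "  "])
print(ET.tostring(ww1, encoding="unicode"))
```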
So, to make sure that you're using keywords correctly, here are a few rules to adhere to. First,
you should use keywords that are likely to be used by readers. So, for example, you don't
want to use any publishing terminology that is unfamiliar to users, but instead you should use
terms that reflect characters, locations, plot themes, or genre terms. You also want to
consider all of the phrases that a user might use to search for a book within a particular
subject. So, for example, when searching for a book on World War II, they may search by
World War 2 with the number 2, they might use the Second World War, or they might do
WWII.
Second, your keywords should also supplement the data that already exists in your ONIX, so
you don't need to repeat any phrases that may already be found in other metadata fields. So,
let's take format, for example. If you want to make sure that a title can be identified as a
board book, you should use the appropriate board book form code and should not use that as
a keyword. Remember, space is limited and you should use keywords that add value to your
metadata.
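One way to apply the "supplement, don't repeat" rule before export is a simple filter like this hypothetical Python helper, which drops any keyword whose text already appears in another metadata field you pass in. It is a crude case-insensitive substring check, so treat its output as suggestions for a person to review rather than a final list; the title and keywords are made up.

```python
def drop_repeated_keywords(keywords: list[str], existing_fields: list[str]) -> list[str]:
    """Keep only keywords whose text does not already appear in other fields."""
    existing = [field.casefold() for field in existing_fields]
    return [
        keyword for keyword in keywords
        if not any(keyword.casefold() in field for field in existing)
    ]


title = "Goodnight Barn"
form_description = "Board book"  # in ONIX this belongs in a form code, not a keyword
kept = drop_repeated_keywords(
    ["board book", "Goodnight Barn", "bedtime", "farm animals"],
    [title, form_description],
)
print(kept)
```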
Finally, choose keywords that can accurately represent the book's content. You should try to
avoid manipulating search results by referencing well-known authors or titles that are
irrelevant to the book. Not only can this make it harder for the reader to find the books
they're actually looking for, but it can also violate some retailers' terms of service if they
have rules against using misleading phrases.

Shuvanjan Karmaker: Hello, my name is Shuvanjan Karmaker, and I'm a product
coordinator for CataList and BiblioShare here at BookNet Canada. Today, I'll be talking to
you about HTML and XHTML and how to use them effectively in your ONIX. This is
written with a person doing the data entry in mind and is ignoring issues about inserting
HTML entries into an XML file. BNC is happy to talk about those, so get in touch.
In the meantime, here's how to keep the HTML simple, easy to do, and appropriate to
ONIX. What's wrong? Display issues in your book, contributor, and/or biographical
descriptions caused by rogue <div> tags. Where does the data show up?
The data shows up on CataList and other downstream ONIX recipients, including libraries
and retailers. HTML and XHTML should be contained within the tags designated for them,
but not wrapped in generic container tags like <div> or <span>, which are typically used to
apply CSS styling. These can cause display issues or worse in BNC products and for other
downstream recipients working with your ONIX.
What does the standard say? In order to add multiple paragraphs, or text formatted as bold or
italic, some ONIX data fields can contain markup: for example, a tag to add a paragraph or a
tag to make some text bold. These are HTML tags, used most commonly on webpages.
HTML defines a set of tags that can be used in webpages, and ONIX allows a subset of these
tags to be used within certain ONIX data fields.
What does it mean? HTML allows a wide range of tags to be used, but ONIX allows only some
of these tags to be used within its records. It is crucial to use only the HTML and XHTML tags
that are allowed. Number one, make sure text and tags are contained within paragraph tags.
Number two, never cut and paste from a word processor, PDF documents, or websites, and
make sure your characters are encoded in UTF-8. Number three, case matters: use lowercase
for XHTML tags.
How to fix it? Only use HTML tags where they're expected, in the ONIX fields/tags designated
for them. It is strongly recommended that ONIX data suppliers use the following tags: <p>
for paragraphs, <br/> for line breaks, <i> for italic, and <em> for emphasis (use <strong> or
<b> for bold). You can use the <cite> tag for book titles. You can use <ul>, <ol>, and <li> for
bulleted and numbered lists. You can use <sub> or <sup> for subscript and superscript. You can
use <dl>, <dt>, and <dd> for definition lists. You can use <ruby>, <rb>, <rp>, and <rt> for
simple glosses in Mandarin, Cantonese, Japanese, and other texts.
Any HTML and XHTML attributes, for example, style attributes, should be avoided. And it
should be emphasised that these recommendations apply to both HTML and XHTML. There
is a complete list of XHTML and HTML tags recommended, allowed, and disallowed within
the ONIX 3.0 Implementation and Best Practice Guide.
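A quick pre-flight check can catch the most common problems named above. The Python sketch below parses a description as XHTML (so unmatched tags surface as parse errors), flags tags outside a small allowed set (which also catches uppercase tags), flags any attributes, and flags content not wrapped in a block element such as <p>. The allowed set contains only the tags listed in this talk and is an assumption; the ONIX 3.0 Implementation and Best Practice Guide has the authoritative list. Named HTML entities such as &nbsp; will also be reported as errors, since plain XML parsing does not define them.

```python
import xml.etree.ElementTree as ET

# Tags listed in this talk (an assumption; extend it from the ONIX guide).
ALLOWED_TAGS = {"p", "br", "i", "em", "cite", "ul", "ol", "li", "sub", "sup",
                "dl", "dt", "dd", "ruby", "rb", "rp", "rt"}
BLOCK_TAGS = {"p", "ul", "ol", "dl"}


def check_xhtml(fragment: str) -> list[str]:
    """Return a list of problems found in an XHTML description fragment."""
    try:
        # Wrap the fragment so several top-level <p> elements still parse.
        root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    except ET.ParseError as err:
        return [f"not well-formed XHTML: {err}"]
    problems = []
    if root.text and root.text.strip():
        problems.append("text outside any block element")
    for child in root:
        if child.tag not in BLOCK_TAGS:
            problems.append(f"<{child.tag}> is not a block element such as <p>")
        if child.tail and child.tail.strip():
            problems.append("text outside any block element")
    for element in root.iter():
        if element is root:
            continue
        if element.tag not in ALLOWED_TAGS:
            problems.append(f"unexpected tag <{element.tag}>")
        if element.attrib:
            problems.append(f"attributes on <{element.tag}>: {sorted(element.attrib)}")
    return problems


print(check_xhtml("<p>A <em>moving</em> debut.</p>"))
print(check_xhtml('<div><p style="color:red">Hi</p></div>'))
```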
It is worth mentioning that errors often occur when people copy and paste text from PDF
documents and websites. The number one recommendation for fixing this would be to use
"Paste Special" to paste plain text only into your metadata system. Then add only the
markup you intend to send.
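Here is a minimal Python sketch of the "paste plain text, then add only the markup you intend" workflow. It turns pasted plain text into an XHTML <Text textformat="05"> element: blank lines become paragraph breaks, stray line breaks and runs of whitespace (including non-breaking spaces, which Python treats as whitespace) collapse to single spaces, and ElementTree escapes characters like & and < when serializing. The helper name is hypothetical, and any emphasis or other markup would be added deliberately afterwards.

```python
import re
import xml.etree.ElementTree as ET


def text_element_from_plain(plain: str) -> str:
    """Wrap pasted plain text in an ONIX <Text textformat="05"> (XHTML) element."""
    text = ET.Element("Text", textformat="05")  # 05 = XHTML
    # A blank line (possibly containing spaces) marks a paragraph break.
    for block in re.split(r"\n\s*\n", plain.strip()):
        # Collapse line breaks and runs of whitespace, including non-breaking spaces.
        paragraph = " ".join(block.split())
        if paragraph:
            # ElementTree escapes &, < and > when serializing.
            ET.SubElement(text, "p").text = paragraph
    return ET.tostring(text, encoding="unicode")


print(text_element_from_plain("First\nparagraph.\n\nSecond & last."))
```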
Using XHTML instead of HTML is the best method of all. XHTML uses textformat="05"
instead of "02", and you must ensure that all the markup tags are properly matched and
nested. In HTML, the </p> close paragraph tag is actually optional, but in XHTML, it is
mandatory. And XHTML tags must always be lowercase, whereas in HTML, uppercase tags
can be acceptable.
Please note, not all ONIX elements allow XHTML. If the text in an element is used by
retailers for indexing, keeping it free of XHTML or HTML allows that text to be parsed.
ONIX provides for formatted text through XHTML-enabled "display" elements like
<ContributorStatement> or <TitleStatement>. These are great if your book information needs
formatting support beyond the structured conventions. If not, keeping your fields free of HTML
or XHTML will allow retailer and library systems to index and parse them as intended.
Lauren Stewart: Hello, I'm Lauren Stewart, and I'm a standards junkie. Today marks 55
days since my last standards meeting, but in all seriousness, I've had the privilege of not only
supporting BookNet's customers as they engage with our products and services, including
how subject classification schemes like BISAC and Thema show up in the data sent and
received across the industry, but also having the good fortune to see behind the curtain as a
member of both BISG's subject committee, which administers BISAC, and the International
Thema Steering Committee.
I've been asked to speak about subjects and how they're being mishandled by data senders.
I'm also going to talk about some opportunities along the way and include some reminders as
to how data recipients should engage with the same bibliographic data.
A main subject is the gin in your martini, the clams in your linguine. Well, actually, those are
lyrics from a Simpsons episode, but you get the idea. Tagging a main subject in your
metadata adds that little extra spice to your bibliographic data that not only meets the best
practices for the North American market and beyond but removes opportunities for
ambiguity by telling data recipients exactly which subject is the most important one to
consider when classifying a book.
Take, for example, the following selection of subjects associated with a book with product
information that was sent to BookNet's BiblioShare service. This is a screenshot taken using
our Bibli-O-Matic Browser extension. And here are the Thema subjects for that same book.
This book was brought to BookNet's attention by a customer of many of our products and
services who submitted a bibliographic correction through our sales data service. The book in
question currently has 24 subjects provided across four subject classification schemes,
including keywords and one internal classification scheme. On this slide, I've listed all the
subjects together in the order they appear in the ONIX file. There's a lot here, so let me walk
you through it to illustrate some of the issues with how these subjects were supplied.
Here you can see the 12 Thema subjects included in the file. None of these have main subject
indicated. And here are the 10 subjects attributed to this book according to BISAC. As
observed with the Thema subjects, none of these have main subject indicated.
I won't share the title of the book, but upon review of the title across various internet sites,
I'm fairly confident that this is a parenting book that explores the subject through a mental
health and healing lens. I can also state confidently that no online description of the book
mentions abuse, child or otherwise, although three of the provided subject codes state this,
and it appears on several of the sites listed here, including as the subject of record for
BookNet's own sales data and library data.
So, here you can see the various places where this data is showing up for this book. On this
slide, I've tried to explain how different sites display subject
information when a main subject is not specified by a data sender. While some of this
information is based on our most recent understanding, it may already be outdated. What is
most challenging about this particular ISBN is how inconsistent the publisher-supplied
metadata is with what the different sites display, even compared with the list
displayed on the publisher's own website. If this is confusing and frustrating, that is precisely
our point.
When a publisher creates bibliographic data to send to downstream data recipients, the
publisher is distilling down all of their collected wisdom about that book across many
different departments and job functions. Quite simply, before publication, the publisher is the
expert on the book, and data recipients must rely on that collected wisdom to make a number
of decisions about the book in order to support its sale.
Up until the book is bought or reviewed by a retailer buyer, the retailer must trust the
provided data. In many situations, perhaps safe to say in most situations, data exchange
between trading partners goes off without a hitch. When a data sender does not fulfil their
obligations, in this case by not providing a main subject, the publisher has effectively handed
over control of an incredibly powerful piece of bibliographic data, one that influences a
number of pivotal links in the supply chain. On this slide, I have listed some of the practical
applications of providing a main subject in bibliographic data, as well as the implications
when a main subject is not specified by data senders.
In the previous slides, I've tried to appeal to your reason. Now I will appeal to your
instincts as honed in kindergarten. It's time to follow the rules. In case you're new here, the
book industry has, let's put some air quotes here, standards. Not "you can't sit with us"
standards, but standards in how we communicate information about the books in our supply
chain. Standards in the book industry were formed by people such as manufacturers, sellers,
buyers, customers, trade associations, users, and regulators, and they result from the distilled
wisdom of people who have expertise in their subject matter and who know the needs of the
organisations they represent.
Simply put, a standard is an agreed-upon way of doing something. We use a number of
standards in the book supply chain, including ONIX, as well as BISAC and Thema, which
are standards for subject classification. When selecting subject classifications for a book, the
publisher's goal is to tell data recipients, particularly retailers, unambiguously
which section of their stores a book might be placed in or where it should sell best. Main
subject is intended to do just that. A book may be about any number of topics, but the main
subject is the one topic that most accurately describes the content and where the book will find
the most readers. I've included here some screenshots from the BISG website, which has an
excellent FAQ on how to use main subjects for BISAC.

While my example book from earlier included 10 BISAC codes, the subject standard very
clearly states that data senders should provide no more than three codes, ordered by content
focus, and that choosing fewer than three is also a best practice when the codes chosen fully
describe the book. In fact, the ONIX best practice guide notes that there is diminishing value in
providing more and more codes. Two or three thoughtfully chosen, detailed, and relevant
subject categories are more useful to a retailer and to a potential purchaser than 12 or 13 that
are less relevant or overly broad.
Do not supply multiple, barely relevant classifications in an attempt to get the product listed
under as many headings as possible at a retailer or to place better on a so-called bestseller
list. For both BISAC and Thema, do not provide a broad classification where you also provide
a more specific code. Always use the most specific classification that is appropriate.
Finally, the ONIX best practice guide cautions that main subject is sometimes misused to
indicate a broad category where a detailed category is also supplied. For example, with the book
I shared earlier, BISAC code FAM034000, Family & Relationships / Parenting / General, was
provided in addition to two detailed subcategories under the same heading, namely FAM034020,
Family & Relationships / Parenting / Co-Parenting, and FAM033000, Family & Relationships /
Parenting / Parent & Adult Child. As this example shows, main does not mean broad. It should
be used to indicate the most relevant detailed category.
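A check for this pattern can be automated. The Python sketch below is a hypothetical illustration: it flags any "General" BISAC code supplied alongside a more specific code under the same heading. It compares headings (literals) rather than code prefixes, since codes under one heading need not share a prefix, and it uses the codes and headings as read out in the talk.

```python
def broad_codes_with_specific_siblings(subjects: dict[str, str]) -> list[str]:
    """Return "General" codes whose parent heading is also covered by a more
    specific code in the same list.

    `subjects` maps a BISAC code to its literal (heading text). Literals are
    compared instead of code prefixes, because codes under the same heading
    do not always share a prefix (FAM033000 and FAM034000 below, for example).
    """
    suffix = " / General"
    flagged = []
    for code, literal in subjects.items():
        if not literal.endswith(suffix):
            continue
        parent = literal[: -len(suffix)]  # e.g. "FAMILY & RELATIONSHIPS / Parenting"
        if any(other != code and other_literal.startswith(parent + " / ")
               for other, other_literal in subjects.items()):
            flagged.append(code)
    return flagged

# Codes and headings as read out in the talk.
example = {
    "FAM034000": "FAMILY & RELATIONSHIPS / Parenting / General",
    "FAM034020": "FAMILY & RELATIONSHIPS / Parenting / Co-Parenting",
    "FAM033000": "FAMILY & RELATIONSHIPS / Parenting / Parent & Adult Child",
}
print(broad_codes_with_specific_siblings(example))  # ['FAM034000']
```

A flagged code is a prompt for a human decision, not an automatic deletion: the sketch only knows headings, not the book.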
You'll find that the Thema subject standard is quite similar to BISAC where implementation
is concerned. Ensure the first subject category is the primary or main subject. Classify titles
as precisely as applicable or as broadly as required. Assign as many categories as are
required within reason. While BISAC discourages more than three codes, Thema tops out at
four plus qualifiers.
Recall my example book and its 12 Thema subjects: three times the recommended maximum, with
no qualifiers and no main subject indicated. How could a retailer be expected to know which
subject most accurately represents the book's content when there are so many subjects to choose
from and when they've been jumbled up and mixed in with other subject standards such as BISAC,
keywords, and internal subject classifications?
North American data exchange is outlined in the BookNet and BISG co-publication Best
Practices for Product Metadata Guide for North American Data Centres and Receivers. The
document confirms that granularity and specificity are paramount in assigning subject codes
and urges the use of the most specific code that is appropriate. Supplying a general subject
code on a given product that also has a more specific code in the same subject is bad
practice.
Although there is no limit in ONIX to the number of codes that can be supplied, a best
practice is to supply at least one BISAC code and up to three BISAC codes, if appropriate.
More than three codes should be reserved only for those cases where it is absolutely
necessary. One of those subjects should be considered the main subject of the product and
should be listed first. Generally, all subjects should be listed in order of importance.
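These guidelines translate into a simple pre-send check. The Python sketch below assumes a hypothetical SubjectEntry record per supplied code; the limits (three BISAC codes, four Thema categories) are the soft ceilings described in this session, and the warnings are advisory, since the guidance allows more codes where absolutely necessary.

```python
from dataclasses import dataclass

@dataclass
class SubjectEntry:
    scheme: str          # "BISAC" or "Thema" in this sketch
    code: str
    is_main: bool = False

# Soft ceilings from the talk: up to three BISAC codes and up to four
# Thema subject categories (Thema qualifiers are not counted).
MAX_CODES = {"BISAC": 3, "Thema": 4}

def check_scheme(entries: list[SubjectEntry], scheme: str) -> list[str]:
    """Return warnings for one scheme's subjects, given in supplied order."""
    subjects = [e for e in entries if e.scheme == scheme]
    if scheme == "Thema":
        # Thema qualifier codes start with a digit (1-6); categories start
        # with a letter, so only letter-initial codes are counted here.
        subjects = [e for e in subjects if not e.code[:1].isdigit()]
    if not subjects:
        return [f"{scheme}: no subject codes supplied"]
    warnings = []
    if len(subjects) > MAX_CODES[scheme]:
        warnings.append(f"{scheme}: {len(subjects)} codes supplied; "
                        f"consider no more than {MAX_CODES[scheme]}")
    mains = [e for e in subjects if e.is_main]
    if len(mains) != 1:
        warnings.append(f"{scheme}: expected one main subject, found {len(mains)}")
    elif not subjects[0].is_main:
        warnings.append(f"{scheme}: main subject {mains[0].code} is not listed first")
    return warnings

entries = [
    SubjectEntry("BISAC", "FAM033000"),
    SubjectEntry("BISAC", "FAM034020", is_main=True),
]
print(check_scheme(entries, "BISAC"))  # main subject is flagged but listed second
print(check_scheme(entries, "Thema"))  # no Thema codes at all
```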
EDItEUR's ONIX best practice guide agrees: if classification within a particular
scheme uses multiple categories or category codes, and therefore more than one
subject composite, one should be flagged as the primary category for that scheme with
the <MainSubject/> empty element. As a reminder, in ONIX 3 there is no dedicated main
subject composite as there is in ONIX 2.1.
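For reference, here is a minimal Python sketch that builds ONIX 3.0 <Subject> composites with the empty <MainSubject/> flag using xml.etree.ElementTree. The element names and the codelist 26 scheme values (10 for BISAC, 93 for Thema) come from ONIX 3.0; the function name, the choice of which code is main, and the Thema code are purely illustrative, and a real <DescriptiveDetail> carries other elements omitted here.

```python
import xml.etree.ElementTree as ET

# ONIX codelist 26 values: 10 = BISAC Subject Heading, 93 = Thema subject category.
BISAC, THEMA = "10", "93"

def subject_composite(scheme_id: str, code: str, main: bool = False) -> ET.Element:
    """Build one ONIX 3.0 <Subject> composite using reference tag names.

    When `main` is True, the empty <MainSubject/> flag is written as the
    composite's first child.
    """
    subject = ET.Element("Subject")
    if main:
        ET.SubElement(subject, "MainSubject")  # empty element, no text
    ET.SubElement(subject, "SubjectSchemeIdentifier").text = scheme_id
    ET.SubElement(subject, "SubjectCode").text = code
    return subject

# A fragment only: a real <DescriptiveDetail> carries other required elements.
detail = ET.Element("DescriptiveDetail")
for scheme_id, code, main in [
    (BISAC, "FAM034020", True),   # one main subject per scheme, listed first
    (BISAC, "FAM033000", False),
    (THEMA, "VFV", True),         # hypothetical Thema main subject
]:
    detail.append(subject_composite(scheme_id, code, main))
print(ET.tostring(detail, encoding="unicode"))
```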
Another thing to keep in mind is that a main subject is unique to trade book subject systems
like BISAC and Thema and is a documented requirement from retailers. It's needed to
ground the book's focus so that qualifiers and some subjects can be interpreted through that
lens: fiction or not, children's or not.
In library systems, by contrast, multiple subjects ordered by importance are needed to fully
reflect the content, including references that would be considered minor in a trade bookselling
environment. So, whichever market you want to support, the main subject serves
the same purpose: a necessity for retailers and an influence on discovery.
This one is an impossibly easy mistake to fix in your product metadata. All known major
bibliographic data systems provide support for the main subject tag in ONIX. We suspect
that the hindrance for many data centres is not a technological inability to provide this data
but human error, a lack of understanding of standards and best practices, and/or internal
policies that direct staff not to select a main subject from a list of provided ones.
If you are watching this presentation today, consider yourself a member of the flock. If you
work for a firm that routinely fails to provide main subjects in your bibliographic data, we
are enlisting you to evangelise for change within your organisation. Rather than have to deal
with the effects of an ISBN with bibliographic data distributed without a main subject on a
one-off basis, we recommend analysing your active ISBNs to ensure that all titles, frontlist
and backlist, include a main subject for both BISAC and Thema.
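As a starting point for that analysis, the Python sketch below scans an ONIX 3.0 message (reference tags, any namespace) and reports each product that lacks a flagged main subject for BISAC or Thema. It is a hypothetical illustration: it identifies products by ISBN-13 (ProductIDType 15) where present, falls back to RecordReference, and does not filter for active titles, which would depend on your own data.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ONIX codelist 26: 10 = BISAC Subject Heading, 93 = Thema subject category.
SCHEMES = {"10": "BISAC", "93": "Thema"}

def local(tag: str) -> str:
    """Drop any namespace, e.g. '{http://...}Product' -> 'Product'."""
    return tag.rsplit("}", 1)[-1]

def child_text(el: ET.Element, name: str) -> Optional[str]:
    """Text of the first direct child called `name`, or None if there is none.
    An empty element such as <MainSubject/> yields ''."""
    for child in el:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return None

def product_id(product: ET.Element) -> str:
    """Prefer the ISBN-13 (ProductIDType 15); fall back to RecordReference."""
    for pid in product:
        if local(pid.tag) == "ProductIdentifier" and child_text(pid, "ProductIDType") == "15":
            return child_text(pid, "IDValue") or "(empty ISBN)"
    return child_text(product, "RecordReference") or "(no reference)"

def products_missing_main_subject(onix_xml: str) -> dict:
    """Map each product to the schemes (BISAC, Thema) with no flagged main subject."""
    root = ET.fromstring(onix_xml)
    report = {}
    for product in (el for el in root.iter() if local(el.tag) == "Product"):
        has_main = set()
        for subject in (el for el in product.iter() if local(el.tag) == "Subject"):
            scheme = SCHEMES.get(child_text(subject, "SubjectSchemeIdentifier") or "")
            if scheme and child_text(subject, "MainSubject") is not None:
                has_main.add(scheme)
        missing = sorted(set(SCHEMES.values()) - has_main)
        if missing:
            report[product_id(product)] = missing
    return report
```

A product with no Thema subjects at all is reported too, since the aim here is a main subject in both schemes.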
And I will close by echoing what Stephanie contributed before me. Talk to your trading
partners, find out how they expect to receive information on subjects and do your best to
follow both the standards and their guidelines. Thank you.
Tom Richardson: Hi, my name is Tom Richardson. I'm BookNet Canada's bibliographic
manager, and I'm here to give a presentation on why metadata accuracy matters and applying
best practices.
So, "wrong" seldom describes metadata. It's more that mistakes in sending or
misinterpretation on loading are reflected in lost opportunities and sales. Our goal might be
best seen through the online experience of book buyers. I mean, book discovery should be
like following a thread that interests you, so long as the next option is of interest. Keep going
down a path, finding more options, seeing books you know and others you'd like to look at.
And at a certain point, when you think you've got a handle on your options, you reward
yourself by choosing the best for you today with a plan to come back. Online search should
be like a well-curated bookstore. What we generally deliver are tiring, joyless online
results that don't really lead anywhere nice: choices that don't make sense, and even the
mechanical stuff that should be easy, such as discovery through "by the same author," "next in
the series," and "on the same subject," isn't terribly satisfying.

An engaged reader cares and can see what's not there. If they're exploring a new subject, any
book that doesn't fit is a flag. Too many, and they give up. Search results should validate
the search choice. If they're not finding the fourth in the series, they're on their phone,
Googling the title in the hope that a crappy search can at least pull that up. And when that
happens, coming back is optional.
Now, the same thing happens between metadata senders and receivers. It is completely
bizarre that many online retailers refuse the simplest solution of displaying book contributors
in the universally supported numbered sequence provided by publishers. And it's equally
bizarre that senders can't seem to supply book title and series metadata coherently without
any manipulation. Author and title, the most basic bibliographic data, are met with screams
of "just list them in order," cascading against "just match the book."
That said, book content is one of the most diverse and complex products sold, and book
metadata is one of the best systems, bar none. My comments are made looking at aggregated
data and are not representative of any one business and surprisingly few Canadian ones. We
should be proud of what we've made of it. And if you don't believe me, try to find a part for
your bicycle or decide if a fridge is discontinued or not. Then, look again at online book
search. It's not so bad, eh?
Now, aggregated data is the key to understanding what a best practice is. Anytime your data
rubs up against another company's, there's friction. Minor differences create poor outcomes,
and bad choices create disastrous ones. In a recent meeting, a major data house
admitted that their inability to provide market-based data in a specific case was because they
didn't actually track markets but used currency codes as a proxy.
That means it didn't matter if the data was clearly marked "not for Canada": if it included a
CAD price, it was treated as available in Canada. We could have an argument about whether
it's the fault of the sender for supplying a price and currency to a market where it's not used,
or of the data house for not using market territories where they're supplied. And frankly,
often both are done badly. But my point is that if either of them had followed best practices
around market data, then we likely wouldn't have had the problem.
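To show why a currency proxy goes wrong, here is a simplified Python sketch. The ProductRecord fields are a hypothetical flattening of ONIX 3.0 sales rights (SalesRightsType plus the Territory elements CountriesIncluded, RegionsIncluded, and CountriesExcluded) and price currency codes; it assumes a single sales-rights statement per product and only interprets SalesRightsType 01 and 02 (for sale) and 03 (not for sale).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProductRecord:
    """Hypothetical, flattened view of one product's ONIX 3.0 market data."""
    sales_rights_type: str                                  # codelist 46, e.g. "01", "03"
    countries_included: set = field(default_factory=set)    # ISO 3166-1 codes
    regions_included: set = field(default_factory=set)      # e.g. {"WORLD"}
    countries_excluded: set = field(default_factory=set)
    price_currencies: set = field(default_factory=set)      # ISO 4217 codes

def territory_covers(p: ProductRecord, country: str) -> bool:
    """True if the sales-rights territory includes `country`."""
    if country in p.countries_excluded:
        return False
    return country in p.countries_included or "WORLD" in p.regions_included

def for_sale_in(p: ProductRecord, country: str) -> Optional[bool]:
    """Answer from sales rights: True/False, or None if this sketch can't tell."""
    if not territory_covers(p, country):
        return None
    if p.sales_rights_type in ("01", "02"):  # for sale (exclusive / non-exclusive)
        return True
    if p.sales_rights_type == "03":          # not for sale
        return False
    return None

def currency_proxy(p: ProductRecord, currency: str) -> bool:
    """The shortcut from the talk: 'has a price in this currency' = 'sold there'."""
    return currency in p.price_currencies

# Clearly marked "not for Canada", but the record still carries a CAD price.
record = ProductRecord("03", countries_included={"CA"}, price_currencies={"USD", "CAD"})
print(for_sale_in(record, "CA"))      # False
print(currency_proxy(record, "CAD"))  # True
```

Reading the sales rights gives the right answer for Canada; the currency shortcut does not.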
The right outcome would, well, certainly have had a better chance to muddle through the
morass. And that illustration involves two bad choices. Minor differences are more resilient. We
don't need perfect metadata. We just need it as good as it can be, as a best practice. It's in
your hands to supply it or use it with sufficient accuracy to minimise friction in an
aggregated book metadata situation. You can't control what other companies do, but if you
do this, you maximise the chance of meeting actual needs.
The simplest way to define any metadata best practice is this: senders map from a
database into a metadata distribution, receivers map that metadata into their own data set,
and each side relies on the other having used, or going to use, the same definition for the
value being mapped. The definition both use must come from the metadata standard.
For just a moment, think about aggregated data and friction. You cannot possibly argue this
point. Now, ONIX is a book communication standard built out of other standards, so some of
its rules exist to support an outside standard, starting with ISBNs, and an ONIX record is
designed to be compatible with its assigned ISBN. That makes ONIX hard to use for a quick
illustration, though I can at least say the obvious.
When you use an ONIX code to define a value, you need to use EDItEUR's definition of
the tag and the code, not your own idea of what the tag name represents to your company.
You map to EDItEUR's definition, not your own. But let's make a quick comparison of the
definitions used by two subject standards, BISAC and Thema, because weirdly their
published best practices are almost interchangeable, at least at a high level. That's probably
because they are used for the same end purpose, and metadata in a real sense is defined by
end-user needs. Even if a publisher asks for a code, they ask in the hope it will be used for
the greater gain of both the publisher and retailer. It's a request for use, and metadata exists to
be used. You could say that best practices maximise usability, but I think my simple
definition already covers that, and I'm off on a tangent. Let's go back to Thema and BISAC.
Thema embeds its notes in its codelists. There are six golden rules and copious white papers on
topics like diversity, but beyond remembering that the notes for the major heading your target
code sits under may clarify how that whole branch is used, every effort is made in each code's
own notes to describe its use, offer similar alternatives, and suggest expected qualifiers that
might flavour your subject decision.
The online browser includes the notes and is packed with synonyms so that market-specific
terms and expectations will lead to the correct code, whose description will confirm
compatibility with the overall need. BISAC appears to come up rather short in comparison.
Thema's browser is pretty fancy stuff compared to BISG's website, where you can find the
subject codes. But if you look at the top of every section, there are guidelines for each
major subject heading, and the website is well-supported by a detailed frequently asked
questions page. And a recent reorganisation has left it very clean.
That said, literals, BISAC's term for its descriptions, are expected to be self-explanatory. That's
a big difference, but supportable because BISAC works in a limited region, the North
American book market. And while codes may support more than English language books,
BISAC's literals are only available in English. Thema is set up in a more neutral way,
supporting book concepts that are easily translated into more than 20 languages, leaving it
with the formalism in its descriptions that can seem stilted in comparison to chatty BISAC.
One is not better than the other, it's just different.
And the reality is that most metadata creation for either system in our market is done using
subject descriptions viewed in drop-down menus, without any of either system's supporting
detail. North American metadata is like that because BISAC was actually designed with that in
mind, and it's what the North American industry is used to. It's worth noting that BISAC has
grown massively over the past decade. It's now way, way beyond drop-down menu simplicity,
and Thema was never designed for drop-downs, which is exactly why it has such a
fancy online browser.
But my point, and the reason for the long rundown on this slide, is that it would be a best
practice for subjects if every industry creator or user made sure that their staff workflows
included easy access to the supporting materials and ensured that
their staff can use them. If that access isn't being provided, you should flag it as a
development goal for your company. Not relying on your system's drop-down menus for subject
selection is one of the lowest-hanging fruits you can find, and cheap to boot. Otherwise, you're
not giving your staff the opportunity to get the most value from the subject system. You're
making less money than you could. You're not giving the retailers the options they need.
Conversely, of course, a retailer without easy access to the rules can reliably be assumed to be
getting less value than the publishers deliver.
Applying best practices is all about making more money from scarce resources, and it
requires your metadata to be delivered with meaning and for it to be used with meaning for a
purpose. Map your meaning to a carrier and map your needs from the carrier with a
workflow that ensures every component is used as thoughtfully as possible. The needs of
every standard in your metadata exchange, assignment, and support should be considered.
And done that way, metadata exchange is resilient.
Subjects are a good example, so I'm just going to close on Thema's six golden rules. They're not
really here to be read. They work well for BISAC if you allow that BISAC's codes often
come with an equivalent to a Thema qualifier embedded in them, and allow that BISAC is a less
formal system and typically uses fewer subject codes. These rules work well for both because
they define a good book subject, and both systems are designed to support meaningful
information for an end user. Their goals are the same. Unsurprisingly, the rules are similar.
Standards actually want to play together well.
And I guess that's my closing point. If you're fighting with them or unsure, it's a good reason
to ask BookNet Canada a question. I am also required to say, anytime subjects come up, that
BISAC has an equivalent to Thema's qualifier list: its Merchandising Themes and Regional
Themes codelists. Multiple retailers have confirmed that they do indeed want that data. And
within their more limited focus, they are just as needed as qualifiers are to Thema. Use both,
use all, use well.
Nataly: Welcome back. Thank you for watching this session and staying with us until the
end. We have compiled a list of resources to help you explore this topic further. The slides,
transcript, and a link to the video will be available on the Tech Forum website for easy
access. Feel free to reach out with any questions about metadata and standards. We are here
to help you succeed.
For more professional development content, visit our website at bnctechforum.ca to explore
upcoming events and access recordings of past sessions. Lastly, we'd like to thank the
Department of Canadian Heritage for their support through the Canada Book Fund. Bye-bye.
