Changing Data
Implementing Primo for the Tri-Universities Group (TUG)
Presentation at ELUNA
May, 2009
Alison Hitchens
Cataloguing & Metadata Librarian
Outline
 Background
 Loading data into Primo
 Normalization
 Testing
Where are we?
 Formed in 1995
 Shared resources and collaboration including:
 Shared storage facility
 Shared integrated library system (ILS)
 Reciprocal borrowing
 Document delivery
 Statistics portal
 Shared databases
 Collaborative functional committees
 Shared ILS and Catalogue
 TRELLIS (Voyager)
 No significant changes to interface in 10 years
 Search is limited to catalogue data
Changing Data: Implementing Primo for the Tri University Group of Libraries (2009)
Primo Advantages
 One place to search
 Potential to include a variety of datasets:
 Library catalogue (currently loaded into Primo)
 Articles
 GIS information
 Our Ontario image bank
 Local repositories
 Deep search
 User-friendly interface
 XML compliant
 Avoids duplication of search results
 Groups together different editions of the same work (FRBR)
 Interoperability with existing tools
The Primo Team
 Team created in late January 2008
 Training held in late March 2008
 Primo Alpha launched to staff in July 2008
 Primo Beta launched to TUG community in November 2008
 Goal: make Primo the primary search tool in late May 2009
Phase Two
 Usability testing
 Naming & branding
 New data sources
 New books list
 Fine-tuning functionality
 Deep search
Loading Data Into Primo
 MARC → (Extract) → MARC XML → (Normalization) → PNX (Primo Normalized XML) → Deduplication, FRBR, Didumean, Indexing → Front End (user interface)
PNX: Display area
PNX: Search area
PNX: Deduplication area
Deduplication
Normalization: what is it?
 Massaging data
 Rules that tell the program how to get from MARC XML to Primo Normalized XML (PNX)
 Filter that distributes the incoming data and places it in different sections
 What MARC tags hold the title
 What MARC codes show the format
 What data should be included in searches
 What data should be available for display
 Transformation rules
 How that data should be formatted (dates, punctuation, capitalization, etc.)
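The filtering and transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule table, the function names and the sample record are hypothetical, and the rule format is not Primo's actual normalization rule syntax. Only the MARC XML namespace and the idea of separate display and search sections come from the standard and the slides.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical sample record in MARC XML.
SAMPLE = f"""<record xmlns="{MARC_NS}">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Hamlet /</subfield>
    <subfield code="c">William Shakespeare.</subfield>
  </datafield>
  <datafield tag="505" ind1="0" ind2=" ">
    <subfield code="a">Act I -- Act II.</subfield>
  </datafield>
</record>"""

# Hypothetical rule table: (MARC tag, subfield code, PNX section, PNX field).
RULES = [
    ("245", "a", "display", "title"),
    ("245", "a", "search", "title"),
    ("505", "a", "search", "title"),  # contents note also feeds title search
]


def clean(value):
    """Transformation rule: trim whitespace and trailing punctuation."""
    return value.strip().rstrip(" /:;,.").strip()


def normalize(marcxml):
    """Distribute MARC XML subfields into PNX-like sections."""
    record = ET.fromstring(marcxml)
    pnx = {"display": {}, "search": {}}
    for tag, code, section, field in RULES:
        path = (
            f"{{{MARC_NS}}}datafield[@tag='{tag}']/"
            f"{{{MARC_NS}}}subfield[@code='{code}']"
        )
        for sub in record.findall(path):
            pnx[section].setdefault(field, []).append(clean(sub.text or ""))
    return pnx
```

Calling `normalize(SAMPLE)` places the cleaned 245 $a in both sections and the 505 $a in the search section only, which mirrors the split between what is searched and what is displayed.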
Normalization: what is it?
 Customization
 Fixing “bad” data
 Complex changes
 Consortial issues
 Lessons learned the hard way
Customization
 Search fields
 Created call number search
 Augmented title search with contents note (505 tag)
 Display fields
 Added subject tag used for slide collection subjects (654 tag)
 Added explanatory text in front of analytical titles
 FRBR
 Excluded “selections”
 Facets
 Used location names as collection facets
Adding contents note to title search
Fixing “bad” data
 Old records lacking proper indicators
 Main author (100 tag) with invalid indicators (1st indicator blank or |)
 Old records lacking subfield coding
 Uniform title (240 tag) missing subfields ($k)
 ISBNs with hyphens
 008 with invalid data in first 6 characters
 Blanks or letters instead of record creation date
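Two of these checks are simple enough to sketch in Python. The function names are hypothetical and this is not how the rules are written in Primo. The set of valid first indicators for the 100 field (0, 1, 3) is taken from MARC 21.

```python
def normalize_isbn(raw):
    """Remove hyphens so that hyphenated and plain ISBNs compare equal."""
    return raw.replace("-", "").strip().upper()


def has_valid_first_indicator(ind1):
    """MARC 21 defines 0, 1 and 3 as first indicators for the 100 field."""
    return ind1 in {"0", "1", "3"}
```

A record whose 100 field fails `has_valid_first_indicator` (blank or `|`) is the kind of old record flagged above.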
008 Workaround
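The rule used for the workaround is not reproduced in this text. As a rough sketch, the invalid case can be detected by checking that positions 00-05 of the 008 hold a real yymmdd date. The function name is hypothetical, and what to substitute when the check fails is left to the caller.

```python
from datetime import datetime


def date_entered(field_008):
    """Return positions 00-05 of the 008 as a date, or None if invalid."""
    head = field_008[:6]
    # Blanks or letters in the first six characters are rejected outright.
    if len(head) != 6 or not head.isdigit():
        return None
    try:
        return datetime.strptime(head, "%y%m%d").date()
    except ValueError:
        # Six digits that do not form a calendar date, e.g. month 13.
        return None
```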
Complex changes
 Tweaking delivery of online journals
 Delivery using SFX
 Exclude serials that no longer have an online holding but whose record is still coded as online
 Exclude government serials
 Exclude public microdata files
 Exclude databases (integrating resources)
Tweaking delivery of online journals
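The exclusions listed under "Complex changes" amount to a decision function. A minimal Python sketch follows. The `Serial` class and all of its field names are hypothetical stand-ins for whatever the MARC record actually encodes, and the real rules were written in Primo's normalization rule editor, not in Python.

```python
from dataclasses import dataclass


@dataclass
class Serial:
    coded_online: bool                     # record is coded as online
    has_online_holding: bool               # an online holding still exists
    is_government: bool = False            # government serial
    is_public_microdata: bool = False      # public microdata file
    is_integrating_resource: bool = False  # database


def use_sfx_delivery(rec):
    """Return True only for online serials that pass every exclusion."""
    if not (rec.coded_online and rec.has_online_holding):
        return False
    return not (
        rec.is_government
        or rec.is_public_microdata
        or rec.is_integrating_resource
    )
```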
Consortial issues
 Restricting online resources to individual institutions
 Which URL should be presented?
 Should restrictions be presented?
 Coping with shared locations
 e.g. GWINTER = Internet resource shared by Guelph and Waterloo but not Wilfrid Laurier
 Instead of 2 separate locations UGINTER and UWINTER
 Creating search scopes for colleges and campuses
 e.g. ability to limit search to Architecture materials
Restricting Online Resources
 Problem 1:
 Which link belongs to which institution?
 Otherwise will simply present the first URL in the record
 Need to add $$I based on ownership
 Location code isn’t extracted with the 856
 Problem 2:
 Restricting each link to its institution
 Otherwise will give the “Online access” message to users who do not have access
 Need to add restricted delivery scope
Specify Institution for Online Resources
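Adding $$I based on ownership can be pictured as a lookup from location code to owning institution. The Python sketch below is an assumption-heavy illustration. The prefixes UG, UW and GW are inferred from the location codes UGINTER, UWINTER and GWINTER, the institution code WATERLOO is an assumed counterpart to GUELPH, and the output string format is illustrative only, not the actual PNX link syntax.

```python
# Prefix-to-institution table (inferred and partly assumed, see above).
OWNERS = {
    "UG": ["GUELPH"],
    "UW": ["WATERLOO"],
    "GW": ["GUELPH", "WATERLOO"],  # shared location
}


def institutions_for(location_code):
    """Return the institutions that own a location, or [] if unknown."""
    return OWNERS.get(location_code[:2], [])


def tag_links(url, location_code):
    """Emit one value per owning institution, each with its own $$I."""
    return [f"{url} $$I {inst}" for inst in institutions_for(location_code)]
```

A shared location such as GWINTER yields two values, one per institution, and so needs one written value for each.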
Restricting Online Resources
Limiting searches to a location
Lessons learned the hard way
 If checking that a tag exists, need to also include subfields
 If writing more than one value, need more than one rule
Result: $$I GUELPH
Match current, match any?
Problem: includes title from 245, author from 100, publisher from 260 etc.
 Match any: if any of the 880 tags have $6 505, then copy the 880 tag as is. This means that if any one of the 880 tags meets this requirement, all of the 880 tags are copied.
 Match current: analyse the 880 tags one at a time and copy only the ones that meet the condition.
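The difference between the two modes is easiest to see side by side. The Python sketch below models each 880 field as a dictionary of subfields. The function names and sample values are hypothetical and the code only imitates the behaviour described above, not Primo's implementation.

```python
def copy_match_any(fields, condition):
    """Copy every field if at least one field satisfies the condition."""
    return list(fields) if any(condition(f) for f in fields) else []


def copy_match_current(fields, condition):
    """Test fields one at a time and copy only those that satisfy it."""
    return [f for f in fields if condition(f)]


def links_to_505(field):
    """True when subfield $6 points at a 505 (contents note)."""
    return field.get("6", "").startswith("505")


# Hypothetical 880 fields linked to the 245, 100 and 505 tags.
FIELDS_880 = [
    {"6": "245-01", "a": "title in original script"},
    {"6": "100-02", "a": "author in original script"},
    {"6": "505-03", "a": "contents in original script"},
]
```

With these three fields, `copy_match_any(FIELDS_880, links_to_505)` returns all three, which is the problem described above, while `copy_match_current(FIELDS_880, links_to_505)` returns only the field linked to the 505.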
Testing
 Staging database
 What am I testing?
 Testing sample records
 Testing the process
 Random testing
Test specific changes
 Changes to normalization rules
 Changes to front end display
 Changes to tables (e.g. new location codes)
 New release enhancements/bug fixes
Look for:
 What you were expecting
 Note any surprises!
Staging database
 Holds 200,000 records
 Random sample of our collection
 100 titles from each location code
 Random sample proportionate to records held by each institution
 Combination of old pre-TRELLIS records and newly created records
 Shakespeare call number range to test grouping of editions (FRBR)
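One piece of this sample, the 100 titles from each location code, could be drawn as sketched below. This is a Python illustration under assumptions. The input is taken to be a list of (record id, location code) pairs, and the function name and parameters are hypothetical. How the staging set was actually extracted is not described here.

```python
import random
from collections import defaultdict


def sample_per_location(records, per_location=100, seed=None):
    """Pick up to `per_location` record ids at random from each location."""
    rng = random.Random(seed)
    by_location = defaultdict(list)
    for record_id, location in records:
        by_location[location].append(record_id)
    sample = []
    for location in sorted(by_location):
        ids = by_location[location]
        # Small locations contribute everything they have.
        sample.extend(rng.sample(ids, min(per_location, len(ids))))
    return sample
```

Passing a fixed `seed` makes the sample repeatable, which helps when the same staging set has to be rebuilt after a rule change.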
Test records
 100 records for repeated testing in front end
 Brief records (acquisitions, e-reserves, CODOC)
 Different formats (micro, music, video, electronic)
 Things to test holdings info (accompanying material, multi-volume, multiple items, multiple locations)
 Foreign language materials
 Duplicates
 Editions
 10 records
 For immediate test of normalization rules in back office
Test specific functionality
 Fulfillment cycle
 When the user finds the item that he wants, can he actually get the item based on:
 The information presented in the results screen
 The information presented in the full display
 The linking provided
 The information presented in the holdings display
Testing: online resources
 What am I testing? (what do I want to happen)
 Is online availability showing correctly in relation to the user?
 Online access
 Online access is restricted
 Physical resource
 Does user receive a relevant link?
 SFX delivery
 Direct to resource
 Link appropriate to institution
 Can user view alternate/multiple links in the full record display?
Testing: Online resources
 Variables for sample records
 E-journals, e-books, e-data, databases
 SFX delivery, online delivery
 Multi-volume sets
 Restricted to one institution
 Different institutions, different providers
 Online for one institution, physical for another
Testing: online resources
 Instructions for testers:
 Each tester should check the list of records and verify that there is an online link
 For each record test that the link label is correct:
 Available online: check against TRELLIS to verify that your institution has an online holding
 Online access is restricted: check against TRELLIS to verify that your institution does NOT have an online holding
 Link takes you to the correct place
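The label check in these instructions is a small decision rule, sketched here in Python. The two label strings come from the instructions above. The function names and the idea of representing TRELLIS holdings as a set of institution names are assumptions made for the example.

```python
def expected_label(tester_institution, online_holdings):
    """Label a tester should see, given who holds the item online."""
    if tester_institution in online_holdings:
        return "Available online"
    return "Online access is restricted"


def label_is_correct(displayed_label, tester_institution, online_holdings):
    """Compare the label shown in Primo with the one TRELLIS implies."""
    return displayed_label == expected_label(tester_institution, online_holdings)
```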
Testing: online resources
 Random sampling
 Each tester should also do a search on a subject of their choice and verify links using the first page of results
Testing environment
 Test in all views
 Waterloo, Laurier, Guelph
 Test in different IP ranges
 Test off campus
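Views and network locations combine into a small test matrix. The Python sketch below enumerates it. Treating each institution's IP range plus off campus as the set of network locations is an assumption, and the names are hypothetical.

```python
from itertools import product

VIEWS = ["Waterloo", "Laurier", "Guelph"]
# Assumed network locations: one on-campus IP range per institution,
# plus off campus.
NETWORKS = ["Waterloo IP range", "Laurier IP range", "Guelph IP range", "off campus"]


def test_matrix(views=VIEWS, networks=NETWORKS):
    """Return every (view, network) pair a tester should cover."""
    return list(product(views, networks))
```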
General testing
Testing: feedback from users
 “Overall, I am VERY impressed with Primo. It is far more functional in many ways.”
 “When I find an online journal, the Click here for access link does not work”
 “I wonder if the alerts by email are working?”
 “Let me login using my UWDIR id”
Changing Data
 Thank you!
Alison Hitchens
Cataloguing & Metadata Librarian
University of Waterloo Library
ahitchen@library.uwaterloo.ca
http://www.lib.uwaterloo.ca
