Linked Data: Uses and Users
Gretchen Gueguen
Data Services Coordinator
Digital Public Library of America
User Study Meta Analysis
The Metadata Is the Interface:
Better Description for Better Discovery of
Archives and Special
Collections, Synthesized from User Studies
Jennifer Schaffner, 2009. Report produced by OCLC
Research. Published online at:
http://www.oclc.org/programs/publications/reports/2009-06.pdf
The Metadata IS the Interface
1. You can’t really do anything if the
data doesn’t support it. *
The Metadata IS the Interface
2. Users want to search on their own (not
with the help of librarians)
3. GOOGLE
The Metadata IS the Interface
4. Content is more important than
format
– Subject and named entity search
(though unstructured)
– Known-item searching is rare
– Ambiguity between about-ness and of-ness
The Metadata IS the Interface
5. Comprehensiveness is assumed
The Metadata IS the Interface
6. Users will scan and scroll if motivated
Linked Data to the Rescue
• Integration of resources from multiple
sources
• Better disambiguation
• Use of named entities
• Graph-based relevance
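The bullets above are easier to picture with a concrete example. Below is a minimal sketch, in Python, of the named-entity and disambiguation idea: ask a public SPARQL endpoint (Wikidata here, purely as an example) for every entity carrying a given English label, plus a short description a user or interface could use to tell the candidates apart. The endpoint and library are real; the query shape, the function name, and the choice of Wikidata over VIAF or DBpedia are illustrative assumptions, not something prescribed by the slides.

```python
# Minimal sketch: disambiguate a personal name against Wikidata's SPARQL endpoint.
# Assumes the SPARQLWrapper package (pip install sparqlwrapper); the query shape
# and the choice of Wikidata are illustrative, not from the original slides.
from SPARQLWrapper import SPARQLWrapper, JSON

def candidates_for_label(label: str, limit: int = 5):
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery(f"""
        SELECT ?entity ?description WHERE {{
          ?entity rdfs:label "{label}"@en .
          OPTIONAL {{ ?entity schema:description ?description .
                      FILTER(LANG(?description) = "en") }}
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # Each candidate URI comes back with a short description that helps tell,
    # say, resources about Cervantes from resources by Cervantes.
    return [(row["entity"]["value"], row.get("description", {}).get("value", ""))
            for row in results["results"]["bindings"]]

if __name__ == "__main__":
    for uri, desc in candidates_for_label("Miguel de Cervantes"):
        print(uri, "-", desc)
```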
Define the Use Cases
One of the key difficulties in creating LD and
making it available is in defining the use cases
that make sense and will have value to the
community.
– Erik Mitchell, Library Technology Reports 2016
Google Knowledge Graph
• Schema.org / JSON-LD
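As a rough illustration of the Schema.org / JSON-LD point, here is a minimal sketch of what such markup might look like for a digital collection item (the photograph credited at the end of this deck), written as a Python dict and serialized to JSON-LD. The specific types and properties are common Schema.org terms chosen as assumptions, not a profile prescribed by DPLA or Google, and the item URL is hypothetical.

```python
# Minimal sketch: Schema.org markup as JSON-LD for a digital collection item.
# Types/properties here are illustrative assumptions; in practice the JSON-LD
# would be embedded in the item page in a <script type="application/ld+json">
# element so crawlers can pick it up.
import json

item = {
    "@context": "https://schema.org",
    "@type": "Photograph",
    "name": "Matthew Roberts in His Army Uniform",
    "provider": {
        "@type": "Organization",
        "name": "Tarrant County College NE, Heritage Room",
    },
    "isPartOf": {
        "@type": "Collection",
        "name": "The Portal to Texas History",
    },
    "url": "https://example.org/items/matthew-roberts",  # hypothetical item URL
}

print(json.dumps(item, indent=2))
```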
Creating Linked Data for Users
• The potential is there…but is the data?
• Specific linked data implementation
strategies and opportunities
– Embedding URIs
– Reconciling and enhancing with
matched URIs
– Schema.org
– Creating entities when appropriate
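A hedged sketch of the "reconciling and enhancing with matched URIs" bullet above: take a string value from a record, look it up against an authority, and keep the matched URI alongside the original string. Wikidata's search API is used only as an example source; the record field names and the first-hit matching are assumptions rather than DPLA's actual pipeline.

```python
# Minimal sketch: reconcile a free-text name against an authority and keep the
# matched URI next to the original string. Uses Wikidata's wbsearchentities API
# as an example source; field names and match handling are illustrative only.
import requests

def reconcile_name(name: str):
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": "en",
            "format": "json",
            "limit": 1,
        },
        timeout=10,
    )
    resp.raise_for_status()
    hits = resp.json().get("search", [])
    # Return the concept URI of the top hit, or None if nothing matched;
    # a real pipeline would score candidates before trusting a match.
    return hits[0]["concepturi"] if hits else None

record = {"creator": "Miguel de Cervantes"}    # original string value
uri = reconcile_name(record["creator"])
if uri:
    record["creator_exactMatch"] = uri         # keep the string AND the URI
print(record)
```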
The DPLA experience
• Created metadata model suitable for
Linked Data
• Developed a metadata ingestion
system using Linked Data Platform
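To make "a metadata model suitable for Linked Data" a little more concrete: DPLA's application profile builds on the Europeana Data Model, where an item (edm:ProvidedCHO) is described through an ore:Aggregation. The rdflib sketch below follows that general shape; the particular properties and the example URIs are illustrative assumptions rather than the published profile.

```python
# Minimal sketch in the spirit of the DPLA Metadata Application Profile (which
# builds on the Europeana Data Model): an item (edm:ProvidedCHO) described via
# an ore:Aggregation. Properties and URIs are illustrative, not the published
# profile.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")
ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("edm", EDM)
g.bind("ore", ORE)
g.bind("dcterms", DCTERMS)

item = URIRef("https://example.org/item/12345")          # hypothetical item URI
agg = URIRef("https://example.org/aggregation/12345")    # hypothetical aggregation URI

g.add((item, RDF.type, EDM.ProvidedCHO))
g.add((item, DCTERMS.title, Literal("Matthew Roberts in His Army Uniform")))

g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, EDM.aggregatedCHO, item))
g.add((agg, EDM.dataProvider, Literal("Tarrant County College NE, Heritage Room")))
g.add((agg, EDM.provider, Literal("The Portal to Texas History")))

print(g.serialize(format="turtle"))
```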
Learning Opportunity
• Significant performance issues:
– 90 hours to do a simple mapping of 500K records
• Lesson: create LD at the end of the
ingestion process
The fact that LAM institutions are still having to
select triplestores, SPARQL engines, indexing
platforms, and other services means that
there is still a relatively high bar for institutions
to cross in taking up LD projects.
– Mitchell, 2016
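One way to read the "create LD at the end of the ingestion process" lesson, sketched under stated assumptions: do the heavy record mapping with ordinary Python data structures (the "more traditional methods, Python scripts" mentioned in the speaker notes) and only serialize to RDF in a single pass at the end, instead of writing each record into a triplestore as it is mapped. The function names, fields, and URI pattern below are hypothetical.

```python
# Minimal sketch of the lesson above: map harvested records with ordinary
# Python first, then turn the mapped dicts into RDF in a single final pass.
# Function names, fields, and URIs are hypothetical illustrations.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")

def map_record(raw: dict) -> dict:
    # Cheap, traditional mapping step: no triplestore writes, no HTTP round trips.
    return {
        "id": raw["identifier"],
        "title": raw.get("title", "").strip(),
    }

def to_rdf(mapped_records: list) -> Graph:
    # Single serialization pass at the end of ingestion.
    g = Graph()
    g.bind("edm", EDM)
    g.bind("dcterms", DCTERMS)
    for rec in mapped_records:
        item = URIRef(f"https://example.org/item/{rec['id']}")  # hypothetical URI pattern
        g.add((item, RDF.type, EDM.ProvidedCHO))
        g.add((item, DCTERMS.title, Literal(rec["title"])))
    return g

if __name__ == "__main__":
    harvested = [{"identifier": "12345", "title": "  Sample record  "}]
    graph = to_rdf([map_record(r) for r in harvested])
    print(graph.serialize(format="turtle"))
```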
My $.02?
• Invest in creating good data
• Be adaptable, because the building blocks
of the technologies will change
• Keep users in mind when selecting priorities;
create and publish your use cases
• Make subtle (or bigger) changes where and
when you can
• Test and prototype
THANKS!!
Gretchen Gueguen
Data Services Coordinator
gretchen@dp.la
Matthew Roberts in His Army Uniform
Tarrant County College NE, Heritage
Room via
the Portal to Texas History

Editor's Notes

  • #2: Hi, I’m Gretchen, Data Services Coordinator at the Digital Public Library of America. My role is to oversee metadata mapping and quality control. I work with our partner institutions all over the United States to help them prepare their data for harvest by DPLA. This is going to be a bit of a counterpoint to our keynote this morning. I would like to ask you to think of the unskilled users of linked data, not the developers. While I agree that the developers need to be able to use the data to make stuff, without a broad base of people able to use it, our future viability is limited. This afternoon I’m not actually going to talk about DPLA; I’ve been asked to talk about users of linked data: what are the practical benefits to them?
  • #3: I thought quite a lot about how to approach this question. I’ve been asked a number of times to talk about how DPLA is using and plans to use linked data, but ironically, I’m not usually asked WHY… or at least not asked to talk explicitly about how this would be of benefit to our audience. So I thought I would start with User Studies in library research (this is an image of the front of the library at the university where I got my degree). These are usually research studies into how people seek and use information. Librarians have published hundreds of studies on how users behave when seeking information. I thought that we could start by articulating how people do research online and what they need to be successful. So what do people do with libraries?
  • #4: Okay, maybe not that exactly.
  • #5: I thought I would start with a 2009 study published by OCLC Research (which is a big library services corporation) that specifically synthesized the results of nearly 100 such studies on users’ information-seeking behavior when using online resources. I’m pretty familiar with this paper; I assign it to my own students in their digital library course. The report was titled “The Metadata IS the Interface” (which is a great title). The findings of this study show a couple of different things:
  • #6: First: an acknowledgment that you can’t really do anything without the data to support what you want to do — We’ll come back to this later, but for now, let’s just use that as foreground.
  • #7: Next, they found that users really don’t want to have to go through library staff or get help really at all. They want to work on their own to figure it out. That might seem obvious, but it’s worth mentioning because of the implications: users probably won’t read directions or use help screens unless they encounter a problem, and they generally expect things to be really intuitive.
  • #8: Third, users really rely on generalized, Google-style search, at least to start. I’m sure you are astounded by this revelation, but the takeaway is that they do this because: they know it, it appears simple, it offers relevance ranking, and it’s a single jumping-off point.
  • #9: The research showed that people tended to search for subjects and named entities (people and places, for example). They didn’t search for types of content and only did known-item searching (looking for a specific title) within domains they were really familiar with. While users want to find subjects, they generally search using keyword techniques, rather than by using structured terminology. This means that they could end up being confused when terms showed up for both what something was about vs. what it is. Think of the situation of searching for research about an author, vs. works by that author. One of those is a subject and the other is not.
  • #10: Comprehensiveness is assumed by users. Here’s a quote from one of the studies: A student in the Maryland study expected that “the universe of primary sources is a finite, absolute body of material that can and has been already labeled and categorized for him.” Yikes. All kinds of implications for interfaces, but also possibly a strong indicator of where linked data could be beneficial (as in to link in more resources)
  • #11: Finally, users will scan and scroll through long lists, but really only when they are motivated. There is contradictory research that notes that a lot of users complain about the proverbial “wall of text”, but the report says “users mostly care more about what is in the collections.” A wall of text isn’t a wall when it’s what you are looking for.
  • #12: In some ways, this is good news. Linked data is a really good way to meet a lot of these needs. For example, linked data can aid in the integration of resources from multiple sources and helps with disambiguation and the identification of named entities. In addition, using the graph can help with relevance, which is a big part of those assumptions users have about comprehensiveness and using Google. And this is just a few ideas really; the sky is kind of the limit, especially when it comes to integration. However, building interfaces that really exploit these things seems to be in the very early stages. I spent some time trying to find really killer apps using linked data to prep for this talk. What I found was mostly like this:
  • #13: It’s Linked Data!
  • #14: But maybe it isn’t the most user-friendly resource. My guiding principle for trying to find things that I thought were examples of linked data really benefiting users was “would my Mom think this was cool? Would she be able to use it?” Furthermore, I was recently involved in a large number of interviews with managers of repositories for the planning of a new repository product based on the Hydra technology stack. We asked them all how important linked data was to them, and virtually no one cared. Most said what they had heard about it seemed cool, but they hadn’t seen anything that made it seem really useful or necessary for digital libraries.
  • #15: I liked this quote I read in a recent issue of a publication called Library Technology Reports: “One of the key difficulties in creating LD and making it available is in defining the use cases that make sense and will have value to the community.” Mitchell goes on to say in addition: “Publishing data in some serialization of RDF is not especially useful or interesting if it does not capitalize on links to other datasets or provide new opportunities for computational analysis of data.” A lot of our projects at this point really only offer an API, which doesn’t exactly make for an intuitive, Google-like search experience. I think a big reason for that is that we are still in the process of creating the data and integrating it with other data sources. This is not to say that there hasn’t been any progress. What I’ve found is that there are actually a lot of small and subtle enhancements to traditional interfaces, and I think the best ones answer some of those needs that we just learned about from our user studies. For example, at DPLA we have built two features that utilize linked data to enhance the search tools we already have.
  • #16: The first is in creating our map-based search interface.
  • #17: This is a look at a record for an item with some of our enhanced geographic data. The data itself is only visible in the direct view of the record through the API, but you can see that there are geographic coordinates there. We take string data supplied by our partners and try to match it against various sources. We then enhance those records with linked data URIs as well as coordinates taken from those endpoints, which in turn drive that map browse. This is a novel kind of interface and, I believe, helps users with relevance determination. (A rough sketch of this kind of geographic enrichment appears after these notes.)
  • #18: Additionally, we are working now on integrating data from rightsstatements.org into our item records. This will allow some more robust data to display about rights status in these records while only needing to store the URI, but to be honest, the work here is mainly around simplifying and standardizing rights language. However, being able to populate the records with information from the rightsstatements ontology creates more accurate, usable records.
  • #19: There are also some more specialized topical resources that do the linked data thing pretty well. Linked Jazz creates its own web of linkages between jazz artists using RDF triples, the building block of linked data. The seeds of the project are oral history interviews, which are mined for names that are referenced with URIs from other linked data sources like DBpedia and VIAF. It’s an example of an innovative interface, but also of incorporating those multiple data sources into an informative web of data. As you can see in this screenshot, both the transcript text and the Wikipedia sources are integrated into the resource, while information from those sources further informs the shape of the web of relationships between the artists. This is also a great usage of the idea that users will scan and scroll if they are motivated. The interface doesn’t have to be delved into deeply to get a sense of what it is (if we go back and take a look at that full network), but it invites deep research. And obviously, it also integrates a lot of different information resources into one interface.
  • #20: Another example of a more fully featured interface is from NYPL Labs. They have started to build an entire digital library repository based around linked data entities like Agents (people and groups) and specific media resources. This is a great way to meet that need we know of for searching for resources based on named entities. This is an experimental demo application, but the concepts are being integrated into the main library search. I’ve done a search here for Cervantes. You can see I’ve got some interesting facets based on the kinds of things that can be enriched through linked data, like roles. The entry itself brings together some different types of holdings and records at NYPL… in this case there are not a lot of digitized images from their digital collections, but I can choose to see results for different types of holdings, for example, notated music here. I can also use this linked data to help disambiguate those resources by Cervantes from those resources about him. Further, I can see here some of the external linked data sources used. I think this is a great example of how interfaces using linked data can be greatly enhanced along the lines of what we know users want and how they search.
  • #21: But the research also told us that users aren’t necessarily coming to us; they are going to Google as a standard jumping-off point. This means it is also crucial that we get included in Google’s Knowledge Graph, and linked data really does help us with that. While we have our own library ontologies and standards, we can’t ignore the ones, like schema.org, that Google pays attention to. By integrating this particular kind of linked data into ours, we can potentially reach a much larger audience.
  • #22: I want to end by talking about what our challenges are in creating these and even better resources for users. To come back to one of the first things the OCLC study noted: metadata needs to be there to support the interfaces we want. We need to create linked data to enhance the user experience, not just to make our own jobs easier. We are obviously at early stages with linked data, and a lot of us are more focused on getting it in there than on doing much with it. Our challenge is to get the data there, and we know that to do that we need to: embed URIs; reconcile and enhance our data through the use of them; implement schema.org or other tools that help Google get us in the Knowledge Graph; and create more linked data and entities when we can.
  • #23: Sounds great, right? Straightforward. Well, maybe the data creation part is, but the other parts are a bit trickier. From my own experience at DPLA, we have stumbled on some of these steps. We first developed a metadata model that takes advantage of linked data and incorporates a lot of different linked data properties that can be used for exactMatch and closeMatch URIs, for example. We then started to build an ingestion system that uses the Linked Data Platform, which uses HTTP protocols to create data in a native triplestore (more or less… caveat: I’m not a programmer). We called it Heidrun, which is the name of a Norse mythological goat, and I could do a whole other presentation on why we named it that, but I won’t go into it now.
  • #24: However, we have experienced significant performance issues with this system. It was taking something like 90 hours just to create 500,000 records, and that’s not including time to enhance them through other linked data endpoints (out of nearly 15 million, that is too long). So we’ve actually decided that we are going to create records using more traditional methods, Python scripts for example, and then create linked data to be made available at the end of the mapping process. I’m referring to this whole experience as a learning opportunity. And the fact of the matter is that most of us are still in this stage. Again, from the Library Technology Reports issue I mentioned earlier, Mitchell says: “LAM institutions that seek to deploy LD applications are often exploring technical platforms and making localized decisions about the best systems to select. While systems do not need to be identical—in fact, it is advantageous for them to not be identical—the fact that LAM institutions are still having to select triplestores, SPARQL engines, indexing platforms, and other services means that there is still a relatively high bar for institutions to cross in taking up LD projects.” So we, as a community, are still working out the landscape, and we have to acknowledge that and be willing to change course.
  • #25: So my take on the challenges ahead for us in making linked data work for users: Invest in creating good data. But be adaptable, because the building blocks of the technologies will change. Keep users in mind when selecting priorities, and create and publish your use cases, because we all need them in order to build good services and to justify the expense and work and the “learning opportunities.” And finally, make subtle (or bigger) changes where and when you can. The way you do that is to test out changes and create prototypes that don’t have to be all-or-nothing solutions.
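Referring back to note #17: a rough sketch of that kind of geographic enrichment, matching a partner-supplied place string against a gazetteer and keeping both a URI and the coordinates that drive the map browse. GeoNames is used here only as an example source (its search API requires a registered username); the field names and the first-hit matching are assumptions, not DPLA's actual enrichment service.

```python
# Rough sketch of the geographic enrichment described in note #17: match a
# place string against GeoNames and store the URI plus coordinates on the
# record. Requires a registered GeoNames username; naively taking the first
# hit is for illustration only.
import requests

GEONAMES_USERNAME = "your_geonames_username"  # placeholder, not a real account

def enrich_place(record: dict, field: str = "spatial") -> dict:
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": record[field], "maxRows": 1, "username": GEONAMES_USERNAME},
        timeout=10,
    )
    resp.raise_for_status()
    hits = resp.json().get("geonames", [])
    if hits:
        top = hits[0]
        record[field + "_uri"] = f"https://sws.geonames.org/{top['geonameId']}/"
        record[field + "_coordinates"] = {"lat": top["lat"], "lng": top["lng"]}
    return record

print(enrich_place({"spatial": "Fort Worth, Texas"}))  # illustrative place string
```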