Linked Data: Uses and Users
Gretchen Gueguen
Data Services Coordinator
Digital Public Library of America
User Study Meta Analysis
The Metadata Is the Interface:
Better Description for Better Discovery of
Archives and Special
Collections, Synthesized from User Studies
Jennifer Schaffner, 2009. Report produced by OCLC
Research. Published online at:
http://www.oclc.org/programs/publications/reports/2009-06.pdf
The Metadata IS the Interface
1. You can’t really do anything if the
data doesn’t support it. *
The Metadata IS the Interface
2. Users want to search on their own (not
with the help of librarians)
3. GOOGLE
The Metadata IS the Interface
4. Content is more important than
format
– Subject and named entity search
(though unstructured)
– Known-item searching is rare
– Ambiguity between about-ness and of-ness
The Metadata IS the Interface
5. Comprehensiveness is assumed
The Metadata IS the Interface
6. Users will scan and scroll if motivated
Linked Data to the Rescue
• Integration of resources from multiple
sources
• Better disambiguation
• Use of named entities
• Graph-based relevance
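The bullets above are easier to picture with a concrete example. Below is a minimal sketch, in Python, of the named-entity and disambiguation idea: ask a public SPARQL endpoint (Wikidata here, purely as an example) for every entity carrying a given English label, plus a short description a user or interface could use to tell the candidates apart. The endpoint and library are real; the query shape, the function name, and the choice of Wikidata over VIAF or DBpedia are illustrative assumptions, not something prescribed by the slides.

```python
# Minimal sketch: disambiguate a personal name against Wikidata's SPARQL endpoint.
# Assumes the SPARQLWrapper package (pip install sparqlwrapper); the query shape
# and the choice of Wikidata are illustrative, not from the original slides.
from SPARQLWrapper import SPARQLWrapper, JSON

def candidates_for_label(label: str, limit: int = 5):
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery(f"""
        SELECT ?entity ?description WHERE {{
          ?entity rdfs:label "{label}"@en .
          OPTIONAL {{ ?entity schema:description ?description .
                      FILTER(LANG(?description) = "en") }}
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # Each candidate URI comes back with a short description that helps tell,
    # say, resources about Cervantes from resources by Cervantes.
    return [(row["entity"]["value"], row.get("description", {}).get("value", ""))
            for row in results["results"]["bindings"]]

if __name__ == "__main__":
    for uri, desc in candidates_for_label("Miguel de Cervantes"):
        print(uri, "-", desc)
```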
Define the Use Cases
One of the key difficulties in creating LD and
making it available is in defining the use cases
that make sense and will have value to the
community.
– Erik Mitchell, Library Technology Reports 2016
Google Knowledge Graph
• Schema.org / JSON-LD
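As a rough illustration of the Schema.org / JSON-LD point, here is a minimal sketch of what such markup might look like for a digital collection item (the photograph credited at the end of this deck), written as a Python dict and serialized to JSON-LD. The specific types and properties are common Schema.org terms chosen as assumptions, not a profile prescribed by DPLA or Google, and the item URL is hypothetical.

```python
# Minimal sketch: Schema.org markup as JSON-LD for a digital collection item.
# Types/properties here are illustrative assumptions; in practice the JSON-LD
# would be embedded in the item page in a <script type="application/ld+json">
# element so crawlers can pick it up.
import json

item = {
    "@context": "https://schema.org",
    "@type": "Photograph",
    "name": "Matthew Roberts in His Army Uniform",
    "provider": {
        "@type": "Organization",
        "name": "Tarrant County College NE, Heritage Room",
    },
    "isPartOf": {
        "@type": "Collection",
        "name": "The Portal to Texas History",
    },
    "url": "https://example.org/items/matthew-roberts",  # hypothetical item URL
}

print(json.dumps(item, indent=2))
```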
Creating Linked Data for Users
• The potential is there…but is the data?
• Specific linked data implementation
strategies and opportunities
– Embedding URIs
– Reconciling and enhancing with
matched URIs
– Schema.org
– Creating entities when appropriate
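A hedged sketch of the "reconciling and enhancing with matched URIs" bullet above: take a string value from a record, look it up against an authority, and keep the matched URI alongside the original string. Wikidata's search API is used only as an example source; the record field names and the first-hit matching are assumptions rather than DPLA's actual pipeline.

```python
# Minimal sketch: reconcile a free-text name against an authority and keep the
# matched URI next to the original string. Uses Wikidata's wbsearchentities API
# as an example source; field names and match handling are illustrative only.
import requests

def reconcile_name(name: str):
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": "en",
            "format": "json",
            "limit": 1,
        },
        timeout=10,
    )
    resp.raise_for_status()
    hits = resp.json().get("search", [])
    # Return the concept URI of the top hit, or None if nothing matched;
    # a real pipeline would score candidates before trusting a match.
    return hits[0]["concepturi"] if hits else None

record = {"creator": "Miguel de Cervantes"}    # original string value
uri = reconcile_name(record["creator"])
if uri:
    record["creator_exactMatch"] = uri         # keep the string AND the URI
print(record)
```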
The DPLA experience
• Created metadata model suitable for
Linked Data
• Developed a metadata ingestion
system using Linked Data Platform
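To make "a metadata model suitable for Linked Data" a little more concrete: DPLA's application profile builds on the Europeana Data Model, where an item (edm:ProvidedCHO) is described through an ore:Aggregation. The rdflib sketch below follows that general shape; the particular properties and the example URIs are illustrative assumptions rather than the published profile.

```python
# Minimal sketch in the spirit of the DPLA Metadata Application Profile (which
# builds on the Europeana Data Model): an item (edm:ProvidedCHO) described via
# an ore:Aggregation. Properties and URIs are illustrative, not the published
# profile.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")
ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("edm", EDM)
g.bind("ore", ORE)
g.bind("dcterms", DCTERMS)

item = URIRef("https://example.org/item/12345")          # hypothetical item URI
agg = URIRef("https://example.org/aggregation/12345")    # hypothetical aggregation URI

g.add((item, RDF.type, EDM.ProvidedCHO))
g.add((item, DCTERMS.title, Literal("Matthew Roberts in His Army Uniform")))

g.add((agg, RDF.type, ORE.Aggregation))
g.add((agg, EDM.aggregatedCHO, item))
g.add((agg, EDM.dataProvider, Literal("Tarrant County College NE, Heritage Room")))
g.add((agg, EDM.provider, Literal("The Portal to Texas History")))

print(g.serialize(format="turtle"))
```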
Learning Opportunity
• Significant performance issues:
– 90 hours to do a simple mapping of 500K records
• Lesson: create LD at the end of the
ingestion process
The fact that LAM institutions are still having to
select triplestores, SPARQL engines, indexing
platforms, and other services means that
there is still a relatively high bar for institutions
to cross in taking up LD projects.
– Mitchell, 2016
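One way to read the "create LD at the end of the ingestion process" lesson, sketched under stated assumptions: do the heavy record mapping with ordinary Python data structures (the "more traditional methods, Python scripts" mentioned in the speaker notes) and only serialize to RDF in a single pass at the end, instead of writing each record into a triplestore as it is mapped. The function names, fields, and URI pattern below are hypothetical.

```python
# Minimal sketch of the lesson above: map harvested records with ordinary
# Python first, then turn the mapped dicts into RDF in a single final pass.
# Function names, fields, and URIs are hypothetical illustrations.
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, RDF

EDM = Namespace("http://www.europeana.eu/schemas/edm/")

def map_record(raw: dict) -> dict:
    # Cheap, traditional mapping step: no triplestore writes, no HTTP round trips.
    return {
        "id": raw["identifier"],
        "title": raw.get("title", "").strip(),
    }

def to_rdf(mapped_records: list) -> Graph:
    # Single serialization pass at the end of ingestion.
    g = Graph()
    g.bind("edm", EDM)
    g.bind("dcterms", DCTERMS)
    for rec in mapped_records:
        item = URIRef(f"https://example.org/item/{rec['id']}")  # hypothetical URI pattern
        g.add((item, RDF.type, EDM.ProvidedCHO))
        g.add((item, DCTERMS.title, Literal(rec["title"])))
    return g

if __name__ == "__main__":
    harvested = [{"identifier": "12345", "title": "  Sample record  "}]
    graph = to_rdf([map_record(r) for r in harvested])
    print(graph.serialize(format="turtle"))
```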
My $.02?
• Invest in creating good data
• Be adaptable, because the building blocks
of the technologies will change
• Keep users in mind when selecting priorities;
create and publish your use cases
• Make subtle (or bigger) changes where and
when you can
• Test and prototype
THANKS!!
Gretchen Gueguen
Data Services Coordinator
gretchen@dp.la
Matthew Roberts in His Army Uniform
Tarrant County College NE, Heritage
Room via
the Portal to Texas History

Editor's Notes

  • #2: Hi, I’m Gretchen, Data Services Coordinator at the Digital Public Library of America. My role is to oversee metadata mapping and quality control. I work with our partner institutions all over the United States to help them prepare their data for harvest by DPLA. This is going to be a bit of a counterpoint to our keynote this morning. I would like to ask you to think of the unskilled users of linked data, not the developers. While I agree that the developers need to be able to use the data to make stuff, without a broad base of people able to use it, our future viability is limited. This afternoon I’m not actually going to talk about DPLA; I’ve been asked to talk about users of linked data: what are the practical benefits to them?
  • #3: I thought quite a lot about how to approach this question. I’ve been asked a number of times to talk about how DPLA is using and plans to use linked data, but ironically, I’m not usually asked WHY… or at least not asked to talk explicitly about how this would be of benefit to our audience. So I thought I would start with User Studies in library research (this is an image of the front of the library at the university where I got my degree). These are usually research studies into how people seek and use information. Librarians have published hundreds of studies on how users behave when seeking information. I thought that we could start by articulating how people do research online and what they need to be successful. So what do people do with libraries?
  • #4: Okay, maybe not that exactly.
  • #5: I thought I would start with a 2009 study published by OCLC Research (which is a big library services corporation) that specifically synthesized the results of nearly 100 such studies on users’ information-seeking behavior when using online resources. I’m pretty familiar with this paper; I assign it to my own students in their digital library course. The report was titled “The Metadata IS the Interface” (which is a great title). The findings of this study show a couple of different things:
  • #6: First: an acknowledgment that you can’t really do anything without the data to support what you want to do — We’ll come back to this later, but for now, let’s just use that as foreground.
  • #7: Next, they found that users really don’t want to have to go through library staff or get help really at all. They want to work on their own to figure it out. That might seem obvious, but it’s worth mentioning because of the implications: users probably won’t read directions or use help screens unless they encounter a problem, and they generally expect things to be really intuitive.
  • #8: Third, users really rely on generalized, Google-style search, at least to start. I’m sure you are astounded by this revelation, but the takeaway is that they do this because: they know it, it appears simple, it offers relevance ranking, and it’s a single jumping-off point.
  • #9: The research showed that people tended to search for subjects and named entities (people and places, for example). They didn’t search for types of content and only did known-item searching (looking for a specific title) within domains they were really familiar with. While users want to find subjects, they generally search using keyword techniques, rather than by using structured terminology. This means that they could end up being confused when terms showed up for both what something was about vs. what it is. Think of the situation of searching for research about an author, vs. works by that author. One of those is a subject and the other is not.
  • #10: Comprehensiveness is assumed by users. Here’s a quote from one of the studies: A student in the Maryland study expected that “the universe of primary sources is a finite, absolute body of material that can and has been already labeled and categorized for him.” Yikes. All kinds of implications for interfaces, but also possibly a strong indicator of where linked data could be beneficial (as in to link in more resources)
  • #11: Finally, users will scan and scroll through long lists, but really only when they are motivated. There is contradictory research that notes that a lot of users complain about the proverbial “wall of text”, but the report says “users mostly care more about what is in the collections.” A wall of text isn’t a wall when it’s what you are looking for.
  • #12: In some ways, this is good news. Linked data is a really good way to meet a lot of these needs. For example, linked data can aid in the integration of resources from multiple sources and helps with disambiguation and the identification of named entities. In addition, using the graph can help with relevance, which is a big part of those assumptions users have about comprehensiveness and using Google. And this is just a few ideas really; the sky is kind of the limit, especially when it comes to integration. However, building interfaces that really exploit these things seems to be in the very early stages. I spent some time trying to find really killer apps using linked data to prep for this talk. What I found was mostly like this:
  • #13: It’s Linked Data!
  • #14: But maybe it isn’t the most user-friendly resource. My guiding principle for trying to find things that I thought were examples of linked data really benefiting users was “would my Mom think this was cool? Would she be able to use it?” Furthermore, I was recently involved in a large number of interviews with managers of repositories for the planning of a new repository product based on the Hydra technology stack. We asked them all how important linked data was to them, and virtually no one cared. Most said what they had heard about it seemed cool, but they hadn’t seen anything that made it seem really useful or necessary for digital libraries.
  • #15: I liked this quote I read in a recent issue of a publication called Library Technology Reports: “One of the key difficulties in creating LD and making it available is in defining the use cases that make sense and will have value to the community.” Mitchell goes on to say in addition: “Publishing data in some serialization of RDF is not especially useful or interesting if it does not capitalize on links to other datasets or provide new opportunities for computational analysis of data.” A lot of our projects at this point really only offer an API, which doesn’t exactly make for an intuitive, Google-like search experience. I think a big reason for that is that we are still in the process of creating the data and integrating it with other data sources. This is not to say that there hasn’t been any progress. What I’ve found is that there are actually a lot of small and subtle enhancements to traditional interfaces, and I think the best ones answer some of those needs that we just learned about from our user studies. For example, at DPLA we have built two features that utilize linked data to enhance the search tools we already have.
  • #16: The first is in creating our map-based search interface.
  • #17: This is a look at a record for an item with some of our enhanced geographic data. The data itself is only visible in the direct view of the record through the API, but you can see that there are geographic coordinates there. We take string data supplied by our partners and try to match it against various sources. We then enhance those records with linked data URIs as well as coordinates taken from those endpoints, which in turn drive that map browse. This is a novel kind of interface and, I believe, helps users with relevance determination. (A rough sketch of this kind of geographic enrichment appears after these notes.)
  • #18: Additionally, we are working now on integrating data from rightsstatements.org into our item records. This will allow some more robust data to display about rights status in these records while only needing to store the URI, but to be honest, the work here is mainly around simplifying and standardizing rights language. However, being able to populate the records with information from the rightsstatements ontology creates more accurate, usable records.
  • #19: There are also some more specialized topical resources that do the linked data thing pretty well. Linked Jazz creates its own web of linkages between jazz artists using RDF triples, the building block of linked data. The seeds of the project are oral history interviews, which are mined for names that are referenced with URIs from other linked data sources like DBpedia and VIAF. It’s an example of an innovative interface, but also of incorporating those multiple data sources into an informative web of data. As you can see in this screenshot, both the transcript text and the Wikipedia sources are integrated into the resource, while information from those sources further informs the shape of the web of relationships between the artists. This is also a great usage of the idea that users will scan and scroll if they are motivated. The interface doesn’t have to be delved into deeply to get a sense of what it is (if we go back and take a look at that full network), but it invites deep research. And obviously, it also integrates a lot of different information resources into one interface.
  • #20: Another example of a more fully featured interface is from NYPL Labs. They have started to build an entire digital library repository based around linked data entities like Agents (people and groups) and specific media resources. This is a great way to meet that need we know of for searching for resources based on named entities. This is an experimental demo application, but the concepts are being integrated into the main library search. I’ve done a search here for Cervantes. You can see I’ve got some interesting facets based on the kinds of things that can be enriched through linked data, like roles. The entry itself brings together some different types of holdings and records at NYPL… in this case there are not a lot of digitized images from their digital collections, but I can choose to see results for different types of holdings, for example, notated music here. I can also use this linked data to help disambiguate those resources by Cervantes from those resources about him. Further, I can see here some of the external linked data sources used. I think this is a great example of how interfaces using linked data can be greatly enhanced along the lines of what we know users want and how they search.
  • #21: But the research also told us that users aren’t necessarily coming to us; they are going to Google as a standard jumping-off point. This means it is also crucial that we get included in Google’s Knowledge Graph, and linked data really does help us with that. While we have our own library ontologies and standards, we can’t ignore the ones, like schema.org, that Google pays attention to. By integrating this particular kind of linked data into ours, we can potentially reach a much larger audience.
  • #22: I want to end by talking about what our challenges are in creating these and even better resources for users. To come back to one of the first things the OCLC study noted: metadata needs to be there to support the interfaces we want. We need to create linked data to enhance the user experience, not just to make our own jobs easier. We are obviously at early stages with linked data, and a lot of us are more focused on getting it in there than on doing much with it. Our challenge is to get the data there, and we know that to do that we need to: embed URIs; reconcile and enhance our data through the use of them; implement schema.org or other tools that help Google get us in the Knowledge Graph; and create more linked data and entities when we can.
  • #23: Sounds great, right? Straightforward. Well, maybe the data creation part is, but the other parts are a bit trickier. From my own experience at DPLA, we have stumbled on some of these steps. We first developed a metadata model that takes advantage of linked data and incorporates a lot of different linked data properties that can be used for exactMatch and closeMatch URIs, for example. We then started to build an ingestion system that uses the Linked Data Platform, which uses HTTP protocols to create data in a native triplestore (more or less… caveat: I’m not a programmer). We called it Heidrun, which is the name of a Norse mythological goat, and I could do a whole other presentation on why we named it that, but I won’t go into it now.
  • #24: However, we have experienced significant performance issues with this system. It was taking something like 90 hours just to create 500,000 records, and that’s not including time to enhance them through other linked data endpoints (out of nearly 15 million, that is too long). So we’ve actually decided that we are going to create records using more traditional methods, Python scripts for example, and then create linked data to be made available at the end of the mapping process. I’m referring to this whole experience as a learning opportunity. And the fact of the matter is that most of us are still in this stage. Again, from the Library Technology Reports issue I mentioned earlier, Mitchell says: “LAM institutions that seek to deploy LD applications are often exploring technical platforms and making localized decisions about the best systems to select. While systems do not need to be identical—in fact, it is advantageous for them to not be identical—the fact that LAM institutions are still having to select triplestores, SPARQL engines, indexing platforms, and other services means that there is still a relatively high bar for institutions to cross in taking up LD projects.” So we, as a community, are still working out the landscape, and we have to acknowledge that and be willing to change course.
  • #25: So my take on the challenges ahead for us in making linked data work for users: Invest in creating good data. But be adaptable, because the building blocks of the technologies will change. Keep users in mind when selecting priorities, and create and publish your use cases, because we all need them in order to build good services and to justify the expense and work and the “learning opportunities.” And finally, make subtle (or bigger) changes where and when you can. The way you do that is to test out changes and create prototypes that don’t have to be all-or-nothing solutions.
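Referring back to note #17: a rough sketch of that kind of geographic enrichment, matching a partner-supplied place string against a gazetteer and keeping both a URI and the coordinates that drive the map browse. GeoNames is used here only as an example source (its search API requires a registered username); the field names and the first-hit matching are assumptions, not DPLA's actual enrichment service.

```python
# Rough sketch of the geographic enrichment described in note #17: match a
# place string against GeoNames and store the URI plus coordinates on the
# record. Requires a registered GeoNames username; naively taking the first
# hit is for illustration only.
import requests

GEONAMES_USERNAME = "your_geonames_username"  # placeholder, not a real account

def enrich_place(record: dict, field: str = "spatial") -> dict:
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": record[field], "maxRows": 1, "username": GEONAMES_USERNAME},
        timeout=10,
    )
    resp.raise_for_status()
    hits = resp.json().get("geonames", [])
    if hits:
        top = hits[0]
        record[field + "_uri"] = f"https://sws.geonames.org/{top['geonameId']}/"
        record[field + "_coordinates"] = {"lat": top["lat"], "lng": top["lng"]}
    return record

print(enrich_place({"spatial": "Fort Worth, Texas"}))  # illustrative place string
```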