Linked Data in Production: Moving Beyond Ontologies

David Newbury
Assistant Director, Software and User Experience, Getty
CNI Spring Project Briefings, March 26, 2024

Thank you for inviting me to present today.
I am a technologist working in the cultural heritage sector.
I lead the public digital team, developing applications to support Getty’s mission.
Introduction: David Newbury

I work for Getty in Los Angeles, which is a
library/archive/museum/research center.
One of our major areas of digital leadership is
in the use of Linked Data for cultural
heritage.
Introduction: Getty

Linked Open Data is a set of technologies that
attempt to translate some of the best
practices of the Web for use with structured
data:
● The use of URLs as identifiers
● Networks of information, not tables
● Formal, shared standards for description
Introduction: What is Linked Open Data?

As it turns out, Linked Data is not wildly successful.
Introduction: Linked Data is (mostly) Dead.
But we all still talk about it a lot.

Linked Data’s appeal in cultural heritage is a
technological solution to a social problem:
Cheap storage, ubiquitous connectivity, and
search algorithms recontextualize the labor
behind cultural heritage data work.
Introduction: Provocation

Mass digitization, computational metadata
generation, and decades of cataloging mean
that our institutions have more data to provide
than we have the ability to provide context for.
Data overload and limited user attention are
the collections access problems of the next
decade.
Introduction: Cheap Storage
By User5515 - Own work, CC BY 3.0,
https://commons.wikimedia.org/w/index.php?curid=114332790

Our always-connected culture means that our
collections are increasingly seen as part of a
single digital ecosystem—
And the questions that are being asked require
information that extends beyond the
boundaries of any one institution.
Introduction: Ubiquitous Connectivity

And commercial tools have given users the
expectation that information is available for the
asking—
Eliding the labor and capital needed to create,
curate, and maintain that information.
Introduction: Search Algorithms

Linked Data has been seen as a solution:
It provides structures that manage the scale of data we create,
identifiers that maintain authority in a globally distributed environment,
and ontologies that enable complex data retrieval across datasets.
Introduction: The Magic Bullet

What came before:
Laying the Groundwork

In 2014, the Getty Vocabularies were
launched as Linked Data.
This, alongside the work at Yale Center for
British Art, Rijksmuseum, and the British
Museum, demonstrated the feasibility of
LOD within the museum community.
Laying the Groundwork: Getty Vocabularies

The archives of the Carnegie Museum of Art’s
Film Department launched in 2014.
The animating question was:
What would happen if you treated the
relationships between events, archival
material, people, and artwork as the
essential element, not the objects?
Laying the Groundwork: Carnegie Museum Archives

In 2017, the American Art Collaborative
launched.
It used these same principles to highlight
connections across 14 institutions and
152,000 items—using Getty’s Vocabularies as
a bridging structure between institutions.
Laying the Groundwork: American Art Collaborative

One of the most lasting outcomes of the
American Art Collaborative was Linked.Art,
the shared data model that connected
institutions.
Laying the Groundwork: Linked Art

What we’ve done:
Getty’s Digital Ecosystem

Getty has been doing Linked Data since 2014,
starting with the Getty Vocabularies.
It’s a thesaurus of concepts, people, and
places used for cataloging across many
institutions.
Getty’s Linked Data: Getty Vocabularies

Since then, we’ve moved most of our major
systems to use Linked Data—including our
archives…
Getty’s Linked Data: Archival Records

… and our museum collection.
Getty’s Linked Data: Archival Records

It’s used for onsite visitor experiences
via our audio guide…
Getty’s Linked Data: Audio Guide

…and to provide novel interfaces for
exploration of our materials.
Getty’s Linked Data: 12 Sunsets

It’s also used by third parties: both large, like
Google Arts & Culture…
Getty’s Linked Data: Google Arts & Culture

…and small, like this project by the Cultural
Office of the Embassy of Spain.
Getty’s Linked Data: Spanish Art in the US

We’ve also built a complex, powerful digital
infrastructure to support this work—millions
of records in a single shared data model,
pulling from a wide collection of systems of
record.
Getty’s Linked Data: APIs

Under the hood:
Linked Data & the Everything API

Data Flow: How we wish it was
[Diagram: Staff Interface, Public Website]


These systems support people.
Digital infrastructure is designed to use computers to empower people to be
more effective at meeting the mission of the organization.
Getty’s Linked Data: User Needs

Catalogers need systems that match their
workflows, and different disciplines have
different needs.
Our infrastructure needed to not be tied to
any particular backend system.
[Diagram: Cataloging, System Support, Engineering, Access, Research]

Engineers need to get data in and out of
systems, using patterns and practices that
they already know how to use.

System admins just don’t want you to break
their stuff.
Pulling data out of systems on demand usually
breaks stuff.

Most end users are looking for content—they
want to learn what we know on a given topic.
This may be professional scholarship or it
might be looking for pictures—both are
examples of information-seeking behaviours.

And some researchers want to find questions
that haven’t been asked before, to find new
connections or patterns in the data that
others have overlooked.

Meeting the needs of catalogers is mostly not
my problem.
There are high-quality, professional tools that
work within the disciplinary training of the
field.
Getty’s Linked Data: The LOD Gateway

Providing access to that data, though, often
requires recontextualization:
Changing the conceptual lens from one
focused on staff efficiencies to one focused on
users’ needs.

Doing so requires combining data from
multiple systems and multiple workﬂows into
a new record.
This combining—or linking—of data has
tradeoffs.

Imagine a record for the painting Irises.

And a second record, this one for Van Gogh.

These could be seen as two separate documents:
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "type": "HumanMadeObject",
  "identified_by": {
    "id": "object/1/name",
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {
    "id": "object/1/production",
    "carried_out_by": {"id": "person/1"}
  }
}

Or as a single graph.
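
To make the "single graph" view concrete, here is a minimal Python sketch that flattens the two documents above into (subject, predicate, value) triples and merges them. The flatten helper is hypothetical and is a simplification, not real JSON-LD processing and not Getty's implementation; the relative ids are the ones used on the slides.

```python
def flatten(node, triples=None):
    """Walk a nested document and collect (subject, predicate, value) triples.

    Simplified illustration: assumes every nested object carries an "id".
    """
    if triples is None:
        triples = set()
    subject = node["id"]
    for key, value in node.items():
        if key in ("id", "@context"):
            continue
        if isinstance(value, dict):
            # A nested object becomes an edge to another node in the graph.
            triples.add((subject, key, value["id"]))
            flatten(value, triples)
        else:
            triples.add((subject, key, value))
    return triples


PERSON = {
    "id": "person/1",
    "type": "Person",
    "identified_by": {
        "id": "person/1/name",
        "type": "Name",
        "content": "Vincent Van Gogh",
    },
}

OBJECT = {
    "id": "object/1",
    "type": "HumanMadeObject",
    "identified_by": {
        "id": "object/1/name",
        "type": "Name",
        "content": "Irises",
    },
    "produced_by": {
        "id": "object/1/production",
        "carried_out_by": {"id": "person/1"},
    },
}

# Two documents, one graph: the shared id "person/1" is what links them.
GRAPH = flatten(PERSON) | flatten(OBJECT)
```

The only thing joining the two records is the shared identifier, which is why stable ids matter more than the shape of either document.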

From the point of view of the data, these
are equivalent—they contain the same facts.
But from a usability perspective, they make
different things easy or hard.
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "person/1",
  "type": "Person",
  "identified_by": {
    "id": "person/1/name",
    "type": "Name",
    "content": "Vincent Van Gogh"
  }
}

{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "identified_by": {
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {}
}

Documents are optimized for Access:
They provide a specific set of data bundled
together by the data creator that provides all
the facts you need…given a specific context.
Documents: For Access and Discovery
{
  "@context": "https://linked.art/ns/v1/linked-art.json",
  "id": "object/1",
  "identified_by": {
    "type": "Name",
    "content": "Irises"
  },
  "produced_by": {}
}

Graphs, alternatively, are optimized for querying:
Allowing a user to define a specific context based
on novel criteria and returning that subset of
facts.
Graphs: For Queries
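
As a rough illustration of what "defining your own context" looks like, the sketch below runs a pattern-matching query over a tiny in-memory set of triples. The match and titles_by_maker helpers are hypothetical stand-ins for a real query language such as SPARQL, and the data is the Irises example from the earlier slides.

```python
GRAPH = {
    ("person/1", "type", "Person"),
    ("person/1", "identified_by", "person/1/name"),
    ("person/1/name", "content", "Vincent Van Gogh"),
    ("object/1", "type", "HumanMadeObject"),
    ("object/1", "identified_by", "object/1/name"),
    ("object/1/name", "content", "Irises"),
    ("object/1", "produced_by", "object/1/production"),
    ("object/1/production", "carried_out_by", "person/1"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def titles_by_maker(graph, maker_name):
    """Follow name -> person -> production -> object -> title."""
    titles = []
    for name_node, _, _ in match(graph, p="content", o=maker_name):
        for person, _, _ in match(graph, p="identified_by", o=name_node):
            for production, _, _ in match(graph, p="carried_out_by", o=person):
                for obj, _, _ in match(graph, p="produced_by", o=production):
                    for _, _, title_node in match(graph, s=obj, p="identified_by"):
                        for _, _, title in match(graph, s=title_node, p="content"):
                            titles.append(title)
    return sorted(titles)
```

The query crosses the boundary between the person record and the object record, which is exactly what a single document cannot do on its own.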

Imagine two Questions:
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
and
“What’s the label info for Irises?”

At the Getty, we have never asked:
“What objects does Getty have that have images larger than
1200px on the longest side that have been exhibited in both New
York and Paris and were created by artists who lived before 1850?”
but we ask
“What’s the label info for Irises?”
several thousand times a day.

Having an interface for documents lets us
provide a simple, easily understandable
record that maps well to known contexts.
This is important, because people usually
expect these contexts. It makes answering
common questions simple.

Documents are also the way the internet
works: REST APIs, cache control, JSON,
webpages.
Using these well-known systems helps
developers make systems that are fast and
easy to build.
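
A small sketch of why documents fit ordinary web tooling: one record, one JSON body, one ETag, and a cheap "not modified" answer for repeat requests. The respond function, the hash-based ETag, and the max-age value are illustrative assumptions, not a description of Getty's API.

```python
import hashlib
import json


def render(document):
    """Serialize a document and derive an ETag from its bytes."""
    body = json.dumps(document, sort_keys=True).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return body, etag


def respond(document, if_none_match=None):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    body, etag = render(document)
    if if_none_match == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    headers = {
        "ETag": etag,
        "Content-Type": "application/json",
        "Cache-Control": "max-age=300",  # assumed value, for illustration
    }
    return 200, headers, body
```

Any HTTP cache or client library already understands these headers, so nothing here needs Linked Data knowledge to use.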

Research is different—each scholar brings
their own question and their own context.
Meeting their need means empowering
them to draw their own boundaries within
the data.
Graphs: For Asking Questions

Doing so is complex: it moves the burden of
defining the relevant context to the user of
the data, not the creator of the data.
But it makes asking new questions possible,
even if it might be inefficient or complicated.
Graphs: For Asking Questions

We’ve built our infrastructure to allow for
both use cases:
A developer can create, update, and delete
documents, and behind the scenes it will
keep a graph in sync with those changes.
Meeting Both Needs
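
One way to picture "documents in front, graph behind" is a store that re-derives a document's triples on every write. This is a minimal in-memory sketch under that assumption; SyncedStore and flatten are hypothetical names, and a production system would use a database and a triplestore rather than Python dicts.

```python
def flatten(node, triples=None):
    """Collect (subject, predicate, value) triples from a nested document."""
    if triples is None:
        triples = set()
    subject = node["id"]
    for key, value in node.items():
        if key in ("id", "@context"):
            continue
        if isinstance(value, dict):
            triples.add((subject, key, value["id"]))
            flatten(value, triples)
        else:
            triples.add((subject, key, value))
    return triples


class SyncedStore:
    """Document CRUD on the outside, a graph kept in step on the inside."""

    def __init__(self):
        self._docs = {}
        self._triples = {}  # document id -> triples derived from it

    def put(self, doc):
        # Create or update: replace the document and its derived triples.
        self._docs[doc["id"]] = doc
        self._triples[doc["id"]] = flatten(doc)

    def get(self, doc_id):
        return self._docs[doc_id]

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)
        self._triples.pop(doc_id, None)

    def graph(self):
        # The graph is the union of every document's triples.
        merged = set()
        for triples in self._triples.values():
            merged |= triples
        return merged
```

The developer only ever calls put, get, and delete; the graph is a side effect they never have to manage.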

It also allows for synchronization across systems:
An editor changes a record, which means the API needs to be updated, which means the website
needs to be updated, and the search interfaces, and third-party systems…
Linked Data Infrastructure: Tracking Changes

The infrastructure uses the
W3C Activity Streams standard and is
implemented using the patterns from the
IIIF Change Discovery API.
Using standards makes it easy to build
integrations against changing data—both
within our organization and for external
aggregators.
Linked Data Infrastructure: ActivityStreams and Standards
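
For a sense of how a consumer might use such a stream, here is a sketch that walks pages of activities from newest to oldest and keeps only the latest change per record since the last crawl. The page shape (orderedItems holding activities with type, object.id, and endTime) follows the IIIF Change Discovery pattern; the function name and the in-memory pages are assumptions, and a real client would fetch each page over HTTP.

```python
def latest_changes(pages, last_crawl):
    """Return {record id: latest activity type} for changes after last_crawl.

    pages: list of page dicts, oldest page first, each with "orderedItems"
    listed oldest to newest. Timestamps are assumed to be UTC strings in one
    consistent ISO 8601 format, so plain string comparison orders them.
    """
    seen = {}
    for page in reversed(pages):
        for activity in reversed(page["orderedItems"]):
            if activity["endTime"] <= last_crawl:
                # Everything older was handled by a previous crawl.
                return seen
            # Only the newest activity per record matters to a consumer.
            seen.setdefault(activity["object"]["id"], activity["type"])
    return seen


PAGES = [
    {"type": "OrderedCollectionPage", "orderedItems": [
        {"type": "Create", "object": {"id": "object/1"}, "endTime": "2024-01-01T00:00:00Z"},
        {"type": "Update", "object": {"id": "object/1"}, "endTime": "2024-02-01T00:00:00Z"},
    ]},
    {"type": "OrderedCollectionPage", "orderedItems": [
        {"type": "Create", "object": {"id": "object/2"}, "endTime": "2024-03-01T00:00:00Z"},
        {"type": "Delete", "object": {"id": "object/1"}, "endTime": "2024-03-02T00:00:00Z"},
    ]},
]
```

A website, a search index, and an outside aggregator can all run the same loop against the same stream, which is the appeal of using a standard here.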

For some kinds of data, it's valuable to
also be able to see what has changed over
time for a given record.
To do so, our APIs also support Memento,
the standard underneath the Internet
Archive.
Linked Data Infrastructure: ActivityStreams and Standards

This lets you automatically open older
versions of the record—providing an audit log
and the ability for scholars to understand
how knowledge changes over time.
Linked Data Infrastructure: ActivityStreams and Standards
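
In Memento (RFC 7089), a client asks for a past version by sending an Accept-Datetime header with an HTTP-formatted date. The sketch below only builds that header; the memento_headers name is hypothetical, and which URLs accept the header on any given API is not stated in this talk.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(moment):
    """Build the request header asking for a record as of a given moment.

    moment must be a timezone-aware datetime; it is converted to UTC and
    formatted as an HTTP date, which is what Accept-Datetime expects.
    """
    as_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


# Ask for the record as it stood on the day of this talk.
HEADERS = memento_headers(datetime(2024, 3, 26, tzinfo=timezone.utc))
```

A server that supports Memento answers with the version closest to that moment and reports its actual date in a Memento-Datetime response header.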

Cool Tech, Bro:
Why Does this Matter?

A Hard-won lesson:
No application that we’ve built required Linked Data.
Getty’s Linked Data: What we learned

Which, if you think about it, makes sense. Each application has
a specific, known context with clear record boundaries.

A Hard-won lesson:
Different users have different contexts and need different affordances.
A shared, graph-based data model allows us to re-present the data in a way that
matches users’ varying models of the world via multiple interfaces.

As Simple as possible:
A shared data model also makes our developers more effective—eventually.
Building on top of web technology lets the engineering learning curve be gradual.

Standards are valuable for interoperability—but also because
you don’t have to write all the documentation.
Nobody wants to write it, but you can’t work across institutions without it.

Minimize complexity in the data model.
Data is for computers—text is for humans. Resist the urge to show off.
You can always add complexity—you can never take it away.

Disciplinary Misdeeds:
The hardest part of this will be change management.
Recontextualizing information across boundaries hides disciplinary labor—
and digital innovations can conflict with pre-digital best practices.

Evangelize and collaborate.
What makes cultural data interesting is not contained within any one institution.
It’s shared across our entire, world-wide community. We should work together.
Shared models and shared code make that easier.

Over the next several years, we’ll be
expanding our usage of this system:
This fall, we’ll launch a new version of the
Getty Provenance Index, adding in 22M
records of transactions between art
dealers.
This research-focused dataset will allow
new insights into collections around the
world—and into the art market as a whole.
What’s Next: Provenance Index

We’re beginning to plan the next iteration of
the Getty Vocabularies infrastructure:
Working to understand how the multiple
contexts of our audiences can be
supported—and how new ways of working
impact the platform.
What’s Next: Getty Vocabularies

And we’re using the platforms and
standards we’ve put in place to enable
collaboration across the field:
Working with the Smithsonian to provide
joint access and discovery for millions of
images from the photo morgue of
magazines such as Ebony and Jet.
What’s Next: Johnson Publishing Company Archives

Why do we do Linked Data?
The value is not in the technologies or the ontologies we use.

The value is in the ecosystem—information in varied context for different applications.
The value is in the audience—supporting user needs and conceptual models.
And it’s in the community—allowing data and code to be used beyond the Getty.

We do it for humans.

Contact me at dnewbury@getty.edu
Thank You.
David Newbury
Assistant Director, Software and User Experience, Getty
CNI Spring Project Briefings, March 26, 2024
