WG5: A data wrangling experiment
Karin de Wild
Data wrangling: the process of formatting or restructuring raw data to suit specific needs.
Interoperability
How can data across digital heritage collections be interlinked in machine-readable form?
A museum experiment
Creating a dataset that can give insight into body postures in early modern European art (1590-1709).
Aagje Lybeer (Ghent University)
Elsje van Kessel (St Andrews University)
Karin de Wild (Leiden University)
Rijksmuseum (data) collection
Amsterdam
Getting the data
How to use a large-scale dataset of artworks from the Rijksmuseum collection.
SPARQL
SPARQL (pronounced "sparkle") is a language for formulating questions (queries) to knowledge bases.
It makes it possible to retrieve and manipulate data stored in Resource Description Framework (RDF) format (Linked Open Data).
Basic syntax:
SELECT ?variables from the triples that match a pattern defined after WHERE
https://query.wikidata.org/ https://w.wiki/4FvE
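The SELECT ... WHERE idea can be tried without a SPARQL endpoint. The sketch below is a minimal illustration in Python, not the query behind the links above: it matches triple patterns against a small in-memory list of triples. The data, the predicate names and the helper names (`TRIPLES`, `match_pattern`, `select`) are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> value

# Hypothetical toy data; not taken from the Rijksmuseum or from Wikidata.
TRIPLES: List[Triple] = [
    ("painting1", "type", "Painting"),
    ("painting1", "collection", "Rijksmuseum"),
    ("painting2", "type", "Painting"),
    ("painting2", "collection", "OtherMuseum"),
    ("print1", "type", "Print"),
    ("print1", "collection", "Rijksmuseum"),
]


def is_var(term: str) -> bool:
    """Variables start with '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; return None if it cannot."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the same value everywhere it occurs.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def select(variables: List[str], where: List[Triple], triples: List[Triple]) -> List[Tuple[str, ...]]:
    """Return the values of `variables` for every way all WHERE patterns match."""
    bindings: List[Binding] = [{}]
    for pattern in where:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return [tuple(b[v] for v in variables) for b in bindings]


# Roughly: SELECT ?work WHERE { ?work type Painting . ?work collection Rijksmuseum }
result = select(
    ["?work"],
    [("?work", "type", "Painting"), ("?work", "collection", "Rijksmuseum")],
    TRIPLES,
)
print(result)
```

A real SPARQL engine works on RDF identifiers (IRIs) rather than plain strings and offers much more (filters, optional patterns, aggregation), but the core step is the same: find all variable bindings for which every pattern in WHERE holds.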
Interoperability
How can data across digital heritage collections be interlinked in machine-readable form?
[Diagram: museum and archive, each with a downward arrow]
CONTROLLED VOCABULARIES
Standardized and organized sets of words and phrases for retrieval and disambiguation of information
Metadata Rijksmuseum: EDM (Europeana Data Model)
Iconclass: vocabulary for art history and visual studies
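How a controlled vocabulary supports retrieval and disambiguation can be sketched in a few lines of Python. The example below is an assumption-laden illustration: the preferred terms and their variants are hypothetical, and they are not Iconclass notations or EDM fields.

```python
from typing import Dict, List, Optional

# Hypothetical mini-vocabulary: preferred term -> accepted variants.
# Illustrative only; these are NOT real Iconclass terms or codes.
VOCABULARY: Dict[str, List[str]] = {
    "standing": ["standing", "stands", "upright figure"],
    "sitting": ["sitting", "seated", "sits"],
    "kneeling": ["kneeling", "kneels"],
}


def build_lookup(vocabulary: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every variant (case-insensitive, trimmed) to its preferred term."""
    lookup: Dict[str, str] = {}
    for preferred, variants in vocabulary.items():
        for variant in variants:
            lookup[variant.casefold().strip()] = preferred
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def normalize(term: str) -> Optional[str]:
    """Return the preferred term, or None if the term is not in the vocabulary."""
    return LOOKUP.get(term.casefold().strip())


print(normalize(" Seated "))   # variant spelling resolves to the preferred term
print(normalize("dancing"))    # unknown terms are flagged rather than guessed
```

Because every description resolves to one preferred term, records from different collections that use different wording can still be retrieved with a single query.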
A Web archive experiment
Creating a dataset for Web archive studies.
https://query.wikidata.org/ https://w.wiki/4Fud
Wikidata: Identifiers
Wikidata makes use of identifiers both for the internal organization of the knowledge base and for its connection to other databases.
https://www.wikidata.org/w/index.php?search=&title=Special:Search&profile=advanced&fulltext=1&ns120=1
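The role of shared identifiers in connecting databases can be shown with a small Python sketch. Everything in it is hypothetical: the two record sets, the field names and the placeholder identifiers (`Q_EXAMPLE_1` and so on) are invented for illustration and are not real Wikidata items.

```python
from typing import Dict, List

# Hypothetical records from two institutions that both store a shared person identifier.
MUSEUM: List[Dict[str, str]] = [
    {"object": "museum-obj-1", "creator_id": "Q_EXAMPLE_1"},
    {"object": "museum-obj-2", "creator_id": "Q_EXAMPLE_2"},
]
ARCHIVE: List[Dict[str, str]] = [
    {"document": "archive-doc-7", "person_id": "Q_EXAMPLE_1"},
    {"document": "archive-doc-9", "person_id": "Q_EXAMPLE_3"},
]


def link_by_identifier(
    museum: List[Dict[str, str]], archive: List[Dict[str, str]]
) -> Dict[str, List[str]]:
    """For each museum object, list the archive documents sharing the same identifier."""
    docs_by_person: Dict[str, List[str]] = {}
    for doc in archive:
        docs_by_person.setdefault(doc["person_id"], []).append(doc["document"])
    return {
        record["object"]: docs_by_person.get(record["creator_id"], [])
        for record in museum
    }


links = link_by_identifier(MUSEUM, ARCHIVE)
print(links)
```

The link is made on the identifier alone, so it does not depend on how either institution spells a name or title.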
Writing a code book
A systematic description of data formats for body postures in early modern European art (1590-1709).
Code book
Describes the ideal dataset: the layout of the data in the dataset and what the data codes mean.
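A code book can also be written down in machine-readable form, so that records can be checked against it. The Python sketch below is one possible shape, not the working group's actual code book: the field names and posture codes are hypothetical, and only the date range 1590-1709 comes from the experiment described here.

```python
from typing import Any, Dict, List

# Hypothetical code book: field name -> expected type, description and allowed values.
CODE_BOOK: Dict[str, Dict[str, Any]] = {
    "object_id": {
        "type": str,
        "description": "Identifier of the artwork in the source collection",
    },
    "year": {
        "type": int,
        "description": "Year of creation",
        "min": 1590,
        "max": 1709,
    },
    "posture": {
        "type": str,
        "description": "Body posture of the main figure",
        "codes": {"ST": "standing", "SI": "sitting", "KN": "kneeling"},
    },
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record follows the code book."""
    problems: List[str] = []
    for field, spec in CODE_BOOK.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
            continue
        if "codes" in spec and value not in spec["codes"]:
            problems.append(f"{field}: unknown code {value!r}")
        if "min" in spec and not spec["min"] <= value <= spec["max"]:
            problems.append(f"{field}: {value} outside {spec['min']}-{spec['max']}")
    return problems


print(validate({"object_id": "obj-1", "year": 1650, "posture": "SI"}))
print(validate({"object_id": "obj-2", "year": 1800, "posture": "XX"}))
```

Keeping the descriptions next to the rules means the same structure can serve both as documentation for researchers and as a check on incoming data.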
subject?
metadata?
Writing a code book
A systematic description of data formats for web archive studies.
THE IDEA
To start a debate about:
› whether a shared vocabulary for archived web data is needed at all
› how this can be done (in case it is needed)
› and, eventually, what it could look like
› Ulrich Have, in an email when WG5 was established: “a standard data format would be interesting as a kind of future requirements document for researcher-ready data”
