Text and Data Mining
Using Cultural Heritage Data:
Opportunities and challenges
Melanie Imming
EU Projects manager, LIBER
LIBER: Association of EU Research Libraries
TDM Cultural Heritage
OpenMinTeD
Open Text and Data Mining Platform for Open
Scientific Content
• focuses on interoperability across mining services
and content providers
• So that researchers can collaboratively create,
discover, share and re-use open texts and data
• Improve uptake of text and data mining (TDM) in the
EU
• Raise awareness of TDM
• Develop solutions to barriers together with
stakeholders
Text and Data Mining: How big is big?
Mining:
•More data than you can process yourself in a reasonable amount of time
•Data that require computational intervention to make more sense of it all
Not Macro vs Micro
Making use of these techniques, data sets or new methods is not
automatically choosing to ‘go big’:
•Can be about one Work of Art
•Not Event History vs Longue Durée
What?
In research projects:
•Basic text mining: e.g. Word Clouds
•Network analysis
•Topic Modelling
Mining Cultural Heritage
Images © prof. dr. Joris van Eijnatten
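
A word cloud is, at bottom, a picture of word counts. The sketch below shows only the counting step, assuming a plain-text input. The stopword list, the sample sentence and the function name word_frequencies are illustrative assumptions, not taken from the projects in this talk.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "as"}

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens in text."""
    # Lowercase the text and keep runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

# Made-up sample sentence.
sample = "Europe and the press: the press in Europe framed Europe as a market."
print(word_frequencies(sample, top_n=2))
```

A word cloud tool would then scale each word by its count. Real corpora usually need a language-specific stopword list and tokenizer.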
How did newspapers in the twentieth century frame Europe?
Comparative analysis of cultural patterns in time and space
prof. dr. Joris van Eijnatten
Toolbox
1 Read stuff (use your eyes)
2 Time line generator (nGram viewers)
3 Semantic text mining tool (Texcavator)
4 Corpus linguistics (e.g. AntConc, CasualConc, WordSmith)
5 Topic modelling (e.g. MALLET)
6 Text analytics suite (SPSS Modeler)
7 Vector-space modeling (ShiCo)
7 Vector-space modeling (ShiCo)
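
To give a feel for item 2 of the toolbox, here is a minimal sketch of what a time line generator computes: how often a term occurs per year, relative to the amount of text for that year. The data layout (a list of year and text pairs), the function name term_timeline and the sample documents are assumptions made for illustration. This is not how any particular nGram viewer is implemented.

```python
from collections import defaultdict

def term_timeline(documents, term):
    """Map each year to occurrences of term per 1,000 tokens."""
    hits = defaultdict(int)    # occurrences of the term per year
    totals = defaultdict(int)  # total tokens per year
    term = term.lower()
    for year, text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    # Normalize so that years with more text are not over-represented.
    return {y: 1000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Made-up documents as (year, text) pairs.
docs = [
    (1950, "europe rebuilds and europe trades"),
    (1950, "local news only"),
    (1990, "the europe debate"),
]
timeline = term_timeline(docs, "Europe")
for year, rate in timeline.items():
    print(year, round(rate, 1))
```

Normalizing by tokens per year matters for newspaper archives, where the volume of digitized text varies strongly from year to year.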
An Epidemiology of Information:
Data Mining the 1918 Influenza Pandemic
• Harness the power of data mining techniques with interpretive analytics of the humanities and social sciences
• Integrates traditional interpretive analysis (close readings of texts)
with dynamic temporal segmentation (topic modeling and
segmentation) and tone analysis
• Research can provide methods for understanding the spread of
information and the flow of disease in other societies facing the
threat of pandemics
U. of Kentucky
A Digging into Data project:
A Trans-Atlantic Platform for the Social Sciences and Humanities, representing
11 nations from both sides of the Atlantic.
Welt der Kinder - Children and their World
KNOWLEDGE OF THE WORLD AND ITS INTERPRETATION IN
TEXTBOOKS AND CHILDREN’S LITERATURE, 1850-1918
Prof. Dr. Iryna Gurevych
•Representations and interpretations of the world
in the period from 1850 until 1918
•Over 600,000 digitized pages
“G. B. Wadström unterrichtet einen Negerprinzen”, from: Wilmsen,
Friedrich Philipp: Fremde Länder und Völker, Berlin 1815, frontispiece.
Welt der Kinder - Children and their World
• Combining an established hermeneutic methodology with innovative
methods and technologies
• Close cooperation between historians, information scientists, and
computer scientists
• Developing reusable tools for the analysis of large (digital) corpora
• Test model for future similar projects
Authorship attribution
Mike Kestemont, assistant professor, University of Antwerp
•Stylometry (computational stylistics):
computational algorithms which can automatically identify the authors of
anonymous texts through the quantitative analysis of individual writing
styles
Who wrote the lyrics of the Wilhelmus, the oldest national anthem in
the world?
Authorship attribution
The Wilhelmus is traditionally ascribed to
Philips of Marnix, Lord of Saint-Aldegonde
By using these computational stylistics, a new possible
candidate came up:
Peter Datheen, a second-rate sixteenth-century poet
from French Flanders
Datheen was not on the shortlist, but he came up when a
control group was used to validate the method
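
The idea behind stylometric attribution can be sketched in a few lines: describe each text by how often it uses common function words, then pick the candidate whose profile lies closest to the anonymous text. This is a simplified illustration and not the method used in the Wilhelmus study. The function-word list, the Manhattan distance, the function names and the sample texts are all assumptions.

```python
from collections import Counter

# Hypothetical function-word list (real studies use far longer lists).
FUNCTION_WORDS = ["the", "and", "of", "to", "in"]

def profile(text):
    """Relative frequency of each function word in text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def closest_author(anonymous_text, candidates):
    """Return the candidate whose profile is nearest (Manhattan distance)."""
    target = profile(anonymous_text)

    def distance(name):
        return sum(abs(a - b) for a, b in zip(target, profile(candidates[name])))

    return min(candidates, key=distance)

# Made-up reference texts for two hypothetical candidates.
candidates = {
    "author_a": "the king and the prince of the land",
    "author_b": "to sing in praise to god in heaven",
}
print(closest_author("the song of the prince and the people", candidates))
```

A control group, as mentioned above, would mean adding authors who are not suspected at all to the candidates and checking how the method ranks them.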
Workshop Nov 2015:
“Text and Data Mining in Europe: Challenges and Action”
Elsevier TDM Policy
• Access through API only
• Text only: no images or tables
• Researchers must register details
• Click-through licence
• Terms can change any time
• Reproducibility of results
