The challenge of making
digitised European newspaper
content available online
Susan Reilly, LIBER
Twitter: @skreilly
IFLA Newspapers, Singapore, Aug 2013
This project is partially funded under the ICT Policy Support Programme (ICT PSP)
as part of the Competitiveness and Innovation Framework Programme by the
European Community http://ec.europa.eu/ict_psp
Overview
Europeana Newspapers: making European
newspapers available online
Assessing the state of our digitised newspaper
collections
Where do we go from here?
Europeana Newspapers: making European newspapers
available online
•Content from 20 countries! (13+7 new countries)
•Aggregation of more than 18 million newspapers into
Europeana
•Make newspapers more accessible by applying refinement
methods for OCR, OLR (article segmentation), named
entity recognition (NER) and class recognition
•Increase visibility via dedicated content browser
•Ensure sustainability by spreading best practice
Assessing the state of Europe's digitised newspaper
collections
•Who’s digitising newspapers?
•What percentage of newspaper
collections are digitised?
•How many pages?
•Quality of digitisation?
•How are images made available?
Findings: % of newspaper collections digitised
•Survey of LIBER members (400 European research
libraries)
•47 responses
• Does this indicate the number of institutions digitising
newspapers?
•Less than 10% of respondents’ collections digitised
• Compared to an average of 20% of total collections digitised
(ENUMERATE)
•130 million pages and 24,000 titles
• Not all libraries could provide exact figures because of the
cursory nature of their catalogues
Findings: 20th century content an issue
•Conservative approach to copyright
terms
•½ of respondents reported a cut-off
date beyond which they do not make
content available
• As early as 1863
• Latest: the last 70 years
•Special arrangements with
publishers (23%)
•Collective rights agreements too
complex
Findings: How accessible are the collections?
•85% provide free access
• Sometimes only at national level
•Some subscription fees/under licence
Findings: How rich is the content?
•36% employ no OCR
•50% of those who did were not confident
enough in the results to expose OCR’d
text via a search interface
•36% apply zoning and segmentation
•Only 6% apply named entity recognition
•Huge variance in metadata
• Dublin Core only
• Own standards
Challenges
•Newspaper digitisation lags behind digitisation of other collections
•Copyright issues more complex
•Lack of quality evaluation
technologies for OCR
•Lack of standardised metadata
suited to newspapers
Solutions
•Standardised metadata mapped to EDM
•Quality evaluation technologies for OCR
•Clarity over rights issues
•Dialogue with publishers
•More funding for digitisation
•Increase visibility
Thank you for your attention!
http://www.libereurope.eu
http://www.europeana-newspapers.eu/
