Managing the Digitization of Large Press Archives
The New Library of Alexandria 
Overview 
Bibliotheca Alexandrina (BA)
Ø Center of excellence in the production 
and dissemination of knowledge 
Ø Place of dialogue, learning and 
understanding between cultures and 
peoples
Ø The World’s Window on Egypt 
Ø Egypt’s Window on the World 
Ø Instrument for Rising to the Challenges of 
the Digital Age 
Ø Center for Dialogue Between Peoples and 
Civilizations
Not just a Library of Books but rather a vast cultural and 
scientific complex
A library that can accommodate millions of books
http://archive.bibalex.org
http://descegy.bibalex.org
http://lartarab.bibalex.org
More than 230,000 Arabic books are freely available online for Arabic readers worldwide
http://suezcanal.bibalex.org
http://naguib.bibalex.org/
http://nasser.bibalex.org
http://sadat.bibalex.org
Managing the Digitization of Large Press Archives
Ø Project Overview 
Ø Collection Overview 
Ø Data Representation 
Ø System Workflow 
— DAF (Digital Assets Factory) 
— Cataloguing 
— Website 
§ Solr search Engine 
§ Article Viewer 
24
25
Ø Centre for Economic, Judicial, and Social 
Study and Documentation (CEDEJ) 
collaborated with Bibliotheca Alexandrina 
(BA) for the digitization of its archive of 
massive press articles collection 
Ø The project consists of multiple modules to: 
— Index the Press Archive Collection 
— Control data entry workflow 
— Digitize and process data 
— Catalogue and review Articles 
— Archive Web Publishing 
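The module list reads like a pipeline that each archive item passes through. Purely as an illustration, and assuming a strictly linear order that mirrors the list above (the slides do not say whether stages can overlap, repeat, or be sent back after review), a workflow controller could refuse out-of-order transitions like this. The stage names and the `advance` helper are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical processing stages, one per project module, in list order."""
    INDEXED = 1      # item recorded in the press-archive index
    ENTERED = 2      # data entry completed
    DIGITIZED = 3    # pages scanned and processed
    CATALOGUED = 4   # article catalogued and reviewed
    PUBLISHED = 5    # article available in the web archive


def advance(current: Stage, target: Stage) -> Stage:
    """Move an item to the next stage only; skipping or going back is rejected."""
    if target != current + 1:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

For example, `advance(Stage.INDEXED, Stage.ENTERED)` succeeds, while `advance(Stage.INDEXED, Stage.DIGITIZED)` raises `ValueError` because it skips data entry. A real system would likely also need a "returned for correction" path out of review, which this sketch deliberately leaves out.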
Ø Package of press archive 
— 800,000+ press clips varying between 
§ Press 
§ Reports 
— 500+ publishers 
— 60,000+ writers and reporters 
— 200 Different subjects 
§ Economic, politics, social life, etc… 
— Archive Languages: 
§ Arabic, English and French 
— Date range from 1966 to 2009 
28
Ø Finished so far 
— 115,000 press clips varying between 
§ Press 
§ Reports 
— 200 publishers 
— 14,000 writers and reporters 
— 100 Different subjects 
§ Economic, politics, social life, etc… 
— Archive Languages: 
§ Arabic, English and French 
— Date range from 1966 to 2009 
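Comparing the finished figures with the full package gives a rough measure of progress. The short sketch below computes the completed share for each counted dimension. Because most totals are stated as lower bounds (800,000+, 500+, 60,000+), the percentages for those dimensions are upper bounds; only the subject total (200) is given exactly.

```python
# Figures from the two lists above; "+" totals are treated as their lower bound.
TOTALS = {"press clips": 800_000, "publishers": 500,
          "writers and reporters": 60_000, "subjects": 200}
DONE = {"press clips": 115_000, "publishers": 200,
        "writers and reporters": 14_000, "subjects": 100}


def progress(done: dict[str, int], totals: dict[str, int]) -> dict[str, float]:
    """Percentage completed per dimension, rounded to one decimal place."""
    return {key: round(100 * done[key] / totals[key], 1) for key in totals}


if __name__ == "__main__":
    for key, pct in progress(DONE, TOTALS).items():
        print(f"{key}: {pct}%")
```

Run as a script, it reports 14.4% of press clips, 40.0% of publishers, 23.3% of writers and reporters, and 50.0% of subjects, so roughly one clip in seven had been processed at the time of the presentation.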
Ø A list of packaged press archive is submitted to 
Bibliotheca Alexandrina to be scanned and 
catalogued 
Ø Source of data is a collection of boxes 
Ø The box is organized on the following 
hierarchy 
— Folder 
— File 
— Sub-File 
— Document 
Ø Document represents a single page of press 
Article Creation
Article Metadata
Lookups Management
Reports
Ø Based on Apache Lucene project v4.1 
Ø SolrNet API is used to connect to Solr 
server 
Ø Features 
— Simple/Advanced search 
— Results Highlighting 
— Fields AutoComplete 
— Text search (Article Viewer) 
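The project reaches Solr through SolrNet from .NET; the sketch below instead builds the equivalent HTTP requests in Python, only to show which standard Solr parameters such features typically map to. A simple search is a single field clause, an advanced search ANDs several field clauses, `hl`/`hl.fl` request highlighting, and autocomplete is approximated with `facet.prefix`. The endpoint, core name (`pressarchive`) and field names are assumptions, not BA's actual schema.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/pressarchive/select"


def phrase(value: str) -> str:
    """Quote a user value as a Lucene phrase, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def search_url(criteria: dict[str, str], start: int = 0, rows: int = 10) -> str:
    """Select URL that ANDs one phrase clause per field and asks for highlighting.

    Simple search: one field, e.g. {"content": "..."}.
    Advanced search: several fields at once.
    """
    query = " AND ".join(f"{name}:{phrase(value)}" for name, value in criteria.items())
    params = {
        "q": query,
        "start": start,                 # paging offset
        "rows": rows,                   # page size
        "hl": "true",                   # enable result highlighting
        "hl.fl": ",".join(criteria),    # highlight the searched fields
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def autocomplete_url(field: str, prefix: str, limit: int = 10) -> str:
    """Suggest indexed terms of `field` starting with `prefix`, via faceting."""
    params = {
        "q": "*:*",
        "rows": 0,                      # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.limit": limit,
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

Quoting every value as a phrase is one simple way to keep user input from being parsed as query syntax; note that `facet.prefix` matches indexed terms as stored, so its results depend on how the field is analyzed.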
Ø Article viewer is used for previewing articles 
— It is one of multiple viewers developed at BA 
Ø Architecture 
— Server Side: RESTful services 
— Client Side: JavaScript using JSONP 
Ø Features 
— Image preview 
— Metadata preview 
— Text selection 
— Searching/highlighting 
— Zooming options: fit width/height 
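With JSONP, the viewer page loads a service URL through a `<script>` tag that names a callback function, and the service answers with JavaScript that calls that function with the JSON data; this lets the viewer consume services hosted on another domain. The sketch below shows the server-side half in Python, assuming a hypothetical `callback` request parameter and payload; the slides do not describe the real services at this level.

```python
import json
import re
from typing import Optional

# Dotted JavaScript identifiers only (e.g. "viewer.onMetadata"), ASCII subset.
_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def render(payload: dict, callback: Optional[str] = None) -> tuple[str, str]:
    """Return (content type, body): plain JSON, or JSON wrapped in a JSONP callback."""
    body = json.dumps(payload, ensure_ascii=False)  # keep Arabic/French text readable
    if callback is None:
        return "application/json", body
    if not _CALLBACK.fullmatch(callback):
        # Echoing an arbitrary callback string would let a caller inject script.
        raise ValueError(f"invalid JSONP callback: {callback!r}")
    return "application/javascript", f"{callback}({body});"
```

For example, `render({"pageCount": 3}, "viewer.onMetadata")` returns `("application/javascript", 'viewer.onMetadata({"pageCount": 3});')`. Validating the callback name is the important design choice here, since the response is executed as script in the viewer page; JSONP is also limited to GET requests.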
Ø Viewer Web Services 
— Metadata Web Service: 
§ Retrieve article catalogue metadata 
§ Return technical information (width, height, page 
count..) 
— Content Web Service: 
§ Retrieve the image of each single page in the article 
applying scaling to custom width and height 
responsively 
§ Return the selected text based on the user highlighted 
area 
— Search Web Service: 
§ Perform the search using Solr engine APIs in the 
content of the articles 
§ Highlight the matching phrases in the article image 
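The slides do not say how text selection or image highlighting are implemented. One common approach, sketched here under stated assumptions, is to keep OCR words with bounding boxes in original page-image coordinates and convert between those and the scaled image with a single scale factor: the fit-width/fit-height zoom picks the factor, a user selection is mapped back to page coordinates to collect the words underneath, and matching words are mapped forward to get highlight rectangles. The `Rect`/`Word` types, the function names and the single-word matching are all hypothetical; matching Solr's phrase hits to word boxes would need more work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def scaled(self, s: float) -> "Rect":
        return Rect(self.x * s, self.y * s, self.w * s, self.h * s)

    def intersects(self, other: "Rect") -> bool:
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass(frozen=True)
class Word:
    """An OCR word with its box in original page-image coordinates (assumed input)."""
    text: str
    box: Rect


def fit_scale(page_w: int, page_h: int, view_w: int, view_h: int, mode: str) -> float:
    """Scale factor for the 'fit width' or 'fit height' zoom options."""
    if mode == "width":
        return view_w / page_w
    if mode == "height":
        return view_h / page_h
    raise ValueError(f"unknown fit mode: {mode}")


def selected_text(words: list[Word], selection: Rect, scale: float) -> str:
    """Text of the words under a selection drawn on the scaled image."""
    page_sel = selection.scaled(1 / scale)  # back to original page coordinates
    return " ".join(w.text for w in words if w.box.intersects(page_sel))


def highlight_rects(words: list[Word], terms: set[str], scale: float) -> list[Rect]:
    """Rectangles, in scaled-image coordinates, to draw over matching words."""
    wanted = {t.casefold() for t in terms}
    return [w.box.scaled(scale) for w in words if w.text.casefold() in wanted]
```

For a 1000 x 1400 page shown in a 500-pixel-wide view, `fit_scale(1000, 1400, 500, 900, "width")` gives `0.5`; a selection drawn on the half-size image is doubled back to page coordinates before being compared with the word boxes. Keeping boxes in original coordinates means the same OCR data serves every zoom level.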