This document summarizes a presentation by Clemens Neudecker of the Staatsbibliothek zu Berlin on reading a million books and newspapers through digitization. It surveys digital library projects and collections that hold millions of digitized objects, then focuses on the Europeana Newspapers project, which has digitized over 12 million historic newspaper pages from across Europe. The presentation describes the formats and standards used in digitization and the tools for working with digitized content. It also evaluates the performance of optical character recognition (OCR) on the Europeana Newspapers collection and discusses the challenges of processing historic newspaper text.