Goobi in the Wellcome Library

      Digitisation Roadshow, Linz, Feb 2013

                 Dave Thompson
        Digital Curator, Wellcome Library
Goobi in the Wellcome Library
  •  In production March 2012.
  •  6 Servers running Goobi – test & production.
  •  11 staff users, some part time.
  •  1.2 million images processed & available via Library website.
  •  Can upload fewer than 1,000 objects into SDB per 24 hrs.
  •  Total space allocated to Goobi is 40 TB.
Digitising books can be boring…
…there isn’t much to see...
…but we have done more than just text.
A strategic approach

     •  Library transformation strategy, physical to digital.
     •  From ‘project’ to ‘production’.
     •  Digitisation as a sustainable end-to-end process.
     •  18 month pilot/implementation project.
     •  Just taken into production.
Diverse sources of content

     •  In-house digitisation.
     •  External contractors.
     •  Contractors working in-house.
     •  External organisations digitising their content for
        us.
Where did Goobi come from?

     •  Late 2010/early 2011: as plans for developing SDB
        grew, we realised that we needed a means of mass
        import of digital content.
     •  Began to think about high volume production &
        the management of that.
     •  Early modelling of our systems suggested that we
        needed a tool to manage production of content.
     •  Began looking at workflow tracking systems.
Needed to use existing Library tools
Perceived benefits of Goobi

     •  Web based distributed access to concurrent
        users.
     •  Flexible workflow based processing, managed
        through ‘Projects’.
     •  Workflow process enforced, ensures accuracy &
        efficiency.
     •  Adaptable to different types of content.
     •  Initiates & manages external processes via
        Intranda task manager (ITM).
     •  METS as basis of access & access control.
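
As a rough illustration of what an "enforced" workflow can mean, the minimal Python sketch below only lets a step be closed once every earlier step has been closed. The step names, the `Process` class and the process title are hypothetical; this is not Goobi's internal model, only one way to picture the idea.

```python
from dataclasses import dataclass, field

# Hypothetical step names; a real Goobi project defines its own workflow.
STEPS = ["import_metadata", "upload_images", "validate", "edit_mets", "ingest"]


@dataclass
class Process:
    """One digitised object moving through a fixed sequence of steps."""
    title: str
    done: list = field(default_factory=list)

    def next_step(self):
        # The next open step is the first one that has not been completed.
        return STEPS[len(self.done)] if len(self.done) < len(STEPS) else None

    def complete(self, step):
        # Enforce ordering: only the current open step may be closed.
        if step != self.next_step():
            raise ValueError(
                f"cannot close {step!r}; next open step is {self.next_step()!r}"
            )
        self.done.append(step)


p = Process("example_shelfmark")  # hypothetical process title
p.complete("import_metadata")
print(p.next_step())
```

Trying to call `p.complete("ingest")` at this point raises `ValueError`, which is the kind of guard that keeps steps from being skipped.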
Rapid evolution of Goobi

     •  Goobi we have now quite different to what we
        bought.
     •  Initial configuration to import MARC XML DMD &
        to automate ingest into SDB.
     •  Initially Goobi didn’t scale to meet our ambition.
     •  Initial install monolithic, now running Goobi as
        distributed services.
     •  Developed new features with Intranda, e.g.
        Jpylyzer validation.
Working with DMD

    •  Upload MARC XML DMD exported from Sierra
       using standard Goobi features.
    •  MARC fields edited to provide a consistent Goobi
       process title, e.g. using shelf mark.
    •  MARC Leader 6 field identifies content type, e.g.
       ‘Archive’ or ‘Monograph’.
    •  Content ‘type’ used by Goobi to set default METS
       access conditions.
    •  DMD not delivered to end user, that comes from
       live catalogue.
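
The step from MARC Leader to content type to default access could be sketched roughly as follows. This is a sketch under stated assumptions: "Leader 6" is read as zero-based character position 6 of the leader (the MARC 21 "type of record" position), and the mapping table, the type labels attached to each code and the access values are all hypothetical, since the slides do not give the values actually configured.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical mapping from Leader position 6 to a content type and a default
# access condition; the real codes and values are not given in the slides.
TYPE_BY_LEADER6 = {
    "a": ("Monograph", "open"),
    "p": ("Archive", "restricted"),
}


def content_type(marcxml: str):
    """Return (content type, default access) for a MARC XML record."""
    root = ET.fromstring(marcxml)
    # ".//" also finds the leader when the record is wrapped in a collection.
    leader = root.findtext(".//marc:leader", default="", namespaces=MARC_NS)
    # Zero-based index 6 of the leader (assumed reading of "Leader 6").
    code = leader[6] if len(leader) > 6 else ""
    # Anything unrecognised falls back to a cautious default.
    return TYPE_BY_LEADER6.get(code, ("Unknown", "restricted"))


sample = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    "<leader>00000nam a2200000 a 4500</leader>"
    "</record>"
)
print(content_type(sample))
```

Keeping the mapping in one table means a new content type only needs a new row, not new code.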
Uploading content

     •  Content upload using the Sync2Goobi Tool for
        bulk import.
     •  Drag-and-drop interface.
     •  Can be either TIFF or JP2.
     •  Project based workflow templates manage either
        format.
     •  Use Goobi Mount Tool (GMT) to access/manage
        content already uploaded.
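
Since uploads can be either TIFF or JP2 and a workflow template handles each, one way to picture the routing is to look at the first bytes of a file and pick a template from that. The sketch below is illustrative only: the template names are hypothetical, and a signature check like this is not a substitute for full JPEG2000 validation with a tool such as Jpylyzer.

```python
from pathlib import Path

# File signatures: TIFF starts with "II*\0" (little-endian) or "MM\0*"
# (big-endian); a JP2 file starts with the 12-byte JPEG 2000 signature box.
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")
JP2_MAGIC = b"\x00\x00\x00\x0cjP  \r\n\x87\n"

# Hypothetical workflow template names, one per accepted format.
TEMPLATE_BY_FORMAT = {"tiff": "tiff_workflow", "jp2": "jp2_workflow"}


def sniff_format(header: bytes):
    """Return "tiff", "jp2" or None based on the leading bytes of a file."""
    if header.startswith(JP2_MAGIC):
        return "jp2"
    if header.startswith(TIFF_MAGICS):
        return "tiff"
    return None


def template_for(path: Path) -> str:
    """Choose a workflow template for an uploaded image file."""
    with path.open("rb") as fh:
        fmt = sniff_format(fh.read(12))
    if fmt is None:
        raise ValueError(f"{path} is neither TIFF nor JP2")
    return TEMPLATE_BY_FORMAT[fmt]


print(sniff_format(b"MM\x00*" + bytes(8)))
```

Checking content rather than file extension avoids routing a mislabelled file into the wrong workflow.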
Using METS Editor

     •  Main point of human interaction with Goobi. Goobi
        automates METS creation.
     •  METS basis for access control & usage conditions
        for material.
     •  Basis for retrieval of content from SDB by using
        SDB PUIDs.
     •  Goobi automates ingest of content into SDB &
        receives AMD in return.
How we use METS

    •  Setting material type & default values for access
       based on DMD.
    •  Access restrictions can be at the item level.
    •  DMD in METS not delivered to end user, serves
       only to help a human identify content when
       snagging.
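
The combination of type-based defaults and item-level restrictions can be pictured with a small Python sketch. It works on plain dictionaries, not on real METS XML, and the access values and item identifiers are hypothetical.

```python
# Hypothetical default access values per material type.
DEFAULT_ACCESS = {"Monograph": "open", "Archive": "restricted"}


def access_for_items(material_type, item_ids, item_restrictions=None):
    """Map each item to its access value, letting item-level restrictions win."""
    restrictions = item_restrictions or {}
    # Unknown material types fall back to the most cautious value.
    default = DEFAULT_ACCESS.get(material_type, "closed")
    return {item: restrictions.get(item, default) for item in item_ids}


pages = ["page_0001", "page_0002", "page_0003"]
print(access_for_items("Archive", pages, {"page_0002": "closed"}))
```

Here only `page_0002` departs from the default for its material type; every other item inherits it.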
Shared development

     •  Wellcome Trust is not a development house. Rely
        on Intranda to provide development support.
     •  Developed specific requirements for extensions
        to Goobi, e.g. Jpylyzer for JPEG2000 validation.
     •  Development proposals from both sides. We have an
        idea, Intranda helps us make that idea a reality.
     •  Benefit from community developments
        commissioned by others.
Additional Tools

     •  Lurawave for converting TIFF to JPEG2000.
     •  Jpylyzer for validating JPEG2000 files.
     •  Sync2Goobi Tool for bulk upload of content.
     •  Goobi Mount Tool/MS Windows File Explorer for
        access to ‘Home’ folders.
Goobi – the future

     •  Built in OCR & creation of ALTO files.
     •  Further refinement of Sync2Goobi Tool.
     •  Further development/integration of validation
        tools.
     •  Integration of FTP with Goobi for third-party direct
        upload of content.
     •  Establishment of separate database server for
        Goobi.
Lessons learned - systems

     •  We were ambitious but underestimated what
        capacity we would require.
     •  Underestimated storage requirements.
     •  Underestimated the desirability of high levels of
        automation.
     •  Focus human interaction at as few points as
        possible.
Lessons learned - Intranda

     •  Have relied heavily on input & support from
        Intranda.
     •  Share information with Intranda & trust them to
        provide answers.
     •  Be prepared to share development. But be
        prepared to accept some pain.
Lessons learned - Goobi

     •  In less than a year Goobi has become key to
        delivering the Library’s content.
     •  Centralised user activities in one system – Goobi
        – less to learn, more efficient.
     •  Streamline & automate. High volume efficient
        production essential.
     •  Streamline other digitisation & access processes
        to match Goobi.
     •  METS an efficient single place for access related
        metadata.
Thank you

Questions now, questions later…?

   Dave Thompson, Digital Curator
         Wellcome Library

       d.thompson@wellcome.ac.uk

         http://wellcomelibrary.org/
