ABSTRACT
BACKGROUND
REFERENCES
VIVO-ISF Ontology and Local Northwestern Ontology Extensions to represent National Library of Medicine Publication Types
Feinberg School of Medicine researchers need to represent their
publications correctly, so we created an extension to the VIVO-ISF
Ontology that covers most of the National Library of Medicine publication
types. The extension lets us describe the types of scholarly output
produced by FSM researchers accurately and at a finer level of granularity.
IIIF & OpenSeadragon
Repository & FSM Databases
System Architecture and Customizations
By establishing a digital repository for the Feinberg School of Medicine
(FSM) on Northwestern University's Chicago campus, we anticipate gaining
the ability to create, share, and preserve attractive, functional, and
citable digital collections and exhibits. We followed the National Library
of Medicine master evaluation criteria, which cover functionality,
scalability, extensibility, interoperability, ease of deployment, system
security, system, physical environment, platform support, demonstrated
successful deployments, system support, strength of development community,
stability of development organization, and strength of technology roadmap
for the future. All of these factors informed our choice of platform, but
we paid special attention to interoperability and to the strength of the
technology roadmap, because we want to connect the digital repository
with platforms that produce VIVO-compatible structured linked data. VIVO
is a linked data platform that serves as a hub for researchers. It lists
researchers at academic institutions along with their research output,
affiliation, research overview, service, background, identities,
teaching, and much more. VIVO’s semantic approach to research networking
has been widely adopted, and the VIVO data standard is a recommendation
and best practice for representing information about research and
researchers across the 62-member Clinical and Translational Science Award
(CTSA) Consortium. CTSA Hubs are encouraged to “implement research
networking tool(s) institution-wide that utilize RDF triples and an
ontology compatible with the VIVO ontology… [and] people profiles at
institutions should be publicly available … as Linked Open Data.”[1]
The Galter Health Sciences Library team, as a member of the Northwestern
University Clinical and Translational Sciences Institute (NUCATS), is
establishing a digital repository to enable open representation of the
diverse scholarly outputs and outcomes of our scholars. Open access
principles can help guide dissemination strategies for the broad range of
products and outcomes of research from the diverse biomedical workforce.
Our goal is to provide a digital home for traditional and non-traditional
scholarly outputs in the Galter Digital Repository (GDR). For this purpose
we define non-traditional outputs as items that are produced during the
scholarly process but are often not discoverable or available for reuse
through the traditional scholarly publishing workflow, including
measurement devices, patient education materials, curriculum materials,
conference materials, community engagement materials, and so on. Open
access to, and availability of, the products and outcomes of research are
increasingly required by funders and can serve as an important way to
demonstrate return on investment to partners and our communities. For
these reasons, the GDR serves as an important linchpin in the evaluation
and continuous improvement activities of NUCATS and other projects at FSM.
FSM also has a rich digital heritage, which we will continue to build
through the GDR.
After weighing the different frameworks that provide digital repository
architecture, we selected the open source Fedora architecture. From
Fedora’s DuraSpace wiki page: “[Fedora’s] flexibility enables it to
integrate with many types of enterprise and web-based systems, offering
scalability and durability. It also provides the ability to express rich
sets of relationships among digital resources and to query the repository
using the semantic web's SPARQL query language.”[2]
Our first test collection consisted of photographs, manuscripts, letters,
and addresses (speeches) by or about Greene Vardiman Black, the father of
modern dentistry. The collection had previously been digitized and
described using Encoded Archival Description (EAD). EAD is an XML standard
for encoding archival finding aids, maintained by the Technical
Subcommittee for Encoded Archival Description of the Society of American
Archivists, in partnership with the Library of Congress.[3] We
cross-walked the existing EAD metadata so that it displays in our
Fedora/Sufia/Blacklight repository stack.
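The crosswalk step can be pictured with a minimal sketch. It assumes a heavily trimmed, namespace-free EAD 2002 fragment and a flat target record. The sample values and the target field names (`title`, `date_created`, `collection`) are invented for illustration and are not taken from the GV Black finding aid or from the repository's actual schema.

```python
import xml.etree.ElementTree as ET

# Invented sample data: a trimmed EAD 2002 fragment without namespaces.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Sample collection</unittitle>
    </did>
    <dsc>
      <c01 level="item">
        <did>
          <unittitle>Sample letter</unittitle>
          <unitdate>1900</unitdate>
        </did>
      </c01>
      <c01 level="item">
        <did>
          <unittitle>Sample photograph</unittitle>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def crosswalk(ead_xml):
    """Map each first-level EAD component to a flat metadata record."""
    root = ET.fromstring(ead_xml.strip())
    archdesc = root.find("archdesc")
    collection_title = archdesc.findtext("did/unittitle", default="").strip()

    records = []
    for component in archdesc.iterfind("dsc/c01"):
        did = component.find("did")
        records.append({
            # Hypothetical target field names, chosen for this sketch only.
            "title": did.findtext("unittitle", default="").strip(),
            # unitdate is optional in EAD, so a missing date becomes None.
            "date_created": (did.findtext("unitdate") or "").strip() or None,
            "collection": collection_title,
        })
    return records


records = crosswalk(EAD_SAMPLE)
for record in records:
    print(record)
```

A real finding aid usually nests components more deeply (`c02`, `c03`, and so on) and may declare an XML namespace, so a production crosswalk would need to walk the hierarchy recursively.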
To provide rich metadata, we added Medical Subject Headings (MeSH) terms,
Library of Congress Subject Headings (LCSH), Subject Names, and Subject
Geographic Names so that users select keywords and subjects from
controlled vocabularies. We also expanded the available “Resource type”
options with publication types from the VIVO-ISF Ontology and the Local
Northwestern Ontology to accommodate all the publication types from the
National Library of Medicine. This will allow us to move data seamlessly
between the two systems, the repository and VIVO.
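One way to think about the expanded “Resource type” list is as a lookup from a National Library of Medicine publication type to an ontology class URI. The sketch below is illustrative only: the pairings are assumptions rather than the authors' published mapping, and the `LOCAL` namespace is a placeholder standing in for the Local Northwestern Ontology.

```python
BIBO = "http://purl.org/ontology/bibo/"
VIVO = "http://vivoweb.org/ontology/core#"
# Placeholder namespace; the real local ontology URI is not given in the poster.
LOCAL = "http://example.org/ontology/local#"

# Assumed pairings for illustration, not the authors' actual mapping.
RESOURCE_TYPES = {
    "Journal Article": BIBO + "AcademicArticle",
    "Review": VIVO + "Review",
    "Editorial": VIVO + "EditorialArticle",
    "Case Reports": LOCAL + "CaseReport",        # hypothetical local class
    "Clinical Trial": LOCAL + "ClinicalTrial",   # hypothetical local class
}


def resource_type_uri(nlm_publication_type):
    """Return the class URI for an NLM publication type label."""
    key = nlm_publication_type.strip()
    if key not in RESOURCE_TYPES:
        raise ValueError(f"No resource type mapped for {key!r}")
    return RESOURCE_TYPES[key]


print(resource_type_uri("Journal Article"))
print(resource_type_uri("Case Reports"))
```

Keeping the same class URIs on both sides is what would let a record move between the repository and VIVO without its type being re-interpreted.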
GV Black Collection
All images in the repository are viewable through our IIIF server. We
also serve IIIF presentation metadata for all of our collections and
files. For the front-end pager we use OpenSeadragon, which is actively
maintained and included with Sufia. In addition to paging, it supports
zooming, panning, and browsing. Other pagers can be integrated easily as
long as they support IIIF.
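For readers unfamiliar with IIIF, the Image API addresses every image derivative with a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below builds such URLs. The base URL and identifier are made up, and the default `size="full"` follows Image API 2.x (version 3.0 uses `max` instead); the poster does not say which version the server implements.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL for one image derivative."""
    # Identifiers must be percent-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
full_image = iiif_image_url("https://example.org/iiif", "ab/cd")
thumbnail = iiif_image_url("https://example.org/iiif", "ab/cd", size="200,")

print(full_image)  # https://example.org/iiif/ab%2Fcd/full/full/0/default.jpg
print(thumbnail)   # https://example.org/iiif/ab%2Fcd/full/200,/0/default.jpg
```

Because any compliant viewer can request tiles and sizes this way, a pager other than OpenSeadragon only needs to speak the same URL pattern.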
IIIF also allows us to group a series of files into an entity such as a
book and then display it using OpenSeadragon. Users can mark any
Collection as ‘Multi-Part’. A ‘Multi-Part Collection’ contains individual
files and a link to a combined PDF file.
Each file in such a collection can have a page number, which may be any
alphanumeric value found in the source document, backed by a sort number
that determines the order in which pages are displayed. Only files with
page numbers are considered part of the collection and are viewable in
the OpenSeadragon pager.
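The page-number and sort-number rule can be sketched as follows. The `RepoFile` class and its field names are hypothetical stand-ins for the repository's file model. The sketch only shows the two behaviors described above: files without a page number are left out, and display order follows the sort number rather than the printed label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepoFile:
    """Hypothetical stand-in for a file in a Multi-Part Collection."""
    name: str
    page_number: Optional[str] = None   # label as printed, e.g. "ii" or "12a"
    sort_number: Optional[int] = None   # position in the display order


def pager_sequence(files):
    """Return the files shown in the pager, in display order."""
    paged = [f for f in files if f.page_number]
    # Files that have a label but no sort number are placed last.
    return sorted(paged, key=lambda f: (f.sort_number is None, f.sort_number or 0))


files = [
    RepoFile("scan_003.tif", page_number="1", sort_number=3),
    RepoFile("notes.txt"),  # no page number, so not part of the pager
    RepoFile("scan_001.tif", page_number="i", sort_number=1),
    RepoFile("scan_004.tif", page_number="12a", sort_number=4),
    RepoFile("scan_002.tif", page_number="ii", sort_number=2),
]

labels = [f.page_number for f in pager_sequence(files)]
print(labels)  # ['i', 'ii', '1', '12a']
```

Separating the label from the sort key is what lets roman-numeral front matter and suffixed pages such as "12a" appear in the right place without parsing the label itself.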
All IIIF-related information is indexed in Solr along with other RDF
metadata for fast retrieval. The IIIF server respects the authorization
scheme used in the hydra-access-controls gem: it uses the permission
information stored in Fedora and indexed in Solr, and it serves images
only to properly authorized users.
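A rough sketch of such a check against an indexed document is shown below. It assumes the permission fields follow the naming commonly seen in Hydra Solr indexes (`read_access_person_ssim`, `read_access_group_ssim`, and the `edit_` equivalents) and that a `public` group means world-readable. Both are assumptions about this deployment, and the function is not the IIIF service's actual code.

```python
# Assumed Solr field names; verify against the deployed index before relying on them.
PERSON_FIELDS = ("read_access_person_ssim", "edit_access_person_ssim")
GROUP_FIELDS = ("read_access_group_ssim", "edit_access_group_ssim")


def can_read(solr_doc, user_key=None, user_groups=()):
    """Decide whether a user may be served the image described by solr_doc."""
    allowed_people = set()
    for field in PERSON_FIELDS:
        allowed_people.update(solr_doc.get(field, []))

    allowed_groups = set()
    for field in GROUP_FIELDS:
        allowed_groups.update(solr_doc.get(field, []))

    if "public" in allowed_groups:
        return True  # world-readable object, no login needed
    if user_key is not None and user_key in allowed_people:
        return True
    return bool(allowed_groups & set(user_groups))


# Hypothetical indexed document and users.
doc = {
    "id": "example-file-1",
    "read_access_group_ssim": ["library-staff"],
    "edit_access_person_ssim": ["depositor1"],
}

print(can_read(doc))                                    # False
print(can_read(doc, "depositor1"))                      # True
print(can_read(doc, "reader2", ["library-staff"]))      # True
```

Reading permissions from the index rather than from Fedora on every request is what keeps the check cheap enough to run per image.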
Current: LDAP integration
We integrated the Lightweight Directory Access Protocol (LDAP) into the
workflow so that users select the names of creators and contributors from
a controlled list of names, the same data source used by the Northwestern
VIVO instance.
Future: Symplectic, VIVO, ORCID, Shibboleth, FASIS
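A name picker backed by LDAP typically turns what the user types into a search filter. The sketch below builds such a filter with the value escaping defined in RFC 4515. The choice of the standard `sn` (surname) attribute and `person` object class is an assumption, since the poster does not describe the directory schema, and no directory connection is made here.

```python
# RFC 4515 requires these characters to be escaped inside filter values.
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape user input so it cannot alter the structure of the filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def creator_filter(surname_prefix):
    """Build a prefix-match filter on surname (assumed attribute: sn)."""
    return f"(&(objectClass=person)(sn={escape_filter_value(surname_prefix)}*))"


print(creator_filter("Hol"))      # (&(objectClass=person)(sn=Hol*))
print(creator_filter("a*(b)"))    # (&(objectClass=person)(sn=a\2a\28b\29*))
```

Restricting the ‘Creator’ field to entries returned by a search like this is one way to guarantee that every stored name resolves to a directory record.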
[1] The Clinical and Translational Science Awards (CTSA) Consortium,
2013. Research Networking.
https://www.ctsacentral.org/best%20practices/research%20networking
[2] DuraSpace, 2013. Fedora. http://www.duraspace.org/about_fedora
[3] Society of American Archivists, 2015. Encoded Archival Description
(EAD).
http://www2.archivists.org/groups/technical-subcommittee-on-encoded-archival-description-ead/encoded-archival-description-ead
Our repository runs Fedora 4.2 and is fronted by a heavily customized
version of Sufia 6.0. Authentication is integrated with LDAP, and the
groups used for authorization are also sourced from LDAP. Furthermore,
certain metadata fields, such as ‘Creator’, only accept entries that can
be verified in LDAP, so that individuals are identified consistently. All
our software runs on CentOS 6.5, and we use Postgres as the relational
database.
We can process various feeds in batches, including EAD, to create
hierarchical collections. The Rails stack is served through Passenger and
Apache and includes various parts of the Hydra stack, custom RDF
vocabularies, an IIIF server, and Resque background workers.
Our code is open to the public on GitHub:
https://github.com/galterlibrary/digital-repository/tree/master/app.
Our main goal when customizing is to remain able to keep up with
upstream. If a change to the upstream code is needed, a pull request
should be opened upstream. All other changes are made by monkey-patching
the appropriate objects.
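The monkey-patching approach can be illustrated in a language-neutral way. The repository itself is a Ruby on Rails application, where this is usually done by reopening a class. The Python sketch below is only an analogy, and `UpstreamPresenter` is an invented class rather than anything from Sufia or Hydra.

```python
class UpstreamPresenter:
    """Stands in for a class shipped by an upstream library (hypothetical)."""

    def title(self, record):
        return record["title"]


# Local customization: keep a reference to the upstream behavior, then
# replace the method on the class without editing the upstream source.
_original_title = UpstreamPresenter.title


def title_with_collection(self, record):
    base = _original_title(self, record)
    collection = record.get("collection")
    return f"{base} ({collection})" if collection else base


UpstreamPresenter.title = title_with_collection

presenter = UpstreamPresenter()
print(presenter.title({"title": "Letter", "collection": "Sample collection"}))
print(presenter.title({"title": "Letter"}))
```

Because the upstream file is never modified, an upgrade replaces it cleanly and the patch is reapplied on top. The trade-off is that a patch can break silently if upstream renames or changes the patched method.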
Starting from scratch – building the perfect digital repository
Galter Health Sciences Library, Feinberg School of Medicine, Northwestern University
Piotr Hebal, Violeta Ilik, and Kristi Holmes
