DSpace: Technical Basics

Iryna Kuchma
Open Access Programme Manager
Open Access and the Evolving Scholarly Communication
Environment workshop, July 11, 2012, Makerere University


www.eifl.net
Attribution 3.0 Unported
Application Architecture

The DSpace system is organised into three tiers
which consist of a number of components




Each layer only invokes the layer below it, i.e. the
application layer may not use the storage layer
directly
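The layering rule can be illustrated with a minimal Python sketch. The class and method names below (StorageLayer, BusinessLogicLayer, ApplicationLayer, save, deposit, handle_request) are hypothetical and are not DSpace APIs; the sketch only shows each tier calling the tier directly beneath it.

```python
class StorageLayer:
    """Bottom tier: holds metadata and content (hypothetical in-memory store)."""

    def __init__(self):
        self._items = {}

    def save(self, item_id, metadata):
        self._items[item_id] = dict(metadata)

    def load(self, item_id):
        return self._items[item_id]


class BusinessLogicLayer:
    """Middle tier: applies archive rules, then delegates to storage."""

    def __init__(self, storage):
        self._storage = storage

    def deposit(self, item_id, metadata):
        if not metadata.get("title"):
            raise ValueError("an item needs a title")
        self._storage.save(item_id, metadata)

    def describe(self, item_id):
        return self._storage.load(item_id)["title"]


class ApplicationLayer:
    """Top tier: talks to the outside world and only calls business logic."""

    def __init__(self, logic):
        self._logic = logic

    def handle_request(self, item_id):
        return "Item {}: {}".format(item_id, self._logic.describe(item_id))


storage = StorageLayer()
logic = BusinessLogicLayer(storage)
app = ApplicationLayer(logic)
logic.deposit("1", {"title": "Example thesis"})
print(app.handle_request("1"))  # Item 1: Example thesis
```

Note that ApplicationLayer never receives a reference to StorageLayer, so it cannot bypass the business logic tier.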
The Storage Layer




The storage layer is responsible for physical
storage of metadata and content

DSpace uses a relational database to store all
information about the organization of content,
metadata about the content, information about
e-people and authorization, and the state of
currently-running workflows.
The Business Logic Layer




The business logic layer deals with managing
the content of the archive, users of the archive
(e-people), authorization, and workflow
The Application Layer




The application layer contains components
that communicate with the world outside of the
individual DSpace installation, for example the
Web user interface and the Open Archives
Initiative Protocol for Metadata Harvesting (OAI-PMH) service.
The DSpace Web UI is the largest and most-used
component in the application layer. It comes in two
versions:
1.   JSPUI: Built on Java Servlet and JavaServer Pages (JSP)
     technology
2.   XMLUI (Manakin): Built on XML and Apache Cocoon technology
Server Architecture


                User Interface
            Web Application Server




These systems may reside on a single server or
be hosted separately on dedicated servers
Structural Overview

DSpace is split into three directory trees:
Source Directory [dspace-src]
   Surprisingly, this is where the source code resides
Install Directory [dspace]
   Populated during install and during normal operation
   Contains:
      Configuration files
      Command line tools
      Libraries
      DSpace archive (depending on configuration)

Web Deployment Directory [tomcat]/webapps/dspace
   Contains the JSPs, Java classes and libraries
   necessary to run DSpace
Persistent Identifiers

The use of location-based identifiers such as the
Uniform Resource Locator (URL) often leads to
problems accessing resources over time.
Often when accessing a resource via a hyperlink,
users receive a “404 - page not found” error.
Persistent identifiers are an attempt at solving the
issues surrounding resource identification and
long-term preservation.
A persistent identifier allows the resource to be
uniquely identified in a way that will not change if
the resource is renamed or relocated.
Persistent Identifiers

This means that a resource can be reliably
referenced for future access by humans and
software
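The idea behind persistence can be sketched as a lookup table that sits between the identifier and the current location. This is only an illustration: the Resolver class, the identifier "example/1" and the example.org URLs are made up and do not describe how any particular identifier system is implemented.

```python
class Resolver:
    """Maps persistent identifiers to the current location of a resource."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        # Only the mapping changes; the identifier itself stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        return self._locations[identifier]


resolver = Resolver()
resolver.register("example/1", "http://old.example.org/item/1")
resolver.relocate("example/1", "http://new.example.org/records/1")
print(resolver.resolve("example/1"))  # http://new.example.org/records/1
```

Anyone who cited "example/1" before the move still reaches the resource afterwards, provided the organisation keeps the mapping up to date.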

Caveat: Persistence is heavily dependent on
organisational policy, i.e. the persistence of an object is
only effective if an organisation maintains and
manages it

Different systems in use for persistent identifiers
 Persistent Uniform Resource Locators (PURLs)
 Digital Object Identifiers (DOI)
 Handle – Used by DSpace
The Handle

     In a handle system, the address of a resource is identified by a
      unique handle assigned by a common registration service


                     http://hdl.handle.net/2160/568


    Registration Service     Handle Prefix   Local Identifier
    http://hdl.handle.net    2160            568
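The three parts of a handle address can be separated with a few lines of Python. This is a minimal sketch using only the standard library; the function name split_handle_url is hypothetical.

```python
from urllib.parse import urlparse


def split_handle_url(url):
    """Split a handle URL into (registration service, prefix, local identifier)."""
    parsed = urlparse(url)
    prefix, sep, local_id = parsed.path.lstrip("/").partition("/")
    if not (parsed.netloc and prefix and sep and local_id):
        raise ValueError("not a handle URL: {!r}".format(url))
    service = "{}://{}".format(parsed.scheme, parsed.netloc)
    return service, prefix, local_id


print(split_handle_url("http://hdl.handle.net/2160/568"))
# ('http://hdl.handle.net', '2160', '568')
```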
Practical: Using a Handle

 Navigate to Aberystwyth’s DSpace repository – Cadair
 Select an item from a collection and note the handle
  address
 Open this address in a new browser window
 The handle will resolve and redirect back to your original
  item
Configuring the Handles service

Out of the box, a DSpace installation will use the
default handle prefix:
                  hdl:123456789
Handles under this prefix aren't really Handles, since the global
Handle system doesn't actually know about them

3 Steps to handle configuration
Configuring the Handles service

To use Handles in DSpace, you must register for
a prefix with the Corporation for National
Research Initiatives (CNRI)

How to register with CNRI?
 Complete the registration form on the CNRI website
 Create the sitebndl.zip and upload it to CNRI
 Pay a small annual fee



http://www.handle.net/service_agreement.html
Generating the sitebndl.zip

The Site Bundle is an archive which contains
information about your DSpace installation and is
used to generate your handle.
To generate the sitebndl.zip, run the command:
   [dspace]/bin/dsrun net.handle.server.SimpleSetup [dspace]/handle-server
You will be asked to answer a series of
questions.
Once completed, the sitebndl.zip can be found at:
   [dspace]/handle-server/sitebndl.zip
Complete the registration and upload the
sitebndl.zip
Configuring the Handle Server

Once registration is complete, CNRI should return
your assigned handle prefix.
Edit [dspace]/handle-server/config.dct to
include the following lines in the “server_config” clause:
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"


Replace all references to YOUR_NAMING_AUTHORITY with
your assigned handle prefix, for example:
300:0.NA/YOUR_NAMING_AUTHORITY      -> 300:0.NA/2097
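The placeholder substitution can be sketched in Python as a plain text replacement. The function name apply_prefix is hypothetical, and the sample string contains only the one line shown on the slide, not a real config.dct.

```python
PLACEHOLDER = "YOUR_NAMING_AUTHORITY"


def apply_prefix(config_text, prefix):
    """Return config_text with every placeholder replaced by the prefix."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    return config_text.replace(PLACEHOLDER, prefix)


# Hypothetical one-line excerpt; a real config.dct contains more than this.
sample = "300:0.NA/YOUR_NAMING_AUTHORITY"
print(apply_prefix(sample, "2097"))  # 300:0.NA/2097
```

In practice you would read config.dct from disk, keep a backup copy, and write the result back; those steps are left out to keep the sketch short.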
Updating the Handle Prefix

Edit [dspace]/config/dspace.cfg and update the
handle prefix (the handle.prefix property)




A restart of Tomcat will be required.
If items have already been deposited into DSpace,
their handles will need updating:
  [dspace]/bin/update-handle-prefix 123456789 YourHandle
Starting the Handle Server

Finally start the handle server
        [dspace]/bin/start-handle-server

A script will be required to start the handle
server automatically when the server boots

Once configured, handles should resolve as
demonstrated in the practical earlier in this module
Workflow scenarios
Scenario 1: Head of research

    I want to be able to see everything
    my researchers deposit for quality
              control purposes
Workflow scenarios
Scenario 2: Repository manager

    I want to approve everything that
     goes into the repository to make
   sure there are no copyright issues or
               bad metadata
Workflow scenarios
Scenario 3: Cataloguer

    I want to be able to see everything
    my researchers deposit for quality
              control purposes
The three workflows
DSpace has three workflow steps
1.   Accept/Reject Step
2.   Accept/Reject/Edit Metadata Step
3.   Edit Metadata Step


You can use any combination of the three
    Steps are worked through in order
Which might be used in each of the
previous scenarios?
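The rule that any combination of steps may be enabled, and that enabled steps are worked through in order, can be sketched as follows. The names WORKFLOW_STEPS and plan_workflow are hypothetical and do not correspond to DSpace code or configuration.

```python
# Step numbers and names as listed on the slide.
WORKFLOW_STEPS = {
    1: "Accept/Reject",
    2: "Accept/Reject/Edit Metadata",
    3: "Edit Metadata",
}


def plan_workflow(enabled_steps):
    """Return the names of the enabled steps in the order they are worked through."""
    unknown = set(enabled_steps) - set(WORKFLOW_STEPS)
    if unknown:
        raise ValueError("unknown workflow steps: {}".format(sorted(unknown)))
    return [WORKFLOW_STEPS[number] for number in sorted(set(enabled_steps))]


print(plan_workflow([3, 1]))  # ['Accept/Reject', 'Edit Metadata']
```

An empty list of enabled steps gives an empty plan, which in this sketch stands for a collection with no workflow at all.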
RSS feeds
RSS feeds
– Site level (all new items)
– Community level (new items in all contained
  collections)
– Collection level (new items in that collection)
Can be read in modern web browsers
Can be subscribed to in news reader
software
Alerts
Alerts
– Created by users
– Created for a collection
– Emails sent each day for new items
– Script must run daily:
   • [dspace]/bin/sub-daily
DSpace statistics
DSpace statistics:
– Collated from DSpace log files
– Reports generated daily (daily and monthly
  reports)
– http://dspace.example.com/dspace/statistics
   • Or via the Administer menu
– Can be private (must be logged in) or public
   • In dspace.cfg:
      – report.public = [true|false]
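As a rough illustration of how such a setting is read, the sketch below pulls report.public out of properties-style text. The function read_report_public and the sample text are hypothetical, and the parser is deliberately simplified (it ignores line continuations and other features of the properties format).

```python
def read_report_public(cfg_text, default=False):
    """Return the boolean value of report.public from properties-style text."""
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "report.public":
            return value.strip().lower() == "true"
    return default


# Hypothetical excerpt, not a full dspace.cfg.
sample_cfg = """
# statistics reports
report.public = true
"""
print(read_report_public(sample_cfg))  # True
```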
Statistics collected
The following statistics are collected
– General overview (e.g. number of items
  archived / number of item views / user logins)
– Archive Information (numbers of each type of
  item)
– Item view counts
– Actions performed
– Search terms used
Google Analytics
Google Analytics allows a richer and more
detailed suite of statistics
   •   Time visitors spent on the site
   •   Where they came from
   •   Terms they used in search engines to find items
   •   The geographic location of visitors
   •   How many pages they looked at
   •   Which pages they started and ended their visit on
– JSPUI requires a small code change; Manakin
  has a configurable option.
Credits
These slides have been produced re-using
The DSpace Course by:
– Stuart Lewis & Chris Yates
– Repository Support Project
    http://www.rsp.ac.uk/

– Part of the RepositoryNet
– Funded by JISC
    http://www.jisc.ac.uk/
Thank you! Questions?
