Research Data in eCommons @ Cornell: Present and Future
Wendy A. Kozlowski*, Dianne Dietrich, Gail Steinhart and Sarah Wright
Cornell University Library, Ithaca, NY 14853
*wak57@cornell.edu
Introduction
As funding agencies increasingly prioritize sharing of research data, the
role of institutional repositories (IRs) in housing this material is likely
to grow as well. By its very nature, data differs from the more traditional
material housed in IRs, such as publications, presentations, theses and
dissertations. Given these distinctions, optimizing eCommons to handle data
could help accommodate future data deposits. To evaluate what potential
eCommons users value in a repository for research data, we reviewed several
sources of researcher feedback collected at Cornell University and
elsewhere.
How well are we meeting researcher needs and where can we go from here?
[Figure: Submitter-designated item "types" in eCommons. Data as of 27 Mar 2013; n = 24778.]
Animations and Software: 33
Maps, Plans and Blueprints: 36
Datasets: 87
Recordings and Musical Scores: 169
Videos: 357
Learning Objects and Fact Sheets: 376
Presentations: 393
Books or Book Chapters: 471
Other (incl. Webpages and Websites): 898
Articles: 1293
Biographies and Interviews: 1528
Papers and Projects: 2144
Dissertations and Thesis: 3297
Technical Reports and Preprints: 3385
Images: 3562
Journals: 6749
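The item-type counts can be checked against the stated total and against the poster's statement that "dataset" items make up less than one half of one percent of content. The following is a minimal sketch; it assumes that the chart lists its labels and values in the same ascending order, so that "Datasets" pairs with 87. The dictionary and function names are illustrative only.

```python
# Counts transcribed from the item-type chart (data as of 27 Mar 2013).
# Assumption: labels and values appear in the same ascending order.
ITEM_TYPE_COUNTS = {
    "Animations and Software": 33,
    "Maps, Plans and Blueprints": 36,
    "Datasets": 87,
    "Recordings and Musical Scores": 169,
    "Videos": 357,
    "Learning Objects and Fact Sheets": 376,
    "Presentations": 393,
    "Books or Book Chapters": 471,
    "Other (incl. Webpages and Websites)": 898,
    "Articles": 1293,
    "Biographies and Interviews": 1528,
    "Papers and Projects": 2144,
    "Dissertations and Thesis": 3297,
    "Technical Reports and Preprints": 3385,
    "Images": 3562,
    "Journals": 6749,
}


def share(counts, key):
    """Return the fraction of all items that carry the given item type."""
    return counts[key] / sum(counts.values())


total_items = sum(ITEM_TYPE_COUNTS.values())
dataset_share = share(ITEM_TYPE_COUNTS, "Datasets")

print(f"total items: {total_items}")
print(f"dataset share: {dataset_share:.2%}")
```

Under that pairing the counts sum to the reported n of 24778, and the dataset share comes out below 0.5%.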
[Figure: eCommons submissions by year, 2002 to 2012. Series: Total Items Added; Item Type "Dataset" Additions. Axes: 0 to 6000 and 0 to 45.]
What does Cornell have?
Cornell University Library's IR, eCommons, is a DSpace-powered repository
for materials in digital formats that may be useful for educational,
scholarly, research or historical purposes. eCommons accepts research data
with file sizes up to 1 GB and individual collection sizes up to 10 GB
annually. By default, material is openly accessible via the web; in certain
situations, access can be restricted to members of the Cornell community
only and/or embargoed for a maximum of 5 years. Entries are assigned a
persistent identifier (www.handle.net), and the CU Library is committed to
preservation and to assuring long-term access to contents. Upon deposit,
users can assign an item type; presently, "dataset" items represent less
than one half of one percent of total content (see figures above). Dataset
entries can be collections of multiple files; the distribution of dataset
file types is shown below.
[Figure: Entry type "dataset" file extensions (counts)]
.wav: 4602
.csv: 56
.txt: 50
.pdf: 46
.doc: 20
.xls: 14
.qsf: 1
.wb2: 1
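To see how strongly one format dominates the dataset entries, the extension counts from the chart can be turned into proportions. This is a minimal sketch; it assumes the numbers in parentheses are file counts per extension, and the variable names are illustrative.

```python
# File-extension counts transcribed from the "dataset" file extensions chart.
# Assumption: each number is a count of files with that extension.
DATASET_FILE_COUNTS = {
    ".wav": 4602,
    ".csv": 56,
    ".txt": 50,
    ".pdf": 46,
    ".doc": 20,
    ".xls": 14,
    ".qsf": 1,
    ".wb2": 1,
}

total_files = sum(DATASET_FILE_COUNTS.values())

# Proportion of all dataset files for each extension.
shares = {ext: n / total_files for ext, n in DATASET_FILE_COUNTS.items()}

# Print extensions from most to least common.
for ext, fraction in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ext:>5}  {DATASET_FILE_COUNTS[ext]:>5}  {fraction:.1%}")
```

Under this reading, audio (.wav) files account for the large majority of files in dataset entries, while the tabular and text formats together are a small fraction.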
What do researchers want?
[Figure: repository features prioritized by researchers, in two panels; horizontal axis 0 to 8]
Panel 1:
Standardized metadata
Ability of general public to easily find the dataset
Documentation of changes made to the dataset…
Citation requirement for others when using dataset
Version control
Data citation tracking
Ability to cite the dataset in publications
Discovery of the dataset using Internet search…
A basic, public description of and link to the data
Panel 2:
Access restrictions
Ability of others to comment or annotate
Usage/access statistics
Track and show user comments
Batch upload
Self-submission
Connect to visualization or analytical tools
Easy transfer to permanent archive
Connect or merge data with other datasets
In the spring of 2012, eight faculty and staff from Cornell University (CU)
and Washington University in St. Louis were interviewed using a modified
Data Curation Profile (DCP) Toolkit [1]. Researchers from a variety of
disciplines were asked to prioritize features related to repository
functionality (shown above). Results are generally consistent with findings
from a 2011 faculty survey on data management needs [2], DCPs completed at
other institutions [3] and other studies on data sharing [4].
[1] https://datacurationprofiles.org
[2] http://dx.doi.org/10.7191/jeslib.2012.1008
[3] http://hdl.handle.net/1853/28509
[4] doi:10.1371/journal.pone.0021101
Table columns: Key IR functions likely to be helpful to researchers; Assessment of current eCommons support; Considerations for the future of eCommons at Cornell

Function: Discoverability via standard Internet search engines
Current support: Good, with some exceptions, such as incomplete indexing of large PDFs.
Future considerations: In addition to Internet discoverability, DSpace 3.1 will offer enhanced search and browse features within the IR; upgrade planned for summer 2013.

Function: Citation support (creation, export, tracking, etc.)
Current support: Not currently supported.
Future considerations: Explore creation of a suggested citation built in part from metadata; consider DOI assignment.

Function: Version control
Current support: Not currently supported.
Future considerations: Item-level versioning is supported in DSpace 3.1.

Function: Self-service submission
Current support: Available; current active registered users: 968 (564 have submitted).
Future considerations: The submission process may be further simplified using type-based metadata fields.

Function: Access control by data owners
Current support: Access can be limited to a CU subgroup, and limited embargoes are allowed.
Future considerations: Advanced embargo functionality is supported in DSpace 3.1.

Function: Infrastructure to allow for dataset updates (due to changes or addition of new data)
Current support: Datasets can be manually updated, but not without administrator support. Some datasets are updated by replacement, some by addition of new files.
Future considerations: Clearly articulated best practices for dataset updates should be developed and added to eCommons usage policies.

Function: Linking between data sets and related publications
Current support: Not currently supported.
Future considerations: DSpace does not allow for this functionality, but linkages using VIVO and a CU metadata repository (sites.google.com/site/datastarsite) are currently in development.
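The idea of "a suggested citation built in part from metadata" can be illustrated with a small sketch. This is not eCommons or DSpace code: the `ItemMetadata` fields, the citation layout, and the example values (including the placeholder handle) are all hypothetical and chosen only to show how a citation string might be assembled from fields a repository record typically holds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemMetadata:
    """Hypothetical subset of a repository record; field names are assumptions."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    identifier: str  # persistent identifier, e.g. a handle URL


def suggested_citation(item: ItemMetadata) -> str:
    """Assemble a citation string from metadata fields (illustrative layout only)."""
    authors = "; ".join(item.creators)
    return (
        f"{authors} ({item.year}). {item.title} [dataset]. "
        f"{item.publisher}. {item.identifier}"
    )


# Example record with placeholder values (not a real eCommons item).
example = ItemMetadata(
    creators=["Doe, J.", "Roe, R."],
    year=2013,
    title="Example Dataset",
    publisher="Example Repository",
    identifier="http://hdl.handle.net/0000/00000",
)
citation = suggested_citation(example)
print(citation)
```

A production version would need to handle missing fields and follow whichever citation style the repository adopts; if DOIs were assigned, the identifier field could carry the DOI instead of the handle.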

Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future
