Towards A Data Driven Understanding of Research Data
September 3, 2015
Montana State University, Research Council
Jerry Sheehan
Montana State University
Chief Information Officer
jsheehan@montana.edu
The “Consumerization” of Research Data
Trend 1: Costs and Capacity
• A “Consumer Effect” Has Pushed Prices Down While Increasing Performance.
• Users Can Easily Buy More Storage Than They Need.
• There Are No Enterprise Strategies for Research Data Discovery.
• No explicit way to inventory research data.
• Instruments have “bursty” behavior when they move data on the network.
Montana State University-Information Technology Center
“Commodity” Data Laboratory Equipment @ Montana State

Device                              Data Generation Per Run
Illumina Genomic Sequencer          0.5 Tb to 1 Tb
Confocal Microscope                 50-100 Gb
Transmission Electron Microscope    10-20 Gb
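
To make the “bursty” network behavior concrete, the sketch below estimates how long one run's output would take to move across a network link. It is an illustration under assumptions, not a measurement from the census: it reads the slide's “Tb” and “Gb” as terabytes and gigabytes, takes the upper end of each range, uses two hypothetical link speeds (1 Gbit/s and 10 Gbit/s), and ignores protocol overhead and competing traffic.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


TB = 10**12  # assumption: the slide's "Tb" means terabytes
GB = 10**9   # assumption: the slide's "Gb" means gigabytes

# Upper end of each per-run range from the equipment table.
runs = {
    "Illumina genomic sequencer": 1 * TB,
    "Confocal microscope": 100 * GB,
    "Transmission electron microscope": 20 * GB,
}

# Hypothetical link speeds; not taken from the slides.
links = {"1 Gbit/s": 1e9, "10 Gbit/s": 1e10}

for device, size in runs.items():
    for link_name, bits_per_second in links.items():
        minutes = transfer_seconds(size, bits_per_second) / 60
        print(f"{device}: {minutes:.1f} min at {link_name}")
```

Under these assumptions a single 1 TB sequencing run would occupy a 1 Gbit/s link for more than two hours, which is the kind of short, heavy burst the previous slide refers to.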
Research Data Census was a Three-Way Institutional Partnership
• Information Technology Center
• University Library
• Vice President for Research & Economic Development
Response Rates and Demographics
What Types of Research Data Do You Have?
How Do You Store Your Data?
How Large is Your Research Data?
Who Do You Share Your Data With and When?
Statistically Significant Findings
• Researchers who share their data, regardless of who they share it with (colleagues, students, or non-MSU
researchers), also tend to download data from other sources or repositories (78 percent of people sharing their
data also download data, versus 37 percent of people not sharing their data; p-value: 1.67 × 10⁻⁷).
• Researchers with large research data tend to download data from other sources or repositories (90 percent of
people with data sets above one terabyte also download data, versus 42 percent for people with data sets
below 10 Gb; p-value: 1.58 × 10⁻⁵).
• Researchers who back up their data also tend to annotate it (55 percent of people who back up their data
also annotate it, versus 22 percent of people who don't back up their data; p-value: 5 × 10⁻³).
• Researchers with large research data tend to annotate it (62 percent of people with data sets above one
terabyte also annotate their data, versus 39 percent of people with data sets below 10 Gb; p-value: 0.024).
• Researchers interested in learning more about data infrastructure and services who do not back up their data
cite technical barriers as their main reason for not doing so (p-value: 0.014).
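
The comparisons above each contrast two proportions, so a small sketch of how such a difference can be tested may help. The slides do not say which statistical test produced the reported p-values or how many respondents were in each group, so everything specific here is an assumption: the sketch uses Fisher's exact test as one common choice, and it invents a group size of 100 purely to turn the 78 percent and 37 percent figures into counts. Its output is therefore not expected to reproduce the p-value on the slide.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    observed = prob(a)
    # Sum every table at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


# HYPOTHETICAL group size: the slides report percentages only.
GROUP_SIZE = 100
sharers_who_download = 78      # 78 percent of the hypothetical 100 sharers
non_sharers_who_download = 37  # 37 percent of the hypothetical 100 non-sharers

p_value = fisher_exact_two_sided(
    sharers_who_download,
    GROUP_SIZE - sharers_who_download,
    non_sharers_who_download,
    GROUP_SIZE - non_sharers_who_download,
)
print(f"p-value under the assumed group sizes: {p_value:.3g}")
```

With the real respondent counts substituted for the hypothetical ones, the same function would give the exact-test p-value for any of the pairwise comparisons listed above.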
Qualitative Interview Findings
• Researchers don’t usually describe their data by size, although many know the exact size of their data. Instead,
their standard practice is to describe how they transfer the file (via email, placed on hard drives, put in cloud
services, etc.).
• Researchers' sense of when and how data is disseminated and shared varied widely.
• There is no common definition of “big data”. Definitions change between disciplines, and researchers build
“bigger data” by aggregating many small research results.
• Without exception, interviewees described their research practices as involving collaboration with others, both
inside and outside the institution.
• All researchers responded positively when asked if they would engage MSU Library services that focus on data
set annotation and metadata markup, assistance with deposit in relevant data repositories, and educational
programs and training on campus IT resources.
Impacts of the RDC
• Creation of a multi-stakeholder proposal ($500K) to the National Science Foundation for investment
in a science network for the Bozeman campus. PI: Jerry Sheehan, Co-PIs: Kenning Arlitsch, Ben
Poulter, Phil Stewart, and Mark Young.
• Input from the Research Data Census and the NSF Proposal is Driving FY16 Capital Investments for
Campus.
• New Collaboration between ITC and the Library to Bundle A Set of Data Services and Infrastructure
for the Montana State University Research Community.
• Formal Publication of Survey Results in On-Line Educause Review (Sept/Oct 2015).
• Modification of the Survey Instrument, Adoption of Instrument by Other MSU Campuses, and
Sharing of Instrument with Higher Education Community.
