Biomedical Research as an Open Digital Enterprise
Philip E. Bourne, Ph.D.
Associate Director for Data Science
National Institutes of Health
http://www.slideshare.net/pebourne
A View from the Funding Agencies
“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair …” (Charles Dickens, A Tale of Two Cities)
A Tale of Two Numbers
[Chart] Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
We (the NIH) Are Working On, But As Yet Do Not Have Good Answers To:
1. Today, how much are we actually spending on data- and software-related activities?
2. How much should we be spending, relative to what we spend in other areas, to achieve the maximum benefit to biomedical science?
There are other drivers of change out there besides economics and an increasing emphasis on data and analytics.
47 of 53 “landmark” publications could not be replicated
[Begley & Ellis, Nature 483, 2012] [Carole Goble]
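As a quick check on the scale of the problem, the two numbers quoted above correspond to the following proportions (simple arithmetic on 47 and 53, not a figure taken from the paper itself):

```latex
\frac{47}{53} \approx 0.887 \;(\approx 89\%\ \text{not replicated}), \qquad \frac{53-47}{53} = \frac{6}{53} \approx 0.113 \;(\approx 11\%\ \text{replicated})
```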
Reproducibility
• Most of the 27 Institutes and Centers (ICs) of the NIH are currently reviewing the reproducibility of the research they fund
• The NIH recently convened a meeting with publishers to discuss the issue; a set of guiding principles emerged
Reproducibility – More is in the Works
• Much of the research life cycle is now digital, so we should encourage the reliability, accessibility, findability and usability of data, methods, narrative, publications, etc.
• How?
  – Data sharing plans
  – Standards frameworks
  – Data and software catalogs
  – PubMed Central
  ◦ The Commons – PMC for the complete life cycle
  ◦ Machine-readable data sharing plans
  ◦ Modest funding for communities
  ◦ Support for training and best practices in eScholarship
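A machine-readable data sharing plan could be as simple as a structured record that software can check. The sketch below is a hypothetical Python illustration only: the schema (`DataSharingPlan`, `DatasetCommitment`, the `open`/`controlled` access levels and the grant identifier) is an assumption made for the example, not an NIH-defined format.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date

# Hypothetical schema: these field names and access levels are illustrative
# assumptions, not an NIH-defined data sharing plan format.
ALLOWED_ACCESS = {"open", "controlled"}


@dataclass
class DatasetCommitment:
    description: str
    repository: str   # where the data will be deposited
    access: str       # one of ALLOWED_ACCESS
    release_by: date  # promised release date

    def __post_init__(self) -> None:
        # Reject access levels a machine could not interpret.
        if self.access not in ALLOWED_ACCESS:
            raise ValueError(f"unknown access level: {self.access!r}")


@dataclass
class DataSharingPlan:
    grant_id: str
    datasets: list[DatasetCommitment] = field(default_factory=list)

    def due_for_check(self, today: date) -> list[str]:
        """Datasets whose promised release date has passed, so that
        compliance with the plan can be verified."""
        return [d.description for d in self.datasets if d.release_by < today]

    def to_json(self) -> str:
        # date objects are not JSON-serializable, so fall back to str(),
        # which yields ISO format such as "2015-06-30".
        return json.dumps(asdict(self), default=str, indent=2)


plan = DataSharingPlan("HYPOTHETICAL-GRANT-001", [
    DatasetCommitment("RNA-seq counts", "GEO", "open", date(2015, 6, 30)),
    DatasetCommitment("Patient phenotypes", "dbGaP", "controlled", date(2016, 1, 31)),
])
print(plan.due_for_check(date(2015, 12, 1)))  # ['RNA-seq counts']
```

Because the plan is structured data rather than free text, a funder could list which commitments have passed their promised release date and then verify deposition; how that verification would actually work is left open here.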
Growth as Another Driver
• Evidence:
  – Google car
  – 3D printers
  – Waze
  – Robotics
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, by Erik Brynjolfsson & Andrew McAfee
To Summarize Thus Far …
A time of great (unprecedented?) scientific development but limited funding
A time of upheaval in the way we do science
From a funder's perspective…
A time to squeeze every cent/penny to maximize the amount of research that can be done
A time when top-down approaches meet bottom-up approaches
Top Down vs Bottom Up
• Top Down
  – Regulations, e.g. US: Common Rule, FISMA, HIPAA
  – Data sharing policies
    • OSTP
    • GWAS
    • Genome data
    • Clinical trials
  – Digital enablement
  – Moves towards reproducibility
• Bottom Up
  – Communities emerge and crowdsource
    • Collaboration
    • Data sharing
    • Open-source software
    • Common principles
    • Standards
And Considering This Audience…
It was the age when software developers were in the greatest demand for science.
It was the age when the rewards outside academia were greater than the rewards inside.
Optimistically, This is a Time of Opportunity
• The time for software developers is here
• The time to develop new business models is here
• The time to foster best software practices is here
• …
Okay, so what are we doing about it?
To start with, we are thinking about the complete research life cycle.
The Research Life Cycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools and Resources Will Continue To Be Developed
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
Those Elements of the Research Life Cycle Need to Become More Interconnected Around a Common Framework
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
Commercial & Public Tools, Git-like Resources By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training
What are we proposing as that common framework?
The Commons Is …
• A public/private partnership
• An agile development effort, starting with the evaluation of a few pilots
• An example: porting dbGaP to the cloud
• An experiment with new funding strategies
What The Commons Is and Is Not
• Is Not:
  – A database
  – Confined to one physical location
  – A new large infrastructure
  – Owned by any one group
• Is:
  – A conceptual framework
  – Analogous to the Internet
  – A collaboratory
  – A few shared rules
    • All research objects have unique identifiers
    • All research objects have limited provenance
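The two shared rules can be sketched as a data structure: every research object carries a unique identifier plus a minimal provenance record pointing at the objects it was derived from. This is a hypothetical Python illustration; the `ro:` UUID identifier scheme and the `Provenance` fields are assumptions (a real system might use DOIs or another persistent identifier scheme).

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Minimal provenance: who produced the object, when, and from which inputs."""
    creator: str
    created_at: datetime
    derived_from: tuple[str, ...] = ()  # identifiers of input research objects


@dataclass(frozen=True)
class ResearchObject:
    kind: str   # e.g. "dataset", "software", "narrative"
    title: str
    provenance: Provenance
    # Hypothetical identifier scheme: a random UUID with an "ro:" prefix.
    identifier: str = field(default_factory=lambda: f"ro:{uuid.uuid4()}")


def lineage(obj_id: str, registry: dict[str, ResearchObject]) -> list[str]:
    """Follow derived_from links depth-first and return every identifier reached."""
    seen: list[str] = []
    stack = [obj_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(registry[current].provenance.derived_from)
    return seen


raw = ResearchObject("dataset", "Raw reads",
                     Provenance("Lab A", datetime(2014, 9, 1, tzinfo=timezone.utc)))
calls = ResearchObject("dataset", "Variant calls",
                       Provenance("Lab B", datetime(2014, 10, 1, tzinfo=timezone.utc),
                                  derived_from=(raw.identifier,)))
registry = {obj.identifier: obj for obj in (raw, calls)}
print(lineage(calls.identifier, registry))  # the calls identifier, then the raw one
```

Keeping provenance minimal (creator, time, inputs) is still enough to reconstruct lineage across objects held in different places, which fits the idea of the Commons as a conceptual framework rather than a single database.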
Sustainability and Sharing: The Commons
[Diagram. Recoverable labels:]
Data: The Long Tail; Core Facilities/HS Centers; Clinical/Patient
The Why: Data Sharing Plans
The How: Data Discovery Index; Sustainable Storage; Quality; Scientific Discovery; Usability; Security/Privacy
Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment
The End Game: Knowledge
Also shown: Government; NIH Awardees; Private Sector; Rest of Academia; BD2K Centers; Metrics/Standards; Software Standards Index; Cloud; Research Objects
What Does the Commons Enable?
• Dropbox-like storage
• The opportunity to apply quality metrics
• Bringing compute to the data
• A place to collaborate
• A place to discover
Image: http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
[Adapted from George Komatsoulis]
One Possible Commons Business Model
HPC, Institution …
Commons Pilots
• Define a set of use cases emphasizing:
  – Openness of the system
  – Support for basic statistical analysis
  – Embedding of existing applications
  – API access to existing resources
• Evaluate against the use cases
• Review results & business model with NIH leadership
• Design a pilot phase with various groups
• Conduct the pilot for 6–12 months
• Evaluate outcomes and determine whether a wider deployment makes sense
• Report to NIH leadership in summer 2015
What Will Software Development Look Like in the Commons?
• Software identifiers make software:
  – Easy to find
  – Easy to use
  – Easy to cite
• Which means we:
  – Need a standard citation scheme
  – Must encourage publishers to use it
  – Need a software index that facilitates the above AND
    • Provides metrics of use
    • Allows commentary
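A toy software index illustrates how identifiers tie together finding, citing, measuring use and commentary. Everything here is hypothetical: the `sw:` identifiers, the citation format (a stand-in for the standard scheme the slide calls for), and the choice to count each citation as one "use".

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SoftwareEntry:
    identifier: str   # hypothetical identifier, e.g. "sw:0001"
    title: str
    version: str
    authors: list[str]
    year: int
    comments: list[str] = field(default_factory=list)


class SoftwareIndex:
    """Toy index: find software, cite it, count uses, and attach commentary."""

    def __init__(self) -> None:
        self._entries: dict[str, SoftwareEntry] = {}
        self._uses: Counter[str] = Counter()

    def register(self, entry: SoftwareEntry) -> None:
        # Identifiers must be unique, otherwise citations become ambiguous.
        if entry.identifier in self._entries:
            raise ValueError(f"duplicate identifier: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def find(self, text: str) -> list[str]:
        """Case-insensitive substring search over titles; returns identifiers."""
        needle = text.lower()
        return [i for i, e in self._entries.items() if needle in e.title.lower()]

    def cite(self, identifier: str) -> str:
        """Format a citation (hypothetical style) and record it as one use."""
        e = self._entries[identifier]
        self._uses[identifier] += 1
        return f"{', '.join(e.authors)} ({e.year}). {e.title}, version {e.version}. {identifier}"

    def comment(self, identifier: str, text: str) -> None:
        self._entries[identifier].comments.append(text)

    def uses(self, identifier: str) -> int:
        return self._uses[identifier]


index = SoftwareIndex()
index.register(SoftwareEntry("sw:0001", "ExampleAligner", "1.2", ["A. Author", "B. Author"], 2014))
print(index.find("aligner"))   # ['sw:0001']
print(index.cite("sw:0001"))   # A. Author, B. Author (2014). ExampleAligner, version 1.2. sw:0001
index.comment("sw:0001", "Works well on short reads.")
print(index.uses("sw:0001"))   # 1
```

Counting citations is only one possible use metric; downloads or executions within the Commons would be alternatives.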
Minimal Software Specification
• Title
• Version
• License
• Links to source
• Human-readable synopsis
• Author names and affiliations
• Ontological terms describing the software
• Dependencies
• Acknowledgements
• Publications
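The minimal specification maps naturally onto a metadata record. In this hypothetical Python sketch the field names and the "Name (Affiliation)" author format are assumptions; `missing_fields()` lists empty entries that a curator might follow up on.

```python
import json
from dataclasses import dataclass, field, fields, asdict


@dataclass
class MinimalSoftwareSpec:
    # One field per item in the specification; the Python names are assumptions.
    title: str
    version: str
    license: str
    source_links: list[str]
    synopsis: str        # human-readable synopsis
    authors: list[str]   # assumed format: "Name (Affiliation)"
    ontology_terms: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    acknowledgements: list[str] = field(default_factory=list)
    publications: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields left empty (blank string or empty list), in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


spec = MinimalSoftwareSpec(
    title="ExampleTool",
    version="0.1.0",
    license="MIT",
    source_links=["https://example.org/exampletool"],
    synopsis="Counts k-mers in sequencing reads.",
    authors=["A. Author (Example University)"],
)
print(spec.missing_fields())
# ['ontology_terms', 'dependencies', 'acknowledgements', 'publications']
```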
Examples of Folks We Want to Engage
• Other funding agencies – national and international
• Open Science Framework https://osf.io/
• Evernote https://evernote.com/
• SimTK https://simtk.org/xml/index.xml
• myExperiment http://www.myexperiment.org/
• Galaxy http://galaxyproject.org/
• Lab notebook systems
• Other systems already used by NIH
Putting it all together in a coherent strategy…
Associate Director for Data Science
The Biomedical Research Digital Enterprise
[Diagram. Recoverable content:]
Deliverable        Programmatic Theme
Commons            Sustainability*
Training Center    Education*
BD2K               Innovation*
Modified Review    Process
Example features: Cloud – Data & Compute; Search; Security; Reproducibility Standards; App Store; Coordinate; Hands-on; Syllabus; MOOCs; Community; Centers; Training Grants; Catalogs; Standards; Analysis; Data Resource Support; Metrics; Best Practices; Evaluation; Portfolio Analysis
Also shown: Communication; Collaboration; ICs; Researchers; Federal Agencies; International Partners; Computer Scientists; Scientific Data Council; External Advisory Board
* Hires made
BD2K – Commons Users
• Centers of Excellence in Data Science (awards 9/14)
• Data Discovery Index Consortium (award 9/14)
• Training grants (awards 9/14)
• Software development (awards 2015)
• Standards framework (awards 2015)
• Software index consortium (award 2015)
• Awards next year: ~$100M
Mission Statement
To foster an ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability
Some Acknowledgements
• Eric Green & Mark Guyer (NHGRI)
• Jennie Larkin (NHLBI)
• Leigh Finnegan (NHGRI)
• Vivien Bonazzi (NHGRI)
• Michelle Dunn (NCI)
• Mike Huerta (NLM)
• David Lipman (NLM)
• Jim Ostell (NLM)
• Andrea Norris (CIT)
• Peter Lyster (NIGMS)
• The more than 100 folks on the BD2K team


Editor's Notes

  • #2: 1 hr
  • #8: Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124; http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328
  • #14: FISMA: Federal Information Security Management Act of 2002. HIPAA: Health Insurance Portability and Accountability Act of 1996.