NIH – Big Data to Knowledge
• What is BD2K?
• Why is NIH investing $100M in this?
• For information about BD2K – click here

The following slides are highlights and
notes from NIH workshop events

*Information contained here belongs to the author and is not an official viewpoint of the NIH or any other organization
Drivers behind the BD2K grant
• To meet the emerging needs of the biomedical research community
• To create a better research ecosystem
• NIH seeks to invest in ways to help researchers easily find, access, analyze, and curate research data

The Purpose of NIH’s Data Catalog Workshops
• To take steps, independently and in partnership with others, to enable a future state in which clinical data (including electronic health record data) are used effectively to conduct research and improve population health
• Workshop participants engage actively in the discussions, helping NIH develop plans, programs, and funding initiatives to implement BD2K

Challenges
• Data sharing among biomedical researchers is lacking
• There is no technical infrastructure for NIH-funded researchers to easily submit datasets associated with their work
• Those datasets are not available to other researchers
• There is little motivation to share data, since the most common current unit of academic credit is co-authorship in the peer-reviewed literature

NIH’s Goals for BD2K
• To advance basic and translational science by facilitating and enhancing the sharing of research-generated data
• To promote the development of new analytical methods and software for this emerging data
• To increase the workforce in quantitative science toward maximizing the return on the NIH’s public investment in biomedical research

NIH’s Goals for BD2K
• To improve the public’s ability to discover and access data resulting from federally funded research
• Researchers want visual analytics, and to build the database into a “social network” – being able to “friend” or “like” the data
The Model
• When the NIH created ClinicalTrials.gov in collaboration with the Food and Drug Administration (FDA) and medical journals, the resource enabled clinical research investigators to track ongoing or completed trials. Subsequent requirements to enter outcome data have added to its value.
• Establishing an analogous repository of molecular, phenotype, imaging, and other biomedical research data would be of great value to the biomedical research community.

NIH is looking for solutions
The development and implementation of analytical methods and software tools valuable to the research community follow a four-stage process:
• Prototyping within the context of targeted scientific research projects
• Engineering into robust software tools that provide appropriate user interfaces and data input/output features for effective community adoption and utilization
• Dissemination to the research community, a process that may require the availability of appropriate data storage and computational resources
• Maintenance and support to address users’ questions, community-driven requests for bug fixes, usability improvements, and new features
The Opportunity
• The training of future data scientists is at stake
• The creation of a platform for scientific communities to share data with citizen groups
• A new science – new discoveries and relationships across data
NIH Data Catalog: Future Vision
• Interoperation with other systems, interdisciplinary collaborations
• “Likes” and citation metrics helping to find relevant datasets
• Non-obvious relationship discovery
• Journals embed links within publications
• Enable learning: educational uses of data
• Return data to the community: patients too can access data
Search is Broken vs. Big Data
• Documents are not just containers for keywords.
• Objects & meanings relate to people, documents, snippets, tweets, journals, doctors, caregivers, patients.
• Search is about the keywords and ignores everything else.
www.ibm.com
Academic Publishing vs. Open Access
• August 2013 – Univ. of California approved open access standards for research on all campuses.
• 2012 – Harvard Library urged its 2,100 faculty to boycott for-profit academic research databases and instead submit articles to lower-cost open access journals.
• Also, the White House pledged $100 million to promote open access and to require all federally funded research to be free of charge.

Clinical Studies and Collaboration with Pharmaceutical Companies
• The real-world population is rarely reflected in the selected population of a single clinical trial data set. Combining and mining multiple data sets can produce a more holistic view, which is the standard that both patients and regulators expect therapies to be measured against.
• Pharma companies need to embrace the challenge of using combined data sets to uncover insights they did not previously have.
• This has the potential to benefit both the competing companies producing drugs and patients, who will have improved outcomes.
Solutions Profile
• There should be a system put in place by NIH/NLM for widespread sharing of data.
• Feedback: “we have the information, but we do not know how to use it.”
• A data system should be created to integrate data types, capture data, and create “space” for raw data.
BD2K Overview
• Investing in technology and tools needed to enable researchers to easily find, access, analyze, and curate research data.
• To increase the capacity of the workforce (both for experts and non-experts) and employ strategic planning to leverage IT advances for the entire NIH community.
• To serve the millions of Americans (citizen scientists) who may want to research their own disease history.

The Citizen Scientist
• 1 million users/patients download their health data; much of it is unreadable.
• Mashups occur to build apps to read health records.
• The biomedical research community is within a few years of the “thousand-dollar human genome needing a million-dollar interpretation.”

Editor's Notes

  • #2: Why is the NIH investing $100M at the intersection of data science and health research?