The Role of Automated Function Prediction in
the Era of Big Data and Small Budgets
Philip E. Bourne Ph.D.
Associate Director for Data Science
National Institutes of Health
A View from the Funding Agencies
“It was the best of times, it was the
worst of times, it was the age of
wisdom, it was the age of foolishness,
it was the epoch of belief, it was the
epoch of incredulity, it was the season
of Light, it was the season of
Darkness, it was the spring of hope, it
was the winter of despair …”
Roughly translated…
A time of great (unprecedented?)
scientific development but limited
funding
A time of upheaval in the way we do
science
From a funder's perspective…
A time to squeeze every cent/penny to
maximize the amount of research that
can be done
A time when top-down approaches
meet bottom-up approaches
Top Down vs Bottom Up
▪ Top Down
– Regulations e.g. US:
Common Rule, FISMA,
HIPAA
– Data sharing policies
• GWAS
• Genome data
• Clinical trials
– Digital enablement
– Moves towards
reproducibility
▪ Bottom Up
– Communities emerge
and crowdsource
• Collaboration
• Data shared
• Open source
software
• Common principles
• Standards
A Time for New Models
Source: Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
And This May Just be the Beginning
▪ Evidence:
– Google car
– 3D printers
– Waze
– Robotics
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
Consider This an Opportunity
▪ Look at the value of
data
▪ Derive new business
models
▪ Look for new
efficiencies
▪ Foster best practices
▪ Foster collaboration
▪ …
It is the age when functional
annotation is in the greatest demand
for science…
It is the age when the rewards outside
academia are greater than the rewards
inside
Associate Director for Data Science
Commons (Sustainability*)
• Cloud – Data & Compute
• Search
• Security
• Reproducibility Standards
• App Store
Training Center (Education*)
• Coordinate
• Hands-on
• Syllabus
• MOOCs
• Community
BD2K (Innovation*)
• Centers
• Training Grants
• Catalogs
• Standards
• Analysis
Modified Review (Process)
• Data Resource Support
• Metrics
• Best Practices
• Evaluation
• Portfolio Analysis
The Biomedical Research Digital Enterprise
Communication
Collaboration
Programmatic Theme
Deliverable
Example Features
• ICs
• Researchers
• Federal
Agencies
• International
Partners
• Computer
Scientists
Scientific Data Council
External Advisory Board
* Hires made
Innovation – Big Data to Knowledge
BD2K
▪ Centers of excellence
▪ Software catalog
▪ Data catalog
▪ Software initiatives
▪ Standards
▪ Training
bd2k.nih.gov
Sustainability and Sharing: The Commons
Data
The Long Tail
Core Facilities/HS Centers
Clinical /Patient
The Why:
Data Sharing Plans
The
Commons
Government
The How:
Data
Discovery
Index
Sustainable
Storage
Quality
Scientific
Discovery
Usability
Security/
Privacy
Commons == Extramural NCBI == Research Object Sandbox == Collaborative Environment
The End Game:
Knowledge
NIH Awardees
Private
Sector
Metrics/
Standards
Rest of
Academia
Software Standards
Index
BD2K
Centers
Cloud, Research Objects,
What The Commons Is and Is Not
▪ Is Not:
– A database
– Confined to one physical
location
– A new large
infrastructure
– Owned by any one group
▪ Is:
– A conceptual framework
– Analogous to the Internet
– A collaboratory
– A few shared rules
• All research objects
have unique
identifiers
• All research objects
have limited
provenance
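The two shared rules above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical in-memory index: the names ResearchObject, Commons, register, and lineage are invented for this sketch and do not describe any actual NIH system. Each object gets a unique identifier and carries a small provenance record (creator, creation time, optional parent).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional
import uuid


@dataclass(frozen=True)
class ResearchObject:
    """Hypothetical research object: a dataset, tool, or annotation set."""
    title: str
    creator: str                        # limited provenance: who produced it
    derived_from: Optional[str] = None  # limited provenance: parent identifier, if any
    # Rule 1: every research object has a unique identifier.
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Commons:
    """Hypothetical in-memory index that enforces the two shared rules."""

    def __init__(self) -> None:
        self._objects: Dict[str, ResearchObject] = {}

    def register(self, obj: ResearchObject) -> str:
        # Reject duplicate identifiers and dangling provenance links.
        if obj.identifier in self._objects:
            raise ValueError(f"identifier already in use: {obj.identifier}")
        if obj.derived_from is not None and obj.derived_from not in self._objects:
            raise ValueError(f"unknown parent object: {obj.derived_from}")
        self._objects[obj.identifier] = obj
        return obj.identifier

    def lineage(self, identifier: str) -> List[str]:
        """Follow derived_from links from an object back to its origin."""
        chain: List[str] = []
        current: Optional[str] = identifier
        while current is not None:
            chain.append(current)
            current = self._objects[current].derived_from
        return chain
```

For example, registering a raw dataset and then an annotation set with derived_from set to the dataset's identifier lets lineage() return both identifiers, newest first. Nothing here requires a single database or physical location; any store that honors the two rules would do.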
What Does the Commons Enable?
▪ Dropbox-like storage
▪ The opportunity to apply quality metrics
▪ Bring compute to the data
▪ A place to collaborate
▪ A place to discover
http://100plus.com/wp-content/uploads/Data-Commons-3-1024x825.png
[Adapted from George Komatsoulis]
One Possible Commons Business Model
HPC, Institution …
What Are the Benefits to Those Doing
Functional Annotation?
▪ Open environment in which to test new ideas – better
for crowdsourcing
▪ Opportunity to gain resources to run annotation
pipelines
▪ Opportunity to collaborate through provision of open
APIs
▪ Better characterization of, and access to, annotation
methods
Commons Pilots
▪ Define a set of use cases emphasizing:
– Openness of the system
– Support for basic statistical analysis
– Embedding of existing applications
– API support into existing resources
▪ Evaluate against the use cases
▪ Review results & business model with NIH leadership
▪ Design a pilot phase with various groups
▪ Conduct pilot for 6-12 months
▪ Evaluate outcomes and determine whether a wider
deployment makes sense
▪ Report to NIH leadership summer 2015
Some Acknowledgements
▪ Eric Green & Mark Guyer (NHGRI)
▪ Jennie Larkin (NHLBI)
▪ Leigh Finnegan (NHGRI)
▪ Vivien Bonazzi (NHGRI)
▪ Michelle Dunn (NCI)
▪ Mike Huerta (NLM)
▪ David Lipman (NLM)
▪ Jim Ostell (NLM)
▪ Andrea Norris (CIT)
▪ Peter Lyster (NIGMS)
▪ All of the 100-plus folks on the BD2K team
NIH…
Turning Discovery Into Health



Editor's Notes

  • #7: FISMA: Federal Information Security Management Act of 2002. HIPAA: Health Insurance Portability and Accountability Act of 1996.