Biomedical Data Science:
We Are Not Alone
Philip E. Bourne PhD
peb6a@virginia.edu
https://www.slideshare.net/pebourne
July 26, 2023 ISMB Lyon France
Disclaimer
I am privileged to be
helping build a new
kind of school within a
traditional institution. I
have drunk my own
Kool-Aid
https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist)
https://www.microsoft.com/en-us/research/wp-
content/uploads/2009/10/Fourth_Paradigm.pdf
https://twitter.com/aip_publishing/status/856825353645559808
Science Drivers Over the
Millennia
The Human Genome was the Tipping Point
and Led the Way
http://www.ornl.gov/hgmis
• High throughput DNA digital data changed how
we think about biomedicine
• Spawned a new field – bioinformatics /
computational biology/ systems biology /
biomedical data science
• Spawned a multibillion-dollar industry
Is Bioinformatics Dead? PLOS Biology 2021
Bourne’s Timeline
(Apologies for the US Centricity)
1980s 1990s 2000s 2010s 2020s
The Discipline (Whatever it is Called)
Unknown Expt. Driven Emergent Over-sold A Service A Partner The Driver
Digital Data
Systems
Analytics
Design
Value
4 Pillars of Data Science
HPC Cloud GPUs
HMMs SVMs NNs CNNs LLMs
HIPAA Privacy Security HITECH
Mol Graphics Web 2.0 Dashboards
Basic Premise …..
We are at a new tipping point
Basic Premise …
“We need to be more aware than
ever of developments that may be
far outside our discipline that fall
under the broad topic of data
science. In short, we need to
become biomedical data
scientists.”
Stated another way, the
leadership role in data/informatics
afforded by the human genome
project no longer applies.
Data Science –
In 45+ Years in Academia I Have Never Seen Anything Like It
• It is a response to the digital transformation of
society
• It is touching every discipline (aka vertical)
• We can’t keep the students out of our classes
• Cause – large amounts of digital data
• Effect – interdisciplinarity, openness, translation,
search for responsibility and more
In summary, it is disruptive to current modes of biomedical research
Data Science
As a Driver It’s Just the Beginning….
https://zenodo.org/record/6497693
45 Members
Data scientist jobs are predicted to experience 36
percent growth between 2021 and 2031, according
to the US Bureau of Labor Statistics.
The global data science platform market size was
valued at USD 64.14 billion in 2021 and is projected
to grow from USD 81.47 billion in 2022 to USD
484.17 billion by 2029, exhibiting a CAGR of 29.0%
during the forecast period.
Data science is the fastest emerging field around the
globe.
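The market projection quoted above can be sanity-checked with a few lines of arithmetic. The sketch below assumes the standard compound annual growth rate formula and treats 2022 to 2029 as seven annual steps; the function name `cagr` is only an illustrative choice.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.29 means 29%)."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures quoted on the slide, in USD billions: 2022 -> 2029.
rate = cagr(81.47, 484.17, 2029 - 2022)
print(f"{rate:.1%}")  # prints 29.0%
```

The result agrees with the 29.0% CAGR stated on the slide, so the start value, end value, and growth rate are mutually consistent.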
Given these precedents about data and data
science we should start with a definition/framework
Big data and data science are like the Internet…
If I asked you to define them you would all say
something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
One Definition of Data Science –
The 4+1 Model (aka domains)
• Value – assuring societal
benefit
• Design - Communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
The Data Science Interplay
• Value + Design = Openness,
responsibility
• Value + Analytics = Human-centered
AI, algorithmic bias
• Value + Systems =
sustainability, access,
environmental impact
• Design + Analytics = literate
programming, visualization
• Design + Systems =
dashboards, engineering
design
• Analytics + Systems = ML
engineering
Thinking of data as a science unto itself is novel and controversial
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
With this definition let’s explore the
implications for biomedical research …
The 4+1 Model - Systems
• Value – assuring societal
benefit
• Design - Communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
Systems….
Science, 377 (6603).
DOI: 10.1126/science.abo5947
Systems….
• Need something akin to the electricity grid or banking system
• Need to consider data and methods as first-class data objects
• Examples: European Open Science Cloud (EOSC), the CS3MESH4EOSC Science
Mesh, the China Science and Technology (CST) Cloud, the African Open Science
Platform, the South African National Integrated Cyber Infrastructure System, the
Malaysia Open Science Platform, the Global Open Science Cloud (GOSC), the
Australian Research Data Commons (ARDC) Nectar Research Cloud, the Digital
Research Alliance of Canada (formerly known as the New Digital Research
Infrastructure Organization), and the Arab States Research and Education
Network.
• Problems span funding agencies; solutions do not
• There is a lack of public-private partnership
Analytics ….
AlphaGo – Take Home Messages
https://www.alphagomovie.com/
1. Even the programmers were
disquieted by creating
something better than any
human
2. AlphaGo made a move that no
human Go expert or
programmer anticipated
3. It takes a lot of resources to
defeat the world champion
Go has more moves than there are atoms in the universe
Proteins have ~20**300 combinations, also more than the
number of atoms in the universe
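To put the protein figure in perspective, here is a rough order-of-magnitude check. It assumes the slide's number means 20 standard amino acids at each of 300 positions, and it uses the commonly quoted estimate of about 10^80 atoms in the observable universe, which is an assumption and not a figure from the talk.

```python
import math

AMINO_ACIDS = 20               # standard amino acid alphabet
CHAIN_LENGTH = 300             # residues, as in 20**300 on the slide
ATOMS_IN_UNIVERSE_LOG10 = 80   # assumption: ~10^80 atoms, a rough estimate

sequences = AMINO_ACIDS ** CHAIN_LENGTH          # exact Python integer
log10_sequences = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

print(f"20**300 is about 10^{log10_sequences:.0f}")
print(f"digits in 20**300: {len(str(sequences))}")
print(log10_sequences > ATOMS_IN_UNIVERSE_LOG10)
```

Under these assumptions the sequence count is roughly 10^390, about 310 orders of magnitude beyond the atom estimate, which is why exhaustive enumeration is not an option.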
Science Games….
https://medium.com/proteinqure/welcome-into-the-fold-bbd3f3b19fdd
AlphaFold2 Makes Significant Leap
AlphaFold2
Numerical optimization – differentiable programming
Overall gradient descent trained to win CASP
Jumper et al., 2021. Nature, 596 (7873),
pp. 583-589
Transformer models using attention
Geometry invariant to
translation/rotation
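As a small illustration of what "invariant to translation/rotation" means, the sketch below checks that the distances between points do not change when the whole set is rotated and shifted. This is not AlphaFold2's mechanism; it only demonstrates the property, and the coordinates and helper names are made up for the example.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Distances between every pair of 3D points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]

def rotate_z_then_translate(points, angle, shift):
    """Rotate all points about the z-axis, then shift them rigidly."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]

# Hypothetical coordinates, chosen only for illustration.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 2.0, 1.1)]
moved = rotate_z_then_translate(coords, math.radians(40), (5.0, -3.0, 2.0))

before = pairwise_distances(coords)
after = pairwise_distances(moved)
invariant = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
print(invariant)
```

A representation built from quantities like these describes the shape of a structure independently of where it sits or how it is oriented in space.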
Logistics Behind the Win
● Nothing fundamentally new from an AI perspective
● Data Integration
● Collaboration not competition
● Engineering challenge beyond most labs
● Compute power beyond most labs
● Team size beyond most labs
● Worked with protein structure specialists
Downstream Implications
• Cooperation rather than competition
• Public-private partnership
• Translational possibilities are endless
• Made possible by curated open data
• Appreciate engineering
Scientific Implications
Exploration of Latent Space
Rethink fold space? Rethink classification schemes?
AI Analytics Across the Scientific Discovery
Process
From Yolanda Gil 2023 AI for Science Eds. Choudhary, Fox & Hey p699
The 4+1 Model - Design
• Value – assuring societal
benefit
• Design - Communication
of the value of data
• Systems – the means to
communicate and
convey benefit
• Analytics – models and
methods
• Practice – where
everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
Beyond data science the academic landscape
is changing….
https://doi.org/10.1038/sdata.2016.18
https://www.heliosopen.org/
The One Lever Left in Open
Scholarship is Academia Itself
Openness/FAIR
Data Science would not exist if it were not for open
data and methods. It would be wrong for us to take
and not give back
https://sparcopen.org/
https://datascience.virginia.edu/policies
Questions I Leave You With ….
• Are we indeed at a change point?
• Will biomedicine continue to lead data science?
• Do we need new models for doing science?
• Are we placing the right emphasis on our research
products, notably data and methods vs papers?
Questions?
Databases
organize data
around a project.
Data warehouses
organize the data
for an organization
Data commons
organize the data
for a scientific
discipline or field
Data
Warehouse
Data Ecosystems
How we think about our
infrastructure is important
Challenges
Fixed level of funding
Opportunities
data commons
Data commons co-locate data
with cloud computing
infrastructure and commonly
used software services, tools &
apps for managing, analyzing and
sharing data to create an
interoperable resource for the
research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, IEEE
Computing in Science & Engineering, 2016. Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818.
Systems
[Adapted from Bob Grossman]
But wait, the picture is more complicated….
Data Science versus Data Engineering – How
Much Emphasis Where?
A Data Integration Poster Child
Researcher and Assistant Professor of
Medicine Dr. Thomas Hartka, also a
current online Masters in Data Science
student, is combining two disparate
data sets—electronic health records
and DMV crash data—to save lives
after motor vehicle crashes.
“I enrolled in the MSDS program to
expand my research on automotive
safety. I have already used
techniques from classes in my work.
I hope to expand my research to
real-time analytics to improve
emergency room care.”
— Dr. Thomas Hartka, UVA School
of Medicine
Coming back to the question…
So we have a definition of data science and a set of
guiding principles. Where does this
take us?
Stated another way, what do we want to be
recognized for in 10 years?
https://pebourne.wordpress.com/
Research ethics
committees (RECs) review
the ethical acceptability
of research involving
human participants.
Historically, the principal
emphases of RECs have
been to protect
participants from physical
harms and to provide
assurance as to
participants’ interests and
welfare.*
[The Framework] is
guided by, Article 27
of the 1948 Universal
Declaration of Human
Rights. Article 27
guarantees the rights
of every individual in
the world "to share in
scientific
advancement and its
benefits" (including to
freely engage in
responsible scientific
inquiry)…*
Protect human
subject data
The right of human
subjects to benefit
from research.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
Data sharing with protections provides the evidence
so patients can benefit from advances in research.
Balance protecting human subject data
with open research that benefits
patients
[Adapted from Bob Grossman]
Value
Why Responsible Data Science?
• A defining feature
• A partnership between STEM, social
sciences and the humanities
• Where UVA has strength
Model
Transportability
Horizontal
Integration
Multi-scale
Integration
human
mouse
zebrafish
DNA
Gene/Protein
Network
Cell
Tissue
Organ
Body
Population
CNV SNP methylation
3D structure Gene
expression Proteomics
Metabolomics
Metabolic
Signaling
transduction
Gene
regulation
Hepatic Myoepithelial Erythrocyte
Epithelial Muscle Nervous
Liver Kidney Pancreas Heart
Physiologically based
pharmacokinetics
GWAS
Population
dynamics
Microbiota
From Harnessing Big Data for Systems Pharmacology 2017
https://doi.org/10.1146/annurev-pharmtox-010716-104659
Current roadblocks are more cultural than technical
The Fifth Paradigm: Integration Across Scales?
Gohlke et al. 2022
https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726
Real World Evidence for Preventive Effects of Statins on
Cancer Incidence: A Transatlantic Analysis
EHR
Animal Models
Pathways
Daily Challenges
• Deciding what not to do
• Competition for the best team members (faculty and staff)
• Establishing a diverse team
• Lack of a comprehensive enterprise-wide data infrastructure
• It’s easier to conform

Editor's Notes

  • #43: I will introduce the concept of data science with a story that illustrates citizen engagement, the merging of unexpected data, and societal benefit