What Does Open Data
Mean to Data Science?
Philip E. Bourne PhD, FACMI
Stephenson Chair of Data Science
Director, Data Science Institute
Professor of Biomedical Engineering
peb6a@virginia.edu
https://www.slideshare.net/pebourne
06/07/18 OA @ UNT 1
@pebourne
Let me answer the question with a story…
The case of the trauma surgeon…
May 22, 2018 2
Not convinced … try this one...
• Northern Virginia Technology Council announces a hackathon
• Teams compete internally – undergrads led by Daniel Mietchen and Pete Alonso
• Team selected and competes against GMU, VT, VCU
• UVA wins!
The Problem
Latest estimate – 20 vets commit suicide every day
In 2014, the latest year available, more than 7,400 veterans took their own lives, accounting for 18 percent of all suicides in America
Source: U.S. Department of Veterans Affairs, US Census Bureau, data.gov
Veteran Demographics
[Figure panels: Age Ranges; Period Served; Veteran Population (Total, Male, Female); Race Distribution]
Source: 2012, 2014, 2015 data acquired from U.S. Department of Veterans Affairs, US Census Bureau, Department of Defense
Veteran and General Suicide Victim Demographics
[Figure panels: Suicide Rate by Age; Suicide by Race; Suicide Rate by Gender (Total, Male, Female). Key: General Population, Veteran Population]
Source: 2012, 2014 data acquired from U.S. Department of Veterans Affairs, data.gov, 2014 Centers for Disease Control and Prevention reports
Correlation: Health Care by State
● ~2M veterans lack health insurance
● 42% unaware of VA benefits
● Complicated priority system (VA)
○ False PTSD diagnosis – est. 47,000 undiagnosed each year
Source: 2014 data acquired from Veterans Affairs and census.gov
Correlation: Social Isolation by State
[Figure: map by state. Key: Rural, Urban]
Source: 2014 data acquired from Veterans Affairs, data.gov and census.gov
Correlation: Social Isolation as Measured by State
[Figure: map by state. Key: Rural, Urban. Labeled states: Utah, Arizona, New Mexico, Nevada]
Source: 2014 data acquired from Veterans Affairs, data.gov and census.gov
Firearm Regulation

State       Total Gun Laws  Ammunition Regulations  Background Checks  Buyer Regulations  Dealer Regulations  Gun Trafficking
Arizona     11              0                       0                  0                  0                   0
Nevada      11              1                       0                  0                  0                   0
New Mexico  10              0                       0                  0                  0                   0
Utah        11              0                       0                  0                  0                   2
All States  26.5            0.72                    2.46               2.4                2.7                 0.76

Source: 2014 data acquired from Veterans Affairs, data.gov and census.gov
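One way to read the Firearm Regulation table is to compare each state's total with the "All States" figure. The sketch below does that in Python. It assumes the "All States" row is an average across states, which the slide does not state explicitly, and the names `TOTAL_GUN_LAWS`, `ALL_STATES_TOTAL` and `share_of_all_states` are illustrative, not taken from the hackathon team's analysis.

```python
# Values transcribed from the Firearm Regulation table on the slide.
TOTAL_GUN_LAWS = {
    "Arizona": 11,
    "Nevada": 11,
    "New Mexico": 10,
    "Utah": 11,
}

# Assumption: the "All States" row is an average across states.
ALL_STATES_TOTAL = 26.5


def share_of_all_states(total_laws: float, reference: float = ALL_STATES_TOTAL) -> float:
    """Return a state's total gun-law count as a fraction of the reference value."""
    return total_laws / reference


if __name__ == "__main__":
    for state, total in TOTAL_GUN_LAWS.items():
        print(f"{state}: {share_of_all_states(total):.0%} of the all-state figure")
```

Under that assumption, each of the four states has well under half the all-state number of gun laws.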
[Figure: Firearm Access, Veteran Suicide, Firearm Regulation]
Source: 2014 data acquired from U.S. Department of Veterans Affairs
Social Media: Mining APIs and URLs
[Figure: values 57.69% and 90.00%; label: Unsuccessful]
**Data collected using various Python data mining methods and respective social media APIs
Recommendations
• Community: content creation and mental health training
• Utilize social media data to increase outreach
• Have VA resources complement private health care
• Limit firearm possession based on mental health status
Open data is driving data science, which in turn is changing, and will change, the way we do everything…
Example - photography
Stages over time (axes: Time; Volume, Velocity, Variety): Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
• Digital camera invented by Kodak but shelved
• Megapixels & quality improve slowly; Kodak slow to react
• Film market collapses; Kodak goes bankrupt
• Phones replace cameras
• Instagram, Flickr become the value proposition
• Digital media becomes bona fide form of communication
From a presentation to the Advisory Board to the NIH Director
04/08/18 16
So what is the problem? …
There is lots of data, but it is hard to find and is not persistent …
We are not FAIR
• Digital assets (objects) within that system are data, software, narrative, course materials, etc.
• Assets are to varying degrees FAIR – Findable, Accessible, Interoperable and Reusable
https://www.workitdaily.com/job-search-solution/
FAIR: https://www.nature.com/articles/sdata201618
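To make "varying degrees FAIR" concrete, here is a minimal Python sketch that reduces each letter to one coarse yes/no check on a digital asset. This is a deliberate simplification for illustration: the published FAIR principles are far more detailed, and the `DigitalAsset` record, its fields, and the scoring function are all hypothetical, not part of any FAIR specification or tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalAsset:
    """Hypothetical record for a data set, software package, narrative, etc."""
    title: str
    persistent_id: Optional[str] = None  # e.g. a DOI (stands in for Findable)
    access_url: Optional[str] = None     # where it can be retrieved (Accessible)
    standard_format: bool = False        # uses a community format (Interoperable)
    license: Optional[str] = None        # stated terms of reuse (Reusable)


def fair_checks(asset: DigitalAsset) -> dict:
    """Return one coarse yes/no check per FAIR letter."""
    return {
        "Findable": asset.persistent_id is not None,
        "Accessible": asset.access_url is not None,
        "Interoperable": asset.standard_format,
        "Reusable": asset.license is not None,
    }


def fair_score(asset: DigitalAsset) -> float:
    """Fraction of the four coarse checks that pass (0.0 to 1.0)."""
    checks = fair_checks(asset)
    return sum(checks.values()) / len(checks)


if __name__ == "__main__":
    # Placeholder identifier and license, for illustration only.
    asset = DigitalAsset("Example data set", persistent_id="10.0000/example", license="CC-BY-4.0")
    print(fair_checks(asset))
    print(fair_score(asset))
```

An asset with an identifier and a license but no retrieval location or standard format passes two of the four checks, which is one simple way to express being FAIR only to a degree.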
6/19/17 18
There is lots of data, but it gets lost quickly
• Big Data
  • Total data from NIH-funded research currently estimated at 650 PB*
  • 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB this year
• Dark Data
  • Only 12% of data described in published papers is in recognized archives – 88% is dark data^
• Cost
  • 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
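The proportions on this slide can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Figures quoted on the slide (petabytes unless noted).
NIH_TOTAL_PB = 650          # estimated data from NIH-funded research
NCBI_NLM_PB = 20            # portion held in NCBI/NLM
LIBRARY_OF_CONGRESS_PB = 3  # Library of Congress in 2012 (slide footnote)
ARCHIVED_FRACTION = 0.12    # share of published data in recognized archives

# Share of NIH-funded data held in NCBI/NLM.
ncbi_share = NCBI_NLM_PB / NIH_TOTAL_PB

# Everything not in a recognized archive is "dark data".
dark_fraction = 1 - ARCHIVED_FRACTION

# NIH total expressed in multiples of the 2012 Library of Congress.
loc_multiples = NIH_TOTAL_PB / LIBRARY_OF_CONGRESS_PB

if __name__ == "__main__":
    print(f"NCBI/NLM share: {ncbi_share:.1%}")
    print(f"Dark data: {dark_fraction:.0%}")
    print(f"Library of Congress multiples: {loc_multiples:.0f}")
```

The NCBI/NLM share comes to about 3%, matching the slide, and the NIH total is on the order of two hundred times the 2012 Library of Congress figure.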
6/15/17 Dataverse 2017 19
Funders and publishers come at this from a perspective of reproducibility …
I Can't Reproduce My Own Work
It took several months to replicate this work
The problem is more profound … it inhibits doing the data science research in the first place …
Stating the problem is easy …
What are some of the solutions?
Both funders and institutions see the need to move from pipes to platforms…
https://blog.lexicata.com/wp-content/uploads/2015/03/platform-model-750x410.png
Example: NSF and NIH Approaches
What evidence is there that platforms work?
• Airbnb is a platform that supports a trusted relationship between consumer (renter) and supplier (host)
• The platform focuses on maximizing the exchange of services between supplier and consumer and maximizing the amount of trust associated with a given stakeholder
• It seems to be working:
  • 60 million users searching 2 million listings in 192 countries
  • Average of 500,000 stays per night
  • Valuation of US $25bn
Bonazzi & Bourne 2017 PLOS Biology 15(4) e2001818
OpenDataLab
Platforms support 4 of the 5 pillars of data science
• Data Acquisition
• Data Integration & Engineering
• Machine Learning & Analytics
• Visualization & Dissemination
• Ethics, Law, Policy, Social Implications
In summary
• Open data defines much of the new economy and contributes to social good
• This may be incentive enough
• We are all part of this fourth paradigm
• To fully realize the potential of open data we must be FAIR
• We need to break down silos – platforms help
Since libraries are experienced with open knowledge for the public good, they have a key role to play. But how?
