Standards, tools, incentives – what does it take to enable data sharing?
Fiona Nielsen at AIDR, May 14th 2019
I used to be a scientist
- like you
I became frustrated by my lack of data access
Picture by Melissa O’Donahue CC-BY-ND
Strongly motivated by the journey of my mother
“Someone has got to do something!”
Fiona Nielsen
Around 2012
I wanted to build a data broker for genomics data
To speed up research
DNAdigest
Founded Repositive with Adrian
With Repositive we built a search engine for genomics
• Contributed our data search expertise to the NIH Data Commons Pilot
• Open pages – all indexed by Google
• Index of >1 million public genomic data sets
• All users can contribute annotation and data sets
Visit http://discover.repositive.io
We launched a marketplace for translational cancer models
Biopharma Cancer R&D; Cancer model vendors (CROs)
Help researchers find the right cancer model to suit their needs in drug development for precision medicine
What do cancer models have to do with data?
The cancer models are described by complex data:
• Genetic profile
• Tumor type
• Cancer growth and phenotype
There are hundreds of cancer model providers, with thousands of cancer models
Finding the right cancer model is a data access problem
Photo by Marblesgalore.com
We organize the data to make it easily searchable
Photo by Marblesgalore.com
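To make the "data access problem" concrete, here is a minimal sketch of what searching organized cancer model data could look like. Everything in it is an assumption for illustration: the `CancerModel` fields, the `find_models` helper, the vendors and the records are all made up and do not reflect Repositive's actual schema or platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CancerModel:
    """Hypothetical record; field names are illustrative only."""
    model_id: str
    vendor: str
    tumor_type: str
    mutations: frozenset = frozenset()  # simplified genetic profile
    growth: str = "unknown"             # growth / phenotype summary


def find_models(models, tumor_type=None, required_mutations=()):
    """Return models of the given tumor type that carry all required mutations."""
    wanted = frozenset(required_mutations)
    return [
        m for m in models
        if (tumor_type is None or m.tumor_type.lower() == tumor_type.lower())
        and wanted <= m.mutations  # subset test: every wanted mutation is present
    ]


# Invented inventory pooled from two invented vendors
INVENTORY = [
    CancerModel("M-001", "Vendor A", "lung", frozenset({"KRAS G12C", "TP53"}), "fast"),
    CancerModel("M-002", "Vendor B", "lung", frozenset({"EGFR L858R"}), "slow"),
    CancerModel("M-003", "Vendor B", "breast", frozenset({"TP53"}), "fast"),
]

matches = find_models(INVENTORY, tumor_type="Lung", required_mutations=["TP53"])
print([m.model_id for m in matches])  # prints ['M-001']
```

The point of the sketch is that once models from many vendors are described with the same fields, one query works across all of them; without that shared description, each vendor's catalogue has to be searched by hand.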
Our platform enables data discovery and data sharing
Biopharma Cancer R&D; Cancer model vendors (CROs)
Serving pre-clinical CROs seeking customers who are interested in their cancer models
Serving researchers who are looking to outsource their cancer model experiments
World’s largest inventory of cancer models: 5,000+ models in our partner network
Read more on http://repositive.io/
Why is data sharing still a problem?
The scientific approach for addressing challenges
Solving the problem for Alice and Bob:
1. Define the problem
2. Design a solution
3. Publish a paper
You are done.
By odder - own work, based on png version originally uploaded to the Commons by Dake., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1812312
Some steps are missing to solve the problem
Your solution/algorithm/standard method (i)
needs to be implemented in an easy-to-use and available tool (ii),
and the people who are experiencing the problem need a reason/incentive (iii) to use the tool
Unfortunately, the scientific approach only addresses (i)
We have lots of algorithms and proposed standards
But not all of them are solving real problems…
https://xkcd.com/927/
On the other hand, if you have incentives…
©Derek Law
What have I learnt across academia and industry?
Data sharing is a multi-step process
Before you can access data, you need to
• Assert that this data is the data you need
• Discover that the data exists
• Rely on someone having made that data discoverable
Each step of the process requires
incentives, tools, standards
My prediction
No advanced AI/ML tool will dramatically improve the data contributed,
discovered or accessed outside the communities and problem domains
where there is a strong incentive for its use.
But will incentives alone fix the problem?
You have to fix two sides:
- Increase the incentives
- And/or lower the effort needed to use the tools
Incentives?
- Fix scientific publishing to include data
- Fix hiring, promotion, tenure criteria
- Fix funding requirements
©ReinierVanOorsouw
Take an example from Airbnb
Incentives and tools existed before Airbnb
Holiday homes would be listed on Craigslist
What did Airbnb do?
They made it easier to search for rentals,
and super-easy to make data about your rental home discoverable!
Airbnb:
Incentives > tools > standards and methods
http://ui-patterns.com/explore/domain/airbnb+com
My message to you
Is not to stop developing methods :)
It is:
If you care about the impact you want to make and you want to see the problem solved on a larger scale, you have to care about making easy-to-use tools and fixing the incentives.
It is not an easy task, but I am with you on this one!
What can I do today to fix incentives?
Acknowledge and give credit for good data stewardship: data creation, data visibility, data accessibility, data curation, data publishing and data sharing.
Start in your day-to-day work: e.g. create a data steward award in your lab!
Always include good data stewardship as a criterion when hiring, promoting and funding
Make your data stewardship tools really easy to use (documentation, support, etc.)
& keep developing cool methods :)
Thank you for your attention – go share that data!
repositive [ re-poz-i-tiv ], noun;
1. a positive experience of accessing genomic data repositories
Thanks for listening!
Find me on Twitter @glyn_dk
and read more about us at Repositive.io



Editor's Notes

  • #8: Before Google for datasets: 50+ data sources, 1M data sets, plus open pages. Now all the metadata we curated and indexed is also findable in Google datasets.
  • #23: You can start with: fixing the skewed incentives for publishing scientific publications; fixing hiring, promotion and tenure; fixing funding requirements.
  • #24: You can start with: fixing the skewed incentives for publishing scientific publications; fixing hiring, promotion and tenure; fixing funding requirements.