Data Driven Innovation
Interoperability Tech Track (#agridata)
18 & 19 March 2015, Wageningen (@rfinkers)
Outline
• Introduction: “Interoperable Genetic Diversity”
• Concept: “Bring Your Own Data” party
• Aim: BYOD Green Genetics?
• Outcome: BYOD Green Genetics
• Hands-on
Data Driven Innovation - Interoperable Genebanks (Tech Track Session)
Climate change & Social disruption
Photograph: AFP/Getty Images
http://www.theguardian.com/commentisfree/2015/mar/08/guardian-view-climate-change-social-disruption#img-1
Select a genetically diverse collection
Legacy databases (e.g. UniProt)
Genome Sequence & Genome Annotation
Genome Variation Data (re-sequencing collections) & SNP annotation
Accession Passport Information
Accession Phenotype Information
Web-based aggregation of information
Interoperable Genetic Diversity
• Genebanks should utilize genomics data
● But should not store them!
• Genomics studies should make variant data available
● But need access to passport and characterization & evaluation data.
• Breeders need tools to access diversity
Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689
Genebank(s)
Genomics provider(s)
Intermezzo: Linked Open Data
Standardization makes the information interoperable
• Controlled vocabularies
• Machine readable
• Everything can be queried with a single question, instead of visiting many websites
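As a rough illustration of the points above, the sketch below treats linked data as simple (subject, predicate, object) triples, merges statements from two sources and answers one question across both. Every identifier in it (the `ex:` names, the accessions, the genes) is invented for illustration and is not taken from CGN or any other real dataset.

```python
# Minimal sketch: linked data as (subject, predicate, object) triples.
# All identifiers below are hypothetical and serve only as an illustration.

genebank_triples = [
    ("ex:acc1", "ex:species", "Solanum lycopersicum"),
    ("ex:acc2", "ex:species", "Solanum pimpinellifolium"),
]

genomics_triples = [
    ("ex:acc1", "ex:hasVariantIn", "ex:geneA"),
    ("ex:acc2", "ex:hasVariantIn", "ex:geneB"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def species_with_variant_in(triples, gene_id):
    """One question over the merged data: which species have a variant in gene_id?"""
    accessions = [s for (s, _, _) in match(triples, predicate="ex:hasVariantIn", obj=gene_id)]
    species = []
    for acc in accessions:
        species.extend(o for (_, _, o) in match(triples, subject=acc, predicate="ex:species"))
    return sorted(species)


# Both sources use the same accession identifiers, so merging is plain concatenation.
merged = genebank_triples + genomics_triples
print(species_with_variant_in(merged, "ex:geneA"))
```

The shared vocabulary (`ex:species`, `ex:hasVariantIn`) and shared accession identifiers are what make the two sources combinable without any source-specific code.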
Interoperable Genetic Diversity (2)
• Implications:
● Data can be stored at many different locations, but can be found by computers
● Newly published information (in the correct format) will be included automatically.
● Tools can be written to answer dedicated questions, such as assessing allelic variation, or to support collection management
Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689
Genebank(s)
Genomics provider(s)
Interdisciplinary Approach Needed
Need for Data Scientists & Domain Experts
Genebanks
Genomics provider(s)
Format: Bring Your Own Data Workshop
1. Users define the question(s)
2. Users and linked data experts define concepts and ontologies
3. Experts help to create linked data and formulate the query
Bring Your Own Data Workshop
• More info: http://www.dtls.nl/fair-data/byod/
Data owners
Domain Experts
Trainers
Linked Data Experts
Example: Solanaceae Trait Ontology
BYOD in action
Select a genetically diverse collection
Legacy databases (e.g. UniProt)
Genome Sequence & Genome Annotation
Genome Variation Data (re-sequencing collections) & SNP annotation
Accession Passport Information
Accession Phenotype Information
Example Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://openlifedata.org/taxonomy_resource:>
PREFIX tdwg: <http://rs.tdwg.org/dwc/terms/>
SELECT ?acc ?label (str(?lat) AS ?latitude) (str(?long) AS ?longitude)
WHERE {
  GRAPH <http://cgngenis.wageningenur.nl> {
    ?acc taxon:species ?species .
    ?species rdfs:label ?label .
    ?acc tdwg:decimalLatitude ?lat .
    ?acc tdwg:decimalLongitude ?long
  }
}
ORDER BY ?label
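A SPARQL endpoint typically returns the result of a SELECT query such as the one above in the W3C SPARQL JSON results format. The sketch below shows one way such a response could be turned into plain rows. The sample response and its values are invented for illustration, and the helper names are hypothetical; nothing here is taken from the CGN endpoint.

```python
import json

# Hypothetical response in the W3C SPARQL 1.1 JSON results format.
# The accession URI and coordinates are invented for illustration.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["acc", "label", "latitude", "longitude"]},
  "results": {"bindings": [
    {"acc": {"type": "uri", "value": "http://example.org/accession/1"},
     "label": {"type": "literal", "value": "Solanum lycopersicum"},
     "latitude": {"type": "literal", "value": "-12.05"},
     "longitude": {"type": "literal", "value": "-77.04"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Flatten the bindings into one dict per result row (unbound variables become None)."""
    data = json.loads(text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


def accession_coordinates(rows):
    """Convert latitude/longitude strings to floats, skipping rows that lack either."""
    coordinates = []
    for row in rows:
        if row["latitude"] is None or row["longitude"] is None:
            continue
        coordinates.append(
            (row["acc"], row["label"], float(row["latitude"]), float(row["longitude"]))
        )
    return coordinates


rows = rows_from_sparql_json(SAMPLE_RESPONSE)
for acc, label, lat, lon in accession_coordinates(rows):
    print(acc, label, lat, lon)
```

The query casts latitude and longitude to strings with `str()`, so the conversion back to numbers happens on the client side in this sketch.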
Outcome: Query Graph
FAIRport* in VLPB?
*More on FAIRport in the presentation of Luiz Bonino, Thursday 10:30
Summary
• Blueprint for “Interoperable Genetic Diversity” shown
• BYOD resulted in interoperable data which could be queried
● Request your own BYOD?
• Public <-> private integration possible
Select a genetically diverse collection
Legacy databases (e.g. UniProt)
Genome Sequence & Genome Annotation
Genome Variation Data (re-sequencing collections) & SNP annotation
Accession Passport Information
Accession Phenotype Information
Working Prototype
• Screenshot
Questions?
Acknowledgements:
BYOD team
Theo van Hintum & Frank Menting (CGN)
Denis Guryunov & Martijn van Kaauwen (prototype)
et al.
HaploSmasher Hands-On Session
• HaploSmasher prototype:
● Genomic regions as input: SL2.40ch03:10000..10200
● Solyc gene identifiers: Solyc10g085020
● Filter SNPs on impact type
● HIGH, MODERATE, LOW, MODIFIER (SnpEff)
● No input validation yet
● Use correct notation and existing Solyc gene IDs
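Since the prototype does not validate input yet, the sketch below shows how the two input notations and the impact filter could be checked before submitting a query. The regular expressions are inferred only from the two examples on this slide and may be stricter or looser than what HaploSmasher actually accepts. The function names and the SNP record layout are hypothetical.

```python
import re

# Assumed patterns, derived only from the examples "SL2.40ch03:10000..10200"
# and "Solyc10g085020"; the real tool may accept other forms.
REGION_RE = re.compile(r"(SL\d+\.\d+ch\d{2}):(\d+)\.\.(\d+)")
GENE_RE = re.compile(r"Solyc\d{2}g\d{6}")

# The four SnpEff impact categories listed on the slide.
IMPACTS = {"HIGH", "MODERATE", "LOW", "MODIFIER"}


def parse_region(text):
    """Split 'chromosome:start..end' into (chromosome, start, end) or raise ValueError."""
    m = REGION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a region in chromosome:start..end notation: {text!r}")
    chromosome, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    if start > end:
        raise ValueError(f"region start {start} lies after end {end}")
    return chromosome, start, end


def is_gene_id(text):
    """True if the text looks like a Solyc gene identifier (format check only)."""
    return GENE_RE.fullmatch(text.strip()) is not None


def filter_snps(snps, impacts):
    """Keep SNP records (dicts with an 'impact' key) whose impact is in the given set."""
    unknown = set(impacts) - IMPACTS
    if unknown:
        raise ValueError(f"unknown impact type(s): {sorted(unknown)}")
    return [snp for snp in snps if snp["impact"] in impacts]


print(parse_region("SL2.40ch03:10000..10200"))
print(is_gene_id("Solyc10g085020"))
```

A format check like `is_gene_id` cannot tell whether a gene ID actually exists in the annotation, so the advice to use existing Solyc gene IDs still applies.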
HaploSmasher
• Query CGN FAIRdata graph
● The prototype currently generates only links to CGN passport data
● Graph data for three CGN accessions are available in our test set
HaploSmasher examples:
• Haplotype output
Example queries
• http://www.plantbreeding.wur.nl/hs/
• Also, explore variation data & linked resources
● http://www.tomatogenome.net
• Examples:
● Beta-tubulin: Solyc10g085020
● HIGH & MODERATE vs. ALL effects
● Glutamate dehydrogenase: Solyc05g052100
● Uridine kinase: Solyc02g067880
● Magnesium chelatase: Solyc04g015750
HaploSmasher examples:
• Conserved housekeeping genes:
● Beta-tubulin Solyc10g085020 (439 AA)
● 1 SNP (HIGH & MODERATE effect), two haplotypes
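To make the link between SNPs and haplotypes concrete, the sketch below groups accessions that carry identical alleles at every selected SNP. This is an illustrative simplification and not HaploSmasher's actual method, which these slides do not describe. It assumes one allele call per accession per SNP. The accession names and alleles are invented.

```python
from collections import defaultdict


def group_haplotypes(genotypes):
    """Group accessions whose alleles are identical at every selected SNP.

    genotypes maps an accession name to a tuple of alleles, one per SNP,
    with all tuples in the same SNP order.
    """
    groups = defaultdict(list)
    for accession, alleles in genotypes.items():
        groups[tuple(alleles)].append(accession)
    return {haplotype: sorted(members) for haplotype, members in groups.items()}


# Hypothetical data: one SNP scored in three accessions.
genotypes = {
    "acc1": ("A",),
    "acc2": ("G",),
    "acc3": ("A",),
}

haplotypes = group_haplotypes(genotypes)
for haplotype, members in haplotypes.items():
    print(haplotype, members)
```

With a single biallelic SNP the accessions can fall into at most two groups, which is consistent with the one SNP and two haplotypes reported above. Each additional SNP kept by the impact filter can split the groups further.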
HaploSmasher examples:
● Beta-tubulin Solyc10g085020 439 AA
● 136 SNPs (all SNPEff impact types)
● Part of haplotype groups:
HaploSmasher examples:
● Glutamate dehydrogenase Solyc05g052100
● 13 SNPs (HIGH, MODERATE)
HaploSmasher examples:
● Uridine kinase Solyc02g067880
● 23 SNPs (HIGH, MODERATE)
● Example haplotype groups:
HaploSmasher examples:
● Magnesium chelatase Solyc04g015750
● 21 SNPs (HIGH, MODERATE)
● Example haplotype groups:


Editor's Notes

  • #19: Query is difficult, but goal is to explain that we use “standardized terms” and that these resolve to actual (human readable) sites, with blocks of information which can be re-used
  • #20: Example to show connectivity.
  • #21: Emphasis on the fact that there is a distinction between public and private data. But, if agreed upon, data can be shared between companies.
  • #23: Add Gift Wrapping
  • #24: Add Gift Wrapping