RDA Wheat Data Interoperability Cookbook and last developments
9th March 2015, San Diego
The WDI working group in brief
● Endorsement: March 2014
● Members: about 30, of whom 15 are active (Wheat scientists, data and metadata technologists)
● The goal: contribute to the improvement of Wheat-related data interoperability by
    ○ Building a common interoperability framework (metadata, data formats and vocabularies)
    ○ Providing guidelines for describing, representing and linking Wheat-related data
Initial plans
● Deliverables
    ○ A report on the survey of existing standards
    ○ A cookbook for the Wheat data managers community, with guidelines on which data formats, metadata, vocabularies and ontologies to use to describe, represent and link different types of Wheat data
    ○ A library of linked vocabularies and ontologies in machine-readable formats that follow the Linked Data standards
    ○ A prototype that showcases the gain in interoperability
Where we are
| Data type | Formats currently used: standardized | Formats currently used: tool specific | Formats currently used: non-standardized | Recommendations |
|---|---|---|---|---|
| SNPs | VCF | BAM/SAM, BED, VARSCAN, VEP | | VCF files generated using the IWGSC survey sequences + metadata about the VCF files to enrich the information about the SNPs |
| Genome annotations | GenBank Flat File, General Feature Format (GFF), EMBL | | | GFF3 + specifications regarding the description of specific columns |
| Germplasms | MCPD, ABCD, Darwin Core, Darwin Core Germplasm | GRIN-Global | Tabulated | MCPD |
| Gene expression | Many format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress | | | Existing format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress + ENA |
| Physical maps | GFF | Cmap, fpc | | GFF3 |
| Genetic maps | | Cmap, gnpmap | | GFF3 (to be confirmed) |
| Phenotypes | | Drops, ped, ISA-Tab, ephesis | Tabulated | ISA-Tab |
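
The recommendation for genome annotations and maps relies on GFF3, whose feature lines have nine tab-separated columns with `tag=value` attributes in the last one. As a rough illustration of what "specifications regarding the description of specific columns" would apply to, here is a minimal sketch of reading one feature line. The function name `parse_gff3_line` and the example line are made up for this sketch and are not part of the working group's cookbook.

```python
from urllib.parse import unquote

# The nine columns defined by the GFF3 format, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based integers.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated tag=value pairs; values are URL-escaped.
    attributes = {}
    for pair in record["attributes"].split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attributes[unquote(tag)] = unquote(value)
    record["attributes"] = attributes
    return record


# Invented feature line for illustration only (not real wheat data).
example = (
    "chr1A\texample_source\tgene\t1000\t2500\t.\t+\t.\t"
    "ID=gene0001;Name=hypothetical%20gene"
)
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```

Agreeing on which attribute tags are mandatory and how they are filled is the kind of column-level guidance the recommendation points to; the parser itself stays the same.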
Examples of use cases

Title: Searching for germplasm with specific traits
Description: Example of searching for germplasm with specific traits (tagged with ontology terms?)
Data types: Germplasm, phenotype
Challenges:
● Metadata are very important and need a standardized format
● Association of genes to traits, linked to germplasm and marker information
● Need for quality controls: how confident are you of the data source?
● Provenance of the germplasm: pedigree, ownership
● Standard system for tracking germplasm and names
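
To make the first use case concrete, the following is a minimal sketch of what a trait search could look like once germplasm records carry ontology term identifiers. Everything in it is hypothetical: the `Accession` class, the accession records and the `TRAIT:…` identifiers are placeholders invented for this sketch, not terms from any real ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Accession:
    """A germplasm record tagged with trait term identifiers."""
    accession_id: str
    name: str
    trait_terms: set = field(default_factory=set)


# Invented records; "TRAIT:0001" and "TRAIT:0002" are placeholder IDs.
ACCESSIONS = [
    Accession("ACC-001", "Line A", {"TRAIT:0001", "TRAIT:0002"}),
    Accession("ACC-002", "Line B", {"TRAIT:0002"}),
    Accession("ACC-003", "Line C"),
]


def find_by_traits(accessions, required_terms):
    """Return accessions annotated with every one of the required terms."""
    required = set(required_terms)
    return [a for a in accessions if required <= a.trait_terms]


matches = find_by_traits(ACCESSIONS, ["TRAIT:0002"])
print([a.accession_id for a in matches])
```

The search only works if every data provider uses the same identifiers for the same trait, which is why the challenges above stress standardized metadata and a standard system for tracking germplasm.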

Title: Identification of wheat genes that control root growth
Description: Requires annotated genes (Gene Ontology, Pfam and other functional annotation)
Data types: Genomic annotations? Gene location? (IWGS-SS ID or MIPS HCS link)
Challenges:
● Mapping between wheat genes and orthologs from other species (deducing function by sequence similarity)
● Access to RNA-Seq data (genes that are not expressed in roots may be irrelevant)
● Mapping of wheat genes to information on their function based on the literature

Title: Query on trial data associated with varieties
Description: Search wheat varieties together with distribution maps, production figures, performance in wheat mega-environments, associated projects worldwide, layers of climatic data on specific wheat production areas, and disease prevention information.
Data types: Phenotypic data, GIS data (wheat economy/production data)
Challenges: Phenotypic data should be linked to GIS data. Using keywords or ontology terms, a system or tool should be able to pull such information from the different websites and systems developed by the wheat community.
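
One simple reading of "phenotypic data should be linked to GIS data" is that each trial observation refers to a site identifier, and the site carries coordinates that a spatial query can filter on. The sketch below illustrates that idea under those assumptions. The site identifiers, coordinates, variety names and yield values are all invented, and the function `observations_in_bbox` is a hypothetical helper, not an existing tool.

```python
# Invented trial sites: site identifier -> (latitude, longitude) in decimal degrees.
TRIAL_SITES = {
    "SITE-1": (45.0, 5.0),
    "SITE-2": (30.0, 75.0),
}

# Invented phenotypic observations that refer to sites by identifier.
OBSERVATIONS = [
    {"variety": "Variety X", "site_id": "SITE-1", "trait": "grain yield", "value": 7.2, "unit": "t/ha"},
    {"variety": "Variety X", "site_id": "SITE-2", "trait": "grain yield", "value": 4.1, "unit": "t/ha"},
    {"variety": "Variety Y", "site_id": "SITE-9", "trait": "grain yield", "value": 5.0, "unit": "t/ha"},
]


def observations_in_bbox(observations, sites, min_lat, max_lat, min_lon, max_lon):
    """Attach coordinates to observations and keep those inside a bounding box."""
    hits = []
    for obs in observations:
        coords = sites.get(obs["site_id"])
        if coords is None:
            # The observation cannot be placed on a map without a known site.
            continue
        lat, lon = coords
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            hits.append({**obs, "lat": lat, "lon": lon})
    return hits


in_area = observations_in_bbox(OBSERVATIONS, TRIAL_SITES, 40.0, 50.0, 0.0, 10.0)
print([(o["variety"], o["site_id"], o["value"]) for o in in_area])
```

The third observation is dropped because its site is unknown, which shows why a shared site identifier is the minimum needed before climatic or production layers can be joined in.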
Wheat related ontologies & vocabularies survey
● Assess the level of visibility and interoperability of Wheat-related vocabularies and ontologies
    ○ Is the vocabulary/ontology updated regularly?
    ○ What license and/or copyright is used?
    ○ Is the vocabulary/ontology part of any ontology communities or listing services?
    ○ Is the vocabulary/ontology used or implemented in any database/repository?
    ○ Does the vocabulary/ontology interlink and/or map to other vocabularies and ontologies?
    ○ Does the vocabulary/ontology
● Identify the domain covered by the ontologies and vocabularies
● Refine the cookbook
    ○ Collect more interoperability use cases
    ○ Collect some technical details
The Wheat-related BioPortal allows one to search for terms across multiple ontologies, browse mappings between terms in different ontologies, receive recommendations on which ontologies are most relevant for a corpus, and annotate text with terms from ontologies.
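
To give an idea of what "annotate text with terms from ontologies" means in practice, here is a small local stand-in that matches known term labels in free text. It does not call the portal and does not reproduce its API. The label table and the `EX:…` identifiers are placeholders invented for this sketch.

```python
import re

# Hypothetical label -> term identifier table; the IDs are placeholders.
LABELS = {
    "grain yield": "EX:0001",
    "plant height": "EX:0002",
    "root length": "EX:0003",
}


def annotate(text, labels):
    """Return (start, end, label, term_id) for each label found in the text."""
    annotations = []
    lowered = text.lower()
    for label, term_id in labels.items():
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, lowered):
            annotations.append((match.start(), match.end(), label, term_id))
    return sorted(annotations)


text = "Plant height and grain yield were recorded at maturity."
for start, end, label, term_id in annotate(text, LABELS):
    print(start, end, label, term_id)
```

A real annotation service would also handle synonyms, plurals and overlapping matches; exact label matching is only the simplest case.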
Next steps
● Metadata (harmonization, minimal metadata sets)
● Mappings
● Next workshop (summer 2015)
    ○ Review and complete the recommendations
    ○ Refine and complete the guidelines and the best practices
    ○ Finalize the repository of Wheat-related vocabularies
● Prototyping: a semantic knowledge base
    ○ Integrate data from different data sources
    ○ Provide smart search capabilities that leverage the vocabularies used against the metadata
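
One way search could "leverage the vocabularies used against the metadata" is query expansion: a search term is widened with its synonyms and narrower terms before it is matched against dataset keywords. The sketch below illustrates that under simplifying assumptions. The vocabulary is a toy structure written for this example, the dataset records are invented, and the planned prototype may work quite differently.

```python
# Toy vocabulary: each term lists its synonyms and its narrower terms.
VOCABULARY = {
    "disease resistance": {"synonyms": {"pathogen resistance"}, "narrower": {"rust resistance"}},
    "rust resistance": {"synonyms": set(), "narrower": {"stripe rust resistance"}},
    "stripe rust resistance": {"synonyms": {"yellow rust resistance"}, "narrower": set()},
}

# Invented dataset metadata records with free keywords.
DATASETS = [
    {"id": "DS-1", "keywords": {"yellow rust resistance"}},
    {"id": "DS-2", "keywords": {"grain yield"}},
    {"id": "DS-3", "keywords": {"rust resistance"}},
]


def expand_term(term, vocabulary):
    """Collect the term, its synonyms, and all narrower terms with their synonyms."""
    expanded = set()
    visited = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        expanded.add(current)
        entry = vocabulary.get(current, {})
        expanded.update(entry.get("synonyms", ()))
        stack.extend(entry.get("narrower", ()))
    return expanded


def search(term, datasets, vocabulary):
    """Return IDs of datasets whose keywords overlap the expanded query."""
    wanted = expand_term(term, vocabulary)
    return [d["id"] for d in datasets if d["keywords"] & wanted]


print(search("disease resistance", DATASETS, VOCABULARY))
```

A plain keyword search for "disease resistance" would find none of these datasets; the expanded query finds two of them because the vocabulary links the broader term to the keywords the providers actually used.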
Thank you!
