Metadata analysis of germplasm
collections
The case of agINFRA
Dr. Vassilis Protonotarios
Agricultural Biotechnologist, PhD
Agro-Know Technologies, Greece
e-Conference on Germplasm Data Interoperability
Session 2: “Status of data and metadata for germplasm”
Structure of the presentation
1. The agINFRA germplasm data sources
– Chinese Crop Germplasm Information System
– Italian National Germplasm Database

2. Current status
– Mappings
– Linked Data approach

3. Conclusions
The agINFRA germplasm data sources
agINFRA germplasm data sources
• Italian Germplasm Database (CRA)
– Data available through EURISCO -> GENESYS
– Uses EURISCO set of descriptors
– Data also available through GBIF

• Chinese Crop Germplasm Information System
(CGRIS/CAAS)
– Data unavailable through aggregators
– Own schema used for description of germplasm
accessions
– Metadata exposure in CSV
agINFRA germplasm data analysis
1. Analysis of agINFRA germplasm data sources
2. Analysis of metadata schemas used
3. Identification of external schemas
– Review of existing work

4. Definition of a base schema (descriptors)
5. Mappings of various schemas to the base
one
6. Development of a linked data approach for
linking germplasm data sources
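
Steps 4 and 5 above can be sketched as a simple crosswalk: each source schema gets a dictionary that maps its own field names onto the base descriptors. This is only an illustrative sketch. The target names follow MCPD-style descriptor names, but the source field names (`accession_id`, `crop`, `awn_colour`, etc.) and the `to_base_schema` helper are hypothetical and are not taken from the actual CGRIS or CRA exports.

```python
# Sketch of steps 4-5: map source-specific fields onto a shared base schema.
# Target names are MCPD-style descriptors; the SOURCE field names on the left
# are hypothetical placeholders, not the real CGRIS or CRA column names.
CROSSWALKS = {
    "cgris": {
        "accession_id": "ACCENUMB",
        "crop": "CROPNAME",
        "genus": "GENUS",
        "species": "SPECIES",
        "origin_country": "ORIGCTY",
    },
    "cra": {
        "ACCENUMB": "ACCENUMB",
        "CROPNAME": "CROPNAME",
        "GENUS": "GENUS",
        "SPECIES": "SPECIES",
        "ORIGCTY": "ORIGCTY",
    },
}


def to_base_schema(record, source):
    """Return (mapped, unmapped) dictionaries for one source record."""
    crosswalk = CROSSWALKS[source]
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value  # kept for manual review, not dropped
        else:
            mapped[target] = value
    return mapped, unmapped


example = {"accession_id": "X0001", "crop": "wheat", "awn_colour": "white"}
mapped, unmapped = to_base_schema(example, "cgris")
print(mapped)
print(unmapped)
```

Returning the unmapped fields separately makes it visible which source descriptors have no counterpart in the base schema, which is the information a mapping table needs.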
1. Chinese Crop Germplasm
Information System (CGRIS / CAAS)
Chinese Crop Germplasm
Information System (CGRIS)
• Provided by: Chinese Academy of Agricultural Sciences
• A central repository for all types of plant genetic resources
information. It consists of six subsystems:
1. The management system of the National Crop Gene Bank (NCGB),
2. The management system of the long-term storage in Qinghai,
3. The management system of National germplasm Resources
Nursery,
4. The crop characterization and evaluation database system,
5. The database system for germplasm exchange at home and
abroad and
6. The management system of the medium-term storage in Beijing.

URL: http://icgr.caas.net.cn/cgrisintroduction.html
CGRIS: Data
At present, CGRIS owns
• > 2000 MB of data on 180 kinds of crops
– including food crops, fibre plants, oil crops,
vegetables, fruit trees, tea, mulberry, tobacco,
sugar, green manure crops, tropical crops etc.

• 390,000 accessions of germplasm
CGRIS: Accessions (indicative list)

http://icgr.caas.net.cn/cgrisintroduction.html
Crop Germplasm Classification
Info on wheat varieties
Info on wheat varieties
CGRIS: Germplasm Data Query
CGRIS: Germplasm Data Query
CGRIS Metadata
• CGRIS germplasm descriptors are based on its own
schema
– can be seen as the de facto standard for
germplasm accession information in China.
– Based on metadata schema standards such as those
developed by IPGRI (Bioversity) and GRIN
CGRIS: Basic Descriptors
CGRIS: Wheat descriptors
CGRIS Metadata: Next steps
• A mapping to the Multi-crop Passport
Descriptors (MCPD) standard is intended
– According to CAAS subject experts, such a mapping
should be rather easy to produce.
CGRIS: Exposing data
• Data stored in relational DBs
• Hosted in an SQL server
• Exposure of data as CSV files (partially in
Chinese)
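
Since the exports are CSV files with partly Chinese content, a consumer has to handle encoding and header translation before any mapping can happen. The following is a minimal sketch under stated assumptions: the Chinese column headers, their English equivalents and the sample row are invented for illustration and do not reproduce the real CGRIS export.

```python
import csv
import io

# Hypothetical header translations; the real CGRIS column names may differ.
HEADER_TRANSLATION = {
    "统一编号": "accession_id",
    "作物名称": "crop_name",
    "原产地": "origin",
}


def read_export(text):
    """Parse CSV text and translate known Chinese headers to English keys."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Unknown headers are kept unchanged so nothing is silently lost.
        rows.append({HEADER_TRANSLATION.get(key, key): value
                     for key, value in row.items()})
    return rows


# Invented sample data standing in for a real export file.
SAMPLE = "统一编号,作物名称,note\nX0001,小麦,sample row\n"
rows = read_export(SAMPLE)
print(rows[0])
```

When reading from disk instead of a string, opening the file with an explicit `encoding` argument (for example `open(path, encoding="utf-8", newline="")`) avoids depending on the platform default; the actual encoding of the CGRIS files would need to be confirmed with the provider.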
CGRIS: IPR information
• The CGRIS website is public and accessible to
everybody. The information is provided free of
charge but is subject to copyright.
• With regard to data exchange, there is no
explicit policy to follow.
• CGRIS does not have an Open Access mandate
and the members of the CGRIS network apply
their own institutional policies.
2. Italian Germplasm Database (CRA)
Italian Germplasm Database
• Provided by: Italian Council for Research and
Experimentation in Agriculture
• Developed in the context of the “Plant Genetic
Resources/FAO” project in 2004
– Research Centres and Units of the CRA
– The Institute of Plant Genetics of the CNR in Bari,
– NGO “Rete Semi Rurali”
– University collections (Perugia, Potenza etc.)
URL: http://fru.entecra.it
agINFRA Germplasm metadata analysis
CRA Germplasm: Data
Current status of germplasm data (CRA)
• 20,954 records from Italy are included in
EURISCO, of which 17,212 are from CRA
• 28,509 records for 275 plant species in the
National Inventory (in general)
– this figure does not allow the number of CRA
germplasm records to be identified
CRA: Accessions (indicative list)

URL: http://fru.entecra.it/accessioni.php
Info on specific species
EURISCO descriptors
CRA Metadata
• Most CRA institutional databases use the
MCPD
– however, in the records provided to the National
Inventory several fields are often not filled.

• Some CRA collections also use descriptors
defined by
– the International Union for the Protection of New Varieties of
Plants (UPOV) and
– the National Register of New Varieties.

• Ensure mapping to the Multi-crop Passport
Descriptors (MCPD)/EURISCO
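
The observation that several fields are often not filled can be quantified with a per-descriptor fill rate. The sketch below assumes records are already available as dictionaries keyed by MCPD-style descriptor names; the sample records and the `fill_rates` helper are illustrative and do not come from the National Inventory.

```python
def fill_rates(records, descriptors):
    """Return the share of records with a non-empty value per descriptor."""
    total = len(records)
    rates = {}
    for descriptor in descriptors:
        filled = 0
        for record in records:
            value = record.get(descriptor)
            # Missing keys, None and whitespace-only strings count as empty.
            if value is not None and str(value).strip():
                filled += 1
        rates[descriptor] = filled / total if total else 0.0
    return rates


# Invented sample records for illustration only.
records = [
    {"ACCENUMB": "A1", "ORIGCTY": "ITA"},
    {"ACCENUMB": "A2", "ORIGCTY": " "},
]
rates = fill_rates(records, ["ACCENUMB", "ORIGCTY", "COLLSITE"])
print(rates)
```

A table of such rates per collection would show which descriptors can realistically serve as mapping targets and which ones need curation first.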
CRA: IPR information
• The CRA website is public and accessible to everybody. The
information is provided free of charge but is subject to
copyright
• The Multilateral System (MLS) of the Treaty demands free
availability of the information on the PGRFA that are under
the management and control of the Contracting Parties and
in the public domain (Treaty, Art. 11.2).
• This excludes
– germplasm accessions that are subject to IPR and
– material under other legally binding protection which restricts the
Contracting Party’s control over it.
• Accessions that are not covered by IPR include old and
autochthonous varieties, crop wild relatives and other material
found in in-situ conditions, new cultivars not protected by IPR
and cultivars whose IPR have expired.
Conclusions
Current status
• First version of mappings is available
• EURISCO descriptors used as base schema
– MCPD
– Darwin Core for Genebanks
– ABCD
– CGRIS
– CRA
Mapping table
Mapping table
Development of decision trees
Development of decision trees
Linked Data
• A linked data approach will be used by
agINFRA for linking germplasm data sources
• OpenAGRIS already aggregates germplasm
data using AGROVOC
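
To make the linked data idea concrete, one passport record can be serialised as RDF triples, with the crop linked to a concept URI from a vocabulary such as AGROVOC. This is a sketch under stated assumptions, not the agINFRA implementation: the `example.org` base URIs, the `cropConcept` property, the institute code and the concept URI are placeholders, and only the N-Triples line syntax itself is standard.

```python
from urllib.parse import quote

# Placeholder namespaces; a real deployment would use its own stable URIs.
BASE = "http://example.org/accession/"
VOCAB = "http://example.org/mcpd#"


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def accession_to_ntriples(record, crop_concept_uri=None):
    """Serialise one passport record as a list of N-Triples lines."""
    # Percent-encode identifier parts so the subject is a valid URI.
    subject = "<{}{}/{}>".format(
        BASE, quote(record["INSTCODE"], safe=""), quote(record["ACCENUMB"], safe=""))
    lines = []
    for key, value in sorted(record.items()):
        lines.append('{} <{}{}> "{}" .'.format(
            subject, VOCAB, key, escape_literal(value)))
    if crop_concept_uri:
        # Link to an external concept (e.g. an AGROVOC URI) instead of a string.
        lines.append("{} <{}cropConcept> <{}> .".format(
            subject, VOCAB, crop_concept_uri))
    return lines


# Invented record and placeholder concept URI for illustration only.
record = {"INSTCODE": "XXX001", "ACCENUMB": "A 1", "CROPNAME": 'Wheat "soft"'}
triples = accession_to_ntriples(record, "http://example.org/concept/wheat")
print("\n".join(triples))
```

Once two sources point at the same concept URI for a crop, their accessions can be joined on that URI rather than on free-text crop names in different languages.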
Conclusions
• Both schemas / sets of descriptors can be
mapped to the EURISCO ones
• Linked Data approach will facilitate linking of
germplasm data from CRA/CGRIS
• EURISCO descriptors to be published as linked
data
– To be used as the base of passport data

• Linking to other germplasm standards
– e.g. Darwin Core for Genebanks*
*https://code.google.com/p/darwincore-germplasm/wiki/DarwinCoreGermplasmMapping
Take home message
• The identification of common properties
between different metadata schemas will
facilitate the linked data framework
(Indicative) List of References
• agINFRA Deliverable D2.3 “Review of Content
Requirements”
• agINFRA Deliverable D5.3 “Conceptual
specification of linked agricultural data
framework”
• agINFRA Germplasm Working Group Wiki
http://wiki.aginfra.eu/index.php/Germplasm_Working_Group

• EURISCO passport descriptors
http://www.ecpgr.cgiar.org/germplasm_databases.html

• Draft Mapping of EURISCO Descriptors to ABCD
2.06 http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/EURISCO-2-ABCD.pdf

Contact me: vprot@agroknow.gr
