CG Core v2 Schema from the DSpace Perspective
Alan Orth
CGSpace Technical Manager
Monitoring, Evaluation and Learning (MEL) Developers’ Retreat
Nairobi, Kenya
3–6 December 2019
Dublin Core Schema Context & Landscape
DC → QDC → DCTERMS
• 1995: DC originates at OCLC workshop in Dublin, Ohio
• aka “simple”, consists of fifteen core metadata elements
called the Dublin Core Metadata Element Set (DCMES)
• 2000: Ongoing process by working groups to develop
qualifiers and encoding schemes for the DCMES
• 2008: DCTERMS supersedes DC and QDC
• Includes and refines previous schemas, adds new fields
• Each term has a unique URI, all defined as RDF properties
Excellent resource: https://en.wikipedia.org/wiki/Dublin_Core
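To make "each term has a unique URI" concrete, here is a minimal sketch that expands a prefixed field name into its full term URI. The two namespace URIs are the ones published by DCMI; the dotted `prefix.name` notation and the `term_uri` helper are assumptions made for illustration, not part of any library.

```python
# Namespace URIs published by DCMI for the original fifteen elements
# and for DCTERMS.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def term_uri(field: str) -> str:
    """Expand a prefixed name such as 'dcterms.abstract' to a full URI.

    Hypothetical helper: only unqualified 'prefix.name' fields are
    accepted, because a local qualifier like 'dc.subject.ilri' has no
    URI of its own in these namespaces.
    """
    prefix, _, local_name = field.partition(".")
    if prefix not in NAMESPACES or not local_name or "." in local_name:
        raise ValueError(f"cannot expand {field!r} to a term URI")
    return NAMESPACES[prefix] + local_name


if __name__ == "__main__":
    print(term_uri("dc.title"))
    print(term_uri("dcterms.abstract"))
```

Rejecting qualified names is a deliberate choice in this sketch: it shows where locally invented qualifiers fall outside the shared vocabulary.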
Dublin Core in DSpace
• DSpace implements Qualified Dublin Core
• DSpace partially implements DCTERMS
• Simple Dublin Core, Qualified Dublin Core, and
DCTERMS are all available for describing items in a
DSpace repository
• Advanced: DSpace can use “crosswalks” to express
metadata in other formats (depending on how good
you are with XSLT)
Value Proposition for a “CG Core” Schema?
• Is it bad to say “I don’t know”?
• Why not use qualifiers, as permitted by Dublin Core?
• dc.subject.ilri
• dc.coverage.country
• dc.identifier.doi
• dc.creator.affiliation
• dc.date.embargo
• etc...
• See DCMI Grammatical Principles section 2.3
DCMI “Dumb-down Principle”
“The qualification of Dublin Core Elements is guided
by a rule known colloquially as the Dumb-Down
Principle. According to this rule, a client should be
able to ignore any qualifier and use the value as if it
were unqualified. While this may result in some loss
of specificity, the remaining term value (minus the
qualifier) must continue to be generally correct and
useful for discovery. Qualification is therefore
supposed only to refine, not extend the semantic
scope of an Element.”
“DCMI: DCMI Grammatical Principles”. www.dublincore.org. Retrieved 4 December 2019.
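The principle can be tried out on the qualified fields listed earlier. The sketch below assumes DSpace-style dotted names (`schema.element.qualifier`) and a record held as a plain dictionary; `dumb_down`, `dumb_down_record` and the sample values are hypothetical and only illustrate the rule.

```python
from typing import Dict, List


def dumb_down(field: str) -> str:
    """Drop the qualifier: 'dc.subject.ilri' becomes 'dc.subject'."""
    parts = field.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'schema.element[.qualifier]', got {field!r}")
    return ".".join(parts[:2])


def dumb_down_record(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Merge all qualified values into their unqualified element."""
    simple: Dict[str, List[str]] = {}
    for field, values in record.items():
        simple.setdefault(dumb_down(field), []).extend(values)
    return simple


if __name__ == "__main__":
    # Sample values are invented for illustration.
    record = {
        "dc.subject.ilri": ["LIVESTOCK"],
        "dc.creator": ["Alan Orth"],
        "dc.creator.affiliation": ["ILRI"],
    }
    for field, values in dumb_down_record(record).items():
        print(field, values)
```

In this sample the subject survives the dumb-down as a perfectly good `dc.subject`, while the affiliation ends up listed as a `dc.creator`. That is arguably a qualifier extending the element rather than refining it.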
Value Proposition for a “CG Core” Schema
• Similar to the DC → QDC → DCTERMS evolution
• Introduction of formal schema with RDF data model
• See: agriculturalsemantics.github.io/cg-core/cgcore.rdf
• Standardized guidance about metadata fields and
controlled vocabularies
• For example, using ORCID for unique author identifiers
• For example, using ISO 639 alpha 3 for language codes
• See: agriculturalsemantics.github.io/cg-core/cgcore.html
• Enable programmatic validation of data sets using the
schema
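As a rough idea of what programmatic validation could look like for the two examples above, the sketch below checks an ORCID iD using its published check-character algorithm (ISO 7064 MOD 11-2) and checks that a language code has the three-lowercase-letter shape of an ISO 639 alpha-3 code. The function names are hypothetical, and the language check verifies the shape only, not membership in the ISO 639 registry.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")
ALPHA3_PATTERN = re.compile(r"[a-z]{3}")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value looks like an ORCID iD and its checksum matches."""
    if not ORCID_PATTERN.fullmatch(value):
        return False
    digits = value.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


def looks_like_alpha3_code(value: str) -> bool:
    """Shape check only: three lowercase ASCII letters."""
    return bool(ALPHA3_PATTERN.fullmatch(value))


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
    print(looks_like_alpha3_code("eng"))
```

A fuller validator would also need the controlled vocabularies themselves, which this sketch leaves out.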
The “CG Core” Dream
A “core” schema for meaningful metadata
interchange between CGIAR centers, CRPs, etc.
• Rise of web-based institutional repositories like
DSpace, CKAN, and DataVerse in CG after late 2000s
• Harvesting of repositories as means of syndication (no
duplication of content!)
• Increased interest in reporting and impact assessment
• Bonus: build cool things like AReS Explorer and
GARDIAN to see all research across the CG in one
place!
Build Cool Things
“AReS Explorer”. https://cgspace.cgiar.org/explorer. Retrieved 4 December 2019.
Progress on “CG Core” Schema
• “CG Core” initiative undertaken in 2015
• Formation of Metadata Working Group
• CGcore Draft version beta 1 (November, 2016)
• Beta version 1.0 (March, 2017)
• CG Core v2 “soft ratification” at the Big Data Platform
meeting in Kenya (October, 2018)
• CG Core v2 review by ILRI, ICARDA, IITA, and
WorldFish in Jordan (January, 2019)
• CG Core v2 ongoing review by Alan, Abenet, and
Marie-Angélique (mid-to-late 2019)
CG Core v2 Metadata Changes in Practice
• Much of CG Core v2 is simply aligning with DCTERMS
• For example, in the CGSpace context, some fields gain
a more appropriate home within DCTERMS:
• cg.identifier.status→dcterms.accessRights
• dc.rights→dcterms.license
• cg.link.reference→dcterms.relation
• dc.description.abstract→dcterms.abstract
• Others merely change places:
• dc.type→dcterms.type
• dc.format.extent→dcterms.extent
• dc.relation.ispartofseries→dcterms.isPartOf
See the full list: https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration
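The renames on this slide can be written down as a simple lookup table. The sketch below covers only the seven mappings shown here, not the full migration list; the record layout (a dictionary of field name to list of values), the `migrate_fields` helper and the sample values are assumptions for illustration.

```python
from typing import Dict, List

# Only the mappings shown on the slide; the full list is longer.
FIELD_MAP = {
    "cg.identifier.status": "dcterms.accessRights",
    "dc.rights": "dcterms.license",
    "cg.link.reference": "dcterms.relation",
    "dc.description.abstract": "dcterms.abstract",
    "dc.type": "dcterms.type",
    "dc.format.extent": "dcterms.extent",
    "dc.relation.ispartofseries": "dcterms.isPartOf",
}


def migrate_fields(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rename mapped fields and leave every other field untouched."""
    migrated: Dict[str, List[str]] = {}
    for field, values in record.items():
        target = FIELD_MAP.get(field, field)
        migrated.setdefault(target, []).extend(values)
    return migrated


if __name__ == "__main__":
    # Sample values are invented for illustration.
    record = {
        "dc.title": ["An example report"],
        "dc.type": ["Report"],
        "cg.identifier.status": ["Open Access"],
    }
    for field, values in migrate_fields(record).items():
        print(field, values)
```

Fields without a mapping, such as `dc.title`, pass through unchanged.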
Technical Limitations to Adoption in DSpace
• DSpace 5.x and 6.x have many hard-coded references
to DC fields (see: IncludePageMeta.java)
• Impossible to migrate away from some fields:
• dc.title
• dc.identifier.uri
• dc.contributor.author
• dc.date.accessioned
• etc…
• DSpace uses a flat schema, so this is not possible:
<dc.creator affiliation="ILRI">Alan Orth</dc.creator>
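A rough illustration of what "flat" means here: every metadata value is an independent row of schema, element, optional qualifier and text value, with nowhere to hang an attribute. The `MetadataValue` type and `parse_field` helper are hypothetical and only loosely modelled on this idea; they are not DSpace's actual data model or API.

```python
from typing import NamedTuple, Optional


class MetadataValue(NamedTuple):
    """One flat metadata row: no nesting and no attributes."""
    schema: str
    element: str
    qualifier: Optional[str]
    value: str


def parse_field(field: str, value: str) -> MetadataValue:
    """Split 'schema.element[.qualifier]' into a flat row."""
    parts = field.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'schema.element[.qualifier]', got {field!r}")
    qualifier = parts[2] if len(parts) == 3 else None
    return MetadataValue(parts[0], parts[1], qualifier, value)


if __name__ == "__main__":
    # The author and the affiliation become two unrelated rows;
    # nothing in the flat structure records that they belong together.
    rows = [
        parse_field("dc.contributor.author", "Alan Orth"),
        parse_field("dc.creator.affiliation", "ILRI"),
    ]
    for row in rows:
        print(row)
```

With several authors on one item, the link between each author and their affiliation could at best be inferred from ordering, which is fragile.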
Progress of CG Core v2 Implementation
• CGSpace public test server is running CG Core v2 as of
November, 2019
• Item submission ✓
• Item display ✓
• OAI-PMH ✓
• REST API ✓
• CGSpace-specific DSpace 5.x code modifications are
available on GitHub
• Thorough implementation notes also available
• Soon solicit feedback from CGSpace community
• Massive effort for downstream consumers of CGSpace
• How long should the notice period be?
Acknowledgements
Medha Devare, Carlos Quiros, and Martin Mueller
for getting the first few drafts and betas of CG Core
out the door.
Marie-Angélique Laporte for being receptive to
feedback and for bringing “CG Core v2” into open,
accessible development on GitHub.
This presentation is licensed for use under the Creative Commons Attribution 4.0 International Licence.
better lives through livestock
ilri.org
ILRI thanks all donors and organizations which globally support its work through their contributions
to the CGIAR Trust Fund

Editor's Notes

  • #2: Disclaimer: I am not a metadata expert or ontologist!
  • #3: Background context and evolution of Dublin Core, to say nothing of other schemas.
  • #5: Why didn’t we just make use of qualifiers, as permitted by Dublin Core?
  • #6: Why didn’t we just stick to dc.subject.ilri, dc.identifier.doi, dc.coverage.country, etc as permitted by Dublin Core?
  • #7: Medha Devare, Carlos Quiros, Martin Mueller, Marie-Angelique.
  • #8: Medha Devare, Carlos Quiros, Martin Mueller, Marie-Angelique.
  • #9: Medha Devare, Carlos Quiros, Martin Mueller, Marie-Angelique.
  • #10: Medha Devare, Carlos Quiros, Martin Mueller, Marie-Angelique.
  • #11: Others are more complicated. Need to see reference schema to understand.
  • #13: As far as I know CGSpace is the only party working to implement CG Core v2 currently.