Darwin Core Archive (DwC-A) 
validation: A New Collaborative 
Effort 
Christian Gendreau, Université de Montréal / Canadensys
David P. Shorthouse, Université de Montréal / Canadensys
Marie-Élise Lecoq, GBIF France
Tim Robertson, GBIF
Darwin Core Archive (DwC-A) 
The Darwin Core standard does not impose strong rules on the content associated with any Darwin Core term.
Current GBIF DwC-A Validator 
Original goal
“… test Darwin Core Archives as specified in the Darwin Core Text Guide.”
http://tools.gbif.org/dwca-validator/
Current GBIF DwC-A Validator 
Original target
Darwin Core Archives are simple and can be created with simple custom scripts.
“… make sure GBIF and others can read the information as expected.”
Current GBIF DwC-A Validator
• Validates archive structure
• Offers a web presence
  – Report viewer
  – API
Next GBIF DwC-A Validator? 
New goal
Extend validation to the content of the archive
https://github.com/gbif/dwca-validator
Current content validators
• Atlas of Living Australia sandbox
• VertNet – Spatial quality
• GBIF Spain – Darwin Test
• Encyclopedia of Life – dwc-validator
• Scratchpads – dwca-validator
• GlobalNames – dwc-archive ruby gem
• … and many more
See Appendix 1 for links
What do we need?
• Accommodate different scopes
• Configuration/customization
  – Use more knowledge when available
• Web access (page and API)
Scopes
• Data entry
• Desktop software
  – Scientific workflow
  – Statistical software
• Integrated Publishing Toolkit (IPT)
• National nodes
• Aggregators
Configuration/Customization
• Where will the validator be used?
• Can we provide more information?
  – e.g. I know all the dates in my file should be ISO 8601
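A minimal sketch of how such user-supplied knowledge could be applied, assuming the user has declared that every eventDate is a plain ISO 8601 calendar date (YYYY-MM-DD). The function name non_iso_dates is hypothetical and is not part of the dwca-validator project; date ranges and timestamps, which ISO 8601 also allows, are out of scope here.

```python
from datetime import date


def non_iso_dates(rows, term="eventDate"):
    """Return the values of `term` that are not YYYY-MM-DD calendar dates.

    Hypothetical helper: it only illustrates using extra knowledge
    ("all my dates are ISO") to tighten validation.
    """
    bad = []
    for row in rows:
        value = row.get(term, "")
        try:
            date.fromisoformat(value)
        except ValueError:
            bad.append(value)
    return bad


rows = [
    {"eventDate": "2014-10-20"},
    {"eventDate": "20/10/2014"},
]
print(non_iso_dates(rows))
```

Without the declaration, a validator would have to guess whether 20/10/2014 is acceptable; with it, the value can be flagged outright.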
Components
• Library
• Web
• Extension support
Library
• Defines the structure of the validation process
• Provides a validation framework that enables sharing
• Stays close to the Darwin Core specification
Web
• Web page to submit an archive or a URL
• Report viewer
• API
Extension Support
• Include domain knowledge
• Propose interpreted data
Internals
• Validation types
  – Structure
    • Metadata
  – Records: rows
    • Field data (e.g. date, coordinates)
  – Records: columns
    • ID uniqueness
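Row-level checks can look at one record at a time, while a column-level check such as ID uniqueness needs every value of one field. A minimal sketch of the latter, assuming records are plain dictionaries keyed by Darwin Core term; duplicate_ids is a hypothetical name, not a function of the project.

```python
from collections import Counter


def duplicate_ids(rows, id_field="occurrenceID"):
    """Return identifier values used by more than one row, sorted.

    Column-level check: it must see the whole column before it can
    report anything, unlike a row-level field check.
    """
    counts = Counter(row.get(id_field) for row in rows)
    return sorted(v for v, n in counts.items() if v is not None and n > 1)


rows = [
    {"occurrenceID": "a1", "scientificName": "Acer rubrum"},
    {"occurrenceID": "a2", "scientificName": "Acer saccharum"},
    {"occurrenceID": "a1", "scientificName": "Acer negundo"},
]
print(duplicate_ids(rows))
```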
Internals – Record level
• Validation chain
  – Composed of chain elements
  – Possible parallelism
Internals – Record level
• Immutable chain element
  – Self-contained
    • Never relies on another chain element
  – Order-independent
    • Same behaviour wherever the element is used in the chain
But what if I really need ordering?
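The properties listed above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: ChainElement, run_chain and require are hypothetical names, and a frozen dataclass stands in for "immutable".

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional, Sequence

Record = Mapping[str, str]


@dataclass(frozen=True)
class ChainElement:
    """One self-contained check; frozen, so it cannot change once built."""
    name: str
    check: Callable[[Record], Optional[str]]  # error message, or None if OK


def run_chain(elements: Sequence[ChainElement], record: Record) -> List[str]:
    """Apply every element to the record and collect the messages."""
    messages = []
    for element in elements:
        message = element.check(record)
        if message is not None:
            messages.append(f"{element.name}: {message}")
    return messages


def require(term: str) -> ChainElement:
    """Build an element that only looks at the record it is given."""
    return ChainElement(
        name=f"required:{term}",
        check=lambda record: None if record.get(term) else f"{term} is missing",
    )


chain = [require("scientificName"), require("eventDate")]
record = {"scientificName": "Acer rubrum"}
print(run_chain(chain, record))
```

Because no element reads another element's result, reversing the chain yields the same set of messages, and records could in principle be dispatched to workers in parallel.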
Internals – Composition
• Composed chain element
• Exposed as one chain element
Composition example
• Mandatory latitude/longitude
  – Check that lat/long are both present in the record
  – Check that lat/long are valid decimal values
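One way to read this example: when ordering really matters, the ordered steps are wrapped together and the wrapper is exposed to the chain as a single element. A minimal sketch under that assumption; compose, lat_long_present and lat_long_decimal are hypothetical names.

```python
def compose(name, *steps):
    """Wrap ordered steps into one check that stops at the first failure."""
    def composed(record):
        for step in steps:
            message = step(record)
            if message is not None:
                return f"{name}: {message}"
        return None
    return composed


def lat_long_present(record):
    """Step 1: both coordinate terms must have a value."""
    missing = [t for t in ("decimalLatitude", "decimalLongitude") if not record.get(t)]
    return f"missing {', '.join(missing)}" if missing else None


def lat_long_decimal(record):
    """Step 2: values must parse as decimal degrees within valid ranges."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except ValueError:
        return "coordinates are not decimal numbers"
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "coordinates are out of range"
    return None


# From the outside this behaves like any single chain element.
mandatory_lat_long = compose("mandatoryLatLong", lat_long_present, lat_long_decimal)
print(mandatory_lat_long({"decimalLatitude": "45.5"}))
```

The second step can safely index the record because the first step has already guaranteed both terms are present; that dependency stays hidden inside the composed element.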
Configuration example
• Select mandatory Darwin Core terms
  – scientificName must be provided
• Restrict bounding box
  – decimalLatitude and decimalLongitude must be between
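A sketch of what such a configuration might look like in code. The bounding-box limits are not given in the slides, so the numbers below are arbitrary placeholders; ValidatorConfig and validate are hypothetical names and not the project's API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValidatorConfig:
    """User-selected settings (hypothetical structure)."""
    mandatory_terms: Tuple[str, ...] = ("scientificName",)
    # (min_lat, max_lat, min_lon, max_lon); default is the whole globe
    bounding_box: Tuple[float, float, float, float] = (-90.0, 90.0, -180.0, 180.0)


def validate(record, config):
    """Return a list of error messages for one record."""
    errors = [f"{t} must be provided" for t in config.mandatory_terms if not record.get(t)]
    min_lat, max_lat, min_lon, max_lon = config.bounding_box
    try:
        lat = float(record.get("decimalLatitude", ""))
        lon = float(record.get("decimalLongitude", ""))
    except (TypeError, ValueError):
        errors.append("coordinates are missing or not numeric")
    else:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            errors.append("coordinates fall outside the configured bounding box")
    return errors


# Placeholder limits chosen only for illustration.
example_config = ValidatorConfig(bounding_box=(44.0, 63.0, -80.0, -57.0))
record = {"decimalLatitude": "10", "decimalLongitude": "-73.6"}
print(validate(record, example_config))
```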
Customization example
• Apply your own controlled vocabulary
  – Use your own dictionary for a term
  – ControlledVocabularyEvaluationRule
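The idea of a user-supplied dictionary can be sketched as below. This is a Python stand-in written for illustration only; it does not reproduce the interface of ControlledVocabularyEvaluationRule, which the slides name but do not describe. make_vocabulary_rule is a hypothetical helper, and the two allowed values are just a sample dictionary.

```python
def make_vocabulary_rule(term, allowed_values):
    """Build a check that accepts only values from the user's dictionary.

    Matching ignores case and surrounding whitespace; empty values are
    left to a separate "mandatory term" check.
    """
    allowed = frozenset(v.strip().lower() for v in allowed_values)

    def rule(record):
        value = record.get(term, "").strip().lower()
        if value and value not in allowed:
            return f"{term}: '{record[term]}' is not in the controlled vocabulary"
        return None

    return rule


basis_rule = make_vocabulary_rule("basisOfRecord", ["PreservedSpecimen", "HumanObservation"])
print(basis_rule({"basisOfRecord": "Specimen"}))
```

Keeping the dictionary as data, separate from the rule, is what would let two institutions share the rule while each supplies its own vocabulary.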
Extension Example
• Suggester, linked to narwhal-processor
  – Suède –> ISO 3166-2:SE
  – URI –> http://sws.geonames.org/2661886
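A suggester proposes an interpreted value instead of only reporting an error. The sketch below uses a tiny in-memory dictionary in place of a real processor; it does not call narwhal-processor, and normalize, COUNTRY_SUGGESTIONS and suggest_country are hypothetical names. The single entry is the one shown on the slide.

```python
import unicodedata


def normalize(text):
    """Lower-case, trim and strip accents so 'Suède' and 'SUEDE' match."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


# Illustrative lookup table; a real extension would hold domain knowledge.
COUNTRY_SUGGESTIONS = {
    normalize("Suède"): {"code": "SE", "uri": "http://sws.geonames.org/2661886"},
}


def suggest_country(raw_value):
    """Return a suggested interpretation for a raw country value, or None."""
    return COUNTRY_SUGGESTIONS.get(normalize(raw_value))


print(suggest_country(" SUEDE "))
```

The raw value is never overwritten: the suggestion is returned alongside it, so the data publisher decides whether to accept the interpretation.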
Collaborative
• Share configurations
• Share customizations (dictionaries)
• Implement new reusable components
  – e.g. validation of a specific DwC-A extension
Collaboration
• Where to go?
  – https://github.com/gbif/dwca-validator
• Who can contribute?
  – Everyone
• What is needed?
  – Ideas, constructive comments
  – Code review, feedback
Project status
• Not yet released
• Command line interface available
Follow the project on GitHub
Acknowledgments
Special thanks
• SiB Colombia
• SiB Brazil
• Peter Desmet
• John Wieczorek
• Dag Endresen
• …
Appendix 1
DwC content validators
Atlas of Living Australia sandbox
http://sandbox.ala.org.au/datacheck/
VertNet – Spatial quality
Displayed on occurrence pages at
http://portal.vertnet.org/search
GBIF Spain – Darwin Test
http://www.gbif.es/darwin_test/Darwin_Test_in.php
Encyclopedia of Life – dwc-validator
http://services.eol.org/dwc_validator/
Appendix 1 (continued)
Scratchpads – dwca-validator
https://github.com/edwbaker/dwca_validator/
GlobalNames – dwc-archive ruby gem
https://github.com/GlobalNamesArchitecture/dwc-archive

Editor's Notes

  • #18: explain immutable
  • #21: Meeting notes (2014-10-20 14:54): examples
  • #23: Meeting notes (2014-10-20 14:54): suggester: explain it
  • #25: Meeting notes (2014-10-20 14:54): collaboration received; where to go; current state, timeline; current challenges; collaboration is needed