Global Names Architecture: A Rationale, Brief History, Components. David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF), www.gbif.org. Biodiversity Informatics, 15 September 2009.
Biodiversity information: a focus on taxa. “All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.” (Grimaldi & Engel, 2005, Evolution of the Insects). Biodiversity informatics: the creation, curation, discovery, and delivery of biodiversity information.
A name that serves as a link to what has been learned in the past… (From T.E. Glover, The Fishes of Southwestern Japan, c. 1870.)
A name that serves as a link to what has been learned in the past… Unlike in many other domains of science, historic publications remain important.
…and what we today add to the body of knowledge. (From T.E. Glover, The Fishes of Southwestern Japan, c. 1870.)
GBIF index: 177 million records (growing by more than 5% per month); gigabytes of text (~100 GB now); all data mobilized through GBIF.
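Growth of this kind compounds quickly. As a rough worked example, assuming the rate held at exactly 5% per month (the slide gives only a lower bound), the record count after n months is 177 million × 1.05^n, and the doubling time is ln 2 / ln 1.05, a little over 14 months. The function names below are illustrative only.

```python
import math

def project_records(start: float, monthly_rate: float, months: int) -> float:
    """Compound growth: start * (1 + r) ** months."""
    return start * (1 + monthly_rate) ** months

def doubling_time(monthly_rate: float) -> float:
    """Months needed to double at a constant compound rate: ln 2 / ln(1 + r)."""
    return math.log(2) / math.log(1 + monthly_rate)

records_now = 177e6  # from the slide: 177 million records
rate = 0.05          # assumption: exactly the slide's lower bound of 5% per month

print(f"after 12 months: {project_records(records_now, rate, 12) / 1e6:.0f} million")
print(f"doubling time: {doubling_time(rate):.1f} months")
```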
Biodiversity information: species information is “tied” to scientific names.
The “Names Problem”: Names are not stable (5-10% of names are invalidated per decade). Names are not unique. There is no complete list of names. There is no complete list of species, and no agreement on how many there are, even within a single group. This impacts the discovery of, and access to, information about species.
The “Names Problem”: properties of names. Orthographic (as labels in text that are “tied” to information about species). Nomenclatural (as the core “words” of taxonomy that tie a name to an original publication and type). Taxonomic (as components of taxon definitions derived via authoritative taxonomic rigour).
Orthography: orthography and the Names Problem; objectives for remediation.
Variations in name spelling: Loligo pealeii, Loligo pealii, Loligo pealei.
Some names are harder to spell than others: Actinobacillus actimomycetemcomitans, Actinobacillus actimycetemcomitans, Actinobacillus actinmycetemcomitans, Actinobacillus actinomicetemcomitans, Actinobacillus actinomy, Actinobacillus actinomyce, Actinobacillus actinomycemcomitans, Actinobacillus actinomyceremcomitans, Actinobacillus actinomycetam, Actinobacillus actinomycetamcomitans, Actinobacillus actinomycetecomitans, Actinobacillus actinomycetemcmitans, Actinobacillus actinomycetemcomintans, Actinobacillus actinomycetemcomitance, Actinobacillus actinomycetemcomitans, Actinobacillus actinomycetemcomitants, Actinobacillus actinomycetemcommitans, Actinobacillus actinomycetemocimitans, Actinobacillus actinomycetencomitans, Actinobacillus actinomycetum, Actinobacillus actinomyctemcomitans, Actinobacillus actinomyectomcomitans, Actinobacillus actinomyetemcomitans, Actinobacillus actinonmycetemcomitans, Actinobacillus actionomycetemcomitans, Actinobacillus actynomicetemcomitans, Actinobacillus antinomycetemcomitans. Difficulties with Latinized names: transcription errors. Which one is the correct one?
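One common way to reconcile transcription errors like these is approximate string matching against a reference list. The sketch below uses plain Levenshtein edit distance with an arbitrary threshold of 3. The reference list, the threshold, and the function names are hypothetical; real name-matching services may also use phonetic keys or normalise Latin endings rather than rely on edit distance alone.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def best_match(candidate, reference_names, max_distance=3):
    """Closest reference name as (name, distance) if within max_distance, else None."""
    scored = [(ref, levenshtein(candidate.lower(), ref.lower())) for ref in reference_names]
    ref, dist = min(scored, key=lambda pair: pair[1])
    return (ref, dist) if dist <= max_distance else None

# Hypothetical reference list; a real service would query an authoritative names source.
REFERENCE = ["Actinobacillus actinomycetemcomitans", "Loligo pealeii"]

for variant in ["Actinobacillus actinomycetemcomitance", "Actinobacillus actinomy", "Loligo pealei"]:
    print(variant, "->", best_match(variant, REFERENCE))
```

A fixed threshold catches small slips such as actinomycetemcomitance (two edits away), but not truncations such as Actinobacillus actinomy, which is 13 insertions short and is left unmatched.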
Agalinus paupercula borealis; Agalinus pauperculum borealis; Agalinis paupercula var. Borealis; Agalinus pauperculum var. borealis; Agalinus paupercula var. borealis; Agalinus paupercula var. borealis Pennell; Agalinus paupercula Britton var. borealis Pennell; Agalinus paupercula (Gray) Britt. var. borealis Pennell; Agalinis paupercula (A.Gray) Britton var. borealis Pennell; Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934; Gerardia paupercula borealis; Gerardia paupercula var. borealis; Gerardia paupercula var. borealis (Pennell) Deam; Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam; Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam; Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam; Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell; Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell; Gerardia paupercula Britton ssp. borealis Pennell. Many ways to correctly spell a name. Should GBIF/EoL/BHL display all, one, or some?
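Many of these variants differ only in authorship, rank marker, or capitalisation. A simple first step is to reduce each string to a canonical form (genus plus epithets) and group the variants that share it. The rules below are a hypothetical and deliberately small sketch: they drop parenthesised and trailing authors, rank markers, and years, and they would mishandle cases such as lower-case author particles (“de”, “van”).

```python
import re
from typing import Dict, List

RANK_MARKERS = {"var.", "subsp.", "ssp.", "f.", "forma"}

def canonical_form(name: str) -> str:
    """Reduce a full name string to 'Genus epithet [infraspecific epithet]'."""
    text = re.sub(r"\([^)]*\)", " ", name)  # drop parenthesised authors, e.g. "(A. Gray)"
    tokens = text.split()
    if not tokens:
        return ""
    parts = [tokens[0].capitalize()]  # the first token is taken to be the genus
    expect_epithet = False
    for token in tokens[1:]:
        lower = token.lower()
        if lower in RANK_MARKERS:
            expect_epithet = True  # the next token is an infraspecific epithet
            continue
        if expect_epithet or token.islower():
            # epithets are lower case; this also repairs "var. Borealis"
            if re.fullmatch(r"[a-z-]+", lower):
                parts.append(lower)
            expect_epithet = False
        # anything else (capitalised authors such as "Britt.", years) is ignored
    return " ".join(parts)

def group_variants(variants: List[str]) -> Dict[str, List[str]]:
    """Group name strings by their canonical form, preserving input order."""
    groups: Dict[str, List[str]] = {}
    for variant in variants:
        groups.setdefault(canonical_form(variant), []).append(variant)
    return groups

VARIANTS = [
    "Agalinus paupercula var. borealis Pennell",
    "Agalinis paupercula (A.Gray) Britton var. borealis Pennell",
    "Agalinis paupercula var. Borealis",
    "Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell",
    "Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam",
]

for form, members in group_variants(VARIANTS).items():
    print(form, members)
```

This collapses authorship variants, but Agalinus, Agalinis, and Gerardia forms (and paupercula versus pauperculum) remain separate groups; linking those needs fuzzy matching and nomenclatural data.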
Objectives: where informatics can contribute. Index the names occurring in content we wish to publicise and access. Develop tools to extract, catalogue, and match names. Reconcile names to authoritative name sources via a common resolution path. Reconcile name occurrences to taxonomic concepts via a common concept resolution path.
Nomenclature: nomenclatural aspects of the names problem; approaches for remediating them.
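Indexing names in content can be sketched as two steps: find candidate names in text, then record which documents each name occurs in. The version below is a toy. It treats any capitalised word followed by a lower-case word as a candidate binomial and keeps it only if the genus is in a supplied dictionary. The regular expression, the genus list, and the sample documents are assumptions for illustration; real name-finding tools rely on much richer dictionaries and heuristics.

```python
import re
from collections import defaultdict

# Candidate binomial: capitalised word, one space, lower-case word of 3+ letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

def find_names(text, known_genera):
    """Candidate binomials whose genus is in the supplied dictionary of genera."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)
            if genus in known_genera]

def build_index(documents, known_genera):
    """Inverted index: name -> set of document ids in which the name was found."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name in find_names(text, known_genera):
            index[name].add(doc_id)
    return dict(index)

# Hypothetical sample content and genus dictionary.
DOCUMENTS = {
    "doc1": "The squid Loligo pealeii was sampled offshore. Loligo pealeii is common here.",
    "doc2": "Oral isolates of Actinobacillus actinomycetemcomitans were cultured.",
}
GENERA = {"Loligo", "Actinobacillus"}

print(build_index(DOCUMENTS, GENERA))
```

The genus filter is what rejects sentence-initial phrases such as “The squid” and “Oral isolates”, which match the pattern but are not names.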
Don’t pass on bad information. How can we determine the status of the names we discover in content that we serve?
Nomenclatural changes impact search and retrieval. Where can I find out that these names are related? The Zoological Code doesn’t track recombinations; the Botanical Code does.
Nomenclatural changes impact search and retrieval
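One way to answer “where can I find out that these names are related?” is to store, for each combination, a link back to the original published name, and to expand a search to every name sharing that root. The records below are illustrative: Loligo pealeii Lesueur, 1821 and its later combination Doryteuthis pealeii (Lesueur, 1821), where the parenthesised authorship is the zoological convention signalling that the species now sits in a different genus. The NameRecord structure and the functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class NameRecord:
    name: str                       # full name string, including authorship
    original: Optional[str] = None  # the original combination, or None if this is the original

# Illustrative dictionary content (not an authoritative source).
RECORDS = {
    record.name: record
    for record in [
        NameRecord("Loligo pealeii Lesueur, 1821"),
        NameRecord("Doryteuthis pealeii (Lesueur, 1821)", original="Loligo pealeii Lesueur, 1821"),
        NameRecord("Sepia officinalis Linnaeus, 1758"),
    ]
}

def root(name: str) -> str:
    """Follow 'original' links back to the originally published name."""
    record = RECORDS[name]
    while record.original is not None:
        record = RECORDS[record.original]
    return record.name

def related_names(name: str) -> List[str]:
    """Every name sharing the same original name, e.g. to expand a search query."""
    target = root(name)
    return sorted(n for n in RECORDS if root(n) == target)

print(related_names("Doryteuthis pealeii (Lesueur, 1821)"))
```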
Homonyms: Peranema, the fern, and Peranema, the euglenid. How many Peranema are there? How can I tell them apart?
Homonyms: taxonomic context alone doesn’t tell me enough.
Kingdom | Phylum | Class | Order | Family | Genus
Plantae | Magnoliophyta | Magnoliopsida | Apiales | Umbelliferae | Oenanthe
Plantae | - | - | - | - | Oenanthe
- | - | - | - | - | Oenanthe
Plantae | Magnoliophyta | Magnoliopsida | Apiales | Apiaceae | Oenanthe
Plantae | - | - | - | Orchidaceae | Oenanthe
Animalia | Chordata | Aves | Passeriformes | Muscicapidae | Oenanthe
Animalia | Chordata | Aves | Passeriformes | Turdidae | Oenanthe
Animalia | Chordata | Actinopterygii | Perciformes | Pomatomidae | Pomatomus
Animalia | Chordata | Pisces | Perciformes | Serranidae | Pomatomus
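The table shows why matching on classification is fragile. The sketch below filters homonym records by whatever ranks a data source happens to supply; the records and identifiers are hypothetical. A kingdom is enough to separate the plant Oenanthe from the bird, but a source that says Umbelliferae (an alternative name for Apiaceae), or that places the bird in Turdidae rather than Muscicapidae, gets no match at all. That failure is one argument for stable identifiers rather than classification strings.

```python
from typing import Dict, List

# Hypothetical homonym records: the same genus name in two kingdoms.
HOMONYMS: List[Dict[str, str]] = [
    {"id": "example:oenanthe-plant", "genus": "Oenanthe", "kingdom": "Plantae", "family": "Apiaceae"},
    {"id": "example:oenanthe-bird", "genus": "Oenanthe", "kingdom": "Animalia", "family": "Muscicapidae"},
]

def candidates(genus: str, context: Dict[str, str]) -> List[Dict[str, str]]:
    """Records for `genus` whose stored classification agrees with every rank given in `context`."""
    return [
        record for record in HOMONYMS
        if record["genus"] == genus
        and all(record.get(rank) == value for rank, value in context.items())
    ]

print([r["id"] for r in candidates("Oenanthe", {"kingdom": "Plantae"})])
print(candidates("Oenanthe", {"family": "Umbelliferae"}))  # synonym of the stored family: no match
print(candidates("Oenanthe", {"family": "Turdidae"}))      # different placement: no match
```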
Approaches to remediation: Consolidate the major nomenclatural databases into a single nomenclatural dictionary, populated with provisionally verified records and open to annotation. Such a dictionary provides the nomenclatural status of a name; collectively identifies all homonyms, with identifiers used in taxonomic data providing disambiguation context; and ties all distinct nomenclatural combinations to the original published name. Informatics: promote global identifiers and a simple resolution pathway for these data.
Taxonomy: taxonomic examples of the names problem; approaches for remediating them.
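A minimal sketch of the “single nomenclatural dictionary” idea: each record carries a global identifier, a status, a provisional-verification flag, and open annotations, and the dictionary can both resolve an identifier and list every record sharing a name string (its homonyms). The identifier scheme (urn:example:...), the status values, and the class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class NomenclaturalRecord:
    identifier: str          # hypothetical global identifier
    name: str                # name string without authorship
    group: str               # informal context, used here only to tell homonyms apart
    status: str = "unknown"  # illustrative nomenclatural status
    verified: bool = False   # provisionally verified until a curator confirms it
    annotations: List[str] = field(default_factory=list)

class NameDictionary:
    def __init__(self) -> None:
        self._by_id: Dict[str, NomenclaturalRecord] = {}
        self._by_name: Dict[str, List[str]] = {}

    def add(self, record: NomenclaturalRecord) -> None:
        self._by_id[record.identifier] = record
        self._by_name.setdefault(record.name, []).append(record.identifier)

    def resolve(self, identifier: str) -> Optional[NomenclaturalRecord]:
        """Identifier -> record, or None if the identifier is unknown."""
        return self._by_id.get(identifier)

    def homonyms(self, name: str) -> List[NomenclaturalRecord]:
        """All records that share this name string."""
        return [self._by_id[i] for i in self._by_name.get(name, [])]

    def annotate(self, identifier: str, note: str) -> None:
        """Open annotation: attach a note to an existing record."""
        self._by_id[identifier].annotations.append(note)

DICTIONARY = NameDictionary()
DICTIONARY.add(NomenclaturalRecord("urn:example:name:1", "Peranema", "fern"))
DICTIONARY.add(NomenclaturalRecord("urn:example:name:2", "Peranema", "euglenid"))
DICTIONARY.annotate("urn:example:name:2", "hypothetical note: check original description")

print([r.group for r in DICTIONARY.homonyms("Peranema")])
```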
Taxonomic synonyms: Halichondria panicea (Pallas, 1766) sec. Van Soest 2002 (WoRMS).
Consequences of splitting. Taxon concept problem: what does someone mean when they refer to P. carinii?
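Splitting means one name string can denote different taxa depending on whose usage is meant. Pneumocystis carinii is a familiar case: the organism infecting humans was later separated as Pneumocystis jirovecii, so “P. carinii” in older literature can cover material that newer usage excludes. The sketch models a taxon concept as a name plus an “according to” (sec.) source. The source labels and the host-based circumscriptions are simplified, hypothetical stand-ins for real sec. references.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class TaxonConcept:
    name: str
    according_to: str      # the "sec." part; labels here are hypothetical
    hosts: FrozenSet[str]  # simplified circumscription: which host-specific forms are included

CONCEPTS = [
    TaxonConcept("Pneumocystis carinii", "broad sense (before the split)", frozenset({"rat", "human"})),
    TaxonConcept("Pneumocystis carinii", "narrow sense (after the split)", frozenset({"rat"})),
    TaxonConcept("Pneumocystis jirovecii", "narrow sense (after the split)", frozenset({"human"})),
]

def concepts_for(name: str) -> List[TaxonConcept]:
    """Every concept that uses this name; more than one means the bare name is ambiguous."""
    return [c for c in CONCEPTS if c.name == name]

def concepts_covering(host: str) -> List[TaxonConcept]:
    """Concepts whose circumscription includes material from the given host."""
    return [c for c in CONCEPTS if host in c.hosts]

for concept in concepts_for("Pneumocystis carinii"):
    print(concept.name, "sec.", concept.according_to, sorted(concept.hosts))
```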
The perils of lumping: does Zapus hudsonius campestris (Bear Lodge meadow jumping mouse) include Zapus hudsonius preblei (Preble’s meadow jumping mouse)? Dr. Rob Roy Ramey says it does; Dr. Tim King says it does not. What should a search for “Zapus hudsonius campestris” return?
Different taxonomic views: different numbers of species, different names. Taxonomic backbones: scope and completeness.
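The answer to the search question depends on which taxonomic view the search adopts. In the sketch below, each view maps a queried name to the set of published names it treats as belonging to that taxon, following the two positions summarised on the slide for the meadow jumping mouse genus Zapus. The records and the view structure are hypothetical.

```python
from typing import Dict, List, Set

# Which published names each view treats as belonging to the queried taxon (per the slide's summary).
VIEWS: Dict[str, Dict[str, Set[str]]] = {
    "Ramey": {"Zapus hudsonius campestris": {"Zapus hudsonius campestris", "Zapus hudsonius preblei"}},
    "King": {"Zapus hudsonius campestris": {"Zapus hudsonius campestris"}},
}

# Hypothetical records, each labelled with the name it was published under.
RECORDS = [
    ("rec-1", "Zapus hudsonius campestris"),
    ("rec-2", "Zapus hudsonius preblei"),
]

def search(name: str, view: str) -> List[str]:
    """Record ids returned for `name` when interpreted under the chosen taxonomic view."""
    names = VIEWS[view].get(name, {name})  # names the view does not mention fall back to an exact match
    return [record_id for record_id, record_name in RECORDS if record_name in names]

print(search("Zapus hudsonius campestris", "Ramey"))
print(search("Zapus hudsonius campestris", "King"))
```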
Organisational value of Non-Taxonomic Lists
Approaches to remediation: An inventory of different taxonomic catalogues. Inform users if there are concept issues for a species. Provide synonymised taxon concepts with unique and resolvable identifiers. Make multiple classifications, via checklists and catalogues, accessible and usable as organisational frameworks for species information.
Summary: a data publication framework that enables: a complete index of all names that are tied to information about species, with tools and infrastructure to support it; a complete index of verified nomenclature, with an identification and resolution system that makes it easy to tie a name to an authoritative record; a global taxonomic resolution system that allows a particular usage of a name to be tied to a defined taxon; and a system that establishes taxonomy as a global organisational framework for species information.
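Using a checklist as an organisational framework amounts to grouping content by the classification that checklist supplies. The two checklists below are hypothetical, but they mirror the homonyms table, in which Oenanthe is placed in Muscicapidae by one source and in Turdidae by another; the same names then fall under different families depending on which checklist is chosen.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical checklists mapping species names to a family.
CHECKLISTS: Dict[str, Dict[str, str]] = {
    "checklist-A": {"Oenanthe oenanthe": "Muscicapidae", "Turdus merula": "Turdidae"},
    "checklist-B": {"Oenanthe oenanthe": "Turdidae", "Turdus merula": "Turdidae"},
}

def organise(names: List[str], checklist: str) -> Dict[str, List[str]]:
    """Group names by family under the chosen checklist; names it lacks go under 'unplaced'."""
    families = CHECKLISTS[checklist]
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[families.get(name, "unplaced")].append(name)
    return dict(groups)

names = ["Oenanthe oenanthe", "Turdus merula", "Loligo pealeii"]
print(organise(names, "checklist-A"))
print(organise(names, "checklist-B"))
```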
Inventory and Index
uBio Indexes
Web Service outputs Taxon Object
Web Service calls from client applications
Taxonomic organisation of content
Taxonomic organisation of content
Indexes support processes that support discovery
That enable new and better tools and services
Formalise the Architecture
Coordinate Communities of Interest
Summary: GNA Objectives. A complete index of names tied to information about species, reconciled to a common and verified nomenclatural dictionary. This same dictionary forms the basis for multiple expressions of taxonomic catalogues, regional checklists, and thematic lists of species. These lists are openly accessible and tied to services and processes that enable them to be effectively employed in data organisation and retrieval. Collectively, these components serve the delivery and utilisation of biological knowledge.
Thank you [email_address] Skype:dremsen