A continuously updated All Genera Index: an achievable goal for Biodiversity Informatics?
Tony Rees, CSIRO Marine and Atmospheric Research, Australia
TDWG Conference, October 2011
Why an All Genera Index?
• All-species index(es) will take time to complete; an all-genera index is potentially more tractable:
    • roughly 10x smaller task (~2m valid species, maybe 250k genera)
    • can leverage existing genus-level compilations, e.g. ING for plant names, Nomenclator Zoologicus for legacy animal names, maybe ZooBank for future animal names, IPNI / others for plants
    • prokaryote and virus names are also well curated and accessible
• Aim for horizontal coverage first (no missing taxonomic sectors, including both extant and fossil names); vertical completeness, e.g. to species level, can be a secondary consideration
• Can carry the burden of taxonomic assignments: species then merely need to be attached to the correct genus instance
• Genera can have significant nomenclatural and taxonomic interest, i.e. valid vs. invalid names, author / year and place of publication (i.e. original work), genus-level synonyms and homonyms
• Can carry other attributes / assertions, e.g. all species have trait “x”, occur in habitat “y”, within geological range “z”
Tony Rees: Continuously Updated All Genera Index
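The idea that a genus record can carry the taxonomic assignment and genus-wide attributes, so that species only need attaching to the right genus instance, can be sketched as a small data structure. This is a minimal illustration only, not the IRMNG schema: the class name `Genus`, its fields, and the example genus "Exemplum" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Genus:
    """Hypothetical genus record; field names are illustrative assumptions."""
    name: str
    authority: str                                   # author and year
    family: str                                      # taxonomic assignment lives here
    attributes: dict = field(default_factory=dict)   # genus-wide assertions
    species: list = field(default_factory=list)      # attached species epithets

    def add_species(self, epithet: str) -> None:
        # A species only needs to be attached to the correct genus instance.
        self.species.append(epithet)

    def species_record(self, epithet: str) -> dict:
        # The species inherits family and attributes from its genus.
        if epithet not in self.species:
            raise KeyError(epithet)
        return {"name": f"{self.name} {epithet}",
                "family": self.family,
                **self.attributes}


# Made-up example data, for illustration only.
genus = Genus("Exemplum", "Author, 1900", "Exemplidae",
              attributes={"habitat": "marine", "status": "extant"})
genus.add_species("primus")
record = genus.species_record("primus")
print(record)
```

Holding the family and attributes once per genus, rather than once per species, is what keeps the task roughly an order of magnitude smaller than a full species index.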
Continuing a distinguished tradition…
• D. Patterson, Nature, 2003
• Remsen & Patterson, TDWG, 2007
• D. Remsen, in “The Linnaean Ark”, 2010
Different use cases, different approaches
• Remsen / Patterson / uBio approach (if correctly understood)
    • Assemble the largest possible list of taxonomic names from multiple sources / provenance; reconciliation, deduplication and assignment to a taxonomic hierarchy are a subsequent activity
    • Main initial use case is information retrieval / query expansion (multiple variants of name authorship are seen as valuable)
• Author / OBIS interest and approach
    • Starting point is a taxonomic hierarchy (kingdom through family); all names must live in this structure
    • Names from “trusted sources” are given precedence, others are used sparingly and subject to additional verification; multiple variants of name authorship are rationalized to a single preferred version
    • Important focus (after taxonomic assignment) for OBIS is on attributes, in particular marine vs. nonmarine and extant vs. fossil, i.e. use the power of the list for non-taxonomic as well as taxonomic purposes
    • Linkages to primary taxonomic literature are also of potential value (allow harvesting of attributes, expanded understanding of original taxonomic concepts, more…)
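The "trusted sources given precedence" rule, with authorship variants rationalized to one preferred version, could look something like the sketch below. The source categories, their ranking, and the record layout are assumptions made for illustration; the slides do not say how IRMNG ranks its sources.

```python
# Hypothetical ranking: a lower number means a more trusted source category.
SOURCE_RANK = {"trusted": 0, "secondary": 1, "unverified": 2}


def preferred_version(variants):
    """Pick one preferred record from several variants of the same name.

    Each variant is a dict with 'authority' and 'source_type' keys.
    Unknown source types sort after all known ones.
    """
    worst = len(SOURCE_RANK)
    return min(variants, key=lambda v: SOURCE_RANK.get(v["source_type"], worst))


# Made-up authorship variants for a single genus name.
variants = [
    {"authority": "Author 1900", "source_type": "unverified"},
    {"authority": "Author, 1900", "source_type": "trusted"},
    {"authority": "A. Author, 1900", "source_type": "secondary"},
]
chosen = preferred_version(variants)
print(chosen["authority"])
```

Under the contrasting uBio-style approach described above, all three variants would be retained, since each is a useful search key for query expansion.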
Leverage existing genus-level compilations
Leverage existing genus-level compilations (Nomenclator Zoologicus extract)
Characteristics of nomenclator-style compilations
• Emphasis is on nomenclatural information, i.e. facts (name X was established by Y in publication Z on date D) and nomenclatural synonyms / rationale; subsequent taxonomic treatment (“opinions”) may or may not be included
• Literature citations are seen as a critical component (excellent!), often verified from the original, i.e. a nomenclator can be considered a proxy for the primary literature
• Recent / on-line nomenclators often have full citation information / reference modules (e.g. Catalog of Fishes, Index Fungorum, Systema Dipterorum, more…)
• ING and Nomenclator Zoologicus use the more terse “nomenclator style” or microcitation (no article title, full authorship or page range included), which is less straightforward for verifying / sourcing relevant attributes, or for cross-linking to bibliographic lists
• Non-taxonomic attributes are included in some compilations, but not all
Assembling the “desired” data set
• In practice, for the full set of desired information it may be necessary to supplement information from nomenclators with that from other sources, i.e. subsequent taxonomic treatments and opinions, bibliographies / literature indexes, and sources for attributes such as eco- and geo- characteristics
• Additional effort may be needed to massage the supplied fragmentary / inconsistent taxonomies into a coherent whole at higher levels
• Higher taxonomy itself is a moving target too, e.g. for Angiosperms (APG, APG II, APG III…), protists, viruses and prokaryotes
• Information varies from readily available / well curated / comprehensive / current (for “exemplar” groups) to fragmentary / out-of-date / hard-to-access / no recent overviews for others
• The desired level of detail is not available at genus level from the current Cat. of Life; at this time one needs to go to the contributing GSDs, checklists, primary literature and elsewhere (and to relevant sources for fossil taxa)
Author’s experience to date
• First “cut” in 2003-4 as a names indexing operation for OBIS, ramped up in 2006 as IRMNG, the Interim Register of Marine and Nonmarine Genera
• Concept name follows ERMS, the European Register of Marine Species (now WoRMS), with “Interim” added to signal incomplete / provisional, but hopefully useable in its present state
• Initial guesstimate to complete was 3-6 months (slight underestimate!)
• All names sourcing and ingestion is based on manual data loading at this time; would like to move to automated data feeds / updates as available in future versions
• Uploading initial batches of data is straightforward; problems come with the subsequent ones required for gap filling, i.e.:
    • Duplicate and near-duplicate detection
    • Genus-level homonyms are a significant issue
    • Dealing with data conflicts: same name, different taxonomic opinions or orthographies for supplied information
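The gap-filling problems listed above (duplicates, near-duplicates, homonyms) can be illustrated with a rough triage function for an incoming name. This is a sketch under stated assumptions, not the procedure used for IRMNG: the function names, the authority normalisation, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all choices made here for illustration.

```python
from difflib import SequenceMatcher


def normalise_authority(authority: str) -> str:
    # Ignore case, spacing and punctuation when comparing authorities.
    return "".join(ch for ch in authority.lower() if ch.isalnum())


def classify_incoming(existing, incoming, threshold=0.85):
    """Triage an incoming (name, authority) pair against existing pairs.

    Returns 'duplicate', 'possible homonym', 'near-duplicate' or 'new'.
    """
    name, authority = incoming
    same_name = [a for n, a in existing if n == name]
    if any(normalise_authority(a) == normalise_authority(authority)
           for a in same_name):
        return "duplicate"
    if same_name:
        # Same spelling, different authority: a homonym or an authorship
        # variant; either way it needs review before loading.
        return "possible homonym"
    for n, _ in existing:
        if SequenceMatcher(None, n.lower(), name.lower()).ratio() >= threshold:
            return "near-duplicate"
    return "new"


# Made-up names, for illustration only.
existing = [("Exemplum", "Author, 1900"), ("Alterum", "Writer, 1850")]
results = [
    classify_incoming(existing, ("Exemplum", "Author 1900")),
    classify_incoming(existing, ("Exemplum", "Other, 1920")),
    classify_incoming(existing, ("Exemplus", "Author, 1900")),
    classify_incoming(existing, ("Novum", "Someone, 2000")),
]
print(results)
```

Only the "new" case can be loaded without a decision; the other three are where the manual effort goes once the initial batches are in.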
A portion of the IRMNG master genus table (as at Oct 2011)
Services / views this currently supports
• High-level overview + relevant statistics for “all life” (currently possible for names, in future for valid taxa)
• Navigate the taxonomic hierarchy in any direction
• Generate hierarchical lists
• Generate alphabetic lists
• Sort / filter by any desired criteria, both taxonomic and non-taxonomic
• Generate lists of homonyms, within or across Codes
• Indicate current taxonomic hierarchy, nomenclatural / taxonomic status, and attributes (to varying degrees) for any input name
• Holds partial species lists for selected genus names, e.g. from Cat. of Life (with permission) and elsewhere (could be developed further as desired)
• Indicate near match targets for any input name (“did you mean…”) using TAXAMATCH fuzzy matching (the latter also adopted by iPlant, PESI, GNI, more…)
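As one example of the services listed, a homonym list "within or across Codes" can be generated by grouping records on the genus name. The sketch below assumes each record is a `(name, authority, code)` tuple and uses made-up names and plain labels for the nomenclatural Codes; it is not the IRMNG implementation.

```python
from collections import defaultdict


def homonym_report(records):
    """Map each repeated genus name to 'within Code' or 'across Codes'.

    records: iterable of (name, authority, code) tuples, assumed to be
    already deduplicated so that repeats are genuine homonyms.
    """
    by_name = defaultdict(list)
    for name, authority, code in records:
        by_name[name].append((authority, code))

    report = {}
    for name, entries in by_name.items():
        if len(entries) > 1:
            codes = {code for _, code in entries}
            report[name] = "across Codes" if len(codes) > 1 else "within Code"
    return report


# Made-up records, for illustration only.
records = [
    ("Exemplum", "Author, 1900", "zoological"),
    ("Exemplum", "Botanist, 1850", "botanical"),
    ("Alterum", "Writer, 1800", "zoological"),
    ("Alterum", "Scribe, 1810", "zoological"),
    ("Novum", "Someone, 1990", "zoological"),
]
report = homonym_report(records)
print(report)
```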
IRMNG-generated statistics for “all life” (web query 6 Oct 2011)
(NB, can also generate these lists as required via the web, by navigating the hierarchy, or enter the hierarchy at any level)
Current IRMNG status
• >450k genus names, in 17k+ families as at October 2011 (however a significant subset, ~30%, still awaits family-level allocation)
• Start made on resolving genus-level synonyms on a group-by-group basis, but much more to do
• Genus coverage considered >95% complete for 1753-2003, less so for more recent data
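For scale, the headline figures above can be turned into rough counts. The arithmetic below treats ">450k" and "17k+" as exactly 450,000 and 17,000, which is an assumption; the results are therefore approximate lower bounds, not reported statistics.

```python
# Rounded inputs taken from the slide; the true values are somewhat higher.
genus_names = 450_000
families = 17_000
percent_unallocated = 30

# Integer arithmetic avoids floating-point rounding in the counts.
awaiting_family = genus_names * percent_unallocated // 100
allocated = genus_names - awaiting_family
mean_per_family = round(allocated / families, 1)

print(awaiting_family)   # names still awaiting family-level allocation
print(allocated)         # names already placed in a family
print(mean_per_family)   # mean allocated genus names per family
```

On these rounded inputs, roughly 135,000 names still need a family, which gives a sense of the remaining allocation workload.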
Some questions for this meeting
• Is this a worthwhile effort more generally, i.e. as a community resource, cf. ongoing equivalent activities, e.g. Catalogue of Life, GSDs, ITIS, PaleoDB, more…?
• If so, where should it reside, and who should manage / curate it for the future?
• To what extent can it leverage or synergise with emerging GN* activities and infrastructure?
• To what degree can existing manual data upload / infill processes be automated?
• How best to achieve continuing population and currency, e.g. as new names appear (~2k genera, 25k new species / yr if relevant)?
Thank you
• Visit IRMNG at www.obis.org.au/irmng/
• Thanks to data sources and funders who have contributed to development of IRMNG to date!
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: Tony.Rees@csiro.au
Web: www.cmar.csiro.au/datacentre/
Supplementary slide
The emerging GN* world: which elements are relevant to this task?
