Technical Issues and Opportunities for Resource Discovery
Tim Robertson, Systems Architect
Global Biodiversity Information Facility (WWW.GBIF.ORG)
September 2009
Content
A look at the past, present and future of the GBIF registry and portals for biodiversity resource discovery.
  • Register existence
  • Associate metadata
  • Enable discovery through search
Registry: The past…
Universal Description Discovery and Integration (UDDI)
“… XML-based registry for businesses worldwide to list themselves on the Internet …”

  UDDI                   GBIF
  Businesses             Institutions
  + Services             + Collections
  + Service Bindings     + Endpoints (DiGIR etc.)
  + TModels              + Application Schemas (DwC etc.)
UDDI: Metadata
  • Limited by and large to:
      • Contact information (emails, addresses etc.)
      • Key-value pairs
      • ISO country code
      • Endorsing node
  • Allows for search by title, contact etc.
  • 2 levels of credit
  • Data provenance is lost – lack of recognition!
Past: Search capabilities
  • Recognising that federated search was limited, GBIF built the Data Portal (http://data.gbif.org)
      • Harvesting of resources registered in the UDDI
      • TAPIR, DiGIR, BioCASe
      • Rich search for individual records and resources by Darwin Core type terms (the what, where, when etc.) by building indexes
  • Limited metadata search capabilities
      • DiGIR, BioCASe, TAPIR etc. offer TECHNICAL metadata only
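The record search described here rests on building indexes over Darwin Core type terms. A minimal sketch of that idea follows, assuming a small in-memory inverted index; the function names and sample records are hypothetical and do not reflect how the Data Portal is actually implemented.

```python
from collections import defaultdict


def build_index(records, terms):
    """Map (term, normalised value) to the set of record ids holding it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for term in terms:
            value = record.get(term)
            if value:
                index[(term, value.strip().lower())].add(record_id)
    return index


def search(index, **criteria):
    """Return ids of records that match every term=value criterion."""
    hits = [
        index.get((term, value.strip().lower()), set())
        for term, value in criteria.items()
    ]
    return set.intersection(*hits) if hits else set()


# Hypothetical occurrence records keyed by an illustrative record id.
records = {
    "r1": {"scientificName": "Leiopelma hochstetteri", "country": "NZ", "year": "1998"},
    "r2": {"scientificName": "Leiopelma archeyi", "country": "NZ", "year": "2004"},
    "r3": {"scientificName": "Puma concolor", "country": "MX", "year": "2004"},
}
index = build_index(records, ["scientificName", "country", "year"])
matches = search(index, country="NZ", year="2004")
```

With these sample records, `matches` contains only `r2`, the one record satisfying both the "where" and the "when" criteria.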
GBIF Network: The real scenario
  • Challenge #1: Model the true nature of the network makeup.
  • A graph and not a tree
  • Multiple entity types
      • Institutions, networks, collections, GBIF Nodes
  • Many relationship types
Registry: A graph based model
  • Benefits:
      • Accurate data provenance
      • Duplicate record detection
      • Ability to model sub networks
  • Opportunity: Re-use of registry for your own purposes
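To illustrate why a graph rather than a tree is needed for accurate provenance, here is a minimal sketch of a registry with typed entities and typed relationships. The class, the entity identifiers and the relationship names (`ownedBy`, `memberOf`, `endorsedBy`) are assumptions made for illustration; the slides do not define a concrete schema.

```python
from collections import defaultdict, deque


class RegistryGraph:
    """Entities with a type, linked by named, directed relationships."""

    def __init__(self):
        self.entity_type = {}           # entity id -> entity type
        self.edges = defaultdict(list)  # source id -> [(relationship, target id)]

    def add_entity(self, entity_id, entity_type):
        self.entity_type[entity_id] = entity_type

    def relate(self, source, relationship, target):
        if source not in self.entity_type or target not in self.entity_type:
            raise KeyError("both entities must be registered first")
        self.edges[source].append((relationship, target))

    def provenance_paths(self, start, goal):
        """Return every cycle-free path of entities from start to goal."""
        paths, queue = [], deque([[start]])
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                paths.append(path)
                continue
            for _, target in self.edges.get(path[-1], []):
                if target not in path:
                    queue.append(path + [target])
        return paths


graph = RegistryGraph()
graph.add_entity("herp-collection", "collection")
graph.add_entity("museum-x", "institution")
graph.add_entity("amphibian-network", "network")
graph.add_entity("node-nz", "node")
graph.relate("herp-collection", "ownedBy", "museum-x")
graph.relate("herp-collection", "memberOf", "amphibian-network")
graph.relate("museum-x", "endorsedBy", "node-nz")
graph.relate("amphibian-network", "endorsedBy", "node-nz")

paths = graph.provenance_paths("herp-collection", "node-nz")
```

The collection reaches the node by two distinct routes (through its institution and through a thematic network), which a strict tree cannot represent without dropping one of them.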
Registry: A graph based model
  • Challenge #2: Scalable deployment supporting this reuse (99.9%, 24/7)
  • Authentication model
      • Identity management?
      • Cascading permissions?
      • Wiki style?
      • Or perhaps copy the model of ?
      • “Institution X requests to be associated with you. Would you like to accept this association?”
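The request-and-accept style of association quoted above could be sketched roughly as follows. This is only one possible reading of the question posed on the slide; the class and method names are hypothetical, and real identity management is reduced here to comparing plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class AssociationRequests:
    """Associations become effective only once the target accepts them."""

    pending: set = field(default_factory=set)
    accepted: set = field(default_factory=set)

    def request(self, requester, target):
        self.pending.add((requester, target))

    def respond(self, requester, target, responder, accept):
        # Only the entity being asked may answer the request.
        if responder != target:
            raise PermissionError("only the target may respond")
        if (requester, target) not in self.pending:
            raise KeyError("no such pending request")
        self.pending.remove((requester, target))
        if accept:
            self.accepted.add((requester, target))
        return accept


associations = AssociationRequests()
associations.request("institution-x", "network-y")
associations.respond("institution-x", "network-y", responder="network-y", accept=True)
```

The point of the design is that neither party can assert a relationship unilaterally, which keeps curation lightweight without opening the registry to arbitrary edits.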
Registry: A graph based model
  • Challenge #2 (cont.):
      • Who should curate?
      • Private and community copies?
      • Single (scalable) instance or multiple masters?
  • Opportunity: Offering tagging (machine and human) allows people to use the registry in ways we would not envision
      • myimagebank.org : containsTypesInTaxon = Leiopelmatidae
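The tag example above follows a `namespace : predicate = value` shape. A minimal parsing sketch is given below, assuming exactly that shape; the `MachineTag` type and `parse_machine_tag` function are hypothetical names, and a namespace that itself contains a colon (a full URL, for instance) would need a different delimiter rule.

```python
from typing import NamedTuple


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(text):
    """Split 'namespace : predicate = value' into its three parts."""
    left, equals, value = text.partition("=")
    namespace, colon, predicate = left.partition(":")
    tag = MachineTag(namespace.strip(), predicate.strip(), value.strip())
    if not equals or not colon or not all(tag):
        raise ValueError(f"not a machine tag: {text!r}")
    return tag


tag = parse_machine_tag("myimagebank.org : containsTypesInTaxon = Leiopelmatidae")
```

Because the namespace identifies who coined the tag, third parties can attach their own predicates to registry entries without any change to the registry schema.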
Provider monitoring
  • Endpoint monitoring: http://bioguid.info/status/ (Rod Page)
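As a rough illustration of endpoint monitoring, the sketch below probes a list of URLs and reports each as up or down. It is not based on how the bioguid.info status page works; the function names are hypothetical, and "up" is assumed to mean simply that an HTTP request succeeds with a status below 400.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=10):
    """Return True if the endpoint answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, network failures, timeouts and malformed URLs.
        return False


def check_endpoints(urls, probe=http_probe):
    """Map each URL to 'up' or 'down' using the given probe function."""
    return {url: ("up" if probe(url) else "down") for url in urls}


# Offline demonstration with a stand-in probe; the URLs are placeholders.
known_good = {"http://provider-a.example.org/digir"}
report = check_endpoints(
    ["http://provider-a.example.org/digir", "http://provider-b.example.org/tapir"],
    probe=lambda url: url in known_good,
)
```

Passing the probe in as a parameter keeps the reporting logic testable without network access; a scheduled job would call `check_endpoints` with the default `http_probe`.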
Enabling discoverability
  • Combination of human authored with machine generated metadata?
  • “… artificial intelligence is just that; ‘ARTIFICIAL intelligence’. For a system to feel smart to humans, you need human crafted metadata…”
Associating data and metadata
  • Challenge #3: If there is agreement to improve discoverability by associating automatically generated metadata with a registered entity:
      • How to uniquely identify resources within the registry?
          • Preserve existing (multiple) identifiers
      • Where does one stop? (Inventory of Taxa for example?)
      • What services are required to enable this association?
          • E.g. Find resource for “DwC:collectionCode”
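One way to picture "preserve existing (multiple) identifiers" together with a "find resource for DwC:collectionCode" service is an index from each known identifier to registry entries. The sketch below is an assumption-laden illustration: the class, the registry keys and the identifier values are all hypothetical. A lookup returns a set because a collection code is not guaranteed to be unique across institutions.

```python
from collections import defaultdict


class IdentifierIndex:
    """Keep every identifier a resource already has and look up by any of them."""

    def __init__(self):
        self._by_identifier = defaultdict(set)  # (scheme, value) -> registry keys
        self._identifiers = defaultdict(set)    # registry key -> {(scheme, value)}

    def register(self, registry_key, identifiers):
        for scheme, value in identifiers:
            self._by_identifier[(scheme, value)].add(registry_key)
            self._identifiers[registry_key].add((scheme, value))

    def find(self, scheme, value):
        """Return all registry keys known under this identifier."""
        return set(self._by_identifier.get((scheme, value), set()))

    def identifiers_of(self, registry_key):
        return set(self._identifiers.get(registry_key, set()))


index = IdentifierIndex()
index.register("registry-001", [("DwC:collectionCode", "HERP"), ("url", "http://museum-x.example.org/herp")])
index.register("registry-002", [("DwC:collectionCode", "HERP")])
index.register("registry-003", [("DwC:collectionCode", "MAMM")])

candidates = index.find("DwC:collectionCode", "HERP")
```

Here the code `HERP` matches two registry entries, so a real service would need further context (an institution code, for example) to pick one.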
Existing metadata stores
  • There are many existing resources …
  • Identification of the master copy is critical for success
  • Conflict resolution – how do we achieve this?
  • Complete copies or subset copies?
  • Wikipedia style, make copies available?
Service registration
  • To enable a service oriented architecture (SOA) workflow definition
  • Requires the definition of:
      • Service endpoints
      • Input formats
      • Output formats
  • Remember:
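The three things a service registration is said to require (endpoint, input formats, output formats) are enough to check whether services can be chained into a workflow. A minimal sketch follows, assuming formats are compared as plain strings; the `Service` type, the format labels and the example endpoints are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    endpoint: str
    input_formats: frozenset
    output_formats: frozenset


def can_chain(upstream, downstream):
    """True if some output format of upstream is accepted by downstream."""
    return bool(upstream.output_formats & downstream.input_formats)


def validate_workflow(services):
    """True if every consecutive pair of services shares a format."""
    return all(can_chain(a, b) for a, b in zip(services, services[1:]))


harvester = Service(
    "harvester", "http://example.org/harvest",
    frozenset({"tapir-request"}), frozenset({"dwc-xml"}),
)
name_checker = Service(
    "name-checker", "http://example.org/names",
    frozenset({"dwc-xml"}), frozenset({"dwc-xml", "report-json"}),
)
mapper = Service(
    "mapper", "http://example.org/map",
    frozenset({"kml"}), frozenset({"png"}),
)

valid = validate_workflow([harvester, name_checker])
invalid = validate_workflow([harvester, name_checker, mapper])
```

The second workflow is rejected because nothing the name checker emits is accepted by the mapper, which is the kind of mismatch a registry could flag before a workflow is ever run.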
GUID Resolution
  • Awaiting recommendation from the task group
  • Do we envisage GBIF running a generic resolver (multiple)?
  • Act as a cache?
  • Include endpoint monitoring and early warning system?
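To make the "generic resolver acting as a cache" question concrete, here is a minimal sketch that tries several resolvers in turn and remembers successful answers for a limited time. It is purely illustrative and anticipates no task group recommendation; the class, the two resolver functions and the example GUID are hypothetical.

```python
import time


class CachingResolver:
    """Try several resolvers in order and cache successful answers."""

    def __init__(self, resolvers, ttl_seconds=3600.0, clock=time.monotonic):
        self._resolvers = list(resolvers)
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # guid -> (time stored, resolved value)

    def resolve(self, guid):
        now = self._clock()
        cached = self._cache.get(guid)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        for resolver in self._resolvers:
            value = resolver(guid)
            if value is not None:
                self._cache[guid] = (now, value)
                return value
        return None


lsid_calls = []


def lsid_resolver(guid):
    """Stand-in for an LSID lookup; records each call for the demonstration."""
    lsid_calls.append(guid)
    return {"guid": guid, "scheme": "lsid"} if guid.startswith("urn:lsid:") else None


def doi_resolver(guid):
    """Stand-in for a DOI lookup."""
    return {"guid": guid, "scheme": "doi"} if guid.startswith("doi:") else None


resolver = CachingResolver([lsid_resolver, doi_resolver])
first = resolver.resolve("urn:lsid:example.org:specimens:1")
second = resolver.resolve("urn:lsid:example.org:specimens:1")
```

The second call is answered from the cache, so the underlying LSID resolver is consulted only once; failed lookups are deliberately not cached so that a recovering endpoint is retried.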
Vocabulary definitions
  • Requires consensus within the community that terms adequately describe the content.
  • Community site for authoring vocabularies?
  • The same applies for extensions to the Darwin Core
  • The GBIF Integrated Publishing Toolkit (IPT) uses the GBRDS as the source for extension definition and vocabulary definition.
Be smart with our limited resources
Contact
  Web site: http://www.gbif.org
  Data portal: http://data.gbif.org
  GBIF Secretariat, Universitetsparken 15, 2100 Copenhagen, Denmark
  E-mail: [email_address]
  Phone: +45 3532 1487
