Use of ISOcat within CMDI
Menzo Windhouwer, Marc Kemps-Snijders, Sue Ellen Wright
Outline
  • ISOcat: a Data Category Registry
  • The role of data categories in CMDI
  • A glimpse of ISOcat
  • Status of the metadata profile
  • A matter of trust
  • Upcoming
ISOcat: a Data Category Registry
  • The reference implementation of ISO 12620:2009, "Terminology and other content and language resources - Specification of data categories and management of a Data Category Registry for language resources"
  • A data category is:
      • the result of the specification of a given data field
      • an elementary descriptor in a linguistic structure or an annotation scheme
Data categories and linguistic resources
[Figure: a (schema for a) lexicon and a (schema for a) typological database both refer to the same data categories ("Shared semantics!"). Diagram labels: Lexicon, Lexical Entry, Form, Sense, Lemma, Word Form (with multiplicities 0..* and 1..*); partOfSpeech, writtenForm, grammaticalGender, lexicalType, wordOrder.]
Data category specification
  • Administrative information
      • Identifier
      • Version
      • Origin
      • Justification
      • Status
  • Descriptive information
      • Names, definitions, examples and explanations in various languages (English is mandatory)
      • Application (domain) specific names
  • Conceptual domain
      • Possible values (per profile)
  • Linguistic information
      • Examples and explanations for various languages
      • Possible values for various languages
The role of data categories in CMDI
  • CMD components, elements and items can have links to concepts
  • These links should be resolvable to a concept description
  • This concept description gives explicit semantics
  • Elements and components can use different terminology but still have common semantics
  • ISOcat provides resolvable links to the semantic description of data categories (DCs)
      • CMD items: simple DCs
      • CMD elements: complex DCs
      • CMD components: container DCs (upcoming)
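
The point about different terminology with common semantics can be sketched in a few lines of Python. This is only an illustration: the profile names, the element name EntryKind, and the pairing of those names with a data category are invented here; only the DC-2486 link is taken from the deck. Elements are grouped by their ConceptLink, so two differently named elements that point at the same data category end up together.

```python
from collections import defaultdict

# Hypothetical metadata elements from two profiles: (profile, element name, ConceptLink).
# Profile names and "EntryKind" are invented for illustration only.
ELEMENTS = [
    ("ProfileA", "HeadWordType", "http://www.isocat.org/datcat/DC-2486"),
    ("ProfileB", "EntryKind", "http://www.isocat.org/datcat/DC-2486"),
    ("ProfileB", "Remark", None),  # no ConceptLink, so no explicit semantics
]


def group_by_concept(elements):
    """Group (profile, name) pairs by the data category they link to."""
    by_concept = defaultdict(list)
    for profile, name, link in elements:
        if link is not None:
            by_concept[link].append((profile, name))
    return dict(by_concept)


# Keep only the data categories that more than one element refers to.
SHARED = {
    link: uses
    for link, uses in group_by_concept(ELEMENTS).items()
    if len(uses) > 1
}

for link, uses in SHARED.items():
    print(link, "is shared by", uses)
```

Matching on the link rather than on the element name is the design choice that lets each profile keep its own terminology.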
Data category references in CMDI
<CMD_Component name="HeadWordType">
  <CMD_Element name="HeadWordType" ConceptLink="http://www.isocat.org/datcat/DC-2486">
    <ValueScheme>
      <enumeration>
        <item ConceptLink="http://www.isocat.org/datcat/DC-286">Lemma</item>
        <item ConceptLink="http://www.isocat.org/datcat/DC-2948">Word form</item>
        <item ConceptLink="http://www.isocat.org/datcat/DC-350">Phrase</item>
        <item ConceptLink="http://www.isocat.org/datcat/DC-1386">Sentence</item>
        <item ConceptLink="http://www.isocat.org/datcat/DC-2599">Other</item>
        <item ConceptLink="http://www.isocat.org/datcat/DC-2592">Unspecified</item>
      </enumeration>
    </ValueScheme>
  </CMD_Element>
</CMD_Component>
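
As a rough sketch of how a tool might read these references, the Python below collects every ConceptLink from a shortened copy of the component above, using only the standard library. The function name concept_links and the choice of label (the name attribute for elements, the text for items) are assumptions made for this example, not part of CMDI or ISOcat.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the HeadWordType component (two of the six items).
CMD_COMPONENT = """
<CMD_Component name="HeadWordType">
  <CMD_Element name="HeadWordType" ConceptLink="http://www.isocat.org/datcat/DC-2486">
    <ValueScheme>
      <enumeration>
        <item ConceptLink="http://www.isocat.org/datcat/DC-286">Lemma</item>
        <item ConceptLink="http://www.isocat.org/datcat/DC-2948">Word form</item>
      </enumeration>
    </ValueScheme>
  </CMD_Element>
</CMD_Component>
"""


def concept_links(xml_text):
    """Return (tag, label, ConceptLink) for every node that carries a ConceptLink."""
    root = ET.fromstring(xml_text)
    links = []
    for node in root.iter():  # document order, root included
        link = node.get("ConceptLink")
        if link is None:
            continue
        # Elements are labelled by their name attribute, items by their text.
        label = node.get("name") or (node.text or "").strip()
        links.append((node.tag, label, link))
    return links


LINKS = concept_links(CMD_COMPONENT)
for tag, label, link in LINKS:
    print(f"{tag:12} {label:14} {link}")
```

In this sample the element links to a complex DC and each enumeration item to a simple DC, matching the mapping listed on the previous slide.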
A glimpse of ISOcat
Status of the metadata profile
  • An initial set of data categories has been created (and will never disappear)
      • Any additional data categories in the pipeline?
      • Your own components might need your own DCs
  • Translations for many EU languages have been added
      • Mostly green checks
      • Deterioration due to new but incomplete translations
  • ISO standardization
      • The TDG (Thematic Domain Group) ballot is ending this week
      • Implementation of the standardization process is ongoing
  • The addition of container DCs to be linked to CMD components is planned
Standardization
[Figure: standardization workflow. Diagram labels: Submission group, Stewardship group, Thematic Domain Group, Decision Group, Data Category Registry Board; Validation, Evaluation, rejected (twice), Publication.]
Metadata Thematic Domain Group (ballot)
  • Chair: Peter Wittenburg (MPI)
  • Members:
      • Thierry Declerck (DFKI)
      • Florian Schiel (Munich)
      • Erhard Hinrichs (Tübingen)
      • Iris Vogel (Tübingen)
      • Claude Martin (LIMIS)
      • Bertrand Gaiffe (ATILF)
      • Maria Gavrilidou (ILSP)
      • Elina Desipri (ILSP)
      • Daan Broeder (MPI)
      • Nelleke Oostdijk (Nijmegen)
      • Martin Wynne (Oxford)
      • Wim Peters (Oxford)
      • Helen Aristar-Dry (Michigan)
      • Thorsten Trippel (Tübingen)
  • Legend: proposed by chair; proposed by ISO member state; proposed by chair and ISO member state
  • (based on intermediate ballot results)
Can you trust the DCR?
  • Every data category, standardized or not, has a Persistent IDentifier (PID)
  • The Registration Authority of ISO 12620 is obliged to keep these PIDs resolvable
  • The DCR is a (core) component of many ISO TC 37 standards (TBX, TMF, LMF, LAF, …)
  • The DCR is also a (core) registry in the CLARIN infrastructure
  • ISOcat is in beta, i.e., still actively being developed and not yet feature complete; however, core functionality is stable for everyday usage and is backed up every day
  • We welcome any feedback! Contact us: isocat@mpi.nl
Can you trust a data category?
  • Some TDGs are actively cleaning up their legacy:
      • Metadata, Morphosyntax and Terminology
  • Also, at least 2 new TDGs are being established:
      • Translation and Sign language
  • Others are still asleep but will hopefully wake up after the TDG ballot ends
      • The DCR development team might interact with them as closely as we do with the currently active TDGs to speed them up
  • This means DCs are a bit in flux
      • Get in touch with the (private) owners (mediated mail is coming soon)
  • Once a DC is standardized, changes need to go through the standardization process, but old versions will always be available
Can you trust a private data category?
  • There are no standardized DCs in the registry yet
  • ISO standardization of a DC is an optional step
      • Standardized DCs have extra safeguards, but might be slow in adapting to changes in the environment
  • ISOcat uses a grassroots approach
      • Work together in (ad hoc) groups
      • Interact with owners of DCs (email, forum)
      • Become members of TDGs
  • There might be some versioning policy for privately owned public DCs
      • Freeze the English definition, i.e., a change of the semantics requires a new version (and the old version remains available)
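
The "freeze the English definition" idea can be illustrated with a small Python sketch. This is not how ISOcat is implemented; the class names, the placeholder identifier, and the use of plain text comparison as a stand-in for "the semantics changed" are all assumptions made for the example. Each version is immutable, a changed definition creates a new version, and earlier versions stay retrievable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DCVersion:
    """One immutable version of a data category (hypothetical model)."""
    version: int
    definition_en: str


class DataCategory:
    """Hypothetical privately owned DC whose English definition is frozen per version."""

    def __init__(self, pid, definition_en):
        self.pid = pid
        self.versions = [DCVersion(1, definition_en)]

    @property
    def current(self):
        return self.versions[-1]

    def update_definition(self, new_definition):
        # Assumption: any textual change counts as a change of semantics.
        if new_definition == self.current.definition_en:
            return self.current
        new_version = DCVersion(self.current.version + 1, new_definition)
        self.versions.append(new_version)
        return new_version

    def get(self, version):
        """Old versions remain available by their version number."""
        return self.versions[version - 1]


dc = DataCategory("DC-EXAMPLE", "Form of a word chosen to represent a lexical entry.")
dc.update_definition("Canonical form of a word chosen to represent a lexical entry.")
print(dc.current.version, dc.get(1).definition_en)
```

A real policy would need a better test for semantic change than string equality, for example an editorial decision by the DC owner.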
Upcoming
  • ISOcat forum
      • A forum per TDG/profile
  • Send a mediated email to another ISOcat user, e.g., the owner of a DC or the chair of a TDG
  • Standardization support
      • Submit a set of DCs to a TDG for standardization
      • Standardization workflow
      • Submit a change request for a standardized DC
  • Container DCs
  • DCIF import (by ISOcat system administration)
      • ISO 639 language codes
      • concepts from the GOLD ontology
      • …
  • Relation Registry
Thank you for your attention!
  • Visit www.isocat.org
  • Questions? isocat@mpi.nl
