http://aligned-project.eu COLD@ISWC, 18th October 2016, Kobe
Towards maintainable constraint validation
and repair for taxonomies
- The PoolParty approach
Monika Solanki
https://w3id.org/people/msolanki
@nimonika
University of Oxford
Joint work with
Christian Mader
Fraunhofer IAIS, Germany
PoolParty (SWC) Use case
PoolParty (PPT): a leading commercial taxonomy
management application and authoring tool for knowledge
graphs; provides taxonomy import functionality to
interact with third-party datasets.
Taxonomists using PPT integrate a variety of models,
schemata, ontologies and vocabularies into their
knowledge bases.
Challenge: combining varied data sources while ensuring that
the resulting data mashups conform at all times to a set of quality
heuristics, as expected by the data processing algorithms.
monika.solanki@cs.ox.ac.uk, @nimonika Constraint validation and repair for taxonomies
Motivation
Consuming and interlinking enterprise data and openly
available data within an industry setting.
Ensuring that the interlinked datasets conform to a set of
quality heuristics.
Interactively detecting and repairing datasets with
constraint violations.
Ensuring Data Consistency
Current: the checks ensuring that data persisted in the triple
store do not violate its consistency are scattered across the
code and sometimes performed multiple times.
Requirements
Provide a mechanism to specify data constraints in a
formal way.
Identify and analyse datasets that are imported into PPT
and are a source of constraint violations.
Constraint resolution
Current: the checks ensuring that data persisted in the triple
store do not violate its consistency are scattered across the
code and sometimes performed multiple times.
Requirements
Provide a validation mechanism to check for constraint
violation and evaluate this against the selected datasets.
Combine formal data constraint definitions with reusable
repair strategies that can be easily applied by end-users in
a (semi-) automatic way.
Dataset selection
SWC-generated: Datasets for which a conversion to a
PPT-compatible taxonomy has been performed by SWC
(containing 10 datasets),
Custom-generated: Datasets for which a conversion to a
PPT-compatible taxonomy has been performed by
third-party institutions (containing 9 datasets), and
Web: Datasets that use SKOS, but for which it is
currently unknown whether they are compatible with PPT
(containing 7 datasets).
Constraint specification
ConceptTypeAssertion (cta):
SELECT DISTINCT ?resource WHERE {
  ?resource skos:broader|skos:narrower ?otherRes.
  FILTER NOT EXISTS {?resource a skos:Concept}
}
HierarchicalConsistency (hc):
SELECT DISTINCT ?resource WHERE {
  ?resource a skos:Concept
  FILTER NOT EXISTS {
    ?resource (skos:broader|^skos:narrower)*/skos:topConceptOf ?parent.
    ?parent a skos:ConceptScheme.
  }
}
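To make the intent of the two queries concrete, here is a minimal Python sketch that evaluates the same conditions over an in-memory set of triples. It is an illustration only, not the PoolParty implementation: triples are plain string tuples, the `ex:` resources are invented sample data, and the function names `cta_violations` and `hc_violations` are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples; the prefixed names
# are ordinary strings used for illustration, not resolved IRIs.
BROADER, NARROWER = "skos:broader", "skos:narrower"
TOP_CONCEPT_OF = "skos:topConceptOf"
TYPE, CONCEPT, SCHEME = "rdf:type", "skos:Concept", "skos:ConceptScheme"


def cta_violations(triples):
    """Subjects of skos:broader/skos:narrower that are not typed skos:Concept."""
    concepts = {s for s, p, o in triples if p == TYPE and o == CONCEPT}
    subjects = {s for s, p, o in triples if p in (BROADER, NARROWER)}
    return subjects - concepts


def hc_violations(triples):
    """Concepts with no (skos:broader|^skos:narrower)* path to a top concept."""
    concepts = {s for s, p, o in triples if p == TYPE and o == CONCEPT}
    schemes = {s for s, p, o in triples if p == TYPE and o == SCHEME}
    # Resources that are skos:topConceptOf a typed concept scheme.
    anchored = {s for s, p, o in triples if p == TOP_CONCEPT_OF and o in schemes}
    # Upward edges child -> parent, from skos:broader and inverted skos:narrower.
    parents = {}
    for s, p, o in triples:
        if p == BROADER:
            parents.setdefault(s, set()).add(o)
        elif p == NARROWER:
            parents.setdefault(o, set()).add(s)

    def reaches_top(start):
        # Depth-first walk upwards; the zero-length path covers top concepts.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in anchored:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    return {c for c in concepts if not reaches_top(c)}


# Invented sample data: one untyped resource and one orphaned concept.
data = {
    ("ex:scheme", TYPE, SCHEME),
    ("ex:animal", TYPE, CONCEPT),
    ("ex:animal", TOP_CONCEPT_OF, "ex:scheme"),
    ("ex:dog", TYPE, CONCEPT),
    ("ex:dog", BROADER, "ex:animal"),
    ("ex:orphan", TYPE, CONCEPT),
    ("ex:untyped", BROADER, "ex:animal"),
}
print(cta_violations(data))  # {'ex:untyped'}
print(hc_violations(data))   # {'ex:orphan'}
```

As in the SPARQL versions, each function returns the violating resources, so an empty result means the constraint holds.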
Validation using SHACL
HierarchicalConsistency (hc):
ppts:ConceptShape
  a sh:Shape;
  sh:scopeClass skos:Concept;
  sh:property [
    a sh:PropertyConstraint;
    sh:predicate skos:prefLabel;
    sh:minCount 1;
    sh:minLength 1;
    sh:datatype rdf:langString;
    sh:uniqueLang true];
  sh:constraint [
    a sh:Constraint;
    a sh:OrConstraint;
    sh:shapes (ppts:ConceptHasBroaderShape ppts:ConceptIsTopConceptShape)].
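The property constraint on skos:prefLabel in this shape can be read as four separate checks. The following Python sketch spells them out for the labels of a single concept. It is a simplified illustration rather than a SHACL engine: labels are modelled as hypothetical `(text, lang)` pairs, with `lang` set to `None` for literals that carry no language tag, and language tags are compared case-insensitively as an assumption.

```python
from collections import Counter


def pref_label_violations(labels):
    """Return the names of the violated checks for one concept's prefLabels.

    labels: list of (text, lang) pairs; lang is None for untagged literals.
    """
    problems = []
    if len(labels) < 1:
        problems.append("minCount")    # at least one skos:prefLabel
    if any(len(text) < 1 for text, _ in labels):
        problems.append("minLength")   # no empty label strings
    if any(lang is None for _, lang in labels):
        problems.append("datatype")    # every label must be language-tagged
    per_lang = Counter(lang.lower() for _, lang in labels if lang is not None)
    if any(count > 1 for count in per_lang.values()):
        problems.append("uniqueLang")  # at most one label per language
    return problems
```

For example, `pref_label_violations([("Dog", "en"), ("Hound", "en")])` reports `["uniqueLang"]`, while a concept labelled once in English and once in German passes all four checks.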
Repair strategies
AddInverseStrategy
ppts:ConceptHavingBroader
  a sh:Shape;
  sh:scope [
    a sh:Scope;
    a sh:PropertyScope ;
    sh:predicate skos:broader];
  sh:inverseProperty [
    a sh:InversePropertyConstraint;
    sh:predicate skos:narrower;
    sh:minCount 1;
    rs:strategy [
      a rs:AddInverseStrategy]].
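One plausible reading of AddInverseStrategy is: for every skos:broader statement that lacks the matching skos:narrower statement in the opposite direction, propose that missing triple. The Python sketch below follows this reading. It is an assumption about the strategy's behaviour, not the authors' code, and the function name `add_inverse_changeset` and the `ex:` data are hypothetical.

```python
def add_inverse_changeset(triples, predicate="skos:broader", inverse="skos:narrower"):
    """Return the inverse triples that are missing from the dataset.

    For each (s, predicate, o) the expected inverse is (o, inverse, s);
    only inverses not already present are returned, so the result can be
    added to the dataset as a changeset.
    """
    existing = set(triples)
    expected = {(o, inverse, s) for s, p, o in existing if p == predicate}
    return expected - existing


dataset = {
    ("ex:dog", "skos:broader", "ex:animal"),
    ("ex:cat", "skos:broader", "ex:animal"),
    ("ex:animal", "skos:narrower", "ex:cat"),  # inverse already present
}
changeset = add_inverse_changeset(dataset)
print(changeset)  # {('ex:animal', 'skos:narrower', 'ex:dog')}
```

Applying the changeset and running the function again yields an empty set, which is the property one would want from a repair: it removes the violation without introducing new ones of the same kind.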
Implementation
SHACL implementation (TopQuadrant), Sesame, SWC
libraries ⇒ Java application
SKOS data model, Dataset file, Constraint specification ⇒
Violation report
Violation report, SKOS data model, Dataset file, Constraint
specification ⇒ Triples changeset
Not yet optimised for runtime performance.
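The two-stage data flow listed above (dataset and constraints produce a violation report; the report and constraints produce a triples changeset) can be sketched in a few lines of Python. Everything here is hypothetical scaffolding for illustration: the `Constraint` class, the functions `validate` and `plan_repairs`, and the sample constraint are not part of the TopQuadrant, Sesame or SWC libraries, and the sample constraint is only loosely modelled on the bidirectional-relations check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Constraint:
    name: str
    check: Callable[[set], set]                      # dataset -> violating triples
    repair: Optional[Callable[[tuple], set]] = None  # violation -> triples to add


def validate(dataset, constraints):
    """Stage 1: dataset + constraint specification => violation report."""
    return {c.name: c.check(dataset) for c in constraints}


def plan_repairs(dataset, constraints, report):
    """Stage 2: violation report + constraints => changeset of triples to add."""
    changeset = set()
    for c in constraints:
        if c.repair is not None:
            for violation in report.get(c.name, ()):
                changeset |= c.repair(violation)
    return changeset - dataset


def missing_narrower(dataset):
    # A skos:broader statement violates the check if its inverse is absent.
    return {(s, p, o) for s, p, o in dataset
            if p == "skos:broader" and (o, "skos:narrower", s) not in dataset}


br = Constraint("br", missing_narrower,
                repair=lambda t: {(t[2], "skos:narrower", t[0])})

dataset = {("ex:dog", "skos:broader", "ex:animal")}
report = validate(dataset, [br])
changeset = plan_repairs(dataset, [br], report)
```

Keeping the repair function on the same object as the check mirrors the idea of attaching a strategy directly to the constraint definition, so that validation and repair are maintained in one place.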
Validation results
cta was never violated in datasets converted to PPT
taxonomies.
upl is a SKOS-level constraint, better respected by
vocabulary providers.
Violations observed across all datasets.
Validation performance
Omitted 10 datasets that contained ≤ 50000 triples.
No correlation between the dataset size and time taken to
perform the validation.
Structure of the dataset makes a difference.
Repair strategy execution performance
Repair strategy applied to a special case of the constraint
br (BidirectionalRelationsHierarchical).
Only considered skos:broader and
skos:narrower. Did not consider owl:inverseOf.
Repair scales well even with larger datasets.
Summary and Conclusions
Interwoven SHACL-based data consistency specification
and validation with repair strategies.
Validation of datasets generated by PPT can be done with
reasonable performance.
Integrating repair strategies and data constraint
specification helps in building a unified, maintainable
model.
The model also plays a pivotal role in harmonizing data
and software development processes.
