Validata: A tool for testing profile conformance
Alasdair J G Gray
Heriot-Watt University
www.macs.hw.ac.uk/~ajg33
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Andrew Beveridge
Jacob Baungard Hansen
Johnny Val
Leif Gehrmann
Roisin Farmer
Sunil Khutan
Tomas Robertson
HCLS Dataset Descriptions
https://www.w3.org/TR/hcls-dataset/
Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
1 December 2016
Requirements
• Online tool
– Deployable on W3C server
– GUI
– API
• Support multiple constraints
– Properties
– Data values
– …
• Requirement levels
– Different levels of user messages: Error, Warning, Information
• Configurable
– HCLS (Required)
– DCAT, Open PHACTS, etc. (Optional)
Example Constraint
• Shape
• A Dataset
– MUST be declared to be of type dctypes:Dataset
– MUST have a dcterms:title as a language-typed string
– MUST NOT have a dcterms:created date (dates are associated with versions in HCLS)
[Diagram: <Dataset> graph pattern with an rdf:langString value, a wildcard (.) value, and a ✗ on the prohibited property]
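The three constraints on a Dataset can be read as simple checks over a set of triples. The following is a minimal sketch under stated assumptions, not how Validata itself works: the triple layout ({ s, p, o }), the function name checkDataset, the node name ":d" and the sample title are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical in-memory triples: { s, p, o }, where o is either a prefixed
// name (string) or a literal object such as { value, lang }.
function checkDataset(triples, node) {
  const props = triples.filter((t) => t.s === node);
  const errors = [];

  // MUST be declared to be of type dctypes:Dataset
  if (!props.some((t) => t.p === "rdf:type" && t.o === "dctypes:Dataset")) {
    errors.push("missing rdf:type dctypes:Dataset");
  }

  // MUST have a dcterms:title as a language-typed string
  const hasLangTitle = props.some(
    (t) =>
      t.p === "dcterms:title" &&
      typeof t.o === "object" &&
      t.o !== null &&
      Boolean(t.o.lang)
  );
  if (!hasLangTitle) {
    errors.push("missing language-typed dcterms:title");
  }

  // MUST NOT have a dcterms:created date
  if (props.some((t) => t.p === "dcterms:created")) {
    errors.push("dcterms:created is not allowed on a Dataset");
  }
  return errors;
}

// Illustrative data only
const good = [
  { s: ":d", p: "rdf:type", o: "dctypes:Dataset" },
  { s: ":d", p: "dcterms:title", o: { value: "Example dataset", lang: "en" } },
];
const bad = good.concat([
  { s: ":d", p: "dcterms:created", o: { value: "2016-12-01" } },
]);

console.log(checkDataset(good, ":d")); // no errors
console.log(checkDataset(bad, ":d")); // one error, for dcterms:created
```

Hand-written checks like these do not scale to a profile with many properties, which is the motivation for expressing the constraints declaratively as a shape.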
Example Validation
• Shape
• Data
Shape Expressions (ShEx)
Shape
<Dataset> {
  rdf:type (dctypes:Dataset),
  dct:title rdf:langString,
  dct:alternative rdf:langString+,
  !dct:created .
}
ShEx: Validation
<Dataset> {
  rdf:type (dctypes:Dataset),
  dct:title rdf:langString,
  dct:alternative rdf:langString+,
  !dct:created .
}
Example data
Validator can’t warn of a missing property
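Stepping example data through the <Dataset> shape can be sketched as a loop over its triple constraints. This is a toy interpreter under stated assumptions: it is not the ShEx-validator API, it covers only a small part of ShEx semantics, and the names datasetShape, validate and exampleData are hypothetical. It shows why a plain shape yields only pass or fail.

```javascript
// Toy interpreter for the <Dataset> shape. Illustrative only: this is not the
// ShEx-validator API, and it ignores most of real ShEx semantics.
const isLangString = (o) =>
  typeof o === "object" && o !== null && typeof o.lang === "string" && o.lang !== "";

// One entry per triple constraint. min/max mirror the notation on the slide:
// no suffix = exactly one, "+" = one or more, "!" = must not occur (max 0).
const datasetShape = [
  { predicate: "rdf:type", test: (o) => o === "dctypes:Dataset", min: 1, max: 1 },
  { predicate: "dct:title", test: isLangString, min: 1, max: 1 },
  { predicate: "dct:alternative", test: isLangString, min: 1, max: Infinity },
  { predicate: "dct:created", test: () => true, min: 0, max: 0 },
];

function validate(shape, triples, node) {
  const failures = [];
  for (const rule of shape) {
    // Collect the values this node has for the rule's predicate
    const values = triples
      .filter((t) => t.s === node && t.p === rule.predicate)
      .map((t) => t.o);
    if (!values.every(rule.test)) {
      failures.push(`${rule.predicate}: wrong value type`);
    } else if (values.length < rule.min) {
      failures.push(`${rule.predicate}: missing`);
    } else if (values.length > rule.max) {
      failures.push(`${rule.predicate}: ${rule.max === 0 ? "not allowed" : "too many"}`);
    }
  }
  return { passed: failures.length === 0, failures };
}

// Illustrative data: typed and titled, but with no dct:alternative
const exampleData = [
  { s: ":d", p: "rdf:type", o: "dctypes:Dataset" },
  { s: ":d", p: "dct:title", o: { value: "Example dataset", lang: "en" } },
];

console.log(validate(datasetShape, exampleData, ":d"));
```

Every unmet constraint is a failure of equal weight here, so the missing dct:alternative fails the whole description; there is no way to report it as something milder.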
Requirement Levels
Shape
<Dataset> {
  `MUST` rdf:type (dctypes:Dataset),
  `MUST` dct:title rdf:langString,
  `MAY` dct:alternative rdf:langString+,
  `MUST` !dct:created .
}
Validator can warn of a missing property
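With a requirement level attached to each constraint, a validator can grade its messages instead of simply failing. The sketch below assumes a mapping of MUST to Error, SHOULD to Warning and MAY to Information; that mapping, and the names SEVERITY, levelledShape and report, are assumptions made for illustration and are not taken from Validata's source. Value-type checks are omitted to keep the example short.

```javascript
// Assumed mapping from requirement level to user message level
// (hypothetical; not taken from Validata's source).
const SEVERITY = { MUST: "error", SHOULD: "warning", MAY: "information" };

// The <Dataset> shape with a requirement level on each constraint.
// min/max: no suffix = exactly one, "+" = one or more, "!" = must not occur.
const levelledShape = [
  { level: "MUST", predicate: "rdf:type", min: 1, max: 1 },
  { level: "MUST", predicate: "dct:title", min: 1, max: 1 },
  { level: "MAY", predicate: "dct:alternative", min: 1, max: Infinity },
  { level: "MUST", predicate: "dct:created", min: 0, max: 0 },
];

function report(shape, triples, node) {
  const messages = [];
  for (const rule of shape) {
    // Count how often the node uses this predicate (value checks omitted)
    const count = triples.filter((t) => t.s === node && t.p === rule.predicate).length;
    const severity = SEVERITY[rule.level];
    if (count < rule.min) {
      messages.push({ severity, text: `${rule.predicate} is missing` });
    } else if (count > rule.max) {
      messages.push({ severity, text: `${rule.predicate} occurs too often` });
    }
  }
  return messages;
}

// Illustrative data: typed and titled, but with no dct:alternative
const described = [
  { s: ":d", p: "rdf:type", o: "dctypes:Dataset" },
  { s: ":d", p: "dct:title", o: { value: "Example dataset", lang: "en" } },
];

console.log(report(levelledShape, described, ":d"));
```

The same missing dct:alternative that would fail a plain shape is now reported at the information level, while a missing title or a forbidden dct:created still produces an error.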
Implementation
Validata
• Web app front end
• JavaScript + HTML
• Relies on ShEx-validator
– Validates documents
– Returns report
https://github.com/HW-SWeL/Validata
ShEx-validator
• Validation system
• Validation API
• JavaScript
– Node.js engine
• Reuses
– n3: RDF library
– ShExParser
https://github.com/HW-SWeL/ShEx-validator
VALIDATA DEMO
http://hw-swel.github.io/Validata/
Validata
https://github.com/HW-SWeL/Validata
• RDF constraint validation tool
– Configurable to any profile
• Shape Expression (ShEx) constraints
• Open source JavaScript implementation
www.macs.hw.ac.uk/~ajg33/
A.J.G.Gray@hw.ac.uk
@gray_alasdair



Editor's Notes

  • #3: Motivation: how do we check that descriptions conform? Summary level: time-invariant information, e.g. name, description, publisher. Version level: version-specific information, e.g. version number, creator. Distribution level: file-specific information, e.g. file location and format, number of triples. 18 vocabularies: DCTerms, DCAT, VoID, FOAF, … 61 prescribed properties: MUST, SHOULD, MAY, MUST NOT for each level.
  • #4: Link into the data publishing pipeline via the API. Not tied to HCLS, which is only a motivation. No existing tool meets these needs.
  • #5: Constraints form a graph pattern that the data must comply with.
  • #6–#8: How do we validate that our example data conforms to a certain shape? Express the expected shape as ShEx. This is a toy example; what about real data?
  • #9: ShEx: concise, regex-based notation. W3C SHACL was not stable when the work was done. ShEx is an implementation of SHACL with extra features.
  • #10: Step through the validation process.
  • #11: Extended ShEx to allow arbitrary hierarchies. This is a toy example; what about real data?
  • #12: ShEx-validator has other dependencies too. Minimist: argument parser. Promise: callbacks. Pegjs: parser generator. Mocha: test-driven development.