Date: 09/06/2014
User Requirements for Geospatial Provenance
Daniel Garijo, Andreas Harth, Yolanda Gil
Ontology Engineering Group. Universidad Politécnica de Madrid
Information Sciences Institute, University of Southern California
Institute AIFB, Karlsruhe Institute of Technology
Problem statement
Maps can integrate many different sources
•OpenStreetMap
•GeoNames
•CIA World Factbook
•Etc.
Interaction to standardize
Outline
1. Challenges
2. Assumptions
3. Types of provenance in the geospatial domain
   1. Provenance of datasets and sets of datasets
   2. Provenance of objects and sets of objects
   3. Provenance of properties and sets of properties
   4. Other requirements related to provenance
4. Modeling geospatial provenance with PROV-O
   1. Dataset level provenance
      • Updating a map
   2. Object level provenance
   3. Property level provenance
5. Summary
6. Conclusions and Future work
Challenges concerning provenance
•Versioning and provenance (map updates)
•Trust-based provenance
•Data integration and provenance
•Crowdsourcing and provenance
•Granularity and provenance
•Aggregation and provenance
Assumptions
Simplifying the problem…
•The entities across datasets have been mapped.
•The datasets share the same data model and vocabulary.
•Each dataset contains objects with unique identifiers.
•The integrated map is going to be presented to a user who is interested in using the information for some purpose.
Types of provenance: Provenance of Datasets and sets of Datasets
Provenance of a map…
•Sources used to create the map
•Creator of the map
•Creation process used (algorithms, etc.)
•Recent changes of the map
•Reason why the map has been updated
Browsing different versions of a map…
•Most recent maps
•Maps from an organization
•Maps created from a version of a dataset or algorithm
[Figure labels: OSM, FAO, GADM; Integration (June); Map release (June)]
Types of provenance: Provenance of Objects and sets of Objects
Objects: lower granularity entities in the map
•Original data source of the object
•Organizations responsible for the creation of the object
•Date of creation of the object
•Date of insertion of the object in the map
•Process of inclusion in the dataset
Provenance of collections of objects…
•Source of the objects of a region/area
•Objects from a specific organization
•Objects belonging to a type of source (e.g., crowdsourced map)
•Objects introduced in the last version of the map
[Figure labels: A, B, C; bridge, stadium, intersection]
Types of provenance: Provenance of Properties and sets of Properties
Properties: attributes of objects in a map
•Sources of the property
•Creator of the property
•Date of the creation/update of the property
•Process by which the property was added
Provenance of sets of properties…
•Properties of objects coming from one data source
•Properties of objects belonging to a crowdsourced map
•Properties of the selected objects that have the same source
[Figure labels: Source A, Source B; Height: 20 m; Length: 1 km; Name: 405 Fwy overpass]
Other requirements related to provenance
Other requirements might not be straightforward to answer…
•How did a set of manual corrections help to improve the map?
•What is new in this map?
•What objects are integrated with a high confidence?
•Why is an object not appearing?
•General highlights of the map
…but they can be addressed with provenance records
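As a minimal sketch of how questions such as "What is new in this map?" or "What objects are integrated with a high confidence?" could be answered from per-object provenance records, the Python below filters a small in-memory list. The record fields (`inserted_in_version`, `confidence`), the threshold, and all sample values are assumptions made for illustration; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObjectRecord:
    """Hypothetical per-object provenance record (field names are assumptions)."""
    object_id: str
    source: str
    inserted_in_version: int  # map version in which the object first appeared
    confidence: float         # confidence reported by the integration algorithm


def new_in_version(records: List[ObjectRecord], version: int) -> List[str]:
    """Objects that were introduced in the given map version."""
    return sorted(r.object_id for r in records if r.inserted_in_version == version)


def high_confidence(records: List[ObjectRecord], threshold: float = 0.9) -> List[str]:
    """Objects whose integration confidence meets the (assumed) threshold."""
    return sorted(r.object_id for r in records if r.confidence >= threshold)


# Made-up sample data, for illustration only.
records = [
    ObjectRecord("bridge", "Source A", 1, 0.95),
    ObjectRecord("stadium", "Source B", 1, 0.60),
    ObjectRecord("intersection", "Source A", 2, 0.92),
]

print(new_in_version(records, 2))
print(high_confidence(records))
```

Questions such as "Why is an object not appearing?" would need additional records (for example, objects rejected during integration), which this sketch does not model.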
Modeling provenance in the geospatial domain: PROV-O extension
Simple PROV-O extension to model the dataset level
Dataset Level Provenance: Example
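The slide's diagram is not recoverable from the text, so the following is only a rough sketch of what dataset-level provenance could look like when written as plain (subject, predicate, object) triples. The `prov:` terms are standard PROV-O properties; every `ex:` identifier (the map, the sources, the integration activity and the agent) is a hypothetical name chosen for illustration, and the authors' own PROV-O extension may model this differently.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def describe_integration(output_map: str, sources: List[str],
                         activity: str, agent: str) -> List[Triple]:
    """Build triples saying that `output_map` was generated by `activity`,
    which used every source and was carried out by `agent`."""
    triples: List[Triple] = [
        (output_map, "rdf:type", "prov:Entity"),
        (activity, "rdf:type", "prov:Activity"),
        (output_map, "prov:wasGeneratedBy", activity),
        (activity, "prov:wasAssociatedWith", agent),
    ]
    for source in sources:
        triples.append((activity, "prov:used", source))
        triples.append((output_map, "prov:wasDerivedFrom", source))
    return triples


def sources_of(map_id: str, triples: List[Triple]) -> List[str]:
    """Answer 'which sources were used to create this map?'."""
    return sorted(o for s, p, o in triples
                  if s == map_id and p == "prov:wasDerivedFrom")


# Hypothetical identifiers; only the prov: terms are real vocabulary.
june_map = describe_integration(
    "ex:map_june",
    ["ex:osm_june", "ex:gm_june"],
    "ex:integration_june",
    "ex:integration_algorithm",
)

print(sources_of("ex:map_june", june_map))
```

Keeping the activity as its own node is what lets the same record answer both "which sources?" and "which process?" questions.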
Dataset integration approaches
There are different alternatives for updating a map
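A minimal sketch of three such alternatives (regenerating the map from scratch, building the new version on top of the previous one, and producing only a delta) is given below. It assumes a map is a dictionary from object identifier to attributes and that later sources override earlier ones on conflict; both assumptions, and all sample values, are illustrative only.

```python
from typing import Dict, List

MapVersion = Dict[str, dict]  # object id -> attributes (assumed representation)


def regenerate(sources: List[MapVersion]) -> MapVersion:
    """Alternative 1: build the new version from the sources alone."""
    merged: MapVersion = {}
    for source in sources:
        merged.update(source)  # assumption: later sources win conflicts
    return merged


def update_from_previous(previous: MapVersion,
                         sources: List[MapVersion]) -> MapVersion:
    """Alternative 2: start from the previous version, then apply the sources."""
    merged = dict(previous)
    for source in sources:
        merged.update(source)
    return merged


def delta(previous: MapVersion, sources: List[MapVersion]) -> dict:
    """Alternative 3: emit only the changes relative to the previous version."""
    current = regenerate(sources)
    return {
        "added": {k: v for k, v in current.items() if k not in previous},
        "changed": {k: v for k, v in current.items()
                    if k in previous and previous[k] != v},
        "removed": sorted(k for k in previous if k not in current),
    }


# Made-up sample data, for illustration only.
previous = {"bridge": {"height_m": 20}, "stadium": {"name": "Old name"}}
osm = {"bridge": {"height_m": 21}, "intersection": {"roads": 2}}

print(regenerate([osm]))
print(update_from_previous(previous, [osm]))
print(delta(previous, [osm]))
```

The choice affects what the provenance record has to say: the second alternative makes the new map a revision of the old one, while the third needs a record for the delta itself.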
Object level provenance: scalability
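One way to limit the number of provenance statements is to group objects that share the same provenance into a collection and describe the collection once. The sketch below illustrates that idea with `prov:Collection` and `prov:hadMember`, which are standard PROV-O terms; the `ex:` identifiers, the grouping key (the set of sources) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


def aggregate_provenance(object_sources: Dict[str, FrozenSet[str]]) -> List[Triple]:
    """Group objects by their set of sources and attach the provenance
    to one collection per group instead of to every object."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for obj, sources in object_sources.items():
        groups[sources].append(obj)

    triples: List[Triple] = []
    # Sort groups so that collection identifiers are deterministic.
    ordered = sorted(groups.items(), key=lambda item: sorted(item[0]))
    for index, (sources, members) in enumerate(ordered):
        collection = f"ex:collection{index}"  # hypothetical identifier scheme
        triples.append((collection, "rdf:type", "prov:Collection"))
        for member in sorted(members):
            triples.append((collection, "prov:hadMember", member))
        for source in sorted(sources):
            triples.append((collection, "prov:wasDerivedFrom", source))
    return triples


# Made-up sample data, for illustration only.
objects = {
    "ex:bridge": frozenset({"ex:sourceA"}),
    "ex:stadium": frozenset({"ex:sourceA"}),
    "ex:intersection": frozenset({"ex:sourceB"}),
}

for triple in aggregate_provenance(objects):
    print(triple)
```

The saving grows with the richness of the shared provenance: the derivation (and any creator, date or process statements) is written once per collection rather than once per object.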
Property level provenance
Asserted properties do not have URIs!
•New entities for describing their provenance
[Figure: the statements :Bridge :height 20m, :Bridge :length 1 km and :Bridge :name “405 Fwy overpass” are described by the new entities :metadata1 and :metadata2, with prov:wasDerivedFrom links to Source A and Source B]
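As a hedged illustration of creating a new entity per asserted property, the sketch below uses RDF reification terms (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) to point a metadata node at a statement and then attaches `prov:wasDerivedFrom` to that node. Reification is only one possible choice (the authors may have used annotations or bundles instead), and the assignment of each property to a source is an assumption, since the figure does not survive in the text.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def annotate_statement(statement: Triple, metadata_id: str,
                       source: str) -> List[Triple]:
    """Create a metadata node that identifies `statement` and records its source."""
    subject, predicate, value = statement
    return [
        (metadata_id, "rdf:type", "rdf:Statement"),
        (metadata_id, "rdf:subject", subject),
        (metadata_id, "rdf:predicate", predicate),
        (metadata_id, "rdf:object", value),
        (metadata_id, "prov:wasDerivedFrom", source),
    ]


def sources_of_property(subject: str, predicate: str,
                        triples: List[Triple]) -> List[str]:
    """Find the sources recorded for a given (subject, predicate) pair."""
    about_subject = {s for s, p, o in triples if p == "rdf:subject" and o == subject}
    about_predicate = {s for s, p, o in triples
                       if p == "rdf:predicate" and o == predicate}
    nodes = about_subject & about_predicate
    return sorted(o for s, p, o in triples
                  if s in nodes and p == "prov:wasDerivedFrom")


# Assumption: height comes from Source A and name from Source B.
graph: List[Triple] = []
graph += annotate_statement((":Bridge", ":height", "20m"), ":metadata1", ":SourceA")
graph += annotate_statement((":Bridge", ":name", "405 Fwy overpass"),
                            ":metadata2", ":SourceB")

print(sources_of_property(":Bridge", ":height", graph))
print(sources_of_property(":Bridge", ":name", graph))
```

The cost is several extra triples per annotated property, which is why property-level provenance raises the same scalability concerns as object-level provenance.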
Conclusions
Requirements and major challenges for geospatial provenance
4 main categories:
•Provenance of datasets
•Provenance of objects appearing in the map
•Provenance of properties
•Other
Analogous questions are relevant for dataset/object/property provenance in non-geospatial domains.

Editor's Notes

  • #3: This presentation is a summary of the OWS-9 and OWS-10 discussions (in the context of the OGC). Maps integrate information from many resources. Normally the data integration process is automatic, although it may include some manual steps (data curation, etc.). Each source may have its own properties, geometries, data, etc., but when the map is presented to a user only one value is shown for each item. Maps can be updated (e.g., when a new road is built), and we need to track the provenance of the information to check its authenticity. This work summarizes the discussions with researchers and practitioners at several meetings and workshops on geospatial data. This effort is also of great importance for the community, as there is an ongoing effort to standardize how to link entities in geospatial data (OGC and W3C).
  • #4: Given the previous problem, this presentation covers the challenges derived from it, a set of assumptions that simplify the integration scenario, the types of provenance found in it, how to model them with PROV, and the conclusions and future work.
  • #5: Trust-based provenance: if a map is created from many datasets, we need to know whether each dataset is trusted. Data integration and provenance: knowing which data came from each dataset can be very relevant to understanding why a map is the way it is. Crowdsourcing and provenance: some datasets, like OSM, depend on data provided by users, so knowing who contributed what is key to assessing quality. Granularity and provenance: different datasets provide different levels of granularity; a geographical feature can be a point, a line or a 3D area. Aggregation and provenance: maps are aggregations of features from other sources. Versioning and provenance: map updates.
  • #6: Given the heterogeneity of the data, in this first approach to the problem we decided to simplify it. In a nutshell, we assume that the datasets use the same model and that the entities across different datasets have been mapped. This is unrealistic, as such a mapping takes great effort. However, the W3C and OGC are already discussing how to align existing approaches into a standard. We make these assumptions to be able to tackle and describe the main provenance challenges in this scenario.
  • #7: Next I’ll talk about the types of provenance that we can find in the geospatial domain.
  • #8: Types of provenance: provenance of datasets. This is the most typical one, as it aims to describe the main features of a map: which sources were used, which process led to its creation, what changes were made to the map, etc. A map may have been updated, and different versions might be available. Therefore we are also interested in browsing the provenance of sets of maps.
  • #9: Drilling down in granularity: maps are made of objects, and these objects may have their own provenance as well. You could ask where the object comes from, which organizations are responsible for its appearance in the map, the date when the object was inserted, etc. As with maps, we may also be interested in annotating sets of objects (when they all share the same annotations) instead of annotating them individually.
  • #10: An object can have properties that have been integrated from different sources. The questions related to them are analogous to those that we could ask about an object.
  • #11: Other requirements are not that easy to answer (not directly with a SPARQL query), but they can benefit from the previous types of provenance. For example, to answer how a set of corrections helped to improve a map, we can show the previous map and gradually introduce the changes, thus showing how the map is completed. We could answer the second question by retrieving the objects introduced in the newer version of the map, we could retrieve the objects with high confidence by modeling extra metadata from the algorithm, etc.
  • #12: Now that we have introduced the main requirements, how do we tackle them with PROV?
  • #13: First we need to introduce some basic extensions to PROV. These are very basic extensions and additional ones could be necessary to deal with the different levels of granularity. This is a work in progress and we still haven’t published the vocabulary extensions. We wanted to distinguish crowdsourced maps from integrated maps, as the former will be the inputs and the latter the outputs of the map integration processes. Other entities are the additional datasets consulted by the algorithm responsible for the integration of the map. We were going to introduce roles as well, but in the end decided to cut them out for simplicity.
  • #14: This is an example of the integration of a map created from two different maps (GM and OSM). Explain the example briefly.
  • #15: There are three alternative approaches to creating new versions of the map: the new version is generated anew, the new version is generated taking the previous version into account, or only the delta of the changes is generated. We assumed the second one in the previous example, although each approach is possible.
  • #16: This figure shows an example of several ways to store object level provenance. Maps can be big, and storing the provenance of every object might bring scalability issues.
      • Recording partial provenance: only particular aspects of provenance could be stored. For example, the only provenance assertions for an object could be references to the identifiers of the original objects.
      • Recording provenance selectively: during the integration process, specific decisions would be made as to which objects warrant a detailed provenance record and which ones do not. For example, if an object was created with low confidence, then detailed provenance would be recorded.
      • Aggregating provenance of objects: objects with equivalent provenance could be grouped into collections, and the provenance would be attached to the collections.
      • Storing provenance separately: provenance can be stored separately from the map itself. Several provenance services could be set up for the same map.
  • #17: The problem with modeling the provenance of properties is that asserted properties do not have an identifier. Therefore we need to create a new entity (annotation, bundle, etc.) that will contain the provenance for each of them. Explain the example with the bridge.
  • #18: This is a summary of all the previous requirements, which is the main contribution. Briefly discuss the differences between the sections and summarize each one. Another contribution is the PROV extension.