Geographica: A Benchmark for Geospatial RDF Stores
George Garbis, Kostis Kyzirakos, Manolis Koubarakis
Dept. of Informatics and Telecommunications,
National and Kapodistrian University of Athens, Greece

12th International Semantic Web Conference
(Evaluation Track)
Outline
• Motivation
• The benchmark Geographica
• Real-world workload
• Synthetic workload
• Evaluating the performance of geospatial RDF stores using Geographica
• Conclusions
23/10/2013

Motivation
• Lots of geospatial data is available on the Web today.
• Lots of geospatial data is quickly being transformed into linked geospatial data!
• People have started building applications using such data.
• Geospatial extensions of SPARQL (e.g., GeoSPARQL and stSPARQL) have recently been developed.
• RDF stores provide support for GeoSPARQL (e.g., Strabon, Oracle 12c, uSeekM, Parliament) or provide limited geospatial functionality (e.g., Virtuoso, BigOWLIM, AllegroGraph).
The Benchmark Geographica
• Aim: measure the performance of today's geospatial RDF stores
• Organized around two workloads:
  • Real-world workload: based on existing linked geospatial datasets and known application scenarios
  • Synthetic workload: measures performance in a controlled environment where we can vary the selectivity of queries
• The name: Γεωγραφικά (Geographica) is a 17-volume geographical encyclopedia by Στράβων (Strabo), AD 17

Real-World Workload
Datasets
• Datasets: real-world datasets for the geographic area of Greece that play an important role in the LOD cloud or have complex geometries
  • LinkedGeoData (LGD) for rivers and main roads in Greece
  • GeoNames for Greece
  • DBpedia for Greece
  • Greek Administrative Geography (GAG)
  • CORINE land cover (CLC) for Greece
  • Hotspots
Real-World Workload
Datasets

Dataset    Size    # of Triples   # of Points   # of Lines (max/min/avg points/line)   # of Polygons (max/min/avg points/polygon)
GeoNames   45MB    400K           22K           -                                      -
DBpedia    89MB    430K           8K            -                                      -
LGD        29MB    150K           -             12K (1.6K/2/21)                        -
GAG        33MB    4K             -             -                                      325 (15K/4/400)
CLC        401MB   630K           -             -                                      45K (5K/4/140)
Hotspots   90MB    450K           -             -                                      37K (4/4/4)

Real-World Workload
Parts
• For this workload, Geographica has two parts (following Jackpine):
  • Micro part: tests primitive spatial functions offered by geospatial RDF stores
  • Macro part: simulates some typical application scenarios
Real-World Workload
Micro part
• 29 SPARQL queries, each consisting of one or two triple patterns and a spatial function.
• Functions included:
  • Non-topological: boundary, envelope, convex hull, buffer, area
  • Topological: equals, intersects, overlaps, crosses, within, distance, disjoint
  • Spatial aggregates: extent, union
• These functions are used for spatial selections and spatial joins.

Example – non-topological
Micro part
• Construct the boundary of all polygons of CLC

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX dataset: <http://geographica.di.uoa.gr/dataset/>
PREFIX clc: <http://geo.linkedopendata.gr/corine/ontology#>

# Apply geof:boundary to every CLC geometry literal
SELECT ( geof:boundary(?o1) AS ?ret )
WHERE {
  GRAPH dataset:clc { ?s1 clc:asWKT ?o1 . }
}
Example – spatial selection
Micro part

• Find all points in GeoNames that are within a given polygon.

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX dataset: <http://geographica.di.uoa.gr/dataset/>
PREFIX geonames: <http://www.geonames.org/ontology#>

# Keep only the GeoNames points that lie within the given polygon
SELECT ?s1 ?o1
WHERE {
  GRAPH dataset:geonames { ?s1 geonames:asWKT ?o1 }
  FILTER( geof:sfWithin(?o1, "POLYGON((…))"^^geo:wktLiteral) )
}
Example – spatial join
Micro part
• Find all pairs of GAG and CLC polygons that overlap

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX dataset: <http://geographica.di.uoa.gr/dataset/>
PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>
PREFIX clc: <http://geo.linkedopendata.gr/corine/ontology#>

# Join the two graphs on the spatial predicate
SELECT ?s1 ?s2
WHERE {
  GRAPH dataset:gag { ?s1 gag:asWKT ?o1 }
  GRAPH dataset:clc { ?s2 clc:asWKT ?o2 }
  FILTER( geof:sfOverlaps(?o1, ?o2) )
}
Real-World Workload
Micro part
• Spatial Selections

           Query Point               Query Line        Query Polygon
Points     Within Buffer, Distance   -                 Within, Disjoint
Lines      -                         Equals, Crosses   Intersects, Disjoint
Polygons   -                         Intersects        Equals, Overlaps

• Spatial Joins

           Points   Lines        Polygons
Points     Equals   Intersects   Intersects, Within
Lines      -        -            Intersects, Within, Crosses
Polygons   -        -            Within, Touches, Overlaps

Real-World Workload
Macro part: Scenarios
• Reverse Geocoding
• Web Map Search and Browsing
• Rapid Mapping for Fire Monitoring
Synthetic Workload
• Goal: evaluate performance in a controlled environment where we can vary the thematic and spatial selectivity of queries
  • Thematic selectivity: the fraction of the total geographic features of a dataset that satisfy the non-spatial part of a query
  • Spatial selectivity: the fraction of the total geographic features of a dataset that satisfy the topological relation in the FILTER clause of a query
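
The two selectivity definitions can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Feature` class, the toy 4x4 grid of points, the string tag keys, and the use of a simple point-in-rectangle test in place of a real topological function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A toy geographic feature: a point with a set of thematic tag keys."""
    x: float
    y: float
    keys: frozenset


def thematic_selectivity(features, key):
    """Fraction of all features that satisfy the non-spatial (tag) condition."""
    return sum(1 for f in features if key in f.keys) / len(features)


def spatial_selectivity(features, rect):
    """Fraction of all features that satisfy the spatial condition.

    The condition here is 'point lies inside the axis-aligned rectangle'
    rect = (xmin, ymin, xmax, ymax), boundary included.
    """
    xmin, ymin, xmax, ymax = rect
    hits = sum(1 for f in features
               if xmin <= f.x <= xmax and ymin <= f.y <= ymax)
    return hits / len(features)


# A 4x4 grid of points; every second point (by index) also carries key "2".
features = [
    Feature(x=i % 4, y=i // 4,
            keys=frozenset({"1"} | ({"2"} if i % 2 == 0 else set())))
    for i in range(16)
]

print(thematic_selectivity(features, "2"))          # 8 of the 16 features
print(spatial_selectivity(features, (0, 0, 1, 1)))  # 4 of the 16 features
```

Both fractions are taken over all features of the dataset, which is what lets the two selectivities be varied independently.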
Synthetic Workload
Generator
• Dataset: as in VESPA, the produced datasets are geographic features on a synthetic map:
  • States in a country ((n/3)^2)
  • Land ownership (n^2)
  • Roads (n)
  • Points of interest, POI (n^2)
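
As a rough check of these formulas, the following Python sketch computes the number of features per layer for a given n. The function name and dictionary keys are hypothetical, and rounding n/3 down is an assumption; it is the reading that matches the 28,900 states the deck reports for n=512.

```python
def synthetic_feature_counts(n: int) -> dict:
    """Number of generated features per layer for generator parameter n."""
    states_per_side = n // 3  # assumption: n/3 is rounded down
    return {
        "states": states_per_side ** 2,   # (n/3)^2
        "land_ownerships": n ** 2,        # n^2
        "roads": n,                       # n
        "points_of_interest": n ** 2,     # n^2
    }


counts = synthetic_feature_counts(512)
for layer, count in counts.items():
    print(f"{layer}: {count:,}")
```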
Synthetic Workload
Ontology
• Based roughly on the ontology of OpenStreetMap and the GeoSPARQL vocabulary
• Tagging each feature with a key enables us to select a known fraction of features in a uniform way
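
The slides do not spell out the tagging rule, so the sketch below shows one possible scheme rather than the generator's actual one: the feature at 1-based position i receives key k (a power of two up to 512) whenever i is divisible by k. Under that assumption, selecting on key k returns a fraction 1/k of the features, spread uniformly over the dataset. The function names are hypothetical.

```python
def tag_keys(index: int, max_key: int = 512) -> list:
    """Keys assigned to the feature at 1-based position `index`.

    Hypothetical scheme: key k (a power of two, k <= max_key) is assigned
    whenever `index` is divisible by k.
    """
    keys = []
    k = 1
    while k <= max_key:
        if index % k == 0:
            keys.append(k)
        k *= 2
    return keys


def fraction_with_key(num_features: int, key: int, max_key: int = 512) -> float:
    """Fraction of features 1..num_features that carry `key`."""
    tagged = sum(1 for i in range(1, num_features + 1)
                 if key in tag_keys(i, max_key))
    return tagged / num_features


print(tag_keys(8))                  # keys of the 8th feature
print(fraction_with_key(1024, 1))   # every feature carries key 1
print(fraction_with_key(1024, 512)) # one feature in 512 carries key 512
```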

Synthetic Workload

Query template for spatial selections

SELECT ?s
WHERE {
  ?s ns:hasGeometry ?g .
  ?s c:hasTag ?tag .
  ?g ns:asWKT ?wkt .
  ?tag ns:hasKey "THEMA" .
  FILTER( FUNCTION(?wkt, "GEOM"^^geo:wktLiteral) )
}

• Parameters:
  • ns: specifies the kind of feature (and geometry type) examined
  • FUNCTION: specifies the topological function examined
  • THEMA: defines the thematic selectivity of the query using another parameter k
  • GEOM: specifies a rectangle that controls the spatial selectivity of the query
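
A minimal Python sketch of how such a template could be instantiated is given below. It is not the benchmark's own tooling: the helper names, the `poi` prefix, the choice of `geof:sfWithin`, and the rectangle coordinates are all placeholders, and PREFIX declarations are omitted just as in the template.

```python
SELECTION_TEMPLATE = """SELECT ?s
WHERE {{
  ?s {ns}:hasGeometry ?g .
  ?s c:hasTag ?tag .
  ?g {ns}:asWKT ?wkt .
  ?tag {ns}:hasKey "{thema}" .
  FILTER( {function}(?wkt, "{geom}"^^geo:wktLiteral) )
}}"""


def rectangle_wkt(xmin, ymin, xmax, ymax):
    """WKT polygon for an axis-aligned rectangle (closed ring of 5 points)."""
    ring = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def selection_query(ns, function, thema, rect):
    """Fill in the four template parameters: ns, FUNCTION, THEMA and GEOM."""
    return SELECTION_TEMPLATE.format(
        ns=ns, function=function, thema=thema, geom=rectangle_wkt(*rect)
    )


# Placeholder values: prefix "poi", key "512", and a 10 x 5 rectangle.
query = selection_query("poi", "geof:sfWithin", "512", (0, 0, 10, 5))
print(query)
```

Enlarging the rectangle raises the spatial selectivity, while changing the key changes the thematic selectivity, so the two can be varied independently.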
Synthetic Workload

Query template for spatial joins

SELECT ?s1 ?s2
WHERE {
  ?s1 ns1:hasGeometry ?g1 .
  ?s1 ns1:hasTag ?tag1 .
  ?g1 ns1:asWKT ?wkt1 .
  ?tag1 ns1:hasKey "THEMA" .
  ?s2 ns2:hasGeometry ?g2 .
  ?s2 ns2:hasTag ?tag2 .
  ?g2 ns2:asWKT ?wkt2 .
  ?tag2 ns2:hasKey "THEMA'" .
  FILTER( FUNCTION(?wkt1, ?wkt2) )
}

Experimental Setup
• Geospatial RDF stores tested: Strabon, Parliament, uSeekM
• Machine: Intel Xeon E5620 (2.4GHz, 12MB L3 cache), 24GB RAM, 4 HDDs in RAID-5
• Micro part (real-world workload) and synthetic workload:
  • Metric: response time
  • Run each query 3 times and compute the median
  • Timeout: 1 hour
  • Run on both warm and cold caches
• Macro part (real-world workload):
  • Run many instantiations of each scenario for one hour without clearing caches
  • Metric: average time for a complete execution
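
The two metrics can be sketched in Python as follows. This is an illustrative harness, not the one used for the experiments: the function names are hypothetical, `run_query` stands for any callable that executes one query against a store, and the timeout is only checked after a run has finished, whereas a real harness would have to abort the running query.

```python
import statistics
import time

TIMEOUT_SECONDS = 3600  # one hour, as in the setup above


def timed_run(run_query):
    """Wall-clock response time of one execution, in seconds."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def median_response_time(run_query, repetitions=3, timeout=TIMEOUT_SECONDS):
    """Median response time over `repetitions` runs, or None on a timeout.

    Simplification: the timeout is checked only after each run completes.
    """
    times = []
    for _ in range(repetitions):
        elapsed = timed_run(run_query)
        if elapsed > timeout:
            return None
        times.append(elapsed)
    return statistics.median(times)


def average_execution_time(durations):
    """Macro-part metric: mean duration of the completed scenario executions."""
    return sum(durations) / len(durations)


# Usage with a stand-in query that just sleeps briefly.
print(median_response_time(lambda: time.sleep(0.01)))
print(average_execution_time([1.0, 2.0, 3.0]))
```

Taking the median of three runs instead of the mean keeps a single unusually slow run from dominating the reported response time.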
Results

Real Workload - micro part (cold caches)

Results

Macro part

Scenario                            Strabon     uSeekM     Parliament
Reverse Geocoding                   65 sec      0.77 sec   2.6 sec
Map Search and Browsing             0.9 sec     0.6 sec    22.2 sec
Rapid Mapping for Fire Monitoring   207.4 sec   -          -

Results

Synthetic Workload
• We generate the synthetic dataset with n=512. This results in:
  • 28,900 states
  • 262,144 land ownerships
  • 512 roads
  • 262,144 points of interest
• Size: 3,880,224 triples (745 MB)

Results

Synthetic Workload – spatial selections

Chart panels: Intersects, Tag 1 (cold caches); Intersects, Tag 512 (cold caches)
Results

Synthetic Workload - Spatial Joins

Touches
Conclusions
• We defined Geographica, a new comprehensive benchmark for geospatial RDF stores, and used it to compare 3 relevant systems:
  • Strabon
  • Parliament
  • uSeekM
• Two workloads: real-world and synthetic

Future Work
• Capture the full GeoSPARQL standard.
• Study scaling issues with larger datasets.
• Add more application scenarios.
• Extend the generator to produce datasets that do not follow a uniform distribution.
• Extend the benchmark to include time-evolving geospatial data.

Thanks!
• Geographica: http://geographica.di.uoa.gr
• This work was supported in part by the European Commission project TELEIOS (http://www.earthobservatory.eu)

Any Questions?

