Benchmarking	Big	Linked	Data:		
The	case	of	the	HOBBIT	Project		
Irini	Fundulaki	
Institute	of	Computer	Science	
Foundation	for	Research	and	Technology	-	
Hellas
Data	and	its	dimensions	
https://www.ibmbigdatahub.com/infographic/four-vs-big-data
•  Variability	
•  Validity	
•  Vulnerability	
•  Volatility	
•  Visualization	
•  Value
Big	Data	Open	Source	Tools	
http://dbrang.tistory.com/1024
Basic	Question	
	
	
	
	
Which	tools	should	I	use	
for	my	business	use	case?
Necessary	Questions	to	ask	
•  Where	are	the	current	
bottlenecks?	
•  Which	steps	of	the	data	
lifecycle	are	critical?	
•  Which	solutions	are	available		
on	the	market?	
•  Which	key	performance	
indicators	are	relevant?	
•  How	well	should	tools	
perform?	
•  How do existing solutions perform w.r.t. relevant indicators?
Benchmark systems!
Why	Benchmarks?	
•  Performance	Evaluation	
–  There is no single recipe for doing it right
–  There are many ways to do it wrong
–  There are a number of best practices, but no broadly accepted standard for designing and developing a benchmark
•  Questions	asked:	
–  What	data/datasets	should	we	use?	
–  Which	workload/queries	should	we	consider?	
–  What	to	measure	and	how	to	measure?
Benchmark	Development	Methodology	
•  Management	and	methodological	activities	performed	by	a	
group	of	people	
–  Management:	Organizational	protocols	to	control	the	process	
–  Methodological:	design	principles,	methods	and	steps	for	
benchmark	creation	
•  Benchmark	Development	
–  Roles	and	bodies:	people/groups	involved	in	the	development	
–  Design	principles:	fundamental	rules	that	direct	the	
development	of	a	benchmark	
–  Development	process:	series	of	steps	to	develop	a	benchmark	
based	on	Choke	Points	
Choke Points: the technical difficulties whose resolution forces systems to improve their performance
Benchmark	Development	Process	(1)	
•  Design	Principles	[L97]	
Relevant: The benchmark is meaningful for the target domain
Understandable: The benchmark is easy to understand and use
Good Metrics: The metrics defined by the benchmark are linear, orthogonal and monotonic
Scalable: The benchmark is applicable to a broad spectrum of hardware and software configurations
Coverage: The benchmark workload does not oversimplify the typical environment
Acceptance: The benchmark is recognized as relevant by the majority of vendors and users
Benchmark	Development	Process	(2)	
•  Benchmarking	Metrics:	What	we	measure	
–  Performance		
–  Price/Performance	
–  Energy/Performance	Metrics:	Energy	metric	to	measure	the	
energy	consumption	of	system	components	
•  TPC	Pricing	specification	
–  Provides	consistent	methodologies	for	computing	the	price	of	
the	benchmarked	system,	licensing	of	software,	maintenance	…		
TPC-C: Transaction Rate (tpmC), Price per Transaction ($/tpmC)
TPC-E: Transactions per Second (tpsE)
TPC-H: Composite Query-per-Hour Performance Metric (QphH@Size), Price per Composite Query-per-Hour Performance Metric ($/QphH@Size)
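For illustration, TPC-H's composite metric is the geometric mean of the single-stream power result and the multi-stream throughput result, and its price metric divides the total system price by that composite:

    \mathrm{QphH@Size} = \sqrt{\mathrm{Power@Size} \times \mathrm{Throughput@Size}},
    \qquad \$/\mathrm{QphH@Size} = \frac{\text{Total System Price}}{\mathrm{QphH@Size}}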
Design	Principles:	Desirable	Attributes	of	a	
Benchmark	
•  Relevant/Representative:	based	on	realistic	
use	case	scenarios	and	must	reflect	the	needs	
of	the	use	case	
[Figure: benchmark attributes: relevant, representative, understandable, simple, portable, fair, repeatable, metrics, scalable, verifiable]
•  Understandable/Simple:	the	results	and	
workload	are	easily	understandable	by	users	
•  Portable/Fair/Repeatable: no system is unfairly favored by the benchmark. Execution must be deterministic and provide a «gold standard»
•  Verifiable:	allow	verifiable	results	in	each	
execution	
•  Metrics:	should	be	well	defined	to	be	able	to	
assess	and	compare	the	systems.		
•  Scalable: datasets should be on the order of billions of «objects»
Development	Process:	Choke	Points		
•  A	benchmark	exposes	a	system	to	a	workload	and	should	identify	
the	technical	difficulties	of	the	system	under	test	
•  Choke	Points	[BNE14	]	are	those	technological	challenges	whose	
resolution	will	significantly	improve	the	performance	of	a	product	
–  TPC-H: a 20-year-old benchmark (superseded by TPC-DS) that is still influential, using business-oriented queries and concurrent data modifications
–  22	queries	capturing	(most	of)	the	aspects	of	relational	query	
processing		
•  [BNE14]	performed	an	analysis	of	the	TPC-H	workload		and	
identified	28	choke	points	grouped	into	6	categories
Choke	Points	à	la	TPC-H	
•  CP1:	Aggregation	Performance	
–  Ordered	aggregation,	small	group-by	keys,	interesting	orders,	dependent	
group-by	keys	
•  CP2:	Join	Performance 		
–  Large	joins,	sparse	foreign	keys,	rich	join	order	optimization,	late	projection	
•  CP3:	Data	Access	Locality	(materialized	views)	
–  Columnar	locality,	physical	locality	by	key,	detecting	correlation	
•  CP4:	Expression	Calculation	
–  Raw	Expression	Arithmetic,	Complex	Boolean	Expressions	in	Joins	and	
Selections,	String	Matching	Performance	
•  CP5:	Correlated	Sub-queries	
–  Flattening	sub-queries,	moving	predicates	to	a	sub-query,	overlap	between	
outer-	and	sub-query	
•  CP6:	Parallelism	and	Concurrency	
–  Query	plan	parallelization,	workload	management,	result	re-use
HOBBIT:	Holistic	Benchmarking	of	Big	Linked	Data	
•  Focus	on	Big	Linked	Data	
•  Cover	the	business-critical	
steps	of	the	Linked	Data	
lifecycle	
•  Used	by	a	growing	number	
of	companies	
•  Mature	and	maturing	
technologies
HOBBIT	Objectives	
•  Gather	real	requirements	
i.  Focus	on	industrial	requirements	
ii.  Gather	relevant	performance		
indicators	
iii.  Gather	relevant	performance	
	thresholds	
iv.  Gather	real	datasets	
v.  Choke	point-based	design	
vi.  Develop	benchmarks	based	on	real	data	
•  Provide	universal	benchmarking	platform	
i.  Standardized	hardware	
ii.  Provide	comparable	results	
•  Periodic	benchmarking	challenges	and	reporting	
i.  Create	independent	HOBBIT	association
HOBBIT	Overview
HOBBIT	Benchmarks	(1)	
1.  Generation	and	Acquisition	
–  The	extraction	benchmarks	test	the	performance	of	the	
systems	that	implement	approaches	for	obtaining	RDF	
data	from		
1.  semi-structured	data	streams	such	as	sensor	data	
(smart	metering,	geo-spatial	information,	etc.)	and		
2.  unstructured	data	streams	(Twitter,	RSS	feeds,	etc.).	
	
→ Sensor Streams Benchmark
→ Unstructured Streams Benchmark
HOBBIT	Benchmarks	(2)	
2.  Analysis	and	Processing	
–  Benchmarks	for	the	linking	and	analysis	of	data	of	the	big	
data	value	chain	
–  The	Linking	Benchmark	tests	the	performance	of	instance	
matching	methods	and	tools	for	Linked	Data		
–  The	Analytics	benchmark	tests	the	performance	of	
Machine	Learning	methods	(supervised	and	unsupervised)	
for	data	analytics	
→ Link Discovery Analysis Benchmark
→ Structured Machine Learning Benchmark
HOBBIT	Benchmarks	(3)	
3.  Storage	and	Curation	
–  Benchmarks for high insert rates with time-dependent and largely repetitive or cyclic data, as well as for data that comes in multiple versions
→ Data Storage
→ Versioning
	
4.  Visualization	and	Services	
–  Focus	is	on	benchmarks	with	well-defined	metrics	that	do	
not	involve	users	
→ Question Answering Benchmark
→ Faceted Browsing Benchmark
HOBBIT	Platform	(1)	
•  The HOBBIT evaluation platform is a distributed FAIR benchmarking platform for the Linked Data lifecycle.
•  The	platform	was	designed	to	provide	means	to:	
1.  benchmark	any	step	of	the	Linked	Data	lifecycle,	including	
generation	and	acquisition,	analytics	and	processing,	storage	
and	curation	as	well	as	visualization	and	services	
2.  ensure	that	benchmarking	results	can	be	found,	accessed,	
integrated	and	reused	easily	(FAIR	principles)	
3.  benchmark Big Data platforms by being the first distributed benchmarking platform for Linked Data
HOBBIT	Platform	(2)	
•  Underlying	Principles:	The	HOBBIT	benchmarking	platform	ensures	
that:	
–  Users can test systems with the HOBBIT benchmarks or their own without having to worry about finding standardized hardware
–  New	benchmarks	can	be	easily	created	and	added	to	the	
platform	by	third	parties	
–  The	evaluation	can	be	scaled	out	to	large	datasets	and	on	
distributed	architectures	
–  The	publishing	and	analysis	of	the	results	of	different	systems	
can	be	carried	out	in	a	uniform	manner	across	the	different	
benchmarks	obtaining	comparable	results!	
•  History:		
–  First	Release	February	2017	
–  Second	Version	February	2018
HOBBIT Platform (3): Components
[Component diagram] On the benchmark side, a Benchmark Controller creates the Data Generator, the Task Generators, the Evaluation Module and the Evaluation Storage, which surround the Benchmarked System. On the platform side sit the Platform Controller, Front End, User Management, Repository, Storage and Analysis components. Arrows in the figure distinguish data flow from component creation.
SHOWCASING HOBBIT BENCHMARKS
HOBBIT	Platform	(4):	Workflow
Linked	Open	Data	Cloud	(1)	
https://lod-cloud.net/ (June 2018)
Media	
Government	
Geographic	
Publications	
User-generated	
Life	sciences	
Cross-domain
Linked	Open	Data	Cloud	(2)	
https://lod-cloud.net/ (June 2018)
The same entity can be described in different sources
Link	Discovery:	The	cornerstone	for	Linked	Data	
[Figure: Swiss geospatial data, illustrating data acquisition and data evolution]
How can we automatically recognize multiple mentions of the same entity, or relations amongst entities, across or within sources?
= Link Discovery
Link	Discovery		
•  The	Linked	Data	paradigm	is	based	on	the	publication	of	
information	by	different	publishers,	and	the	interlinking	of	Web	
resources	across	knowledge	bases.	
•  Cross-dataset links are not integral to newly created LOD datasets and must be determined automatically using link discovery techniques
•  Instance Matching: a sub-problem that focuses on finding matches between objects, mostly using string comparison techniques
–  In Linked Data, «representations» that refer to the same real-world object are connected via owl:sameAs links
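A minimal sketch of asserting such a link with rdflib (the two IRIs are illustrative; any pair of co-referent resources works):

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    g = Graph()
    # Two resources assumed to describe the same real-world city
    dbpedia_athens = URIRef("http://dbpedia.org/resource/Athens")
    geonames_athens = URIRef("http://sws.geonames.org/264371/")

    # Declare that both IRIs denote the same entity
    g.add((dbpedia_athens, OWL.sameAs, geonames_athens))
    print(g.serialize(format="turtle"))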
HOBBIT:	Instance	Matching	Benchmark	
•  Inspired by SPIMBench, developed in the context of the EU FP7 LDBC project
•  Domain	Dependent	Instance	Matching	Benchmark	
•  Highly	Configurable,	Scalable	
•  Synthetic	benchmark	to	test	the	correctness	and		performance	of	
instance	matching	systems	
•  Supports	Standard	Value-Based	and	Structure-Based	Test	Cases	
•  Introduces	Advanced	Semantics-Aware	Test	Cases	considering	
OWL2	expressive	constructs	
•  Expressive Gold Standard that records the transformations applied to the matched instances, and a Similarity Score Metric
Performance	Metrics	
•  Standard	Information	Retrieval	Metrics:		
–  Precision (P) = TP / (TP + FP)
–  Recall (R) = TP / (TP + FN)
–  F-measure (F) = 2 · P · R / (P + R)
•  Similarity Score (a computation sketch follows)
–  a value in [0, 1] that quantifies how difficult a match is for an instance matching (IM) system to find
–  average similarity score: the average difficulty of the matched instances
–  standard deviation: the spread of similarity scores over the matched instances
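A minimal sketch of these computations, assuming a run is summarized by true/false positive and false negative counts plus a list of per-match similarity scores (the counts below are made up):

    from statistics import mean, stdev

    def ir_metrics(tp, fp, fn):
        # Standard information-retrieval metrics for a matching run
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_measure = 2 * precision * recall / (precision + recall)
        return precision, recall, f_measure

    def similarity_summary(scores):
        # Average difficulty and spread of the matched instances
        return mean(scores), stdev(scores)

    p, r, f = ir_metrics(tp=90, fp=10, fn=15)
    avg, sd = similarity_summary([0.8, 0.55, 0.92, 0.6, 0.75])
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}  similarity={avg:.2f}±{sd:.2f}")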
Linking	Benchmark	for	Spatial	Data	–	Version	1	(1)	
•  Used	at	OAEI	2017	
•  Source	Dataset	
–  TomTom	Dataset	
–  Consists	of	traces	represented	as	LineStrings	in	the	Well-known	text	
(WKT)	format	
•  Target Dataset
–  Obtained by applying a set of transformations to the source dataset (see the sketch after this list)
•  Change of coordinate-system format
•  Change of date format
•  Value-based transformations
•  Addition/deletion of intermediate points
•  Gold Standard
–  [source Trace] [relation] [target Trace], where in Version 1 the relation is EQUALS
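As an illustration of the point-level transformations, a sketch that densifies a source trace by interpolating extra points (using shapely; the uniform-interpolation strategy is an assumption for illustration, not necessarily the benchmark's actual code):

    from shapely import wkt
    from shapely.geometry import LineString

    def add_intermediate_points(line: LineString, factor: int = 2) -> LineString:
        # Interpolate extra points uniformly along the trace
        n = (len(line.coords) - 1) * factor
        points = [line.interpolate(i / n, normalized=True) for i in range(n + 1)]
        return LineString(points)

    source = wkt.loads("LINESTRING (13.58717 52.42571, 13.58673 52.42565)")
    target = add_intermediate_points(source, factor=2)
    print(target.wkt)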
Linking	Benchmark	for	Spatial	Data	–	Version	1	(2)	
•  TomTom	Dataset:	representing	and	describing	Transport	Data	
–  Each	text	file	contains	a	simple	textual	representation	of	trace	data	(GPS	fixes)	
–  Each	line	represents	a	single	GPS	fix	
–  Lines	are	sorted	in	ascending	order	by	timestamp	of	the	corresponding	GPS	fix	
1305093212000	13.587170	52.425710	8.33	
…		
1305093216000	13.586730	52.425650	3.89	
<timestamp [ms since Unix epoch]> <longitude [°]> <latitude [°]> <speed [m/s]>
Trajectory of a car from Wed, 11 May 2011 05:53:32 GMT
to Wed, 11 May 2011 05:53:36 GMT (a parser sketch follows)
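A minimal parser sketch for this line format, interpreting the first field as milliseconds since the Unix epoch:

    from datetime import datetime, timezone

    def parse_fix(line: str):
        # One GPS fix: <timestamp ms> <longitude> <latitude> <speed m/s>
        ts_ms, lon, lat, speed = line.split()
        when = datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)
        return when, float(lon), float(lat), float(speed)

    print(parse_fix("1305093212000 13.587170 52.425710 8.33"))
    # (datetime.datetime(2011, 5, 11, 5, 53, 32, tzinfo=...utc), 13.58717, 52.42571, 8.33)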
Linking	Benchmark	for	Spatial	Data	–	Version	1	(3)	
[Figure: TomTom Ontology] A Vehicle hasTrace a Trace, which hasPoint Points; Point is an rdfs:subClassOf wgs84_pos:Point and carries long/lat coordinates. Each Point hasTimeStamp an xsd:TimeStamp and hasSpeed a Velocity, a MotionProperty (alongside Vector) with velocityMetric km_per_hour and velocityValue xsd:decimal.
Linking	Benchmark	for	Spatial	Data	–	Version	2	(1)		
•  Spatial Benchmark Generator (SPgen)
–  differs from the classical, mostly string-based approaches
–  generic, schema-agnostic and choke point-based design
–  tests the performance of link discovery systems that deal with DE-9IM (Dimensionally Extended nine-Intersection Model), the model used to represent topological relations
•  Choke	Points	
–  CP1:	Scalability	
•  Large	datasets	
•  Large	number	of	points	for	each	trace	
–  CP2:	Output	Quality	Metric	
•  Precision	
•  Recall	
•  F-measure	
–  CP3:	Time	Performance
Linking	Benchmark	for	Spatial	Data	–	Version	2	(2)		
•  Source	Dataset		
–  Consists	of	traces	represented	as	LineStrings	in	the	Well-known	
text	(WKT)	format	
•  Target	Dataset	
–  Consists	of	traces	represented	as	LineStrings	or	Polygons		in	the	
Well-known	text	(WKT)	format	
•  Gold	Standard	
–  Stores	pairs	of	matched	source	and	target	instances		
–  Generated	using	RADON	to	ensure	completeness	and	
correctness	
Given a WKT geometry s as source and a DE-9IM relation r, we generate a WKT geometry t as target such that their intersection matrix follows the definition of relation r.
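A sketch of checking such a pair with shapely: relate returns the DE-9IM intersection matrix, and relate_pattern tests it against a relation's pattern (here EQUALS; the two geometries are made-up examples):

    from shapely import wkt

    source = wkt.loads("LINESTRING (0 0, 1 1, 2 2)")
    target = wkt.loads("LINESTRING (0 0, 2 2)")

    # Full DE-9IM intersection matrix of the pair
    print(source.relate(target))                       # '1FFF0FFF2'

    # Does the matrix satisfy the EQUALS pattern?
    print(source.relate_pattern(target, "T*F**FFF*"))  # True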
DE-9IM	Topological	Relations	between	LineStrings	
Equals	 Disjoint	 Touches	
Contains	(Within)		
&	Covers	(Covered	By)	
Intersects	 Crosses	
Overlaps
DE-9IM	Topological	Relations	between	a	LineString	
and	a	Polygon	
Disjoint	
Touches	Intersects	
Crosses	 Contains	(Within)		
&	Covers	(Covered	By)
Link	Discovery	Benchmark:	Architecture	
[Architecture diagram] The Input Dataset (traces) feeds the Initialization Module and the Resource Generation Module; the Resource Transformation Module (a JTS extension), driven by the Test Case Generation Parameters, produces the Source Data and the Target Data, from which RADON generates the Gold Standard.
Benchmarks	for	eHealth	Systems	
•  No industry-strength benchmark exists for eHealth systems (à la TPC)
•  Necessary to develop industry-strength benchmarks for eHealth systems
•  Identify	the	technical	difficulties	–	choke	points	
•  Necessary	to	test	the		
–  Information	Completeness	
–  Security		
–  Standard	Compliance		
–  Query	Evaluation	Performance	
–  Efficiency	of	Storage	Space	
–  Interoperability	Features
References	
•  [L97] Charles Levine. TPC-C: The OLTP Benchmark. In SIGMOD Industrial Session, 1997.
•  [BNE14] P. Boncz, T. Neumann, O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. In Performance Characterization and Benchmarking (TPCTC 2013), Revised Selected Papers.
Lance	System	Architecture	
[Architecture diagram] Source data is loaded through a Data Ingestion Module into an RDF Repository, and SPARQL queries (schema statistics and instance retrieval) populate a main-memory data representation. The Test Case Generator (Initialization Module, Resource Generator and Resource Transformation Module, driven by the Test Case Generation Parameters) emits the Source Data, the Target Data and the matched instances of the Gold Standard; a Weight Computation Module (SAMPLER, MATCHER, RESCAL [NT12]) turns the latter into a Weighted Gold Standard.
Lance	System	Architecture	for	Spatial	Data*	
[Same architecture as above, instantiated for spatial data]
This work was supported by grants from the EU H2020 Framework Programme for the project HOBBIT (GA no. 688227).
