Aalborg University
Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
Kim Ahlstrøm Jakobsen
Alex B. Andersen
Katja Hose
Torben Bach Pedersen
Database Technology,
Department of Computer Science,
Aalborg University
Kim Ahlstrøm Jakobsen Optimizing RDF Data Cubes 1 / 19
Motivation
Future Goal
Goal
Analytical queries on internal data & external linked data
Benefits
Enables exploratory queries
Increasing amount of linked data
Integrates with heterogeneous data
Semantic reasoning
The First Steps
Efficient processing of analytical queries on RDF data cubes
Denormalize the cube dimensions
Reduce the number of subject-object joins (expensive)
Increase the number of subject-subject joins
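To make the two join types concrete, here is a minimal sketch that counts them in a list of triple patterns. The patterns mirror the two revenue-per-country queries shown on the "Query rewriting" slide; the helper names (`is_var`, `count_joins`) and the tuple encoding are illustrative assumptions, not part of the talk.

```python
from itertools import combinations


def is_var(term):
    """A term is a variable if it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def count_joins(patterns):
    """Return (subject_subject, subject_object) join counts for a list of
    (subject, predicate, object) triple patterns."""
    subject_subject = subject_object = 0
    for (s1, _, o1), (s2, _, o2) in combinations(patterns, 2):
        # Two patterns share the same subject variable.
        if is_var(s1) and s1 == s2:
            subject_subject += 1
        # The object variable of one pattern is the subject of the other.
        if (is_var(o1) and o1 == s2) or (is_var(o2) and o2 == s1):
            subject_object += 1
    return subject_subject, subject_object


# Query against the snowflake pattern (hierarchy navigated via skos:broader).
SNOWFLAKE_QUERY = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", "skos:broader", "?customer"),
    ("?customer", "skos:broader", "?nation"),
    ("?nation", ":name", "?name"),
]

# Same question against the star pattern (nation name stored on the order).
STAR_QUERY = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", ":nation_name", "?name"),
]

print(count_joins(SNOWFLAKE_QUERY))  # (1, 3)
print(count_joins(STAR_QUERY))       # (1, 1)
```

For this query, denormalizing the dimension removes two of the three subject-object joins while the subject-subject join on the fact stays in place.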
Workflow
Internal optimization
Building the Cube
Purpose
Organize data for the purpose of analysis
Easier to understand
What is a cube?
Facts: The subject of the analysis
Dimensions: Perspectives of the data
Levels: Concepts in the dimensions
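The three cube concepts above can be sketched as a small data structure. This is only an illustration: the class names, the `roll_up_path` helper, and the dimension name "customer" are assumptions; the fact, measure, and level names are borrowed from the example query used later in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dimension:
    """A perspective on the data, with levels ordered finest to coarsest."""
    name: str
    levels: list[str]

    def roll_up_path(self, source, target):
        """Levels visited when rolling up from `source` to `target`."""
        i, j = self.levels.index(source), self.levels.index(target)
        if j < i:
            raise ValueError("target level must not be finer than source level")
        return self.levels[i:j + 1]


@dataclass
class Cube:
    """Facts (the subject of analysis), their measures, and the dimensions."""
    fact: str
    measures: list[str]
    dimensions: list[Dimension] = field(default_factory=list)


# Hypothetical cube: line items are the facts, revenue is the measure.
cube = Cube(
    fact="lineitem",
    measures=["extendedprice"],
    dimensions=[Dimension("customer", ["order", "customer", "nation"])],
)

print(cube.dimensions[0].roll_up_path("order", "nation"))
```

Rolling up from the order level to the nation level passes through the customer level, which is the path a snowflake-pattern query has to join along.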
Analytical Queries
Example Query 1
What is the revenue per country?
Example Query 2
What are the top k products bought by customers from Denmark?
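As a rough illustration of what the two example queries compute, the sketch below evaluates them over a handful of in-memory rows. The rows, product names, and function names are invented for the example and do not come from the talk; "top k" is interpreted here as the k most frequently bought products, which is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical fact rows: (product, customer_country, revenue).
SALES = [
    ("bike", "Denmark", 300.0),
    ("lamp", "Denmark", 40.0),
    ("bike", "Germany", 280.0),
    ("lamp", "Denmark", 45.0),
    ("desk", "Denmark", 150.0),
]


def revenue_per_country(rows):
    """Example Query 1: sum the revenue measure, grouped by country."""
    totals = defaultdict(float)
    for _product, country, revenue in rows:
        totals[country] += revenue
    return dict(totals)


def top_k_products(rows, country, k):
    """Example Query 2: the k products bought most often in `country`."""
    counts = Counter(product for product, c, _revenue in rows if c == country)
    return [product for product, _count in counts.most_common(k)]


print(revenue_per_country(SALES))
print(top_k_products(SALES, "Denmark", 1))
```

Both queries group or filter on an attribute of a higher dimension level (the country), which is why the cost of reaching that level from the facts matters.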
Patterns
Snowflake Pattern
Star Pattern
Fully Denormalized Pattern
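The slide diagrams are not reproduced here, so the following sketch shows one line item laid out in each of the three patterns and counts how many triples must be followed from the fact to reach the nation name. The identifiers (`li1`, `o1`, `c1`, `n1`), the `hops_to` helper, and the use of `:nation_name` on the fact in the fully denormalized layout are assumptions made for illustration.

```python
def hops_to(triples, start, predicate):
    """Breadth-first search: number of triples followed from `start`
    until a triple with `predicate` is reached, or None if unreachable."""
    frontier, seen, depth = {start}, {start}, 0
    while frontier:
        depth += 1
        next_frontier = set()
        for s, p, o in triples:
            if s in frontier:
                if p == predicate:
                    return depth
                if o not in seen:
                    seen.add(o)
                    next_frontier.add(o)
        frontier = next_frontier
    return None


# Snowflake: every hierarchy level is its own resource.
SNOWFLAKE = [
    ("li1", ":extendedprice", 100.0),
    ("li1", ":hasorder", "o1"),
    ("o1", "skos:broader", "c1"),
    ("c1", "skos:broader", "n1"),
    ("n1", ":name", "Denmark"),
]

# Star: the dimension is flattened onto its lowest level.
STAR = [
    ("li1", ":extendedprice", 100.0),
    ("li1", ":hasorder", "o1"),
    ("o1", ":nation_name", "Denmark"),
]

# Fully denormalized: dimension attributes are stored on the fact itself.
FULLY_DENORMALIZED = [
    ("li1", ":extendedprice", 100.0),
    ("li1", ":nation_name", "Denmark"),
]

print(hops_to(SNOWFLAKE, "li1", ":name"))                  # 4
print(hops_to(STAR, "li1", ":nation_name"))                # 2
print(hops_to(FULLY_DENORMALIZED, "li1", ":nation_name"))  # 1
```

Each step toward denormalization shortens the path from fact to attribute, at the price of repeating attribute values on more resources.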
Special Cases:
Unbalanced Hierarchies
Property Collision
Semantic Web OLAP Denormalization Algorithm
Input
QB4OLAP ontology
Snowflake pattern RDF data cube
Output
Star pattern RDF data cube
Fully Denormalized pattern RDF data cube
Features
Top-down traversal
Property renaming
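The slide only names the algorithm's inputs, outputs, and features, so the sketch below is one plausible reading of "top-down traversal" and "property renaming", not the authors' implementation. It flattens a single dimension from the snowflake pattern into the star pattern: levels are visited from the top of the hierarchy downwards, each member inherits its parent's attributes, and every property is prefixed with its level name so that, for example, `:name` on a customer and `:name` on a nation do not collide. The function name, the `level_of` mapping, and the assumption of single-valued properties are all hypothetical.

```python
BROADER = "skos:broader"


def denormalize_dimension(triples, levels, level_of):
    """Flatten one dimension onto its base level.

    triples  -- (subject, predicate, object) tuples in the snowflake pattern
    levels   -- level names ordered from the top of the hierarchy down
    level_of -- maps each dimension member to the name of its level
    """
    parent = {}
    attributes = {}
    for s, p, o in triples:
        if p == BROADER:
            parent[s] = o
        else:
            # Property renaming: prefix with the level to avoid collisions.
            renamed = f":{level_of[s]}_{p.lstrip(':')}"
            attributes.setdefault(s, {})[renamed] = o

    # Top-down traversal: parents are complete before children copy them.
    for level in levels:
        for member, member_level in level_of.items():
            if member_level == level and member in parent:
                inherited = attributes.get(parent[member], {})
                attributes.setdefault(member, {}).update(inherited)

    base = levels[-1]
    return sorted(
        (member, prop, value)
        for member, member_level in level_of.items()
        if member_level == base
        for prop, value in attributes.get(member, {}).items()
    )


snowflake = [
    ("o1", BROADER, "c1"),
    ("c1", BROADER, "n1"),
    ("c1", ":name", "Alice"),
    ("n1", ":name", "Denmark"),
]
levels = ["nation", "customer", "order"]
level_of = {"o1": "order", "c1": "customer", "n1": "nation"}

for triple in denormalize_dimension(snowflake, levels, level_of):
    print(triple)
```

Because a member inherits from whatever parent it actually links to, a member whose parent skips a level is handled the same way; producing the fully denormalized pattern would additionally copy the base-level attributes onto each fact.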
Unbalanced Hierarchies Example
Query rewriting
SELECT ?name (SUM(?price) AS ?revenue)
WHERE {
  ?lineitem :extendedprice ?price ;
            :hasorder ?order .
  ?order skos:broader ?customer .
  ?customer skos:broader ?nation .
  ?nation :name ?name .
}
GROUP BY ?name
SELECT ?name (SUM(?price) AS ?revenue)
WHERE {
  ?lineitem :extendedprice ?price ;
            :hasorder ?order .
  ?order :nation_name ?name .
}
GROUP BY ?name
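The rewriting from the first query to the second can be sketched as a transformation on triple patterns. This is a simplified illustration under stated assumptions: patterns are plain tuples, hierarchy steps always use `skos:broader`, the level of each variable is supplied by hand in `level_of_var`, and the renamed property follows the `:<level>_<property>` form seen in `:nation_name`. The function name is hypothetical.

```python
BROADER = "skos:broader"


def rewrite_for_star(patterns, level_of_var):
    """Collapse skos:broader chains so that attributes of higher levels
    are read directly from the base-level variable."""
    # Map each higher-level variable to the variable directly below it.
    child_of = {o: s for s, p, o in patterns if p == BROADER}

    def base_variable(var):
        while var in child_of:
            var = child_of[var]
        return var

    rewritten = []
    for s, p, o in patterns:
        if p == BROADER:
            continue  # hierarchy steps disappear in the star pattern
        if s in child_of:
            renamed = f":{level_of_var[s]}_{p.lstrip(':')}"
            rewritten.append((base_variable(s), renamed, o))
        else:
            rewritten.append((s, p, o))
    return rewritten


snowflake_query = [
    ("?lineitem", ":extendedprice", "?price"),
    ("?lineitem", ":hasorder", "?order"),
    ("?order", BROADER, "?customer"),
    ("?customer", BROADER, "?nation"),
    ("?nation", ":name", "?name"),
]
level_of_var = {"?order": "order", "?customer": "customer", "?nation": "nation"}

for pattern in rewrite_for_star(snowflake_query, level_of_var):
    print(pattern)
```

Applied to the snowflake query, the two `skos:broader` patterns are dropped and `?nation :name ?name` becomes `?order :nation_name ?name`, which matches the rewritten query.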
Results
Virtuoso
                                    Star     Fully Denormalized
Increase in triples                 16 %     173 %
Avg. decrease in query time         600 %    700 %
Geo. mean decrease in query time    110 %    140 %
Cost of triple storage
Static and frequently changing data
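The results report both an average and a geometric mean, and the two differ widely. A small worked example shows why that can happen: a single query with a very large improvement pulls the arithmetic mean up much more than the geometric mean. The per-query factors below are invented for illustration and are not measurements from the talk.

```python
from statistics import fmean, geometric_mean

# Hypothetical per-query improvement factors (old time / new time).
speedups = [1.1, 1.2, 1.3, 20.0]

arithmetic = fmean(speedups)
geometric = geometric_mean(speedups)

print(f"arithmetic mean: {arithmetic:.2f}")
print(f"geometric mean:  {geometric:.2f}")
```

Reporting both values therefore indicates whether the gain is spread across the workload or concentrated in a few queries.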
Future Work
More cube optimizations
Consider data provenance and quality
Thank you
SWOD Abstract
Example
Figure Credits
Workman – Licence: CC BY 3.0
Credit: www.clipartbest.com
Cube – Licence: CC BY 3.0
Credit: www.clipartbest.com
Turing machine
http://www.felienne.com/
Steps
http://www.cliparthut.com/
Future-work
http://www.horsesforsources.com/