Alejandro Llaves and
Oscar Corcho
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
May 31 2015
Scalability in RDF Stream
Processing Systems
What we do to improve
Scalability in our RDF Stream
Processing System
Outline

Towards efficient processing of RDF data streams

Architecture overview

Parallelizing the pre-processing of sensor data streams

Example of use: CSIRO's Sensor Cloud

Discussion and future work
Towards efficient processing of RDF data streams

Goal: to develop a stream processing engine
capable of adapting to variable conditions, such as
changing rates of input data, failure of processing
nodes, or distribution of workload, while serving
complex continuous queries.

Example of query execution parallelization (OrdRing 2014)
SELECT ?obs.value ?sensors.location
FROM NAMED STREAM <obs> [60 SEC TO NOW]
FROM NAMED STREAM <sensors> [60 SEC TO NOW]
WHERE obs.sensorId = sensors.id ;
[Figure: Storm topology example (4 nodes). Components shown: SPOUT <obs>, SPOUT <sensors>, Triple2Graph, Windowing (<t0, t2...> and <t1, t3...>), Simple Join, Project, Output. Input tuples: t0, t1, t2, t3.]
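The query above joins two 60-second windows and projects two fields. A minimal sketch of that idea in plain Python (not Storm), assuming observations are dealt round-robin to two windowing tasks, as the <t0, t2...> and <t1, t3...> labels suggest, and that the sensors stream is visible to every task. The names `Obs`, `Sensor`, `round_robin`, and `window_join`, as well as the sample values and locations, are hypothetical and not taken from the slides.

```python
from collections import namedtuple

Obs = namedtuple("Obs", "ts sensor_id value")
Sensor = namedtuple("Sensor", "ts id location")

def round_robin(items, n_tasks):
    """Deal items to n_tasks partitions in turn: t0, t2... / t1, t3..."""
    return [items[i::n_tasks] for i in range(n_tasks)]

def in_window(item, now, width=60):
    """True if the item timestamp falls inside [now - width, now]."""
    return now - width <= item.ts <= now

def window_join(obs, sensors, now, width=60):
    """Join both windows on obs.sensor_id = sensors.id, then project."""
    locations = {s.id: s.location for s in sensors if in_window(s, now, width)}
    return [(o.value, locations[o.sensor_id])
            for o in obs
            if in_window(o, now, width) and o.sensor_id in locations]

# Made-up sample data (timestamps in seconds).
obs = [Obs(0, "s1", 48), Obs(10, "s2", 51), Obs(20, "s1", 47), Obs(30, "s3", 60)]
sensors = [Sensor(5, "s1", "Hobart"), Sensor(5, "s2", "Launceston")]

partitions = round_robin(obs, 2)
# Each partition is joined independently; the partial results are merged.
results = [row for part in partitions
           for row in window_join(part, sensors, now=60)]
```

Because each partition only needs its own observations plus the sensor window, the per-partition joins do not depend on each other and could run on separate nodes.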
morph-streams++ architecture
Parallelizing the pre-processing of sensor data streams
Methodology
1. Transform data input into field-named tuples
2. Add semantic annotations (if needed)
3. Publish tuples to multiple channels
4. Convert tuples to RDF on (query) demand
Focus

Storm topologies

Environmental sensor observations

Using Semantic Sensor Network (SSN) ontology
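Pre-processing steps that handle one message at a time are natural candidates for parallel execution. The following is a rough sketch of that pattern using a Python thread pool rather than a Storm topology; the `preprocess` function and its `sensor=value` message format are hypothetical stand-ins for the real transformation.

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(message):
    """Hypothetical per-message step: split 'sensor=value' into a named dict."""
    sensor, value = message.split("=")
    return {"sensor": sensor, "value": float(value)}

def preprocess_parallel(messages, workers=4):
    """Run preprocess on several messages at once; map() keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, messages))

tuples = preprocess_parallel(["rel_hum=48", "rain_trace=0.2"])
```

In CPython, threads mainly help with I/O-bound steps such as HTTP requests; the sketch only illustrates the structure of independent workers that a distributed engine would spread across nodes.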
Example of use: CSIRO's Sensor Cloud (1/3)
Sensor Cloud

Viticulture, water
management, weather
monitoring, oyster farming...

RESTful API – JSON

Network → Platform →
Sensor → Phenomenon →
Observation

Lack of semantic
descriptions, e.g.
rain_trace vs Rain.

Multiple HTTP requests to
query various streams.
Source: CSIRO
Example of use: CSIRO's Sensor Cloud (2/3)
1. Sensor Cloud messages to field-named tuples
2. SWEET annotations for phenomena
<sample time="2015-05-28T16:30" value="48" sensor="bom_gov_au.94961.air.rel_hum"/>
["2015-05-28T16:32", "2015-05-28T16:30", "48", "bom_gov_au", "94961", "air", "rel_hum", "-43.3167", "147.0075"]
Tuple fields, in order: system time, sampling time, value, network, platform, sensor, phenomenon, latitude, longitude
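Steps 1 and 2 can be illustrated with the sample message shown on this slide. The sketch below is an assumption-laden approximation, not the authors' SC2Tuple or SWEET Annotation code: it assumes the sensor identifier splits on dots into network, platform, sensor, and phenomenon, that coordinates come from a per-platform lookup, and that raw phenomenon names map to SWEET concepts through a small dictionary. `SCTuple`, `PLATFORM_COORDS`, `SWEET`, `sc_to_tuple`, and `annotate` are hypothetical names, and the `rain_trace` mapping is a guess.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

SCTuple = namedtuple(
    "SCTuple",
    "system_time sampling_time value network platform "
    "sensor phenomenon latitude longitude",
)

# Assumed lookups (not from the source API): coordinates and SWEET mapping.
PLATFORM_COORDS = {"94961": ("-43.3167", "147.0075")}
SWEET = {"rel_hum": "sweet:RelativeHumidity", "rain_trace": "sweet:Rainfall"}

def sc_to_tuple(xml_sample, system_time):
    """Step 1: turn one Sensor Cloud sample into a field-named tuple."""
    elem = ET.fromstring(xml_sample)
    network, platform, sensor, phenomenon = elem.get("sensor").split(".")
    lat, lon = PLATFORM_COORDS[platform]
    return SCTuple(system_time, elem.get("time"), elem.get("value"),
                   network, platform, sensor, phenomenon, lat, lon)

def annotate(tup):
    """Step 2: replace the raw phenomenon name with a SWEET concept if known."""
    return tup._replace(phenomenon=SWEET.get(tup.phenomenon, tup.phenomenon))

sample = ('<sample time="2015-05-28T16:30" value="48" '
          'sensor="bom_gov_au.94961.air.rel_hum"/>')
raw = sc_to_tuple(sample, system_time="2015-05-28T16:32")
annotated = annotate(raw)
```

Keeping the result as a lightweight named tuple, rather than RDF, is what allows the later stages to postpone RDF generation until a query needs it.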
Example of use: CSIRO's Sensor Cloud (3/3)
3. Publish to multiple channels
4. Convert tuples to SSN model on (query) demand
[Figure: Sensor Cloud spout → SC2Tuple → SWEET Annotation → Publish2Kafka → Kafka channels (sweet:Rainfall, sweet:RelativeHumidity, All tuples) → Kafka spout → Tuple2SSN → Query: heavy rainfall events. SC2Tuple, SWEET Annotation, Publish2Kafka, and Tuple2SSN each appear as two parallel instances. To be continued...]
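Steps 3 and 4 can be sketched with in-memory lists standing in for Kafka channels. This is only an illustration under stated assumptions: every tuple goes to an "all" channel and to a channel named after its SWEET phenomenon, and triples are built only for tuples that a query selects. `publish`, `tuple_to_ssn`, and `heavy_rainfall` are hypothetical names, the `ex:` identifiers and `ex:hasValue` property are placeholders, the rainfall threshold and sample values are made up, and the SSN terms should be checked against the ontology before use.

```python
from collections import defaultdict

def publish(channels, tup):
    """Step 3: send the tuple to the 'all' channel and to its phenomenon channel."""
    channels["all"].append(tup)
    channels[tup["phenomenon"]].append(tup)

def tuple_to_ssn(tup):
    """Step 4: build SSN-style triples for one tuple, only when requested."""
    obs = "ex:obs_{platform}_{sampling_time}".format(**tup)
    sensor = "ex:sensor_{network}_{platform}_{sensor}".format(**tup)
    return [
        (obs, "rdf:type", "ssn:Observation"),
        (obs, "ssn:observedProperty", tup["phenomenon"]),
        (obs, "ssn:observedBy", sensor),
        (obs, "ex:hasValue", '"{value}"'.format(**tup)),  # placeholder property
    ]

def heavy_rainfall(channels, threshold=10.0):
    """Query sketch: convert only rainfall tuples above an assumed threshold."""
    for tup in channels["sweet:Rainfall"]:
        if float(tup["value"]) > threshold:
            yield tuple_to_ssn(tup)

base = {"network": "bom_gov_au", "platform": "94961"}
channels = defaultdict(list)
publish(channels, dict(base, sampling_time="2015-05-28T16:30", value="48",
                       sensor="air", phenomenon="sweet:RelativeHumidity"))
publish(channels, dict(base, sampling_time="2015-05-28T16:30", value="12.5",
                       sensor="rain", phenomenon="sweet:Rainfall"))
publish(channels, dict(base, sampling_time="2015-05-28T16:40", value="0.2",
                       sensor="rain", phenomenon="sweet:Rainfall"))

events = list(heavy_rainfall(channels))
```

Only one of the three tuples is ever converted here, which is the point of delaying RDF generation: tuples that no query selects never pay the conversion cost.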
Discussion and future work
Conclusion

Division of work into simple tasks.

Parallelize any parallelizable task.

Delay RDF generation and convert on demand.
Future work

Evaluation and benchmarking.

SSN mapping interface.

Topology package: executing distributed queries (Storm).

theObserver (theO) package: monitoring scalability metrics
for adaptive query processing.
The presented research has been funded by the Ministerio de
Economía y Competitividad (Spain) under the project "4V:
Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora
de Datos" (TIN2013-46238-C4-2-R) and by the EU Marie Curie
IRSES project SemData (612551), and supported by an AWS in
Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!
