Alejandro Llaves
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
Oct 21 2015
Virtual Clusters for
(RDF) Stream Processing
Outline

Some context: morph-streams++

Motivation

Use case: Sensor Cloud data integration

Topologies everywhere

Setting up a virtual cluster

Deploying Storm topologies

Conclusion
Some context...
Motivation

Integrating an unbounded stream of heterogeneous
sensor observations

Solution:
– Storm topologies for real-time processing
– Semantic Sensor Network (SSN) ontology for
modelling observations
– SWEET ontology for environmental phenomena
Use case: Sensor Cloud data integration (1/3)
Sensor Cloud

Viticulture, water
management, weather
monitoring, oyster farming...

RESTful API – JSON

Network → Platform →
Sensor → Phenomenon →
Observation

Lack of semantic
descriptions, e.g.
rain_trace vs Rain.

Multiple HTTP requests to
query various streams.
Source: CSIRO
Use case: Sensor Cloud data integration (2/3)

Sensor Cloud messages to field-named tuples

SWEET annotations for heterogeneous phenomena descriptions
<sample time="2015-05-28T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>
["2015-05-28T16:32", "2015-05-28T16:30", "15", "bom_gov_au", "94961", "air", "air_temp", "-43.3167", "147.0075"]
Tuple fields (in order): system time, sampling time, value, network, platform, sensor, phenomenon, latitude, longitude
Bolts: SensorCloudParser Bolt, SweetAnnotations Bolt
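The two steps on this slide can be sketched in plain Python. This is only an illustration under stated assumptions, not the code of the SensorCloudParser or SweetAnnotations bolts: the names `Observation`, `parse_sample`, `annotate` and `SWEET_LABELS` are hypothetical, the coordinates are assumed to come from platform metadata and are passed in as arguments, and the labels in the lookup table are placeholders rather than verified SWEET concept identifiers.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    """Field-named tuple, in the field order shown on the slide."""
    system_time: str
    sampling_time: str
    value: str
    network: str
    platform: str
    sensor: str
    phenomenon: str
    latitude: str
    longitude: str


def parse_sample(sample_xml: str, system_time: str,
                 latitude: str, longitude: str) -> Observation:
    """Turn one <sample/> message into a field-named tuple.

    The dotted sensor id is split into network, platform, sensor and
    phenomenon. Latitude and longitude are not part of the message, so
    they are supplied by the caller (an assumption of this sketch).
    """
    attrs = ET.fromstring(sample_xml).attrib
    network, platform, sensor, phenomenon = attrs["sensor"].split(".", 3)
    return Observation(system_time, attrs["time"], attrs["value"],
                       network, platform, sensor, phenomenon,
                       latitude, longitude)


# Hypothetical lookup table from source-specific phenomenon names to a
# shared label. The labels are placeholders, not verified SWEET IRIs.
SWEET_LABELS = {
    "air_temp": "Temperature",
    "rain_trace": "Rain",
}


def annotate(obs: Observation) -> Optional[str]:
    """Return the shared label for the phenomenon, or None if unmapped."""
    return SWEET_LABELS.get(obs.phenomenon)
```

Calling `parse_sample` with the sample above, the system time "2015-05-28T16:32" and the coordinates "-43.3167" and "147.0075" yields the tuple shown on the slide, and `annotate` then resolves `air_temp` through the lookup table.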
Use case: Sensor Cloud data integration (3/3)
SSN mapping
SSNConverter
Bolt
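As a rough idea of what an SSN conversion step might emit, the sketch below serialises one observation as N-Triples. It is not the mapping shown on the slide: the function `to_ntriples` and the `http://example.org/sensorcloud/` base IRI are hypothetical, and only three terms from the SSN ontology namespace (`Observation`, `observedBy`, `observedProperty`) are used. Result values and timestamps are left out to keep the example short.

```python
from typing import List

SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base IRI, chosen only for this illustration.
BASE = "http://example.org/sensorcloud/"


def to_ntriples(network: str, platform: str, sensor: str,
                phenomenon: str, sampling_time: str) -> List[str]:
    """Describe one observation with a few SSN terms, one triple per line."""
    sensor_id = f"{network}.{platform}.{sensor}.{phenomenon}"
    sensor_iri = f"{BASE}sensor/{sensor_id}"
    obs_iri = f"{BASE}observation/{sensor_id}/{sampling_time}"
    prop_iri = f"{BASE}property/{phenomenon}"
    triples = [
        (obs_iri, RDF_TYPE, SSN + "Observation"),
        (obs_iri, SSN + "observedBy", sensor_iri),
        (obs_iri, SSN + "observedProperty", prop_iri),
    ]
    # N-Triples syntax: each IRI in angle brackets, statement ends with " ."
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]
```

For the air temperature sample used earlier, `to_ntriples("bom_gov_au", "94961", "air", "air_temp", "2015-05-28T16:30")` returns three lines that type the observation and link it to its sensor and observed property.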
Topologies everywhere

A Storm topology “is a graph of stream transformations
where each node is a spout or bolt”.
https://storm.apache.org/documentation/Tutorial.html

Example of simple topology
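The idea of a spout feeding a chain of bolts can be mimicked with plain Python generators. This toy sketch does not use the Storm API and runs in a single process; the names `run_topology`, `number_spout`, `double_bolt` and `to_text_bolt` are hypothetical, and a linear chain is only the simplest case of the graph that a real topology can form.

```python
from typing import Callable, Iterable, Iterator, List


def number_spout() -> Iterator[int]:
    """Source node: emits a small, finite stream of tuples."""
    yield from [1, 2, 3]


def double_bolt(stream: Iterable[int]) -> Iterator[int]:
    """Transformation node: doubles every incoming value."""
    for value in stream:
        yield value * 2


def to_text_bolt(stream: Iterable[int]) -> Iterator[str]:
    """Transformation node: formats every incoming value as text."""
    for value in stream:
        yield f"value={value}"


def run_topology(spout: Callable[[], Iterable],
                 bolts: List[Callable[[Iterable], Iterable]]) -> list:
    """Wire the spout to the bolts in order and drain the stream."""
    stream = spout()
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)
```

Running `run_topology(number_spout, [double_bolt, to_text_bolt])` pushes each value through both bolts in order. In Storm the same wiring is declared once and then executed continuously across the cluster on an unbounded stream.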
Setting up a virtual cluster (1/2)
Wirbelsturm - https://github.com/miguno/wirbelsturm/

Allows deploying (local or remote) virtual clusters.

Focus on Big Data technologies: Storm, Kafka,
Zookeeper...

Uses Vagrant for “easy to configure, reproducible, and
portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html

Uses Puppet for provisioning: installation and
configuration of SW packages in the cluster nodes.
Setting up a virtual cluster (2/2)

$ ./deploy

Show wirbelsturm.yaml

Check Storm GUI -
http://localhost:28080/index.html
Deploying Storm topologies

$ ./deploy

Show wirbelsturm.yaml

Check Storm GUI -
http://localhost:28080/index.html

Describe simple topology

Compile & deploy

Describe a topology set

Configure Kafka

Compile & deploy
Conclusion

Wirbelsturm allows easy configuration & deployment of virtual clusters,
with focus on Big Data technologies.

SSN and SWEET ontologies to model and integrate environmental
sensor observations.

Parallelization of bottleneck tasks reduces the average message
processing latency (up to a point). More about Storm
parallelization: http://bit.ly/1NVyjU2

Delaying RDF conversion does not speed up the processing of Sensor
Cloud messages in the tested environment.

Submitted paper to IJSWIS, special issue on Velocity and Variety
Dimensions of Big Data – Llaves, Corcho et al.
What's coming next

Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
The presented research has been funded by Ministerio de
Economía y Competitividad (Spain) under the project “4V:
Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora
de Datos” (TIN2013-46238-C4-2-R), by the EU Marie Curie
IRSES project SemData (612551), and supported by an AWS in
Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!
