First Joint International Workshop on
Semantic Sensor Networks and Terra Cognita
October 11, 2015, Bethlehem, PA, USA
Emrooz: A Scalable Database for
SSN Observations
Markus Stocker, Narasinha Shurpali, Kerry Taylor, George
Burba, Mauno Rönkkö, Mikko Kolehmainen
markus.stocker@uef.fi
@markusstocker and @envinf
2
Introduction
Expressive ontologies for sensor (meta-) data (SSN)
Flexible graph data model (RDF)
Triple stores are the obvious choice
Unfortunately, they are hardly viable at scale
Triple store indexes are built for graph pattern queries
They are not designed for time series interval queries
3
Aim
Build a database that ...
Consumes SSN observations in RDF
Evaluates SSN observation SPARQL queries
Scales to billions of observations
Has better query performance than triple stores
4
Architecture
5
Cassandra data model
Schema consisting of
Partition key (row key) of type ascii
Clustering key (column name) of type timeuuid
Column value of type blob
The partition key consists of two (dash-concatenated) parts
SHA-256 hex string digest of sensor-property-feature URIs
Date time string of pattern yyyyMMddHHmm
Computed from observation result time
Floor-rounded to year, month, day, hour, or minute
Rounding depends on sensor sampling frequency
Goal is to limit the number of columns per row
Clustering key determined by observation result time
Column value is set of triples for observation (binary)
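The partition key scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how the three URIs are combined before hashing (plain concatenation is assumed here), and the function name, the `granularity` argument, and the example URIs are hypothetical.

```python
import hashlib
from datetime import datetime

# Fields reset to their minimum when flooring to each granularity.
FLOOR_RESETS = {
    "year": {"month": 1, "day": 1, "hour": 0, "minute": 0},
    "month": {"day": 1, "hour": 0, "minute": 0},
    "day": {"hour": 0, "minute": 0},
    "hour": {"minute": 0},
    "minute": {},
}


def partition_key(sensor, prop, feature, result_time, granularity):
    """Return '<sha256 hex digest>-<yyyyMMddHHmm>' for one observation."""
    # Assumption: the URIs are simply concatenated before hashing.
    digest = hashlib.sha256((sensor + prop + feature).encode("utf-8")).hexdigest()
    # Floor the result time; seconds and microseconds are always dropped.
    floored = result_time.replace(second=0, microsecond=0, **FLOOR_RESETS[granularity])
    return f"{digest}-{floored:%Y%m%d%H%M}"


key = partition_key(
    "http://example.org#thermometer",
    "http://example.org#temperature",
    "http://example.org#air",
    datetime(2015, 4, 21, 1, 23, 45),
    "hour",
)
print(key[-12:])  # 201504210100
```

A coarser granularity puts more observations into each row, so a high-frequency sensor would presumably use a finer one to keep the number of columns per row bounded.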
6
Experiments
LI-7500A Open Path CO2/H2O Gas Analyzer
LI-7700 Open Path CH4 Analyzer
Property of mole fraction
Three features for the monitored gases
SSN-TC workshop talk at ISWC 2015 on Emrooz
8
Experiments
Data from January 7 to May 26, 2015: 6045 GHG archive files
Estimated number of sensor observations is 326 430 000
Estimated number of triples is 4.9 billion (15 triples / observation)
Load and query performance on 10 subsets
SPARQL query with 10 min interval
Compared to Stardog and Blazegraph
Test performance with varying time interval
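The estimates on this slide can be reproduced with simple arithmetic. The sketch below assumes that each archive file holds 54 000 observations, which is the size of the 30 m subset reported in the results; the slides do not state the per-file count explicitly.

```python
ARCHIVE_FILES = 6045            # from the slide
TRIPLES_PER_OBSERVATION = 15    # from the slide
OBSERVATIONS_PER_FILE = 54_000  # assumption: one file matches the 30 m subset

observations = ARCHIVE_FILES * OBSERVATIONS_PER_FILE
triples = observations * TRIPLES_PER_OBSERVATION

print(f"{observations:,} observations, {triples / 1e9:.1f} billion triples")
# 326,430,000 observations, 4.9 billion triples
```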
9
The query
select ?time ?value
where {
  [
    ssn:observedBy licor:LERS-75H-2035 ;
    ssn:observedProperty sweet-propFraction:MoleFraction ;
    ssn:featureOfInterest sweet-matrCompound:CO2 ;
    ssn:observationResultTime [ time:inXSDDateTime ?time ] ;
    ssn:observationResult [
      ssn:hasValue [ dul:hasRegionDataValue ?value ]
    ]
  ]
  filter (?time >= "2015-04-15T00:00:00.000+06:00"^^xsd:dateTime
       && ?time < "2015-04-15T00:10:00.000+06:00"^^xsd:dateTime)
}
order by asc(?time)
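The 10-minute filter interval in this query lines up with the date-time part of the partition key described on the Cassandra data model slide. As a rough illustration, and not something the slides spell out, the following Python sketch lists the date-time buckets that overlap a half-open interval, assuming minute-level flooring. The function name `minute_buckets` is hypothetical.

```python
from datetime import datetime, timedelta


def minute_buckets(start, end):
    """Yield yyyyMMddHHmm strings for each minute bucket overlapping [start, end)."""
    current = start.replace(second=0, microsecond=0)
    while current < end:
        yield current.strftime("%Y%m%d%H%M")
        current += timedelta(minutes=1)


buckets = list(minute_buckets(datetime(2015, 4, 15, 0, 0),
                              datetime(2015, 4, 15, 0, 10)))
print(len(buckets), buckets[0], buckets[-1])
# 10 201504150000 201504150009
```

Under this assumption, the number of rows to read depends on the length of the queried interval rather than on the total amount of stored data.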
10
Results: Some figures
Subset  Observations        Triples  Distinct triples
30 m          54 000        810 000           648 007
1 h          108 000      1 620 000         1 296 007
3 h          324 000      4 860 000         3 888 007
6 h          647 997      9 719 955         7 775 971
12 h       1 295 997     19 439 955        15 551 971
1 d        2 591 994     38 879 910        31 103 935
7 d       18 140 271    272 104 065       217 683 259
1 M       72 526 464  1 087 896 960       870 317 575
3 M      194 188 107  2 912 821 605                 *
J-M      328 715 445  4 930 731 675                 *
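The figures in this table are internally consistent, which a short Python check can confirm: every triple count is exactly 15 times the observation count. In the rows that report a distinct count, that count equals 12 times the observations plus 7. The slides do not explain this relationship, so it is noted here only as an arithmetic observation.

```python
# (observations, triples, distinct) per subset; None where the slide shows "*".
ROWS = {
    "30 m": (54_000, 810_000, 648_007),
    "1 h": (108_000, 1_620_000, 1_296_007),
    "3 h": (324_000, 4_860_000, 3_888_007),
    "6 h": (647_997, 9_719_955, 7_775_971),
    "12 h": (1_295_997, 19_439_955, 15_551_971),
    "1 d": (2_591_994, 38_879_910, 31_103_935),
    "7 d": (18_140_271, 272_104_065, 217_683_259),
    "1 M": (72_526_464, 1_087_896_960, 870_317_575),
    "3 M": (194_188_107, 2_912_821_605, None),
    "J-M": (328_715_445, 4_930_731_675, None),
}


def inconsistent_rows(rows):
    """Return the subsets whose counts break either relationship."""
    bad = []
    for subset, (obs, triples, distinct) in rows.items():
        if triples != 15 * obs:
            bad.append(subset)
        elif distinct is not None and distinct != 12 * obs + 7:
            bad.append(subset)
    return bad


print(inconsistent_rows(ROWS))  # []
```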
11
Results: Load performance
[Figure: load time in seconds (log scale, 10 to 1 000 000) per subset, 30 m to J-M, for Emrooz, Blazegraph, and Stardog]
12
Results: Query performance
[Figure: query time in seconds (log scale, 10 to 1000) per subset, 30 m to J-M, for Emrooz, Blazegraph, and Stardog]
13
Results: Query size performance
[Figure: Emrooz query time in seconds (linear scale, 0 to 10) by query time interval, 1 s to 60 m]
14
REST
curl http://localhost:8080/sensors/list
curl http://localhost:8080/properties/list
curl http://localhost:8080/features/list
curl -H "Accept: application/json" \
  http://localhost:8080/sensors/list
curl -H "Accept: text/csv" -G \
  --data-urlencode "sensor=http://example.org#thermometer" \
  --data-urlencode "property=http://example.org#temperature" \
  --data-urlencode "feature=http://example.org#air" \
  --data-urlencode "from=2015-04-21T01:00:00.000+03:00" \
  --data-urlencode "to=2015-04-21T02:00:00.000+03:00" \
  http://localhost:8080/observations/sensor/list
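For readers who prefer a script over curl, the last request can be built with Python's standard library. This is a sketch that only constructs the request and does not send it. The endpoint path, parameter names, and example URIs mirror the curl call above, while the helper name `observations_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

HOST = "http://localhost:8080"  # same local endpoint as in the curl examples


def observations_request(sensor, prop, feature, start, end):
    """Build (but do not send) a GET request for observations as CSV."""
    query = urlencode({
        "sensor": sensor,
        "property": prop,
        "feature": feature,
        "from": start,
        "to": end,
    })
    return Request(HOST + "/observations/sensor/list?" + query,
                   headers={"Accept": "text/csv"})


req = observations_request(
    "http://example.org#thermometer",
    "http://example.org#temperature",
    "http://example.org#air",
    "2015-04-21T01:00:00.000+03:00",
    "2015-04-21T02:00:00.000+03:00",
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. `urlencode` percent-encodes the `#` and `+` characters, which is what `--data-urlencode` does in the curl version.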
15
R
library(RCurl)  # provides getURL()
host <- "http://localhost:8080"
df.sensors <- read.csv(text=getURL(paste0(host, "/sensors/list")),
                       header=FALSE, col.names=c("sensor"))
df.sensors
#                          sensor
# 1 http://licor.com#LERS-75H-CH4
# 2 http://licor.com#LERS-75H-CO2
16
R
library(RCurl)    # provides getURL() and curlEscape()
library(ggplot2)  # provides ggplot() and geom_line()
host <- "http://localhost:8080"
sensor <- "http://licor.com#LERS-75H-CO2"
property <- "http://sweet.jpl.nasa.gov/2.3/propMass.owl#Density"
feature <- "http://sweet.jpl.nasa.gov/2.3/matrCompound.owl#CarbonDioxide"
from <- "2015-01-07T00:00:00.000+06:00"
to <- "2015-01-07T00:01:00.000+06:00"
# Escape each parameter so that "#", ":" and "+" survive in the query string
url <- paste0(host, "/observations/sensor/list?",
              "sensor=", curlEscape(sensor),
              "&property=", curlEscape(property),
              "&feature=", curlEscape(feature),
              "&from=", curlEscape(from),
              "&to=", curlEscape(to))
df.observations <- read.csv(text=getURL(url,
    httpheader=c(Accept="text/csv")), header=TRUE, sep=",")
# The "+" must end the line, otherwise R treats the plot call as complete
ggplot(data=df.observations, aes(time, value)) +
  geom_line() + xlab("Time") + ylab("CO2 [mmol m-3]")
17
R
[Figure: line plot of CO2 (mmol m−3, about 18.00 to 18.15) against time over one minute]
18
Related and future work
Other authors have pointed out the problem
“Semantification of measurement data not promising”
RDF databases on NoSQL systems (e.g. Cumulus RDF)
Support for QB observations (done)
REST API (preliminary)
Integration with R/Matlab (preliminary)
Performance comparison with other systems
19
Conclusion
SSN and RDF nice for sensor (meta-) data
Triple stores inadequate for observation data
Alternative approaches required
What are the advantages and disadvantages?
Reasoning on all data by some sensor?
Query for observation values exceeding threshold?
