BigData
Semantic Approach to
Big Data and Event Processing
Mastering the Velocity Dimension of Big Data
Emanuele Della Valle
DEIB - Politecnico di Milano
@manudellavalle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org

Agenda
•  It's a streaming world
•  Mastering the velocity dimension with information flow processing
•  A modeling framework for DSMS and CEP
   –  Functional model
   –  Processing model
   –  Deployment model
   –  Interaction model
   –  Data model
   –  Time model
   –  Rule model
   –  Language model
@manudellavalle - http://emanueledellavalle.org - 7/10/2015

It's a streaming world …
[…]
•  Financial markets
•  Sensor networks
•  Social networks
•  Generate data streams!

… looking for reactive answers
[…]
•  Based on the last seconds of transactions, what shall I buy/sell now?
•  Shall I keep drilling based on the last sensor observations?
•  Which are the top hashtags in the last few minutes?
•  Require continuous processing and reactive answers

Other domains
•  Intrusion detection
•  Fraud detection
•  Emergency response services
•  Transportation and logistics
•  Supply chain optimization
•  System monitoring
•  Click inspection
•  ...

Mastering the Velocity dimension with Information Flow Processing (IFP) solutions

Paradigm Shifts Enabled 4/4
Leverage data as it is captured
[source: Marc Andrews, 2014]

IFP - Gartner
The Gartner hype cycle

IFP - Forrester
Forrester’s top 15 emerging tech to watch: Now to 2018

Is there a market beyond hype?
•  Complex Event Processing market worth $3,322M by 2018 (2014 report by MarketsandMarkets)
•  Major players include:
   –  Microsoft
   –  IBM
   –  Oracle
   –  SAP
   –  Tibco
   –  ....

DSMS/CEP State of the Art
[source: The Forrester Wave: Big Data Streaming Analytics Platforms, Q3 2014]

DSMS/CEP State of the Art
•  Informatica: Vibe data stream
   –  https://www.informatica.com/products/data-integration/real-time-integration/vibe-data-stream.html
•  SAP: Event Stream Processor
   –  http://www.sap.com/pc/tech/database/software/sybase-complex-event-processing/index.html
•  Software AG: Intelligent Business Operations
   –  http://www.softwareag.com/corporate/products/apama_webmethods/
•  SQL Stream: blaze
   –  http://www.sqlstream.com/blaze/
•  Tibco Complex Event Processing
   –  http://www.tibco.com/products/event-processing/complex-event-processing/
•  Vitria Operational Intelligence
   –  http://www.vitria.com/products/operational-intelligence

DSMS/CEP State of the Art
•  Gianpaolo Cugola, Alessandro Margara: Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44(3): 15 (2012)
•  Content
   –  Type of models compared
      •  Functional and processing
      •  Deployment and interactions
      •  Data, Time, and Rule
      •  Language
   –  # of systems surveyed:
      •  Academic: 24
      •  Industrial: 9
      •  Total: 33
   –  To learn more:
      •  http://home.dei.polimi.it/margara/papers/survey.pdf

Information Flow Processing
•  The IFP engine processes incoming flows of information according to a set of processing rules
   –  Processing is “on line”
•  Sources produce the incoming information flows, sinks consume the results of processing, rule managers add or remove rules
•  Information flows are composed of information items
   –  Items part of the same flow are neither necessarily ordered nor of the same kind
•  Processing involves filtering, combining, and aggregating flows, item by item as they enter the engine
[Figure: sources send information flows to the IFP engine, which applies the rules deployed by rule managers and sends information flows to sinks]
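The roles named on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not the architecture of any surveyed system: the class `IFPEngine`, its method names, and the dictionary representation of items are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]             # an information item (hypothetical representation)
Rule = Callable[[Item], List[Item]]  # a processing rule: one item in, zero or more items out


class IFPEngine:
    """Toy engine: applies every deployed rule to each item as it arrives."""

    def __init__(self) -> None:
        self.rules: Dict[str, Rule] = {}
        self.sinks: List[Callable[[Item], None]] = []

    # Rule managers add or remove rules while the engine runs.
    def add_rule(self, name: str, rule: Rule) -> None:
        self.rules[name] = rule

    def remove_rule(self, name: str) -> None:
        self.rules.pop(name, None)

    # Sinks consume the results of processing.
    def subscribe(self, sink: Callable[[Item], None]) -> None:
        self.sinks.append(sink)

    # Sources push items; processing happens on line, item by item.
    def push(self, item: Item) -> None:
        for rule in list(self.rules.values()):
            for out in rule(item):
                for sink in self.sinks:
                    sink(out)


def hot_filter(item: Item) -> List[Item]:
    """Filtering rule: turn temperature readings above 50 into alerts."""
    if item.get("type") == "temp" and item["value"] > 50:
        return [{"type": "alert", "area": item["area"]}]
    return []


engine = IFPEngine()
received: List[Item] = []
engine.subscribe(received.append)
engine.add_rule("hot", hot_filter)
engine.push({"type": "temp", "area": "A", "value": 70})
engine.push({"type": "temp", "area": "B", "value": 20})
engine.remove_rule("hot")            # from now on no rule is deployed
engine.push({"type": "temp", "area": "C", "value": 90})
```

Only the first reading reaches the sink: the second is filtered out, and the third arrives after the rule manager has removed the rule.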

IFP: A bit of history of two approaches
[Figure: Traditional DBMS, Active DBMS, DSMS; Event-based Systems, CEP]

From Passive to Active DBMSs
•  Standard DBMSs
   –  Purely passive: Human-active database-passive (HADP)
   –  Execution happens only when asked by clients (through queries)
•  Active DBMSs
   –  The reactive behavior moves (in part) from the application to the DB layer…
   –  …which executes Event Condition Action (ECA) rules

Active DBMSs
•  As a DBMS extension
   –  Rules may only refer to the internal state of the DB
•  Closed DB applications
   –  Rules may support the semantics of the application, but external sources of events are not allowed
   –  But events may come from external sources …
•  Open DB applications
   –  Events may come from external sources

Data Stream Management Systems (DSMS)
•  Data streams are (unbounded) sequences of time-varying data elements
•  Represent:
   –  an (almost) “continuous” flow of information
   –  with the recent information being more relevant as it describes the current state of a dynamic system
[Figure: data elements along a time axis]

Data Stream Management Systems (DSMS)
•  The nature of streams requires a paradigmatic change*
   –  from persistent data
      •  one-time semantics
   –  to transient data
      •  continuous semantics
*  This paradigmatic change first arose in the DB community in the late '90s

Continuous Semantics
•  Continuous queries registered over streams that are observed through windows
[Figure: a dynamic system produces input streams; a registered continuous query observes them through a window and emits streams of answers]
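A continuous query over a time-based window can be sketched as follows. This is a minimal illustration under stated assumptions: the class `SlidingCount`, the attribute `a`, and the choice to emit an answer at every arrival are ours, and the sketch does not reproduce the exact semantics of any DSMS.

```python
from collections import deque


class SlidingCount:
    """Continuous query: count the items with a > 0 seen in the last `range_s` seconds."""

    def __init__(self, range_s):
        self.range_s = range_s
        self.window = deque()  # timestamps of qualifying items, oldest first

    def on_item(self, ts, a):
        """Register a new item and return the current answer."""
        if a > 0:
            self.window.append(ts)
        # Evict items that fell out of the window (ts - range_s, ts].
        while self.window and self.window[0] <= ts - self.range_s:
            self.window.popleft()
        return len(self.window)


query = SlidingCount(60)
stream = [(0, 1), (10, -1), (30, 5), (61, 2), (95, 3)]  # (timestamp, a)
answers = [query.on_item(ts, a) for ts, a in stream]
```

The query is registered once and keeps producing answers as the stream advances, which is the shift from one-time to continuous semantics described above.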

Event-based systems
•  Components collaborate by exchanging information about occurring events. In particular:
   –  Components publish notifications about the events they observe, or
   –  they subscribe to the events they are interested in being notified about
•  Communication is:
   –  Purely message based
   –  Asynchronous
   –  Multicast
   –  Implicit
   –  Anonymous
[Figure: the subscriptions “topic=fire* & place=*”, “topic=* & place=1st floor”, and “topic=fire alarm & place=*” matched against the notifications “fire alarm at 1st floor” and “fire training at 1st floor”]
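The matching shown in the figure can be sketched with shell-style wildcards from Python's standard `fnmatch` module. The subscriber names `s1` to `s3` and the dictionary encoding of notifications are assumptions made for the example; real publish/subscribe middleware uses its own subscription languages.

```python
from fnmatch import fnmatchcase

# The three subscriptions from the slide, keyed by hypothetical subscriber names.
subscriptions = {
    "s1": {"topic": "fire*", "place": "*"},
    "s2": {"topic": "*", "place": "1st floor"},
    "s3": {"topic": "fire alarm", "place": "*"},
}


def matches(subscription, notification):
    """A notification matches when every attribute pattern matches its value."""
    return all(
        fnmatchcase(str(notification.get(attr, "")), pattern)
        for attr, pattern in subscription.items()
    )


def dispatch(notification):
    """Return the subscribers that should receive the notification."""
    return sorted(
        name for name, sub in subscriptions.items() if matches(sub, notification)
    )


alarm = {"topic": "fire alarm", "place": "1st floor"}
training = {"topic": "fire training", "place": "1st floor"}
alarm_receivers = dispatch(alarm)
training_receivers = dispatch(training)
```

The publisher never names its receivers: the fire alarm reaches all three subscribers, while the fire training reaches only the first two.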

Complex Event Processing (CEP)
•  CEP systems add the ability to deploy rules that describe how composite events can be generated from primitive (or composite) ones
•  Typical CEP rules search for sequences of events
   –  Raise C if A→B
•  Time is a key aspect in CEP
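The rule “Raise C if A→B” can be sketched as a small sequence detector. The function `detect_sequence`, the `within` bound, and the choice to pair each B with the most recent A are assumptions of this example, not part of the slide; actual CEP engines expose such choices through their rule languages.

```python
def detect_sequence(events, within):
    """Raise a composite C for each B that follows an A by at most `within` time units.

    `events` is a list of (timestamp, kind) pairs sorted by timestamp.
    """
    pending_a = []   # timestamps of A events that have not expired yet
    composites = []
    for ts, kind in events:
        # Time is a key aspect: forget the A events that are too old.
        pending_a = [t for t in pending_a if ts - t <= within]
        if kind == "A":
            pending_a.append(ts)
        elif kind == "B" and pending_a:
            # Pair B with the most recent A (an arbitrary choice for this sketch).
            composites.append(("C", pending_a[-1], ts))
    return composites


events = [(1, "A"), (2, "X"), (4, "B"), (20, "B"), (21, "A"), (22, "B")]
composites = detect_sequence(events, within=5)
```

The B at time 20 raises nothing because the only earlier A has expired by then.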

The current situation
•  Back in 2007 CEP was already a hot topic…
•  … but having a good grasp of the area was rather hard
•  As observed by Opher Etzion, the area looked like the “Tower of Babel”
   –  Event Processing and the Babylon Tower – Event Process Thinking blog – Sept. 8, 2007
[Figure: word cloud of the vocabulary used in the area: event, data, stream, message, flow, publish, subscribe, notify, advertise, pattern, sequence, primitive, complex, composite, join, send, receive, middleware, system, application, protocol, routing, network, query, rule, condition, action]
G. Cugola and A. Margara - http://www.streamreasoning.org/courses/scep2015 - 7/10/2015

The current situation
•  Several communities were contributing to the area…
•  … each bringing its own expertise and vocabulary…
•  …but often working in isolation
[Figure: the same word cloud, surrounded by the contributing communities. Researchers: ad-hoc tools (intrusion det., …), data management & databases, DEBS, process modeling & automation. Tool vendors: DBMSs, application servers, middleware systems]

The current situation
•  That was 2007. What about today?
•  Things did not change much
   –  From the “Event Process Thinking” blog
      [Which is] the relation between event processing and data stream management?
      1.  They are aliases -- stream is just a collection of events, likewise, an event is just a member in a stream, and the functionality is the same
      2.  Stream management is a subset of event processing -- there are different ways to do event processing, streams is one of them
      3.  Event processing is a subset of stream management -- event streams is just one type of stream, but there are voice stream, video stream, …
      4.  Event processing and stream management are distinct and there is no overlapping between them
•  At the same time tool vendors are building tools that try to combine (juxtapose?) different approaches

The goal of the survey
•  Define a modeling framework to
   –  compare different systems in a precise way
   –  compare different approaches in a precise way
   –  help people coming from different areas communicate and compare their work with others
   –  isolate the open issues from those already solved
   –  precisely identify the challenges
   –  isolate the best part of the various approaches
   –  … finding a way to combine them
G. Cugola, A. Margara: “Processing Flows of Information: From Data Stream to Complex Event Processing”. ACM Computing Surveys, 44(3), ACM Press, June 2012

The Information Flow Processing domain
•  The IFP engine processes incoming flows of information according to a set of processing rules
•  Sources produce the incoming information flows, sinks consume the results of processing, rule managers add or remove rules
•  Information flows are composed of information items
   –  Items part of the same flow are neither necessarily ordered nor of the same kind
   –  Processing involves filtering, combining, and aggregating flows, item by item as they enter the engine
[Figure: sources send information flows to the IFP engine, which applies the rules deployed by rule managers and sends information flows to sinks]

One framework, several models
•  Different models to capture different viewpoints
   –  Functional model
   –  Processing model
   –  Deployment model
   –  Interaction model
   –  Time model
   –  Data model
   –  Rule model
   –  Language model

Functional model
[Figure: functional model of the IFP engine, with Receiver, Clock, Decider, History, Producer, Forwarder, Rules, and Knowledge base; the action A and the sequence Seq flow from the Decider to the Producer]
Receiver and Forwarder (one callout each):
•  Implements the transport protocol to move information items along the net; acts as a demultiplexer
•  Implements the transport protocol to move information items along the net; acts as a multiplexer

A short digression
•  We assume rules can be (logically) decomposed in two parts: C → A
   –  C is the condition
   –  A is the action
•  Example (in CQL):
   Select IStream(Count(*))
   From F1 [Range 1 Minute]
   Where F1.A > 0
•  This way we can split processing in two phases:
   –  The detection phase determines the items that trigger the rule
   –  The production phase uses those items to produce the output of the rule
[Figure: the functional model of the IFP engine, annotated with “condition” and “action”]

Functional model
[Figure: the functional model of the IFP engine, with a callout for each component]
•  Decider
   –  Implements the detection phase
   –  Accumulates partial results into the History
   –  When a rule fires, passes its action part and the triggering items to the Producer
•  Producer
   –  Implements the production phase
   –  Uses the items in Seq as stated in action A
•  Rules
   –  Some systems allow rules to be added or removed at processing time
•  Knowledge base
   –  Some systems allow rules to combine flowing items with items previously stored in a (read-only) storage
•  Looping flow exiting the Producer
   –  If present, models the ability to perform recursive processing, building hierarchies of items
•  Clock
   –  Optional component
   –  Periodically creates special information items holding the current time
   –  Its presence models the ability to perform periodic processing of inputs

Functional model: Considerations
•  The detection-production cycle
   –  Fired by a new item I entering the engine through the Receiver
      •  Including those periodically produced by the Clock, if present
   –  Detection phase: Evaluates all the rules to find those enabled
      •  Using item I plus the data in the Knowledge base, if present
      •  The item I can be accumulated into the History for partially enabled rules
      •  The action part of the enabled rules together with the triggering items (A+Seq) is passed to the Producer
   –  Production phase: Produces the output items
      •  Combining the items that triggered the rule with data present in the Knowledge base, if present
      •  New items are sent to subscribed sinks (through the Forwarder)…
      •  …but they could also be sent internally to be processed again (recursive processing)
      •  In some systems the action part of fired rules may also change the set of deployed rules
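The detection-production cycle can be sketched as below. This is a simplified illustration, not the survey's formal model: the class `Rule`, the function `cycle`, and the example rule are hypothetical names, the Clock, Knowledge base, and recursive loop are left out, and the example condition ignores areas and time.

```python
from typing import Callable, Dict, List, Optional

Item = Dict[str, object]


class Rule:
    """A rule split as C -> A (names are illustrative, not taken from the survey)."""

    def __init__(self,
                 condition: Callable[[List[Item]], Optional[List[Item]]],
                 action: Callable[[List[Item]], List[Item]]) -> None:
        self.condition = condition  # C: inspects the History, returns Seq or None
        self.action = action        # A: turns Seq into output items


def cycle(item: Item, rules: List[Rule], history: List[Item],
          forward: Callable[[Item], None]) -> None:
    """One detection-production cycle, fired by a new item."""
    history.append(item)                  # accumulate the item into the History
    for rule in rules:
        seq = rule.condition(history)     # detection phase (Decider)
        if seq is not None:
            for out in rule.action(seq):  # production phase (Producer)
                forward(out)              # hand the new item to the Forwarder


def smoke_and_heat(history: List[Item]) -> Optional[List[Item]]:
    """Condition: some smoke item and some temperature above 50 are in the History."""
    smoke = [i for i in history if i["type"] == "smoke"]
    hot = [i for i in history if i["type"] == "temp" and i["value"] > 50]
    return [smoke[-1], hot[-1]] if smoke and hot else None


def raise_fire(seq: List[Item]) -> List[Item]:
    """Action: build a fire notification from the triggering items."""
    return [{"type": "fire", "area": seq[0]["area"]}]


outputs: List[Item] = []
history: List[Item] = []
rules = [Rule(smoke_and_heat, raise_fire)]
cycle({"type": "temp", "area": "kitchen", "value": 60}, rules, history, outputs.append)
cycle({"type": "smoke", "area": "kitchen"}, rules, history, outputs.append)
```

The first item only enables the rule partially and is kept in the History; the second completes the condition, so the Producer emits a single fire item.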

Functional model: Considerations
•  The maximum length of Seq is a key aspect
   –  1 ≈ PubSub
   –  Bounded ⇒
      •  CQL-like languages without time-based windows
      •  Pattern-based languages without a Kleene+ operator
•  Other key aspects that impact expressiveness
   –  Presence of the Clock
      •  Models the ability to process rules periodically
      •  Available in almost half of the systems reviewed
      •  Most Active DBMSs and DSMSs but few CEP systems
(see next slide)
[Figure: the functional model of the IFP engine]

Functional model: Considerations
(see previous slide)
   –  Presence of the Knowledge base
      •  Only available in systems coming from the database community
   –  Presence of the looping flow exiting the Producer
      •  Models the ability to perform recursive processing
      •  Half of the CEP systems have it
      •  All Active DBMSs but very few DSMSs have it
         –  They have nested rules
   –  Support for dynamic rule change
      •  Few systems support it
      •  Can be implemented externally…
         –  Through sinks acting also as rule managers
      •  …but we think it is nice to have it internally
[Figure: the functional model of the IFP engine]

The semantics of processing
•  What determines the output of each detection-production cycle?
   –  The new item entering the engine
   –  The set of deployed rules
   –  The items stored in the History
   –  The content of the Knowledge Base
•  Is this enough?
•  Example (in Padres and CQL):
   –  Smoke && Temp>50
   –  Select IStream(Smoke.area)
      From Smoke[Rows 30 Slide 10], Temp[Rows 50 Slide 5]
      Where Smoke.area = Temp.area AND Temp.value > 50
[Figure: the functional model of the IFP engine (Rules, Decider, History, Producer, Knowledge base)]

Processing model
•  Three policies affect the behavior of the system
   –  The selection policy
   –  The consumption policy
   –  The load shedding policy

Selection policy
•  Determines if a rule fires once or multiple times and the items actually selected from the History
•  Example:
[Figure: rule A ∧ B in the Decider; A0 and A1 are in the History when B arrives. Single selection: the rule fires once, on A0 B or on A1 B. Multiple selection: the rule fires on both A0 B and A1 B]

Selection policy: Considerations
•  Most systems adopt a multiple selection policy
   –  It is simpler to implement
   –  Is it adequate?
      •  Example: Alert fire when smoke and high temperature in a short time frame
         –  If 10 sensors read high temperature and immediately afterward one detects smoke, I would like to receive a single alert, not 10
•  A few systems allow this policy to be programmed…
•  …some of them on a per-rule basis
   –  E.g., Amit, T-Rex

Selection policy: The TESLA case
•  TESLA (Trio-based Event Specification Language): the T-Rex language
   –  A rule language for CEP. Tries to combine expressiveness and efficiency
   –  Has a formally defined semantics
      •  Expressed in Trio, a Metric Temporal Logic (see DEBS 2010)
•  Allows rule managers to choose their own selection policy on a per-rule basis
   –  Example: Multiple selection
      define Fire(area: string, measuredTemp: double)
      from Smoke(area=$a) and
      each Temp(area=$a and value>50) within 1min. from Smoke
      where area=Smoke.area and measuredTemp=Temp.value
   –  Example: Single selection
      define Fire(area: string, measuredTemp: double)
      from Smoke(area=$a) and
      last Temp(area=$a and value>50) within 1min. from Smoke
      where area=Smoke.area and measuredTemp=Temp.value
•  Alternatively you may use:
   –  first…within
   –  n-first…within, n-last…within
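The effect of the selection policy on the fire example can be sketched in Python. This is an illustration only, not how T-Rex evaluates TESLA rules: the function `fire_alerts`, the tuple encoding of events, and the single-area simplification are assumptions; the policy names mirror the `each`, `last`, and `first` keywords above.

```python
def fire_alerts(events, policy, within=60):
    """Raise Fire alerts when Smoke follows high temperatures within `within` seconds.

    `events` is a list of (timestamp, kind, value) tuples sorted by timestamp,
    where kind is "temp" or "smoke". All readings are assumed to share one area.
    """
    history = []  # (timestamp, value) of the high-temperature readings seen so far
    alerts = []
    for ts, kind, value in events:
        if kind == "temp" and value > 50:
            history.append((ts, value))
        elif kind == "smoke":
            recent = [(t, v) for t, v in history if ts - t <= within]
            if policy == "each":      # multiple selection: one alert per reading
                selected = recent
            elif policy == "last":    # single selection: most recent reading only
                selected = recent[-1:]
            elif policy == "first":   # single selection: oldest reading only
                selected = recent[:1]
            else:
                raise ValueError(f"unknown policy: {policy}")
            alerts.extend(("Fire", ts, v) for _, v in selected)
    return alerts


# Ten sensors read a high temperature, then one detects smoke.
events = [(t, "temp", 51 + t) for t in range(10)] + [(10, "smoke", None)]
each_alerts = fire_alerts(events, "each")
last_alerts = fire_alerts(events, "last")
first_alerts = fire_alerts(events, "first")
```

With `each` the sink receives ten alerts for one smoke detection; with `last` or `first` it receives one, which is the behavior asked for in the ten-sensor example.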

Consumption policy
•  Determines how the History changes after the firing of a rule ⇒ what happens when new items enter the Decider
•  Example:
[Figure: rule A ∧ B in the Decider, comparing the “selected” and the “zero” consumption policies]

Consumption policy: Considerations
•  Most systems couple a multiple selection policy with a zero consumption policy
   –  This is the common case with DSMSs, which use (sliding) windows to select relevant events
   –  Example (in CQL)
      Select IStream(Smoke.area)
      From Smoke[Range 1 min], Temp[Range 1 min]
      Where Smoke.area = Temp.area AND Temp.val > 50
•  The systems that allow the selection policy to be programmed often allow the consumption policy to be programmed, too
   –  E.g., Amit, T-Rex

Consumption policy: The TESLA case
•  Zero consumption policy
   –  define Fire(area: string, measuredTemp: double)
      from Smoke(area=$a) and
      each Temp(area=$a and value>50)
      within 1min. from Smoke
      where area=Smoke.area and measuredTemp=Temp.value
•  Selected consumption policy
   –  define Fire(area: string, measuredTemp: double)
      from Smoke(area=$a) and
      each Temp(area=$a and value>50)
      within 1min. from Smoke
      where area=Smoke.area and measuredTemp=Temp.value
      consuming Temp
[Figure: timelines of four Temp items (T) followed by Smoke items (S), showing the Fire events raised under each policy]
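The difference between the two policies can be sketched as follows. The function `run` and the two-letter event encoding are assumptions made for the example, and the stream with two Smoke items is our own test case rather than a transcription of the figure.

```python
def run(events, consuming, within=60):
    """Return, for each Smoke item, how many Fire events it raises.

    `events` is a list of (timestamp, kind) pairs sorted by timestamp, with
    kind "T" (a Temp item already known to be above the threshold) or "S" (Smoke).
    """
    history = []  # timestamps of the Temp items still available to the rule
    fired = []
    for ts, kind in events:
        if kind == "T":
            history.append(ts)
        elif kind == "S":
            selected = [t for t in history if ts - t <= within]
            fired.append(len(selected))   # multiple selection: one Fire per Temp
            if consuming:
                # Selected consumption: the used Temp items leave the History.
                history = [t for t in history if t not in selected]
            # Zero consumption: the History is left untouched.
    return fired


stream = [(1, "T"), (2, "T"), (3, "T"), (4, "T"), (5, "S"), (6, "S")]
zero_policy = run(stream, consuming=False)
selected_policy = run(stream, consuming=True)
```

Under zero consumption the second Smoke reuses the same four Temp items and raises four more Fire events; with `consuming Temp` they are gone, so it raises none.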

Load shedding policy
•  Problem: How to manage bursts of input data
•  It may seem a system issue
   –  i.e., an issue to solve in the Receiver
•  But it strongly impacts the results produced
   –  i.e., the “semantics” of the rules
•  Accordingly, some systems allow this issue to be determined on a per-rule basis
   –  e.g., Aurora allows rules to specify the expected QoS and sheds input to stay within limits with the available resources
   –  Conceptually the issue is addressed in the Decider
[Figure: the functional model of the IFP engine]

Deployment model
•  IFP applications may include a large number of sources and sinks
   –  Possibly dispersed over a wide geographical area
•  It becomes important to consider the deployment architecture of the engine
   –  How the components of the functional model can be distributed to achieve scalability

Deployment model
•  Centralized
•  Distributed
   –  Clustered
   –  Networked
[Figure: sources, the IFP engine with its rules and rule managers, and sinks]

Deployment Model
•  Most existing systems adopt a centralized solution
•  When distributed processing is allowed, it is usually based on clustered solutions
•  A few systems have recognized the importance of networked deployment for some applications
   –  E.g. Microsoft StreamInsight (part of SQL Server)
      •  Filtering near sources
      •  Aggregation and correlation in-network
      •  Analytics and historical data in a centralized server/cluster
•  In most cases, deployment/configuration is not automatic

Deployment model
•  Automatic distribution of processing introduces the operator placement problem
•  Given a set of rules (composed of operators) and a set of nodes
   –  How to split the processing load
   –  How to assign operators to available nodes
•  In other words (Event Processing in Action)
   –  Given an event processing network
   –  How to map it onto the physical network of nodes
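A tiny instance of operator placement can be solved by exhaustive search, as sketched below. The cost model (traffic rate times node distance, subject to node capacities) and every name in the example are assumptions chosen for illustration; the search tries every assignment, so it is only usable on toy inputs.

```python
from itertools import product


def best_placement(operators, nodes, load, capacity, traffic, distance):
    """Try every assignment of operators to nodes and keep the cheapest feasible one.

    Cost (an assumption of this sketch) is the sum over communicating operator
    pairs of traffic rate times the distance between their nodes.
    """
    best = None
    for assignment in product(nodes, repeat=len(operators)):
        place = dict(zip(operators, assignment))
        used = {n: 0 for n in nodes}
        for op, node in place.items():
            used[node] += load[op]
        if any(used[n] > capacity[n] for n in nodes):
            continue  # some node is overloaded: not a valid placement
        cost = sum(rate * distance[place[a]][place[b]]
                   for (a, b), rate in traffic.items())
        if best is None or cost < best[0]:
            best = (cost, place)
    return best


operators = ["filter", "join", "agg"]
nodes = ["n1", "n2"]
load = {"filter": 1, "join": 1, "agg": 1}
capacity = {"n1": 2, "n2": 2}
traffic = {("filter", "join"): 10, ("join", "agg"): 1}  # items per second
distance = {"n1": {"n1": 0, "n2": 5}, "n2": {"n1": 5, "n2": 0}}
cost, place = best_placement(operators, nodes, load, capacity, traffic, distance)
```

No node can host all three operators, so the search co-locates the two operators that exchange the most traffic and moves the aggregation elsewhere.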

Operator placement
•  The operator placement problem is still open
   –  Several proposals
      •  Often adopting techniques coming from Operational Research
   –  Difficult to compare solutions and results
•  Even in its simplest form the problem is NP-hard
•  More in the “operator placement problem” lecture of the PhD course on “Stream and Complex Event Processing” offered by Politecnico di Milano in 2015
   http://www.streamreasoning.org/TR/2015/scep/corso_dott_ifp_operatorPlacement_2015.pdf

More on deployment model
•  Operator placement is only part of the problem
•  Other issues
   –  How to build the network of nodes?
   –  How to maintain it?
   –  How to gather the information required to solve the operator placement problem?
   –  How to actually “place” the operators?
   –  How to “replace” them when the situation changes?
      •  New rules added, old rules removed…
      •  …new sources/sinks

Deployment model and dynamics
•  How to cope with mobile nodes?
   –  Mobile sinks and sources…
   –  …but also mobile “processors”
•  The issue is relevant
   –  We live in a mobile world
•  Very few proposals
•  A lot of work in the area of pure publish/subscribe
   –  Several works published in DEBS, not to mention other major conferences/journals
•  May we reuse some of this work?

Interaction Model
•  It is interesting to study the characteristics of the interactions among the main components of an IFP system
   –  Who starts the communication?
[Figure: Sources, IFP Engine, Sinks]

Interaction Model
[Figure: Sources, IFP Engine, Sinks]
•  Observation Model
   –  Push
   –  Pull
•  Forwarding Model
   –  Push
   –  Pull
•  Notification Model
   –  Push
   –  Pull

Time Model
•  Relationship between information items and passing of time
•  Ability of an IFP system to associate some kind of happened-before (ordering) relationship to information items
•  We identified 4 classes:
   1.  Stream-only
   2.  Causal
   3.  Absolute
   4.  Interval

Stream-Only Time Model
•  Used in original DSMSs
•  Timestamps may be present or not
•  When present, they are used only to order items before entering the engine, then they are forgotten
•  They are not exposed to the language
   –  With the exception of windowing constructs
•  Ordering in output streams is conceptually separate from the ordering in input streams
•  Example (CQL/Stream):
   Select DStream(*)
   From F1[Rows 5],
        F2[Range 1 Minute]
   Where F1.A = F2.A
[Figure: CQL operators. S2R maps a stream to relational tables, R2R maps tables to tables, R2S maps tables back to a stream]

Causal Time Model
•  Each item has a label reflecting some kind of causal relationship
•  Partial order
•  E.g. Rapide
   –  An event is causally ordered after all events that led to its occurrence
•  Example (Gigascope):
   Select count(*)
   From A, B
   Where A.a-1 <= B.b and
         A.a+1 > B.b
   A.a, B.b monotonically increase

Absolute Time Model
•  Information items have an associated timestamp
•  Defining a single point in time w.r.t. a (logically) unique clock
   –  Total order
•  Timestamps are fully exposed to the language
•  Information items can be timestamped at source or entering the engine
•  Example (TESLA/T-Rex):
   Define Fire(area: string, measuredTemp: double)
   From Smoke(area=$a) and last Temp(area=$a and value>45)
        within 5 min. from Smoke
   Where area=Smoke.area and measuredTemp=Temp.value

Interval Time Model
•  Used for events to include “duration”
   –  SnoopIB, Cayuga, NextCEP, …
•  At first sight, it is a simple extension of the absolute time model
   –  Timestamps with two values: start time and end time
•  However, it opens many issues
   –  What is the successor of an event?
   –  What is the timestamp associated to a composite event?

Interval Time Model
•  Which is the immediate successor of A?
   –  Choose according to end time only: B
      •  But it started before A!
   –  Exclude B: C, D
      •  Both of them?
      •  Which of them?
   –  No other event strictly between A and its successor: C, D, E
      •  Seems a natural definition
      •  Unfortunately we lose associativity!
         –  X→(Y→Z) ≠ (X→Y)→Z
      •  May impede some rule rewriting for processing optimizations
[Figure: timeline with the intervals of events A, B, C, D, and E]
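The three candidate definitions of “successor” can be compared on concrete intervals. The original figure is not recoverable, so the (start, end) values below are hypothetical, chosen only so that each definition gives the answer listed on the slide; the function names are ours as well.

```python
# Hypothetical (start, end) intervals; the slide's actual figure is not available.
events = {"A": (2, 4), "B": (1, 5), "C": (5, 9), "D": (6, 8), "E": (7, 10)}


def by_end_time(name):
    """Successor = the event with the smallest end time after `name` ends."""
    _, end = events[name]
    later = {n: iv for n, iv in events.items() if n != name and iv[1] > end}
    return min(later, key=lambda n: later[n][1])


def starting_after(name):
    """Candidates that start only after `name` has ended (this excludes B)."""
    _, end = events[name]
    return {n: iv for n, iv in events.items() if iv[0] > end}


def no_event_between(name):
    """Successors = candidates with no other event lying strictly in between."""
    _, end = events[name]
    after = starting_after(name)
    return sorted(
        n for n, (start, _) in after.items()
        if not any(end < s and e < start for s, e in after.values())
    )


first_to_end = by_end_time("A")
candidates = starting_after("A")
earliest_start = min(candidates, key=lambda n: candidates[n][0])
earliest_end = min(candidates, key=lambda n: candidates[n][1])
all_successors = no_event_between("A")
```

Once B is excluded, “earliest start” and “earliest end” already disagree (C versus D), which is the ambiguity raised on the slide; the third definition avoids the choice by returning a set.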

Interval Time Model
•  “What is “Next” in event processing?” by White et al.
   –  Proposes a number of desired properties to be satisfied by the “Next” function
   –  There is one model that satisfies them all
      •  Complete History
   –  It is not sufficient to encode timestamps using a couple of values
   –  Timestamps of composite events must embed the timestamps of all the events that led to their occurrence
   –  Possibly, timestamps of unbounded size
      •  In case of unbounded Seq
BigData
Data	Model	
•  Studies	how	the	different	
systems	
–  Represent	single	data	items	
–  Organize	them	into	data	
flows	
Data
•  Generic	Data	
•  Event	NoEficaEons	
•  Records	
•  Tuples	
•  Objects	
•  …	
Data Items
Nature of Items
Format
Support for Uncertainty
Data Flows
•  Homogeneous	
•  Heterogeneous	
61G.	Cugola	and	A.	Margara		-		h?p://www.streamreasoning.org/courses/scep2015	7/10/2015
BigData
Nature	of	Items	
•  The	meaning	we	associate	to	
informaEon	items	
–  Generic	data	
–  Event	noEficaEons	
•  Deeply	influences	several	
other	aspects	of	an	IFP	system	
–  Time	model	!!!	
–  Rule	language	
–  SemanEcs	of	processing	
•  Heritage	of	the	
heterogeneous	backgrounds	
of	different	communiEes	
Data
•  Generic	Data	
•  Event	NoEficaEons	
•  Records	
•  Tuples	
•  Objects	
•  …	
Data Items
Nature of Items
Format
Support for Uncertainty
Data Flows
•  Homogeneous	
•  Heterogeneous	
62G.	Cugola	and	A.	Margara		-		h?p://www.streamreasoning.org/courses/scep2015	7/10/2015
BigData
Nature	of	Items	
CQL/Stream	(Generic	Data)	
Select IStream(*)
From F1[Rows 5],
F2[Range 1 Minute]
Where F1.A = F2.A
	
TESLA/T-Rex	(Event	No5fica5ons)	
Define Fire (area: string,
measuredTemp: double)
From Smoke(area=$a)and last
Temp(area=$a and value>45)
within 5 min.
from Smoke
Where area=Smoke.area and
measuredTemp=Temp.value
Relational Tables
Stream Stream
S2R R2S
R2R
63G.	Cugola	and	A.	Margara		-		h?p://www.streamreasoning.org/courses/scep2015	7/10/2015
BigData
Format	of	Items	
•  How	informaEon	is	
represented	
•  Influences	the	way	items	
are	processed	
–  E.g.,	RelaEonal	model	
requires	tuples	
Data
•  Generic	Data	
•  Event	NoEficaEons	
•  Records	
•  Tuples	
•  Objects	
•  …	
Data Items
Nature of Items
Format
Support for Uncertainty
Data Flows
•  Homogeneous	
•  Heterogeneous	
64G.	Cugola	and	A.	Margara		-		h?p://www.streamreasoning.org/courses/scep2015	7/10/2015
BigData
Support	for	Uncertainty	
•  Ability	to	associate	a	degree	of	
uncertainty	to	informaEon	items	
–  To	the	content	of	items	
•  Imprecise	temperature	reading	
–  To	the	presence	of	an	item	
(occurrence	of	an	event)	
•  Spurious	RFID	reading	
•  When	present,	probabilisEc	
informaEon	is	usually	exploited	
in	rules	during	processing	
Data
•  Generic	Data	
•  Event	NoEficaEons	
•  Records	
•  Tuples	
•  Objects	
•  …	
Data Items
Nature of Items
Format
Support for Uncertainty
Data Flows
•  Homogeneous	
•  Heterogeneous	
65G.	Cugola	and	A.	Margara		-		h?p://www.streamreasoning.org/courses/scep2015	7/10/2015
BigData
Data	Flows	
•  Homogeneous
–  Each flow contains data with the same format and “kind”
•  E.g., tuples with identical structure
–  Often associated with “database-like” rule languages
•  Heterogeneous
–  Information flows are seen as channels connecting sources, processors, and sinks
–  Each channel may transport items of different kinds and formats
Rule	Model	
•  Rules are much more complex entities than data items
•  Large number of different approaches
–  Already observed in the previous slides
•  Looking back to our functional model, we classify them into two macro classes
–  Transforming rules
–  Detecting rules
Rule
•  Type of Rules
–  Transforming Rules
–  Detecting Rules
•  Support for Uncertainty
Transforming	Rules	
•  Do not present an explicit distinction between detection and production
•  Define an execution plan combining primitive operators
•  Each operator transforms one or more input flows into one or more output flows
•  The execution plan can be defined
–  explicitly (e.g., through graphical notation)
–  implicitly (using a high-level language)
•  Often used with homogeneous information flows
–  To take advantage of the predefined structure of input and output
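As a rough illustration of an explicitly defined execution plan, the sketch below models each primitive operator as a Python function from an input flow to an output flow and composes two of them. The operator names and the `id`, `price`, and `venue` fields are hypothetical.

```python
def select(predicate, flow):
    """Primitive operator: keep only the items that satisfy the predicate."""
    return (item for item in flow if predicate(item))

def project(fields, flow):
    """Primitive operator: keep only the listed fields of each item."""
    return ({f: item[f] for f in fields} for item in flow)

def plan(flow):
    """Execution plan: select -> project. The same structure applies to every item (homogeneous flow)."""
    return project(["id", "price"], select(lambda item: item["price"] > 100, flow))

trades = [
    {"id": 1, "price": 90, "venue": "X"},
    {"id": 2, "price": 120, "venue": "Y"},
]
result = list(plan(trades))
```

Because generators are lazy, each item flows through the whole plan as it arrives, which loosely mirrors how a transforming rule processes a stream.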
Detecting Rules
•  Present an explicit distinction between detection and production
•  Usually, the detection is based on a logical predicate that captures patterns of interest in the history of received items
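The split between detection and production can be sketched as a rule built from two separate functions. This is an illustrative toy, assuming an unbounded in-memory history; `make_rule` and `two_hot_readings` are hypothetical names.

```python
def make_rule(condition, action):
    """Build a detecting rule from a detection predicate and a production action."""
    history = []                      # items received so far

    def on_item(item):
        history.append(item)
        match = condition(history)    # detection: does the history contain the pattern?
        return action(match) if match is not None else None  # production
    return on_item

def two_hot_readings(history):
    """Pattern of interest: the last two readings are both above 45."""
    last_two = history[-2:]
    if len(last_two) == 2 and all(v > 45 for v in last_two):
        return last_two
    return None

rule = make_rule(two_hot_readings, lambda m: {"alarm": sum(m) / len(m)})
results = [rule(v) for v in (40, 50, 60)]
```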
Support	for	Uncertainty	
•  Two	orthogonal	aspects	
–  Support	for	uncertain	input	
•  Allows	rules	to	deal	with/
reason	about	uncertain	input	
data	
–  Support	for	uncertain	output	
•  Allows	rules	to	associate	a	
degree	of	uncertainty	to	the	
output	produced	
Language	Model	
•  Specify operations to
–  Filter
–  Join
–  Aggregate
•  input flows …
•  … to produce one or more output flows
•  Following the rule model, we define two classes of languages:
–  Transforming languages
•  Declarative languages
•  Imperative languages
–  Detecting languages
•  Pattern-based
Language Model: Declarative Languages
•  Specify the expected result rather than the desired execution flow
•  Usually derive from relational languages
–  Relational algebra / SQL
CQL/Stream:
Select IStream(*)
From F1[Rows 5],
F2[Rows 10]
Where F1.A = F2.A
Language Model: Imperative Languages
•  Specify the desired execution flow
•  Starting from primitive operators
–  Can be user-defined
•  Usually adopt a graphical notation
Imperative Languages
[Figure: Aurora (Boxes & Arrows Model)]
Hybrid	Languages	
Oracle CEP
Language Model: Pattern-Based Languages
•  Specify a firing condition as a pattern
•  Select a portion of incoming flows through
–  Logic operators
–  Content / timing constraints
•  The action uses selected items to produce new knowledge
Detecting Languages
TESLA / T-Rex
Define Fire(area: string, measuredTemp: double)
From Smoke(area=$a) and last
Temp(area=$a and value>45)
within 5 min. from Smoke
Where area=Smoke.area and
measuredTemp=Temp.value
Language	Model	
•  Different syntaxes / constructs / operators
•  Comparing the semantics and expressiveness of languages is still an open issue
•  Our approach:
–  Review all operators encountered in the analysis of systems
–  Specify the classes of languages adopting them
–  Try to capture some semantic relationships among operators
Language	Model	
•  Single-item operators
–  Selection operators
•  Filter items according to their content
–  Elaboration operators
•  Projection: extracts a part of the content of an item
•  Renaming: changes the name of a field in languages based on records or tuples
•  Present in all languages
•  Defined as primitive operators in imperative languages
•  Declarative languages inherit selection, projection, and renaming from relational algebra
Select RStream
(I.Price as HighPrice)
From Items[Rows 1] as I
Where I.Price > 100
Annotations: Projection and Renaming (Select clause), Selection (Where clause)
Language	Model	
•  Pattern-based languages
–  Selection inside the condition part (pattern)
–  Elaboration as part of the action
Define ExpensiveItem
(highPrice: double)
From Item(price>100)
Where highPrice = price
Annotations: Selection (From clause), Renaming and Projection (Where clause)
Language	Model	
•  Logic operators
–  Conjunction
–  Disjunction
–  Repetition
–  Negation
•  Explicitly present in pattern-based languages
PADRES
(A & B) || (C & D)
Annotations: Conjunction (&), Disjunction (||)
Language	Model	
•  Some logic operators are blocking
–  They express patterns whose validity cannot be decided in a bounded amount of time
•  E.g., negation
–  Used in conjunction with windows
Define Fire()
From Smoke(area=$a) and
not Rain(area=$a)
within 10 min from Smoke
Annotations: Negation (not Rain), Window (within 10 min from Smoke)
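A sketch of why the window makes negation decidable: once the scope is bounded, the rule can be evaluated as soon as a Smoke event arrives. The code assumes the window looks back over the 10 minutes before the Smoke event; the tuple layout and function name are hypothetical.

```python
def fire_without_rain(events, window_s=600):
    """events: iterable of (timestamp_s, kind, area) in timestamp order.

    Emits (timestamp, area) for each Smoke with no Rain in the same area
    during the preceding window_s seconds.
    """
    last_rain = {}   # area -> timestamp of the most recent Rain
    fires = []
    for ts, kind, area in events:
        if kind == "Rain":
            last_rain[area] = ts
        elif kind == "Smoke":
            rained = area in last_rain and ts - last_rain[area] <= window_s
            if not rained:
                fires.append((ts, area))
    return fires

fires = fire_without_rain([
    (0, "Rain", "A"),
    (100, "Smoke", "A"),   # rain 100 s earlier: no fire
    (700, "Smoke", "A"),   # last rain 700 s earlier: outside the window
    (800, "Smoke", "B"),   # never rained in B
])
```

Without the window, "no Rain ever" could only be confirmed at the end of the stream, which is what makes unbounded negation blocking.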
Language	Model	
•  Traditionally, logic operators were not explicitly offered by declarative and imperative languages
•  However, they could be expressed as transformations of input flows
Select IStream
(F1.A, F2.B)
From F1 [Rows 10],
F2 [Rows 20]
Annotation: conjunction of A and B
Language	Model	
•  Sequences
–  Similar to logic operators
–  Based on timing relations among items
•  Present in almost all pattern-based languages
Define Fire()
From Smoke(area=$a) and last
Temp(area=$a and value>45)
within 5 min. from Smoke
Annotation: sequence (time-bounded)
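The time-bounded sequence can be approximated in Python as follows. This is a simplified reading of the rule, not the T-Rex implementation: it assumes events arrive in timestamp order, and the tuple layout and function name are hypothetical.

```python
def fire_after_heat(events, window_s=300, threshold=45.0):
    """events: iterable of (timestamp_s, kind, area, value) in timestamp order.

    Detects a Temp above the threshold followed by a Smoke in the same area
    within window_s seconds, keeping only the last such Temp.
    """
    last_hot = {}   # area -> (timestamp, value) of the last hot Temp
    fires = []
    for ts, kind, area, value in events:
        if kind == "Temp" and value > threshold:
            last_hot[area] = (ts, value)
        elif kind == "Smoke" and area in last_hot:
            temp_ts, temp_value = last_hot[area]
            if ts - temp_ts <= window_s:
                fires.append({"area": area, "measuredTemp": temp_value})
    return fires

fires = fire_after_heat([
    (0, "Temp", "A", 50.0),
    (60, "Temp", "A", 47.0),
    (100, "Smoke", "A", None),    # last hot Temp was 40 s earlier
    (1000, "Smoke", "A", None),   # last hot Temp is outside the window
    (1010, "Smoke", "B", None),   # no Temp seen in area B
])
```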
Language	Model	
•  Traditionally, transforming languages did not provide sequences explicitly
•  Could be expressed with an explicit reference to timestamps
–  If present inside items
Select IStream
(F1.A, F2.B)
From F1 [Rows 10], F2 [Rows 20]
Where F1.timestamp < F2.timestamp
Annotation: the Where clause imposes timestamp order
Language	Model	
•  Iterations
–  Express possibly unbounded sequences of items …
–  … satisfying an iterating condition
•  Implicitly defines an ordering among items
SASE+
PATTERN SEQ(Alert a, Shipment+ b[ ])
WHERE skip_till_any_match(a, b[ ]) {
a.type = 'contaminated' and
b[1].from = a.site and b[i].from = b[i-1].to
} WITHIN 3 hours
Annotation: iteration (Kleene +) on Shipment+
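The iterating condition can be illustrated with a deliberately simplified Python sketch. It follows a single chain greedily, whereas `skip_till_any_match` in SASE+ enumerates every matching combination, so treat it as an approximation of the idea rather than of the language semantics. The dictionary keys and function name are hypothetical.

```python
def contamination_chain(alert, shipments, within_s=3 * 3600):
    """Collect shipments that chain from the alert site: each one starts where the previous ended."""
    if alert["type"] != "contaminated":
        return []
    chain = []
    expected_from = alert["site"]               # b[1].from = a.site
    for s in shipments:                         # shipments in timestamp order
        if s["ts"] < alert["ts"] or s["ts"] - alert["ts"] > within_s:
            continue                            # outside the 3-hour window
        if s["from"] == expected_from:          # b[i].from = b[i-1].to
            chain.append(s)
            expected_from = s["to"]
    return chain

alert = {"ts": 0, "type": "contaminated", "site": "S1"}
shipments = [
    {"ts": 10, "from": "S1", "to": "S2"},
    {"ts": 20, "from": "S3", "to": "S4"},      # unrelated shipment, skipped
    {"ts": 30, "from": "S2", "to": "S5"},
    {"ts": 20000, "from": "S5", "to": "S6"},   # later than 3 hours
]
destinations = [s["to"] for s in contamination_chain(alert, shipments)]
```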
Language	Model	
•  Logic operators, sequences, and iterations traditionally not offered by transforming languages
•  And now?
–  Current trend:
•  Embed patterns inside declarative languages
•  Especially adopted in commercial systems
Esper
Select A.price
From pattern
[every (A -> (B or C))]
Where A.price > 100
Language	Model	
•  Windows
–  Kind:
•  Logical (Time-Based)
•  Physical (Count-Based)
•  User-Defined
Logical
Select IStream(Count(*))
From F1[Range 1 Minute]
Physical
Select IStream(Count(*))
From F1[Rows 50 Slide 10]
Language	Model	
•  Windows are used to limit the scope of blocking operators
•  They are generally available in declarative and imperative languages
•  They are not present in all pattern-based languages
–  Some of them do not include blocking operators
–  Some of them “embed” windows inside operators
•  Making them unblocking
CEDR
EVENT Test-Rule
WHEN UNLESS(A, B, 12 hours)
WHERE A.a < B.b
Language	Model	
•  Window movement
–  Fixed: does not move at all
–  Landmark: has a fixed lower bound, while the upper bound advances every time a new information item enters the system
•  E.g., all items since 1/1/2013
–  Sliding: has a fixed size; both lower and upper bounds advance when new items enter the system
–  Pane: both the lower and the upper bounds move by k elements, as k elements enter the system
•  k is smaller than the window size
–  Tumble: same as pane, but k is greater than or equal to the window size
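For count-based windows, the sliding, pane, and tumble cases differ only in the step k. The sketch below works on a finite list for clarity and treats sliding as k = 1, which is an assumption; the function name is hypothetical.

```python
def windows(items, size, k):
    """Return the successive contents of a count-based window of `size` items
    whose bounds advance by k items at a time.

    k == 1: sliding, 1 < k < size: pane, k >= size: tumble.
    """
    return [items[start:start + size] for start in range(0, len(items) - size + 1, k)]

items = list(range(1, 9))           # 1..8
sliding = windows(items, size=4, k=1)
pane = windows(items, size=4, k=2)
tumble = windows(items, size=4, k=4)
```

With a pane, consecutive windows overlap; with a tumble, every item belongs to at most one window.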
Language	Model	
•  Flow management operators
–  Required by declarative and imperative languages to merge, split, organize, and process incoming flows of information
Flow Management Operators
•  Join
•  Bag Operators
–  Union
–  Except
–  Intersect
–  Remove-duplicates
•  Duplicate
•  Group By
•  Order By
•  Flow Creation
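The bag operators have direct counterparts in Python's `collections.Counter`, which can serve as a quick mental model. The sketch applies them to the current contents of two windows and assumes an additive (UNION ALL style) union; actual systems may define these operators differently.

```python
from collections import Counter

f1 = Counter(["a", "a", "b"])   # contents of one flow window, as a bag
f2 = Counter(["a", "c"])        # contents of another flow window

union = f1 + f2                          # bag union: multiplicities add up
difference = f1 - f2                     # except: subtract multiplicities
intersection = f1 & f2                   # intersect: minimum multiplicity
distinct = sorted(set(f1.elements()))    # remove-duplicates
```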
Language	Model	
•  Parameterization
–  Allows the binding of different information items based on their content
–  Offered implicitly by declarative and imperative languages
•  Through a combination of join and selection
–  Offered as an explicit operator in pattern-based languages
CQL / Stream
Select IStream (F1.A, F2.B)
From F1 [Rows 10], F2 [Rows 20]
Where F1.A > F2.B
Annotations: Cartesian product (From clause), Selection (Where clause)
TESLA / T-Rex
Define Fire()
From Smoke(area=$a) and last
Temp(area=$a and value>45)
within 5 min. from Smoke
Annotation: explicit parameter ($a)
Language	Model	
Aggregates
•  Scope
–  Detection Aggregates
–  Production Aggregates
•  Definition
–  Predefined
–  User-defined
Detection Aggregate
Define Fire(area: string,
measuredTemp: double)
From Smoke(area=$a) and
45 < Avg(Temp(area=$a).value
within 5 min. from Smoke)
Where area=Smoke.area and
measuredTemp=Temp.value
Production Aggregate
Define Fire(area: string,
measuredTemp: double)
From Smoke(area=$a) and
last Temp(area=$a and value>45)
within 5 min. from Smoke
Where area=Smoke.area and
measuredTemp=Avg(Temp(area=$a).value)
within 1 hour from Smoke
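A detection aggregate takes part in deciding whether the rule fires. The sketch below mimics the first rule, firing on Smoke only if the average temperature over the preceding 5 minutes exceeds 45. It is an illustration under assumptions, not TESLA semantics: the tuple layout, the `avgTemp` output field, and the function name are hypothetical.

```python
def fire_on_average(events, window_s=300, threshold=45.0):
    """events: iterable of (timestamp_s, kind, area, value) in timestamp order."""
    temps = {}   # area -> list of (timestamp, value)
    fires = []
    for ts, kind, area, value in events:
        if kind == "Temp":
            temps.setdefault(area, []).append((ts, value))
        elif kind == "Smoke":
            recent = [v for t, v in temps.get(area, []) if ts - t <= window_s]
            if recent:
                average = sum(recent) / len(recent)   # aggregate used in detection
                if average > threshold:
                    fires.append({"area": area, "avgTemp": average})
    return fires

fires = fire_on_average([
    (0, "Temp", "A", 40.0),
    (100, "Temp", "A", 60.0),
    (200, "Smoke", "A", None),    # average of 40 and 60 is 50
    (1000, "Smoke", "A", None),   # no readings in the last 5 minutes
])
```

A production aggregate would instead leave the firing condition unchanged and compute the average only to fill a field of the generated event.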
Credits	
•  These slides are partially based on "A modeling framework for DSMS and CEP" by G. Cugola and A. Margara, presented in the PhD course on "Stream and complex event processing" offered by Politecnico di Milano in 2015.
–  http://www.streamreasoning.org/courses/scep2015
Semantic Approach to
Big Data and Event Processing
Thank you!
Any questions?
Emanuele Della Valle
DEIB - Politecnico di Milano
@manudellavalle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
