BigData
Semantic Approach to
Big Data and Event Processing
Stream Reasoning: mastering the velocity and variety dimensions of Big Data at once
Emanuele Della Valle
DEIB - Politecnico di Milano
@manudellavalle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org

It's a streaming world …
•  Off-shore oil operations
•  Smart Cities
•  Global Contact Center
•  Social networks
•  Generate data streams!
E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
8/10/2015 @manudellavalle - http://emanueledellavalle.org

… looking for reactive answers …
•  What is the expected time to failure when that turbine's bearing starts to vibrate, as detected in the last 10 minutes?
•  Is public transportation where the people are?
•  Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?
•  Who is driving the discussion about the top 10 emerging topics?
•  Require continuous processing and reactive answers

… and more conflicting requirements 1/8
A system able to answer those queries must be able to
•  handle massive datasets
–  A typical oil production platform is equipped with about 400,000 sensors
–  Telecom data is the most pervasive data source in urban areas; in Milano there are 1.8 million mobile users
–  A global contact centre of a Telecom operator counts 500 million clients
–  Facebook alone has 1.1 billion active users

… conflicting requirements 2/8
A system able to answer those queries must be able to
•  process data streams on the fly
–  The sensors on a typical oil production platform generate 10,000 observations per minute, with peaks of 100,000 o/m
–  The mobile users in Milano generate 20,000 call/sms/data connections per minute, with peaks of 80,000 c/m
–  A global contact centre receives 10,000 contacts per minute, with peaks of 30,000 c/m
–  Facebook, as of May 2013, observes 3 million "I like" per minute

… conflicting requirements 3/8
A system able to answer those queries must be able to
•  cope with heterogeneous datasets
–  The sensors on a typical oil production platform have been deployed over 10 years by 10s of different producers
–  Tens of data sources are normally needed to make sense of an urban phenomenon
–  A global contact centre consists of 100s of offices owned by different subsidiary companies engaged yearly
–  Each social network has its own data model, APIs, …

… conflicting requirements 4/8
A system able to answer those queries must be able to
•  cope with incomplete data
–  10s of sensors and networking links break down daily
–  Coverage is incomplete
–  Only standard cases are covered by fully machine-processable data records; 100s of contacts per minute are managed ad hoc
–  Conversations happen outside the social networks, too!

… conflicting requirements 5/8
A system able to answer those queries must be able to
•  cope with noisy data
–  Sensors out of operating range
–  Faulty sensors
–  Agents misunderstand, get tired, …
–  Irony, sarcasm, …

… conflicting requirements 6/8
A system able to answer those queries must be able to
•  provide reactive answers
–  detection of dangerous situations must occur within minutes
–  recommendations to citizens must be delivered in a few seconds
–  routing a contact through each step of the decision tree must take less than a second
–  Search autocompletion may need to be updated every few minutes

… conflicting requirements 7/8
A system able to answer those queries must be able to
•  support fine-grained information access
–  Identify a turbine among thousands
–  Locate a bus among thousands
–  Contact an agent among thousands
–  Identify an opinion maker among thousands of influencers for a topic

… conflicting requirements 8/8
A system able to answer those queries must be able to
•  integrate complex domain models of
–  operational and control processes
–  various city aspects
–  contact management, contract types, agent skills, contactor profiles, …
–  topics, user profiles, …

Challenges
A system able to answer those queries must be able to
•  handle massive datasets
•  process data streams on the fly
•  cope with heterogeneous datasets
•  cope with incomplete data
•  cope with noisy data
•  provide reactive answers
•  support fine-grained access
•  integrate complex domain models
In Big Data terms: Volume, Velocity, Variety, Veracity
[Table: each requirement is marked (x) against one or two of these four dimensions]

Challenges
•  Volume + Velocity + Variety = hard deal
[Figure: a Volume axis (KB, MB, GB, TB, PB, EB, ZB), a time-scale axis (months, days, hours, min., sec., ms.) and a Variety axis]

A good reason to embrace it!
•  Volume + Velocity + Variety => high value
[Figure: Value against velocity (ms., sec., min., hours, days, months, years) and Variety]

From challenges to opportunities
•  Formally, data streams are:
–  unbounded sequences of time-varying data elements
•  Less formally, in many application domains, they are:
–  a “continuous” flow of information
–  where recent information is more relevant as it describes the current state of a dynamic system
•  Opportunities
–  Forget old enough information
–  Exploit the implicit ordering (by recency) in the data
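
The two opportunities above (forgetting old enough information and exploiting the ordering by recency) can be illustrated with a minimal Python sketch. The class name `TimeWindow`, the integer timestamps and the sample readings are assumptions made for illustration; they do not come from the presentation.

```python
from collections import deque


class TimeWindow:
    """Keeps only the elements observed in the last `width` time units."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, element) pairs, oldest first

    def push(self, timestamp, element):
        # The stream is ordered by recency, so new elements go to the right end.
        self.items.append((timestamp, element))
        # Forget every element old enough to have fallen out of the window.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def content(self):
        return [element for _, element in self.items]


# Hypothetical stream of (timestamp, reading) pairs.
window = TimeWindow(width=10)
for t, reading in [(1, "a"), (4, "b"), (9, "c"), (12, "d"), (20, "e")]:
    window.push(t, reading)
```

Because the input is ordered, eviction only ever inspects the oldest end of the queue, so memory stays bounded by the window size rather than by the length of the stream.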

State of the art: DSMS and CEP
•  A paradigmatic change!
•  Continuous queries registered over streams that are observed through windows
[Figure: input streams, window, registered continuous query, streams of answers, dynamic system]

DSMS and CEP vs. requirements
[Table: DSMS and CEP (columns) against the requirements (rows): massive datasets, data streams, heterogeneous dataset, incomplete data, noisy data, reactive answers, fine-grained information access, complex domain models. Three entries are marked ✗.]

State of the art: Ontology Based Data Access
•  Given ontology O and query Q, use O to rewrite Q as Q’ so that, for any set of ground facts A contained in multiple databases:
–  answer(Q, O, A) = answer(Q’, ∅, A)
The answer to the query Q using the ontology O for any set of ground facts A is equal to the answer to a query Q’ without considering the ontology O
•  Use mapping M to map Q’ to multiple SQL queries to the various databases
[Figure: Rewrite (O, Q, Q’), Map (M, SQL), answer (A)]
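
The rewriting idea can be sketched in Python for the simplest possible case: an ontology made only of subclass axioms and a query asking for the instances of one class. Everything in this sketch (the function names, the toy ontology `O`, the facts `A`) is a hypothetical illustration, not the method used by any specific OBDA system, and the mapping M to SQL is left out.

```python
def subsumed_classes(ontology, cls):
    """All classes that are subclasses of `cls`, directly or indirectly, plus `cls`."""
    result, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, sup in ontology:
            if sup == current and sub not in result:
                result.add(sub)
                todo.append(sub)
    return result


def rewrite(query_class, ontology):
    """Q -> Q': a union of class queries that no longer needs the ontology."""
    return subsumed_classes(ontology, query_class)


def answer(query_classes, facts):
    """Evaluate a union of class queries directly over the ground facts."""
    return {individual for individual, cls in facts if cls in query_classes}


# Hypothetical ontology O: (subclass, superclass) axioms.
O = {("Turbine", "RotatingEquipment"), ("Pump", "RotatingEquipment"),
     ("RotatingEquipment", "Equipment")}
# Hypothetical ground facts A: (individual, asserted class).
A = {("t1", "Turbine"), ("p7", "Pump"), ("v3", "Valve"), ("e9", "Equipment")}

q_prime = rewrite("Equipment", O)   # uses O once, at rewriting time
result = answer(q_prime, A)         # evaluated over A without O
```

Here the ontology is consulted only while producing Q’; the evaluation over the facts is a plain lookup, which is what makes it possible to push Q’ down to ordinary databases.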

DSMS, CEP and OBDA vs. requirements
[Table: DSMS, CEP and OBDA (columns) against the requirements (rows): massive datasets, data streams, heterogeneous dataset, incomplete data, noisy data, reactive answers, fine-grained information access, complex domain models. Six entries are marked ✗.]

Stream Reasoning
•  Research question
–  is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users?
•  Proposed approach
[Figure: Complexity vs. Dynamics. Labels: Raw Stream Processing, Semantic Streams, DL-Lite, DL; Abstraction, Selection, Interpretation, Reasoning, Querying, Re-writing; Complexity: AC0, PTIME, NEXPTIME; Change Frequency: 10^4 Hz, 1 Hz]
H. Stuckenschmidt, S. Ceri, E. Della Valle, F. van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010.

Sub-research questions
1.  Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2.  Do the ordered nature of data streams and the possibility of forgetting old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to a large number of concurrent users without forsaking correctness or completeness?
3.  Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4.  Are there practical cases where processing data streams at the semantic level is the best choice?

Model data stream at semantic level 1/2
•  State of the art: RDF
–  It allows making statements about resources in the form of subject-predicate-object expressions
•  In RDF terminology: triples
•  E.g.
       subject          predicate    object
       @BarackObama     posts        "Four more years"
–  A collection of RDF statements represents a labelled, directed graph
•  In RDF terminology: a graph
•  E.g., the tweet above by Barack Obama is connected to
–  800,000+ twitter user profiles via retweets
–  300,000+ twitter user profiles via favorites
–  …
•  Contribution: RDF stream

Model data stream at semantic level 2/2
•  State of the art: RDF
•  Contribution: RDF Stream
–  Unbounded sequence of time-varying triples
–  each represented by a pair made of an RDF triple and its timestamp
       subject          predicate    object                    timestamp
       …
       @BarackObama     posts        "Four more years"         8:16PM 6 Nov 2012
       @Alice           posts        "RT: Four more years"     8:17PM 6 Nov 2012
       …
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)

Adding continuous semantics to SPARQL
Who are the opinion makers? i.e., the users who are likely to influence the behavior of their followers
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://…> [RANGE 30m STEP 5m]
WHERE {
    ?opinionMaker ?opinion ?resource .
    ?follower sioc:follows ?opinionMaker .
    ?follower ?opinion ?resource .
    FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )

Adding continuous semantics to SPARQL
The query above extends SPARQL with:
•  Query registration (for continuous execution): REGISTER STREAM … COMPUTED EVERY 5m AS
•  FROM STREAM clause
•  WINDOW: [RANGE 30m STEP 5m]
•  RDF Stream added as new output format
•  Builtin to access timestamps: cs:timestamp
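
A minimal Python sketch of what a [RANGE … STEP …] window does to a timestamped stream may help. It assumes that the query is evaluated every STEP time units and that each evaluation at time t sees the triples with timestamp in (t - RANGE, t]; this boundary convention, the function name `sliding_windows` and the sample triples are illustrative assumptions, not the C-SPARQL specification.

```python
def sliding_windows(stream, range_, step, until):
    """Yield (t, window content) for t = step, 2*step, ... up to `until`."""
    t = step
    while t <= until:
        # Assumed convention: the window covers the interval (t - range_, t].
        yield t, [triple for ts, triple in stream if t - range_ < ts <= t]
        t += step


# Hypothetical RDF stream: (minute, (subject, predicate, object)) pairs.
stream = [
    (1, ("alice", "likes", "item1")),
    (3, ("bob", "follows", "alice")),
    (7, ("bob", "likes", "item1")),
    (12, ("carol", "likes", "item2")),
]

# RANGE 10 minutes, STEP 5 minutes, evaluated up to minute 15.
results = list(sliding_windows(stream, range_=10, step=5, until=15))
```

With RANGE larger than STEP the windows overlap, so the same triple contributes to several consecutive answers until it slides out.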

Defining continuous deductive reasoning
What impact has my micropost p1 been creating in the last hour?
Let’s count the number of microposts that discuss it …
REGISTER STREAM ImpactMeter AS
SELECT (count(?p) AS ?impact)
FROM STREAM <http://…/fb> [RANGE 60m STEP 10m]
WHERE {
    :Alice :posts [ sr:discusses ?p ]
}
[Figure: microposts p1 to p8 linked by discusses edges, with Alice posts p1. Since discusses is a transitive property, the answer is 7.]
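
The effect of declaring discusses transitive can be sketched in Python. The edge list below is hypothetical: the presentation only states that there are posts p1 to p8 and that the answer is 7, so the exact links are an assumption chosen to reproduce that count.

```python
def discussing(target, discusses):
    """Posts that discuss `target` directly or through a chain of posts."""
    found, todo = set(), [target]
    while todo:
        current = todo.pop()
        for post, discussed in discusses:
            if discussed == current and post not in found:
                found.add(post)
                todo.append(post)
    return found


# Hypothetical (post, discussed post) edges currently in the window.
edges = {("p2", "p1"), ("p3", "p1"), ("p4", "p2"), ("p5", "p3"),
         ("p6", "p4"), ("p7", "p4"), ("p8", "p5")}

# Without reasoning, only the posts that discuss p1 directly are counted.
direct = {post for post, discussed in edges if discussed == "p1"}
# With the transitive property, the whole discussion thread is counted.
impact = discussing("p1", edges)
```

In this toy window the plain count is 2, while the count under transitive reasoning is 7, which is the kind of difference the continuous deductive semantics is meant to capture.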

Finding
•  The Semantic Web stack can be extended so as to incorporate streaming data as a first-class citizen
–  RDF stream data model
–  Continuous SPARQL syntax and semantics
–  Continuous deductive reasoning semantics

Sub-research questions
2.  Do the ordered nature of data streams and the possibility of forgetting old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to a large number of concurrent users without forsaking correctness or completeness?

Optimize querying for reactive answers
•  C-SPARQL engine time window-based selection outperforms SPARQL filter-based selection (Jena-ARQ)
[Figure: performance chart; one series is labelled "Our in-memory RDF stream processing engine"]
D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010.

Is it a hard problem?
•  In the database community it is known as the incremental view maintenance problem
•  The state-of-the-art solution is the DRed algorithm
–  Overestimation of deletion: Overestimates deletions by computing all direct consequences of a deletion.
–  Rederivation: Prunes those estimated deletions for which alternative derivations (via some other facts in the program) exist.
–  Insertion: Adds the new derivations that are consequences of insertions.
•  Deletions are twice as expensive as insertions
[Figure: example graphs over the nodes A, B, C and E]
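
The three DRed phases can be sketched in Python for a single transitivity rule over (source, target) links. This is a simplified illustration under stated assumptions, not the full algorithm for arbitrary rule sets; the function names and the example links A→B, B→C, A→E, E→C are chosen for the sketch.

```python
def one_step(facts, x, z):
    """True if (x, z) follows from two facts (x, y) and (y, z) in `facts`."""
    return any(a == x and (y, z) in facts for a, y in facts)


def insert(explicit, view, added):
    """Insertion: add facts and propagate their consequences to a fixpoint."""
    todo = [f for f in added if f not in view]
    view = view | added
    while todo:
        x, y = todo.pop()
        new = {(x, z) for y2, z in view if y2 == y}
        new |= {(w, y) for w, x2 in view if x2 == x}
        for fact in new - view:
            view.add(fact)
            todo.append(fact)
    return explicit | added, view


def dred_delete(explicit, view, deleted):
    # 1. Overestimation: everything with a derivation that uses a deleted fact.
    over = set(deleted)
    changed = True
    while changed:
        changed = False
        for x, y in view:
            for y2, z in view:
                if y == y2 and ((x, y) in over or (y, z) in over) \
                        and (x, z) not in over:
                    over.add((x, z))
                    changed = True
    remaining = explicit - deleted
    kept = view - over
    # 2. Rederivation: put back overestimated facts that still have support.
    changed = True
    while changed:
        changed = False
        for x, z in over - kept:
            if (x, z) in remaining or one_step(kept, x, z):
                kept.add((x, z))
                changed = True
    return remaining, kept


E = {("A", "B"), ("B", "C"), ("A", "E"), ("E", "C")}
explicit, view = insert(set(), set(), E)               # view also holds A→C
explicit, view = dred_delete(explicit, view, {("A", "B")})
explicit2, view2 = dred_delete(explicit, view, {("A", "E")})
```

After deleting A→B, the inferred A→C is first overestimated as deleted and then rederived through E; only after A→E is deleted too does it disappear. The extra overestimate-then-rederive pass is what makes deletions costlier than insertions.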

Is it a hard problem?
[Figure: time in ms (axis ticks 0, 10, 100, 1000) against the % of deletions w.r.t. the size of the DB (0% to 14%), for two strategies: rematerialize the view after each update, and incremental view maintenance. The break-even point is marked.]
In a normal DB incremental view maintenance is convenient in most cases:
•  Lots of inserts, few deletions
•  Small % of deletions w.r.t. the size of the DB

Is it a hard problem?
[Figure: time in ms (axis ticks 0, 10, 100, 1000) against the % of deletions w.r.t. the content of the window (0% to 14%), for two strategies: rematerialize the view after each update, and incremental view maintenance. The break-even point is marked.]
In a streaming setting
•  The number of inserts equals the number of deletions
•  The data exiting the window causes a large % of deletions w.r.t. the size of the window

What is the target behaviour?
[Figure: time in ms (axis ticks 0, 10, 100, 1000) against the % of deletions w.r.t. the content of the window (0% to 14%), with three curves: rematerialize the view after each update, incremental view maintenance, and the target behaviour. The break-even point is marked.]

Observation
•  DRed works with random insertions and deletions
•  In a streaming setting, when a triple enters the window, given the size of the window, the reasoner already knows when it will be deleted!
•  E.g.,
–  if the window is 40 minutes long, and
–  it is 10:00, the triple(s) entering now
–  will exit at 10:40.
•  Conclusion
–  deletions are predictable
       Time     Enter window    Exit window
       10:00    A→B
       10:10    B→C
       10:20    A→E
       10:30    E→C
       10:40                    A→B
       10:50                    B→C
       11:00                    A→E
[Two further columns show, for each row, the graph over the nodes A, B, C and E that is explicitly in the window and the one inferred from it.]

Optimized Stream Reasoning (IMaRS)
•  Idea:
–  add an expiration time to each triple and
–  use a hash table to index triples by their expiration time
•  The algorithm
1.  deletes expired triples,
2.  adds the new derivations that are consequences of insertions, annotating each inferred triple with an expiration time (the min of those of the triples it is derived from), and
3.  when multiple derivations occur, for each multiple derivation, it keeps the max expiration time.
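
The three steps can be sketched in Python for a single transitivity rule. This is a minimal illustration of the expiration-time idea, not the IMaRS implementation: the class name `ExpiringClosure`, the use of minutes since midnight as timestamps (600 stands for 10:00) and the restriction to one rule are assumptions of the sketch.

```python
from collections import defaultdict


class ExpiringClosure:
    def __init__(self, width):
        self.width = width
        self.expiry = {}                    # triple -> expiration time
        self.by_expiry = defaultdict(set)   # expiration time -> triples

    def _set(self, triple, when):
        old = self.expiry.get(triple)
        if old is not None and old >= when:
            return False                    # step 3: keep the max expiration
        if old is not None:
            self.by_expiry[old].discard(triple)
        self.expiry[triple] = when
        self.by_expiry[when].add(triple)
        return True

    def advance(self, now, entering):
        # Step 1: delete expired triples, found through the expiration index.
        for when in [w for w in self.by_expiry if w <= now]:
            for triple in self.by_expiry.pop(when):
                del self.expiry[triple]
        # Step 2: insert new triples and propagate their consequences; an
        # inferred triple expires with the earliest-expiring of its premises.
        todo = [t for t in entering if self._set(t, now + self.width)]
        while todo:
            x, y = todo.pop()
            e = self.expiry[(x, y)]
            derived = [((x, z), min(e, ez))
                       for (y2, z), ez in self.expiry.items() if y2 == y]
            derived += [((w, y), min(e, ew))
                        for (w, x2), ew in self.expiry.items() if x2 == x]
            for triple, when in derived:
                if self._set(triple, when):
                    todo.append(triple)


AB, BC, AE, EC, AC = ("A", "B"), ("B", "C"), ("A", "E"), ("E", "C"), ("A", "C")
reasoner = ExpiringClosure(width=40)
snapshots = {}
for now, entering in [(600, [AB]), (610, [BC]), (620, [AE]), (630, [EC]),
                      (640, []), (650, []), (660, [])]:
    reasoner.advance(now, entering)
    snapshots[now] = dict(reasoner.expiry)
```

In this run the inferred A→C is first given the expiration of A→B (10:40) and then extended to 11:00 when the second derivation through E appears, so nothing has to be overestimated or rederived when A→B leaves the window.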

Optimize reasoning for reactive answers
•  Incremental Reasoning on RDF streams (IMaRS): new reasoning algorithm optimized for reactive query answering
[Figure: three strategies (re-materialize after each window slide, use DRed, IMaRS) plotted against the % of deletions w.r.t. the content of the window]
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15
D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A. Harth, K. Hose, R. Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408

Optimize reasoning for reactive answers
•  Comparison of the average time needed to answer a C-SPARQL query, when 2% of the content exits the window each time it slides, using
–  A backward reasoner on the window content
–  DRed + standard SPARQL on the materialization
–  IMaRS + standard SPARQL on the materialization

Finding
•  The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
–  C-SPARQL Engine prototype
–  IMaRS continuous incremental reasoning algorithm

Sub-research questions
3.  Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?

Cope with the noisy and incomplete data
•  "Noise" is reduced using DSMS techniques
•  Deductive stream reasoning copes with incompleteness by deducing implicit facts
•  Inductive stream reasoning copes with "irreparable" incompleteness by inducing missing facts
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)

Findings
•  A combination of deductive and inductive stream reasoning techniques can cope with incomplete and noisy data

Sub-research questions
4.  Are there practical cases where processing data streams at the semantic level is the best choice?

Practical cases
•  10+ deployments in Sensor Networks & Social media analytics, e.g.
–  BOTTARI: Winner of Semantic Web Challenge 2011
–  City Data Fusion: Winner of IBM faculty award 2013
–  Social Listener
M. Balduini, I. Celino, D. Dell’Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, V. Tresp: BOTTARI: An augmented reality mobile application to deliver personalized and location-based recommendations by continuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012)
M. Balduini, E. Della Valle, M. Azzi, R. Larcher, F. Antonelli, and P. Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia 22(3): 44-53 (2015)

Findings
•  There are application domains where Stream Reasoning offers an adequate solution

Findings
1.  The Semantic Web stack can be extended so as to incorporate streaming data as a first-class citizen
–  RDF stream data model
–  Continuous SPARQL syntax and semantics
–  Continuous deductive reasoning semantics
2.  The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
–  IMaRS continuous incremental reasoning algorithm
–  C-SPARQL Engine prototype
3.  A combination of deductive and inductive stream reasoning techniques can cope with incomplete and noisy data
4.  There are application domains where Stream Reasoning offers an adequate solution

Open issues
1.  The Semantic Web stack can be extended
–  "Navigating the Chasm between the Scylla of Practical Applications and the Charybdis of Theoretical Approaches", A. Bernstein, 2015
2.  The Stream Reasoning task is feasible
–  It's time to start removing assumptions
•  knowledge does not change
•  background data does not change
–  OBDA for SQL ≠ OBDA for continuous querying
3.  Stream reasoning can cope with incomplete and noisy data
–  Theory is needed!
4.  There are application domains where Stream Reasoning offers an adequate solution
–  Rigorous quantitative comparative research is needed

Advertisements :-P
•  Check out my PhD thesis
–  http://dare.ubvu.vu.nl/handle/1871/53293
–  Chapter 1: Introduction
•  The content of this presentation
–  Chapter 8: Conclusions
•  A review of stream reasoning approaches updated in spring 2015
•  Put an "I like" to Stream Reasoning on Facebook
–  https://www.facebook.com/streamreasoning

Thank you!
Any questions?
Emanuele Della Valle
DEIB - Politecnico di Milano
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
