BNAIC 2015, November 5-6, 2015
AMIDST Toolbox
A Java library for Analysis of MassIve Data Streams using
Probabilistic Graphical Models
FP7 European research project
http://amidst.eu
Anders L. Madsen, Andres R. Masegosa, Ana M. Martinez, Hanen Borchani, Thomas D. Nielsen, Helge Langseth, Antonio Salmeron, Dario Ramos-Lopez.
Outline
1. Overview of AMIDST Toolbox
o Why are data streams important?
o Why PGMs for analyzing data streams?
o Scalable inference (and learning)
o Roadmap for upcoming releases
2. Live demo: Modeling concept drift in financial data
o Handling data streams
o Defining Bayesian networks with hidden variables
o Inference and learning in Bayesian networks
Scope
Part I
Data Streams Everywhere
• Unbounded flows of data are generated daily:
• Social networks
• Network monitoring
• Financial/banking industry
• …
Data Stream Processing
• Processing data streams is challenging:
– The data do not fit in main memory
– Continuous model updating
– Continuous model inference
– Concept drift
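A stream cannot be stored, so a model has to be updated from summaries that stay the same size no matter how much data has arrived. As a minimal illustration, Welford's algorithm keeps a running mean and variance in a single pass using constant memory. This is not AMIDST code, and the class name `RunningStats` is hypothetical.

```java
/** One-pass running mean and variance (Welford's algorithm); memory use is O(1). */
final class RunningStats {
    private long n = 0;      // number of observations seen so far
    private double mean = 0; // running mean
    private double m2 = 0;   // running sum of squared deviations from the mean

    /** Incorporates one new observation, which can then be discarded. */
    void update(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long count() { return n; }

    double mean() { return mean; }

    /** Unbiased sample variance; NaN until at least two observations have arrived. */
    double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.update(x); // each value is seen once and never stored
        }
        System.out.println(stats.mean() + " " + stats.variance());
    }
}
```

The estimator weights all past data equally. On its own, therefore, it does not cope with concept drift.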
Processing Massive Data Streams
• Everything has to scale:
• Scalable computing infrastructure
• Scalable models/inference/learning
AMIDST Toolbox
• Scalable framework for data stream processing.
• Based on probabilistic graphical models (PGMs).
• Unique project for data stream mining using PGMs.
• Open source project (Apache Software License 2.0).
AMIDST EU Project
§ This toolbox aims to deal with real, complex and massive data streams.
§ It is applied to real use cases of AMIDST's industrial partners.
Toolbox Web Page
http://amidst.github.io/toolbox/
Why PGMs for
data stream processing?
Part II
Why Graphical Models?
§ Consider the following simple example:
§ A stream of sensor measurements of temperature and smoke presence in a given geographical area.
§ Monitor the stream to detect the presence of a fire (an event detection problem).
§ Cast the problem as an anomaly detection problem (outliers).
§ Streaming k-means (widely used in industry).
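Streaming k-means comes in several variants. The sketch below uses plain sequential (online) k-means. Each arriving point moves its nearest centroid by a step of 1/count, which keeps each centroid equal to the running mean of the points it has absorbed. A point is flagged as an anomaly when its distance to the nearest centroid exceeds a threshold.

The class name, the initial centroids and the threshold are all hypothetical. Choosing k and the threshold is exactly the kind of hyper-parameter tuning that black-box models require.

```java
import java.util.Arrays;

/** Sequential (online) k-means with distance-based anomaly flagging. */
final class StreamingKMeans {
    private final double[][] centroids; // current cluster centres
    private final long[] counts;        // number of points absorbed by each centre
    private final double threshold;     // hypothetical anomaly distance threshold

    StreamingKMeans(double[][] initialCentroids, double threshold) {
        this.centroids = new double[initialCentroids.length][];
        for (int k = 0; k < initialCentroids.length; k++) {
            this.centroids[k] = initialCentroids[k].clone();
        }
        this.counts = new long[initialCentroids.length];
        this.threshold = threshold;
    }

    private static double distance(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Processes one point and returns true if it is flagged as an anomaly. */
    boolean process(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = distance(x, centroids[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        if (bestDist > threshold) {
            return true; // anomaly: reported, but not absorbed into the model
        }
        counts[best]++;
        double rate = 1.0 / counts[best]; // 1/n step keeps the centre at the running mean
        for (int j = 0; j < x.length; j++) {
            centroids[best][j] += rate * (x[j] - centroids[best][j]);
        }
        return false;
    }

    double[] centroid(int k) { return centroids[k].clone(); }

    public static void main(String[] args) {
        // Hypothetical (temperature, smoke level) readings; the last one looks like a fire.
        StreamingKMeans model = new StreamingKMeans(new double[][] {{20, 0}, {30, 0}}, 15.0);
        double[][] stream = {{21, 0.1}, {19, 0.0}, {29, 0.2}, {31, 0.1}, {80, 1.0}};
        for (double[] x : stream) {
            System.out.println(Arrays.toString(x) + " anomaly=" + model.process(x));
        }
    }
}
```

The model can say that a reading is unusual, but it cannot say why, for example whether the temperature, the smoke level or both caused the flag.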
Why Graphical Models for
Analyzing Data Streams?
§ Many data stream models are black-box models:
§ Pros:
§ No need to understand the problem.
§ Cons:
§ Many hyper-parameters to tune.
§ Black-box models can rarely explain what they have learned.
[Figure: Stream → Black-box model → Predictions]
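Why Graphical Models?
§ Bayesian networks:
§ Open-box models.
§ Encode prior knowledge.
§ Handle continuous and discrete variables (conditional linear Gaussian, CLG, networks).
§ Example:
[Figure: Bayesian network with Fire at the top, Temp and Smoke below it, and the sensor readings T1, T2, T3 and S1 at the bottom]
p(Fire=true|t1,t2,t3,s1)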
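To make the query p(Fire=true|t1,t2,t3,s1) concrete, here is a minimal sketch of exact inference by enumeration. It assumes a fully discrete version of the network. The slide mentions CLG networks with continuous variables; discrete variables are swapped in here to keep the arithmetic simple.

The sketch also assumes the edges Fire → Temp, Fire → Smoke, Temp → T1, T2, T3 and Smoke → S1. The class `FireNetwork` and every probability are hypothetical.

```java
/**
 * Exact inference by enumeration in a small, fully discrete fire-detection network.
 * Assumed structure: Fire -> Temp, Fire -> Smoke, Temp -> T1, T2, T3, Smoke -> S1.
 * All probabilities are hypothetical.
 */
final class FireNetwork {
    static final double P_FIRE = 0.01;
    static final double[] P_TEMP_HIGH = {0.1, 0.9};   // P(Temp = high | Fire = false/true)
    static final double[] P_SMOKE = {0.05, 0.95};     // P(Smoke = present | Fire = false/true)
    static final double[] P_SENSOR_HIGH = {0.1, 0.8}; // P(Ti reads high | Temp = normal/high)
    static final double[] P_ALARM = {0.05, 0.9};      // P(S1 alarms | Smoke = absent/present)

    private static double bernoulli(double pTrue, boolean value) {
        return value ? pTrue : 1.0 - pTrue;
    }

    /** p(Fire = fire, evidence), summing out the hidden variables Temp and Smoke. */
    private static double joint(boolean fire, boolean[] tempReadings, boolean alarm) {
        int f = fire ? 1 : 0;
        // Given Fire, the Temp branch and the Smoke branch are independent,
        // so each can be summed out separately and the results multiplied.
        double tempTerm = 0;
        for (boolean tempHigh : new boolean[] {false, true}) {
            double p = bernoulli(P_TEMP_HIGH[f], tempHigh);
            for (boolean reading : tempReadings) {
                p *= bernoulli(P_SENSOR_HIGH[tempHigh ? 1 : 0], reading);
            }
            tempTerm += p;
        }
        double smokeTerm = 0;
        for (boolean smoke : new boolean[] {false, true}) {
            smokeTerm += bernoulli(P_SMOKE[f], smoke) * bernoulli(P_ALARM[smoke ? 1 : 0], alarm);
        }
        return bernoulli(P_FIRE, fire) * tempTerm * smokeTerm;
    }

    /** p(Fire = true | t1, t2, t3, s1). */
    static double posteriorFire(boolean[] tempReadings, boolean alarm) {
        double yes = joint(true, tempReadings, alarm);
        double no = joint(false, tempReadings, alarm);
        return yes / (yes + no);
    }

    public static void main(String[] args) {
        System.out.println(posteriorFire(new boolean[] {true, true, true}, true));
        System.out.println(posteriorFire(new boolean[] {false, false, false}, false));
    }
}
```

With these numbers, three high temperature readings plus a smoke alarm raise p(Fire=true) from the prior 0.01 to roughly 0.45. Every intermediate quantity has a direct interpretation, which is what makes the model "open-box".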
Why Graphical Models?
[Figure: Stream → Open-box models, run by a black-box inference engine (multi-core parallelization) → Predictions]
Inference Engine
Part III
Inference Engine
§ Querying the model:
§ p(Fire=true|t1,t2,t3,s1,season)
§ E(Temperature|smoke=true)
§ Learning from data (using a Bayesian approach):
§ The Bayesian framework naturally deals with data streams.
§ The posterior given the data seen so far serves as the prior when new data arrive:
$$p(\theta \mid d_1, \ldots, d_n, d_{n+1}) \propto p(d_{n+1} \mid \theta)\, p(\theta \mid d_1, \ldots, d_n)$$
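For a conjugate model this recursion has a closed form. Consider a minimal sketch with a stream of binary events (for example, whether a client defaults) and a Beta(a, b) prior on the event probability θ. After each observation the posterior is again a Beta distribution, so it can serve directly as the prior for the next observation, and no past data needs to be kept. The class name and the uniform prior are hypothetical choices.

```java
/** Streaming Bayesian updating of a Beta(a, b) prior with Bernoulli observations. */
final class BetaBernoulliStream {
    private double a; // pseudo-count of "true" observations
    private double b; // pseudo-count of "false" observations

    BetaBernoulliStream(double a, double b) {
        this.a = a;
        this.b = b;
    }

    /**
     * p(theta | d_1..d_n+1) is proportional to p(d_n+1 | theta) p(theta | d_1..d_n).
     * For a Beta prior and a Bernoulli likelihood this just increments one count.
     */
    void observe(boolean d) {
        if (d) {
            a += 1;
        } else {
            b += 1;
        }
    }

    double posteriorMean() { return a / (a + b); }

    double alpha() { return a; }

    double beta() { return b; }

    public static void main(String[] args) {
        BetaBernoulliStream model = new BetaBernoulliStream(1, 1); // uniform prior on theta
        boolean[] stream = {true, false, false, true, false};
        for (boolean d : stream) {
            model.observe(d);
            System.out.printf("Beta(%.0f, %.0f), mean = %.3f%n",
                    model.alpha(), model.beta(), model.posteriorMean());
        }
    }
}
```

Processing the five observations one at a time ends in the same Beta(3, 4) posterior as processing them in one batch. This is why the Bayesian update fits streaming data so naturally.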
Querying the model
§ Parallel Monte Carlo inference [Salmeron et al. CAEPIA 2015]
§ Exploits multi-core CPUs (powered by Java 8)
§ Variational message passing [Winn et al. JMLR 2005]
§ Deterministic approximation
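The parallel Monte Carlo method cited above performs importance sampling in CLG networks; the slide gives no further details. As a rough illustration of how Java 8 parallel streams can spread sampling over cores, the sketch below swaps in likelihood weighting, the simplest importance sampler. It runs on a small hypothetical discrete fire network: Fire → Temp → T1, T2, T3 and Fire → Smoke → S1.

This is not the AMIDST sampler. The class name and all probabilities are hypothetical.

```java
import java.util.SplittableRandom;
import java.util.stream.IntStream;

/**
 * Likelihood weighting (a basic importance sampler) parallelized with Java 8 streams.
 * Hypothetical network: Fire -> Temp -> {T1, T2, T3}, Fire -> Smoke -> S1.
 * Evidence: all three temperature sensors read high and the smoke sensor alarms.
 */
final class ParallelLikelihoodWeighting {
    static final double P_FIRE = 0.01;
    static final double[] P_TEMP_HIGH = {0.1, 0.9};   // P(Temp = high | Fire = false/true)
    static final double[] P_SMOKE = {0.05, 0.95};     // P(Smoke = present | Fire = false/true)
    static final double[] P_SENSOR_HIGH = {0.1, 0.8}; // P(Ti reads high | Temp = normal/high)
    static final double[] P_ALARM = {0.05, 0.9};      // P(S1 alarms | Smoke = absent/present)

    /** Draws one sample; returns {weight, weight if Fire = true else 0}. */
    private static double[] weightedSample(SplittableRandom rng) {
        // Sample the unobserved variables top-down from their conditional distributions.
        boolean fire = rng.nextDouble() < P_FIRE;
        int f = fire ? 1 : 0;
        boolean tempHigh = rng.nextDouble() < P_TEMP_HIGH[f];
        boolean smoke = rng.nextDouble() < P_SMOKE[f];
        // Instead of sampling the evidence, weight the sample by its likelihood.
        double w = Math.pow(P_SENSOR_HIGH[tempHigh ? 1 : 0], 3) * P_ALARM[smoke ? 1 : 0];
        return new double[] {w, fire ? w : 0.0};
    }

    /** Estimates p(Fire = true | evidence) from n weighted samples. */
    static double estimate(int n, long seed) {
        double[] totals = IntStream.range(0, n)
                .parallel() // samples are independent, so they can be spread over all cores
                // One generator per sample index makes the drawn samples independent of scheduling.
                .mapToObj(i -> weightedSample(new SplittableRandom(seed + i)))
                .reduce(new double[] {0.0, 0.0},
                        (x, y) -> new double[] {x[0] + y[0], x[1] + y[1]});
        return totals[1] / totals[0];
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000, 42L));
    }
}
```

For these numbers, exact enumeration gives p(Fire=true | evidence) ≈ 0.453, and the estimate should land close to that value. Its spread shrinks roughly as 1/√n, and the run time falls with the number of available cores.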
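Learning from data streams
§ Bayesian approach:
§ Learning as an inference problem.
§ Powered by VMP.
[Figure: plate model with hyperparameter α, global parameters θ, and a hidden variable Z and an observed variable x inside a plate indexed by i = 1 … N]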
§ Plate notation.
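Casting learning as inference means treating the parameters as hidden variables and computing their posterior. The minimal sketch below is not AMIDST's VMP engine. It covers the textbook model x_i ~ N(μ, 1/τ) with independent priors μ ~ N(m0, 1/p0) and τ ~ Gamma(a0, b0).

Under the mean-field assumption q(μ, τ) = q(μ) q(τ), each factor is updated in turn given the other. For this two-parameter conjugate model, these are the same updates VMP computes as messages between nodes. The class name and the prior values in the usage example are hypothetical.

```java
/**
 * Mean-field variational Bayes for x_i ~ N(mu, 1/tau) with independent priors
 * mu ~ N(m0, 1/p0) and tau ~ Gamma(a0, b0) (shape a0, rate b0).
 * The factors q(mu) = N(m, 1/p) and q(tau) = Gamma(a, b) are updated in turn.
 */
final class GaussianVB {
    private double m, p; // q(mu): mean and precision
    private double a, b; // q(tau): shape and rate

    GaussianVB(double[] data, double m0, double p0, double a0, double b0, int iterations) {
        int n = data.length;
        double sum = 0, sumSq = 0; // sufficient statistics: all the updates need from the data
        for (double x : data) {
            sum += x;
            sumSq += x * x;
        }
        a = a0 + n / 2.0; // the shape of q(tau) depends only on n
        b = b0;           // crude initialization of the rate
        for (int it = 0; it < iterations; it++) {
            double expectedTau = a / b;            // E_q[tau]
            p = p0 + n * expectedTau;              // update q(mu) given E_q[tau]
            m = (p0 * m0 + expectedTau * sum) / p;
            // E_q[sum_i (x_i - mu)^2] under the current q(mu)
            double expectedSqErr = sumSq - 2 * m * sum + n * (m * m + 1.0 / p);
            b = b0 + 0.5 * expectedSqErr;          // update q(tau) given q(mu)
        }
    }

    double posteriorMeanOfMu() { return m; }

    double posteriorMeanOfTau() { return a / b; }

    public static void main(String[] args) {
        GaussianVB vb = new GaussianVB(new double[] {1, 2, 3, 4, 5}, 0, 1e-6, 1e-3, 1e-3, 50);
        System.out.println(vb.posteriorMeanOfMu() + " " + vb.posteriorMeanOfTau());
    }
}
```

For the data {1, 2, 3, 4, 5} with vague priors, E_q[μ] converges to about 3. E_q[τ] converges to about 0.40, a little below the maximum-likelihood precision of 0.5, because the remaining uncertainty about μ is taken into account. The updates only need n, Σx and Σx², so they can also run on summaries of a stream.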
Learning from data streams
§ Parallel streaming variational Bayes [Broderick et al. NIPS 2013]
§ Powered by variational message passing.
§ Multi-core processing (using Java 8).
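Streaming variational Bayes uses the approximate posterior after each batch as the prior for the next batch. Its parallel variant lets workers compute updates to the posterior's natural parameters, which are then added together.

The idea is easiest to see in a conjugate model, where each batch update is exact rather than variational. Take x ~ N(μ, σ²) with known σ², and store the belief over μ as (precision × mean, precision). Every observation then adds (x/σ², 1/σ²). The sketch below sums one batch's contributions with a Java 8 parallel stream. The class name is hypothetical, and σ² is assumed known.

```java
import java.util.Arrays;

/**
 * Streaming Bayesian updating in natural parameters for x ~ N(mu, sigma2), sigma2 known,
 * with a Gaussian belief over mu stored as (precision * mean, precision).
 */
final class StreamingGaussianMean {
    private final double sigma2; // known observation variance (assumed)
    private double eta1;         // precision * mean of the current belief over mu
    private double eta2;         // precision of the current belief over mu

    StreamingGaussianMean(double priorMean, double priorPrecision, double sigma2) {
        this.sigma2 = sigma2;
        this.eta1 = priorPrecision * priorMean;
        this.eta2 = priorPrecision;
    }

    /** The posterior after this batch becomes the prior for the next one. */
    void processBatch(double[] batch) {
        // Partial sums of x / sigma2 are computed on several cores and then added,
        // mirroring workers that each contribute a natural-parameter update.
        double deltaEta1 = Arrays.stream(batch).parallel().map(x -> x / sigma2).sum();
        double deltaEta2 = batch.length / sigma2;
        eta1 += deltaEta1;
        eta2 += deltaEta2;
    }

    double mean() { return eta1 / eta2; }

    double variance() { return 1.0 / eta2; }

    public static void main(String[] args) {
        StreamingGaussianMean model = new StreamingGaussianMean(0.0, 1.0, 1.0);
        model.processBatch(new double[] {1, 2, 3});
        model.processBatch(new double[] {4, 5});
        System.out.println(model.mean() + " " + model.variance());
    }
}
```

Because the updates simply add up, processing {1, 2, 3} and then {4, 5} gives the same posterior as processing all five values at once. In non-conjugate models, each batch step is replaced by a variational approximation, such as one computed with VMP.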
Links to other open software
§ MoaLink
§ MOA is a state-of-the-art tool for data stream mining.
§ AMIDST models can be used within the MOA GUI!
§ Great for evaluation & comparison.
Links to other open software
§ HuginLink
§ Hugin is commercial software for PGMs and influence diagrams.
§ Model conversion.
§ The Hugin inference engine can be used within AMIDST.
Road Map
Part IV
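Dynamic Bayesian Networks
(release 1.1)
§ Encode temporal knowledge.
§ Naturally fit data streams.
[Figure: two-slice dynamic Bayesian network with Fire(t), Temp(t), Smoke(t) and the sensor readings T1(t), T2(t), T3(t), S1(t) in the current slice, plus Fire(t-1) and Temp(t-1) from the previous slice]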
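Here is a minimal sketch of how a dynamic model accumulates evidence over time. It assumes a much smaller two-slice network than the one in the figure: a hidden Fire(t) that depends on Fire(t-1), and a single smoke alarm observed in each slice.

Exact filtering then reduces to the forward recursion of a hidden Markov model: predict through the transition, then condition on the new reading. The class `FireFilter` and all probabilities are hypothetical.

```java
/**
 * Forward filtering in a two-slice dynamic Bayesian network reduced to its simplest form:
 * a hidden Fire(t) with a Fire(t-1) -> Fire(t) link and one smoke alarm observed per slice.
 * All probabilities are hypothetical.
 */
final class FireFilter {
    static final double[] P_FIRE_GIVEN_PREV = {0.01, 0.9}; // P(Fire(t)=1 | Fire(t-1)=0/1)
    static final double[] P_ALARM_GIVEN_FIRE = {0.05, 0.9}; // P(alarm(t)=1 | Fire(t)=0/1)

    private double belief; // filtered belief p(Fire(t) = 1 | alarms up to t)

    FireFilter(double initialBelief) {
        this.belief = initialBelief;
    }

    /** Advances one time slice, conditions on the new alarm reading and returns the belief. */
    double step(boolean alarm) {
        // Predict: push the previous belief through the transition model.
        double predicted = (1 - belief) * P_FIRE_GIVEN_PREV[0] + belief * P_FIRE_GIVEN_PREV[1];
        // Update: weight each state by the likelihood of the reading, then normalize.
        double likeFire = alarm ? P_ALARM_GIVEN_FIRE[1] : 1 - P_ALARM_GIVEN_FIRE[1];
        double likeNoFire = alarm ? P_ALARM_GIVEN_FIRE[0] : 1 - P_ALARM_GIVEN_FIRE[0];
        double num = predicted * likeFire;
        belief = num / (num + (1 - predicted) * likeNoFire);
        return belief;
    }

    public static void main(String[] args) {
        FireFilter filter = new FireFilter(0.01);
        for (boolean alarm : new boolean[] {true, true, false}) {
            System.out.printf("alarm=%b -> p(Fire) = %.3f%n", alarm, filter.step(alarm));
        }
    }
}
```

With these numbers, one alarm raises the belief in a fire from 0.01 to about 0.26, and a second consecutive alarm raises it to about 0.85. A static model would instead score every reading in isolation.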
Distributed Stream Processing
(release 1.1)
§ RLink
§ Invoke the AMIDST inference engine from within R.
§ Preliminary functionality has recently been presented.
Distributed Stream Processing
(release 2.0)
§ FlinkLink
§ Apache Flink: an open source platform for distributed stream processing.
§ Handling massive data streams.
Open Source project
§ We're open to your contributions!! ;)
Hosted on GitHub
§ Download:
:> git clone https://github.com/amidst/toolbox.git
§ Compile:
:> ./compile.sh
§ Run:
:> ./run.sh <class-name>
Please “star” our project!
(if you like it)
Any questions
before the live demo?
Live Demo
Tracking concept drift in
financial data with AMIDST
Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.
Demo code available on GitHub
eu.amidst.bnaic2015.examples.BCC
Financial Data
§ Provided by BCC (a Spanish regional bank).
§ Consists of monthly aggregated information:
§ Active clients between 18 and 65 years old.
§ Data between April 2007 and March 2014.
§ 11 variables:
§ Income, total credit, expenses, etc.
§ Each client is classified as:
§ Defaulter/non-defaulter in the following 12 months.
Financial Data
§ Hypothesis:
§ Does the Spanish financial crisis have an impact on bank customers?
§ Look at the evolution of the regional unemployment rate.
Data Preprocessing/Visualization
§ Visualize the evolution of the monthly aggregated data:
§ The data do not fit in main memory!
Model Building
§ We use a simple Naïve Bayes model:
§ With a global hidden variable to track concept drift.
[Figure: Naïve Bayes model with class variable D, attributes A1, A2, …, A11, and a global hidden variable H]
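Model Building
§ We also use plate notation.
§ "H" is designed to capture concept drift.
[Figure: the same model in plate notation, with D and A1, A2, …, A11 inside a plate indexed by i = 1 … M, hidden variables H(t-1) and H(t), and parameters θ]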
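The full model is not spelled out on these slides, so the following is only a stripped-down illustration of one idea: a hidden variable H linked across time, H(t-1) → H(t), can absorb drift. Assume H(t) follows a Gaussian random walk, and that in month t each of the M observed values of a single attribute equals H(t) plus Gaussian noise. The class variable D and the other attributes are left out.

Exact filtering in this linear Gaussian model is a scalar Kalman filter. The class name `DriftTracker` and the variances q and r are hypothetical.

```java
/**
 * Scalar Kalman filter for a latent level H(t) that follows a Gaussian random walk,
 * H(t) = H(t-1) + N(0, q), observed through M noisy values per time step,
 * y(t, i) = H(t) + N(0, r).
 */
final class DriftTracker {
    private final double q;  // variance of the drift from one time step to the next (assumed)
    private final double r;  // variance of a single observation around H(t) (assumed)
    private double level;    // E[H(t) | data up to t]
    private double levelVar; // Var[H(t) | data up to t]

    DriftTracker(double priorMean, double priorVar, double q, double r) {
        this.level = priorMean;
        this.levelVar = priorVar;
        this.q = q;
        this.r = r;
    }

    /** Processes one time step (e.g. one month) of data; only its size and sum are used. */
    void processMonth(double[] values) {
        // Predict: H may have drifted since the last step, so its uncertainty grows by q.
        double predVar = levelVar + q;
        int m = values.length;
        if (m == 0) { // no data this month: keep the prediction
            levelVar = predVar;
            return;
        }
        double sum = 0;
        for (double y : values) {
            sum += y;
        }
        // Update: M i.i.d. observations act like one observation of their mean with variance r / M.
        double batchMean = sum / m;
        double gain = predVar / (predVar + r / m);
        level += gain * (batchMean - level);
        levelVar = (1 - gain) * predVar;
    }

    double mean() { return level; }

    double variance() { return levelVar; }

    public static void main(String[] args) {
        DriftTracker tracker = new DriftTracker(0.0, 1.0, 0.1, 1.0);
        double[][] months = {{1.0, 0.8, 1.2}, {1.1, 0.9, 1.0}, {2.0, 2.2, 1.8}};
        for (double[] month : months) {
            tracker.processMonth(month);
            System.out.printf("H estimate %.3f (variance %.3f)%n", tracker.mean(), tracker.variance());
        }
    }
}
```

The drift variance q controls the trade-off. A larger q lets the estimate of H follow a sudden shift (such as the jump in the third month above) more quickly, at the cost of noisier estimates when nothing has changed.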
Tracking concept drift
References
§ Masegosa et al. AMIDST: Analysis of Massive Data Streams using Probabilistic Graphical Models. Submitted to JMLR, 2015.
§ Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.
§ Masegosa et al. Probabilistic graphical models on multi-core CPUs using Java 8. Submitted to IEEE Computational Intelligence Magazine, Special Issue on Computational Intelligence Software, 2015.
§ Salmeron et al. Parallel importance sampling in conditional linear Gaussian networks. In Proceedings of the Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA), in press, 2015.
§ Winn et al. Variational message passing. Journal of Machine Learning Research, 6:661–694, 2005.
§ Broderick et al. Streaming variational Bayes. In Advances in Neural Information Processing Systems, pages 1727–1735, 2013.
Any questions?
http://amidst.github.io/toolbox/