Effects of change propagation resulting from adaptive
preprocessing in multicomponent predictive systems
Manuel Martín Salvador, Marcin Budka, Bogdan Gabrys
{msalvador,mbudka,bgabrys}@bournemouth.ac.uk
Data Science Institute. Bournemouth University
KES-2016, York, UK
September 7th, 2016
Outline
1. Prologue
2. Introduction to MCPS
3. Motivation
4. Reactive adaptation of MCPS
5. Experiments
6. Conclusion
PROLOGUE
Butterfly effect
Small causes can have large effects
— Edward Lorenz (1917 - 2008)
Source: GloWings
Change propagation
Controlled change management in a system
CC by TheGiantVermin
Data streams
“Infinite” number of records
Records arrive continuously, at constant or varying rates
Can be stationary or evolving
Examples:
● Sensors in manufacturing industry
● Traffic monitoring sensors
● Event logs in websites
● Transactions in the financial sector
A single engine of an Airbus A320 has more than 1000 sensors generating 10 GB/s!
INTRODUCTION TO MCPS
Data stream learning for online prediction
[Diagram: the Data Stream feeds a Predictive Model that outputs Predictions; True labels feed an Online Supervised Learning Algorithm that updates the Predictive Model. Time labels: t and t+k]
Multicomponent Predictive System (MCPS)
[Diagram: Data Stream → Preprocessing → Predictive Model → Postprocessing → Predictions]
MCPS composition
Manual
● WEKA
● RapidMiner
● KNIME
● IBM SPSS
Automatic
● Auto-WEKA (Bayesian optimisation)
● Auto-sklearn (Bayesian optimisation + Meta-learning)
● TPOT (Genetic programming)
● e-Lico IDA (Ontologies + Planning)
[Figure: example of a WEKA workflow]
Formalising MCPS
Well-handled and Acyclic Workflow Petri net (WA-WF-net)
MCPS = (P, T, F)
● P: places, which hold tokens; i is the input place and o is the output place
● T: transitions, i.e. the components (in the example: Replace missing values, Dimensionality reduction, Outlier handling, Classifier)
● F: arcs connecting places and transitions
● Tokens: data enters at i; the prediction arrives at o
“Automatic composition and optimisation of multicomponent predictive systems”
@ IEEE TNNLS (under review) http://bit.ly/automatic-mcps-tnnls
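A minimal sketch of how the tuple (P, T, F) could be represented and executed. It is not taken from the paper: the class `WorkflowNet`, its methods, and the two toy components (`impute`, `classify`) are hypothetical names chosen for illustration, and the firing rule is the simplest one that works for a sequential net.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WorkflowNet:
    """Toy workflow net: places hold tokens, transitions are components."""
    places: set[str]                                   # P
    transitions: dict[str, Callable[[Any], Any]]       # T: name -> component
    flow: set[tuple[str, str]]                         # F: arcs (source, target)
    marking: dict[str, list[Any]] = field(default_factory=dict)

    def inputs(self, t: str) -> list[str]:
        return sorted(src for (src, dst) in self.flow if dst == t)

    def outputs(self, t: str) -> list[str]:
        return sorted(dst for (src, dst) in self.flow if src == t)

    def enabled(self, t: str) -> bool:
        # A transition is enabled when every input place holds a token.
        return all(self.marking.get(p) for p in self.inputs(t))

    def fire(self, t: str) -> None:
        # Consume one token per input place, apply the component,
        # and put the result in every output place.
        tokens = [self.marking[p].pop(0) for p in self.inputs(t)]
        result = self.transitions[t](*tokens)
        for p in self.outputs(t):
            self.marking.setdefault(p, []).append(result)

    def run(self, token: Any) -> Any:
        # Place a data token in "i" and fire until a token reaches "o".
        self.marking = {"i": [token]}
        while not self.marking.get("o"):
            ready = [t for t in self.transitions if self.enabled(t)]
            if not ready:
                raise RuntimeError("net is stuck before reaching place 'o'")
            self.fire(ready[0])
        return self.marking["o"].pop(0)


# Hypothetical two-component MCPS: imputation followed by a toy classifier.
net = WorkflowNet(
    places={"i", "p1", "o"},
    transitions={
        "impute": lambda x: [0.0 if v is None else v for v in x],
        "classify": lambda x: "positive" if sum(x) > 0 else "negative",
    },
    flow={("i", "impute"), ("impute", "p1"), ("p1", "classify"), ("classify", "o")},
)
print(net.run([1.5, None, -0.5]))
```

Keeping the arcs in an explicit set, rather than hard-coding the component order, means the same `run` loop works for any acyclic arrangement of the components.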
MOTIVATION
Data changes over time
Snapshot of SYN dataset at different times
Need for model adaptation
Streaming error (mean over last 10 samples)
SYN dataset with GFMM classifier
Pipeline: Z-Score → PCA → Min-Max → GFMM
Figure annotation: wrongly classified samples
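The streaming error plotted on these slides is a mean over the last 10 samples. One way to compute such a windowed error is sketched below; the class name `StreamingError` and its interface are assumptions for illustration, not the authors' code.

```python
from collections import deque


class StreamingError:
    """Mean 0/1 error over the most recent `window` predictions."""

    def __init__(self, window: int = 10):
        # A bounded deque drops the oldest entry automatically.
        self.errors = deque(maxlen=window)

    def update(self, y_true, y_pred) -> float:
        self.errors.append(0.0 if y_true == y_pred else 1.0)
        return sum(self.errors) / len(self.errors)


tracker = StreamingError(window=10)
for _ in range(10):
    current = tracker.update("a", "a")   # ten correct predictions
for _ in range(5):
    current = tracker.update("a", "b")   # five wrong ones push out five correct ones
print(current)
```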
Need for preprocessing adaptation
Streaming error (mean over last 10 samples)
SYN dataset with GFMM classifier
Pipeline: Z-Score → PCA → Min-Max → GFMM
Figure annotations: wrongly classified samples (out of [0,1]); new hyperboxes
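The "out of [0,1]" problem can be reproduced with a few lines. This is a sketch under assumptions: the range [-3.1, 2.7] and the new value 3.6 are borrowed from the later slides, the middle training value 0.4 is invented, and the helper names are hypothetical.

```python
def fit_min_max(values):
    """Return the (min, max) range seen in the training values."""
    return min(values), max(values)


def min_max_scale(x, lo, hi):
    """Scale x linearly so that lo maps to 0 and hi maps to 1."""
    return (x - lo) / (hi - lo)


initial = [-3.1, 0.4, 2.7]            # illustrative initial training values
lo, hi = fit_min_max(initial)

# With a frozen scaler, a later value outside the fitted range leaves [0, 1].
frozen = min_max_scale(3.6, lo, hi)

# If the scaler is adapted to include the new value, it is back in range.
lo2, hi2 = fit_min_max(initial + [3.6])
adapted = min_max_scale(3.6, lo2, hi2)

print(frozen > 1.0, adapted)
```

Adapting the scaler fixes the range violation, but it also changes where every earlier value lands in [0, 1], so a model trained on the old scaling no longer matches its input.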
Main strategies for MCPS adaptation
Adaptation strategies                                     GLOBAL   LOCAL
Re-composition                                            Full     Partial
Hyperparameter optimisation (keep components)             Full     Partial
Parameterisation (keep components and hyperparameters)    Full     Partial
“Adapting Multicomponent Predictive Systems using Hybrid Adaptation
Strategies with Auto-WEKA in Process Industry” @ AutoML / ICML 2016
http://bit.ly/adapting-mcps-paper
This work!
Need for change propagation
Streaming error (mean over last 10 samples)
SYN dataset with GFMM classifier
Pipeline: Z-Score → PCA → Min-Max → GFMM
Inconsistent hyperboxes due to a different input space
REACTIVE ADAPTATION OF MCPS
Reactive adaptation of MCPS
Pipeline: Z-Score → PCA → Min-Max → GFMM
Places over time: i, p1, p2, p3, o
Tokens: data, meta-data, prediction
Example: the range of x1 is [-3.1, 2.7]; a new value x1 = 3.6 extends it to [-3.1, 3.6]
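The idea of sending a meta-data token ahead of the data token can be sketched as follows. Everything here is an illustrative assumption rather than the authors' implementation: the token classes `Data` and `MetaData`, the components `AdaptiveMinMax` and `RangeAwareModel`, and the rule that meta-data is emitted before the data token that triggered it. The downstream model is a stand-in that only records the range it believes its input has.

```python
from dataclasses import dataclass


@dataclass
class Data:
    value: float


@dataclass
class MetaData:
    new_range: tuple[float, float]


class AdaptiveMinMax:
    """Min-Max scaler that widens its range and announces the change."""

    def __init__(self, lo: float, hi: float):
        self.lo, self.hi = lo, hi

    def process(self, token):
        if isinstance(token, MetaData):
            return [token]                     # pass upstream meta-data through
        out = []
        if token.value < self.lo or token.value > self.hi:
            self.lo = min(self.lo, token.value)
            self.hi = max(self.hi, token.value)
            out.append(MetaData((self.lo, self.hi)))   # emitted before the data
        out.append(Data((token.value - self.lo) / (self.hi - self.lo)))
        return out


class RangeAwareModel:
    """Stand-in for a model that must know the range of its input."""

    def __init__(self, input_range):
        self.input_range = input_range
        self.adaptations = 0

    def process(self, token):
        if isinstance(token, MetaData):
            self.input_range = token.new_range  # react to the upstream change
            self.adaptations += 1
            return []                           # meta-data is consumed here
        return [Data(float(token.value >= 0.5))]


def run(pipeline, token):
    """Push one token through the components, keeping token order."""
    tokens = [token]
    for component in pipeline:
        tokens = [out for tok in tokens for out in component.process(tok)]
    return tokens


scaler = AdaptiveMinMax(-3.1, 2.7)
model = RangeAwareModel((-3.1, 2.7))
result = run([scaler, model], Data(3.6))
print(model.input_range, result)
```

Because the meta-data token reaches the model before the rescaled value does, the model can update itself first and never sees data expressed in a range it does not know about.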
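Updating a component: GFMM
[Figure: hyperboxes in the normalised (x1, x2) unit square before and after the update; the x1 axis from 0 to 1 corresponds to the raw range [-3.1, 2.7] before and [-3.1, 3.6] after]
Hyperboxes are mapped to the new input space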
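The slides do not give the mapping formula. A plausible sketch, assuming the only upstream change is a wider Min-Max range, is to convert each hyperbox corner back to raw units with the old range and rescale it with the new one. The function names and the example hyperbox are hypothetical, and a real GFMM model stores more than corner coordinates.

```python
def remap(v, old, new):
    """Move a coordinate from the old normalised space to the new one."""
    raw = v * (old[1] - old[0]) + old[0]       # undo the old Min-Max scaling
    return (raw - new[0]) / (new[1] - new[0])  # apply the new Min-Max scaling


def remap_hyperbox(box, old_ranges, new_ranges):
    """box = (min_corner, max_corner); one (lo, hi) range per dimension."""
    lo, hi = box
    return (
        tuple(remap(v, o, n) for v, o, n in zip(lo, old_ranges, new_ranges)),
        tuple(remap(v, o, n) for v, o, n in zip(hi, old_ranges, new_ranges)),
    )


# x1 widens from [-3.1, 2.7] to [-3.1, 3.6]; x2 keeps an illustrative [0, 1].
old_ranges = [(-3.1, 2.7), (0.0, 1.0)]
new_ranges = [(-3.1, 3.6), (0.0, 1.0)]
box = ((0.5, 0.2), (1.0, 0.6))                 # illustrative hyperbox

new_box = remap_hyperbox(box, old_ranges, new_ranges)
print(new_box)
```

After the remap the box shrinks along x1 and no longer touches the upper bound, which leaves room in the unit square for the new value 3.6.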
EXPERIMENTS
Experiments
Datasets
Name        # Attr   # Class   Type
SYN         2        2         Synthetic
ELEC        7        2         Real
COVERTYPE   54       7         Real
GAS         128      6         Real
Scenarios
Id   Adap. Model   Adap. Prepro.   Change Propagation
#1   No            No              No
#2   Yes           No              No
#3   Yes           Yes             No
#4   Yes           Yes             Yes
First 200 samples for initial training; the remaining 400 for testing and online learning
Pipeline: Z-Score → PCA → Min-Max → GFMM
Results
Scenario #3 crashes due to the lack of change propagation when the PCA components change
CONCLUSION
Conclusion
Model adaptation alone may not be enough to cope with evolving data streams;
adaptive preprocessing should also be considered.
However, “blind” adaptation of components can result in inconsistent models or
even in a system crash.
Local adaptation of a component may require adapting further components.
Therefore, a system must be reactive and propagate changes.
The definition of MCPS has been extended to support change propagation using
a new token for meta-data in a coloured Petri net (cMCPS).
Future work
Large study to measure the actual cost of adaptation.
Open questions:
● How to handle propagation requiring changes of the Petri net structure?
● How to handle transformations in systems with nonlinear components?
● How to order components to reduce the cost of adaptation?
● Can a meta-data token be removed at an early stage instead of being fully
propagated?
Thanks!
Paper: http://bit.ly/change-propagation-mcps
Slides: http://www.slideshare.net/draxus
Manuel <msalvador@bournemouth.ac.uk>
@draxus

