Concept Drift: Monitoring Model Quality In Streaming ML Applications

core problem
* Assumption: the training set and the data seen in operation come from the same data-generating distribution.
(Some algorithms tolerate violation of this to a certain degree.)
In a stream, this assumption can break in several ways:
• population change
• sensor failure
• concept drift
• emerging concept

common solution
• retrain the model on periodic batches
• performance can still be poor

can active learning help?
[Figure: classifier margin in the color/size feature space]

a better solution
• base pipeline: data → feature extraction → classifier → predictions
• add monitoring of the pipeline
• add labeling to supply ground truth where it is needed
• add change detection and adaptation of the classifier

monitor how?
• supervised: statistical process control, sequential analysis, error distribution monitoring
• unsupervised: clustering / novelty detection, feature distribution monitoring, model-dependent monitoring

adapt how?
• explicit mechanisms: windowing, weighting, sampling
• implicit mechanisms: pure methods, ensemble methods

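statistical process control
• Drift Detection Method [DDM]
• # of errors is Binomial: error rate $p_i$ after $i$ examples, with standard deviation $s_i = \sqrt{p_i(1 - p_i)/i}$
• alert: $p_i + s_i \ge p_{min} + 3 s_{min}$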
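Below is a minimal Python sketch of DDM. It assumes the monitored stream is a sequence of 0/1 prediction errors, where 1 means misclassified. The sketch tracks the running error rate $p_i$ and $s_i = \sqrt{p_i(1-p_i)/i}$, and it remembers the point where $p_i + s_i$ was lowest. It alerts when $p_i + s_i \ge p_{min} + 3 s_{min}$. The 2-standard-deviation warning level and the 30-example warm-up are assumed defaults, not taken from the slides.

```python
import math


class DDM:
    """Minimal Drift Detection Method sketch over a stream of 0/1 prediction errors."""

    def __init__(self, min_samples=30, warn_k=2.0, alert_k=3.0):
        self.min_samples = min_samples  # assumed warm-up before any test
        self.warn_k = warn_k            # assumed warning level, in units of s_min
        self.alert_k = alert_k          # alert level: p + s >= p_min + 3 * s_min
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed one outcome (1 = error, 0 = correct); return 'ok', 'warning' or 'drift'."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n               # running error rate (Binomial proportion)
        s = math.sqrt(p * (1 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "ok"
        if p + s < self.p_min + self.s_min:    # remember the best operating point so far
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + self.alert_k * self.s_min:
            self.reset()                       # start over after an alert
            return "drift"
        if p + s >= self.p_min + self.warn_k * self.s_min:
            return "warning"
        return "ok"


# Hypothetical error stream: the error rate jumps from 10% to 50% at index 1000.
stream = [int(i % 10 == 0) for i in range(1000)] + [int(i % 2 == 0) for i in range(1000, 2000)]
ddm = DDM()
first_drift = next(i for i, e in enumerate(stream) if ddm.update(e) == "drift")
print("first drift alert at index", first_drift)
```

The synthetic stream is only illustrative. With a 10% base error rate, the alert fires a few dozen examples after the jump.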

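• Early Drift Detection Method [EDDM]
• distance between errors: mean $p'_i$ and standard deviation $s'_i$; better for gradual drift
• warn & start caching: $(p'_i + 2s'_i)/(p'_{max} + 2s'_{max}) < \alpha$ (e.g. $\alpha = 0.95$)
• alert and reset max: $(p'_i + 2s'_i)/(p'_{max} + 2s'_{max}) < \beta$ (e.g. $\beta = 0.90$)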
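The next sketch applies the same 0/1 error-stream assumption to EDDM. Instead of the error rate, it tracks the gap between consecutive errors: the mean $p'_i$ and standard deviation $s'_i$, computed with Welford's update. It then compares $p'_i + 2s'_i$ with the largest value seen so far. The thresholds α = 0.95 and β = 0.90 and the 30-gap warm-up are assumed defaults. Caching of examples after a warning is left out.

```python
import math


class EDDM:
    """Minimal Early Drift Detection Method sketch over a stream of 0/1 prediction errors."""

    def __init__(self, min_errors=30, alpha=0.95, beta=0.90):
        self.min_errors = min_errors  # assumed number of error gaps before testing
        self.alpha = alpha            # warning threshold on the ratio
        self.beta = beta              # drift threshold on the ratio
        self.reset()

    def reset(self):
        self.t = 0              # examples seen since the last reset
        self.last_error = None  # time step of the previous error
        self.k = 0              # number of gaps observed
        self.mean = 0.0         # running mean gap p'
        self.m2 = 0.0           # Welford sum of squared deviations of the gaps
        self.max_score = 0.0    # largest p' + 2 s' seen so far

    def update(self, error):
        """Feed one outcome (1 = error, 0 = correct); return 'ok', 'warning' or 'drift'."""
        self.t += 1
        if not error:
            return "ok"
        if self.last_error is None:
            self.last_error = self.t
            return "ok"
        gap = self.t - self.last_error
        self.last_error = self.t
        self.k += 1
        delta = gap - self.mean
        self.mean += delta / self.k
        self.m2 += delta * (gap - self.mean)
        score = self.mean + 2 * math.sqrt(max(self.m2, 0.0) / self.k)
        if score > self.max_score:
            self.max_score = score  # new reference maximum
            return "ok"
        if self.k < self.min_errors:
            return "ok"
        ratio = score / self.max_score
        if ratio < self.beta:
            self.reset()            # alert and reset the maximum
            return "drift"
        if ratio < self.alpha:
            return "warning"        # a full version would start caching examples here
        return "ok"


# Hypothetical error stream: errors every 10 steps, then every 2 steps from index 1000.
stream = [int(i % 10 == 0) for i in range(1000)] + [int(i % 2 == 0) for i in range(1000, 2000)]
eddm = EDDM()
first_drift = next(i for i, e in enumerate(stream) if eddm.update(e) == "drift")
print("first drift alert at index", first_drift)
```

Right after the change, the spread of the gaps briefly raises $p' + 2s'$. This pushes the reference maximum up before the ratio falls below β, so in this example the alert comes some time after the jump.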

sequential analysis
• Linear Four Rates [LFR]
• stationary data => constant contingency table:
              True 0   True 1
Predicted 0   TN       FN
Predicted 1   FP       TP
• calculate four rates: TPR = TP/(TP+FN), TNR = TN/(TN+FP), PPV = TP/(TP+FP), NPV = TN/(TN+FN)
• incremental updates: for each rate affected by the new example, $R_t = \eta R_{t-1} + (1 - \eta)\,\mathbb{1}[\hat{y}_t = y_t]$
• test for change
• Monte Carlo sampling for the significance level
• Bonferroni correction for the correlated tests
• O(1)
• Better than (E)DDM for class imbalance
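Here is a simplified Python sketch of LFR for binary labels. Each of the four rates is updated only by examples that affect it, using the linear update $R_t = \eta R_{t-1} + (1-\eta)\mathbb{1}[\hat y_t = y_t]$. Each rate is then compared with the value computed from the accumulated contingency table.

As an explicit simplification, LFR's Monte Carlo bounds are replaced by a normal approximation. It uses the stationary variance of an exponentially weighted Bernoulli mean, $p(1-p)(1-\eta)/(1+\eta)$. The Bonferroni correction over the four rates is kept. `eta`, `alpha`, `warmup`, and the clipping of rates to [0.01, 0.99] are assumptions.

```python
import math
from statistics import NormalDist


class LFRSketch:
    """Simplified Linear Four Rates monitor for binary labels (0/1)."""

    RATES = ("tpr", "tnr", "ppv", "npv")

    def __init__(self, eta=0.95, alpha=0.01, warmup=100):
        self.eta = eta        # decay of the linear rate updates
        self.warmup = warmup  # assumed: relevant examples needed before a rate is tested
        # Assumption: normal approximation instead of LFR's Monte Carlo bounds,
        # with a Bonferroni correction over the four two-sided tests.
        self.z = NormalDist().inv_cdf(1 - alpha / (2 * len(self.RATES)))
        self.reset()

    def reset(self):
        self.cm = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
        self.r = {name: None for name in self.RATES}
        self.seen = {name: 0 for name in self.RATES}

    def _from_table(self, name):
        """Rate computed from the accumulated contingency table."""
        tp, tn, fp, fn = (self.cm[k] for k in ("TP", "TN", "FP", "FN"))
        num, den = {"tpr": (tp, tp + fn), "tnr": (tn, tn + fp),
                    "ppv": (tp, tp + fp), "npv": (tn, tn + fn)}[name]
        return num / den if den else 0.5

    def update(self, y, y_hat):
        """Feed one (true label, prediction) pair; return True if drift is flagged."""
        self.cm[("T" if y == y_hat else "F") + ("P" if y_hat == 1 else "N")] += 1
        hit = 1.0 if y == y_hat else 0.0
        relevant = {"tpr": y == 1, "tnr": y == 0, "ppv": y_hat == 1, "npv": y_hat == 0}
        for name in self.RATES:
            if not relevant[name]:
                continue  # this example does not touch this rate
            self.seen[name] += 1
            prev = self.r[name]
            self.r[name] = hit if prev is None else self.eta * prev + (1 - self.eta) * hit
            if self.seen[name] < self.warmup:
                continue
            p = min(max(self._from_table(name), 0.01), 0.99)
            # stationary std of an exponentially weighted mean of Bernoulli(p) outcomes
            sd = math.sqrt(p * (1 - p) * (1 - self.eta) / (1 + self.eta))
            if abs(self.r[name] - p) > self.z * sd:
                self.reset()
                return True
        return False


def prediction(i, y):
    """Hypothetical model output: 90% accurate before index 1000, always wrong after."""
    if i < 1000:
        return y if i % 10 else 1 - y
    return 1 - y


lfr = LFRSketch()
first_drift = next(i for i in range(2000) if lfr.update(i % 2, prediction(i, i % 2)))
print("first drift flagged at index", first_drift)
```

Only the rates touched by an example are updated and tested, so the cost per example is constant.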

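error distribution monitoring
• ADaptive WINdowing [ADWIN]
• Consider all partitions of the window W of prediction errors into an older sub-window w0 and a newer sub-window w1
• Drop the last (oldest) element if any partition satisfies $|\hat\mu_{w_0} - \hat\mu_{w_1}| \ge \epsilon_{cut}$, where $\epsilon_{cut} = \sqrt{\frac{1}{2m}\ln\frac{4}{\delta'}}$, $m = \frac{1}{1/n_0 + 1/n_1}$, $\delta' = \delta/n$ ($n_0$, $n_1$: sub-window sizes; $n$: window size; $\delta$: confidence parameter)
• Efficient version O(log W)
• Data structure for windows ~ exponential histograms
• Drop the last sub-window rather than the last element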
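The following is a naive Python sketch of ADWIN. It checks every split of the window directly, so each check costs O(W) rather than the O(log W) of the exponential-histogram version.

A split into $w_0$ (older, size $n_0$) and $w_1$ (newer, size $n_1$) triggers a cut when $|\hat\mu_{w_0} - \hat\mu_{w_1}| \ge \epsilon_{cut}$. Here $\epsilon_{cut} = \sqrt{\frac{1}{2m}\ln\frac{4}{\delta'}}$, $m = 1/(1/n_0 + 1/n_1)$ and $\delta' = \delta/n$. The oldest element is then dropped until no split qualifies. `delta = 0.002` is an assumed setting.

```python
import math
from collections import deque


class NaiveADWIN:
    """Naive ADWIN sketch: checks every split of the window (O(W) per check)."""

    def __init__(self, delta=0.002):
        self.delta = delta  # confidence parameter
        self.window = deque()

    def _cut_found(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        log_term = math.log(4.0 / (self.delta / n))  # ln(4 / delta'), delta' = delta / n
        head = 0.0
        for n0, x in enumerate(self.window, start=1):
            if n0 == n:
                break                               # w1 must be non-empty
            head += x
            n1 = n - n0
            mu0, mu1 = head / n0, (total - head) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)         # harmonic mean of the sub-window sizes
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mu0 - mu1) >= eps_cut:
                return True
        return False

    def update(self, x):
        """Add one value (e.g. a 0/1 error); return True if the window shrank."""
        self.window.append(x)
        shrank = False
        while self._cut_found():
            self.window.popleft()                   # drop the oldest element
            shrank = True
        return shrank


# Hypothetical error stream: 20% errors, then 80% errors from index 500.
stream = [float(i % 5 == 0) for i in range(500)] + [float(i % 5 != 0) for i in range(500, 1000)]
adwin = NaiveADWIN()
shrinks = [i for i, x in enumerate(stream) if adwin.update(x)]
print("first shrink at index", shrinks[0], "final window size", len(adwin.window))
```

After the change, the window sheds most of the pre-change errors. That is the behavior the efficient version reproduces with bucketed sub-windows.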

resampling
• Prediction loss on the ordered training data vs. on random permutations of it
• Parallel permutation test version available
• Still expensive
• Only method directly applicable to the regression setting
• Side note: even with a finite training set, drift can be a problem if the model is developed naively.
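The resampling idea can be sketched as a small permutation test for regression. The "model" is a hypothetical stand-in that predicts the mean of the training part, and the loss is mean squared error on the held-out part. The loss from the ordered split is compared with losses from shuffled copies. A small p-value means the ordered split is unusually bad, which suggests drift. `n_perm`, `train_frac`, and the seed are assumptions.

```python
import random
from statistics import fmean


def split_loss(values, train_frac=0.5):
    """Fit on the first part, return mean squared error on the rest."""
    cut = int(len(values) * train_frac)
    train, test = values[:cut], values[cut:]
    prediction = fmean(train)  # hypothetical stand-in regressor: the training mean
    return fmean((y - prediction) ** 2 for y in test)


def permutation_drift_test(values, n_perm=200, train_frac=0.5, seed=0):
    """Return (ordered loss, p-value); a small p-value suggests the order matters, i.e. drift."""
    rng = random.Random(seed)
    observed = split_loss(values, train_frac)
    shuffled = list(values)
    at_least_as_bad = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any temporal order
        if split_loss(shuffled, train_frac) >= observed:
            at_least_as_bad += 1
    return observed, (at_least_as_bad + 1) / (n_perm + 1)


drifting = [0.0] * 100 + [1.0] * 100             # level shift halfway through
stationary = [float(i % 2) for i in range(200)]  # same distribution throughout
_, p_drift = permutation_drift_test(drifting)
_, p_stat = permutation_drift_test(stationary)
print(p_drift, p_stat)
```

Every permutation refits the model, which is why the method stays expensive even when the permutations run in parallel.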

clustering / novelty detection
• OLINDDA: K-means; periodically merge unknown clusters into known ones or flag them as novel
• MINAS: micro-clusters, incremental stream clustering
• DETECTNOD: Discrete Cosine Transform to estimate distances efficiently
• Woo-ensemble: treat outliers as potential emerging class centroids
• ECSMiner: store and use cluster summaries efficiently
• GC3: grid-based clustering
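None of the listed algorithms is reproduced here. The following is a hypothetical, much simplified Python sketch of the idea they share:

- Known classes are summarized as spheres (centroid plus radius).
- Points outside every sphere go into a buffer of recent unknowns.
- When the buffer is full and about as compact as a known class, it is promoted to a new concept.

`min_cluster` and the compactness rule (radius no larger than the widest known class) are assumptions.

```python
import math
from collections import deque


def centroid(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def radius(points, center):
    return max(math.dist(p, center) for p in points)


class NoveltyDetector:
    """Hypothetical minimal novelty detector: known-class spheres plus an unknown buffer."""

    def __init__(self, known, min_cluster=20):
        # known: dict mapping a class label to its training points (tuples of features)
        self.models = {}
        for label, pts in known.items():
            c = centroid(pts)
            self.models[label] = (c, radius(pts, c))
        self.max_radius = max(r for _, r in self.models.values())
        self.buffer = deque(maxlen=min_cluster)  # most recent unexplained points

    def process(self, x):
        for label, (c, r) in self.models.items():
            if math.dist(x, c) <= r:
                return label                     # explained by a known (or learned) concept
        self.buffer.append(x)
        if len(self.buffer) == self.buffer.maxlen:
            pts = list(self.buffer)
            c = centroid(pts)
            r = radius(pts, c)
            if r <= self.max_radius:             # compact enough: promote to a new concept
                label = f"novel-{len(self.models)}"
                self.models[label] = (c, r)
                self.buffer.clear()
                return label
        return "unknown"


# Two known classes in a (size, color) plane, then a tight group of new points.
grid = [(0.1 * i, 0.1 * j) for i in range(-5, 6) for j in range(-5, 6)]
known = {"A": grid, "B": [(x + 10, y) for x, y in grid]}
det = NoveltyDetector(known)
new_points = [(5 + 0.05 * (i % 4), 5 + 0.05 * (i // 4)) for i in range(20)]
results = [det.process(p) for p in new_points]
print(results[-1])  # novel-2
```

Because everything rests on distances, this sketch also inherits the dimensionality problem of the clustering approaches.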

Curse of Dimensionality
• These clustering-based methods rely on distances in feature space, which become less informative as the number of features grows.

feature distribution monitoring
• Monitor individual features
• Many ways to compare:
• Pearson correlation [Change of Concept - CoC]
• Hellinger distance [HDDDM] ~ O(DB)
• PCA to reduce the number of features to track (top [PCA-1] or bottom [PCA-2] n%)
[Figure: distributions of samples in windows w0 and w1 over the features color and size]
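Here is a Python sketch of the distance part of HDDDM-style monitoring. Each feature is binned on a shared grid for the two windows $w_0$ and $w_1$. The Hellinger distance $H(P, Q) = \frac{1}{\sqrt2}\sqrt{\sum_b(\sqrt{p_b} - \sqrt{q_b})^2}$ is computed per feature, and the per-feature distances are averaged, so the comparison loops over D features and B bins. HDDDM's adaptive threshold on how this distance changes between batches is omitted, and the 10-bin default is an assumption.

```python
import bisect
import math


def histogram(values, edges):
    """Normalized histogram; values outside the edges fall into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = min(max(bisect.bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))) / math.sqrt(2)


def mean_feature_hellinger(w0, w1, bins=10):
    """Average per-feature Hellinger distance between two windows of samples."""
    dims = len(w0[0])
    total = 0.0
    for d in range(dims):
        col0 = [x[d] for x in w0]
        col1 = [x[d] for x in w1]
        lo, hi = min(col0 + col1), max(col0 + col1)
        width = (hi - lo) / bins or 1.0          # guard against a constant feature
        edges = [lo + i * width for i in range(bins + 1)]
        total += hellinger(histogram(col0, edges), histogram(col1, edges))
    return total / dims


same = [(i % 7, i % 3) for i in range(70)]
w0 = [(0.0, 3.0)] * 10                           # feature 0 moves, feature 1 stays put
w1 = [(1.0, 3.0)] * 10
print(mean_feature_hellinger(same, same))        # 0.0
print(mean_feature_hellinger(w0, w1))            # 0.5
```

Averaging over features dilutes a change confined to one feature. That is one reason the slides pair this approach with PCA.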

• Not all changes matter
• Posterior probability estimate
• Use [A-distance] ~ generalized Kolmogorov-Smirnov distance
• designed to be less sensitive to irrelevant changes
[Figure: L1-distance vs. KS-distance vs. A-distance]
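The following one-dimensional Python sketch contrasts the KS distance with an A-distance. KS takes the largest gap between empirical masses over half-lines $(-\infty, x]$. The A-distance takes the largest gap over a chosen family 𝒜 of sets, here assumed to be all intervals $(a, b]$ on a single feature. The choice of family is an assumption, and some definitions multiply the supremum by 2.

```python
from bisect import bisect_right


def ks_and_a_distance(s0, s1):
    """1-D KS distance and A-distance over the family of intervals (a, b]."""
    s0, s1 = sorted(s0), sorted(s1)
    points = sorted(set(s0) | set(s1))
    # D(x) = F0(x) - F1(x) at every candidate endpoint; the leading 0 stands for x = -inf
    diffs = [0.0] + [bisect_right(s0, x) / len(s0) - bisect_right(s1, x) / len(s1) for x in points]
    ks = max(abs(d) for d in diffs)   # sup over half-lines (-inf, x]
    a_dist = max(diffs) - min(diffs)  # sup over intervals (a, b] of |D(b) - D(a)|
    return ks, a_dist


print(ks_and_a_distance([1, 2, 3, 4], [1, 2.5, 2.6, 4]))  # (0.25, 0.5)
```

In the example, mass is moved around inside the middle of the range, and the interval family registers twice the KS gap. Restricting 𝒜 to sets that matter for the task is what lets the A-distance ignore irrelevant changes.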

model-dependent monitoring
• [Margin] distribution
• rank statistic on density estimates for a binary representation of the data
• compare average margins of a linear classifier induced by the 1-norm SVM
• based on the average zero-one or sigmoid error rate of an SVM classifier
• Generalized margin [MD3]:
• Embed base classifier in a Random Feature Bagged Ensemble
• Margin == high-disagreement region of the ensemble
[Figure: SVM margin vs. ensemble “margin”]
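Below is a Python sketch of the margin-density signal behind MD3. It assumes a hypothetical ensemble of three members, each looking at a different feature subset.

- A sample counts as inside the margin when the ensemble's vote is split (|p(+) − p(−)| ≤ θ).
- Margin density is the fraction of unlabeled samples inside the margin.
- A change of more than k reference standard deviations raises an alarm.

θ = 0.5, k = 2, and the reference statistics are assumptions; in practice they would be estimated on reference data.

```python
def in_margin(x, members, theta=0.5):
    """True if the ensemble vote on x is split, i.e. x lies in the high-disagreement region."""
    votes = [member(x) for member in members]  # each member returns 0 or 1
    p_pos = sum(votes) / len(votes)
    return abs(p_pos - (1 - p_pos)) <= theta


def margin_density(batch, members, theta=0.5):
    """Fraction of unlabeled samples that fall inside the ensemble's margin."""
    return sum(in_margin(x, members, theta) for x in batch) / len(batch)


def md3_alarm(md_now, md_ref, md_ref_std, k=2.0):
    """Flag a drift suspect when margin density moves k reference std-devs away."""
    return abs(md_now - md_ref) > k * md_ref_std


# Hypothetical feature-bagged ensemble: each member sees a different feature subset.
members = [
    lambda x: int(x[0] > 0),
    lambda x: int(x[1] > 0),
    lambda x: int(x[0] + x[1] > 0),
]
reference = [(1, 1), (2, 3), (-1, -2), (-3, -1)]  # members agree
current = [(-1, 2), (2, -1), (-2, 3), (3, -2)]    # members disagree
md_ref = margin_density(reference, members)
md_now = margin_density(current, members)
print(md_ref, md_now, md3_alarm(md_now, md_ref, md_ref_std=0.05))  # 0.0 1.0 True
```

No labels are needed to compute the signal, which is what lets the margin approach keep false alarms down without labeling every sample.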

explicit mechanisms for adaptation
[Figure: ADWIN window W on stationary data vs. during drift]
ADWIN: drop the last sub-window if the threshold is exceeded, i.e., adaptively shrink the window during drift.

JIT
[Figure: JIT adaptation; windows w0 to w4, models m0 to m4, and I0 to I4, with * marking where the change is detected]
* Adaptation goes through a similar refinement process.

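Biased Reservoir Sampling
• bias: the point that arrived at time $r$ stays in the sample at time $t$ with relative probability $f(r, t) = e^{-\lambda (t - r)}$
• capacity: $1/\lambda$
• each new point overwrites / exchanges a randomly chosen point with probability equal to the fraction of the reservoir that is full; otherwise it is appended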
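Here is a Python sketch of the biased reservoir insertion rule, with a hypothetical `bias` parameter λ. Capacity is ⌈1/λ⌉. Each arriving point either overwrites a random resident, with probability equal to the fraction full, or is appended. Once the reservoir is full, every arrival replaces a random point, so a point's survival probability decays roughly exponentially with its age. The seed and λ in the usage are arbitrary.

```python
import math
import random


class BiasedReservoir:
    """Reservoir with an exponential recency bias of rate `bias` (lambda)."""

    def __init__(self, bias, seed=None):
        self.capacity = math.ceil(1.0 / bias)   # capacity 1/lambda
        self.items = []
        self.rng = random.Random(seed)

    def add(self, x):
        fill = len(self.items) / self.capacity  # fraction of the reservoir that is full
        if self.rng.random() < fill:
            # overwrite / exchange a randomly chosen resident point
            self.items[self.rng.randrange(len(self.items))] = x
        else:
            self.items.append(x)                # append; the reservoir grows


res = BiasedReservoir(bias=0.01, seed=42)
for t in range(10_000):
    res.add(t)
print(len(res.items), min(res.items))
```

With λ = 0.01, the reservoir holds at most 100 points, almost all from the most recent few hundred arrivals. No explicit window boundary has to be chosen.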

implicit mechanisms for adaptation
Ensemble Based Adaptation
[Figure: ensemble 1 to ensemble N over time; at each step a new member is trained and old members are retired / decayed; recurring concepts]
• Online NonStationary boosting [ONSboost]
• NonStationary Random Forests [NSRF]
• Dynamic Weighted Majority [DWM]
• Learn++ for NonStationary Environments [Learn++.NSE]
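The following Python sketch implements Dynamic Weighted Majority along the lines described by Kolter and Maloof:

- Experts that predict wrongly have their weight multiplied by β.
- Weights are normalized so the best weight is 1.
- Experts below θ are retired.
- A new expert is added whenever the weighted vote is wrong.

`CountLearner` is a hypothetical toy base learner that predicts the majority label per discrete input. β = 0.5, θ = 0.01, and period p = 1 are assumed settings.

```python
from collections import Counter, defaultdict


class CountLearner:
    """Hypothetical incremental base learner: majority label per discrete input."""

    def __init__(self):
        self.table = defaultdict(Counter)

    def predict(self, x):
        counts = self.table.get(x)
        return counts.most_common(1)[0][0] if counts else 0

    def learn(self, x, y):
        self.table[x][y] += 1


class DWM:
    """Dynamic Weighted Majority sketch."""

    def __init__(self, beta=0.5, theta=0.01, period=1):
        self.beta, self.theta, self.period = beta, theta, period
        self.experts = [[CountLearner(), 1.0]]  # [learner, weight] pairs
        self.t = 0

    def predict(self, x):
        score = defaultdict(float)
        for learner, w in self.experts:
            score[learner.predict(x)] += w
        return max(score, key=score.get)

    def learn(self, x, y):
        self.t += 1
        update = self.t % self.period == 0
        score = defaultdict(float)
        for pair in self.experts:
            y_hat = pair[0].predict(x)
            if update and y_hat != y:
                pair[1] *= self.beta            # decay members that were wrong
            score[y_hat] += pair[1]
        global_pred = max(score, key=score.get)
        if update:
            top = max(w for _, w in self.experts)
            for pair in self.experts:
                pair[1] /= top                  # normalize so the best weight is 1
            self.experts = [p for p in self.experts if p[1] >= self.theta]  # retire
            if global_pred != y:
                self.experts.append([CountLearner(), 1.0])  # train a new member
        for learner, _ in self.experts:
            learner.learn(x, y)
        return global_pred


dwm = DWM()
for i in range(200):                            # concept 1: y = x
    dwm.learn(i % 2, i % 2)
before = (dwm.predict(0), dwm.predict(1))
for i in range(200):                            # concept 2: y = 1 - x
    dwm.learn(i % 2, 1 - i % 2)
after = (dwm.predict(0), dwm.predict(1))
print(before, after)  # (0, 1) (1, 0)
```

The usage flips the concept after 200 examples. Experts trained on the old concept are decayed and retired, and members added after the flip take over.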

which method?
Method | Efficiency | Pros | Cons | Notes
--- | --- | --- | --- | ---
DDM / EDDM | O(1) | no data stored | label cost; false alarms | sampling necessary in case of fast data; microservices architecture ideal
LFR | O(1) | class imbalance OK | label cost |
ADWIN | O(log W) | better change localization | label cost |
JIT | O(log W) | no labels required | only for abrupt changes | best localization
ECSMiner / GC3 | O(W²/k) / O(G log C) | emerging concepts | clusterable drift only | use if emerging concepts are expected
HDDDM | O(DB) | no labels | not for population drift or class imbalance | better when combined with PCA
A-distance | O(log W) | no labels; fewer false positives than HDDDM | | good choice for unsupervised monitoring
Margin / MD3 | | learning, detection, and adaptation bundled; reduced false alarms | must use feature-bagged ensembles | best choice, but you must commit to using the specific machine learning algorithms
Ensemble methods | | recurring concepts | large batches |

references
https://gist.github.com/emrev12/0d75dc2d6c3e80012d10a82712b8ced0

Check out these resources:
• Dean’s book
• Webinars
• etc.

Fast Data Architectures for Streaming Applications
Getting Answers Now from Data Sets that Never End
By Dean Wampler, Ph. D., VP of Fast Data Engineering
LIGHTBEND.COM/LEARN

Serving Machine Learning Models
A Guide to Architecture, Stream Processing Engines, and Frameworks
By Boris Lublinsky, Fast Data Platform Architect

lightbend.com/fast-data-platform

thank you!
emre.velipasaoglu@ .com
