Introduction Problem formulation Learning strategy Experiments Conclusion
Credit Card Fraud Detection and
Concept-Drift Adaptation with Delayed
Supervised Information
Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen,
Cesare Alippi, and Gianluca Bontempi
15/07/2015
IEEE IJCNN 2015 conference
INTRODUCTION
Fraud detection is a notably challenging problem because of:
- concept drift (customers' habits evolve);
- class unbalance (genuine transactions far outnumber frauds);
- uncertain class labels (some frauds are not reported, or are reported with a large delay, and only a few transactions can be investigated in time).
INTRODUCTION II
Fraud-detection systems (FDSs) differ from a standard classification task:
- only a small set of supervised samples is provided by human investigators (they can check only a few alerts);
- the labels of the majority of transactions become available only several days later (after customers have reported unauthorized transactions).
PROBLEM FORMULATION
We formalise fraud detection as a classification problem:
- At day t, the classifier K_{t-1} (trained on day t - 1) associates to each feature vector x ∈ R^n a fraud score P_{K_{t-1}}(+|x).
- The k transactions with the largest P_{K_{t-1}}(+|x) define the alerts A_t reported to the investigators.
- Investigators provide feedbacks F_t about the alerts in A_t, defining a set of k supervised couples (x, y):

    F_t = {(x, y) : x ∈ A_t}.   (1)

F_t are the only immediately available supervised samples.
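As a concrete illustration of the alert mechanism (not the authors' code), building A_t from the classifier scores is a simple top-k selection; the function and variable names below are hypothetical:

```python
import numpy as np

def select_alerts(scores, k=100):
    """Return the indices of the k transactions with the largest
    fraud score P(+|x), i.e. the alert set A_t reported to the
    investigators. `scores` is a 1-D array of posteriors."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    return order[:k]

scores = np.array([0.1, 0.9, 0.4, 0.8, 0.2])
print(select_alerts(scores, k=2))  # the two highest-scoring transactions
```

The investigators' labels on exactly these k transactions then form the feedback set F_t of Eq. (1).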
PROBLEM FORMULATION II
At day t, the delayed supervised couples D_{t-δ} are transactions that have not been checked by investigators, but whose labels are assumed to be correct once δ days have elapsed.
Figure: Timeline from day t - δ to day t distinguishing feedbacks from delayed samples. The supervised samples available at day t include: i) the feedbacks of the first δ days and ii) the delayed couples that occurred before the δth day.
F_t is a small set of risky transactions according to the FDS. D_{t-δ} contains all the transactions that occurred in a day (≈ 99% genuine transactions).
Figure: Every day we obtain a new set of feedbacks (F_t, F_{t-1}, ..., F_{t-(δ-1)}) from the first δ days and a new set of delayed transactions that occurred on the δth day (D_{t-δ}). In this figure we assume δ = 7.
ACCURACY MEASURE FOR A FDS
The goal of an FDS is to return accurate alerts, i.e. the highest precision in A_t. This precision can be measured by the quantity

    p_k(t) = #{(x, y) ∈ F_t s.t. y = +} / k,   (2)

where p_k(t) is the proportion of frauds among the top k transactions with the highest fraud likelihood [1].
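A minimal sketch of how p_k(t) could be computed from the day's feedback set, assuming labels are encoded as '+' / '-' (the function name and encoding are illustrative, not from the paper):

```python
def pk(feedbacks, k):
    """Alert precision p_k(t) of Eq. (2): fraction of frauds among
    the k alerted transactions. `feedbacks` is the set F_t of (x, y)
    couples, where y == '+' marks a confirmed fraud."""
    frauds = sum(1 for _, y in feedbacks if y == '+')
    return frauds / k

# toy feedback set: 3 confirmed frauds out of k = 5 alerts
F_t = [(None, '+'), (None, '-'), (None, '+'), (None, '-'), (None, '+')]
print(pk(F_t, k=5))  # 0.6
```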
LEARNING STRATEGY
Learning from the feedbacks F_t is a different problem from learning from the delayed samples in D_{t-δ}:
- F_t provides recent, up-to-date information, while D_{t-δ} may already be obsolete by the time it arrives.
- The percentage of frauds in F_t and D_{t-δ} is different.
- The supervised couples in F_t are not independently drawn: they are selected by K_{t-1}.
- A classifier trained on F_t learns how to label the transactions that are most likely to be fraudulent.
Feedbacks and delayed transactions therefore have to be treated separately.
CONCEPT DRIFT ADAPTATION
Two conventional solutions for CD adaptation are the sliding window W_t and the ensemble E_t [6, 5]. To learn separately from feedbacks and delayed transactions we propose F_t, W_t^D and E_t^D.
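The two conventional adaptation schemes can be sketched as follows: the sliding window retrains a single model on the last α days of supervised data, while the ensemble keeps one model per day and averages their posteriors at prediction time. This is an illustrative sketch, with a hypothetical `train` routine standing in for the actual classifier-fitting step:

```python
def sliding_window_model(daily_batches, alpha, train):
    """W_t-style adaptation: one classifier retrained on the samples
    of the most recent `alpha` days, concatenated into one set."""
    window = [s for day in daily_batches[-alpha:] for s in day]
    return train(window)

def ensemble_models(daily_batches, alpha, train):
    """E_t-style adaptation: one classifier per day, keeping the
    `alpha` most recent models; posteriors are averaged later."""
    return [train(day) for day in daily_batches[-alpha:]]
```

The delayed-only variants W_t^D and E_t^D simply restrict `daily_batches` to the delayed sets D_{t-δ}, D_{t-δ-1}, ...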
Figure: Supervised information used by the different classifiers (W_t, W_t^D, E_t, E_t^D, F_t) in the ensemble and sliding-window approaches.
Introduction Problem formulation Learning strategy Experiments Conclusion
CLASSIFIER AGGREGATIONS
W_t^D and E_t^D have to be aggregated with F_t to exploit the information provided by feedbacks. We combine these classifiers by averaging the posterior probabilities.

Sliding window:

    P_{A_t^W}(+|x) = ( P_{F_t}(+|x) + P_{W_t^D}(+|x) ) / 2

Ensemble:

    P_{A_t^E}(+|x) = ( P_{F_t}(+|x) + P_{E_t^D}(+|x) ) / 2

A_t^E and A_t^W give larger influence to the feedbacks in the probability estimates w.r.t. E_t and W_t.
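The aggregation rule is deliberately simple, an unweighted average of the two posteriors; a sketch with an illustrative function name:

```python
def aggregate_posteriors(p_feedback, p_delayed):
    """A_t-style aggregation: average the fraud posterior of the
    feedback classifier F_t and that of the delayed-sample
    classifier (W_t^D or E_t^D), so the few feedback samples
    carry as much weight as all delayed ones."""
    return (p_feedback + p_delayed) / 2

print(aggregate_posteriors(0.8, 0.2))  # 0.5
```

Because the feedback classifier contributes half of the final score, the handful of investigator-checked samples influences the ranking far more than it would inside a single model trained on all supervised data.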
TWO RANDOM FORESTS
We used two different Random Forest (RF) classifiers depending on the fraud prevalence in the training set:
- for the classifiers on delayed samples we used a Balanced RF [3] (undersampling before training each tree);
- for F_t we adopted a standard RF [2] (no undersampling).
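A Balanced RF draws, for each tree, a bootstrap of the minority (fraud) class plus an equally sized undersample of the majority (genuine) class, as in [3]. A rough sketch of that per-tree sampling step, with illustrative names:

```python
import random

def balanced_bootstrap(frauds, genuines, rng=random):
    """Training set for one tree of a Balanced RF sketch:
    bootstrap the frauds, then undersample an equal number
    of genuine transactions without replacement."""
    boot_frauds = [rng.choice(frauds) for _ in frauds]
    boot_genuine = rng.sample(genuines, len(frauds))
    return boot_frauds + boot_genuine
```

Each tree thus sees a roughly 50/50 class mix even though frauds are ~0.2% of the stream; the feedback classifier skips this step because F_t is already far less unbalanced.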
DATASETS
We considered two datasets of credit card transactions:
Table: Datasets
Id    Start day   End day     # Days  # Instances  # Features  % Fraud
2013  2013-09-05  2014-01-18  136     21,830,330   51          0.19%
2014  2014-08-05  2014-10-09  44      7,619,452    51          0.22%

In the 2013 dataset there is an average of 160k transactions per day and about 304 frauds per day, while in the 2014 dataset there is a daily average of 173k transactions and 380 frauds.
EXPERIMENTS
Settings:
- We assume that after δ = 7 days all transaction labels are provided (delayed supervised information).
- A budget of k = 100 alerts can be checked by the investigators (F_t is trained on a window of 700 feedbacks).
- A window of α = 16 days is used to train W_t^D (16 models in E_t^D).
Each experiment is repeated 10 times and performance is assessed using p_k.
In both the 2013 and 2014 datasets, the aggregations A_t^W and A_t^E outperform the other FDSs in terms of p_k.

Table: Average p_k over all batches for the sliding window
            Dataset 2013     Dataset 2014
classifier  mean    sd       mean    sd
F           0.609   0.250    0.596   0.249
WD          0.540   0.227    0.549   0.253
W           0.563   0.233    0.559   0.256
AW          0.697   0.212    0.657   0.236

Table: Average p_k over all batches for the ensemble
            Dataset 2013     Dataset 2014
classifier  mean    sd       mean    sd
F           0.603   0.258    0.596   0.271
ED          0.459   0.237    0.443   0.242
E           0.555   0.239    0.516   0.252
AE          0.683   0.220    0.634   0.239
Figure: Sum of ranks from the Friedman test [4] for (a) sliding window 2013, (b) sliding window 2014, (c) ensemble 2013, and (d) ensemble 2014. Classifiers having the same letter are not significantly different (paired t-test based upon the ranks).
EXPERIMENTS ON ARTIFICIAL DATASET WITH CD
In the second part we artificially introduce CD on specific days by juxtaposing transactions acquired at different times of the year.
Table : Datasets with Artificially Introduced CD
Id Start 2013 End 2013 Start 2014 End 2014
CD1 2013-09-05 2013-09-30 2014-08-05 2014-08-31
CD2 2013-10-01 2013-10-31 2014-09-01 2014-09-30
CD3 2013-11-01 2013-11-30 2014-08-05 2014-08-31
Table : Average pk in the month before and after CD for the sliding
window approach
(a) Before CD
CD1 CD2 CD3
classifier mean sd mean sd mean sd
F 0.411 0.142 0.754 0.270 0.690 0.252
WD 0.291 0.129 0.757 0.265 0.622 0.228
W 0.332 0.215 0.758 0.261 0.640 0.227
AW 0.598 0.192 0.788 0.261 0.768 0.221
(b) After CD
CD1 CD2 CD3
classifier mean sd mean sd mean sd
F 0.635 0.279 0.511 0.224 0.599 0.271
WD 0.536 0.335 0.374 0.218 0.515 0.331
W 0.570 0.309 0.391 0.213 0.546 0.319
AW 0.714 0.250 0.594 0.210 0.675 0.244
Figure: Average p_k per day (the higher the better) for (e) sliding-window strategies (W, AW) on dataset CD1, (f) sliding-window strategies on dataset CD2, (g) sliding-window strategies on dataset CD3, and (h) ensemble strategies (E, AE) on dataset CD3, smoothed using a 15-day moving average. The vertical bar denotes the date of the concept drift.
CONCLUDING REMARKS
We notice that:
- F_t outperforms classifiers trained on delayed samples (obsolete couples).
- F_t outperforms classifiers trained on the entire supervised dataset (dominated by delayed samples).
- Aggregation gives larger influence to the feedbacks.
CONCLUSION
- We formalise a real-world FDS framework that meets realistic working conditions.
- In a real-world scenario there is a strong alert-feedback interaction that has to be explicitly considered.
- Feedbacks and delayed samples should be handled separately when training an FDS.
- Aggregating two distinct classifiers is an effective strategy and enables prompter adaptation in concept-drifting environments.
FUTURE WORK
Future work will focus on:
- adaptive aggregation of F_t and the classifier trained on delayed samples;
- studying the sample selection bias in F_t introduced by the alert-feedback interaction.
BIBLIOGRAPHY
[1] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland.
Data mining for credit card fraud: A comparative study.
Decision Support Systems, 50(3):602–613, 2011.
[2] L. Breiman.
Random forests.
Machine Learning, 45(1):5–32, 2001.
[3] C. Chen, A. Liaw, and L. Breiman.
Using random forest to learn imbalanced data.
University of California, Berkeley, 2004.
[4] M. Friedman.
The use of ranks to avoid the assumption of normality implicit in the analysis of variance.
Journal of the American Statistical Association, 32(200):675–701, 1937.
[5] J. Gao, B. Ding, W. Fan, J. Han, and P. S. Yu.
Classifying data streams with skewed class distributions and concept drifts.
Internet Computing, 12(6):37–49, 2008.
[6] D. K. Tasoulis, N. M. Adams, and D. J. Hand.
Unsupervised clustering in streaming data.
In ICDM Workshops, pages 638–642, 2006.
