Copyright © 2013, SAS Institute Inc. All rights reserved. #analytics2013
Credit Card Fraud Detection
Why Theory Doesn't Adjust to Practice
Alejandro Correa Bahnsen, Luxembourg University
Andrés Gonzalez Montoya, Scotia Bank
Introduction
[Figure: Europe fraud evolution. Internet transactions (millions of euros), 2007 to 2012E; y-axis from € 500 to € 800.]
Introduction
[Figure: US fraud evolution. Online revenue lost due to fraud (billions of dollars), 2001 to 2012; y-axis from $0 to $5.0.]
Introduction
• Fraud levels are increasing around the world
• Differing technologies and legal requirements make fraud harder to control
• There is a need for advanced fraud detection systems
Agenda
• Introduction
• Transaction flow
• Database
• Evaluation of algorithms
• If-Then rules (Expert Rules)
• Financial measure
• Predictive modeling
• Logistic Regression
• Cost Sensitive Logistic Regression
Simplified transaction flow
[Diagram: transaction flow through the card network, with a "Fraud??" check.]
Data
• Large European card processing company
• 2012 card present transactions
• 750,000 transactions
• 3,500 frauds
• 0.467% fraud rate
• 148,562 EUR lost due to fraud on the test dataset
[Figure: timeline of the 12 months of 2012 (Jan to Dec), split into a train period and a test period.]
Data
• Raw attributes

Trx ID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud
1       1          2/1/12 6:00  580     Ger       Internet  Airlines        No
2       1          2/1/12 6:15  120     Eng       Present   Car Rent        No
3       2          2/1/12 8:20  12      Bel       Present   Hotel           Yes
4       1          3/1/12 4:15  60      Esp       ATM       ATM             No
5       2          3/1/12 9:18  8       Fra       Present   Retail          No
6       1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes

• Other attributes: age, country of residence, postal code, type of card
Data
• Derived attributes

Trx ID  Client ID  Date         Amount  Location  Type      Merchant Group  Fraud  No. of Trx (same client, last 6 hours)  Sum (same client, last 7 days)
1       1          2/1/12 6:00  580     Ger       Internet  Airlines        No     0                                       0
2       1          2/1/12 6:15  120     Eng       Present   Car Renting     No     1                                       580
3       2          2/1/12 8:20  12      Bel       Present   Hotel           Yes    0                                       0
4       1          3/1/12 4:15  60      Esp       ATM       ATM             No     0                                       700
5       2          3/1/12 9:18  8       Fra       Present   Retail          No     0                                       12
6       1          3/1/12 9:55  1210    Ita       Internet  Airlines        Yes    1                                       760

• Derived attributes are a combination of the following criteria:

By           Group              Last      Function
Client       None               hour      Count
Credit Card  Transaction Type   day       Sum(Amount)
             Merchant           week      Avg(Amount)
             Merchant Category  month
             Merchant Country   3 months
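The two derived columns in the table can be reproduced with a short sketch. It assumes the dates are day/month/year, that only transactions strictly before the current one are counted, and that the window boundaries are inclusive; the function and variable names are illustrative and not taken from the slides.

```python
from datetime import datetime, timedelta

# Toy transactions from the slide: (trx_id, client_id, timestamp, amount).
# Dates are read as day/month/year, which is an assumption.
TRANSACTIONS = [
    (1, 1, datetime(2012, 1, 2, 6, 0), 580),
    (2, 1, datetime(2012, 1, 2, 6, 15), 120),
    (3, 2, datetime(2012, 1, 2, 8, 20), 12),
    (4, 1, datetime(2012, 1, 3, 4, 15), 60),
    (5, 2, datetime(2012, 1, 3, 9, 18), 8),
    (6, 1, datetime(2012, 1, 3, 9, 55), 1210),
]


def derive(transactions, count_window=timedelta(hours=6), sum_window=timedelta(days=7)):
    """Return (trx_id, count in count_window, amount sum in sum_window) per transaction."""
    rows = []
    for trx_id, client, ts, _amount in transactions:
        # Earlier transactions of the same client only ("By" = Client, "Group" = None).
        prior = [(t, a) for (_i, c, t, a) in transactions if c == client and t < ts]
        n_recent = sum(1 for t, _a in prior if ts - t <= count_window)
        total = sum(a for t, a in prior if ts - t <= sum_window)
        rows.append((trx_id, n_recent, total))
    return rows


DERIVED = derive(TRANSACTIONS)
```

Other combinations from the criteria table (for example, average amount per merchant category in the last month) would follow the same pattern with a different grouping key, window and aggregate.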
Evaluation
• Confusion matrix

                          True class (y_i)
Predicted class (p_i)     Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)           TP                FP
Legitimate (p_i = 0)      FN                TN

• $\text{Misclassification} = 1 - \frac{TP + TN}{TP + TN + FP + FN}$
• $\text{Recall} = \frac{TP}{TP + FN}$
• $\text{Precision} = \frac{TP}{TP + FP}$
• $\text{F-Score} = 2 \, \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
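As a minimal sketch, the four measures can be computed directly from the confusion matrix counts. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the presentation.

```python
def metrics(tp, fp, fn, tn):
    """Compute the evaluation measures from confusion matrix counts.

    Assumes at least one true fraud and one predicted fraud, otherwise
    recall or precision would divide by zero.
    """
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "misclassification": 1 - (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, for illustration only.
EXAMPLE = metrics(tp=30, fp=70, fn=20, tn=880)
```

With these counts the misclassification rate is 9%, recall is 60%, precision is 30% and the F-Score is 40%.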
Fraud algorithms
• If-Then rules
• Predictive modeling
• Logistic Regression
• Decision Trees
• Random Forest
• Cost Sensitive Logistic Regression
[Diagram: the "Fraud??" check in the transaction network, where these algorithms are applied.]
If-Then rules (Expert rules)
• "Purpose is to use facts and rules, taken from the knowledge of many human experts, to help make decisions."
• Examples of rules
  • More than 4 ATM transactions in one hour?
  • More than 2 transactions in 5 minutes?
  • Magnetic stripe transaction then internet transaction?
If-Then rules (Expert rules)
[Diagram: the same three example rules applied at the "Fraud??" check in the network.]
If one or more rules are activated, the transaction is declined.
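A minimal sketch of how two of the example rules could be evaluated for an incoming transaction. The data layout (a list of `(timestamp, type)` pairs per client), the function names and the choice to count the incoming transaction itself are all assumptions made for illustration; the slides do not describe an implementation.

```python
from datetime import datetime, timedelta


def count_recent(events, now, window, trx_type=None):
    """Count events within `window` before `now`, optionally of one type."""
    return sum(
        1
        for ts, kind in events
        if timedelta(0) <= now - ts <= window and (trx_type is None or kind == trx_type)
    )


def should_decline(history, now, kind):
    """Return True if at least one expert rule fires for the incoming transaction."""
    events = history + [(now, kind)]  # assumption: the incoming transaction counts too
    rules = [
        count_recent(events, now, timedelta(hours=1), "ATM") > 4,  # > 4 ATM trx in one hour
        count_recent(events, now, timedelta(minutes=5)) > 2,       # > 2 trx in 5 minutes
    ]
    return any(rules)


BASE = datetime(2012, 1, 2, 6, 0)
HISTORY = [(BASE, "Present"), (BASE + timedelta(minutes=2), "Internet")]
```

Because a single fired rule declines the transaction, adding a rule can only increase the number of declined transactions, which is one reason precision tends to suffer as rule sets grow.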
If-Then rules (Expert rules)
• Problems with rules
  • New fraud patterns are not detected
  • Only simple rules can be created
• Advantages of rules
  • Easy to implement
  • Very easy to interpret
If-Then rules (Expert rules)
Results

Misclassification  Recall  Precision  F1-Score
1.04%              31%     17%        22%
Financial evaluation
• Motivation
  • False positives carry a different cost than false negatives
  • Frauds range from a few to thousands of euros (dollars, pounds, etc.)
There is a need for a real comparison measure
Financial evaluation
• Cost matrix

                          True class (y_i)
Predicted class (p_i)     Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)           Ca                Ca
Legitimate (p_i = 0)      Amt               0

where:
Ca   Administrative costs
Amt  Amount of transaction i
• Evaluation measure
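Read cell by cell, the cost matrix implies a total cost of the form $\text{Cost} = \sum_i \left[ y_i \left( p_i C_a + (1 - p_i) Amt_i \right) + (1 - y_i) \, p_i C_a \right]$. This is a reconstruction from the matrix, since the formula on the slide did not survive extraction. The sketch below evaluates it; the value of `admin_cost` and the toy transactions are hypothetical, as the slides do not state Ca.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the cost matrix entry of every transaction."""
    cost = 0.0
    for y, p, amt in zip(y_true, y_pred, amounts):
        if p == 1:
            cost += admin_cost  # flagged as fraud: administrative cost, fraud or not
        elif y == 1:
            cost += amt         # missed fraud: the transaction amount is lost
        # predicted legitimate and truly legitimate: no cost
    return cost


# Hypothetical example: four transactions and an assumed administrative cost of 5.
Y_TRUE = [1, 0, 1, 0]
AMOUNTS = [100, 50, 200, 30]
WITH_MODEL = total_cost(Y_TRUE, [1, 1, 0, 0], AMOUNTS, admin_cost=5)
NO_MODEL = total_cost(Y_TRUE, [0, 0, 0, 0], AMOUNTS, admin_cost=5)
```

Predicting every transaction as legitimate makes the cost equal to the sum of the fraud amounts, which matches how the "no model" cost is described as the losses due to fraud in the test database.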
If-Then rules
Results

Misclassification  Recall  Precision  F1-Score
1.04%              31%     17%        22%

Cost      Cost (no model)
€ 95,520  € 148,562

148,562 EUR are the losses due to fraud in the test database (2 months)
Predictive modeling
Predictive modeling is the use of statistical and mathematical techniques to discover patterns in data in order to make predictions
[Figures: scatter plots of amount of transaction against number of transactions in the last day, with normal transactions and frauds marked. The last plot adds a third axis, amount spent on internet last month.]
Logistic Regression
• Model
• Cost Function
• Cost Matrix

                          True class (y_i)
Predicted class (p_i)     Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)           0                 1
Legitimate (p_i = 0)      1                 0
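The model and cost function formulas on this slide did not survive extraction. As a point of reference, standard logistic regression uses the sigmoid model $h_\theta(x) = 1 / (1 + e^{-\theta^T x})$ and the log-loss $J(\theta) = -\frac{1}{N} \sum_i \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log(1 - h_\theta(x_i)) \right]$; the slide may have used different notation. A minimal sketch with illustrative names:

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """Estimated fraud probability for one feature vector x."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def log_loss(theta, X, y):
    """Standard logistic regression cost: every error is weighted equally."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = predict_proba(theta, x_i)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1 - h)
    return -total / len(y)


# With all-zero parameters every probability is 0.5, so the loss is log(2).
BASELINE_LOSS = log_loss([0.0, 0.0], [[1.0, 2.0], [1.0, -1.0]], [1, 0])
```

Nothing in this loss depends on the transaction amount, which is consistent with the 0/1 cost matrix above.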
Logistic Regression
Results

Misclassification  Recall  Precision  F1-Score
0.52%              0%      2%         0%

Cost       Cost (no model)
€ 148,196  € 148,562

148,562 EUR are the losses due to fraud in the test database (2 months)
Logistic Regression
Sub-sampling procedure: select all the frauds and a random sample of the legitimate transactions.

Fraud percentage  Transactions
0.467%            620,000
1%                310,000
5%                62,000
10%               31,000
20%               15,500
50%               5,200
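The sub-sampling procedure can be sketched as follows: keep every fraud and draw just enough legitimate transactions to reach a target fraud percentage. The function name, the fixed seed and the toy labels are assumptions for illustration.

```python
import random


def subsample(labels, target_fraud_rate, seed=0):
    """Return sorted indices of all frauds plus a random sample of legitimate rows."""
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    # Number of legitimate rows so that frauds make up target_fraud_rate of the sample.
    n_legit = round(len(fraud_idx) * (1 - target_fraud_rate) / target_fraud_rate)
    n_legit = min(n_legit, len(legit_idx))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return sorted(fraud_idx + rng.sample(legit_idx, n_legit))


# Toy data: 10 frauds among 1,000 transactions (1% fraud rate).
LABELS = [1] * 10 + [0] * 990
KEPT = subsample(LABELS, target_fraud_rate=0.10)
```

Raising the target percentage shrinks the training set quickly, because the number of frauds is fixed and only legitimate transactions are discarded.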
Logistic Regression
Results: selecting the algorithm by cost

Training set  Cost
No Model      € 148,562
All           € 148,196
1%            € 142,510
5%            € 112,103
10%           € 79,838
20%           € 65,870
50%           € 46,530

[Chart: cost per training set (€ axis, 0 to 160,000) together with Recall, Precision, Misclassification and F1-Score (0% to 70% axis).]
Logistic Regression
• The best model selected using the traditional F1-Score does not give the best results in terms of cost
• The model selected by cost is trained using less than 1% of the database, meaning a lot of information is excluded
• The algorithm is trained to (approximately) minimize the misclassification, but is then evaluated based on cost
• Why not train the algorithm to minimize the cost instead?
Cost Sensitive Logistic Regression
• Cost Matrix

                          True class (y_i)
Predicted class (p_i)     Fraud (y_i = 1)   Legitimate (y_i = 0)
Fraud (p_i = 1)           Ca                Ca
Legitimate (p_i = 0)      Amt               0

• Cost Function
• Objective
Find θ that minimizes the cost function (Genetic Algorithms)
Cost Sensitive Logistic Regression
• Cost Function
• Gradient
• Hessian
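The cost function, gradient and Hessian formulas on these slides did not survive extraction. One natural way to build a cost-sensitive objective from the cost matrix is to replace the hard prediction with the estimated fraud probability $h_\theta(x_i)$, giving an expected cost $J^c(\theta) = \frac{1}{N} \sum_i \left[ y_i \left( h_\theta(x_i) C_a + (1 - h_\theta(x_i)) Amt_i \right) + (1 - y_i) \, h_\theta(x_i) C_a \right]$. This form is an assumption and may differ from what the slides showed; the sketch only evaluates the objective and leaves the choice of optimizer open.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_loss(theta, X, y, amounts, admin_cost):
    """Average expected cost under the cost matrix (assumed form, see lead-in)."""
    total = 0.0
    for x_i, y_i, amt in zip(X, y, amounts):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        if y_i == 1:
            # Fraud: pay Ca if flagged (probability h), lose the amount otherwise.
            total += h * admin_cost + (1 - h) * amt
        else:
            # Legitimate: pay Ca only if flagged.
            total += h * admin_cost
    return total / len(y)


# Hypothetical data: with theta = 0 every probability is 0.5.
EXPECTED = cost_sensitive_loss(
    [0.0, 0.0], [[1.0, 0.0], [1.0, 1.0]], [1, 0], amounts=[100.0, 50.0], admin_cost=5.0
)
```

Unlike the log-loss, this objective weights each missed fraud by its amount, so a large fraud pulls the parameters harder than a small one.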
Cost Sensitive Logistic Regression
[Figure: cumulative distribution of transaction amount (0% to 100%) for legitimate and fraudulent transactions, with annotated amounts €49, €124, €196 and €370.]
Cost sensitive Logistic Regression
Results

Training set  Cost
No Model      € 148,562
All           € 31,174
1%            € 37,785
5%            € 66,245
10%           € 67,264
20%           € 73,772
50%           € 85,724

[Chart: cost per training set (€ axis, 0 to 160,000) together with Recall, Precision and F1-Score (0% to 100% axis).]
Cost sensitive Logistic Regression
Results

Model                               Cost
No Model                            € 148,562
If-Then rules                       € 95,520
Logistic Regression                 € 46,530
Cost Sensitive Logistic Regression  € 31,174
Decision Trees                      € 35,466
Random Forests                      € 34,203

[Chart: cost per model (€ axis, 0 to 160,000) together with Recall, Precision and F1-Score (0% to 80% axis).]
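To make the comparison concrete, the reported costs can be turned into savings relative to using no model. The sketch below uses only the cost figures from this results slide; the function and variable names are illustrative.

```python
# Test-set costs in EUR as reported on the comparison slide.
COSTS = {
    "No Model": 148_562,
    "If-Then rules": 95_520,
    "Logistic Regression": 46_530,
    "Cost Sensitive Logistic Regression": 31_174,
    "Decision Trees": 35_466,
    "Random Forests": 34_203,
}


def savings(costs, baseline="No Model"):
    """Fraction of the baseline cost that each model avoids."""
    base = costs[baseline]
    return {name: (base - cost) / base for name, cost in costs.items() if name != baseline}


SAVINGS = savings(COSTS)
BEST = max(SAVINGS, key=SAVINGS.get)  # model with the largest saving
```

On these figures the cost sensitive logistic regression avoids roughly 79% of the no-model losses, against roughly 36% for the If-Then rules.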
Conclusion
• Selecting models based on traditional statistics does not give the best results in terms of cost
• Models should be evaluated taking into account the real financial costs of the application
• Algorithms should be developed to incorporate those financial costs
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
al.bahnsen@gmail.com
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen
Thank You!!
Alejandro Correa Bahnsen
Andres Gonzalez Montoya

