Anomaly and fraud detection with
Machine Learning
Ahmed Rebai
Esprit Prépa & Esprit School of Business
Plan
● Anomaly/Fraud detection with machine learning/Deep
learning
● Outlier detection general applications
● Applications to financial sector (banking, insurance): Towards
Fintech
● Supervised vs. Unsupervised
● Algorithms and methods: ensemble method (Isolation Forest) and
density method (DBSCAN/HDBSCAN)
● Conclusions
Anomaly/Fraud detection with
machine learning/Deep learning
● Machine learning: data mining, predictive analytics, artificial
intelligence (in practice: statistics and numerical methods for
statistical analysis)
● Anomaly: a deviation from the “normal” or “expected”.
Anomaly = outlier = deviant or unusual data point
● Anomaly detection: the detection of outlier events or observations,
i.e. of deviations from the expected pattern of a data set.
The real challenge in anomaly detection is to construct the right
data model to separate outliers from noise and normal data.
In one dimension: what do anomalies look like?
Real example from the stock market:
the May 6, 2010 Flash Crash at 2:45 pm
Real example from the stock market:
a real financial fraud!
● On April 21, 2015, nearly five years after the incident, the U.S.
Department of Justice laid "22 criminal counts, including fraud and
market manipulation" against Navinder Singh Sarao, a trader. Among
the charges was the use of spoofing algorithms: just prior to the
Flash Crash, he placed orders for thousands of E-mini S&P 500 stock
index futures contracts which he planned on canceling later.
Stock market: Flash Crash (continued)
In two dimensions: what do anomalies look like?
Results with DBSCAN algorithm from sklearn.cluster
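The slide's figure is not available here, so the following is only a minimal sketch of how DBSCAN from sklearn.cluster can flag outliers in two dimensions. The data, eps and min_samples are made-up assumptions, not the values behind the original plot.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two dense synthetic clusters plus three far-away points (all made up).
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
far_points = np.array([[10.0, -10.0], [-10.0, 10.0], [15.0, 15.0]])
X = np.vstack([cluster_a, cluster_b, far_points])

# eps and min_samples are illustrative guesses; they need tuning on real data.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN gives the label -1 ("noise") to points that belong to no dense region.
outlier_mask = labels == -1
print("number of flagged points:", int(outlier_mask.sum()))
```

Points labelled -1 are the candidates for anomalies; a few points in the tails of the dense clusters may be flagged as well, depending on eps.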
In two dimensions: what do anomalies look like?
Results with a 2D PCA projection from sklearn.decomposition
(kernel PCA is also available there)
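The slide does not say how the PCA result was turned into an outlier flag. One common option, used here as an assumption, is the reconstruction error: points that are badly described by the first two components get a large error. The data below is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Synthetic data: 200 points lying close to a 2-D plane inside a 5-D space.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0, 0.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# One made-up observation pushed off that plane along the fifth feature.
X[0, 4] += 3.0

pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # 2-D coordinates, e.g. for a scatter plot
X_back = pca.inverse_transform(Z)   # map the projection back to 5-D

# Distance between each point and its 2-component reconstruction.
recon_error = np.linalg.norm(X - X_back, axis=1)
suspect = int(np.argmax(recon_error))
print("most suspicious row:", suspect)
```

sklearn.decomposition.KernelPCA can be swapped in when the normal data lies on a curved rather than a flat structure; its inverse transform must then be enabled explicitly.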
Real example: rocket science
January 28, 1986: Challenger space mission
(Dalal et al., 1989, Journal of the American Statistical Association)
In two dimensions: what do anomalies look like?
In two dimensions: what do anomalies look like?
Using logarithmic and log-log plots can help to display outliers
pyplot.hist can put the y axis on a log scale for you with the keyword argument log=True
Example: plt.hist(numpy.log10(data), log=True)
[Figure panels: linear histogram, frequency vs. log10(price); log-log histogram, log10(frequency) vs. log10(price)]
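A small sketch of the two histograms described above, on made-up "prices" (a log-normal bulk plus three very large values). The variable names and the data are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Made-up prices: log-normal bulk plus three very large values.
prices = np.concatenate([rng.lognormal(mean=3.0, sigma=0.5, size=5000),
                         [5e4, 8e4, 1e5]])

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(9, 3))

# Linear frequency axis: bins holding one observation are almost invisible.
ax_lin.hist(np.log10(prices), bins=50)
ax_lin.set_xlabel("log10(price)")
ax_lin.set_ylabel("frequency")

# log=True puts the frequency axis on a log scale,
# so the isolated bins on the right become visible.
counts, edges, _ = ax_log.hist(np.log10(prices), bins=50, log=True)
ax_log.set_xlabel("log10(price)")
ax_log.set_ylabel("frequency (log scale)")
```

Call fig.savefig(...) or plt.show() (with an interactive backend) to look at the result.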
Using logarithmic and log-log plots can help to display outliers
[Figure panels: linear plot; log-log plot drawn with matplotlib.pyplot.loglog]
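As a hedged illustration of the linear vs. log-log comparison, the sketch below plots a made-up power law with one deliberately deviant point. On the linear axes the deviant point is hidden near zero, while on log-log axes the power law becomes a straight line and the point stands out.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

x = np.logspace(0, 4, 50)   # 50 values from 1 to 10,000
y = 1e6 * x ** -2.0         # a made-up power law
y[40] *= 50.0               # one deliberately deviant point

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(9, 3))

ax_lin.plot(x, y, "o")
ax_lin.set_title("Linear plot")

# Axes.loglog is the object-oriented counterpart of matplotlib.pyplot.loglog.
ax_log.loglog(x, y, "o")
ax_log.set_title("Log-log plot")
```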
Outlier detection general applications
● Cyber security: detecting cyber-attacks on networks, leading to more
trusted connections
● Equipment failure in the industrial sector (risk theory/reliability
theory)
● Fraud detection
● Detecting cheaters in mobile gaming
● Preprocessing step for analysis or machine learning
● Reducing false declines and thereby growing revenue
● Detecting French regions where Le Pen’s scores in the
2017 presidential election deviate from predictions based
on socio-economic variables (towardsdatascience)
Applications to financial sector (banking,
insurance): Towards Fintech
● Retail banks: credit card fraud
● Private banks: market abuse, anti-money laundering
● Investment banks: market abuse (Flash Crash), anti-money
laundering
● Insurance companies: fraudulent operations
● Central banks: detecting tax havens (Panama Papers)
Each case requires a different approach because red flags are specific
to the type of institution (specific business expertise is needed)
Methodology?
Supervised learning vs.
Unsupervised learning
Supervised vs. Unsupervised
● In credit card fraud detection, the target variable is known.
How? Customers tell us. A supervised approach can be used
because the true class is self-revealing.
● In market abuse or money laundering detection, we don’t
really know the classes (the way money is laundered changes
all the time, even within the same criminal organisation)
Why is supervised learning difficult?
● Severe class imbalance: we estimate that 99.9% of operations
are trusted vs. 0.1% that are fraudulent
● This causes a problem during the train/test split: a purely random
split can leave too few fraudulent operations in one of the two sets
● Solution: use the stratify option of the train_test_split
function of sklearn
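The stratified split can be sketched as follows. The data set is synthetic and simply mimics the 99.9% / 0.1% imbalance quoted above; the feature matrix is random noise and only serves to make the example runnable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Made-up data set: 10,000 operations, 10 of them (0.1%) labelled as fraud.
n_samples, n_fraud = 10_000, 10
X = rng.normal(size=(n_samples, 4))
y = np.zeros(n_samples, dtype=int)
y[:n_fraud] = 1

# stratify=y keeps the fraud proportion the same in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print("fraud in train:", int(y_train.sum()), "fraud in test:", int(y_test.sum()))
```

Without stratify, the number of fraud cases landing in the test set is left to chance and can be zero for data this imbalanced.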
And now, why is unsupervised learning difficult?
● Severe class overlap: money laundering is mixed with legal
financial activity, especially in investment banks
● Uncertainty around the data model
● The complexity of the data
● The huge volume of data (time complexity)
● Next we will discuss the data models
● To reduce computation time: use dask, numba (used to speed up
numpy code), ray and the newer module modin (used to speed up
pandas) + demonstration
Algorithms
● We can differentiate between three families of methods:
- Distance-based algorithms (similarity)
    K-NN for classification
    K-means for clustering
- Density-based (fitting a density)
    DBSCAN and HDBSCAN
    Local outlier factor (LOF)
- Parametric
    Gaussian mixture models (GMM)
    One-class SVM
    Extreme value theory: Tukey outlier labeling
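Of the methods listed, Tukey outlier labeling is the simplest to write down: a point is labelled an outlier when it falls outside the fences Q1 - k·IQR and Q3 + k·IQR, with k = 1.5 by convention. The helper name tukey_outliers and the sample amounts below are hypothetical, chosen only for illustration.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Return a boolean mask of the points outside Tukey's fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1                        # interquartile range
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Made-up transaction amounts with one obviously unusual value.
amounts = np.array([12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 250.0])
mask = tukey_outliers(amounts)
print(amounts[mask])
```

Here Q1 = 13.75 and Q3 = 15.25, so the fences are 11.5 and 17.5 and only the value 250 is labelled.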
Tree ensemble algorithm: Isolation Forest
(used at Credit Suisse)
(F. T. Liu et al., "Isolation Forest", ICDM '08: Eighth IEEE
International Conference on Data Mining, 2008)
from sklearn.ensemble import IsolationForest
An ensemble method that uses the concept of isolation to explain and
separate away anomalies
● No point-based distance calculation
● Instead, Isolation Forest builds an ensemble of random trees for a given data set,
and anomalies are the points with the shortest average path length
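A minimal sketch of the sklearn class named above, on synthetic data. The number of trees and the contamination value are illustrative assumptions, not settings taken from the presentation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)

# 500 made-up "normal" points around the origin plus three distant ones.
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_odd = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack([X_normal, X_odd])

# contamination is the assumed share of anomalies; it sets the decision threshold.
forest = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
forest.fit(X)

# Lower score = shorter average path length = easier to isolate = more anomalous.
scores = forest.score_samples(X)
labels = forest.predict(X)   # -1 for anomalies, +1 for inliers

print("flagged rows:", np.where(labels == -1)[0])
```

Because only random splits are used, no pairwise distances are computed, which is what makes the method attractive on large data sets.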
Fraud detection ML
Unsupervised learning for anomaly
detection (conclusion)
● Unsupervised learning is all about finding structure in data
● Techniques: clustering (K-means, spectral clustering)
● Principal component analysis (PCA)
● Support vector machines
● Autoencoders (deep neural networks): DNN
autoencoder anomaly detection (exotic)
● Filtering, sequential Bayesian filtering
● Gaussian mixture model clustering via EM
● LightGBM (exotic)
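As one hedged example from this list, a Gaussian mixture model fitted by EM can be used for anomaly detection by scoring each point with its log-likelihood under the mixture and flagging the lowest values. The data and the 1% cut-off below are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)

# Two made-up groups of normal activity and one point far from both.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(300, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(300, 2)),
    [[2.0, -6.0]],
])

# GaussianMixture.fit estimates the mixture parameters with the EM algorithm.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Per-point log-likelihood under the fitted mixture: low values are unusual.
log_density = gmm.score_samples(X)

# Illustrative cut-off: flag the 1% of points with the lowest density.
threshold = np.percentile(log_density, 1)
flagged = np.where(log_density < threshold)[0]
print("flagged rows:", flagged)
```

The number of components and the cut-off both have to be chosen per data set; a fixed percentile always flags some points, even when no real anomaly is present.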