Revolutionizing Fraud Detection: Innovative Strategies for Securing Transactions

CONFIDENTIAL: The information in this document belongs to Boston Institute of Analytics LLC. Any unauthorized sharing of this
material is prohibited and subject to legal action under breach of IP and confidentiality clauses.
Fraud Detection
By: Aanchal Chhiba

Problem Statement
This project aims to improve the accuracy of fraud detection in mobile financial
transactions. Using machine learning, it seeks to predict fraudulent
transactions with high precision. The goal is to develop a robust machine learning
model that accurately identifies fraudulent transactions in real time, enabling the
company to improve security, reduce financial losses, and gain insight into the factors
that contribute to transaction fraud.
Objective: Minimize Fraudulent Transactions

Data Overview
• Approximately 89.8% of the transactions are non-fraudulent, while
10.2% are fraudulent.
• The dataset contains 11,142 entries with 10 columns.
Attributes:
• step: Time step of the transaction.
• type: Type of transaction (e.g., TRANSFER, CASH_OUT).
• amount: The amount of the transaction.
• nameOrig: The customer initiating the transaction.
• oldbalanceOrg: The initial balance of the customer before the
transaction.
• newbalanceOrig: The balance of the customer after the transaction.
• nameDest: The recipient customer.
• oldbalanceDest: The initial balance of the recipient before the
transaction.
• newbalanceDest: The balance of the recipient after the transaction.
• isFraud: Indicates whether the transaction was fraudulent (1 for fraud, 0
for non-fraud).
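A quick way to confirm the dataset matches this description (10 columns, no missing values, fraud share) is a small schema check. This is a minimal sketch: the function name `summarize_dataset` is hypothetical, and the two rows below are made up purely to show the call, since the slides do not name the data file.

```python
import pandas as pd

# Column names as listed in the attribute description above.
EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg",
    "newbalanceOrig", "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud",
]


def summarize_dataset(df: pd.DataFrame) -> dict:
    """Hypothetical helper: check the schema and report basic dataset facts."""
    missing_cols = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing_cols:
        raise ValueError(f"missing columns: {missing_cols}")
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_values": int(df.isna().sum().sum()),
        "fraud_share": float(df["isFraud"].mean()),  # mean of a 0/1 column
    }


# Two made-up rows, only to demonstrate the call.
sample = pd.DataFrame(
    [
        [1, "PAYMENT", 9839.64, "C1", 170136.0, 160296.36, "M1", 0.0, 0.0, 0],
        [1, "TRANSFER", 181.0, "C2", 181.0, 0.0, "C3", 0.0, 0.0, 1],
    ],
    columns=EXPECTED_COLUMNS,
)
print(summarize_dataset(sample))
```

On the real data, `rows` should come out as 11,142 and `fraud_share` close to 0.102.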

MACHINE LEARNING PIPELINE
[Diagram: New User → ML Model → Fraud / Non-Fraud → Action]
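The decision step at the end of this pipeline can be sketched as a function that turns the model's fraud probability into a label and an action. This is a minimal sketch assuming the model outputs a probability (for a scikit-learn classifier, `predict_proba(X)[:, 1]`); the function `route_transaction`, the 0.5 threshold, and the action names `hold_for_review` and `approve` are all hypothetical, as the slides do not specify what the action is.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str          # "Fraud" or "Non-Fraud", as in the pipeline diagram
    action: str         # hypothetical downstream action
    probability: float  # model's estimated probability of fraud


def route_transaction(fraud_probability: float, threshold: float = 0.5) -> Decision:
    """Map a fraud probability to a label and a (hypothetical) action."""
    if fraud_probability >= threshold:
        return Decision("Fraud", "hold_for_review", fraud_probability)
    return Decision("Non-Fraud", "approve", fraud_probability)


print(route_transaction(0.91))
print(route_transaction(0.12))
```

Lowering `threshold` flags more transactions, trading extra false alarms for fewer missed fraud cases.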

Exploratory Data Analysis (EDA)
KEY INSIGHTS
1. Class Imbalance
Approximately 89.8% of the transactions
are non-fraudulent, while 10.2% are
fraudulent. This indicates a significant
class imbalance in the dataset, which is
important to consider for modeling and
evaluation.
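The class split can be measured with pandas, which also gives the accuracy of a trivial model that always predicts the majority class. This sketch uses a toy label column built to match the reported 89.8% / 10.2% split; on the real data the column would be `df["isFraud"]`.

```python
import pandas as pd

# Toy label column with the reported split: 898 non-fraud, 102 fraud.
labels = pd.Series([0] * 898 + [1] * 102, name="isFraud")

# Share of each class (normalize=True returns proportions instead of counts).
shares = labels.value_counts(normalize=True)
print(shares)

# Always predicting "non-fraud" is correct for every majority-class row,
# so this accuracy is the baseline any real model has to beat.
majority_baseline_accuracy = shares.max()
print(f"majority-class accuracy: {majority_baseline_accuracy:.3f}")
```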

2. Distribution of Transaction Types
• The most common transaction types are
PAYMENT (5,510 transactions), followed
by CASH_IN (1,951 transactions) and
CASH_OUT (1,871 transactions).
• TRANSFER transactions (1,464) are less
frequent but are more likely to be
involved in fraudulent activities.
• DEBIT transactions (346) are the least
common.
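The per-type counts and the fraud tendency of each type can both be computed with a group-by. This sketch runs on a handful of made-up rows; because `isFraud` is 0/1, the mean within each group is that type's fraud rate.

```python
import pandas as pd

# Made-up rows; in the project this would be the full transactions frame.
df = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "TRANSFER", "CASH_OUT", "CASH_IN", "DEBIT"],
    "isFraud": [0, 0, 1, 0, 1, 0, 0],
})

# Number of transactions per type.
type_counts = df["type"].value_counts()

# Fraud rate within each type, highest first.
fraud_rate_by_type = df.groupby("type")["isFraud"].mean().sort_values(ascending=False)

print(type_counts)
print(fraud_rate_by_type)
```

Comparing rates rather than raw counts is what shows a less frequent type such as TRANSFER can still be the riskier one.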

3. Correlation Analysis
• 'oldbalanceOrg' and 'newbalanceOrig' have a very
strong positive correlation (almost 1). This is expected
as the new balance is derived from the old balance.
• Similarly, 'oldbalanceDest' and 'newbalanceDest' show
a strong positive correlation.
• 'amount' and 'oldbalanceOrg' have a moderate
negative correlation, as do 'amount' and
'newbalanceOrig'. This suggests that larger transaction
amounts tend to be associated with lower balances in
the originating account.
• 'amount' and 'isFraud' have a weak positive
correlation. This indicates that fraudulent transactions
might involve slightly higher amounts on average.
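A correlation matrix like the one described can be computed with pandas. This sketch uses synthetic columns (not the project data) in which the new origin balance is derived from the old one, reproducing the near-1 correlation noted above; only numeric columns are included, since names and types cannot be correlated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data: newbalanceOrig is derived from oldbalanceOrg.
old_org = rng.uniform(0, 100_000, n)
amount = rng.uniform(0, 1_000, n)
df = pd.DataFrame({
    "amount": amount,
    "oldbalanceOrg": old_org,
    "newbalanceOrig": np.maximum(old_org - amount, 0),
    "isFraud": rng.integers(0, 2, n),
})

# Pearson correlation over the numeric columns only.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

Correlation with `isFraud` only measures a linear association with a 0/1 label, so a weak value does not rule out a strong nonlinear relationship.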

Data Preprocessing
1. Handling Missing Values
There are no missing values in this dataset.
2. One-Hot Encoding
We performed one-hot encoding on the 'type' column.
• This converts the categorical 'type' column into numerical features.
• It creates a new column for each unique value in the 'type' column and assigns 1 if the row has that
value and 0 otherwise.
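In pandas, this encoding is typically done with `pd.get_dummies`. The slides do not name the function used, so this is a sketch on made-up rows; each new column is named `type_<VALUE>`.

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [120.0, 5000.0, 3000.0, 80.0],
})

# Replace 'type' with one 0/1 column per category (type_CASH_OUT, ...).
encoded = pd.get_dummies(df, columns=["type"], dtype=int)
print(encoded)
```

For a deployed model, scikit-learn's `OneHotEncoder(handle_unknown="ignore")` keeps the column set fixed even if an unseen type appears later.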
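3. Feature Scaling
Because the numeric features (the time step, amounts, and balances) span very different ranges, I applied
feature scaling to standardize them. Specifically, I used StandardScaler, which transforms each feature to
have a mean of 0 and a standard deviation of 1. Scaling puts features on a comparable scale; it does not
address the class imbalance.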
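Standardization computes z = (x − mean) / std for each feature. This sketch uses scikit-learn's `StandardScaler` on two synthetic features with very different ranges. It assumes a stratified train/test split (not described in the slides) and fits the scaler on the training portion only, a common precaution against leaking test information into preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two synthetic features on very different scales (amount-like vs step-like).
X = np.column_stack([rng.uniform(0, 1_000_000, 1000), rng.integers(1, 100, 1000)])
y = (rng.random(1000) < 0.1).astype(int)

# stratify=y keeps the fraud share the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

scaler = StandardScaler()
# Learn per-feature mean and std from the training data only...
X_train_scaled = scaler.fit_transform(X_train)
# ...and apply the same transformation to the test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```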

4. Feature Engineering
• Balance Differences and Account Status: Created features like 'balanceDiffOrig' and
'balanceDiffDest' to capture changes in balances for origin and destination accounts, and indicators
like 'originAccountEmpty' and 'destAccountEmpty' to identify empty accounts after transactions,
helping to detect fraud-related patterns.
• Transaction Type and High-Value Indicators: Introduced 'isTransferOrCashout' to focus on high-risk
transaction types (TRANSFER, CASH_OUT) and 'amountAboveThreshold' to flag high-value
transactions, which are more prone to fraud.
• Amount to Balance Ratios: Calculated ratios like 'amountToOldBalanceRatio' to understand the
transaction size relative to the account balance, offering insights into unusual transaction behaviors
that may indicate fraud.
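The engineered features can be sketched as column operations in pandas. The feature names come from the slides, but the formulas are assumptions inferred from those names, and the high-value cut-off of 200,000 and the helper `add_engineered_features` are hypothetical.

```python
import pandas as pd

# Hypothetical "high-value" cut-off; the slides do not state the value used.
AMOUNT_THRESHOLD = 200_000


def add_engineered_features(df: pd.DataFrame, threshold: float = AMOUNT_THRESHOLD) -> pd.DataFrame:
    """Return a copy of df with assumed versions of the engineered features."""
    out = df.copy()
    # Change in balance for the origin and destination accounts.
    out["balanceDiffOrig"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    out["balanceDiffDest"] = out["newbalanceDest"] - out["oldbalanceDest"]
    # 1 if the account balance is zero after the transaction.
    out["originAccountEmpty"] = (out["newbalanceOrig"] == 0).astype(int)
    out["destAccountEmpty"] = (out["newbalanceDest"] == 0).astype(int)
    # 1 for the high-risk types TRANSFER and CASH_OUT.
    out["isTransferOrCashout"] = out["type"].isin(["TRANSFER", "CASH_OUT"]).astype(int)
    # 1 for amounts above the (hypothetical) threshold.
    out["amountAboveThreshold"] = (out["amount"] > threshold).astype(int)
    # Amount relative to the starting balance; +1 avoids division by zero.
    out["amountToOldBalanceRatio"] = out["amount"] / (out["oldbalanceOrg"] + 1)
    return out


# Two made-up rows: an account drained by a large transfer, and a small payment.
sample = pd.DataFrame({
    "type": ["TRANSFER", "PAYMENT"],
    "amount": [250_000.0, 100.0],
    "oldbalanceOrg": [250_000.0, 999.0],
    "newbalanceOrig": [0.0, 899.0],
    "oldbalanceDest": [0.0, 0.0],
    "newbalanceDest": [250_000.0, 0.0],
})
print(add_engineered_features(sample).T)
```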

Machine Learning Model Evaluation

Random Forest Classification
1. Model Performance Overview:
The Random Forest model achieved high accuracy and a strong AUC-ROC
score, indicating good performance in distinguishing between fraudulent
and non-fraudulent transactions. Because about 89.8% of transactions are
non-fraudulent, accuracy on its own is a weak signal here: a model that
always predicts "non-fraud" would already reach roughly 89.8%. The
class-specific metrics below are therefore more informative.
2. Classification Metrics:
• Precision, recall, and F1-score are balanced, showing that the model is
effective at keeping both false positives and false negatives low.
• High recall (sensitivity) indicates that the model identifies most
actual fraudulent transactions.
3. Confusion Matrix Insights:
The confusion matrix shows few false positives and false negatives,
which is crucial in fraud detection to avoid both incorrect fraud alerts
and missed fraud cases.
4. ROC AUC Curve:
The ROC curve for the Random Forest classifier shows a high true positive
rate at a low false positive rate. This indicates that the classifier
identifies fraudulent transactions without falsely flagging too many
non-fraudulent ones.
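The metrics listed for the Random Forest can be produced with scikit-learn as sketched below. This runs on synthetic data from `make_classification` with a roughly 90/10 class split, not on the project's dataset, and the hyperparameters are illustrative, so its numbers say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with roughly the same 90/10 class split as the real data.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)               # hard 0/1 labels
y_score = rf.predict_proba(X_test)[:, 1]  # probability of the fraud class

# Per-class precision, recall, and F1.
print(classification_report(y_test, y_pred, digits=3))
# Rows are actual classes, columns are predicted: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred)
print(cm)
# ROC AUC needs scores, not hard labels.
auc = roc_auc_score(y_test, y_score)
print(f"ROC AUC: {auc:.3f}")
```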

Gradient Boosting
1. Model Performance Overview:
The Gradient Boosting model, implemented with XGBoost, demonstrated strong
predictive power with a high AUC-ROC score, indicating good
discrimination between fraudulent and non-fraudulent transactions.
2. Classification Metrics:
• Precision and recall are balanced, reflecting that the model
correctly identifies fraud while keeping false alarms low.
• The F1-score, the harmonic mean of precision and recall, shows a good
balance between the two, which is especially important in fraud detection.
3. Confusion Matrix Insights:
The confusion matrix shows few false positives and false negatives,
indicating that the model identifies both fraudulent and legitimate
transactions accurately.
4. ROC AUC Curve:
The ROC curve and AUC score highlight the model's ability to differentiate
between fraud and non-fraud transactions; the high AUC indicates
excellent performance.
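The ROC curve is built by sweeping the decision threshold and recording the true and false positive rates at each step. This sketch uses scikit-learn's `GradientBoostingClassifier` in place of XGBoost to avoid an extra dependency (xgboost's `XGBClassifier` follows the same scikit-learn `fit`/`predict_proba` interface), on synthetic data. The target of catching at least 90% of fraud is an arbitrary illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# scikit-learn gradient boosting, standing in for xgboost.XGBClassifier.
gb = GradientBoostingClassifier(random_state=1)
gb.fit(X_train, y_train)
y_score = gb.predict_proba(X_test)[:, 1]

# Each threshold gives one (FPR, TPR) point; the AUC is the area under them.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
print(f"AUC = {auc:.3f}")

# FPR never decreases along the curve, so the first point with TPR >= 0.9
# is the lowest-FPR operating point that catches at least 90% of fraud.
idx = next(i for i, rate in enumerate(tpr) if rate >= 0.9)
print(f"threshold={thresholds[idx]:.3f} TPR={tpr[idx]:.3f} FPR={fpr[idx]:.3f}")
```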

Support Vector Machine (SVM)
1. Model Performance Overview:
The SVM model provided a balanced approach to detecting fraudulent transactions,
showing good overall accuracy and AUC-ROC scores, which indicates effective
separation between fraud and non-fraud cases.
2. Classification Metrics:
• Precision: The model maintained a high precision score, minimizing the number of
false positives.
• Recall: High recall is crucial for fraud detection because it ensures that most
fraudulent cases are correctly identified.
• F1-Score: A good F1-score balances precision and recall, showing that the
model detects fraud without flagging too many legitimate transactions.
3. Confusion Matrix Insights:
The confusion matrix shows a low number of false positives and false
negatives, highlighting the model's effectiveness in correctly identifying both
fraudulent and legitimate transactions.
4. ROC AUC Curve:
The ROC curve shows a strong ability of the SVM model to discriminate between
fraudulent and non-fraudulent transactions, and the high AUC score further
supports the model's performance.
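Precision, recall, and F1 can be read directly off the confusion matrix, which makes their trade-off concrete. This sketch trains an SVM on synthetic data; the settings (RBF kernel, `class_weight="balanced"`, scaling inside a pipeline) are assumptions, since the slides do not give the SVM configuration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2
)

# SVMs are sensitive to feature scale, so scaling sits inside the pipeline;
# class_weight="balanced" up-weights the rare fraud class (an assumption).
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
# decision_function gives continuous scores, enough for ROC AUC
# without enabling SVC(probability=True).
y_score = svm.decision_function(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
precision = tp / (tp + fp)  # share of fraud alerts that were real fraud
recall = tp / (tp + fn)     # share of real fraud that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print("scikit-learn check:", precision_score(y_test, y_pred),
      recall_score(y_test, y_pred), f1_score(y_test, y_pred))
print(f"AUC={roc_auc_score(y_test, y_score):.3f}")
```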

Model Comparison
• Based on the comparison, it appears that the Gradient
Boosting model generally outperforms the other
models in terms of accuracy, F1-score, precision and
recall. It also has a high ROC AUC score, indicating its
strong ability to distinguish between classes.
• Random Forest Classifier and SVM have also shown
good performance.
• In this scenario, recall is the most important metric. In
fraud detection, it's crucial to minimize false negatives.
A false negative occurs when the model fails to identify
a fraudulent transaction. This can lead to significant
financial losses. Therefore, recall is preferred because it
measures the model's ability to correctly identify all
positive instances (fraudulent transactions). A high
recall value indicates that the model is effectively
capturing most of the fraudulent transactions.
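Since recall is the deciding metric, one way to compare the three model families is stratified cross-validation with recall alongside precision and F1. This sketch runs on synthetic data; gradient boosting uses scikit-learn's `GradientBoostingClassifier` in place of XGBoost and the SVM uses `class_weight="balanced"`, both assumptions, so the ranking it prints reflects only the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=10, weights=[0.9, 0.1], random_state=3)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=3),
    "Gradient Boosting": GradientBoostingClassifier(random_state=3),
    "SVM": make_pipeline(StandardScaler(), SVC(class_weight="balanced")),
}

# Stratified folds keep the ~10% fraud share in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["recall", "precision", "f1"])
    results[name] = {m: scores[f"test_{m}"].mean() for m in ("recall", "precision", "f1")}

# Rank by recall, the metric prioritized in the comparison above.
for name, s in sorted(results.items(), key=lambda kv: kv[1]["recall"], reverse=True):
    print(f"{name:18s} recall={s['recall']:.3f} precision={s['precision']:.3f} f1={s['f1']:.3f}")
```

Reporting precision next to recall guards against picking a model that reaches high recall only by flagging most legitimate transactions.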

Questions?

Thank You!
