CONFIDENTIAL: The information in this document belongs to Boston Institute of Analytics LLC. Any unauthorized sharing of this
material is prohibited and subject to legal action under breach of IP and confidentiality clauses.
Fraud Detection
By: Aanchal Chhiba
Problem Statement
This project aims to enhance the accuracy of detecting fraud in mobile financial
transactions. By leveraging machine learning, the project seeks to predict fraudulent
transactions with high precision. The goal is to develop a robust machine learning
model to accurately identify fraudulent transactions in real time, enabling the
company to improve security, reduce financial losses, and gain insights into factors
contributing to transaction fraud.
Objective: Minimize Fraudulent Transactions
Data Overview
• Approximately 89.8% of the transactions are non-fraudulent, while
10.2% are fraudulent.
• The dataset contains 11,142 entries with 10 columns.
Attributes:
• step: Time step of the transaction.
• type: Type of transaction (e.g., TRANSFER, CASH_OUT).
• amount: The amount of the transaction.
• nameOrig: The customer initiating the transaction.
DATA OVERVIEW Cont’d
• oldbalanceOrg: The initial balance of the customer before the
transaction.
• newbalanceOrig: The balance of the customer after the transaction.
• nameDest: The recipient customer.
• oldbalanceDest: The initial balance of the recipient before the
transaction.
• newbalanceDest: The balance of the recipient after the transaction.
• isFraud: Indicates whether the transaction was fraudulent (1 for fraud, 0
for non-fraud).
MACHINE LEARNING PIPELINE
[Diagram: New User → ML Model → Fraud / Non-Fraud → ACTION]
Exploratory Data Analysis (EDA)
KEY INSIGHTS
1. Class Imbalance
Approximately 89.8% of the transactions
are non-fraudulent, while 10.2% are
fraudulent. This indicates a significant
class imbalance in the dataset, which is
important to consider for modeling and
evaluation.
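One practical consequence of this imbalance can be sketched in a few lines: a trivial model that always predicts "non-fraud" already reaches roughly 89.8% accuracy while catching no fraud at all, which is why accuracy alone is a weak yardstick here. The label list below is an illustrative assumption (1,000 rows built to match the reported shares), not the project's data.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of rows that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Illustrative labels only: 1,000 rows built to match the reported 10.2% fraud share.
labels = [1] * 102 + [0] * 898

shares = class_shares(labels)

# Accuracy of a baseline that always predicts the majority class (non-fraud).
baseline_accuracy = max(shares.values())

# That baseline never predicts fraud, so its recall on the fraud class is zero.
baseline_fraud_recall = 0.0

print(shares)
print(baseline_accuracy, baseline_fraud_recall)
```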
EDA Cont’d
2. Distribution of Transaction Types
• The most common transaction types are
PAYMENT (5,510 transactions), followed
by CASH_IN (1,951 transactions) and
CASH_OUT (1,871 transactions).
• TRANSFER transactions (1,464) are less
frequent but are more likely to be
involved in fraudulent activities.
• DEBIT transactions (346) are the least
common.
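As a quick consistency check, the counts quoted above can be turned into shares of the dataset. The sketch below uses only the five counts from this slide; they sum to 11,142, which matches the dataset size given in the Data Overview. The variable names are illustrative.

```python
# Transaction counts per type, as quoted on the slide.
type_counts = {
    "PAYMENT": 5510,
    "CASH_IN": 1951,
    "CASH_OUT": 1871,
    "TRANSFER": 1464,
    "DEBIT": 346,
}

total = sum(type_counts.values())

# Share of each type as a percentage of all transactions, rounded to one decimal.
shares_pct = {t: round(100 * n / total, 1) for t, n in type_counts.items()}

for t, pct in sorted(shares_pct.items(), key=lambda kv: -kv[1]):
    print(f"{t:<9} {type_counts[t]:>5}  {pct:>5.1f}%")
```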
EDA Cont’d
3. Correlation Analysis
• 'oldbalanceOrg' and 'newbalanceOrig' have a very
strong positive correlation (almost 1). This is expected
as the new balance is derived from the old balance.
• Similarly, 'oldbalanceDest' and 'newbalanceDest' show
a strong positive correlation.
• 'amount' and 'oldbalanceOrg' have a moderate
negative correlation, as do 'amount' and
'newbalanceOrig'. This suggests that larger transaction
amounts tend to be associated with lower balances in
the originating account.
• 'amount' and 'isFraud' have a weak positive
correlation. This indicates that fraudulent transactions
might involve slightly higher amounts on average.
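The correlations discussed above are presumably Pearson coefficients (the deck does not say which measure was used). A minimal standard-library sketch of that coefficient follows; the balances and amounts are made-up toy values, chosen only to show why a balance that is derived from the old balance correlates almost perfectly with it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values (assumptions, not the project's data).
old_balance = [1000.0, 5000.0, 200.0, 8000.0, 3000.0]
amount = [100.0, 250.0, 50.0, 400.0, 150.0]

# The new balance is the old balance minus the amount sent.
new_balance = [o - a for o, a in zip(old_balance, amount)]

r = pearson(old_balance, new_balance)
print(round(r, 4))
```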
Data Preprocessing
1. Handling Missing Values
There are no missing values in this dataset.
2. One-Hot Encoding
I performed one-hot encoding on the 'type' column.
• One-hot encoding converts the categorical 'type' column into numerical features.
• It creates a new column for each unique value in the 'type' column and assigns 1 if the row has that
value and 0 otherwise.
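The encoding step described above can be sketched without any libraries. The deck does not show which function was used (a library routine such as pandas `get_dummies` is a common choice), so the helper name `one_hot` and the `type_` column prefix below are assumptions made for illustration.

```python
def one_hot(values, prefix):
    """Turn a list of category labels into rows of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{c}": int(v == c) for c in categories}
        for v in values
    ]


# A few example transaction types taken from the slide's list of types.
types = ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"]

encoded = one_hot(types, "type")
for row in encoded:
    print(row)
```

Each row has exactly one indicator set to 1, so the original category can always be recovered from the encoded columns.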
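3. Feature Scaling
Because the numeric features (amounts and balances) span very different ranges, I applied feature
scaling to standardize the data. Specifically, I used the Standard Scaler, which transforms each feature
so that its mean is 0 and its standard deviation is 1. Scaling addresses differences in feature
magnitude; it does not correct the class imbalance noted in the EDA.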
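Standardization of a single feature can be sketched as below, using only the standard library. The population standard deviation is used, which is the convention scikit-learn's `StandardScaler` follows; the amounts are toy values. In a real workflow the mean and standard deviation would normally be computed on the training split only and then reused for the test split, which this sketch does not show.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (ddof = 0)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Toy transaction amounts (assumptions, not the project's data).
amounts = [10.0, 200.0, 3500.0, 75.0, 12000.0]

scaled = standardize(amounts)
print([round(s, 3) for s in scaled])
```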
DATA PREPROCESSING Cont’d
4. Feature Engineering
• Balance Differences and Account Status: Created features like 'balanceDiffOrig' and
'balanceDiffDest' to capture changes in balances for origin and destination accounts, and indicators
like 'originAccountEmpty' and 'destAccountEmpty' to identify empty accounts after transactions,
helping to detect fraud-related patterns.
• Transaction Type and High-Value Indicators: Introduced 'isTransferOrCashout' to focus on high-risk
transaction types (TRANSFER, CASH_OUT) and 'amountAboveThreshold' to flag high-value
transactions, which are more prone to fraud.
• Amount to Balance Ratios: Calculated ratios like 'amountToOldBalanceRatio' to understand the
transaction size relative to the account balance, offering insights into unusual transaction behaviors
that may indicate fraud.
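The engineered features listed above could be derived per transaction roughly as follows. The feature names come from the slide, but their exact definitions are not given, so every formula here is an assumption: the sign conventions of the balance differences, the zero-balance guard in the ratio, and especially `HIGH_VALUE_THRESHOLD`, which is a hypothetical cutoff.

```python
# Hypothetical cutoff for "high-value" transactions; the deck does not state one.
HIGH_VALUE_THRESHOLD = 200_000.0


def engineer_features(tx):
    """Derive the slide's engineered features from one transaction record."""
    old_orig, new_orig = tx["oldbalanceOrg"], tx["newbalanceOrig"]
    old_dest, new_dest = tx["oldbalanceDest"], tx["newbalanceDest"]
    amount = tx["amount"]
    return {
        # Money that left the origin account (assumed sign convention).
        "balanceDiffOrig": old_orig - new_orig,
        # Money that arrived in the destination account (assumed sign convention).
        "balanceDiffDest": new_dest - old_dest,
        "originAccountEmpty": int(new_orig == 0),
        "destAccountEmpty": int(new_dest == 0),
        "isTransferOrCashout": int(tx["type"] in {"TRANSFER", "CASH_OUT"}),
        "amountAboveThreshold": int(amount > HIGH_VALUE_THRESHOLD),
        # Guard against division by zero when the origin account starts empty.
        "amountToOldBalanceRatio": amount / old_orig if old_orig > 0 else 0.0,
    }


# Toy transaction that drains the origin account completely.
example = {
    "type": "TRANSFER",
    "amount": 5000.0,
    "oldbalanceOrg": 5000.0,
    "newbalanceOrig": 0.0,
    "oldbalanceDest": 0.0,
    "newbalanceDest": 5000.0,
}

features = engineer_features(example)
print(features)
```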
Machine Learning Model Evaluation
Random Forest Classification
1. Model Performance Overview:
The Random Forest model achieved high accuracy and a strong AUC-ROC
score, indicating good performance in distinguishing between fraudulent
and non-fraudulent transactions.
2. Classification Metrics:
• Precision, Recall, and F1-Score are balanced, showing the model is
effective in minimizing both false positives and false negatives.
• A high Recall (Sensitivity) indicates the model is good at identifying
actual fraudulent transactions.
3. Confusion Matrix Insights:
The confusion matrix reveals a low number of false positives and false
negatives, which is crucial in fraud detection to avoid incorrect fraud alerts
and missed fraud cases.
4. ROC AUC Curve:
The ROC curve for the Random Forest classifier shows that the classifier
has a high true positive rate and a low false positive rate. This indicates
that the classifier is good at identifying fraudulent transactions without
falsely flagging too many non-fraudulent transactions.
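For readers who want the metrics above in concrete terms, the sketch below counts the four confusion-matrix cells and derives precision, recall, and F1 from them. The labels and predictions are toy values, not the Random Forest's actual output, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with fraud as the positive class (1)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1  # legitimate transaction flagged as fraud
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # fraud that the model missed
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels and predictions (assumptions, not model output).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(y_true)

print(tp, fp, tn, fn)
print(precision, recall, f1, accuracy)
```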
Gradient Boosting
1. Model Performance Overview:
The Gradient Boosting model, using XGBoost, demonstrated strong
predictive power with a high AUC-ROC score, indicating good
discrimination between fraudulent and non-fraudulent transactions.
2. Classification Metrics:
• Precision and Recall scores are balanced, reflecting that the model is
effective in correctly identifying fraud while minimizing false alarms.
• The F1-Score shows a good balance between precision and recall,
especially critical in fraud detection.
3. Confusion Matrix Insights:
The confusion matrix shows a low number of false positives and false
negatives, indicating the model's robustness in identifying both
fraudulent and legitimate transactions accurately.
4. ROC AUC Curve:
The ROC curve and AUC score highlight the model's ability to differentiate
between fraud and non-fraud transactions, with a high AUC indicating
excellent performance.
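The AUC score mentioned above has a useful interpretation: it is the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as half. The sketch below computes it directly from that definition. It is a plain-Python illustration on toy scores, not the XGBoost model's output, and the pairwise loop is only suitable for small inputs.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (fraud, non-fraud) pairs that are ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # fraud scored above non-fraud: correct ordering
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels and model scores (assumptions, not model output).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)
```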
Support Vector Machine (SVM)
1. Model Performance Overview:
The SVM model provided a balanced approach to detecting fraudulent transactions,
showing good overall accuracy and AUC-ROC scores, which indicates effective
separation between fraud and non-fraud cases.
2. Classification Metrics:
• Precision: The model maintained a high precision score, minimizing the number of
false positives.
• Recall: High recall is crucial for fraud detection as it ensures most fraudulent cases
are correctly identified.
• F1-Score: A good F1-Score balances precision and recall, demonstrating that the
model is robust in detecting fraud without overly penalizing non-fraud
transactions.
3. Confusion Matrix Insights:
• The confusion matrix shows a low number of false positives and false
negatives, highlighting the model's effectiveness in correctly identifying both
fraudulent and legitimate transactions.
4. ROC AUC Curve:
The ROC curve shows a strong ability of the SVM model to discriminate between
fraudulent and non-fraudulent transactions. A high AUC score further validates the
model's performance.
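Each point on a ROC curve corresponds to one decision threshold applied to the model's scores. The sketch below shows how the true positive rate and false positive rate move as that threshold changes. The scores are toy values standing in for any classifier's output (for an SVM, they could be distances to the decision boundary), and the three thresholds are arbitrary choices for illustration.

```python
def rates_at_threshold(y_true, scores, threshold):
    """True positive rate and false positive rate when flagging scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if y == 1 and flagged:
            tp += 1
        elif y == 1:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Toy labels and scores (assumptions, not SVM output).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6]

# One ROC point per threshold: lowering the threshold catches more fraud
# but also flags more legitimate transactions.
roc_points = [(t, *rates_at_threshold(y_true, scores, t)) for t in (0.3, 0.5, 0.7)]
for threshold, tpr, fpr in roc_points:
    print(threshold, round(tpr, 3), round(fpr, 3))
```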
Model Comparison
• Based on the comparison, it appears that the Gradient
Boosting model generally outperforms the other
models in terms of accuracy, F1-score, precision and
recall. It also has a high ROC AUC score, indicating its
strong ability to distinguish between classes.
• Random Forest Classifier and SVM have also shown
good performance.
• In this scenario, recall is the most important metric. In
fraud detection, it's crucial to minimize false negatives.
A false negative occurs when the model fails to identify
a fraudulent transaction. This can lead to significant
financial losses. Therefore, recall is preferred because it
measures the model's ability to correctly identify all
positive instances (fraudulent transactions). A high
recall value indicates that the model is effectively
capturing most of the fraudulent transactions.
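One way to encode this preference for recall in a single number is the F-beta score, which generalizes F1: with beta greater than 1, recall is weighted more heavily than precision. This metric is not used in the deck; it is offered here as an illustrative option, and the precision and recall values below are placeholders for two hypothetical candidates, not results from the three models.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Placeholder metrics for two hypothetical candidates (not the deck's models).
precise_candidate = {"precision": 0.9, "recall": 0.6}
sensitive_candidate = {"precision": 0.6, "recall": 0.9}

# F1 cannot tell the two apart because it treats precision and recall symmetrically.
f1_precise = f_beta(precise_candidate["precision"], precise_candidate["recall"], 1)
f1_sensitive = f_beta(sensitive_candidate["precision"], sensitive_candidate["recall"], 1)

# F2 rewards the candidate that misses fewer fraudulent transactions.
f2_precise = f_beta(precise_candidate["precision"], precise_candidate["recall"], 2)
f2_sensitive = f_beta(sensitive_candidate["precision"], sensitive_candidate["recall"], 2)

print(f1_precise, f1_sensitive)
print(f2_precise, f2_sensitive)
```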
Questions?
Thank You!
