Minority Report in Fraud Detection: Classification of Skewed Data
Clifton Phua, Damminda Alahakoon, and Vincent Lee
SIGKDD 2004
Reporter: Ping-Hua Yang

Abstract
This paper proposes an innovative fraud detection method to deal with the data mining problem of skewed data distributions.
The method uses back-propagation (BP) together with Naïve Bayesian (NB) and C4.5 algorithms on data partitions derived from minority over-sampling with replacement.
The paper compares the new fraud detection method against C4.5 trained using under-sampling, over-sampling, and SMOTE without partitioning.
The most interesting finding is that the combination of classifiers that produces the best cost savings draws contributions from all three algorithms.

Outline
Introduction
Fraud detection
Experiments
Results
Discussion
Conclusion

Introduction
Fraud, or criminal deception, is a costly problem for many profit-making organizations.
Data mining can reduce some of these losses by making use of the massive collections of customer data.
However, fraud detection data is, as a rule, highly skewed (imbalanced).
There are two typical ways to proceed when faced with this problem:
The first approach is to apply different algorithms.
The second approach is to manipulate the class distribution.

Introduction
This paper introduces a new fraud detection method for skewed data.
The innovative use of NB, C4.5, and BP classifiers to process the same partitioned numerical data has the potential to yield better cost savings.
The best classifiers of the different algorithms are selected using stacking, and their predictions are then merged.
One related problem caused by skewed data is measuring the performance of the classifiers.
Success cannot be defined in terms of predictive accuracy, because errors on the minority class in skewed data usually carry a significantly higher cost.
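As a small illustration of why predictive accuracy is a poor yardstick on skewed data, the sketch below compares two classifiers on a made-up score set. The 6% fraud rate mirrors the distribution reported for the data set later in the slides; the 1,000 claims and classifier B's counts are hypothetical numbers chosen only for the arithmetic.

```python
# Hypothetical score set: 1,000 claims, 6% of them fraudulent.
n_claims = 1000
n_fraud = 60
n_legal = n_claims - n_fraud


def accuracy(hits, normals, total):
    """Fraction of claims labelled correctly (caught frauds + cleared legal claims)."""
    return (hits + normals) / total


# Classifier A labels every claim as legal, so it never catches a fraud.
acc_always_legal = accuracy(hits=0, normals=n_legal, total=n_claims)

# Classifier B (made-up counts) catches 40 of the 60 frauds
# but raises 150 false alarms on legal claims.
acc_b = accuracy(hits=40, normals=n_legal - 150, total=n_claims)

print(acc_always_legal)  # 0.94
print(acc_b)             # 0.83
```

Classifier A scores higher on accuracy although it detects nothing, which is why the slides turn to a cost model instead.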

Fraud detection
Existing fraud detection methods
The new fraud detection method
Fraud detection algorithms

Existing fraud detection methods
Insurance fraud
The hot spot methodology applies a three-step process: k-means for cluster detection, C4.5 for decision tree rule induction, and domain knowledge, statistical summaries, and visualization tools for rule evaluation. [Williams G, Huang Z, 1997]
The hot spot architecture was later expanded to use a genetic algorithm to generate rules and to involve the domain user. [Williams G, 1999]
Credit card fraud
Bayesian belief networks (BBN) and artificial neural networks (ANN) were compared for fraud detection, using the STAGE algorithm for the BBN and the BP algorithm for the ANN. [Maes S, Tuyls K, Vanschoenwinkel B, Manderick B, 2002]

Existing fraud detection methods
Telecommunications fraud
The Advanced Security for Personal Communications Technologies (ASPECT) research group focuses on neural networks trained on legal use: current user profiles store recent user information and user profile histories store long-term information, which together define normal patterns of use. [Weatherford M, 2002]

The new fraud detection method
The idea is to simulate the Precrime method of precogs and integration mechanisms from the book Minority Report using existing data mining methods and techniques.

Fraud detection algorithms
This study uses a slight variation of cross-validation: instead of ten data partitions, it uses an odd number, eleven, so that a majority vote over the partitions cannot tie.
Bagging combines classifiers trained by the same algorithm using unweighted majority voting on each example (instance).
Stacking combines multiple classifiers generated by different algorithms with a meta-classifier.
To classify an instance, the base classifiers from the three algorithms present their predictions to the meta-classifier, which then makes the final prediction.
This paper proposes stacking-bagging, a hybrid technique.
The simplest learning algorithm is trained first, followed by the more complex ones.
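The bagging step described above can be sketched in a few lines. This is a minimal illustration of unweighted majority voting, not the paper's implementation; the function names and the eleven sets of votes are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Unweighted majority vote over the labels predicted for one instance."""
    return Counter(predictions).most_common(1)[0][0]


def bag(per_classifier_predictions):
    """Combine classifiers by voting instance by instance.

    per_classifier_predictions holds one list of labels per classifier,
    all of the same length (one label per instance).
    """
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]


# Eleven hypothetical classifiers voting on three instances:
# six of them predict one pattern, five predict another.
votes = [["fraud", "legal", "legal"]] * 6 + [["legal", "legal", "fraud"]] * 5
print(bag(votes))  # ['fraud', 'legal', 'legal']
```

With two classes and an odd number of voters, every instance gets a strict majority, which is one reading of why eleven partitions are used rather than ten.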

Experiments
Data Understanding
Cost Model
Data Preparation
Modeling

Data Understanding
The available automobile insurance fraud detection data set is provided with the Angoss KnowledgeSeeker software.
The paper splits the main data set into a training data set and a scoring data set.
The class labels of the training data are known, and the training data is historical relative to the scoring data.
The data set contains 11338 examples from January 1994 to December 1995 (training data) and 4083 instances from January 1996 to December 1996 (scoring data).
It has a 6% fraudulent and 94% legitimate distribution.
The original data set has 6 numerical attributes and 25 categorical attributes.

Cost Model
This cost model makes two assumptions:
All alerts must be investigated.
The average cost per claim must be higher than the average cost per investigation.
In 1996, the average cost per claim for the score data set is approximated at USD 2,640.
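To make the two assumptions concrete, here is a rough sketch of how cost savings could be computed from them. The formula is an assumed form, not taken from the slides: every alert (hit or false alarm) costs one investigation, and every hit avoids paying one fraudulent claim. The USD 2,640 claim cost comes from the slide; the investigation cost of 200 is a hypothetical placeholder.

```python
AVG_COST_PER_CLAIM = 2640         # USD, from the slide
AVG_COST_PER_INVESTIGATION = 200  # USD, hypothetical placeholder


def cost_savings(hits, false_alarms,
                 claim_cost=AVG_COST_PER_CLAIM,
                 investigation_cost=AVG_COST_PER_INVESTIGATION):
    """Savings relative to taking no action, under the assumed cost form.

    hits: fraudulent claims that were flagged.
    false_alarms: legal claims that were flagged.
    """
    # Second assumption of the cost model: a claim costs more than an investigation.
    if claim_cost <= investigation_cost:
        raise ValueError("claim cost must exceed investigation cost")
    avoided_payouts = hits * claim_cost
    # First assumption of the cost model: all alerts are investigated.
    investigation_spend = (hits + false_alarms) * investigation_cost
    return avoided_payouts - investigation_spend


print(cost_savings(hits=100, false_alarms=50))  # 234000
print(cost_savings(hits=0, false_alarms=500))   # -100000
```

The second call shows how a classifier that only raises false alarms ends up with negative savings, that is, a net expenditure.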

Cost Model
The predictive models are evaluated on the score data set with cost-based metrics in order to find the optimum cost savings.
[Evaluation metric formulas not reproduced.]

Data preparation
A related study recommends that data partitions be neither so large that the learning algorithms take too long nor so small that they produce poor classifiers.
Randomly select different legal examples from the years 1994 and 1995 (10840 legal examples) into eleven sets of y legal examples (923).
Combine x fraud examples (615) with a different set of y each time to form eleven x:y partitions (615:923) with a fraud:legal distribution of 40:60.
Other possible distributions are 50:50 (923:923) and 30:70 (396:923).
The minority class is over-sampled with replacement (replication).
In rotation, each data partition of a given distribution is used for training, testing, and evaluation.
A training partition is used to build a classifier, a test partition to optimize the classifier's parameters, and an evaluation partition to compare the classifier with others.
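The partitioning steps above might look roughly like the following. This is a sketch under stated assumptions, not the authors' procedure: `make_partitions` is a hypothetical helper, the 498 fraud examples are simply 11338 minus 10840 from the slide's counts, and drawing without replacement when fewer fraud examples are needed than are available is my own choice.

```python
import random


def make_partitions(legal, fraud, n_parts=11, legal_per_part=923,
                    fraud_per_part=615, seed=0):
    """Build n_parts (fraud_set, legal_set) partitions with disjoint legal sets."""
    rng = random.Random(seed)
    shuffled = legal[:]
    rng.shuffle(shuffled)
    if n_parts * legal_per_part > len(shuffled):
        raise ValueError("not enough legal examples for disjoint sets")
    partitions = []
    for i in range(n_parts):
        legal_set = shuffled[i * legal_per_part:(i + 1) * legal_per_part]
        if fraud_per_part <= len(fraud):
            # Assumption: plain random subset when no over-sampling is needed.
            fraud_set = rng.sample(fraud, fraud_per_part)
        else:
            # Minority over-sampling with replacement: keep every fraud
            # example, then replicate randomly chosen ones to reach the target.
            extra = rng.choices(fraud, k=fraud_per_part - len(fraud))
            fraud_set = fraud + extra
        partitions.append((fraud_set, legal_set))
    return partitions


legal_ids = [("legal", i) for i in range(10840)]
fraud_ids = [("fraud", i) for i in range(498)]  # assumed count: 11338 - 10840

parts = make_partitions(legal_ids, fraud_ids)
fraud_share = 615 / (615 + 923)
print(len(parts), round(fraud_share, 2))  # 11 0.4
```

Passing `fraud_per_part=923` or `fraud_per_part=396` gives the 50:50 and 30:70 distributions mentioned on the slide.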

Data preparation
Example test:
The algorithm is trained on partition 1 to generate classifier 1.
It is tested on partition 2 to refine the classifier.
It is evaluated on partition 3 to assess the expected accuracy of the classifier.
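One way to picture the rotation is as a schedule that assigns a training, test, and evaluation partition to each of eleven tests. The sketch below assumes the rotation advances by one partition per test and wraps around after partition 11; the slides only show the first case (1, 2, 3), so the remaining rows are an assumption, as is the function name.

```python
from string import ascii_uppercase


def rotation_schedule(n_parts=11):
    """Map each test label to its (train, test, evaluate) partition numbers."""
    schedule = {}
    for k in range(n_parts):
        train = k % n_parts + 1
        test = (k + 1) % n_parts + 1
        evaluate = (k + 2) % n_parts + 1
        schedule[ascii_uppercase[k]] = (train, test, evaluate)
    return schedule


schedule = rotation_schedule()
print(schedule["A"])  # (1, 2, 3)
print(schedule["K"])  # (11, 1, 2)
```

Under this scheme every partition serves exactly once in each of the three roles.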

Modeling
In Figure 3, each rectangle represents an experiment, each circle depicts a comparison of cost savings between experiments, and each bold arrow indicates the best experiment from a comparison.
The decision threshold (except for experiments V and IX) and the cost model remain unchanged for these experiments.
Experiments V and IX produce BP predictions that need to be converted into categorical ones using the decision threshold value.
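Converting continuous BP outputs into class labels with a decision threshold can be sketched as below. The threshold of 0.5, the label names, and the scores are hypothetical; the slides do not state the value that was used.

```python
def to_categorical(scores, threshold=0.5):
    """Label a score as 'fraud' when it reaches the threshold, else 'legal'."""
    return ["fraud" if s >= threshold else "legal" for s in scores]


bp_scores = [0.12, 0.5, 0.93]  # made-up network outputs in [0, 1]
print(to_categorical(bp_scores))                 # ['legal', 'fraud', 'fraud']
print(to_categorical(bp_scores, threshold=0.6))  # ['legal', 'legal', 'fraud']
```

Raising the threshold trades hits for fewer false alarms, which is why its value interacts with the cost model.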

Modeling
Table 4 lists the eleven tests, labeled A to K, which were repeated for each of experiments I to V.
In other words, there are 55 tests in total for experiments I to V.
Each test consisted of training, testing, evaluation, and scoring.
The score set was the same for all classifiers, but the data partitions labeled 1 to 11 were rotated.
The overall success rate denotes the ability of an ensemble of classifiers to provide correct predictions.
The bagged overall success rates X and Z were compared to the averaged overall success rates W and Y.
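The difference between an averaged and a bagged success rate can be shown on a toy example. The sketch assumes that "averaged" means the mean of the individual classifiers' success rates and "bagged" means the success rate of their majority vote; that reading, the function names, and the data are my own.

```python
from collections import Counter


def success_rate(predicted, actual):
    """Fraction of instances predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def averaged_success_rate(all_predictions, actual):
    """Mean of the individual classifiers' success rates."""
    rates = [success_rate(p, actual) for p in all_predictions]
    return sum(rates) / len(rates)


def bagged_success_rate(all_predictions, actual):
    """Success rate of the unweighted majority vote of all classifiers."""
    voted = [Counter(v).most_common(1)[0][0] for v in zip(*all_predictions)]
    return success_rate(voted, actual)


actual = [1, 1, 0, 0, 0]  # 1 = fraud, 0 = legal (toy labels)
preds = [
    [1, 1, 0, 0, 1],  # 4 of 5 correct
    [1, 0, 0, 0, 0],  # 4 of 5 correct
    [0, 1, 0, 1, 0],  # 3 of 5 correct
]
print(round(averaged_success_rate(preds, actual), 3))  # 0.733
print(bagged_success_rate(preds, actual))              # 1.0
```

Here the classifiers make their errors on different instances, so the vote corrects all of them; when errors coincide, bagging helps much less.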

Modeling
Experiments I, II, and III were designed to determine the best training distribution under the cost model.
Which one of the three training distributions is the best for the data partitions under the cost model?
Experiments IV and V used the best training distribution determined from comparison 1.
Experiments IV and V each produce a bagged Z.
Experiments VI, VII, and VIII determine which ensemble mechanism produces the best cost savings.
Experiment VI used bagging to combine the three sets of predictions from each algorithm.
Experiment VII used stacking to combine all predictions.
Experiment VIII proposed to bag the best classifiers determined by stacking.

Modeling
Experiment IX implemented the BP algorithm on unsampled and unpartitioned data.
This experiment was then compared with the other six before it.
Which one of these seven different classifier systems will attain the highest cost savings?
Experiments X, XI, and XII were constructed to find out how each sampling method performs on unpartitioned data and whether it could yield better results than the multiple-classifier approach.
Experiment XII's data consists of the same number of examples as XI's, but for XII the minority class was over-sampled using SMOTE.
Can the best classifier system perform better than the sampling approaches? The results section addresses this.
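For readers unfamiliar with SMOTE, the core idea is to create synthetic minority examples by interpolating between a minority example and one of its nearest minority neighbours, instead of replicating existing examples. The sketch below is a simplified version for numerical attributes only: it picks the base example at random rather than looping over every minority example, and the function name, `k`, and the toy points are assumptions.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)

    def dist2(a, b):
        # Squared Euclidean distance between two numeric tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base example.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


fraud_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy data
new_points = smote(fraud_points, n_synthetic=10, k=2)
print(len(new_points))
```

Every synthetic point lies on a line segment between two real minority examples, so the minority region is filled in rather than merely duplicated as in over-sampling with replacement.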

Results
Table 5 shows that in experiments I, II, and III, the bagged success rates X outperformed all the averaged success rates W.
When applied to the score set, the bagged success rates Z performed marginally better than the averaged success rates Y.

Results
In Figure 4, experiment IV highlights C4.5 as the best learning algorithm for this particular automobile insurance data set.
The resulting predictions of experiment VIII (stacking-bagging) were better than those of the C4.5 algorithm.

Results
In Figure 5, the three sampling experiments performed comparably well at 40:60 and 50:50.
Experiments XI and XII substantiate the claim that SMOTE is superior to minority over-sampling with replacement.
The under-sampled data provides the highest cost savings of $165,242 at 60:40, but it also incurs the highest expenditure (-$266,529).
This is most likely due to the number of legal examples getting very small.

Discussion
Table 6 ranks all the experiments by cost savings.
Stacking-bagging achieves the highest cost savings, almost twice that of the conventional BP procedure used by many fraud detection systems.
The optimum success rate for the highest cost savings in this skewed data set is 60%; as the success rate increases beyond that, cost savings decrease.

Discussion
Table 7 lists the top fifteen of the 33 classifiers produced from stacking.

Conclusion
In this paper, existing fraud detection methods are explored and a new fraud detection method is recommended.
The choice of the three classification algorithms and one hybrid meta-learning technique is justified for the new method.
Future work is to extend the fraud detection method based on Minority Report to find out which properties of a data set, data partition, or data cluster make one classifier more appropriate than another.

11/04 Regular Meeting: Minority Report in Fraud Detection: Classification of Skewed Data
