S. Suganya and N. Kamalraj, “Meta classification technique for improving credit card fraud detection,” International Journal of Scientific
and Technical Advancements, Volume 2, Issue 1, pp. 101-105, 2016.
International Journal of Scientific and Technical Advancements
ISSN: 2454-1532
Meta Classification Technique for Improving Credit
Card Fraud Detection
S. Suganya¹, N. Kamalraj²
¹Department of Computer Science, Dr. SNS Rajalakshmi College of Arts and Science, Coimbatore, India-641049
²Department of Information Technology, Dr. SNS Rajalakshmi College of Arts and Science, Coimbatore, India-641049
Email address: ¹sugiselvi85@gmail.com, ²tpkamal@gmail.com
Abstract— Data mining is the process of automatic classification of cases based on data patterns obtained from a dataset. A number of
algorithms have been developed and implemented to extract information and discover knowledge patterns that may be useful for decision
support. Credit card fraud occurs both online and offline. With recent developments in technology, the number of fraudulent
transactions has also increased. In this work an ensemble method based on the D-TREE, SVM, KNN and FA is proposed for solving
transaction data classification problems. Initial solutions are generated at random using D-TREE, SVM and KNN, and the FA, which
tries to optimize the weights of the D-TREE, SVM and KNN, carries out the improvement. Experimental results on CREDITCARD
transaction data sets show that the proposed FA-D-TREE, SVM, KNN outperforms the D-TREE, SVM, KNN on the datasets. Further
comparison with other approaches in the literature shows that the ensemble method is able to minimize the error rate. All results
show that the ensemble FA-D-TREE, SVM, KNN outperforms the normal methods.
Keywords—Datamining; D-TREE; FA; KNN; SVM.
I. INTRODUCTION
Data mining, or knowledge discovery, is important for
extracting useful information from large collections of
data. Knowledge discovery in data is the non-trivial process
of identifying valid, novel, potentially useful and ultimately
understandable patterns in data [1]. Analysis and prediction
are also part of the data mining process: models describing
different data classes are extracted and analyzed, and are then
used to predict future data. Classification in data mining
consists of two steps, namely building the model (classifier)
and using the model (classifier) to classify large amounts of
data.
Credit card fraud is of two types: online, committed through
the internet, phone and the like, and offline, which occurs
through loss of the credit card. Filter and wrapper are two forms
of feature selection models for classification. An ensemble
learning method is used in this work, in which a collection of
methods learns a target function by training a number of
individual learners and combining their predictions. The reason
for using ensemble learning is to improve accuracy and efficiency.
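As a minimal sketch of "training a number of individual learners and combining their predictions", the snippet below combines the labels of several learners by majority vote. The three toy learners, their rules and the transaction field names are illustrative assumptions and are not taken from this paper.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most learners (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(learners, transaction):
    """Ask every individual learner for a label and combine the labels by majority vote."""
    return majority_vote([learner(transaction) for learner in learners])

# Three toy learners (1 = fraud, 0 = legal); rules and field names are hypothetical.
learners = [
    lambda t: 1 if t["amount"] > 1000 else 0,
    lambda t: 1 if t["online"] and t["amount"] > 500 else 0,
    lambda t: 1 if t["hour"] < 5 else 0,
]

print(ensemble_predict(learners, {"amount": 1200, "online": True, "hour": 14}))
```

A fixed vote like this is the simplest combiner; the method proposed later in the paper instead weights the base classifiers.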
Jerzy Stefanowski et al. [2] carried out an experimental
study whose main aim was to improve classification
accuracy. Experiments over various benchmark datasets
found that the combined classifier has higher classification
accuracy than the single classifiers.
A large number of ensemble methods is now available
to researchers in this area. The factors that distinguish
ensemble methods are the inter-classifier relationship, the
combining method, the diversity generator and the ensemble
size. Ensemble algorithms include bagging, random subspaces,
random forest and rotation forest.
II. LITERATURE REVIEW
Several taxonomies have arisen in the literature to categorize
ensemble methods from the algorithm designer's point of view.
Sharkey [5] proposed a taxonomy for ensembles of neural
networks which suggests three dimensions:
1. Ensemble members operate in one of two modes: competitive,
where a single member is used for classification, or
co-operative, where all members are used for classification.
2. Ensemble creation is either top-down, where the
combination is based on some feature other than the
output of the classifiers, or bottom-up, which uses the
members' outputs for their combination. Bottom-up methods
are further divided into fixed (voting) and dynamic (stacking).
3. The ensemble components are either modular or hybrid,
which distinguishes modular from pure ensemble systems.
The main idea of a pure ensemble system is to combine a set
of classifiers, each of which solves the same task, in order
to obtain more accurate and reliable performance than a
single classifier. A modular system breaks a complex problem
down into a number of solvable sub-problems.
Decision optimization and coverage optimization are two
important categories of ensemble techniques.
Brown [4] proposes grouping ensemble methods according to
whether diversity is chosen implicitly, by randomization, or
explicitly, by some other metric. Three factors influence the
grouping of techniques: how they initialize the inducers in the
hypothesis space, what the space of accessible hypotheses is,
and how that space is traversed by the inducer.
Although several surveys on ensembles for classification
tasks are available in the literature [3], and several papers
suggest taxonomies for ensemble methods [4], this paper
introduces four main contributions:
1. All noteworthy ensemble methods are categorized into a
fresh unified taxonomy. As noted in [3], such a structure
is only gradually emerging from numerous attempts. The
fresh taxonomy systemizes the present taxonomies into a
logical and unified one.
2. An updated survey of ensemble learning is proposed,
since it is a lively research field.
3. Efficient and mature ensemble methods that do not fit in
the mainstream are covered in this paper.
4. Numerous selection criteria are given, from the
practitioner's point of view, for choosing an apt ensemble method.
A. Neural Network
Raghavendra Patidar and Lokesh Sharma proposed the use of
neural networks for credit card fraud detection. Different
techniques are available to detect fraud, but very few are able
to find a fraudulent transaction while it is in progress. The
short time available to decide whether to accept or reject a
transaction and the enormous number of credit card
transactions to be processed are distinctive characteristics of
fraud detection in credit card usage.
a) Working principle (Pattern recognition)
A neural network works like the human brain. The computer
is made to think like a human brain, which gains knowledge
from past experience and uses it to solve daily problems.
A credit card user follows some standard usage pattern, and
the neural network is trained on that pattern over the previous
two or three years. Other information, such as the frequency of
large purchases, is also stored. The usage pattern is trained
together with other aspects of the credit card provided by the
particular bank. A prediction algorithm is used to discriminate
fraudulent from non-fraudulent transactions based on the usage
pattern of the credit card. The trained pattern of the original
card holder is matched against the pattern of the current,
possibly illegal, user; when they are the same, the transaction
is concluded to be genuine.
b) Fraud detection
Small variations in pattern matching can be accepted; if the
variations are big, the transaction is treated as fraudulent or
illegal. The neural network's output lies between 0 and 1. When
the output lies below 0.6 or 0.7, the transaction is legal.
When the output is above 0.7, the probability of the transaction
being illegal is high. There are situations in which card
holders (legal users) make transactions with a different pattern
that resembles an illegal user's transactions, and vice versa.
Card holders use the credit card within the amount limit
specified by the bank, whereas a fraudster will not, because he
will try to spend as much as possible before the card holder
takes legal action. History descriptors hold details such as
payment and card details, date of issue and so on [6].
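The thresholding rule described above can be sketched as follows. The function name, the label strings and the choice of 0.7 as the single cut-off are assumptions made for illustration; the text itself mentions both 0.6 and 0.7.

```python
def classify_score(score, threshold=0.7):
    """Map a network output in [0, 1] to a decision.

    Scores above the threshold are flagged as likely illegal; the rest are
    treated as legal. The 0.7 default is an assumed cut-off.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "suspicious" if score > threshold else "legal"

for s in (0.35, 0.70, 0.92):
    print(s, classify_score(s))
```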
B. Decision Tree and SVM
Y. Sahin and E. Duman compared the performance of SVM and
decision tree algorithms on a real dataset. The comparison
shows that decision tree models give better final results than
SVM models. The SVM models overfit the training dataset. The
success factor in this problem is the number of fraudulent
transactions identified. Accuracy shows the rate of correct
assignment, regardless of whether a transaction is a true fraud
or a true normal one. In terms of accuracy, the performance of
the SVM models becomes equal to that of the decision tree
models as the size of the training dataset increases and
overfitting becomes less. However, the decision tree models
caught more fraudulent transactions than the SVM models, so
accuracy does not match the relevant performance metric in
this problem. Among the models, C5.0 is the best, but the C&RT
models catch more fraudulent transactions in the samples [7].
C. Based on Frequent Item Set Mining
K. R. Seeja and Masoumeh Zareapoor worked on highly
imbalanced and anonymous credit card transaction datasets to
detect fraud.
The class imbalance problem is handled by frequent
item set mining, which is used to identify legal and illegal
patterns. A matching algorithm identifies whether an incoming
transaction follows the legal or the illegal pattern. To cope
with the anonymous nature of the transactions, pattern
identification treats every attribute equally, without giving
preference to any attribute. Compared with state-of-the-art
classifiers, this work has fewer false alarms, a high fraud
detection rate and a balanced classification rate, and it is
also evaluated using the Matthews correlation coefficient.
“Fraud Miner” is the key technique proposed. In the training
phase, frequent item set mining creates separate legal and
illegal transaction patterns for each customer. The algorithm
returns “0” if the incoming transaction matches the legal
pattern; if it returns “1”, the incoming transaction is
fraudulent [8].
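The matching step can be illustrated with a small sketch that returns 0 or 1 as described above. The rule used here (count matching attribute values and pick the pattern with more matches, treating every attribute equally) is an assumption for illustration and is not claimed to be the exact algorithm of [8]; the attribute names and patterns are hypothetical.

```python
def match_count(pattern, transaction):
    """Number of attribute/value pairs of the pattern that the transaction shares."""
    return sum(1 for attr, value in pattern.items() if transaction.get(attr) == value)

def match_transaction(transaction, legal_pattern, fraud_pattern):
    """Return 0 (legal) or 1 (fraudulent), whichever pattern matches more attributes."""
    legal = match_count(legal_pattern, transaction)
    fraud = match_count(fraud_pattern, transaction)
    return 0 if legal >= fraud else 1

# Hypothetical per-customer patterns produced by the training phase.
legal_pattern = {"merchant": "grocery", "country": "IN", "amount_band": "low"}
fraud_pattern = {"merchant": "electronics", "country": "XX", "amount_band": "high"}

incoming = {"merchant": "electronics", "country": "IN", "amount_band": "high"}
print(match_transaction(incoming, legal_pattern, fraud_pattern))
```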
D. Meta Classification Strategy
Joseph Pun and Yuri Lawryshyn followed the meta-learning
techniques introduced by Chan and Stolfo [10] in their
proposed work. The meta-learning method combines the results
of various learners to improve prediction accuracy, so that the
strengths and weaknesses of the methods complement each
other. Arbiter and combiner are the two kinds of meta-learning
algorithm. Experiments found that combiner methods are more
effective than arbiter methods. In the combiner method, the
attributes and correct classifications are used to train the
base classifiers. The meta-level classifier takes the resulting
predictions as input. The training data for the meta classifier
is a “combined” dataset made up of the original attributes, the
base classifier predictions and the correct classifications for
all instances. The meta-level classifier's prediction is the
final prediction in the combiner strategy [9].
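The construction of the "combined" dataset can be sketched as below: each row keeps its original attributes, gains one extra column per base classifier prediction, and is paired with its correct classification. The attribute layout and the two stand-in base classifiers are hypothetical; in practice they would be trained models.

```python
def build_combined_dataset(rows, labels, base_classifiers):
    """Append each base classifier's prediction to the original attributes."""
    combined = []
    for attributes, label in zip(rows, labels):
        predictions = [classify(attributes) for classify in base_classifiers]
        combined.append((list(attributes) + predictions, label))
    return combined

# Hypothetical attributes: (amount, is_online). Labels: 1 = fraud, 0 = legal.
rows = [(120.0, 0), (3000.0, 1), (45.0, 1)]
labels = [0, 1, 0]
base_classifiers = [
    lambda a: 1 if a[0] > 1000 else 0,  # stand-in for one trained base classifier
    lambda a: 1 if a[1] == 1 else 0,    # stand-in for another base classifier
]

combined = build_combined_dataset(rows, labels, base_classifiers)
for features, label in combined:
    print(features, label)
```

The meta-level classifier would then be trained on `combined` rather than on the raw rows.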
E. Hidden Markov Model
A Fraud Detection System (FDS) is run by the bank
that issues the credit card. Every incoming transaction is
verified by the FDS, which identifies genuine and fake
transactions from the card details and purchase details. The
FDS compares spending details, delivery address and the like
to find differences. If the transaction is identified as fake
or fraudulent, it is declined [10].
a) HMM model for credit card transaction processing
To handle credit card transactions, the HMM must first
determine the observation symbols. The purchase values x are
quantized into M price ranges V1, V2, …, VM, which form the
observation symbols at the issuing bank.
b) Generation of observation symbols
An HMM is trained for each credit card holder. The
observation symbols of each card holder's transactions are
obtained by a clustering algorithm. The issuing bank's database
holds many attributes.
c) Checking spending profile
Card holders' spending profiles are of three types: low,
medium and high.
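The quantization of purchase amounts into price ranges can be sketched as follows, using the three profiles named above as M = 3 observation symbols. The boundary values are assumptions chosen for illustration; the text states that the symbols are obtained by a clustering algorithm, which this sketch replaces with fixed cut-offs.

```python
import bisect

def to_symbol(amount, boundaries=(100.0, 1000.0), symbols=("low", "medium", "high")):
    """Map a purchase amount to the observation symbol of its price range.

    The boundaries are assumed values; a clustering step would normally supply them.
    """
    return symbols[bisect.bisect_right(boundaries, amount)]

# A cardholder's purchase history becomes the symbol sequence fed to the HMM.
history = [40.0, 250.0, 4000.0, 75.0]
print([to_symbol(a) for a in history])
```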
d) Model parameter estimation and training
A few transactions are used to train the proposed model,
which is then enhanced further for future use so that frauds
are detected efficiently.
e) Fraud detection
After the HMM parameters are learned, the initial symbol
sequence is formed from the symbols taken from the
cardholder's training data [10].
F. Bayesian Classification
Sam Maes, Karl Tulys, Bram Vanschoenwinkel and Bernard
Manderick [11] worked on fraud detection using Bayesian
belief networks, which received a high score from the STAGE
algorithm. In the experiment with four features of the
dataset, 68% of fraudulent transactions were correctly
identified and 10% of genuine transactions were falsely
classified as fraudulent. In another experiment, with ten
features of the dataset, 15% of fraudulent transactions were
incorrectly classified and 73% of transactions were classified
as fraudulent.
III. RESEARCH METHODOLOGY
The system is first loaded with the Italian Government
Employee Transaction Dataset in .csv format, which is then
preprocessed and converted into array format. The
preprocessed data is given as input to each base classifier
(D-Tree, SVM, KNN) separately. Each algorithm trains on and
classifies the data separately, and the results are given as
input to the meta classifier, where the ensemble technique is
applied to find the optimal classification.
A. KNN Algorithm
The KNN algorithm classifies objects using the closest training
examples in the feature space. KNN divides the data into two
sets, a test set and a training set. For each row of the test
set, the Euclidean distance is used to find the k nearest
training set objects, and a majority vote is used for
classification. If there are ties for the kth nearest vector,
all candidates are included in the vote.
The transaction date is taken as the feature for classification.
After training on known data, predictions are made about
unknown data. Training tuples are given as (w1, w2, …, wn, v).
In the testing (classification) part only (w1, w2, …, wn) is
given, and the aim is to find 'v' as accurately as possible.
The Euclidean distance between two tuples w and w' is calculated
using the formula
$$d(w, w') = \sqrt{\sum_{i=1}^{n} (w_i - w'_i)^2}$$
Steps:
1. Randomly select some 'k' number of transaction dates.
2. Using the distance function, find the classes of the test
set data from the classified data in the training set.
3. Calculate the labeled data with (+/-) difference.
4. Draw bisectors.
5. Extend and join all bisectors.
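The Euclidean distance and majority vote described in this subsection can be sketched as below. This is a generic k-nearest-neighbour sketch, not the authors' implementation: the feature values are made up, and the tie rule mentioned in the text (including all tied candidates at the kth position) is left out for brevity.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two feature tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training, query, k=3):
    """Label the query by majority vote among its k nearest training tuples.

    training is a list of (features, label) pairs, i.e. ((w1, ..., wn), v).
    """
    nearest = sorted(training, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Made-up two-feature training tuples; label 1 = fraud, 0 = legal.
training = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
    ((8.0, 9.0), 1), ((9.0, 8.5), 1),
]
print(knn_predict(training, (8.5, 9.0)))
```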
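Fig. 1. Framework of the system. The dataset is preprocessed; KNN, D-Tree and SVM are each trained on the preprocessed data and classify it; the firefly classification step combines their classified results into the optimized ensemble result.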
B. Decision Trees
A decision tree is built by a greedy method using the tests
with the lowest disorder. The creditor is taken as the key
feature on which classification is based. In the credit card
dataset context, each feature 'fi' is the test count of the
credit card, 'vi' is the result amplitude and 'di' is the
amplitude. Entropy measures the disorder: if a random variable
can take v different values, the ith value with probability
'pi', the entropy is calculated as
$$H = -\sum_{i=1}^{v} p_i \log_2 p_i$$
The class entropy is calculated and used to construct the
decision tree. The root of the tree can be any feature test,
usually the one that maximally distinguishes the class labels.
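As a small sketch of the class entropy used to choose tests, the functions below compute the entropy of a list of class labels and the size-weighted entropy after a split. The base-2 logarithm and the label strings are assumptions; the paper does not state the base.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one node."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probabilities)

def split_entropy(groups):
    """Entropy after a test: the size-weighted average over the resulting groups."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * entropy(group) for group in groups)

node = ["fraud", "legal", "fraud", "legal"]
print(entropy(node))                                          # disorder before the test
print(split_entropy([["fraud", "fraud"], ["legal", "legal"]]))  # disorder after a perfect test
```

A greedy tree builder would pick, at each node, the test with the lowest `split_entropy`.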
C. Support Vector Machine (SVM)
SVM classifies the credit card dataset based on the
services feature. Data is given as input to SVM and function is
given as output, which is used to predict the features of future
data. The aim of SVM is to find the optimal hyperplane for
linearly separable patterns. Patterns that are not linearly
separable are handled by transforming the original data into a
new space by means of a kernel function. Credit card fraud
detection uses linear separability; since the data is
high-dimensional, it needs a hyperplane.
Steps:
1. Consider (x1, …, xn) as the data set and let yi ∈ {1, -1} be
the class label of xi.
2. Find the boundary for the data set.
3. The decision boundary is found by solving a constrained
optimization problem, which gives f(x) = w·x + b,
where w is the weight vector and b is the bias.
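Once w and b are known, classification with f(x) = w·x + b reduces to checking the sign of the decision value, as sketched below. The weights, bias and sample points are made-up values; solving the constrained optimization problem that yields w and b is not shown.

```python
def decision_value(w, x, b):
    """f(x) = w . x + b for a linear decision boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Class label in {1, -1}: the side of the hyperplane on which x falls."""
    return 1 if decision_value(w, x, b) >= 0 else -1

# Made-up weight vector and bias; a trained SVM would supply these.
w, b = (1.0, -1.0), 0.0
print(predict(w, (2.0, 1.0), b), predict(w, (1.0, 3.0), b))
```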
D. Ensemble Classifier – Firefly
Yang (2009) developed the FA, which is a population-based
algorithm. The flashing light produced by fireflies is the main
aspect of the FA. Fireflies use light intensity to attract each
other, as well as for other activities. Fireflies with lower
light intensity are attracted towards those with higher
intensity. This concept is used as an optimization algorithm:
the flashing light of the fireflies is mapped to the fitness
function that is to be optimized.
In this study, the FA is employed to optimize the weights
of the D-TREE, SVM, KNN model, denoted FA-D-TREE,
SVM, KNN, in order to obtain the optimal parameter settings
for training the D-TREE, SVM and KNN and to minimize
the error rate. The quality of the transaction classification is
measured by the error rate, which is calculated from the
confusion matrix.
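The error rate derived from a confusion matrix can be sketched as follows. The definition used, (FP + FN) divided by all cases, is the usual one and is assumed here, since the paper does not spell out its formula; the label values are illustrative.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives; returns (TP, TN, FP, FN)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def error_rate(tp, tn, fp, fn):
    """Fraction of misclassified cases (assumed definition of the error rate)."""
    return (fp + fn) / (tp + tn + fp + fn)

actual = [1, 0, 1, 0]      # 1 = fraud, 0 = legal
predicted = [1, 0, 0, 0]
print(error_rate(*confusion_matrix(actual, predicted)))
```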
Begin
  Generate the initial solutions randomly
  Evaluate each individual x in the population, f(x), based on error rate
  Find the best solution in the population
  While (stopping criterion not satisfied)
    For i = 1 to n do
      For j = 1 to n do
        If (f(xj) < f(xi))
          Calculate the attractiveness of the fireflies by eq. 1
          Calculate the distance between fireflies i and j by eq. 2
          Move firefly (xi) towards the better solution (xj) by eq. 3
        End if
      End for j
    End for i
    Move the best solution randomly by eq. 4
    Find the best solution in the new population
  End while
  Return best (TP), (TN), (FP), and (FN)
End of the algorithm
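The pseudocode can be sketched in Python as below. This is a generic firefly minimizer under stated assumptions, not the authors' code: the attractiveness, movement and random-walk rules follow the commonly published form of Yang's algorithm, the randomization parameter `alpha`, the swarm size and the iteration count are hypothetical, and the toy objective stands in for the ensemble's error rate. Unlike the pseudocode, it returns the best weight vector and its cost rather than TP, TN, FP and FN.

```python
import math
import random

def firefly_minimize(objective, dim, n_fireflies=15, iterations=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize objective over dim-dimensional weight vectors with a firefly swarm."""
    rng = random.Random(seed)
    # Generate the initial solutions randomly in [0, 1]^dim and evaluate them.
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_cost = min(cost)
    best = list(swarm[cost.index(best_cost)])

    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower error)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    # Move firefly i towards j, plus a small random step.
                    swarm[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(swarm[i], swarm[j])]
                    cost[i] = objective(swarm[i])
        k = cost.index(min(cost))
        if cost[k] < best_cost:
            best_cost, best = cost[k], list(swarm[k])
        # The current best firefly takes a random step.
        swarm[k] = [a + alpha * (rng.random() - 0.5) for a in swarm[k]]
        cost[k] = objective(swarm[k])
        if cost[k] < best_cost:
            best_cost, best = cost[k], list(swarm[k])
    return best, best_cost

# Toy objective standing in for the error rate of a weighted ensemble.
target = (0.3, 0.7)

def toy_error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, target))

best_weights, best_error = firefly_minimize(toy_error, dim=2)
print(best_weights, best_error)
```

Keeping the best solution found so far in a separate variable means the random step of the best firefly can never lose it.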
In the FA, the attractiveness function of a firefly is
given by
$$\beta(r) = \beta_0 e^{-\gamma r^2} \qquad (1)$$
where
r = the distance between any two fireflies
β0 = the initial attractiveness at r = 0, set to 1 in
this study
γ = an absorption coefficient which controls the
decrease of the light intensity, also set to 1 in this study
(2)
(3)
(4)
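The bodies of equations (2) to (4) are missing from this copy. In the commonly published form of Yang's firefly algorithm, which the pseudocode appears to follow, the distance, movement and random-walk steps read as below. This is an assumption rather than a recovery of the authors' exact equations: the randomization parameter α, the uniform random number rand in [0, 1] and the dimension d are symbols that do not appear in the text.

```latex
\begin{align}
r_{ij} &= \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2} \tag{2} \\
x_i &\leftarrow x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \left(\mathrm{rand} - \tfrac{1}{2}\right) \tag{3} \\
x_{\mathrm{best}} &\leftarrow x_{\mathrm{best}} + \alpha \left(\mathrm{rand} - \tfrac{1}{2}\right) \tag{4}
\end{align}
```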
IV. IMPLEMENTATION AND RESULT
A. Benchmark Datasets
This experiment is performed on datasets that can be freely
downloaded from the CREDITCARD Transaction Data
Classification Homepage:
www.cs.CreditCard.edu/~eamonn/time_series_data. The collection
contains data sets that come from different domains (Table
I).
TABLE I. Instances and features of algorithm.
Algorithm   Instance#   Feature#
KNN         2424        100
D-Tree      2397        617
SVM         2000        500
Firefly     2000        649
All the CREDITCARD data sets are categorized, on the basis of
several criteria, as having similar complexity to real-world
data sets. All the benchmark CREDITCARD transaction data sets
have a moderate to high transaction data length, ranging
from 1996 to 2637.
The results clearly indicate that the hybrid method
(FA-D-TREE, SVM, KNN) outperforms the D-TREE,
SVM, KNN algorithm on all datasets. For example, on the
Gun-Point dataset the D-TREE, SVM, KNN achieved an
11.33% error rate, while the proposed FA-D-TREE, SVM,
KNN obtained an error rate of 0.08%. This is due to the
capability of the FA, incorporated into the D-TREE, SVM, KNN,
to find the optimal weights for the D-TREE, SVM and KNN and
consequently increase their performance. It is believed that
the fireflies come together more closely around the optimal
solution. In other words, the FA has good exploitation
capability and can find better solutions because many
candidates (fireflies) gather near the optimal solution.
B. Comparison with State-of-the-Art
Table II shows the comparison of the results of FA-D-TREE,
SVM, KNN and other available approaches in terms of
classification error rate on credit card datasets. The best
results are presented in bold.
The experimental results indicate that the proposed hybrid
method (FA-D-TREE, SVM, KNN) outperforms the other
approaches on the credit card datasets. FA-D-TREE, SVM, KNN
is able to classify the Wafer dataset with an error rate of
0.004%. This capability is supported by the attractiveness
feature, i.e., the intensity of the light, determined by the
value of the objective function, which makes some fireflies
brighter and attracts the others to the location of
near-optimal solutions.
C. Experimental Results
Table II presents the comparison of the error rate (%)
between the FA-D-TREE, SVM, KNN and the D-TREE, SVM, KNN
transaction data classification techniques on credit card
datasets.
TABLE II. Comparison of algorithms error rate
Algorithm            Instance#   Feature#   Proposed algorithm   Actual       Predicted
KNN                  2424        100        73.15±7.41           61.38±5.09   61.89±4.11
D-Tree               2397        617        90.56±1.02           89.77±1.02   90.01±1.03
SVM                  2000        500        67.72±3.36           55.63±3.29   55.1±3.47
Firefly (Ensemble)   2000        649        96.11±1.3            97.9±0.9     97.9±0.92
Fig. 2. Comparison of error rate between algorithms.
Note#: Instance is the number of rows of data. Feature is the
initial result of the algorithm. Proposed algorithm indicates
the result obtained by the ensemble of D-Tree with FA, KNN with
FA, and SVM with FA. The Actual column shows the actual classes
in the selected instance of data. Predicted shows the results
obtained by the ensemble algorithms with FA.
V. CONCLUSION AND FUTURE WORK
In this work an ensemble method based on the D-TREE,
SVM, KNN and FA is proposed for solving transaction data
classification problems. Initial classification results are
generated at random instances of data using D-TREE, SVM and
KNN, and the improvement is carried out by the FA, which tries
to optimize the weights of the D-TREE, SVM and KNN through the
ensemble mechanism. Experimental results on benchmark
CREDITCARD transaction data sets show that the proposed
FA-D-TREE, SVM, KNN outperforms the D-TREE, SVM,
KNN on all dataset instances. Further comparison with other
approaches in the literature shows that the ensemble method is
able to minimize the error rate, with new best results on
instances. As an extension of this study, further investigation
will be devoted to validating the hybridization of the FA with
a local search algorithm, in order to balance exploration and
exploitation during the optimization process and to avoid
premature convergence.
REFERENCES
[1] E. Bauer and R. Kohavi, “An empirical comparison of voting
classification algorithms: bagging, boosting, and variants,” Machine
Learning, vol. 35, pp. 1-38, 1999.
[2] C. E. Brodley, “Recursive automatic bias selection for classifier
construction,” Machine Learning, vol. 20, pp. 63-94, 1995.
[3] T. Dietterich, “Ensemble methods in machine learning,” First
International Workshop on Multiple Classifier Systems, Lecture Notes in
Computer Science, Springer-Verlag, pp. 1-15, 2000.
[4] G. Brown, J. L. Wyatt, and P. Tino, “Managing diversity in regression
ensembles,” The Journal of Machine Learning Research, vol. 6, pp.
1621-1650, 2005.
[5] A. Sharkey, Combining Artificial Neural Networks: Ensemble and
Modular Multi-Net Systems, Springer-Verlag, pp. 1-30, 1999.
[6] R. Patidar and L. Sharma, “Credit card fraud detection using neural
network,” International Journal of Soft Computing and Engineering
(IJSCE), vol. 1, issue-NCAI2011, 2011.
[7] Y. Sahin and E. Duman, “Detecting credit card fraud by decision trees
and support vector machines,” Proceedings of the International
Multiconference of Engineers and Computer Scientists, Hong Kong, vol.
I, 2011.
[8] K. R. Seeja and M. Zareapoor, “FraudMiner: A novel credit card fraud
detection model based on frequent item set mining,” The Scientific
World Journal, Hindawi Publishing Corporation, vol. 2014, article ID
252797, pp. 1-10, 2014.
[9] J. Pun and Y. Lawryshyn, “Improving credit card fraud detection using a
meta-classification strategy,” International Journal of Computer
Applications, vol. 56, no. 10, pp. 41-46, 2012.
[10] A. Thakur, B. Shaikh, V. Jain, and A. M. Magar, “Hidden markov model
in credit card fraud detection,” International Journal of Advanced
Research in Computer Science and Software Engineering, vol. 5, issue
2, pp. 997-1000, 2015.
[11] S. Maes, K. Tulys, B. Vanschoenwinkel, and B. Manderick, “Credit card
fraud detection using bayesian and neural networks,” Vrije Universiteit
Brussel–Department of Computer Science, Computational Modeling
Lab (COMO), Pleinlaan 2, B-1050 Brussel, Belgium.

More Related Content

PDF
IRJET- Credit Card Fraud Detection using Isolation Forest
PDF
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
PDF
IRJET- Credit Card Fraud Detection Analysis
PDF
Prediction of Default Customer in Banking Sector using Artificial Neural Network
PDF
An application of artificial intelligent neural network and discriminant anal...
PDF
Extended pso algorithm for improvement problems k means clustering algorithm
PDF
Data Collection Methods for Building a Free Response Training Simulation
PDF
Profile Analysis of Users in Data Analytics Domain
IRJET- Credit Card Fraud Detection using Isolation Forest
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
IRJET- Credit Card Fraud Detection Analysis
Prediction of Default Customer in Banking Sector using Artificial Neural Network
An application of artificial intelligent neural network and discriminant anal...
Extended pso algorithm for improvement problems k means clustering algorithm
Data Collection Methods for Building a Free Response Training Simulation
Profile Analysis of Users in Data Analytics Domain

What's hot (20)

PDF
Fuzzy Analytic Hierarchy Based DBMS Selection In Turkish National Identity Ca...
PDF
The use of genetic algorithm, clustering and feature selection techniques in ...
PPTX
Credit card fraud detection using machine learning Algorithms
PDF
Framework for opinion as a service on review data of customer using semantics...
PDF
Biometric Identification and Authentication Providence using Fingerprint for ...
PDF
Performance Analysis of Selected Classifiers in User Profiling
PPTX
Comparative study of various approaches for transaction Fraud Detection using...
PDF
Improving the credit scoring model of microfinance
PDF
PARTICIPATION ANTICIPATING IN ELECTIONS USING DATA MINING METHODS
PDF
Instance Selection and Optimization of Neural Networks
PDF
Trading outlier detection machine learning approach
PDF
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
PDF
Ijatcse71852019
PDF
Df24693697
PDF
IRJET- Analysis of Brand Value Prediction based on Social Media Data
PDF
An approach for discrimination prevention in data mining
PDF
An approach for discrimination prevention in data mining
PDF
A survey on discrimination deterrence in data mining
PDF
Adaptive Machine Learning for Credit Card Fraud Detection
PDF
Empirical analysis of ensemble methods for the classification of robocalls in...
Fuzzy Analytic Hierarchy Based DBMS Selection In Turkish National Identity Ca...
The use of genetic algorithm, clustering and feature selection techniques in ...
Credit card fraud detection using machine learning Algorithms
Framework for opinion as a service on review data of customer using semantics...
Biometric Identification and Authentication Providence using Fingerprint for ...
Performance Analysis of Selected Classifiers in User Profiling
Comparative study of various approaches for transaction Fraud Detection using...
Improving the credit scoring model of microfinance
PARTICIPATION ANTICIPATING IN ELECTIONS USING DATA MINING METHODS
Instance Selection and Optimization of Neural Networks
Trading outlier detection machine learning approach
PROVIDING A METHOD FOR DETERMINING THE INDEX OF CUSTOMER CHURN IN INDUSTRY
Ijatcse71852019
Df24693697
IRJET- Analysis of Brand Value Prediction based on Social Media Data
An approach for discrimination prevention in data mining
An approach for discrimination prevention in data mining
A survey on discrimination deterrence in data mining
Adaptive Machine Learning for Credit Card Fraud Detection
Empirical analysis of ensemble methods for the classification of robocalls in...
Ad

Similar to Meta Classification Technique for Improving Credit Card Fraud Detection (20)

PDF
Welcome to International Journal of Engineering Research and Development (IJERD)
PDF
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
PDF
Concept drift and machine learning model for detecting fraudulent transaction...
PDF
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
PDF
A Comparative Study on Credit Card Fraud Detection
PPT
CREDIT_CARD.ppt
PDF
Unsupervised Learning for Credit Card Fraud Detection
PDF
CREDIT CARD FRAUD DETECTION USING MACHINE LEARNING
PDF
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
PPTX
Presentation2.pptx
PDF
A new hybrid algorithm for business intelligence recommender system
PDF
A NEW HYBRID ALGORITHM FOR BUSINESS INTELLIGENCE RECOMMENDER SYSTEM
PDF
Sandip Finwmwmmwmwmmmenenneal Project.pdf
PDF
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
PPTX
smart attendance system using signature verification 1.pptx
PDF
CREDIT CARD FRAUD DETECTION AND AUTHENTICATION SYSTEM USING MACHINE LEARNING
PDF
IRJET - Fake News Detection: A Survey
PDF
A simulated decision trees algorithm (sdt)
DOCX
A Distributed Knowledge Distillation Framework for Financial Fraud Detection ...
PDF
Ec3212561262
Welcome to International Journal of Engineering Research and Development (IJERD)
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Concept drift and machine learning model for detecting fraudulent transaction...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study on Credit Card Fraud Detection
CREDIT_CARD.ppt
Unsupervised Learning for Credit Card Fraud Detection
CREDIT CARD FRAUD DETECTION USING MACHINE LEARNING
An Identification and Detection of Fraudulence in Credit Card Fraud Transacti...
Presentation2.pptx
A new hybrid algorithm for business intelligence recommender system
A NEW HYBRID ALGORITHM FOR BUSINESS INTELLIGENCE RECOMMENDER SYSTEM
Sandip Finwmwmmwmwmmmenenneal Project.pdf
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
smart attendance system using signature verification 1.pptx
CREDIT CARD FRAUD DETECTION AND AUTHENTICATION SYSTEM USING MACHINE LEARNING
IRJET - Fake News Detection: A Survey
A simulated decision trees algorithm (sdt)
A Distributed Knowledge Distillation Framework for Financial Fraud Detection ...
Ec3212561262
Ad
