International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 06 | June -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2565
Hybrid Model Using Unsupervised Filtering Based on Ant Colony
Optimization and Multiclass SVM by Considering Medical Data Set
Rashmi1, Shaveta Saini2
1 M.Tech Scholar, Department of Computer Science & Engineering, Guru Nanak Dev University
RC Jalandhar, Punjab
2 M.Tech Scholar, Department of Computer Science & Engineering, Guru Nanak Dev University
RC Jalandhar, Punjab
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Data mining is the computational process of discovering patterns in large data sets ("big data") using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. It has been found that Ant Colony Optimization outperforms the J48 and random forest based machine learning algorithms. The overall objective of this research work is to propose a hybrid model that uses unsupervised filtering followed by Ant Colony Optimization and multiclass SVM, evaluated on medical data sets.
Key Words: Data mining, Ant colony optimization, Random forest, SVM, ACO.
1. INTRODUCTION
Data mining is defined as extracting information from large sets of data. In other words, data mining is the process of mining knowledge from data. The information or knowledge extracted in this way can be used in many applications, such as market analysis, fraud detection, customer retention, production control, and scientific exploration. Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among a large number of fields in large relational databases.
Data mining is highly useful in the following domains:
• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection
1.1 Ant Colony Optimization in Data Mining
Data mining (sometimes called data or knowledge discovery) involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. Data mining can be performed on data represented in quantitative or textual form. Data mining applications can use a variety of parameters to examine the data. These include association, sequence or path analysis, classification, clustering, and forecasting. While data mining products can be very powerful tools, they are not self-sufficient applications. To be successful, data mining requires skilled technical and analytical specialists who can structure the analysis and interpret the output that is created. Generally, in data mining projects any of four types of relationships are sought:
• Classes: stored data is used to locate data in predetermined groups.
• Clusters: data items are grouped according to logical relationships or user preferences.
• Associations: data can be mined to identify associations.
• Sequential patterns: data is mined to anticipate behavior patterns and trends.
1.2 Random Forest
Random forest (RF) fits many classification trees to a data set and then combines the predictions from all the trees. The algorithm begins by selecting many (e.g., 500) bootstrap samples from the data. In a typical bootstrap sample, approximately 63% of the original observations occur at least once. Observations in the original data set that do not occur in a bootstrap sample are called out-of-bag observations. A classification tree is fit to each bootstrap sample, but at each node only a small number of randomly selected variables (e.g., the square root of the number of variables) are available for the binary partitioning. The trees are fully grown, and each is used to predict the out-of-bag observations.
The predicted class of an observation is calculated by majority vote of the out-of-bag predictions for that observation, with ties split randomly.
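The 63% figure and the voting rule described above can be checked with a short simulation. The following Python sketch is illustrative only: the data set size, the number of repetitions, and the function names are assumptions made for this example and are not taken from the experiments in this paper.

```python
import math
import random
from collections import Counter


def in_bag_fraction(n, rng):
    """Fraction of the n original observations that occur at least once
    in one bootstrap sample (n draws with replacement)."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return len(drawn) / n


def majority_vote(predictions, rng):
    """Most frequent label among the predictions; ties are split randomly."""
    counts = Counter(predictions)
    top = max(counts.values())
    tied = sorted(label for label, count in counts.items() if count == top)
    return rng.choice(tied)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_observations = 1000    # hypothetical data set size
fractions = [in_bag_fraction(n_observations, rng) for _ in range(500)]
mean_in_bag = sum(fractions) / len(fractions)

print(f"mean in-bag fraction: {mean_in_bag:.3f}")
print(f"limit 1 - 1/e:        {1 - math.exp(-1):.3f}")
print(majority_vote(["benign", "malignant", "benign"], rng))
```

The in-bag fraction approaches 1 - 1/e (about 0.632) as the data set grows, which is where the "roughly 63%" statement comes from; the remaining observations are the out-of-bag ones.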
1.3 Ant Colony Decision Trees (ACDT)
It is interesting to note that the proposed algorithm for decision tree construction is mainly based on a version of ant colony optimization. Several slight modifications have been introduced, both as a new discrete optimization algorithm for constructing decision trees and as a new metaheuristic approach in data mining procedures. In ACDT each ant chooses the appropriate attribute for splitting in each node of the constructed decision tree according to the heuristic function and the pheromone values.
Fig 1. Ant Colony Decision Trees
The presented modifications are introduced mainly in the transition rule, and they are treated as an improvement in the quality of the classification mechanism. We have employed a classical version of ACO; simple changes concerning the main rules, dedicated to each agent-ant during the construction of its tour, are incorporated in the scheme. First, we have applied the classical splitting rule, first used in CART. Secondly, we have complied with the pheromone changes, which are useful knowledge for creating a reasonable division. In ACDT each ant chooses the appropriate attribute for splitting in each node of the constructed decision tree according to the heuristic function and the pheromone values (Fig 1). The heuristic function is based on the Twoing criterion, which helps ants divide the objects into two groups according to the values of the analyzed attribute. In this way, the attribute that separates the objects well is treated as the best condition for the analyzed node. The best split is observed when the same number of objects is classified into the left and right subtrees with the maximum homogeneity in the decision classes. Pheromone values represent the best way (connection) from the superior node to the subordinate nodes, over all possible combinations in the analyzed subtrees. For each node we calculate these values according to the objects classified using the Twoing criterion of the superior node.
The algorithm proceeds as follows. At the beginning of its work, each ant builds one decision tree. At the end of the loop, the best decision tree is chosen, and the pheromone is then updated according to the splits performed during the construction of the decision tree, iteratively. While constructing the tree, agent-ants analyze previous structures, and some modifications are performed in the single node. This process is repeated until the best decision tree is obtained.
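To make the role of the heuristic function and the pheromone values more concrete, the following Python sketch computes the Twoing criterion as defined in CART and then draws a split with probability proportional to pheromone^alpha * heuristic^beta. The exponents alpha and beta, the candidate names, and the pheromone numbers are assumptions introduced for illustration (the product form is the generic ACO transition rule); the exact rule used in ACDT may differ in its details.

```python
import random
from collections import Counter


def twoing(left_labels, right_labels):
    """Twoing criterion of a binary split (CART definition):
    (P_L * P_R / 4) * (sum over classes |p(j|L) - p(j|R)|) ** 2."""
    n_left, n_right = len(left_labels), len(right_labels)
    if n_left == 0 or n_right == 0:
        return 0.0  # a split that leaves one side empty separates nothing
    n = n_left + n_right
    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)
    diff = sum(abs(left_counts[c] / n_left - right_counts[c] / n_right)
               for c in classes)
    return (n_left / n) * (n_right / n) / 4.0 * diff ** 2


def choose_split(candidates, pheromone, heuristic, rng, alpha=1.0, beta=1.0):
    """Pick one candidate split with probability proportional to
    pheromone**alpha * heuristic**beta (assumed generic ACO rule)."""
    weights = [pheromone[c] ** alpha * heuristic[c] ** beta for c in candidates]
    if sum(weights) == 0:
        return rng.choice(candidates)  # no information: choose uniformly
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical candidate splits of four labelled objects.
heuristic = {
    "attr_a": twoing(["yes", "yes"], ["no", "no"]),  # perfect separation
    "attr_b": twoing(["yes", "no"], ["yes", "no"]),  # no separation
}
pheromone = {"attr_a": 1.0, "attr_b": 1.0}
rng = random.Random(0)
chosen = choose_split(["attr_a", "attr_b"], pheromone, heuristic, rng)
print(heuristic, chosen)
```

With equal pheromone on both candidates, the ant's choice is driven entirely by the Twoing value, so the perfectly separating attribute is selected; as pheromone accumulates on splits used in good trees, it shifts these probabilities over iterations.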
2. UNSUPERVISED FILTERING
Methods for constructing filter systems for raw, unclassified data are called unsupervised learning procedures in the theory of neural networks. Such systems are often specified by their learning rules, i.e., how they change their internal weights or filter coefficients. In this work a different approach is used to find the filter functions. We first select some properties of the output vectors computed by the system. We then design a quality (or energy) function that measures these properties. Finally, we use an iterative optimization method to find the filters. This has the advantage that the large existing body of knowledge in optimization theory can be applied to find efficient implementations of the learning procedure.
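The three steps above (choose a property of the output, define a quality function, optimize iteratively) can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter used in this paper: the quality function is taken to be the mean squared output of a single linear filter with unit norm, the data are synthetic two-dimensional points, and the optimizer is plain projected gradient ascent.

```python
import math
import random


def quality(w, data):
    """Assumed quality function: mean squared output of the linear filter w."""
    return sum((w[0] * x + w[1] * y) ** 2 for x, y in data) / len(data)


def learn_filter(data, steps=200, lr=0.1, seed=0):
    """Projected gradient ascent on quality(w) subject to ||w|| = 1."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    norm = math.hypot(w[0], w[1])
    w = [w[0] / norm, w[1] / norm]
    for _ in range(steps):
        gx = gy = 0.0
        for x, y in data:
            out = w[0] * x + w[1] * y
            gx += 2.0 * out * x  # d(out^2)/dw0
            gy += 2.0 * out * y  # d(out^2)/dw1
        gx /= len(data)
        gy /= len(data)
        w = [w[0] + lr * gx, w[1] + lr * gy]
        norm = math.hypot(w[0], w[1])
        w = [w[0] / norm, w[1] / norm]  # project back onto the unit circle
    return w


# Synthetic, unlabelled data spread mainly along the direction (1, 1).
rng = random.Random(1)
data = []
for _ in range(300):
    t = rng.gauss(0.0, 2.0)
    data.append((t + rng.gauss(0.0, 0.3), t + rng.gauss(0.0, 0.3)))

w = learn_filter(data)
alignment = abs(w[0] + w[1]) / math.sqrt(2.0)  # |cosine| with (1, 1)
print(w, alignment, quality(w, data))
```

No class labels are used at any point; the filter is determined only by the chosen quality function, which is why a different choice of property would yield a different filter.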
2.1 Multiclass SVMs
SVMs were originally designed for binary classification problems. Extensions to multiclass problems usually involve either solving a large optimization problem directly or decomposing the original problem into smaller binary subproblems and then combining their solutions. While both approaches generally show no difference in performance when the hyperparameters are properly tuned [16], the decomposition approach is more computationally attractive. There are two main decomposition strategies: One-Versus-One (OVO) and One-Versus-All (OVA). Both are widely used because of their simplicity, efficiency, and equally good classification performance [16]. This paper focuses on the OVO scheme, but the proposed approach could be applied to other multiclass strategies as well. The OVO method constructs N(N−1)/2 SVMs, taking into account all binary combinations of classes. When a test example is supplied, it is applied to each of the SVMs and their outputs are combined. The MaxWins voting scheme [9] used here counts how often each class is output by the binary SVMs, and the test example is assigned to the class with the most votes.
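The OVO decomposition and MaxWins voting described above can be sketched in a few lines of Python. The binary classifiers here are hypothetical stand-ins (nearest class centre on a one-dimensional feature) used in place of trained SVMs, and the tie-breaking rule is an assumption, since the text does not specify one.

```python
from collections import Counter
from itertools import combinations


def make_pair_classifier(a, b, centres):
    """Hypothetical stand-in for a trained binary SVM on classes a and b:
    predicts whichever class centre is nearer to the 1-D input x."""
    def classify(x):
        return a if abs(x - centres[a]) <= abs(x - centres[b]) else b
    return classify


def max_wins(x, classes, binary_classifiers):
    """Apply every pairwise classifier to x and return the class that is
    output most often; ties are broken by label order (an assumption)."""
    votes = Counter({c: 0 for c in classes})
    for pair in combinations(classes, 2):
        votes[binary_classifiers[pair](x)] += 1
    best = max(votes.values())
    return min(c for c in classes if votes[c] == best)


classes = ["A", "B", "C", "D"]
centres = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}  # illustrative values
pairs = list(combinations(classes, 2))
binary_classifiers = {
    (a, b): make_pair_classifier(a, b, centres) for a, b in pairs
}

print(len(pairs))                                 # N(N-1)/2 classifiers
print(max_wins(2.1, classes, binary_classifiers))
```

For N = 4 classes this builds 4·3/2 = 6 binary classifiers. For the input 2.1, class C wins three pairwise contests, D wins two, and B wins one, so the example is assigned to C.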
3. RELATED WORK
Duan, K. B., Rajapakse, J. C., Wang, H., and Azuaje, F. (2005) [1] proposed a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike the SVM-RFE method, at each step the proposed approach computes the feature ranking score from a statistical analysis of the weight vectors of multiple linear SVMs trained on subsamples of the original training data.
Cios, K. J., and Moore, G. W. (2002) [2] studied the ethical and legal aspects of medical data mining, including data ownership, fear of lawsuits, expected benefits, and special administrative issues.
Brameier, M., and Banzhaf, W. (2001) [3] discussed two methods for accelerating the genetic programming (GP) approach. The first is the use of an efficient algorithm that eliminates intron code. The second is a demetic approach to virtually parallelize the system on a single processor. GP performance on medical classification problems from a benchmark database is compared with results obtained by neural networks. The results show that GP performs comparably in classification and generalization.
Prather, J. C., and Lobach, D. F. (1997) [4] used the techniques of data mining (also known as Knowledge Discovery in Databases) to search for relationships in a large clinical database. They describe the processes involved in mining a clinical database, including data warehousing, data query and cleaning, and data analysis.
Parpinelli, R. S., Lopes, H. S., and Freitas, A. A. (2001) [5] described an algorithm for rule discovery in databases called AntMiner. The goal of the algorithm is the extraction of classification rules to be applied to unseen data as a decision aid. AntMiner was applied to medical databases to obtain classification rules.
Li, J., Fu, A. W. C., and He, H. (2005) [6] discussed the problem of finding risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. An anti-monotone property for mining optimal risk pattern sets is studied. The algorithm has produced some useful results for medical researchers.
Ghazavi, S. N., and Liao, T. W. (2008) [7] presented a data mining study of medical data using fuzzy modeling methods that use feature subsets selected by several methods. Three fuzzy modeling methods are employed: the fuzzy k-nearest neighbor algorithm, fuzzy clustering-based modeling, and the adaptive network-based fuzzy inference system.
Delen, D., Walker, G., and Kadam, A. (2005) [8] noted that, thanks to innovative biomedical technologies, better explanatory prognostic factors are being measured and recorded. They use the available technological advancements to develop prediction models for breast cancer survivability.
Pamulaparty, L., Rao, C. G., and Rao, M. S. (2016) [9] discussed how cluster analysis divides data into groups that are meaningful, useful, or both. It is also used as a starting point for other purposes of data summarization. They examined some very basic algorithms, including K-means, Fuzzy C-means, and hierarchical clustering, to form clusters, using the R data mining tool. The results are tested on the Online News Popularity and Iris data sets from the UCI data repository and on an miRNA data set for medical data analysis. All data sets were analyzed with different clustering algorithms. Each algorithm has its own uniqueness and antithetical behavior.
Sebag, M., Azé, J., and Lucas, N. (2003) [10] derived an NP-complete optimization criterion for supervised learning, the area under the ROC curve. This optimization criterion, tackled with evolution strategies, is experimentally compared with the structural risk criterion tackled by quadratic optimization in Support Vector Machines. Comparable results are obtained on a set of benchmark problems from the Irvine repository, at a fraction of the SVM computational cost.
Yu, H., Vaidya, J., and Jiang, X. (2006) [11] noted that various algorithms enable distributed knowledge discovery while providing guarantees on the non-disclosure of data. Classification is an important data mining problem applicable in many diverse domains. The goal of classification is to build a model that can predict an attribute (a binary attribute in this work) based on the remaining attributes. They propose an efficient and secure privacy-preserving algorithm for support vector machine (SVM) classification over vertically partitioned data.
Caruana, R., and Niculescu-Mizil, A. (2004) [12] presented an empirical analysis of performance criteria for supervised learning.
4. EXPERIMENTAL SETUP
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of data and functions, execution of algorithms, creation of user interfaces, and interfacing with programs written in other languages such as C, C++, C#, Java, Fortran, and Python.
Waikato Environment for Knowledge Analysis (Weka) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License. It is a workbench that consists of a collection of visualization tools and algorithms for data analysis and predictive modeling, along with graphical user interfaces for accessing the functions easily.
An Intel dual-core processor is used, with Windows 7, 2 GB of RAM, and a 500 GB hard disk.
The data sets used as reference to evaluate the performance of the proposed algorithm are:
Breast Cancer [26] - Breast cancer generally develops from breast tissue. Signs of breast cancer include a lump in the breast, a change in breast shape, dimpling of the skin, fluid coming from the nipple, or a red scaly patch of skin.
Diabetes [26] - Diabetes is a metabolic disorder characterized by high blood sugar, insulin resistance, and a relative lack of insulin. Common symptoms include increased thirst, frequent urination, and unexplained weight loss.
E-coli [26] - Escherichia coli is a gram-negative, facultative anaerobic, rod-shaped bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded organisms. Most E. coli strains are harmless, but some serotypes can cause serious food poisoning in their hosts.
Heart Disease [26] - Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. It is generally caused by high blood pressure, smoking, diabetes, lack of exercise, obesity, high blood cholesterol, poor diet, and excessive alcohol consumption.
5. METHODOLOGY AND RESULTS
5.1 Methodology
Fig 2. Proposed methodology
5.2 Performance Analysis
The proposed technique has been designed and implemented in MATLAB 2013a. It is evaluated on the basis of the following metrics: accuracy, F-measure, true positive rate, and false positive rate. The proposed algorithm is compared on all of these parameters, and the figures show the results.
1. Correctly Classified Instances - The number of instances, out of the total number of instances used, that are classified correctly.
2. Incorrectly Classified Instances - The number of instances, out of the total number of instances used, that are classified incorrectly.
3. Kappa Statistic - The kappa statistic measures inter-rater agreement for qualitative items. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. It is calculated as
\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]
where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance.
[Fig 2 flowchart: START → LOAD DATA SET → TRAIN AND TEST DATA USING HYBRID MULTICLASS SVM AND ANT COLONY OPTIMIZATION → EVALUATE PERFORMANCE → RETURN SOLUTION → END]
4. Accuracy - Accuracy refers to the ability of the model to correctly predict the class label of new or unseen data. It is calculated as
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]
where
True positives (TP) = number of correct classifications predicted as yes (positive).
True negatives (TN) = number of correct classifications predicted as no (negative).
False positives (FP) = number of incorrect classifications predicted as yes (positive) when the instance is actually no (negative).
False negatives (FN) = number of incorrect classifications predicted as no (negative) when the instance is actually yes (positive).
5. F-Measure - The F-measure combines precision and recall; it is their harmonic mean. It is calculated as
\[ F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]
where precision and recall are defined as
\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \]
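The metrics listed above can be computed directly from the four confusion-matrix counts. The Python sketch below is for illustration only: the function name and the example counts are hypothetical and are not results from this paper, the kappa computation is the two-class case of Cohen's kappa, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F-measure and Cohen's kappa for a
    two-class confusion matrix (assumes no denominator is zero)."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Chance agreement: product of the marginal proportions, per class.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "correct": tp + tn,
        "incorrect": fp + fn,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, 85 of 100 instances are classified correctly, so the accuracy is 0.85; the expected chance agreement is 0.5, which gives a kappa of (0.85 - 0.5) / (1 - 0.5) = 0.7.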
6. RESULTS AND PERFORMANCE EVALUATION
Results from the proposed model are reported for the Breast Cancer, Diabetes, E-coli, and Heart Disease data sets. The results for each data set are discussed in detail in turn.
1. Discussion for Wisconsin Breast Cancer data set
The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 3) is 96.5 for the Breast Cancer data set. The accuracies of the other classification models are also reported in the table. The results show that the proposed model achieves the highest accuracy among all the classification models compared. In addition to accuracy, the Kappa statistic was also evaluated; at 0.9157, it is the highest among all the existing classifiers.
The ROC area was also calculated. The ROC area of the proposed model, 0.994, is also the highest among all the existing classifiers.
Fig 3. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 4. Analysis of kappa statistic and F-measure
2. Discussion for Diabetes data set
The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 4) is 99.2 for the Diabetes data set. The accuracies of the other classification models are also reported in the table. The results show that the proposed model achieves the highest accuracy among all the classification models compared. In addition to accuracy, the Kappa statistic was also evaluated; at 0.9827, it is the highest among all the existing classifiers.
The ROC area was also calculated. The ROC area of the proposed model, 1, is also the highest among all the existing classifiers.
Fig 5. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 6. Analysis of kappa statistic and F-measure
3. Discussion for E-coli data set
The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 5) is 99.4 for the E-coli data set. The accuracies of the other classification models are also reported in the table. The results show that the proposed model achieves the highest accuracy among all the classification models compared. In addition to accuracy, the Kappa statistic was also evaluated; at 0.9918, it is the highest among all the existing classifiers.
The ROC area was also calculated. The ROC area of the proposed model, 1, is also the highest among all the existing classifiers.
Fig 7. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 8. Analysis of kappa statistic and F-measure
4. Discussion for Heart Disease data set
The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 6) is 98.3 for the Heart Disease data set. The accuracies of the other classification models are also reported in the table. The results show that the proposed model achieves the highest accuracy among all the classification models compared. In addition to accuracy, the Kappa statistic was also evaluated; at 0.9632, it is the highest among all the existing classifiers.
The ROC area was also calculated. The ROC area of the proposed model, 0.998, is also the highest among all the existing classifiers.
Fig 9. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 10. Analysis of kappa statistic and F-measure
7. CONCLUSION
In this paper, we have analyzed existing ant colony optimization and random forest tree based data mining. The proposed approach, unsupervised filtering followed by ACO and multiclass SVM, gives better results. This paper has presented a comparison between the existing and proposed data mining techniques on the basis of parameters such as correctly classified instances, incorrectly classified instances, Kappa statistic, accuracy, and F-measure. The proposed technique shows better results than the existing techniques.
REFERENCES
[1] Duan, K. B., Rajapakse, J. C., Wang, H., & Azuaje, F. (2005). Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Transactions on Nanobioscience, 4(3), 228-234.
[2] Cios, K. J., & Moore, G. W. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26(1), 1-24.
[3] Brameier, M., & Banzhaf, W. (2001). A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1), 17-26.
[4] Prather, J. C., Lobach, D. F., Goodwin, L. K., Hales, J. W., Hage, M. L., & Hammond, W. E. (1997). Medical data mining: knowledge discovery in a clinical data warehouse. In Proceedings of the AMIA Annual Fall Symposium (p. 101). American Medical Informatics Association.
[5] Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2001, July). An ant colony based system for data mining: applications to medical data. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001) (pp. 791-797).
[6] Li, J., Fu, A. W. C., He, H., Chen, J., Jin, H., McAullay, D., & Kelman, C. (2005, August). Mining risk patterns in medical data. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 770-775). ACM.
[7] Ghazavi, S. N., & Liao, T. W. (2008). Medical data mining by fuzzy modeling with selected features. Artificial Intelligence in Medicine, 43(3), 195-206.
[8] Delen, D., Walker, G., & Kadam, A. (2005). Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine, 34(2), 113-127.
[9] Pamulaparty, L., Rao, C. G., & Rao, M. S. (2016). Cluster Analysis of Medical Research Data using R. Global Journal of Computer Science and Technology, 16(1).
[10] Sebag, M., Azé, J., & Lucas, N. (2003, October). ROC-based evolutionary learning: Application to medical data mining. In International Conference on Artificial Evolution (Evolution Artificielle) (pp. 384-396). Springer Berlin Heidelberg.
[11] Yu, H., Vaidya, J., & Jiang, X. (2006, April). Privacy-preserving SVM classification on vertically partitioned data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 647-656). Springer Berlin Heidelberg.
[12] Caruana, R., & Niculescu-Mizil, A. (2004, August). Data mining in metric space: an empirical analysis of supervised learning performance criteria. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 69-78). ACM.
[13] Delen, D. (2009). Analysis of cancer data: a data mining approach. Expert Systems, 26(1), 100-112.
[14] Tomar, D., & Agarwal, S. (2013). A survey on Data Mining approaches for Healthcare. International Journal of Bio-Science and Bio-Technology, 5(5), 241-266.
[15] Raikwal, J. S., & Saxena, K. (2012). Performance evaluation of SVM and k-nearest neighbor algorithm over medical data set. International Journal of Computer Applications, 50(14).
[16] Moses, D. (2015). A survey of data mining algorithms used in cardiovascular disease diagnosis from multi-lead ECG data. Kuwait Journal of Science, 42(2).
[17] Nichat, A. M., & Ladhake, S. A. (2016). Brain Tumor Segmentation and Classification Using Modified FCM and SVM Classifier. Brain, 5(4).
[18] Verma, L., Srivastava, S., & Negi, P. C. (2016). A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data. Journal of Medical Systems, 40(7), 1-7.
[19] Li, D. C., Liu, C. W., & Hu, S. C. (2010). A learning method for the class imbalance problem with medical data sets. Computers in Biology and Medicine, 40(5), 509-518.
[20] Kazemzadeh, R. S., & Sartipi, K. (2005, September). Interoperability of data and knowledge in distributed health care systems. In 13th IEEE International Workshop on Software Technology and Engineering Practice (STEP'05) (pp. 230-240). IEEE.

More Related Content

PDF
Data mining techniques
PDF
Data mining techniques a survey paper
PDF
IRJET - Survey on Clustering based Categorical Data Protection
PDF
G046024851
PDF
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahul
PDF
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
PPTX
02 Related Concepts
PDF
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET
Data mining techniques
Data mining techniques a survey paper
IRJET - Survey on Clustering based Categorical Data Protection
G046024851
Volume 14 issue 03 march 2014_ijcsms_march14_10_14_rahul
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
02 Related Concepts
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET


Hybrid Model using Unsupervised Filtering Based on Ant Colony Optimization and Multiclass Svm by Considering Medical Data Set

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 06 | June-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2565

Rashmi1, Shaveta Saini2
1 M.Tech Scholar, Department of Computer Science & Engineering, Guru Nanak Dev University RC Jalandhar, Punjab
2 M.Tech Scholar, Department of Computer Science & Engineering, Guru Nanak Dev University RC Jalandhar, Punjab

Abstract - Data mining is the computational process of discovering patterns in large data sets ("big data") using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. It has been found that ant colony optimization outperforms the J48 and random forest based machine learning algorithms. The overall objective of this research work is to propose a hybrid model that uses unsupervised filtering followed by ant colony optimization and multiclass SVM, by considering a medical data set.

Key Words: Data mining, Ant colony optimization, Random forest, SVM, ACO.

1. INTRODUCTION

Data mining is defined as extracting information from large sets of data. In other words, data mining is the process of mining knowledge from data.
The information or knowledge extracted in this way can be used in many applications, such as market analysis, fraud detection, customer retention, production control, and science exploration. Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among a large number of fields in large relational databases.

Data mining is highly useful in the following domains:
• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection

1.1 Ant Colony Optimization in Data Mining

Data mining (sometimes called data or knowledge discovery) involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. Data mining can be performed on data represented in quantitative or textual form. Data mining applications can use a variety of parameters to examine the data. These include association, sequence or path analysis, classification, clustering, and forecasting.
While data mining tools can be very powerful, they are not self-sufficient applications. To be successful, data mining requires skilled technical and analytical specialists who can structure the analysis and interpret the output that is created. Generally, in data mining projects any of four types of relationships are sought:
• Classes: stored data is used to locate data in predetermined groups.
• Clusters: data items are grouped according to logical relationships or user preferences.
• Associations: data can be mined to identify associations.
• Sequential patterns: data is mined to anticipate behavior patterns and trends.

1.2 Random Forest

Random forest (RF) fits many classification trees to the data set and then combines the predictions from all the trees. The algorithm starts by selecting many (e.g., 500) bootstrap samples from the data. In a typical bootstrap sample, roughly 63% of the original observations occur at least once. Observations in the original data set that do not occur in a bootstrap sample are called out-of-bag observations. A classification tree is fit to each bootstrap sample, but at each node only a small number of randomly selected variables (e.g., the square root of the number of variables) are available for the binary partitioning. The trees are fully grown, and each is used to predict its out-of-bag observations.
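
The figure of roughly 63% quoted for bootstrap samples can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `bootstrap_sample` and `mean_in_bag_fraction`, the sample size, and the number of trials are illustrative assumptions and are not taken from the paper.

```python
import random


def bootstrap_sample(n, rng):
    """Draw n indices with replacement from range(n).

    Returns the set of in-bag indices and the list of out-of-bag indices.
    """
    drawn = [rng.randrange(n) for _ in range(n)]
    in_bag = set(drawn)
    out_of_bag = [i for i in range(n) if i not in in_bag]
    return in_bag, out_of_bag


def mean_in_bag_fraction(n, trials=200, seed=0):
    """Average fraction of the n observations that occur at least once."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        in_bag, _oob = bootstrap_sample(n, rng)
        total += len(in_bag) / n
    return total / trials


if __name__ == "__main__":
    # For large n the expected fraction approaches 1 - 1/e, about 0.632.
    print(round(mean_in_bag_fraction(1000), 3))
```

The remaining observations, a little over a third of the data in each sample, are the out-of-bag observations available for testing the tree grown on that sample.
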
In a random forest, the predicted class of an observation is calculated by majority vote of the out-of-bag predictions for that observation, with ties split randomly.

1.3 Ant Colony Decision Trees (ACDT)

The proposed algorithm for decision tree construction is based mainly on ant colony optimization. Several slight modifications have been introduced, both as a new discrete optimization algorithm for constructing decision trees and as a new metaheuristic approach in data mining procedures. In ACDT, each ant chooses the appropriate attribute for splitting at each node of the constructed decision tree according to the heuristic function and pheromone values.

Fig 1. Ant Colony Decision Trees

The modifications are introduced mainly in the transition rule and are treated as an improvement in the quality of the classification mechanism. We have employed the classical version of ACO, with simple changes to the main rules that govern how the agent ants build their tours. First, we have applied the classical splitting rule first used in CART. Second, we rely on pheromone changes, which provide useful knowledge for creating a reasonable division. As illustrated in Fig. 1, each ant makes its attribute choice at every node of the tree it builds, guided by the heuristic function and the pheromone values.
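
The paper does not state the exact transition rule, so the following is only a sketch of how an ant might pick a splitting attribute from heuristic and pheromone values. It assumes the classical ACO random-proportional rule, in which each candidate is weighted by pheromone raised to `alpha` times heuristic raised to `beta`; the exponents, the function names, and the attribute names in the example are all hypothetical.

```python
import random


def selection_probabilities(heuristic, pheromone, alpha=1.0, beta=1.0):
    """Probability of choosing each attribute (assumed classical ACO rule).

    heuristic and pheromone map attribute name -> positive value.
    """
    weights = {
        attr: (pheromone[attr] ** alpha) * (heuristic[attr] ** beta)
        for attr in heuristic
    }
    total = sum(weights.values())
    return {attr: w / total for attr, w in weights.items()}


def choose_attribute(heuristic, pheromone, rng, alpha=1.0, beta=1.0):
    """Sample one attribute according to the selection probabilities."""
    probs = selection_probabilities(heuristic, pheromone, alpha, beta)
    attrs = list(probs)
    return rng.choices(attrs, weights=[probs[a] for a in attrs], k=1)[0]


if __name__ == "__main__":
    # Hypothetical attributes with made-up heuristic scores.
    heuristic = {"age": 0.2, "glucose": 0.6, "bmi": 0.2}
    pheromone = {"age": 1.0, "glucose": 1.0, "bmi": 1.0}
    print(selection_probabilities(heuristic, pheromone))
    print(choose_attribute(heuristic, pheromone, random.Random(0)))
```

With equal pheromone on every attribute the choice is driven by the heuristic alone; as pheromone accumulates on splits that appeared in good trees, those attributes become more likely to be chosen again.
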
The heuristic function is based on the Twoing criterion, which helps ants divide the objects into two groups according to the values of the tested attribute. In this way, the attribute that splits the objects well is treated as the best condition for the analyzed node. The best split is obtained when the same number of objects is classified into the left and right subtrees with maximum homogeneity in the decision classes. Pheromone values represent the best way (connection) from the superior to the subordinate nodes, over all possible combinations in the analyzed subtrees. For each node we calculate the following values according to the objects classified using the Twoing criterion of the superior node. The pseudocode of the proposed algorithm is presented below. At the start of its work, each ant builds one decision tree. At the end of the loop, the best decision tree is chosen and the pheromone is then updated, iteratively, according to the splits performed while constructing that tree. While constructing the tree, the agent ants analyze previous structures, and some modifications are performed in single nodes. This process is carried out until the best decision tree is obtained. The process of constructing the decision tree is presented.

2. UNSUPERVISED FILTERING

Methods for constructing filter systems for raw, unclassified data are known as unsupervised learning methods in the theory of neural networks. Such systems are often described by their learning rules, i.e., how they update their internal weights or filter coefficients. In this work an alternative approach is used to find the filter functions.
We first select some properties of the output vectors computed by the system. We then design a quality (or energy) function that measures these properties. Finally, we use an iterative optimization method to find the filters. This has the advantage that the large existing body of knowledge in optimization theory can be applied to find efficient implementations of the learning procedure.

2.1 Multiclass SVMs

SVMs were originally designed for binary classification problems. Extensions to multiclass problems typically involve either solving one large optimization problem directly or decomposing the original problem into smaller binary subproblems and then combining their solutions. While both approaches generally show no difference in performance when the hyperparameters are properly tuned [16], the decomposition approach is computationally more attractive. There are two main decomposition strategies: One-Versus-One (OVO) and One-Versus-All (OVA). Both are widely used because of their simplicity, efficiency, and equally good classification performance [16]. This paper focuses on the OVO scheme, but the proposed approach could equally be applied to other multiclass strategies. The OVO scheme constructs N(N−1)/2 SVMs, one for every pair of the N classes. When a test example is supplied, it is applied to each of the SVMs and their outputs are combined. The MaxWins voting scheme [9] used here counts how often each class is output by the binary SVMs, and the test example is assigned to the class with the most votes.

3. RELATED WORK

Duan, K. B., Rajapakse, J. C., Wang, H., and Azuaje, F. (2005) [1] proposed a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike the SVM-RFE method, at each step the proposed approach computes the feature ranking score from a statistical analysis of the weight vectors of multiple linear SVMs trained on subsamples of the original training data.

Cios, K. J., and Moore, G. W. (2002) [2] studied the ethical and legal aspects of medical data mining, including data ownership, fear of lawsuits, expected benefits, and special administrative issues.

Brameier, M., and Banzhaf, W. (2001) [3] discussed two methods for accelerating the genetic programming (GP) approach. The first is the use of an efficient algorithm that minimizes code. The second is a demetic approach to virtually parallelize the system on a single processor. GP performance on medical classification problems from a benchmark database is compared with results obtained by neural networks.
The results show that GP performs comparably in classification and generalization.

Prather, J. C., Lobach, D. F., et al. (1997) [4] used the techniques of data mining (also known as knowledge discovery in databases) to search for relationships in a large clinical database. They describe the processes involved in mining a clinical database, including data warehousing, data query and cleaning, and data analysis.

Parpinelli, R. S., Lopes, H. S., and Freitas, A. A. (2001) [5] described an algorithm for rule discovery in databases called AntMiner. The goal of the algorithm is to extract classification rules to be applied to unseen data as a decision aid. AntMiner has been applied to medical databases to obtain classification rules.

Li, J., Fu, A. W. C., He, H., et al. (2005) [6] discussed the problem of finding risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. An anti-monotone property for mining optimal risk pattern sets is studied. The algorithm has produced some useful results for medical researchers.

Ghazavi, S. N., and Liao, T. W. (2008) [7] presented a data mining study of medical data using fuzzy modeling methods that use feature subsets selected by several methods. Three fuzzy modeling methods are employed: the fuzzy k-nearest neighbor algorithm, fuzzy clustering based modeling, and the adaptive network based fuzzy inference system.

Delen, D., Walker, G., and Kadam, A. (2005) [8] stated that, with innovative biomedical technologies, better explanatory prognostic factors are being measured and recorded.
Available technological advancements are used to develop prediction models for breast cancer survivability.

Pamulaparty, L., Rao, C. G., and Rao, M. S. (2016) [9] discussed how cluster analysis divides data into groups that are meaningful, useful, or both. It is also used as a starting point for other purposes of data summarization. They examined some very basic algorithms, namely K-means, Fuzzy C-means, and hierarchical clustering, to form clusters, using the R data mining tool. The results were tested on the Online News Popularity and Iris data sets from the UCI data repository and on a miRNA data set for medical data analysis. All data sets were analyzed with different clustering algorithms. Each algorithm has its own uniqueness and antithetical behavior.

Sebag, M., Azé, J., and Lucas, N. (2003) [10] considered an NP-complete optimization criterion for supervised learning, the area under the ROC curve. This optimization criterion, tackled with evolution strategies, is experimentally compared to the structural risk criterion tackled by quadratic optimization in Support Vector Machines. Comparable results are obtained on some benchmark problems from the Irvine repository, within a fraction of the SVM computational cost.

Yu, H., Vaidya, J., and Jiang, X. (2006) [11] proposed algorithms that enable distributed knowledge discovery while providing guarantees on the non-disclosure of data. Classification is an important data mining problem applicable in many diverse domains. The goal of classification is to build a model that can predict an attribute (a binary attribute in that work) based on the rest of the attributes.
They propose an efficient and secure privacy-preserving algorithm for support vector machine (SVM) classification over vertically partitioned data.

Caruana, R., and Niculescu-Mizil, A. (2004) [12] carried out an empirical analysis of supervised learning performance criteria.

4. EXPERIMENTAL SETUP

MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. A proprietary programming language developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages such as C, C++, C#, Java, Fortran, and Python.

Waikato Environment for Knowledge Analysis (Weka) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License. It is a workbench consisting of a collection of visualization tools and algorithms for data analysis and predictive modeling, along with graphical user interfaces for easy access to these functions.

An Intel dual-core processor is used, running Windows 7 with 2 GB RAM and a 500 GB hard disk.

The data sets used to evaluate the performance of the proposed algorithm are:

Breast Cancer [26] - Breast cancer generally develops from breast tissue. Signs of breast cancer include a lump in the breast, a change in breast shape, dimpling of the skin, fluid coming from the nipple, or a red scaly patch of skin.

Diabetes [26] - Diabetes is a metabolic disorder characterized by high blood sugar, insulin resistance, and relative lack of insulin. Common symptoms include increased thirst, frequent urination, and unexplained weight loss.

E-coli [26] - Escherichia coli is a gram-negative, facultative anaerobic, rod-shaped bacterium of the genus Escherichia that is commonly found in the lower intestine of warm-blooded organisms. Most E. coli strains are harmless, but some serotypes can cause serious food poisoning in their hosts.

Heart Disease [26] - Cardiovascular disease (CVD) is a class of diseases that involve the heart or blood vessels. It is generally caused by high blood pressure, smoking, diabetes, lack of exercise, obesity, high blood cholesterol, poor diet, and excessive alcohol consumption.

5. METHODOLOGY AND RESULTS

5.1 Methodology

Fig 2. Proposed methodology (flowchart): START, LOAD DATA SET, TRAIN AND TEST DATA USING HYBRID MULTICLASS SVM AND ANT COLONY OPTIMIZATION, EVALUATE PERFORMANCE, RETURN SOLUTION, END

5.2 Performance Analysis

The proposed technique was designed and implemented in MATLAB 2013a. It is evaluated on the following metrics: accuracy, F-measure, true positive rate, and false positive rate. A comparison is drawn between all the parameters for the proposed algorithm, and the figures show all the results.

1. Correctly Classified Instances - The number of instances that are classified correctly out of the total number of instances used.

2. Incorrectly Classified Instances - The number of instances that are classified incorrectly out of the total number of instances used.

3. Kappa Statistics - The kappa statistic is a measure of inter-rater agreement for qualitative items. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. It is calculated as

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement between the raters and $p_e$ is the agreement expected by chance.
4. Accuracy - Accuracy refers to the ability of the model to correctly predict the class label of new or unseen data. It is calculated as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where
True positives (TP) = number of correct classifications predicted as yes (positive).
True negatives (TN) = number of correct classifications predicted as no (negative).
False positives (FP) = number of incorrect classifications predicted as yes (positive) when the actual class is no (negative).
False negatives (FN) = number of incorrect classifications predicted as no (negative) when the actual class is yes (positive).

5. F-Measure - A measure that combines precision and recall; it is their harmonic mean. It is calculated as

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where precision and recall are defined as

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

6. RESULTS AND PERFORMANCE EVALUATION

The results obtained from the proposed model are for the Breast Cancer, Diabetes, E-coli, and Heart Disease data sets. The results for each data set are discussed in turn.

1. Discussion for Wisconsin Breast Cancer data set

The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 3) is 96.5 for the Breast Cancer data set. The accuracies of the other classification models are also given in the table. The results show that the proposed model has the highest accuracy of all the classification models. In addition to accuracy, another important metric, the kappa statistic, is evaluated; at 0.9157 it is also the highest among the existing classifiers. The ROC area is also calculated: for the proposed model it is 0.994, again the highest among the existing classifiers.
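
The metrics reported in this section (accuracy, F-measure, and the kappa statistic) can all be derived from the four confusion-matrix counts. The sketch below shows this for the two-class case; the function name `binary_metrics` and the counts in the example are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F-measure and Cohen's kappa.

    Assumes every denominator is nonzero (both classes are present and
    at least one instance is predicted positive).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    # Chance agreement: product of predicted and actual class rates,
    # summed over the positive and negative classes.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in binary_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Here the observed agreement used for kappa is simply the accuracy, since the two "raters" are the classifier and the true labels.
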
Fig 3. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 4. Analysis of kappa statistic and F-measure

2. Discussion for Diabetes data set

The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 4) is 99.2 for the Diabetes data set. The accuracies of the other classification models are also given in the table. The results show that the proposed model has the highest accuracy of all the classification models. In addition to accuracy, another important metric, the kappa statistic, is evaluated; at 0.9827 it is also the highest among the existing classifiers. The ROC area is also calculated: for the proposed model it is 1, again the highest among the existing classifiers.
Fig 5. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 6. Analysis of kappa statistic and F-measure

3. Discussion for E-coli data set

The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 5) is 99.4 for the E-coli data set. The accuracies of the other classification models are also given in the table. The results show that the proposed model has the highest accuracy of all the classification models. In addition to accuracy, another important metric, the kappa statistic, is evaluated; at 0.9918 it is also the highest among the existing classifiers. The ROC area is also calculated: for the proposed model it is 1, again the highest among the existing classifiers.

Fig 7. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 8. Analysis of kappa statistic and F-measure

4. Discussion for Heart Disease data set

The accuracy of the proposed hybrid multiclass SVM and LAD tree model (shown in bold in Table 6) is 98.3 for the Heart Disease data set. The accuracies of the other classification models are also given in the table. The results show that the proposed model has the highest accuracy of all the classification models. In addition to accuracy, another important metric, the kappa statistic, is evaluated; at 0.9632 it is also the highest among the existing classifiers. The ROC area is also calculated: for the proposed model it is 0.998, again the highest among the existing classifiers.

Fig 9. Analysis of correctly classified instances, incorrectly classified instances, and accuracy
Fig 10. Analysis of kappa statistic and F-measure
6.CONCLUSION
In this paper, we have analyzed existing data mining techniques based on ant colony optimization and random forest trees. The proposed technique, which combines unsupervised filtering by ACO with a multiclass SVM, gives better results. The paper compares the existing and proposed data mining techniques on the basis of parameters such as correctly classified instances, incorrectly classified instances, Kappa statistic, accuracy, and F-measure. On these parameters, the proposed technique shows better results than the existing technique.
REFERENCES
[1] Duan, K. B., Rajapakse, J. C., Wang, H., & Azuaje, F. (2005). Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Transactions on Nanobioscience, 4(3), 228-234.
[2] Cios, K. J., & Moore, G. W. (2002). Uniqueness of medical data mining. Artificial Intelligence in Medicine, 26(1), 1-24.
[3] Brameier, M., & Banzhaf, W. (2001). A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1), 17-26.
[4] Prather, J. C., Lobach, D. F., Goodwin, L. K., Hales, J. W., Hage, M. L., & Hammond, W. E. (1997). Medical data mining: knowledge discovery in a clinical data warehouse. In Proceedings of the AMIA Annual Fall Symposium (p. 101). American Medical Informatics Association.
[5] Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2001, July). An ant colony based system for data mining: applications to medical data. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001) (pp. 791-797).
[6] Li, J., Fu, A. W. C., He, H., Chen, J., Jin, H., McAullay, D., & Kelman, C. (2005, August). Mining risk patterns in medical data. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (pp. 770-775). ACM.
[7] Ghazavi, S. N., & Liao, T. W. (2008). Medical data mining by fuzzy modeling with selected features. Artificial Intelligence in Medicine, 43(3), 195-206.
[8] Delen, D., Walker, G., & Kadam, A. (2005). Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine, 34(2), 113-127.
[9] Pamulaparty, L., Rao, C. G., & Rao, M. S. (2016). Cluster Analysis of Medical Research Data using R. Global Journal of Computer Science and Technology, 16(1).
[10] Sebag, M., Azé, J., & Lucas, N. (2003, October). ROC-based evolutionary learning: Application to medical data mining. In International Conference on Artificial Evolution (Evolution Artificielle) (pp. 384-396). Springer Berlin Heidelberg.
[11] Yu, H., Vaidya, J., & Jiang, X. (2006, April). Privacy-preserving SVM classification on vertically partitioned data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 647-656). Springer Berlin Heidelberg.
[12] Caruana, R., & Niculescu-Mizil, A. (2004, August). Data mining in metric space: an empirical analysis of supervised learning performance criteria. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 69-78). ACM.
[13] Delen, D. (2009). Analysis of cancer data: a data mining approach. Expert Systems, 26(1), 100-112.
[14] Tomar, D., & Agarwal, S. (2013). A survey on Data Mining approaches for Healthcare. International Journal of Bio-Science and Bio-Technology, 5(5), 241-266.
[15] Raikwal, J. S., & Saxena, K. (2012). Performance evaluation of SVM and k-nearest neighbor algorithm over medical data set. International Journal of Computer Applications, 50(14).
[16] Moses, D. (2015). A survey of data mining algorithms used in cardiovascular disease diagnosis from multi-lead ECG data. Kuwait Journal of Science, 42(2).
[17] Nichat, A. M., & Ladhake, S. A. (2016). Brain Tumor Segmentation and Classification Using Modified FCM and SVM Classifier. Brain, 5(4).
[18] Verma, L., Srivastava, S., & Negi, P. C. (2016). A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data. Journal of Medical Systems, 40(7), 1-7.
[19] Li, D. C., Liu, C. W., & Hu, S. C. (2010). A learning method for the class imbalance problem with medical data sets. Computers in Biology and Medicine, 40(5), 509-518.
[20] Kazemzadeh, R. S., & Sartipi, K. (2005, September). Interoperability of data and knowledge in distributed health care systems. In 13th IEEE International Workshop on Software Technology and Engineering Practice (STEP'05) (pp. 230-240). IEEE.