ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Vol. 2, Issue 1, pp: (96-99), Month: April 2015 – September 2015, Available at: www.paperpublications.org
Page | 96
Paper Publications
A Study on Cancer Perpetuation Using the
Classification Algorithms
ANITA KUMAR
Bishop Heber college Trichy-17, India
Abstract: The analysis of cancer datasets is an important research area in data mining. In the present
work, four classification techniques are applied to cancer survival data: CART, Random Forest, LMT, and Naive
Bayesian. The results show that the Random Forest method evaluated on the training dataset outperforms the
remaining methods and has the lowest relative absolute error. The relative absolute error of LMT is
high for the cancer survival dataset. The relative absolute error is greater than 50% for almost all the algorithms;
the exception is the Random Forest method evaluated on the training dataset.
Keywords: Classification techniques, CART, Random Forest, LMT, Naive Bayesian.
1. INTRODUCTION
Data mining is the process of analysing data from different perspectives and summarizing it into useful information so
as to identify hidden patterns in a large data set. Researchers in many fields have shown great interest in data mining.
Cancer, also known as a malignant tumor or malignant neoplasm, is a group of diseases involving abnormal cell growth
with the potential to invade or spread to other parts of the body [1][2]. Not all tumors are
cancerous; benign tumors do not spread to other parts of the body. Possible signs and symptoms include a new
lump, abnormal bleeding, a prolonged cough, and unexplained weight loss, among others [3]. While these symptoms
might indicate cancer, they may also have other causes. There are over 100 different known
cancers that affect humans.
In the bioinformatics age, cancer datasets can be used for cancer diagnosis and treatment, which can contribute to
healthier human aging [4]. Data mining techniques such as pattern association, classification, and clustering are widely
applied in studies of the correlation between cancer and gene expression. Bioinformatics, in turn, provides a rationale
for developing novel data mining methods.
Classification of datasets based on predefined knowledge of the objects is a data mining and knowledge management
technique used to group similar data objects together [5]. The ultimate goal of a supervised learning algorithm is to
build a classifier that can be used to classify unlabelled instances accurately [6]. Data classification is a supervised
learning task: it assigns class labels to data objects based on the relationship between the data items and a
predefined class label. Classification algorithms have a very wide range of applications, such as fraud detection,
churn prediction, artificial intelligence, neural networks, and credit card rating [7]. Classification is a well-studied
area in data mining, and numerous classification algorithms have been proposed in the literature, such as
classification and regression trees [8], Logistic Model Trees [9], [10], Random Forests [11], and Bayesian
classifiers [12]. Cancer detection is one of the most important research topics in biomedical science. Biomedical research
applies a wide range of designs to solve problems in laboratory, clinical, and population settings [13]. In this paper
we study four classification algorithms, CART, Random Forest, LMT, and Naïve Bayesian, on cancer
survival data. Accuracy is the main criterion used to estimate the performance of these algorithms on cancer datasets.
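As an illustration of the accuracy criterion mentioned above, the following minimal sketch computes the fraction of instances whose predicted class matches the actual class. The function name and the toy labels are hypothetical and are not taken from the study.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for four patients (not data from the paper).
actual_labels = ["long", "short", "long", "short"]
predicted_labels = ["long", "short", "short", "short"]
score = accuracy(actual_labels, predicted_labels)  # 3 of 4 correct
```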
2. METHODOLOGY
Cancer surveillance studies that use data mining and decision support systems can help reduce the national cancer
burden and the oral complications of cancer therapies. In this paper, we study the classification algorithms
CART, Random Forest, LMT, and Naïve Bayesian on cancer survival data. The data explored in this
research was obtained from the UCI repository. Patients with advanced cancers of the stomach,
bronchus, or colon were treated with ascorbate. The purpose of the study is to determine whether survival times differ
with respect to the organ affected by cancer. The dataset was complete, with no missing values. The main aim of
processing the data is to predict cancer survivability, formulated as a two-class classification problem.
2.1 CART:
Classification and regression trees (CART) is a data mining algorithm that builds a decision tree by recursive,
step-by-step refinement. CART is a widely used statistical procedure that produces a classification tree when the
dependent variable is categorical and a regression tree when it is numeric. In both cases the result is a
binary tree.
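The recursive refinement described above repeatedly chooses one binary split. The sketch below shows a single such step for a numeric attribute, assuming Gini impurity as the split criterion (the criterion commonly associated with CART classification trees; the paper does not state which criterion was used). The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split value <= threshold."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: survival time in days with a made-up class label.
days = [50, 80, 120, 400, 600, 900]
label = ["short", "short", "short", "long", "long", "long"]
threshold, impurity = best_split(days, label)
```

A full tree would apply best_split again to the left and right subsets until a stopping rule is met.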
2.2 LMT:
A Logistic Model Tree (LMT) is a supervised learning algorithm that combines linear logistic
regression with tree induction. LMT creates a model tree with a standard decision tree structure and logistic regression
functions at the leaf nodes. That is, each leaf has an associated logistic regression function instead of just a class label.
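To make the structure concrete, the sketch below shows how a prediction could be made by a tiny model tree: the tree routes an instance to a leaf, and the leaf applies its own logistic regression function. The split attribute, threshold, and coefficients are invented for illustration; this is not the LMT training procedure and not a model fitted in the study.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-leaf model tree: one split on "age", one logistic model per leaf.
LEAVES = {
    "left": {"bias": -1.0, "weights": {"tumor_size": 0.5}},
    "right": {"bias": 0.5, "weights": {"tumor_size": 0.8}},
}


def predict_proba(x):
    """Route x to a leaf, then apply that leaf's logistic regression function."""
    leaf = LEAVES["left"] if x["age"] <= 60 else LEAVES["right"]
    z = leaf["bias"] + sum(w * x[name] for name, w in leaf["weights"].items())
    return sigmoid(z)


p_left = predict_proba({"age": 50, "tumor_size": 2.0})   # score z = 0.0
p_right = predict_proba({"age": 70, "tumor_size": 2.0})  # score z = 2.1
```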
2.3 Random forest:
Random forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of
the classes output by the individual trees. The random forest grows many classification trees without pruning. A test
sample is then classified by each decision tree, and the random forest assigns the class that occurs most often among
these classifications.
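The voting step described above can be sketched as follows. The three "trees" here are hand-written threshold rules standing in for trained, unpruned decision trees; in an actual random forest each tree would be grown on a bootstrap sample with random attribute selection. All names and thresholds are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent class among the individual tree predictions."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Classify x with every tree, then take the mode of the votes."""
    votes = [tree(x) for tree in trees]
    return majority_vote(votes)


# Hypothetical stand-ins for trained trees: each maps an instance to a class.
trees = [
    lambda x: "long" if x["days"] > 300 else "short",
    lambda x: "long" if x["days"] > 500 else "short",
    lambda x: "long" if x["days"] > 200 else "short",
]

prediction = forest_predict(trees, {"days": 400})  # votes: long, short, long
```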
2.4 Naïve Bayesian:
The Naïve Bayesian classifier is a simple probabilistic classifier based on Bayes' theorem with strong (naive)
independence assumptions. It applies Bayes' conditional probability rule to classification tasks and treats
all attributes of the dataset as independent of each other given the class. In general, a naïve Bayes
classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or
absence) of any other feature. An advantage of the naïve Bayes classifier is that it requires only a small amount of
training data to estimate the parameters (means and variances of the variables) necessary for classification.
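The parameter estimation mentioned above (one mean and one variance per attribute and class) can be sketched for numeric attributes as follows, assuming each attribute is modelled with a Gaussian density. The function names, the small variance floor, and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-attribute means and variances for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior under the independence assumption."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical toy data: [survival days, a second made-up numeric attribute].
X = [[50, 1.0], [80, 1.2], [120, 0.9], [400, 2.0], [600, 2.2], [900, 1.9]]
y = ["short", "short", "short", "long", "long", "long"]
model = fit_gaussian_nb(X, y)
```

Because the attributes are treated as independent, the log posterior is simply a sum of one term per attribute, which is why only means and variances need to be stored.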
TABLE 1
TABLE 2
3. RESULTS AND DISCUSSION
The results for the cancer survival dataset are given in Table 1. Here the Random Forest algorithm outperforms all other
classification algorithms used in the study. A comparison of the classification techniques CART, Random
Forest, LMT, and Naive Bayesian on the cancer survival data shows that the Random Forest method evaluated on the
training dataset outperforms the other methods (Table 2). The relative absolute error of LMT is high for the cancer
survival dataset. The relative absolute error is greater than 50% for almost all the algorithms; only the Random Forest
method evaluated on the training dataset has a lower value.
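For reference, the relative absolute error discussed above is usually defined as the total absolute error of the predictions divided by the total absolute error of a baseline that always predicts the mean of the actual values, expressed as a percentage. The sketch below assumes this common definition; the paper does not give the formula, and the function name and numbers are hypothetical.

```python
def relative_absolute_error(actual, predicted):
    """Relative absolute error in percent: sum|p - a| / sum|a - mean(a)| * 100."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(p - a) for p, a in zip(predicted, actual))
    denominator = sum(abs(a - mean_actual) for a in actual)
    return 100.0 * numerator / denominator


# Hypothetical class indicators (1 = survived) and predicted probabilities.
actual = [1, 0, 1, 0]
predicted = [0.75, 0.25, 0.75, 0.25]
rae = relative_absolute_error(actual, predicted)
```

Under this definition a value of 100% means the classifier does no better than always predicting the mean, so values above 50% indicate only a modest improvement over that baseline.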
4. CONCLUSION
Comparing the four classification techniques Random Forest, CART, LMT, and Naïve Bayesian on the cancer survival
dataset, it is clear that the Random Forest method outperforms the remaining methods. The relative absolute error of
Random Forest is also lower than that of the other algorithms.
REFERENCES
[1] "Cancer Fact sheet". World Health Organization. February 2014. Retrieved on 10 June 2014.
[2] "Defining Cancer". National Cancer Institute. Retrieved on 10 June 2014.
[3] "Cancer - Signs and symptoms". NHS Choices. Retrieved on 10 June 2014.
[4] Christoph Bock, Thomas Lengauer, "Computational epigenetics," Bioinformatics, Vol. 24, No. 1, pp. 1-10, 2008.
[5] Yi Peng, Gang Kou, Yong Shi, Zhengxin Chen, "A descriptive framework for the field of data mining and
knowledge discovery," International Journal of Information Technology & Decision Making, Vol. 7, No. 4, pp. 639-682, 2008.
[6] J. H. Friedman, R. Kohavi, Y. Yun, "Lazy decision trees," In Proceedings of the Thirteenth National Conference on
Artificial Intelligence, AAAI Press and the MIT Press, pp. 717-724, 1996.
[7] Richard J. Bolton, David J. Hand, "Statistical Fraud Detection: A Review," Statist. Sci., Vol. 17, No. 3, pp. 235-255,
2002.
[8] Breiman L, Friedman J, Olshen R, Stone C, "Classification and Regression Trees," Wadsworth International Group,
1984.
[9] Frank E, Wang Y, Inglis S, Holmes G, Witten I. H, "Using model trees for classification," Machine Learning, Vol.
32, No. 1, pp. 63-76, 1998.
[10] Niels Landwehr, Mark Hall, Eibe Frank, "Logistic Model Trees," Machine Learning, Vol. 59, No. 1-2, pp. 161-205,
2005.
[11] Leo Breiman, "Random Forests," Machine Learning, Vol. 45, No. 1, pp. 5-32, 2001.
[12] Langley P, Iba W, Thompson K, "An analysis of Bayesian classifiers," In Proceedings of AAAI-92, AAAI Press,
pp. 223-228, 1992.
[13] John C. Bailar, Thomas A. Louis, Philip W. Lavori, Marcia Polansky, "A Classification for Biomedical Research
Reports," N Engl J Med, Vol. 311, No. 23, pp. 1482-1487, 1984.
[14] Golub T.R, Slonim D.K, Tamayo P, et al., "Molecular classification of cancer: class discovery and class prediction
by gene expression monitoring," Science, Vol. 286, No. 5439, pp. 531-537, 1999.
[15] Alizadeh A, Eisen M.B, Davis R.E, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI,
Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC,
Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO,
Staudt LM, "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, Vol.
403, No. 6769, pp. 503-511, 2000.
[16] Nielsen T.O, West R.B, Linn S.C, Alter O, Knowling MA, O'Connell JX, Zhu S, Fero M, Sherlock G, Pollack JR,
Brown PO, Botstein D, van de Rijn M, "Molecular characterisation of soft tissue tumors: a gene expression study,"
Lancet, Vol. 359, No. 9314, pp. 1301-1307, 2002.
[17] Thangaraju, P., and G. Barkavi. "Lung Cancer Early Diagnosis Using Some Data Mining Classification Techniques:
A Survey." COMPUSOFT, An international journal of advanced computer technology (IJACT) 3.6 (2014).
[18] Krishnaiah, V., Dr G. Narsimha, and Dr N. Subhash Chandra. "Diagnosis of lung cancer prediction system using
data mining classification techniques." International Journal of Computer Science and Information Technologies 4.1
(2013): 39-45.
[19] Tomar, Divya, and Sonali Agarwal. "A survey on Data Mining approaches for Healthcare." International Journal of
Bio-Science and Bio-Technology 5.5 (2013): 241-266.
[20] Ramachandran, P., et al. "Cancer Spread Pattern–an Analysis using Classification and Prediction
Techniques." Cancer 2.6 (2013).
[21] Yang, Chun-Yi. "A Hybrid of Data Mining and Statistical Analysis Approach on Association between Pulmonary
Tuberculosis and Lung Cancer." (2014).
[22] Ada, Rajneet Kaur. "Using Some Data Mining Techniques to Predict the Survival Year of Lung Cancer Patient."
(2013).
[23] Khedr, Aymn E., and Abd El-Ghany AM Mohmed. "A proposed image processing framework to support Early liver
Cancer Diagnosis." Life Sci J 9.4 (2012): 3808-3813.
[24] Halder, Subhas. An Approach to Diagnosis of Cancer using k-nearest Neighbor (k-NN) Algorithm. Diss.
JADAVPUR UNIVERSITY KOLKATA, 2013.
[25] Ramachandran, P., N. Girija, and T. Bhuvaneswari. "Early Detection and Prevention of Cancer using Data Mining
Techniques." International Journal of Computer Applications 97.13 (2014): 48-53.
