ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Vol. 2, Issue 1, pp: (55-60), Month: April 2015 – September 2015, Available at: www.paperpublications.org
Page | 55
Paper Publications
Supervised Multi Attribute Gene Manipulation For Cancer
Shenbagam.S¹, S.Brintha Rajakumari²
¹PG Student, ²Assistant Professor, Department of CSE, Bharath University, Chennai
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new
technology with great potential to help companies focus on the most important information in their data
warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive,
knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the
analyses of past events provided by retrospective tools typical of decision support systems. They scour databases
for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Data mining techniques are the result of a long process of research and product development. This evolution began
when business data was first stored on computers, continued with improvements in data access, and more recently,
generated technologies that allow users to navigate through their data in real time. Data mining takes this
evolutionary process beyond retrospective data access and navigation to prospective and proactive information
delivery.
Keywords: Data mining tools predict future trends and behaviours, allowing businesses to make proactive,
knowledge-driven decisions, Supervised Multi Attribute Gene.
1. INTRODUCTION
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different
perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or
both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from
many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is
the process of finding correlations or patterns among dozens of fields in large relational databases. Web services have
been promising in recent years and are by now one of the most popular techniques for building distributed systems.
Service-oriented systems can be built efficiently by dynamically composing different web services, which are provided by
other organizations. Data mining, the extraction of hidden predictive information from large databases, is a powerful new
technology with great potential to help companies focus on the most important information in their data warehouses. Data
mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions.
The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by
retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally
were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts
may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and
product development. This evolution began when business data was first stored on computers, continued with
improvements in data access, and more recently, generated technologies that allow users to navigate through their data in
real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and
proactive information delivery.
2. RELATED WORK
The cause of many cancers remains unknown. Apart from internal (genetic) causes, certain environmental and external
factors also contribute to cancer formation within an organism, viz. environmental toxins, adulterated food intake, air
pollution, and an irregular lifestyle (their shares are depicted in Fig. 1). These can be categorized under epigenetics.
Epigenetics is an issue that the biomedical community cannot ignore. Symptoms of cancer depend on the type and
location of the cancer. For example, lung cancer can cause coughing, heavy breathing, chest pain, etc. Colon cancer
often causes diarrhoea, constipation, dysentery, and blood in the stool.
2.1 MATHEMATICS IN NATURE: AN INTUITIVE CERTITUDE:
Mathematics is known to be an indispensable part of all sciences. It forms the edifice of all existence, being a formidable
aegis that “holds” all parts together. One can develop a certain propensity towards it, and more so when going through
John A. Adam's texts. Readily apparent is the symmetry found in vegetation, the anatomy of living creatures, and the
shapes of heavenly bodies: planets being nearly spherical and their orbits nearly circular, to name a few. Mathematical
modeling of a natural phenomenon and of the principle at its core, which determines its compositional as well as physical
traits, can be extremely expedient towards developing an understanding. A mathematical model is a success if it fits the
known data and makes accurate predictions for the future, as rendered in Fig. 2.1. A snowball is taken to grow in size and
attain a shape almost intermediate between a circle and a sphere as it rolls through ice. However, external factors like the
intensity of sunlight, the heat produced due to friction/resistance, the dynamic surface area of the ball, etc., are variates
that contribute profoundly to the problem (Adam, 2006). The author also notes that all mathematical models are flawed to
some extent owing to the inappropriate presumptions made during their construction. The aesthetic of all natural and
physical phenomena is brought to life once the mathematical underpinning is realized. “Mathematics is to nature as
Sherlock Holmes is to evidence.” With such a compendium of suggestions to consult, it is highly probable that gene
expression data won't be an exception. Studies have shown that mathematical and statistical models can be built around
it and that they are seminal in identifying biomarkers. The only apprehension is the uncertainty attached to it, which is
subject to validation to establish the baseline parameters.
Fig. 2.1 MATHEMATICAL MODEL
2.2 EPIGENETICS:
Literally speaking, “epi” stands for “on top of”, and epigenetics is on top of genetics. Factors that govern gene
expression and protein building, external to the inherent DNA code, delineate epigenetics. The environment and our
lifestyle can significantly direct our genetic behaviour and even that of our children. Multicellular organisms have
almost identical underlying code, yet they have incongruent phenotypes.
3. SYSTEM MODEL
Our system model is designed in two phases. Phase 1 consists of analysis and design; in Phase 2 the model is implemented
and validated. The initial analysis activity involves researching the problem in detail and evaluating various approaches
to resolve it. The design activity is carried out in two stages, high-level and low-level design. High-level design
involves the architectural design of the framework and case study planning, which plays a vital role in validating the
approach.
4. ARCHITECTURAL OVERVIEW
Predicting cancer by analyzing genes and converting the gene expression is the proposed concept of our project, which
leads to identifying and analyzing the cancer result set. The control of gene activity, from gene to functional protein and
phenotype, has also been analyzed in order to identify the cancer cells. Our proposed methodology considers the DNA
methylation data documented by experts (gene expression segments): methylation acts as a kind of binding site for
proteins that make the DNA inaccessible, so that it cannot remain in an active state.
5. GENE HEAT MAP VISUALIZATION
The goal of dimensionality-reduction techniques is to reduce the dimensionality of the data to facilitate visualization and
additional analysis. They are often used as a preliminary step to the clustering of large data sets. Multidimensional
scaling (MDS) starts from a distance matrix between objects and finds the locations of these objects in a low-dimensional
space that best preserve the original distances. These techniques work on a ratio-optimization principle. Clustering
techniques for high-dimensional data are almost always exploratory. Their strength is in providing rough maps and
suggesting directions for further study. Also, clustering results are sensitive to a variety of user-specified inputs. The
clustering of a large and complex set of objects can be planned in different ways depending on the goals.
Fig. 5.1 GENE HEAT MAP VISUALIZATION
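The MDS step described in this section can be sketched as follows. This is only a minimal illustration of classical (Torgerson) MDS under the assumption of Euclidean distances; the paper does not state which MDS variant it uses. The function names (`double_center`, `top_eigenpair`, `classical_mds`) and the toy distance matrix are hypothetical, and the eigenvectors are found by plain power iteration so that the sketch needs only the Python standard library.

```python
import math
import random


def double_center(dist):
    """Convert a symmetric distance matrix D into the Gram matrix B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # D is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b, iters=500, seed=0):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    n = len(b)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v


def classical_mds(dist, dims=2):
    """Place the objects in `dims` dimensions so that distances are preserved as well as possible."""
    b = double_center(dist)
    n = len(dist)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        lam, v = top_eigenpair(b, seed=d)
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues (non-Euclidean input) are clipped
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Hypothetical distances between four objects (corners of a 3-by-4 rectangle).
toy_distances = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]
embedding = classical_mds(toy_distances, dims=2)
```

For real expression data the distance matrix would be computed between genes or samples first; a library eigen-solver would normally replace the power iteration used here.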
6. CLUSTERING
Clustering techniques can be used in microarray analysis to facilitate the visual display (mostly preferred by biologists)
and interpretation of experimental results, and to suggest the presence of subgroups of objects (genes or samples) that
behave similarly. Clustering is often the foremost step of data filtration, since it is vital to screen microarray data for
noise elimination. Confusion arises from the tendency of genes to participate in multiple pathways that may or may not be
coactive under all conditions, so a gene can find its place in multiple clusters or in none at all. Clustering can be
sample-based and/or gene-based. Gene-based clustering treats genes as objects and samples as features, while sample-based
clustering does the reverse. A third type of clustering also exists: subspace clustering. Subspace clustering is not
“global”; rather, it aims to cluster genes based on their involvement in a disease, as part of one or more biological
pathways.
Fig. 6 SUBSPACE CLUSTERING
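The difference between gene-based and sample-based clustering comes down to which axis of the expression matrix supplies the objects. The short sketch below illustrates this with a hypothetical 3-gene by 4-sample matrix; the gene names, sample names, and values are invented for illustration and do not come from the paper's data.

```python
# Hypothetical expression matrix: each row is a gene, each column is a sample.
gene_names = ["geneA", "geneB", "geneC"]
sample_names = ["s1", "s2", "s3", "s4"]
expression = [
    [2.1, 2.0, 0.3, 0.2],
    [1.9, 2.2, 0.4, 0.1],
    [0.2, 0.1, 1.8, 2.0],
]


def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


# Gene-based view: genes are the objects, samples are the features.
gene_objects = dict(zip(gene_names, expression))

# Sample-based view: samples are the objects, genes are the features.
sample_objects = dict(zip(sample_names, transpose(expression)))
```

Any clustering routine that accepts a list of feature vectors can then be run on either view without modification.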
6.1 K-Means Clustering and Self-Organizing Maps (SOM):
K-means partitions objects into groups that have little variability within clusters and large variability across clusters.
The user is required to specify the number K of clusters a priori. Estimation is iterative: it starts with a random
allocation of objects to clusters, re-allocates objects to minimize the distance to the estimated “centroids” of the
clusters, and stops when no further improvements can be made. Its implementation is easy and its execution is fast. The
time complexity is O(L·K·N), where L is the number of iterations, K is the number of clusters, and N is the number of
objects.
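The iterative procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy profiles are hypothetical, and for simplicity the sketch picks K distinct objects as the initial centroids rather than starting from a random allocation of objects to clusters.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iters=100, seed=0):
    """Return (labels, centroids) after Lloyd-style iterations."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct objects chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: K distance evaluations for each of the N objects.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no object changed cluster: stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical expression profiles forming two well-separated groups.
profiles = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels, centroids = k_means(profiles, k=2)
```

Each pass performs K distance computations for each of the N objects, which is where the L·K·N cost over L iterations comes from.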
7. DATASET COLLECTION
In this module we describe dataset collection for multiple real world web services. The term "dataset" originated in the
mainframe field. A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a
variable. Each row corresponds to a member of the data set. It lists values for each of the variables. The data set may
comprise data for one or more members, corresponding to the number of rows.
7.1 Given Input & Output Design:
Gene Heat map visualization.
Input: Fetch data and Dataset comparison.
Output: diagnosis for lung cancer.
8. TECHNIQUE USED
8.1 PRE-PROCESSING PHASE:
Genes, DNA clones, or expressed sequence tags (ESTs) usually constitute the DNA sequences that are scanned by
microarray experiments under a set of conditions. The conditions may include time series data of a biological process,
e.g., the life cycle of a yeast cell, or a collection of varied tissue samples, e.g., normal versus cancerous tissues. The
study of promoter sequences can be central to deriving the transcription factors of an associated gene. Regulation of
transcription is the most common form of gene control, and the activity of transcription factors allows genes to be
specifically regulated during development and in different types of cells.
8.2 POST PROCESSING:
Since the pre-processing phase yields several groups, patterns, and correlations of genes at the expression level, it
becomes almost necessary to re-evaluate and formalize them in a phase called the post-processing phase. During this
phase, the domain experts analyze and match the extracted patterns to the business objectives and success criteria. The
central difficulty of pattern management is heterogeneous pattern representation. Since the extracted patterns can be
relevant as well as irrelevant, indexing them is a labor-intensive task that involves marking and classifying them
scrupulously.
Predictive Model Markup Language (PMML) and Common Warehouse Metamodel for Data Mining (CWM-DM) were
designed for generic data modelling, but they lacked the efficacy to handle and represent specific classes of patterns. As a
solution, Rizzi et al. introduced the Pattern Base Management System (PBMS). Later, Kotsifakos et al. revised the PBMS
architecture by enabling support for domain ontologies. After defining a data modelling system, it is vital to design a
mechanism to query and extract the required data. For this purpose, certain APIs, namely SQL/MM DM and the Java Data
Mining (JDM) API, were standardized to handle the data as well as the metadata describing genetic correlational patterns.
9. CONCLUSION
A reliable and precise classification of tumors is essential for the successful diagnosis and treatment of cancer.
Microarray experiments may lead to a more complete understanding of the molecular variation of tumors. The ability to
distinguish between tumor classes using gene expression is a new approach to cancer classification.
A microarray dataset contains numerous groups of co-expressed genes. A typical strategy for the biologist is to start
from genes which are known to be closely related to a biological function and to browse the preliminary rough clustering
result, to focus on a small subset of those genes. Thus biologists follow exploratory strategies guided by their own
knowledge. So, when experimenting with these 'superficial' data and applying various data mining techniques to them, the
data has to be concise and close to accurate in order to obtain results about cancer.
REFERENCES
[1] Data and Statistics. World Health Organization, Geneva, Switzerland,2010.
[2] PubMedHealth- U.S. Nat. Library Med., (2009).[Online].Available:http://www. ncbi.nlm.nih.gov/pubmedhealth/
PMH0002267/
[3] S. Dudoit, J. Fridlyand, and T. P. Speed, “Comparison of discrimination methods for the classification of tumors
using gene expression data,” J. Amer. Statist. Assoc., vol. 97, no. 457, pp. 77–87, Mar 2009.
[4] G.-M. Elizabeth and P. Giovanni, (2008, Dec.). “Clustering and classification methods for gene expression data
analysis.” Johns Hopkins Univ., Dept. of Biostatist. Working Papers. Working Paper 70. [Online]. Available:
http://biostats.bepress. com / jhubiostat/paper70//
[5] E. Shay, (2007, Jan.). “Microarray cluster analysis and applications”[Online]. Available: http://
www.science.co.il/enuka/Essays//Microarray-Review.pdf.
[6] M. B. Eisen, T. P. Spellman, P. O. Brown, and D. Botstein, “Cluster analysis and display of genome-wide
expression patterns,” Proc.Nat. Acad. Sci. USA, vol. 95, no. 25, pp. 14863–14868, Dec. 2007.
[7] S. Tavazoie, D. Hughes, M. J. Campbell, R. J. Cho, and G. M.Church, “Systematic determination of genetic
network architecture,” Nature Genetics, vol. 22, pp. 281–285, 2006.
[8] T. Kohonen, Self-Organising Maps. Berlin, Germany: Springer-Verlag, 2005.
[9] N. Pasquier, C. Pasquier, L. Brisson, and M. Collard, (2005).“Mining gene expression data using domain
knowledge,” Int. J.Softw. Informat, vol. 2, no. 2, pp. 215–231, [Online] Available: http://www.ijsi.org/1673-
7288/2/215//
[10] N. Revathy and R. Amalraj, “Accurate cancer classification using expressions few genes,” Int. J. Comput. Appl.,
vol. 14, no. 4,pp. 19–22, Jan. 2005.
[11] Y. Su, T. M. Murali, V. Pavlovic, M. Schaffer, and S. Kasif, (2005)“RankGene: Identification of diagnostic genes
based on expression data,” Bioinformatics, vol. 19, no. 12, pp. 1578–1579, [Online] Avaialble:
http://bioinformatics.oxfordjournals.org/content/19/
[12] K. Raza and A. Mishra, “A novel anticlustering filtering algorithm for the prediction of genes as a drug target,”
Amer. J. Biomed. Eng.vol. 2, no. 5, pp. 206–211, 2004.
[13] D. Jiang, C. Tang, and A. Zhang, “Cluster analysis for gene expression data: A survey,” IEEE Trans. Knowl. Data
Eng., vol. 16,no. 11, pp. 1370–1386, Nov. 2004.
[14] D. A. Roff and R. Preziosi, “The estimation of the genetic correlation:The use of the jackknife,” Heredity, vol. 73,
pp. 544–548, 2004.
[15] T. Scharl and F. Leisch, “Jackknife distances for clustering timecourse expression data,” in Proc. ASA Biometrics,
2005, p. 8.
[16] K. M Williams, “Statistical Methods for analysing microarray data: Detection of differentially expressed genes”
Inst. Signal Process.,Tampere Univ. Technol. Tampere, Finland, Dep. Biology,Univ. York, York, U.K., 2004.
[17] B. Collard, “An ontology driven data mining process” Inst.TELECOM, TELECOM Bretagne, CNRS FRE 3167
LAB-STICC,Technopole Brest-Iroise, France & Univ. Nice Sophia Antipolis,France, 2003.
[18] J. Hauke and T. Kossowski, “Comparison of values of Pearson‟s and Spearman‟s correlation coefficient on the
same sets of data,”Quaestiones Geographicae, vol. 30, no. 2, pp. 87–93, 2003.
[19] B. Collard, “How to semantically enhance a data mining process?”Lecture Notes Bus. Inform. Process., vol. 13,
pp. 103–116, 2003.

More Related Content

PDF
AI for drug discovery
PDF
Deep learning for biomedical discovery and data mining II
PPTX
A Semantics-based Approach to Machine Perception
DOCX
CNNS Brochure
PDF
June 2020: Top Read Articles in Advanced Computational Intelligence
PPTX
The Amazing Ways Artificial Intelligence Is Transforming Genomics and Gene Ed...
PPT
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
PDF
ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY
AI for drug discovery
Deep learning for biomedical discovery and data mining II
A Semantics-based Approach to Machine Perception
CNNS Brochure
June 2020: Top Read Articles in Advanced Computational Intelligence
The Amazing Ways Artificial Intelligence Is Transforming Genomics and Gene Ed...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
ONTOLOGY BASED TELE-HEALTH SMART HOME CARE SYSTEM: ONTOSMART TO MONITOR ELDERLY

What's hot (14)

PPTX
A metadata scheme of the software-data relationship: A proposal
PDF
Next generation big data analytics state of the art
PDF
Privacy Preserving Aggregate Statistics for Mobile Crowdsensing
PDF
Intelligent data analysis for medicinal diagnosis
PDF
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING
PDF
Deep learning for biomedicine
DOC
EDRG12_Re.doc
PPTX
Digital webinar master deck final
DOC
Cao report 2007-2012
PDF
accelerating-data-driven
PDF
Lung cancer disease analyzes using pso based fuzzy logic system
PDF
Machine Learning and Reasoning for Drug Discovery
PPT
Data quality and uncertainty visualization
PDF
Deep learning for genomics: Present and future
A metadata scheme of the software-data relationship: A proposal
Next generation big data analytics state of the art
Privacy Preserving Aggregate Statistics for Mobile Crowdsensing
Intelligent data analysis for medicinal diagnosis
A REVIEW ON PREDICTIVE ANALYTICS IN DATA MINING
Deep learning for biomedicine
EDRG12_Re.doc
Digital webinar master deck final
Cao report 2007-2012
accelerating-data-driven
Lung cancer disease analyzes using pso based fuzzy logic system
Machine Learning and Reasoning for Drug Discovery
Data quality and uncertainty visualization
Deep learning for genomics: Present and future
Ad

Viewers also liked (20)

PDF
Efficient File Sharing Scheme in Mobile Adhoc Network
PDF
Analysis of Fungus in Plant Using Image Processing Techniques
PDF
Privacy Preserving Data Leak Detection for Sensitive Data
PDF
Automatic Fire Fighting Robot
PDF
Fingerprint Feature Extraction, Identification and Authentication: A Review
PDF
Folder Security Using Graphical Password Authentication Scheme
PDF
Using Bandwidth Aggregation to Improve the Performance of Video Quality- Adap...
PDF
Fuzzy α^m-Separation Axioms
PDF
Smart Password
PDF
Diabetics Online
PDF
Tree Based Proactive Source Routing Protocol for MANETs
PDF
Efficient and Optimal Routing Scheme for Wireless Sensor Networks
PDF
Some Aspects of Abelian Categories
PDF
Sign Language Recognition with Gesture Analysis
PDF
Comparative Study of Data Mining Classification Algorithms in Heart Disease P...
PDF
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
PDF
Android Application for College
PDF
Multimedia Cloud Computing To Save Smartphone Energy
PDF
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
PDF
Secure Encrypted Data in Cloud Based Environment
Efficient File Sharing Scheme in Mobile Adhoc Network
Analysis of Fungus in Plant Using Image Processing Techniques
Privacy Preserving Data Leak Detection for Sensitive Data
Automatic Fire Fighting Robot
Fingerprint Feature Extraction, Identification and Authentication: A Review
Folder Security Using Graphical Password Authentication Scheme
Using Bandwidth Aggregation to Improve the Performance of Video Quality- Adap...
Fuzzy α^m-Separation Axioms
Smart Password
Diabetics Online
Tree Based Proactive Source Routing Protocol for MANETs
Efficient and Optimal Routing Scheme for Wireless Sensor Networks
Some Aspects of Abelian Categories
Sign Language Recognition with Gesture Analysis
Comparative Study of Data Mining Classification Algorithms in Heart Disease P...
Avoiding Anonymous Users in Multiple Social Media Networks (SMN)
Android Application for College
Multimedia Cloud Computing To Save Smartphone Energy
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
Secure Encrypted Data in Cloud Based Environment
Ad

Similar to Supervised Multi Attribute Gene Manipulation For Cancer (20)

PDF
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
PDF
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
PDF
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
PDF
Science in Society - EAPP and Physics Magazine.pdf
PDF
Data Science Demystified_ Journeying Through Insights and Innovations
PDF
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
PDF
Framework for understanding data science.pdf
PDF
Challenges and outlook with Big Data
PDF
Review on Solar Power System with Artificial Intelligence
PDF
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
PDF
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
PPTX
"Melting Pot" of the Sciences in interdisciplinary research
PDF
Advanced Prognostic Predictive Modelling in Healthcare Data Analytics 1st Edi...
PDF
An Analysis of Outlier Detection through clustering method
DOCX
Big Data Analytics
PDF
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
PDF
Understand the Idea of Big Data and in Present Scenario
PDF
Assessment of the main features of the model of dissemination of information ...
PDF
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
DOC
Ci2004-10.doc
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
A COMPREHENSIVE STUDY ON POTENTIAL RESEARCH OPPORTUNITIES OF BIG DATA ANALYTI...
Science in Society - EAPP and Physics Magazine.pdf
Data Science Demystified_ Journeying Through Insights and Innovations
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
Framework for understanding data science.pdf
Challenges and outlook with Big Data
Review on Solar Power System with Artificial Intelligence
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
BRAIN TUMOR MRIIMAGE CLASSIFICATION WITH FEATURE SELECTION AND EXTRACTION USI...
"Melting Pot" of the Sciences in interdisciplinary research
Advanced Prognostic Predictive Modelling in Healthcare Data Analytics 1st Edi...
An Analysis of Outlier Detection through clustering method
Big Data Analytics
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
Understand the Idea of Big Data and in Present Scenario
Assessment of the main features of the model of dissemination of information ...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
Ci2004-10.doc

Recently uploaded (20)

PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Current and future trends in Computer Vision.pptx
PDF
Visual Aids for Exploratory Data Analysis.pdf
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
Artificial Intelligence
PPT
Total quality management ppt for engineering students
PDF
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
PDF
Soil Improvement Techniques Note - Rabbi
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PPT
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
PDF
Integrating Fractal Dimension and Time Series Analysis for Optimized Hyperspe...
PPT
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
PPTX
Information Storage and Retrieval Techniques Unit III
PPTX
Safety Seminar civil to be ensured for safe working.
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
Fundamentals of Mechanical Engineering.pptx
PDF
Categorization of Factors Affecting Classification Algorithms Selection
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Current and future trends in Computer Vision.pptx
Visual Aids for Exploratory Data Analysis.pdf
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Artificial Intelligence
Total quality management ppt for engineering students
Artificial Superintelligence (ASI) Alliance Vision Paper.pdf
Soil Improvement Techniques Note - Rabbi
Exploratory_Data_Analysis_Fundamentals.pdf
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
Automation-in-Manufacturing-Chapter-Introduction.pdf
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
Integrating Fractal Dimension and Time Series Analysis for Optimized Hyperspe...
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
Information Storage and Retrieval Techniques Unit III
Safety Seminar civil to be ensured for safe working.
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Fundamentals of Mechanical Engineering.pptx
Categorization of Factors Affecting Classification Algorithms Selection

Supervised Multi Attribute Gene Manipulation For Cancer

  • 1. ISSN 2350-1022 International Journal of Recent Research in Mathematics Computer Science and Information Technology Vol. 2, Issue 1, pp: (55-60), Month: April 2015 – September 2015, Available at: www.paperpublications.org Page | 55 Paper Publications Supervised Multi Attribute Gene Manipulation For Cancer Shenbagam.S1 , S.Brintha Rajakumari2 1 PG Student, 2 Assistant Professor, Department of CSE, Bharath University, Chennai Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery. Keywords: Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions, Supervised Multi Attribute Gene. 1. INTRODUCTION Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. 
Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Web services have been promising in recent years and are by now one of the most popular techniques for building distributed systems. Service-oriented systems can be built efficiently by dynamically composing different web services, which are provided by other organizations. Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
  • 2. ISSN 2350-1022 International Journal of Recent Research in Mathematics Computer Science and Information Technology Vol. 2, Issue 1, pp: (55-60), Month: April 2015 – September 2015, Available at: www.paperpublications.org Page | 56 Paper Publications 2. RELATED WORK The cause of many cancers remains unknown. Apart from internal (genetic) causes, there are certain environmental and external factors too that participate in cancer formation within an organism, viz. environmental toxins, adulterated food intake, air pollution, and irregular lifestyle; (share as depicted in the Fig. 1). These can be categorized under epigenetics. Epigenetics is an un ignorable issue to be addressed by the biomedical community. Symptoms of cancer depend on the type and location of the cancer. For example, lung cancer can cause coughing, heavy breathing, chest pain, etc. Colon cancer often causes diarrhoea, constipation, dysentery, and blood in the stool. 2.1 MATHEMATICS IN NATURE: AN INTUITIVE CERTITUDE: Mathematics is known to be an indispensable part all sciences. It forms the edifice of all existences being a formidable aegis that “holds” all parts together. One can have certain propensity towards it and more when going through John A. Adam‟s texts Docile is the symmetry found in vegetation, anatomy of living creatures, shapes of heavenly bodies: planets being spherical and much so their orbits, to name a few. Mathematical modeling of any natural phenomena and its inherent dogma at core, which ensures it compositional as well as physical traits can be extremely expedient towards developing an understanding. A mathematical model is a feat if it fits the known data and makes accurate predictions for the future, as rendered in the fig.A snowball is defined to grow in size and attain an almost intermediary shape between a circle and a sphere as it rolls through ice. 
However, external factors such as the intensity of sunlight, the heat produced by friction/resistance, and the dynamic surface area of the ball are the variates that contribute profoundly to the problem (Adam, 2006). The author also notes that all mathematical models are flawed to some extent owing to the inappropriate presumptions made during their construction. The aesthetic of all natural and physical phenomena is brought to life once the mathematical underpinning is realized. “Mathematics is to nature as Sherlock Holmes is to evidence.” With such a compendium of suggestions to consult, it is highly probable that gene expression data won’t be an exception. Studies have shown that mathematical and statistical models can be built around such data and that they are seminal in identifying biomarkers. The only apprehension is the uncertainty attached to them, which is subject to validation in order to establish the baseline parameters.
Fig. 2.1 MATHEMATICAL MODEL
2.2 EPIGENETICS:
Literally speaking, “epi” stands for “on top of”, and epigenetics is on top of genetics. The factors governing gene expression and protein building, as distinct from the inherent DNA code, delineate epigenetics. The environment and our lifestyle can significantly direct our genetic behaviour and even that of our children. Multicellular organisms have virtually identical underlying code, yet they exhibit incongruent phenotypes.
3. SYSTEM MODEL
In this section, our system model is designed in two phases. Phase 1 consists of analysis and design, and in Phase 2 the model is implemented and validated. The initial analysis activity involves researching the problem in detail and evaluating various approaches to resolve it. The design activity is carried out in two stages, high-level and low-level design. High-level design involves the architectural design of the framework and case study planning, as the latter plays a vital role in validating the approach.
4. ARCHITECTURAL OVERVIEW
The proposed concept of our project is to predict cancer by analyzing genes and converting the gene expression, which leads to identifying and analyzing the cancer result set. The control of gene activity, from gene to functional protein and phenotype, has also been analyzed in order to identify the cancer cells. In our proposed methodology, the DNA methylation data documented by experts (gene expression segments) are treated as a kind of binding site for proteins that make the DNA inaccessible and thus keep it from being in an active state.
5. GENE HEAT MAP VISUALIZATION
The goal is to reduce the dimensionality of the data to facilitate visualization and additional analysis. Dimensionality reduction techniques are often used as a preliminary step to the clustering of large data sets.
Multidimensional scaling (MDS) starts from a distance matrix between objects and finds the locations of these objects in a low-dimensional space that best preserve the original distances. These techniques work on a ratio-optimization principle. Clustering techniques for high-dimensional data are almost always exploratory. Their strength lies in providing rough maps and suggesting directions for further study. Also, clustering results are sensitive to a variety of user-specified inputs. The clustering of a large and complex set of objects can be planned in different ways depending on the goals.
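As an illustration of the MDS step described above, the following is a minimal sketch of classical (Torgerson) MDS in plain Python. It is not the implementation used in this work: the function names `pairwise_distances` and `classical_mds`, the power-iteration eigen-solver, and the choice of Euclidean distance are all assumptions made to keep the example self-contained.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length numeric vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n objects in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    # Because `dist` is symmetric, column means equal row means.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration: approximates the dominant eigenvector of B.
        v = [rng.random() + 0.1 for _ in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # the remaining spectrum is numerically zero
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        # Axis d is the eigenvector scaled by sqrt(eigenvalue); negative
        # eigenvalues (non-Euclidean input) are clamped to zero.
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate B so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords
```

When the input distances come from points that really lie in a plane, the two-dimensional embedding reproduces them up to rotation and reflection. For high-dimensional expression profiles the embedded distances are only an approximation, which is why the resulting map should be read as a rough guide.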
Fig. 5.1 GENE HEAT MAP VISUALIZATION
6. CLUSTERING
Clustering techniques can be used in microarray analysis to facilitate the visual display (mostly preferred by biologists) and interpretation of experimental results, and to suggest the presence of subgroups of objects (genes or samples) that behave similarly. Clustering is often the first step of data filtration, since it is vital to pare down microarray data for noise elimination. A complication is that genes can participate in multiple pathways that may or may not be coactive under all conditions, so a gene can find its place in multiple clusters or in none at all. Clustering can be sample-based and/or gene-based. Gene-based clustering treats genes as objects and samples as features, while sample-based clustering does the reverse. A third category also exists: subspace clustering. Subspace clustering is not “global”; rather, it aims to cluster genes based on their involvement in a disease as part of one or more biological pathways.
Fig. 6 SUBSPACE CLUSTERING
6.1 K-Means Clustering and Self Organized Maps (SOM):
K-means partitions objects into groups that have little variability within clusters and large variability across clusters. The user is required to specify the number K of clusters a priori. Estimation is iterative: it starts with a random allocation of objects to clusters, re-allocates objects to minimize the distance to the estimated “centroids” of the clusters, and stops when no further improvement can be made. It is easy to implement and fast to execute. Its time complexity is O(L·K·N), where L is the number of iterations, K is the number of clusters, and N is the number of objects.
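The K-means procedure described above (random allocation, centroid estimation, re-allocation until nothing changes) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the implementation used in this work: the function name `kmeans`, the seeded shuffle used for the initial allocation, the Euclidean distance, and the rule of keeping the old centroid when a cluster becomes empty are all choices made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Partition `points` (equal-length numeric sequences) into k clusters.

    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Random initial allocation; dealing shuffled indices round-robin
    # guarantees that every cluster starts non-empty.
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for position, index in enumerate(order):
        labels[index] = position % k

    dims = len(points[0])
    centroids = [[0.0] * dims for _ in range(k)]
    for _ in range(max_iter):
        # Update step: each centroid is the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Assignment step: N * K distance evaluations per iteration,
        # which is where the O(L*K*N) running time comes from.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so no further improvement
        labels = new_labels
    return labels, centroids
```

Passing one expression profile per gene gives gene-based clustering; transposing the matrix so that each row is a sample gives sample-based clustering. The result depends on the initial allocation, so in practice several seeds are usually tried.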
7. DATASET COLLECTION
In this module we describe dataset collection for multiple real-world web services. The term “dataset” originated in the mainframe field. A data set (or dataset) is a collection of data, usually presented in tabular form. Each column represents a variable. Each row corresponds to a member of the data set and lists values for each of the variables. The data set may comprise data for one or more members, corresponding to the number of rows.
7.1 Given Input & Output Design:
Gene heat map visualization.
Input: Fetch data and dataset comparison.
Output: Diagnosis for lung cancer.
8. TECHNIQUE USED
8.1 PRE-PROCESSING PHASE:
Genes, DNA clones, or expressed sequence tags (ESTs) usually constitute the DNA sequences that are scanned by microarray experiments under contingent conditions. These may include time series data of a biological process, e.g., the life cycle of a yeast cell, or a collection of varied tissue samples, e.g., normal versus cancerous tissues. The study of promoter sequences can be a staple for deriving the transcription factors of an associated gene. Regulation of transcription is the most common form of gene control, and the activity of transcription factors allows genes to be specifically regulated during development and in different types of cells.
8.2 POST PROCESSING:
Since the pre-processing phase aids in precipitating several groups, patterns, and correlations of genes on the basis of expression level, it becomes almost necessary to re-evaluate and formalize them in a phase called the post-processing phase. During this phase, the domain experts analyze and match the extracted patterns to the business objectives and success criteria. The central principle of pattern management is heterogeneous pattern representation.
Since the extracted patterns can be relevant as well as irrelevant, indexing them is a labor-intensive task that involves marking and classifying them scrupulously. The Predictive Model Markup Language (PMML) and the Common Warehouse Metamodel for Data Mining (CWM-DM) were designed for genetic data modelling, but they lacked the efficacy to handle and represent specific classes of patterns. As a solution, Rizzi et al. introduced the Pattern Base Management System (PBMS). Later, Kotsifakos et al. revised the PBMS architecture by enabling support for domain ontologies. After defining a data modelling system, it’s vital to design a mechanism to query and extract the required data. To this end, certain APIs, namely SQL/MM DM and the Java Data Mining (JDM) API, were standardized to handle data as well as the metadata entwining genetic correlational patterns.
9. CONCLUSION
A reliable and precise classification of tumors is essential for the successful diagnosis and treatment of cancer. Microarray experiments may lead to a more complete understanding of the molecular variations among tumors. The ability to distinguish between tumor classes using gene expression is a new approach to cancer classification. A microarray dataset contains numerous groups of co-expressed genes. A typical strategy for the biologist is to start from genes that are known to be closely related to a biological function and to browse the preliminary rough clustering
result, to focus on a small subset of those genes. Thus biologists follow exploratory strategies guided by manual knowledge. So, when experimenting with these ‘superficial’ data and applying various data mining techniques to them, the data have to be concise and close to accurate in order to obtain reliable results about cancer.