SlideShare a Scribd company logo
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
DOI :10.5121/ijbb.2016.6101 1
METHODS USED FOR IDENTIFICATION OF
DIFFERENTIALLY EXPRESSING GENES (DEGS)
FROM MICROARRAY GENE DATASET: A REVIEW
Chanda Panse1
,Manali Kshirsagar2
, Dhananjay Raje3
, Dipak Wajgi4
1,2.Department of Computer Technology, YCCE, Nagpur,Maharashtra,441110
3.MDS Bio-analytics, Nagpur,
4. Department of Computer Engineering, STVPCE, Nagpur
ABSTRACT:
Genes contain blue print of living organism. Malfunctioning occurred in cellular life is indicated by
proteins which are responsible for behavior of genes. Fixed set of genes decides behavior and functioning
of cells. They guide the cells what to do and when to do. To analyze the insight of biological activities,
analysis of gene expressions is necessary. Advanced technology like microarray plays an important role in
gene analysis as it captures expressions of thousands of genes under different conditions simultaneously.
Out of thousands of genes, very few behave differently which are called as Differentially Expressing Genes
(DEGs). Identifying these most significant genes is a crucial task in molecular biology and is a major area
of research for bioinformaticians because DEGs are the major source of disease prediction. They help in
planning therapeutic strategies for a disease through Gene Regulatory Network (GRN) constructed from
them. GRN is a graphical representation containing genes as nodes and regulatory interactions among
them as edges. GRN helps to know change over occurred among genes which involved in cause of genetic
diseases as well as to analyze their response to different stress conditions through microarray expressions.
In this paper we have discussed many methods proposed by researchers for identifying differentially
expressing genes based upon changes in their expressions patterns.
I. INTRODUCTION
Micro array technology monitors abundant amount of gene data in terms of gene expressions.
Gene expressions are stored in terms of two dimensional matrix of size n X m where n indicates
no of genes and m indicates no of samples. Since gene expressions are stochastic in nature it is
necessary to observe changes in them over time. Understanding these changes and obtaining
useful information from them is a major challenge in bioinformatics. Single gene chip contains of
thousands of gene expressions. Before doing any downstream analysis like clustering or modeling
of GRN on gene expressions, it is necessary to identify most significant genes. These genes called
marker genes. Sudden changes in their expressions help to understand what goes wrong and
where. They also help in classifying different types of tumors and act as starting point for
studying certain systems like cell cycle, drug response etc. They also help in identifying
abnormalities in biological activities. Thus assist in detection of disease and its diagnosis at early
stage.
In this paper we have critically review existing methods used for identification of differentially
expressing genes along with their advantages and disadvantages. In this context, it has been found
in the literature that, methods which were applied to static data were also extended for time
course expressions [1]. However statistical validations of genes were not done as methods were
originally designed for static data. Therefore we have classified these methods based on nature of
microarray data slides used. Microarray datasets considered in the literature, for experimentation
fall into two main categories: Replicated Microarray Dataset (RMD) and Non-Replicated
Microarray Dataset (NRMD). Some of the most commonly methods are Fold Change (FC),
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
ANOVA [2], RM-ANOVA [3], t-test [4], SAM [5], Empirical Bayesian method [6] etc. This
review helps researchers to design more general algorithm for identifying DEGs apart from
statistical methods.
II. TYPES OF MICROARRAY DATASET
Microarray technology is a powerful tool which enables researchers to investigate and address
issues in molecular biology by analyzing gene expressions of thousands of gene in single reaction
and in an efficient manner. A typical microarray chip preparation involves hybridization of an
mRNA molecule to DNA template from which it is originated. An array is constructed from
many DNA molecules. The amount of mRNA bound to each site on the array indicates the
expression levels of various genes.
Before actually going into review of existing methods, first we will see short background of what
is replicated time series and static microarray slide. There is a possibility that gene expressions
are not consistently captured because of factors like dust and scratches on glass slides, poor
illumination. In order to have reliable and invariable gene expressions, experiment is replicated
that is same set of experiment is repeated for several times. The author in paper [7] states the
importance of replication in misclassification of genes. Replication is not a duplication of
experiment. When microarray data from several replications are combined, there is reduction in
false positive and false negative rate. Time series microarray captures multiple gene expressions
at discrete time points (minutes, hours or days) whereas static microarray slide contains gene
expressions for differential conditions only ones. As data is captured for different experimental
conditions, it is not possible to have mathematical relationship between conditions. Therefore it is
needed to have several replicates for such conditions. In the next section we will have brief
review of methods applied on replicated time series data.
III. METHODS FOR REPLICATED MICROARRAY DATASET
Based on the limitations of existing methods[8][9] which would required to know DEGs in
advanced, author in paper[10] compared 3 empirical Bayes methods i.e CyberT, BRB and Limma
t-test on synthetic replicated gene expressions. It is found that CyberT maintains fixed False
Positive Rate for data with unstable variance across intensities and BRB and Limma t-test also
generate many false positive based on the preprocessing used on data without decreasing True
Positive Rate(TPR). In order to extract useful biological knowledge from large microarray data,
multivariate data analysis is needed as it reduces dimensions. Therefore Principal Component
Space based algorithm is proposed in [11] where replicated microarray dataset of Tomato is used
for finding DEGs. Genes which are close to a particular condition are considered as significant
genes. If there exist strong relationship between gene and a condition then it is significantly
expressing in that condition. Closeness measure is denoted by Cd which is given by
is a function which indicates whether
gene belongs to direction di. Cd can have three values, 0, 1 and -1. 1 and -1 indicate strong relation
between gene and specific condition and 0 indicates non-expressing gene in particular condition
[11]. As replication of microarray chip is costly, very few methods exist for finding DEGs.
IV. METHODS FOR NON-REPLICATED MICROARRAY DATASET
In paper [12] author has adapted method from text categorization and information retrieval
literature for classifying genes into diseased and normal category by identifying their
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
discriminating capability. Author has decided a threshold value t which will classify a gene into
one of the classes’ i.e diseased or normal by observing gene expressions in both classes and
discriminating V score is calculated as follows. Genes having highest value of V are the most
discriminating genes.
Where a,b,c,d are the count of gene expression values of gene g having values greater than equal
to or less than equal to threshold t. But some time it is very difficult to identify threshold value t
because of stochastic nature of gene expressions. As microarray contains noise and lack of
normal pattern of gene expressions, Olga [13] compared three model-free non-parametric
methods i.e. t-test, rank-sum test and heuristic method based on high Pearson co-relation
coefficient. Out of these three methods Wilcoxon Rank Sum Test performs better than others. But
TPR rate depends on noise in the dataset and p-value selected. Another replication-free method is
proposed in [14] based on significant temporal variation exhibited by genes in estrous cycle of rat
mammary gland. The algorithm is designed in such a way that it will fit data on B-splins curve.
But the algorithm is only applicable for time dependent biological processes such as cell cycle,
circadian rhythms, development patterns, hormonal fluctuations where gene expressions vary
with time. Again the experiment was conducted on simulated data.
As clustering helps in finding biologically co-expressing genes, a method was discussed for non-
replicated datasets in[15] based on pareto-optimal clustering using supervised learning in which
genes are assigned to a cluster using SVM. But prior to clustering there is need of identification
of significant genes which was done manually by observing changes in expressions of genes.
In order to observe the periodic nature in microarray gene expressions, average periodigram is
used in [16]. It is tool used in time series analysis for observing peaks in time domain. Upon
noticing peaks, their significance is validated using g-test of Fisher [17] followed by calculation
of p-value for each test and gene which reject null hypothesis is considered as DEG. But for the
application of g-test minimum 40 measurements per gene are desirable and as p-value for each
gene is calculated, the method is time consuming.
For biological processes which are periodic in nature, Fast Fourier Transform based method is
proposed in [18] whose performance is affected by small time series experiments and missing
values.
By observing the behavior of genes in microarray, author Ping Ma in [19] proposed a technique
in which the time series gene chip is classified into two classes of genes. Genes which do not
interact with time are kept in PDE (Parallel Differentially Expressing) class and those with time
interaction are deposited in NPDE (Non-Parallel Differentially Expressing) class. After
classification a functional ANOVA mixed-effect model is applied for identifying NPDE genes.
For some cases of datasets, the method had identified NPDE genes as PDE. As above methods
are examining individual gene, time required for the analysis is more. To overcome this, a new
Functional Principal Component (FPC) approach is developed in [20] which observe the temporal
trajectories of gene expressions which are modeled using few basic functions which require lesser
number of parameters as compared to earlier methods. This method is suffering with increase in
false positive rate. Another method based on Fourier transform is proposed in [21]. In this method
genes which are not differentially expressed are filtered out on the basis of Fourier coefficients.
As number of genes gets reduced model-based clustering is applied on them. But disadvantage of
this method is that for each gene it is needed to calculate Fourier coefficient. Another approach is
proposed for identifying DEGs in non-replicated time-course data by inculcating FPCA into
hypothesis testing framework [22]. Other method includes a geometrical approach for
identification of DEGs in which co-relation between genes is considered. It is multivariate
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
approach. It reframes linear classification method to identify DEGs by defining a separate hyper-
plane, the orientation of which decides DEGs. Comparative analysis of statistical methods is done
in [23] along with various tools and packages used for identifying DEGs mentioned in it. It is
observed that none of the statistical method is the best. The choice of method depends on the type
of dataset selected for the experimentation. In Fold change method a log ratio of expressions
under two conditions is taken. Its drawback is that statistical variance is not considered in [24].
FC method is subject to bias if the data have not been properly normalized. The danger of false
positive and false negative is illustrated in [25] by Tanaka after strictly considering only FC. The
shortcomings of FC are removed by t-test by considering variance. Two sample t-statistics is
calculated as follows.
Where s is sample variance and n1 and n2 are number of observations in each condition. But
small sample size is the major obstacle in it[26]. Again it is observed that variance is not same
within each group. Therefore Welch proposed t-test for unequal variance [27] by correcting
degree of freedom for unequal variance as follows.
Based on the limitation of small sample size, Wilcoxon rank sum test has used as alternative
method for testing differential expressions Many of these methods are based on hypothesis
testing. Author in paper [32] applied hypothesis test on each gene to determine as to whether it
falls under the category of significant or non-significant gene by observing its variance in
population average versus time curve. Apart from this Angelini [33] have used Bayesian
approach to identify highly expressed genes. Selection of method depends on nature of gene
expressions i.e. Static and
Dynamic. In static type, gene expressions are measured at single time point for multiple subjects
whereas in dynamic type, multiple time points are used for capturing gene expressions.
While searching for the significant genes, it is quite possible that redundant genes get extracted
from the dataset. To avoid this PSO based approach is proposed in[28] which will identify non-
redundant disease related genes. In this paper, multi-objective function is used which will
optimize multiple goals i.e minimization of sensitivity and minimization of specificity.
Redundancy is avoided in this approach by selecting the solution which gives mutually
maximally dissimilar genes. As size of solution is (n+n*s), where n is number of genes and s is
number of samples, time required is more and author has just given 2 or 3 biomarker genes for
different cancer related datasets.
As replication of time series microarray expressions is costly and it is not possible to fit
irregularly varying gene expressions into predefined models, a general approach is proposed
in[30]. It is based on Partial Energy ratio for Microarray (PEM). PEM statistic is incorporated
into the permutation based SAM framework for significance analysis. This method suffers with
low samples as signal smoothing is not possible for dataset consisting of fewer samples. Again a
general method is proposed to overcome the drawback of modeling methods in [30]. This method
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
uses cubic splines which are set of cubic polynomials used for fitting time series and noisy data
and each gene is represented by spline but noise which makes the process computationally
intensive. A Ranking and Combination based method is described in [34] which rank genes based
on feature such as variance, deviation, correlation and probability. After sorting genes rank-wise,
combination is done.
Thus we have reviewed many methods for replicated and non-replicated datasets. Following table
will show results of those methods in terms of count of significantly identified genes in particular
dataset on which that method is applied.
Method Dataset Actual count of Count
genes significant
genes
t-test [35] Prostate cancer 27575 9985
SAM [5 ] Human lymphoidblastoid cell 6800 180
lines
ANOVA [2] Human Liver and skeletal - 1274(male)
smaples(male) 1886(female)
Placenta(female)
RM-ANOVA[3] Breast cancer 1900 55
Empirical Bayes [6] Lever cancer 6810 127
Wilcoxon(RST) [13] Lung tumor 91
PEM [PEM] Yeast cell cycle 800 104
PSO-based [28] Prostate cancer 12533 6
Diffuse B-cell lymphoma 7070 1
ALL (child-ALL) 8280 2
CML 12625 5
Fourier Transform Yeast cell cycle 4489 2227
[29]
FPCA [20] C. Elegance 2430 1982
B-spline[14] Estrous cycle 21044 871
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
CONCLUSION AND FUTURE SCOPE
Thus we have reviewed statistical as well as non-statistical methods used for finding DEGs.
Identification of differentially expressing genes is the main area of research in Bioinformatics.
Though many methods exist in the literature but none of them is the best. Model-based methods
are not suitable for any type of datasets because it is not feasible to fit stochastic gene expressions
into those models. In these model-based methods computational cost increases as number of
genes increases because number of mathematical expressions increases with respect to count of
genes. Also many statistical methods are not discussing about missing values before finding
DEGs. Curve based methods like b-splines fit individual gene expressions into curve. This results
in increasing computation overhead and time requirement.
Many tools have been developed based on statistical methods. These include SAM, ANOVA, t-
test etc. Since genes expressions do not have specific pattern, a generalized algorithm is needed
which will find significant genes and one has to explore the comparison between statistical and
non-statistical method by considering same dataset. Also methods for which same genes are
observed as significant will be compared on other factors like time and computational cost.
REFERENCES
[1] Tusher V.G., Tibshirani R., and Chu G, “Significance Analysis of Microarrays Applied to
the Ionizing Radiation Response,” Proceeding National Academy of Sciences USA, vol. 98,
2001
[2] Kerr,M.K., Martin,M. and Churchill,G.A. “Analysis of variance for gene expression
microarray data”, Journal of Computational. Biology., 7,2000.
[3] Ola EiBakry,M.Omair Ahmad and M.N.S. Swamy, “Identification of Differentially
Expressed Genes for Time-Course Microarray Data Based on Modified RM ANOVA”,
IEEE/ACM Transactions on Computational Biology and Bioinformatics,vol.9, 2012
[4] Thomas JG, Olson JM, Tapscott SJ, Zhao LP: An efficient and robust statistical modeling
approach to discover differentially expressed genes using genomic expression profiles.
Genome Research vol.11,2001
[5] Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the
ionizing radiation response. Proc Natl Acad Sci USA, vol.98, 2001
[6] Efron B, Tibshirani R, Gross V, Tusher VG: Empirical Bayes analysis of a microarray
experiment. Journal of American Statistic Association, vol. 96,2001
[7] Lee ML, Kuo FC,Whitemore GA and Sklar J. “Importance of replication in microarray
studies: statistical methods and evidence from repetitive cDNA hybridization”, Proceeding
National Academic Science,USA, vol.97,2000
[8] Qin LX, Kerr KF: Empirical evaluation of data transformations and ranking statistics for
microarray analysis. Nucleic Acids Res, vol.32, 2004
[9] Sioson AA, Mane SP, Li P, Sha W, Heath LS, Bohnert HJ, Grene R: The statistics of
identifying differentially expressed genes in Expresso and TM4: a comparison. BMC
Bioinformatics vol.7,2006
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
[10] Carl Murie,Owen Woody,Anna Y Lee and Robert Nadon, “Comparison of small n statistical
tests of differential expression applied to microarryas”. BMC Bioinformatics,vol.10,2009
[11] Luis Ospina and Liliana Lopez-Kleine , “Indentification of differentially expressed genes in
microarray data in a principal component space”, SpringerPlus, vol.2,2013
[12] Hisham Al-Mubaid and Noushin Ghaffari, “Identifying the Most Significant genes from
Gene expression Profiles for Sample Classification”, Proceeding ,IEEE conference on
Granular Computation,2006
[13] Olga G. Toyanskaya, Mitchell E. Garber, Patrick O. Brown, David Botstein and Russ B.
Altman, “Nonparametric methods for identifying differentially expressed genes in
microarray”, Bioinofrmatics, vol.18,2002
[14] Stephen C Billupus, Margaret C Neville, Michael Rudolph, Weston Porter and Pepper
Schedin, “Identifying significant temporal variation in time curse microarray data without
replicated”, BMC Bioinformatics, vol.10,2009
[15] Ujjwal Maulik, Anirban Mukhopadhyay,Sangmitra Bandopadhyay, “Combining Pareto-
optimal clustering using supervised learning for identifying co-expressed genes”, BMC
Bioinformatics, vol.10,2009
[16] Sofia Wichert, Konstantinos Fokianos and Korbinian Strimmer, “Indentifying periodically
expressed transcripts in microarray time series data”, Bioinformatics, vol.20,2004
[17] Fisher R.A. “Tests of significance in harmonic analysis”, Proceeding Royal Society
Publishing, vol.125,1929
[18] Jerry Chen, and Paul Paolini, “Fourier Analysis of Time Course Microarray data and its
Relevance to Gene Expressions Dynamics”, Proceeding ACSESS , 2008
[19]Ping Ma, Wenxuan Zhong, Jun S. Liu, “Identifying Differentially Expressed Genes in time
Course Microarray Data”, Statistic in Bioscience, vol,1. 2009
[20] Chen Kun, Wang, Jane-Ling, “Identifying Differentially Expressed Genes for Time-course
Microarray Data through Functional Data Analysis”, Statistic in biosciences,vol.2,2010
[21] Jaehee Kim, Robert Todd Ogden and Haseong Kim, “ A method to identify differential
expression profiles of time-course gene data with Fourier transform”, BMC Bioinformatics,
col.14,2013
[22] Shuang Wu, Hulin Wu, “More powerful significant testing for time course gene expression
data using functional principal component analysis approaches”, BMC Bioinformatics,
vol.14, 2013
[23] J. Sreekumar and K.K. Jose, “Statistical tests for identification of differentially expressed
genes in cDNA microarray experiments”, Indian Journal of Biotechnology, vol.7,no.10,2008
[24] Biju J., Anuparna S and Govindswami K, “Microarray -chipping in genomics”, Indian
Journal of Biotechnology ,vol.1,2002
[25] Tanaka T.S.,Jaradat S.A., Lim M.K., Kargul G.J.,Wang X., “Genome-wide expression
International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016
profiling and mid-gestation placenta and embriyo using a 15000 mouse development cDNA
microarray”, Proceeding National Academy of science USA, vol.97, 2002
[26] Devore J. And Peck R. “Satistics: The exploration and analysis of data” 3rd edition, Duxury
Press, Pacific Grove,CA,1997
[27] Welch B.I., “The significance of the difference between two means when population means
are unequal”,Biometrika, vol.29,1938
[28] Anirban Mukhopadhyay and Monalisa Mandal, “Identifying Non-Redundant Gene Markers
from Microarray Data: A Multiobjective variable Length PSO-Based Approach”,
IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.11, 2014
[29] Jaehee Kim, Robert Todd Ogden and Haseong Kim, “A method to identify differential
expression profiles of time-course gene data with Fourier transform”, BMC Bioinformatics,
vol.14,no.310,2013
[30] Xu Han, “ PEM: A General Statistical Approach for Identifying Differentially Expressed
Genes in the Time-Course cDNA Microarray Experiment Withour Replicate”, Journal of
Bioinformatics and Computational Biology,2006
[31] Ziv Bar-Joseph, Georg Gerber, Itamar Simon, David K. Gifford and Tommi Jaakkola,
“Comparing the Continuous Representation of time-seriess expressions profiles to identify
Differentially expressed genes”, PNAS, vol.100,no.18,2003
[32] J.D. Storey,W.Xiao, J.T. Leek, R.G.Tompkins, and R.W. Davis, “Significance Analysis of
Time Course Microarray Experiments”, Proceeding National Academy of science USA, vol.
102,2005
[33]C. Angelini, D. De Canditiis, M. Mutarelli and M. Pensky, “A Bayesian approach to
estimation and testing in time-course microarray Experiments”, Statistical application in
Genetics and Molecular Biology,vol.6,2007
[34] Han-Yu Chuang, Hongfang Liu, Stuart Brown, Cameron McMunn-Coffran, “Identifying
Significant Genes from Microarray Data”, Fourth IEEE Symposium on Bioinformatics and
Bioengineering, 2004
[35] Khalid Raza and Rajni Jaiswal, “Reconstruction and Analysis of Cancer-specific Gene
Regulatory Networks from Gene Expression Profiles”, vol.3,2013

More Related Content

PDF
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
PDF
RECONSTRUCTION AND ANALYSIS OF CANCERSPECIFIC GENE REGULATORY NETWORKS FROM G...
PDF
Reconstruction and analysis of cancerspecific Gene regulatory networks from G...
PDF
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
PDF
An Overview on Gene Expression Analysis
PDF
Efficacy of Non-negative Matrix Factorization for Feature Selection in Cancer...
PDF
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
PDF
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
Clustering Approaches for Evaluation and Analysis on Formal Gene Expression C...
RECONSTRUCTION AND ANALYSIS OF CANCERSPECIFIC GENE REGULATORY NETWORKS FROM G...
Reconstruction and analysis of cancerspecific Gene regulatory networks from G...
COMPUTATIONAL METHODS FOR FUNCTIONAL ANALYSIS OF GENE EXPRESSION
An Overview on Gene Expression Analysis
Efficacy of Non-negative Matrix Factorization for Feature Selection in Cancer...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...
EFFICACY OF NON-NEGATIVE MATRIX FACTORIZATION FOR FEATURE SELECTION IN CANCER...

Similar to METHODS USED FOR IDENTIFICATION OF DIFFERENTIALLY EXPRESSING GENES (DEGS) FROM MICROARRAY GENE DATASET: A REVIEW (20)

PDF
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
PDF
Sample Work For Engineering Literature Review and Gap Identification
PDF
Classification of Microarray Gene Expression Data by Gene Combinations using ...
PDF
RT-PCR and DNA microarray measurement of mRNA cell proliferation
PPTX
How to analyse large data sets
PDF
Survey and Evaluation of Methods for Tissue Classification
PDF
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
PDF
RapportHicham
PDF
MicroRNA-Disease Predictions Based On Genomic Data
PDF
overview on Next generation sequencing in breast csncer
PDF
Austin Neurology & Neurosciences
DOCX
Molecular marker and its application in breed improvement and conservation.docx
PDF
Analisis de la expresion de genes en la depresion
PDF
MINING OF IMPORTANT INFORMATIVE GENES AND CLASSIFIER CONSTRUCTION FOR CANCER ...
PDF
Mining of Important Informative Genes and Classifier Construction for Cancer ...
PDF
Research Statement Chien-Wei Lin
PDF
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
PDF
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
PDF
Graphical Model and Clustering-Regression based Methods for Causal Interactio...
PDF
Impact_of_gene_length_on_DEG
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
Sample Work For Engineering Literature Review and Gap Identification
Classification of Microarray Gene Expression Data by Gene Combinations using ...
RT-PCR and DNA microarray measurement of mRNA cell proliferation
How to analyse large data sets
Survey and Evaluation of Methods for Tissue Classification
Innovative Technique for Gene Selection in Microarray Based on Recursive Clus...
RapportHicham
MicroRNA-Disease Predictions Based On Genomic Data
overview on Next generation sequencing in breast csncer
Austin Neurology & Neurosciences
Molecular marker and its application in breed improvement and conservation.docx
Analisis de la expresion de genes en la depresion
MINING OF IMPORTANT INFORMATIVE GENES AND CLASSIFIER CONSTRUCTION FOR CANCER ...
Mining of Important Informative Genes and Classifier Construction for Cancer ...
Research Statement Chien-Wei Lin
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
GRAPHICAL MODEL AND CLUSTERINGREGRESSION BASED METHODS FOR CAUSAL INTERACTION...
Graphical Model and Clustering-Regression based Methods for Causal Interactio...
Impact_of_gene_length_on_DEG
Ad

More from ijbbjournal2 (20)

PDF
Computational identification of Brassica napus pollen specific protein Bnm1 a...
PDF
Amino acid sequence based in silico analysis of βgalactosidases
PDF
CALL FOR PAPERS.!!! International Journal on Bioinformatics & Biosciences (I...
PDF
SECURING TELEHEALTH PLATFORMS: ML-POWERED PHISHING DETECTION WITH DEVOPS IN H...
PDF
Impact of Latitude on Sexual Size Dimorphism in Ground Beetle Poecilus Cupreus L
PDF
Wave Optics of Chromosomes for the Remote Transfer of the Bioholographic Anal...
PDF
DESIGN OF UWB ANTENNA WITH IMPROVED RADIATION FOR CONVENTIONAL GSM, LTE NETWO...
PDF
Securing Telehealth Platforms: ML-powered Phishing Detection with Devops in H...
PDF
Impact of Latitude on Sexual Size Dimorphism in Ground Beetle Poecilus Cupreus L
PDF
Wave Optics of Chromosomes for the Remote Transfer of the Bioholographic Anal...
PDF
A COMPARATIVE IN SILICO STUDY FINDS A FUNCTIONAL CO-RELATION BETWEEN HUMAN HS...
PDF
CALL FOR PAPERS.!!! - International Journal on Bioinformatics & Biosciences ...
PDF
ENHANCED POPULATION BASED ANT COLONY FOR THE 3D HYDROPHOBIC POLAR PROTEIN STR...
PDF
STRATEGY FOR ELECTROMYOGRAPHY BASED DIAGNOSIS OF NEUROMUSCULAR DISEASES FOR A...
PDF
SURVEY ON MODELLING METHODS APPLICABLE TO GENE REGULATORY NETWORK
PDF
A Morphological Multiphase Active Contour for Vascular Segmentation
PDF
A New SDM Classifier Using Jaccard Mining Procedure Case Study: Rheumatic Fev...
PDF
PAPER TITLE - Study of Vitellogenin Motif
PDF
Gene Expression Mining for Predicting Survivability of Patients in Earlystage...
PDF
Novel Adaptive Filter (NAF) for Impulse Noise Suppression from Digital Images
Computational identification of Brassica napus pollen specific protein Bnm1 a...
Amino acid sequence based in silico analysis of βgalactosidases
CALL FOR PAPERS.!!! International Journal on Bioinformatics & Biosciences (I...
SECURING TELEHEALTH PLATFORMS: ML-POWERED PHISHING DETECTION WITH DEVOPS IN H...
Impact of Latitude on Sexual Size Dimorphism in Ground Beetle Poecilus Cupreus L
Wave Optics of Chromosomes for the Remote Transfer of the Bioholographic Anal...
DESIGN OF UWB ANTENNA WITH IMPROVED RADIATION FOR CONVENTIONAL GSM, LTE NETWO...
Securing Telehealth Platforms: ML-powered Phishing Detection with Devops in H...
Impact of Latitude on Sexual Size Dimorphism in Ground Beetle Poecilus Cupreus L
Wave Optics of Chromosomes for the Remote Transfer of the Bioholographic Anal...
A COMPARATIVE IN SILICO STUDY FINDS A FUNCTIONAL CO-RELATION BETWEEN HUMAN HS...
CALL FOR PAPERS.!!! - International Journal on Bioinformatics & Biosciences ...
ENHANCED POPULATION BASED ANT COLONY FOR THE 3D HYDROPHOBIC POLAR PROTEIN STR...
STRATEGY FOR ELECTROMYOGRAPHY BASED DIAGNOSIS OF NEUROMUSCULAR DISEASES FOR A...
SURVEY ON MODELLING METHODS APPLICABLE TO GENE REGULATORY NETWORK
A Morphological Multiphase Active Contour for Vascular Segmentation
A New SDM Classifier Using Jaccard Mining Procedure Case Study: Rheumatic Fev...
PAPER TITLE - Study of Vitellogenin Motif
Gene Expression Mining for Predicting Survivability of Patients in Earlystage...
Novel Adaptive Filter (NAF) for Impulse Noise Suppression from Digital Images
Ad

Recently uploaded (20)

PDF
CHAPTER 9 MEETING SAFETY NEEDS FOR OLDER ADULTS.pdf
PPTX
Immunity....(shweta).................pptx
PPTX
ABG advance Arterial Blood Gases Analysis
PPTX
Infection prevention and control for medical students
PDF
Myers’ Psychology for AP, 1st Edition David G. Myers Test Bank.pdf
PPTX
Rheumatic heart diseases with Type 2 Diabetes Mellitus
PDF
Dr. Jasvant Modi - Passionate About Philanthropy
PPTX
Medical aspects of impairment including all the domains mentioned in ICF
PPTX
Importance of Immediate Response (1).pptx
PPTX
3. Adherance Complianace.pptx pharmacy pci
PDF
Khaled Sary- Trailblazers of Transformation Middle East's 5 Most Inspiring Le...
PPTX
1. Drug Distribution System.pptt b pharmacy
PPTX
different types of Gait in orthopaedic injuries
PPTX
General Pharmacology by Nandini Ratne, Nagpur College of Pharmacy, Hingna Roa...
PDF
Dermatology diseases Index August 2025.pdf
PPTX
PE and Health 7 Quarter 3 Lesson 1 Day 3,4 and 5.pptx
PDF
Dr Masood Ahmed Expertise And Sucess Story
PPTX
CBT FOR OCD TREATMENT WITHOUT MEDICATION
PPTX
community services team project 2(4).pptx
PPTX
BLS, BCLS Module-A life saving procedure
CHAPTER 9 MEETING SAFETY NEEDS FOR OLDER ADULTS.pdf
Immunity....(shweta).................pptx
ABG advance Arterial Blood Gases Analysis
Infection prevention and control for medical students
Myers’ Psychology for AP, 1st Edition David G. Myers Test Bank.pdf
Rheumatic heart diseases with Type 2 Diabetes Mellitus
Dr. Jasvant Modi - Passionate About Philanthropy
Medical aspects of impairment including all the domains mentioned in ICF
Importance of Immediate Response (1).pptx
3. Adherance Complianace.pptx pharmacy pci
Khaled Sary- Trailblazers of Transformation Middle East's 5 Most Inspiring Le...
1. Drug Distribution System.pptt b pharmacy
different types of Gait in orthopaedic injuries
General Pharmacology by Nandini Ratne, Nagpur College of Pharmacy, Hingna Roa...
Dermatology diseases Index August 2025.pdf
PE and Health 7 Quarter 3 Lesson 1 Day 3,4 and 5.pptx
Dr Masood Ahmed Expertise And Sucess Story
CBT FOR OCD TREATMENT WITHOUT MEDICATION
community services team project 2(4).pptx
BLS, BCLS Module-A life saving procedure

METHODS USED FOR IDENTIFICATION OF DIFFERENTIALLY EXPRESSING GENES (DEGS) FROM MICROARRAY GENE DATASET: A REVIEW

  • 1. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 DOI :10.5121/ijbb.2016.6101 1 METHODS USED FOR IDENTIFICATION OF DIFFERENTIALLY EXPRESSING GENES (DEGS) FROM MICROARRAY GENE DATASET: A REVIEW Chanda Panse1 ,Manali Kshirsagar2 , Dhananjay Raje3 , Dipak Wajgi4 1,2.Department of Computer Technology, YCCE, Nagpur,Maharashtra,441110 3.MDS Bio-analytics, Nagpur, 4. Department of Computer Engineering, STVPCE, Nagpur ABSTRACT: Genes contain blue print of living organism. Malfunctioning occurred in cellular life is indicated by proteins which are responsible for behavior of genes. Fixed set of genes decides behavior and functioning of cells. They guide the cells what to do and when to do. To analyze the insight of biological activities, analysis of gene expressions is necessary. Advanced technology like microarray plays an important role in gene analysis as it captures expressions of thousands of genes under different conditions simultaneously. Out of thousands of genes, very few behave differently which are called as Differentially Expressing Genes (DEGs). Identifying these most significant genes is a crucial task in molecular biology and is a major area of research for bioinformaticians because DEGs are the major source of disease prediction. They help in planning therapeutic strategies for a disease through Gene Regulatory Network (GRN) constructed from them. GRN is a graphical representation containing genes as nodes and regulatory interactions among them as edges. GRN helps to know change over occurred among genes which involved in cause of genetic diseases as well as to analyze their response to different stress conditions through microarray expressions. In this paper we have discussed many methods proposed by researchers for identifying differentially expressing genes based upon changes in their expressions patterns. I. INTRODUCTION Micro array technology monitors abundant amount of gene data in terms of gene expressions. Gene expressions are stored in terms of two dimensional matrix of size n X m where n indicates no of genes and m indicates no of samples. Since gene expressions are stochastic in nature it is necessary to observe changes in them over time. Understanding these changes and obtaining useful information from them is a major challenge in bioinformatics. Single gene chip contains of thousands of gene expressions. Before doing any downstream analysis like clustering or modeling of GRN on gene expressions, it is necessary to identify most significant genes. These genes called marker genes. Sudden changes in their expressions help to understand what goes wrong and where. They also help in classifying different types of tumors and act as starting point for studying certain systems like cell cycle, drug response etc. They also help in identifying abnormalities in biological activities. Thus assist in detection of disease and its diagnosis at early stage. In this paper we have critically review existing methods used for identification of differentially expressing genes along with their advantages and disadvantages. In this context, it has been found in the literature that, methods which were applied to static data were also extended for time course expressions [1]. However statistical validations of genes were not done as methods were originally designed for static data. Therefore we have classified these methods based on nature of microarray data slides used. Microarray datasets considered in the literature, for experimentation fall into two main categories: Replicated Microarray Dataset (RMD) and Non-Replicated Microarray Dataset (NRMD). Some of the most commonly methods are Fold Change (FC),
  • 2. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 ANOVA [2], RM-ANOVA [3], t-test [4], SAM [5], Empirical Bayesian method [6] etc. This review helps researchers to design more general algorithm for identifying DEGs apart from statistical methods. II. TYPES OF MICROARRAY DATASET Microarray technology is a powerful tool which enables researchers to investigate and address issues in molecular biology by analyzing gene expressions of thousands of gene in single reaction and in an efficient manner. A typical microarray chip preparation involves hybridization of an mRNA molecule to DNA template from which it is originated. An array is constructed from many DNA molecules. The amount of mRNA bound to each site on the array indicates the expression levels of various genes. Before actually going into review of existing methods, first we will see short background of what is replicated time series and static microarray slide. There is a possibility that gene expressions are not consistently captured because of factors like dust and scratches on glass slides, poor illumination. In order to have reliable and invariable gene expressions, experiment is replicated that is same set of experiment is repeated for several times. The author in paper [7] states the importance of replication in misclassification of genes. Replication is not a duplication of experiment. When microarray data from several replications are combined, there is reduction in false positive and false negative rate. Time series microarray captures multiple gene expressions at discrete time points (minutes, hours or days) whereas static microarray slide contains gene expressions for differential conditions only ones. As data is captured for different experimental conditions, it is not possible to have mathematical relationship between conditions. Therefore it is needed to have several replicates for such conditions. In the next section we will have brief review of methods applied on replicated time series data. III. METHODS FOR REPLICATED MICROARRAY DATASET Based on the limitations of existing methods[8][9] which would required to know DEGs in advanced, author in paper[10] compared 3 empirical Bayes methods i.e CyberT, BRB and Limma t-test on synthetic replicated gene expressions. It is found that CyberT maintains fixed False Positive Rate for data with unstable variance across intensities and BRB and Limma t-test also generate many false positive based on the preprocessing used on data without decreasing True Positive Rate(TPR). In order to extract useful biological knowledge from large microarray data, multivariate data analysis is needed as it reduces dimensions. Therefore Principal Component Space based algorithm is proposed in [11] where replicated microarray dataset of Tomato is used for finding DEGs. Genes which are close to a particular condition are considered as significant genes. If there exist strong relationship between gene and a condition then it is significantly expressing in that condition. Closeness measure is denoted by Cd which is given by is a function which indicates whether gene belongs to direction di. Cd can have three values, 0, 1 and -1. 1 and -1 indicate strong relation between gene and specific condition and 0 indicates non-expressing gene in particular condition [11]. As replication of microarray chip is costly, very few methods exist for finding DEGs. IV. METHODS FOR NON-REPLICATED MICROARRAY DATASET In paper [12] author has adapted method from text categorization and information retrieval literature for classifying genes into diseased and normal category by identifying their
  • 3. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 discriminating capability. Author has decided a threshold value t which will classify a gene into one of the classes’ i.e diseased or normal by observing gene expressions in both classes and discriminating V score is calculated as follows. Genes having highest value of V are the most discriminating genes. Where a,b,c,d are the count of gene expression values of gene g having values greater than equal to or less than equal to threshold t. But some time it is very difficult to identify threshold value t because of stochastic nature of gene expressions. As microarray contains noise and lack of normal pattern of gene expressions, Olga [13] compared three model-free non-parametric methods i.e. t-test, rank-sum test and heuristic method based on high Pearson co-relation coefficient. Out of these three methods Wilcoxon Rank Sum Test performs better than others. But TPR rate depends on noise in the dataset and p-value selected. Another replication-free method is proposed in [14] based on significant temporal variation exhibited by genes in estrous cycle of rat mammary gland. The algorithm is designed in such a way that it will fit data on B-splins curve. But the algorithm is only applicable for time dependent biological processes such as cell cycle, circadian rhythms, development patterns, hormonal fluctuations where gene expressions vary with time. Again the experiment was conducted on simulated data. As clustering helps in finding biologically co-expressing genes, a method was discussed for non- replicated datasets in[15] based on pareto-optimal clustering using supervised learning in which genes are assigned to a cluster using SVM. But prior to clustering there is need of identification of significant genes which was done manually by observing changes in expressions of genes. In order to observe the periodic nature in microarray gene expressions, average periodigram is used in [16]. It is tool used in time series analysis for observing peaks in time domain. Upon noticing peaks, their significance is validated using g-test of Fisher [17] followed by calculation of p-value for each test and gene which reject null hypothesis is considered as DEG. But for the application of g-test minimum 40 measurements per gene are desirable and as p-value for each gene is calculated, the method is time consuming. For biological processes which are periodic in nature, Fast Fourier Transform based method is proposed in [18] whose performance is affected by small time series experiments and missing values. By observing the behavior of genes in microarray, author Ping Ma in [19] proposed a technique in which the time series gene chip is classified into two classes of genes. Genes which do not interact with time are kept in PDE (Parallel Differentially Expressing) class and those with time interaction are deposited in NPDE (Non-Parallel Differentially Expressing) class. After classification a functional ANOVA mixed-effect model is applied for identifying NPDE genes. For some cases of datasets, the method had identified NPDE genes as PDE. As above methods are examining individual gene, time required for the analysis is more. To overcome this, a new Functional Principal Component (FPC) approach is developed in [20] which observe the temporal trajectories of gene expressions which are modeled using few basic functions which require lesser number of parameters as compared to earlier methods. This method is suffering with increase in false positive rate. Another method based on Fourier transform is proposed in [21]. In this method genes which are not differentially expressed are filtered out on the basis of Fourier coefficients. As number of genes gets reduced model-based clustering is applied on them. But disadvantage of this method is that for each gene it is needed to calculate Fourier coefficient. Another approach is proposed for identifying DEGs in non-replicated time-course data by inculcating FPCA into hypothesis testing framework [22]. Other method includes a geometrical approach for identification of DEGs in which co-relation between genes is considered. It is multivariate
  • 4. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 approach. It reframes linear classification method to identify DEGs by defining a separate hyper- plane, the orientation of which decides DEGs. Comparative analysis of statistical methods is done in [23] along with various tools and packages used for identifying DEGs mentioned in it. It is observed that none of the statistical method is the best. The choice of method depends on the type of dataset selected for the experimentation. In Fold change method a log ratio of expressions under two conditions is taken. Its drawback is that statistical variance is not considered in [24]. FC method is subject to bias if the data have not been properly normalized. The danger of false positive and false negative is illustrated in [25] by Tanaka after strictly considering only FC. The shortcomings of FC are removed by t-test by considering variance. Two sample t-statistics is calculated as follows. Where s is sample variance and n1 and n2 are number of observations in each condition. But small sample size is the major obstacle in it[26]. Again it is observed that variance is not same within each group. Therefore Welch proposed t-test for unequal variance [27] by correcting degree of freedom for unequal variance as follows. Based on the limitation of small sample size, Wilcoxon rank sum test has used as alternative method for testing differential expressions Many of these methods are based on hypothesis testing. Author in paper [32] applied hypothesis test on each gene to determine as to whether it falls under the category of significant or non-significant gene by observing its variance in population average versus time curve. Apart from this Angelini [33] have used Bayesian approach to identify highly expressed genes. Selection of method depends on nature of gene expressions i.e. Static and Dynamic. In static type, gene expressions are measured at single time point for multiple subjects whereas in dynamic type, multiple time points are used for capturing gene expressions. While searching for the significant genes, it is quite possible that redundant genes get extracted from the dataset. To avoid this PSO based approach is proposed in[28] which will identify non- redundant disease related genes. In this paper, multi-objective function is used which will optimize multiple goals i.e minimization of sensitivity and minimization of specificity. Redundancy is avoided in this approach by selecting the solution which gives mutually maximally dissimilar genes. As size of solution is (n+n*s), where n is number of genes and s is number of samples, time required is more and author has just given 2 or 3 biomarker genes for different cancer related datasets. As replication of time series microarray expressions is costly and it is not possible to fit irregularly varying gene expressions into predefined models, a general approach is proposed in[30]. It is based on Partial Energy ratio for Microarray (PEM). PEM statistic is incorporated into the permutation based SAM framework for significance analysis. This method suffers with low samples as signal smoothing is not possible for dataset consisting of fewer samples. Again a general method is proposed to overcome the drawback of modeling methods in [30]. This method
  • 5. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 uses cubic splines which are set of cubic polynomials used for fitting time series and noisy data and each gene is represented by spline but noise which makes the process computationally intensive. A Ranking and Combination based method is described in [34] which rank genes based on feature such as variance, deviation, correlation and probability. After sorting genes rank-wise, combination is done. Thus we have reviewed many methods for replicated and non-replicated datasets. Following table will show results of those methods in terms of count of significantly identified genes in particular dataset on which that method is applied. Method Dataset Actual count of Count genes significant genes t-test [35] Prostate cancer 27575 9985 SAM [5 ] Human lymphoidblastoid cell 6800 180 lines ANOVA [2] Human Liver and skeletal - 1274(male) smaples(male) 1886(female) Placenta(female) RM-ANOVA[3] Breast cancer 1900 55 Empirical Bayes [6] Lever cancer 6810 127 Wilcoxon(RST) [13] Lung tumor 91 PEM [PEM] Yeast cell cycle 800 104 PSO-based [28] Prostate cancer 12533 6 Diffuse B-cell lymphoma 7070 1 ALL (child-ALL) 8280 2 CML 12625 5 Fourier Transform Yeast cell cycle 4489 2227 [29] FPCA [20] C. Elegance 2430 1982 B-spline[14] Estrous cycle 21044 871
  • 6. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 CONCLUSION AND FUTURE SCOPE Thus we have reviewed statistical as well as non-statistical methods used for finding DEGs. Identification of differentially expressing genes is the main area of research in Bioinformatics. Though many methods exist in the literature but none of them is the best. Model-based methods are not suitable for any type of datasets because it is not feasible to fit stochastic gene expressions into those models. In these model-based methods computational cost increases as number of genes increases because number of mathematical expressions increases with respect to count of genes. Also many statistical methods are not discussing about missing values before finding DEGs. Curve based methods like b-splines fit individual gene expressions into curve. This results in increasing computation overhead and time requirement. Many tools have been developed based on statistical methods. These include SAM, ANOVA, t- test etc. Since genes expressions do not have specific pattern, a generalized algorithm is needed which will find significant genes and one has to explore the comparison between statistical and non-statistical method by considering same dataset. Also methods for which same genes are observed as significant will be compared on other factors like time and computational cost. REFERENCES [1] Tusher V.G., Tibshirani R., and Chu G, “Significance Analysis of Microarrays Applied to the Ionizing Radiation Response,” Proceeding National Academy of Sciences USA, vol. 98, 2001 [2] Kerr,M.K., Martin,M. and Churchill,G.A. “Analysis of variance for gene expression microarray data”, Journal of Computational. Biology., 7,2000. [3] Ola EiBakry,M.Omair Ahmad and M.N.S. Swamy, “Identification of Differentially Expressed Genes for Time-Course Microarray Data Based on Modified RM ANOVA”, IEEE/ACM Transactions on Computational Biology and Bioinformatics,vol.9, 2012 [4] Thomas JG, Olson JM, Tapscott SJ, Zhao LP: An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Research vol.11,2001 [5] Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA, vol.98, 2001 [6] Efron B, Tibshirani R, Gross V, Tusher VG: Empirical Bayes analysis of a microarray experiment. Journal of American Statistic Association, vol. 96,2001 [7] Lee ML, Kuo FC,Whitemore GA and Sklar J. “Importance of replication in microarray studies: statistical methods and evidence from repetitive cDNA hybridization”, Proceeding National Academic Science,USA, vol.97,2000 [8] Qin LX, Kerr KF: Empirical evaluation of data transformations and ranking statistics for microarray analysis. Nucleic Acids Res, vol.32, 2004 [9] Sioson AA, Mane SP, Li P, Sha W, Heath LS, Bohnert HJ, Grene R: The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison. BMC Bioinformatics vol.7,2006
  • 7. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 [10] Carl Murie,Owen Woody,Anna Y Lee and Robert Nadon, “Comparison of small n statistical tests of differential expression applied to microarryas”. BMC Bioinformatics,vol.10,2009 [11] Luis Ospina and Liliana Lopez-Kleine , “Indentification of differentially expressed genes in microarray data in a principal component space”, SpringerPlus, vol.2,2013 [12] Hisham Al-Mubaid and Noushin Ghaffari, “Identifying the Most Significant genes from Gene expression Profiles for Sample Classification”, Proceeding ,IEEE conference on Granular Computation,2006 [13] Olga G. Toyanskaya, Mitchell E. Garber, Patrick O. Brown, David Botstein and Russ B. Altman, “Nonparametric methods for identifying differentially expressed genes in microarray”, Bioinofrmatics, vol.18,2002 [14] Stephen C Billupus, Margaret C Neville, Michael Rudolph, Weston Porter and Pepper Schedin, “Identifying significant temporal variation in time curse microarray data without replicated”, BMC Bioinformatics, vol.10,2009 [15] Ujjwal Maulik, Anirban Mukhopadhyay,Sangmitra Bandopadhyay, “Combining Pareto- optimal clustering using supervised learning for identifying co-expressed genes”, BMC Bioinformatics, vol.10,2009 [16] Sofia Wichert, Konstantinos Fokianos and Korbinian Strimmer, “Indentifying periodically expressed transcripts in microarray time series data”, Bioinformatics, vol.20,2004 [17] Fisher R.A. “Tests of significance in harmonic analysis”, Proceeding Royal Society Publishing, vol.125,1929 [18] Jerry Chen, and Paul Paolini, “Fourier Analysis of Time Course Microarray data and its Relevance to Gene Expressions Dynamics”, Proceeding ACSESS , 2008 [19]Ping Ma, Wenxuan Zhong, Jun S. Liu, “Identifying Differentially Expressed Genes in time Course Microarray Data”, Statistic in Bioscience, vol,1. 2009 [20] Chen Kun, Wang, Jane-Ling, “Identifying Differentially Expressed Genes for Time-course Microarray Data through Functional Data Analysis”, Statistic in biosciences,vol.2,2010 [21] Jaehee Kim, Robert Todd Ogden and Haseong Kim, “ A method to identify differential expression profiles of time-course gene data with Fourier transform”, BMC Bioinformatics, col.14,2013 [22] Shuang Wu, Hulin Wu, “More powerful significant testing for time course gene expression data using functional principal component analysis approaches”, BMC Bioinformatics, vol.14, 2013 [23] J. Sreekumar and K.K. Jose, “Statistical tests for identification of differentially expressed genes in cDNA microarray experiments”, Indian Journal of Biotechnology, vol.7,no.10,2008 [24] Biju J., Anuparna S and Govindswami K, “Microarray -chipping in genomics”, Indian Journal of Biotechnology ,vol.1,2002 [25] Tanaka T.S.,Jaradat S.A., Lim M.K., Kargul G.J.,Wang X., “Genome-wide expression
  • 8. International Journal on Bioinformatics & Biosciences (IJBB) Vol.6, No.1, March 2016 profiling and mid-gestation placenta and embriyo using a 15000 mouse development cDNA microarray”, Proceeding National Academy of science USA, vol.97, 2002 [26] Devore J. And Peck R. “Satistics: The exploration and analysis of data” 3rd edition, Duxury Press, Pacific Grove,CA,1997 [27] Welch B.I., “The significance of the difference between two means when population means are unequal”,Biometrika, vol.29,1938 [28] Anirban Mukhopadhyay and Monalisa Mandal, “Identifying Non-Redundant Gene Markers from Microarray Data: A Multiobjective variable Length PSO-Based Approach”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.11, 2014 [29] Jaehee Kim, Robert Todd Ogden and Haseong Kim, “A method to identify differential expression profiles of time-course gene data with Fourier transform”, BMC Bioinformatics, vol.14,no.310,2013 [30] Xu Han, “ PEM: A General Statistical Approach for Identifying Differentially Expressed Genes in the Time-Course cDNA Microarray Experiment Withour Replicate”, Journal of Bioinformatics and Computational Biology,2006 [31] Ziv Bar-Joseph, Georg Gerber, Itamar Simon, David K. Gifford and Tommi Jaakkola, “Comparing the Continuous Representation of time-seriess expressions profiles to identify Differentially expressed genes”, PNAS, vol.100,no.18,2003 [32] J.D. Storey,W.Xiao, J.T. Leek, R.G.Tompkins, and R.W. Davis, “Significance Analysis of Time Course Microarray Experiments”, Proceeding National Academy of science USA, vol. 102,2005 [33]C. Angelini, D. De Canditiis, M. Mutarelli and M. Pensky, “A Bayesian approach to estimation and testing in time-course microarray Experiments”, Statistical application in Genetics and Molecular Biology,vol.6,2007 [34] Han-Yu Chuang, Hongfang Liu, Stuart Brown, Cameron McMunn-Coffran, “Identifying Significant Genes from Microarray Data”, Fourth IEEE Symposium on Bioinformatics and Bioengineering, 2004 [35] Khalid Raza and Rajni Jaiswal, “Reconstruction and Analysis of Cancer-specific Gene Regulatory Networks from Gene Expression Profiles”, vol.3,2013