METHODS OF MICROARRAY
DATA ANALYSIS III
Papers from CAMDA ‘02
edited by
Kimberly F. Johnson
Cancer Center Information Systems
Duke University Medical Center
Durham, NC
Simon M. Lin
Duke Bioinformatics Shared Resource
Duke University Medical Center
Durham, NC
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48354-8
Print ISBN: 1-4020-7582-0
Print ©2003 Kluwer Academic Publishers
All rights reserved
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic,
mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Boston
©2004 Springer Science + Business Media, Inc.
Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
Contents
Contributing Authors
Preface
Introduction
SECTION I TUTORIALS
THE BIOLOGY BEHIND GENE EXPRESSION: A BASIC TUTORIAL
MICHAEL F. OCHS AND ERICA A. GOLEMIS
MONITORING THE QUALITY OF MICROARRAY EXPERIMENTS
KEVIN R. COOMBES, JING WANG, LYNNE V. ABRUZZO
OUTLIERS IN MICROARRAY DATA ANALYSIS
RONALD K. PEARSON, GREGORY E. GONYE, AND JAMES S. SCHWABER
SECTION II BEST PRESENTATION AWARD
ORGAN-SPECIFIC DIFFERENCES IN GENE EXPRESSION AND
UNIGENE ANNOTATIONS DESCRIBING SOURCE MATERIAL
DAVID N. STIVERS, JING WANG, GARY L. ROSNER, AND KEVIN R. COOMBES
SECTION III ANALYZING IMAGES
CHARACTERIZATION, MODELING, AND SIMULATION OF MOUSE
MICROARRAY DATA
DAVID S. LALUSH
TOPOLOGICAL ADJUSTMENTS TO GENECHIP EXPRESSION VALUES
ANDREY PTITSYN
SECTION IV NORMALIZING RAW DATA
COMPARISON OF NORMALIZATION METHODS FOR CDNA
MICROARRAYS
LILING WARREN, BEN LIU
SECTION V CHARACTERIZING TECHNICAL AND BIOLOGICAL
VARIANCE
SIMULTANEOUS ASSESSMENT OF TRANSCRIPTOMIC VARIABILITY
AND TISSUE EFFECTS IN THE NORMAL MOUSE
SHIBING DENG, TZU-MING CHU, AND RUSS WOLFINGER
HOW MANY MICE AND HOW MANY ARRAYS? REPLICATION IN
MOUSE CDNA MICROARRAY EXPERIMENTS
XIANGQIN CUI AND GARY A. CHURCHILL
BAYESIAN CHARACTERIZATION OF NATURAL VARIATION IN GENE
EXPRESSION
MADHUCHHANDA BHATTACHARJEE, COLIN PRITCHARD, MIKKO J.
SILLANPÄÄ AND ELJA ARJAS
SECTION VI INVESTIGATING CROSS HYBRIDIZATION ON
OLIGONUCLEOTIDE MICROARRAYS
QUANTIFICATION OF CROSS HYBRIDIZATION ON
OLIGONUCLEOTIDE MICROARRAYS
LI ZHANG, KEVIN R. COOMBES, LIANCHUN XIAO
ASSESSING THE POTENTIAL EFFECT OF CROSS-HYBRIDIZATION ON
OLIGONUCLEOTIDE MICROARRAYS
SEMAN KACHALO, ZAREMA ARBIEVA AND JIE LIANG
WHO ARE THOSE STRANGERS IN THE LATIN SQUARE?
WEN-PING HSIEH, TZU-MING CHU, AND RUSS WOLFINGER
SECTION VII FINDING PATTERNS AND SEEKING BIOLOGICAL
EXPLANATIONS
BAYESIAN DECOMPOSITION CLASSIFICATION OF THE PROJECT
NORMAL DATA SET
T. D. MOLOSHOK, D. DATTA, A. V. KOSSENKOV, M. F. OCHS
THE USE OF GO TERMS TO UNDERSTAND THE BIOLOGICAL
SIGNIFICANCE OF MICROARRAY DIFFERENTIAL GENE EXPRESSION
DATA
RAMÓN DÍAZ-URIARTE, FÁTIMA AL-SHAHROUR, AND JOAQUÍN DOPAZO
Acknowledgments
Index
Contributing Authors
Abruzzo, Lynne V., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Al-Shahrour, Fátima, Centro Nacional de Investigaciones Oncológicas,
(CNIO), (Spanish National Cancer Centre), Madrid, Spain
Arbieva, Zarema, University of Illinois at Chicago, Chicago, IL
Arjas, Elja, Rolf Nevanlinna Institute, University of Helsinki, Finland
Bhattacharjee, Madhuchhanda, Rolf Nevanlinna Institute, University of
Helsinki, Finland
Chu, Tzu-Ming, SAS Institute, Cary, NC
Churchill, Gary A., The Jackson Laboratory, Bar Harbor, Maine
Coombes, Kevin R., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Cui, Xiangqin, The Jackson Laboratory, Bar Harbor, Maine
Datta, D., Fox Chase Cancer Center, Philadelphia, PA
Deng, Shibing, SAS Institute, Cary, NC
Díaz-Uriarte, Ramón, Centro Nacional de Investigaciones Oncológicas,
(CNIO), (Spanish National Cancer Centre), Madrid, Spain
Dopazo, Joaquín, Centro Nacional de Investigaciones Oncológicas, (CNIO),
(Spanish National Cancer Centre), Madrid, Spain
Golemis, Erica A., Fox Chase Cancer Center, Philadelphia, PA
Gonye, Gregory E., Thomas Jefferson University, Philadelphia, PA
Hsieh, Wen-Ping, North Carolina State University, Raleigh, NC
Kachalo, Seman, University of Illinois at Chicago, Chicago, IL
Kossenkov, A. V., Fox Chase Cancer Center, Philadelphia, PA and Moscow
Physical Engineering Institute, Moscow, Russian Federation
Liang, Jie, University of Illinois at Chicago, Chicago, IL
Liu, Ben, Bio-informatics Group Inc., Cary, NC
Moloshok, T. D., Fox Chase Cancer Center, Philadelphia, PA
Ochs, Michael F., Fox Chase Cancer Center, Philadelphia, PA
Pearson, Ronald K., Thomas Jefferson University, Philadelphia, PA
Pritchard, Colin, Fred Hutchinson Cancer Research Center, Seattle, WA
Ptitsyn, Andrey, Pennington Biomedical Research Center
Rosner, Gary L., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Schwaber, James S., Thomas Jefferson University, Philadelphia, PA
Sillanpää, Mikko J., Rolf Nevanlinna Institute, University of Helsinki,
Finland
Stivers, David N., University of Texas M.D. Anderson Cancer Center,
Houston, TX
Wang, Jing, University of Texas M.D. Anderson Cancer Center, Houston,
TX
Warren, Liling, Bio-informatics Group Inc., Cary, NC
Wolfinger, Russ, SAS Institute, Cary, NC
Xiao, Lianchun, The University of Texas MD Anderson Cancer Center,
Houston, TX
Zhang, Li, The University of Texas MD Anderson Cancer Center, Houston,
TX
Preface
As microarray technology has matured, data analysis methods have
advanced as well. However, microarray results can vary widely from lab to
lab as well as from chip to chip, with many opportunities for errors along the
path from sample to data. The third CAMDA conference, held in November
2002, highlighted the increasing need for data quality assurance
mechanisms through real-world problems with the CAMDA datasets. Thus,
the third volume of Methods of Microarray Data Analysis emphasizes many
aspects of data quality assurance.
We highlight three tutorial papers to assist with a basic understanding of
underlying principles in microarray data analysis, and add twelve papers
presented at the conference. As editors, we have not comprehensively edited
these papers, but have provided comments to the authors to encourage clarity
and expansion of ideas. Each paper was peer-reviewed and returned to the
author for further revision.
We do not propose these methods as the de facto standard for microarray
analysis. Rather, we present them as starting points for discussion to
further the science of microarray data analysis. The CAMDA conference
continues to bring to light problems, solutions and new ideas to this arena
and offers a forum for continued advancement of the art and science of
microarray data analysis.
Kimberly Johnson
Simon Lin
Introduction
A comparative study of analytical methodologies using a standard data
set has proven fruitful in microarray analysis. To provide a forum for these
comparisons the third Critical Assessment of Microarray Data Analysis
(CAMDA) conference was held in November, 2002. Over 170 researchers
from eleven countries heard twelve presentations on topics such as data
quality analysis, image analysis, data normalization, expression variance,
cross hybridization and pattern searching. The conference has evolved in its
third year, just as the science of microarrays has developed. While initial
microarray data analysis techniques focused on classification exercises
(CAMDA ’00), and later on pattern extraction (CAMDA ’01), this year’s
conference, by necessity, focused on data quality issues. This shift in focus
follows the maturation of microarray technology as the detection of data
quality problems has become a prerequisite for data analysis. Problems such
as background noise determination, faulty fabrication processes, and, in our
case, errors in data handling, were highlighted at the conference.
The CAMDA ‘02 conference provided a real-world lesson on data
quality control and saw significant development of the cross-hybridization
models. In this volume, we present three tutorial chapters and twelve paper
presentations. First, Michael Ochs and Erica Golemis present a tutorial
called “The Biology Behind Gene Expression.” This discussion is for non-
biologists who want to know more about an intelligent machine called the
cell. This machinery is extremely complex; a glossary in this tutorial
provides the novice with an overview of important terms related to
microarrays, while the rest of the paper details the biological processes that
impact microarray analysis. Next is a tutorial on methods of data quality
control by Kevin Coombes. We invited Dr. Coombes to submit this tutorial,
titled “Monitoring the Quality of Microarray Experiments,” as an expansion
of his presentation at the conference, which is also prominently featured.
The last tutorial, by Ronald Pearson, is titled “Outliers in Microarray Data
Analysis.” This tutorial addresses the issue of quality control by identifying
outliers and suggesting methods to deal with technical and biological
variations in microarray data.
As always, we are happy to highlight the paper voted by attendees as the
Best Presentation. This year, the award went to:
David N. Stivers, Jing Wang, Gary L. Rosner, and Kevin R. Coombes
University of Texas M.D. Anderson Cancer Center, Houston, TX
“Organ-specific Differences in Gene Expression and UniGene
Annotations Describing Source Material”
for their rigorous scrutiny of data quality before starting data analysis.
Presented by Kevin Coombes, their paper not only revealed the existence of
errors in the Project Normal data set, but also specified the exact nature of
the problems and included the methods used to detect these problems. See
below for more details on these data set errors.
CAMDA 2002 Data Sets
The scientific committee chose two data sets for CAMDA ‘02. The first,
called Project Normal, came from the Fred Hutchinson Cancer Center and
showed the variation of baseline gene expression levels in the liver, kidney
and testis of six normal mice. By using a 5406-clone spotted cDNA
microarray, Pritchard et al. concluded that replications are necessary in
microarray experiments. The second data set came from the Latin Square
Study at Affymetrix Inc. This benchmark data set was created to develop
statistical algorithms for microarrays. Sets of fourteen genes with known
concentrations were spiked into a complex background solution and
hybridized on Affymetrix chips. Data were obtained with replicates, and both
human and E. coli chips were studied.
As mentioned above, there were errors in the Project Normal dataset that
were undetected until CAMDA abstracts were submitted. Once we received
the Stivers et al. abstract, we asked the original Project Normal authors to
confirm their findings. The errors in the data set were verified and after
much discussion among the Scientific Committee members, a decision was
made to keep the contest going to allow the Stivers group to report and
discuss their finding of data abnormalities at the conference. Actually, many
groups revealed various aspects of the data abnormalities, but the Stivers
group not only realized that a problem existed, but also identified the
specific problem. Colin Pritchard, representing Project Normal, confirmed
that indeed, the problems in the dataset were a result of incorrectly merging
the data with the annotations, resulting in mismatched row/column
combinations. In addition, several slides had a small number of misaligned
grids. These problems affected about one third of the genes (though different sets)
in the testis and liver data. Pritchard also noted that a re-analysis of the data
with the corrected data sets showed that the results were not notably
different from the original conclusions. For the record, both the original and
the corrected data sets are available at the CAMDA conference website for
researchers who might be interested in “data forensics”. We extend our
thanks to Colin Pritchard, Li Hsu, and Peter Nelson at the Fred Hutchinson
Cancer Center for their assistance and professionalism in handling this
discovery and allowing the conference to proceed as planned. They were
most gracious in their contributions to the conference.
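The class of error described above, in which measurements are paired with annotations by row position rather than by a shared identifier, can be illustrated with a minimal sketch. The clone identifiers, gene names, and values below are invented for illustration and are not taken from the Project Normal data; the function names are likewise hypothetical.

```python
# Hypothetical records: each row carries its own clone identifier.
measurements = [
    ("clone_003", 1.8),
    ("clone_001", 0.2),
    ("clone_002", -0.7),
]
annotations = [
    ("clone_001", "gene_A"),
    ("clone_002", "gene_B"),
    ("clone_003", "gene_C"),
]


def merge_by_position(meas, annot):
    """Pair rows by index. Silently wrong if the two files are ordered differently."""
    return [(gene, value) for (_, value), (_, gene) in zip(meas, annot)]


def merge_by_key(meas, annot):
    """Pair rows by clone identifier. Raises KeyError if an identifier is missing."""
    lookup = dict(annot)
    return [(lookup[clone_id], value) for clone_id, value in meas]


def count_position_mismatches(meas, annot):
    """Count rows whose identifiers disagree when the files are read side by side."""
    return sum(m[0] != a[0] for m, a in zip(meas, annot))


print(merge_by_position(measurements, annotations))
print(merge_by_key(measurements, annotations))
print(count_position_mismatches(measurements, annotations))
```

A check such as `count_position_mismatches` costs little to run before any analysis begins, and a nonzero count is a signal to join on the identifier instead of the row index.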
Organization of this Volume
After the three tutorial papers, the first conference paper is naturally
the one voted as Best Presentation. We then divide the book into
subject areas covering image analysis, data normalization, variance
characterization, cross hybridization, and finally pattern searching. At the
end of this introduction, you will also find a link to the web companion to
this volume.
Analyzing Images
Raw microarray data first exists as a scanned image file. Differences in
spot size, non-uniformity of spots, heterogeneous backgrounds, dust and
scratches all contribute to variations at the image level. In Chapter 5, David
Lalush characterizes such parameters and discusses ways to simulate
additional microarray images for use in developing image analysis
algorithms.
On the Affymetrix platform, hybridization operators have observed that
the images tend to form some kind of mysterious pattern. In Chapter 6,
Andrey Ptitsyn argues that there indeed is a background pattern. He further
postulates that the pattern might be caused by the fluid dynamics in the
hybridization chamber.
Normalizing Raw Data
Normalization has been recognized as a crucial step in data
pre-processing. Do mathematical operations truly allow us to remove the
systematic variation that might skew our analysis, or are we distorting the
data to create illusions? The paper by Liling Warren and Ben Liu
investigates seven different ways to normalize microarray data. The results
show that normalization has a greater impact than expected on detecting
differential expression: the same downstream detection method can report
anywhere from 23 to 451 genes, depending on the pre-processing of the data. Suggestions
to guide researchers in the normalization process are provided.
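To make concrete how a pre-processing choice can change the number of genes reported, the following minimal sketch applies global median centering to log ratios from a two-channel array. Median centering is used here only because it is simple; it is not necessarily one of the seven methods compared by Warren and Liu. The intensities and the two-fold threshold are invented for illustration.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Per-spot log2(red / green) for a two-channel array."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values):
    """Global normalization: shift all log ratios so their median is zero."""
    m = median(values)
    return [v - m for v in values]


def count_changed(values, threshold=1.0):
    """Number of spots whose absolute log2 ratio meets the threshold (1.0 = two-fold)."""
    return sum(abs(v) >= threshold for v in values)


# Invented intensities in which the red channel is systematically brighter.
red = [200.0, 400.0, 800.0, 1600.0, 400.0]
green = [100.0, 100.0, 200.0, 400.0, 100.0]

raw = log_ratios(red, green)
centered = median_center(raw)

print(count_changed(raw), count_changed(centered))
```

In this toy example every spot passes the two-fold threshold before centering and only one does afterwards, because most of the apparent change was a channel-wide offset.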
Characterizing Technical and Biological Variance
The Project Normal paper [PNAS 98:13266-77, 2001] showed us that
even for animals under ‘normal’ conditions, gene expression levels
fluctuate from one animal to another. This biological variation adds to the
total variation we observe on the microarray, which can also
include technical variation produced during the measurement process. Deng
et al. describe two linear mixed models to assess variability and
significance in Chapter 8. Using a similar mixed model approach, Cui et al.
calculate the number of replicates necessary to detect a given change. This
is of great interest to experimental biologists. Usually, resources are
limited, either by the total number of microarrays we can afford
or by the number of cells we can obtain. The optimal resource
allocation formula by Cui et al. lets us answer questions such as: Should we
use more mice or more arrays? Should we pool mice? Chapter 9 provides
some answers to these questions.
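The trade-off between mice and arrays can be sketched with the textbook variance formula for a nested design, in which each of m mice is measured on n arrays: the variance of the estimated mean is the biological variance divided by m plus the technical variance divided by m times n. This is a generic illustration under assumed variance components, not the resource allocation formula derived by Cui and Churchill, and the numbers are invented.

```python
def variance_of_mean(bio_var, tech_var, n_mice, arrays_per_mouse):
    """Variance of the mean expression estimate in a simple nested design.

    bio_var  : assumed mouse-to-mouse (biological) variance
    tech_var : assumed array-to-array (technical) variance
    """
    if n_mice < 1 or arrays_per_mouse < 1:
        raise ValueError("need at least one mouse and one array per mouse")
    return bio_var / n_mice + tech_var / (n_mice * arrays_per_mouse)


# Two ways to spend the same budget of eight arrays (invented variances).
more_arrays = variance_of_mean(0.04, 0.02, n_mice=4, arrays_per_mouse=2)
more_mice = variance_of_mean(0.04, 0.02, n_mice=8, arrays_per_mouse=1)

print(more_arrays, more_mice)
```

Under these assumed values, spreading the eight arrays over eight mice gives the smaller variance, because technical replication does nothing to average out the biological component.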
Estimating location and scale from experimental measurements has been
one of the major themes in statistics. Most of the previous work on
microarrays focused on the classification of the expression changes under
different conditions. Bhattacharjee et al. investigated the classification of
intrinsic biological variance of gene expression. By using a Bayesian
framework, the authors support the hypothesis that some genes by nature
exhibit highly varied expression. This work is featured in Chapter 10.
Investigating Cross Hybridization on Oligonucleotide
Microarrays
Quantitative binding of genes on the chip surface is a fundamental issue
of microarrays [Nature Biotechnology 17:788-792, 1999]. Characterizing
specific versus non-specific binding on the chip surface has been important
yet understudied. The Affymetrix Latin Square data set provides an
excellent opportunity for such studies. All three papers in the next section of
this volume address this issue. In 2002, Zhang, Miles, and Aldape
developed a free energy model of binding on microarrays [Nature
Biotechnology 2003, In Press]. This mechanistic model is a major
development since the Li-Wong statistical model [PNAS 98:31-36, 2001] for
hybridization. At CAMDA ‘02, Zhang et al. demonstrated an application of
this model to identify spurious cross-hybridization signals. Their work is
highlighted in Chapter 11. Kachalo et al. suggest in Chapter 12 that a
match of 7 to 8 nucleotides could potentially contribute to non-specific
binding. Knowledge of this non-specific binding not only provides clues to the
interpretation of hybridization results but also assists in the future design of oligos. The same
cross-hybridization issue also caught the attention of statisticians. Hsieh et
al. exposed cross-hybridization problems by studying outliers in the data set.
In this case, the cause of cross-hybridization was found to be fragments
matching the spiked-in genes. In summary, these three papers expound on
potential cross-hybridization problems on microarrays and suggest some
solutions.
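The idea that a short shared stretch of nucleotides may be enough to cause non-specific binding can be illustrated by listing the k-mers that a probe has in common with a non-target sequence. This is a minimal sketch, not the method of any of the three papers; the sequences are invented, and both are assumed to be written in the same sense so that a plain substring comparison applies.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(probe, other, k=8):
    """Sorted list of k-mers present in both sequences."""
    return sorted(kmers(probe.upper(), k) & kmers(other.upper(), k))


# Invented 25-mer probe and an invented non-target fragment.
probe = "ACGTTGCAAGGCTTAGCCATGGATC"
non_target = "TTTTGCAAGGCTAAAA"

print(shared_kmers(probe, non_target, k=8))
```

A screen of this kind scales poorly if every probe is compared with every transcript pairwise; indexing the k-mers of all transcripts once in a dictionary is the usual remedy.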
Finding Patterns and Seeking Biological Explanations
The final section of this volume focuses on utilizing the Gene Ontology
(GO) as an explanatory tool, though the final two papers differ in how to
group the genes. Moloshok et al. modified their Bayesian decomposition
algorithm to identify the patterns in gene expression and to specify which
gene belongs to which pattern. The Bayesian framework not only allows
encoding prior information in a probabilistic way, but also naturally allows
genes to be assigned to multiple classes. In the final chapter, Díaz-Uriarte et
al. use GO to obtain biological information about genes that are
differentially expressed between organs in the Project Normal data set. The
techniques incorporate a number of statistical tests for possible identification
of altered biochemical pathways in different organs.
Summary
The CAMDA ’02 conference again brought together a diverse group of
researchers who provided many new perspectives to the study of microarray
data analysis. At previous CAMDA conferences we studied expression
patterns of yeast and cancers. The most recent CAMDA conference took a
step back to find variations in the data and possible problems. Our next
CAMDA conference (’03) will focus on data acquired at different academic
centers and the problems in combining that data.
Web Companion
Additional information for many of these chapters is available at the
CAMDA website, including links to algorithms, color versions of several
figures, and conference presentation slides. Information about
future CAMDA conferences is available at this site as well. Please check the
website regularly for the call for papers and announcements about the next
conference.
http://camda.duke.edu
SECTION I
TUTORIALS
1
THE BIOLOGY BEHIND GENE EXPRESSION: A
BASIC TUTORIAL
Michael F. Ochs* and Erica A. Golemis
Division of Basic Science, Fox Chase Cancer Center, Philadelphia, PA
Abstract: Microarrays measure the relative levels of gene expression within a set of cells
isolated through an experimental procedure. Analysis of microarray data
requires an understanding of how the mRNA measured with a microarray is
generated within a cell, how it is processed to produce a protein that carries
out the work of the cell, and how the creation of the protein relates to the
changes in the cellular machinery, and thus to the phenotype observed.
Key words: Transcription, translation, signaling pathway, post-translational modification
1. INTRODUCTION
This tutorial will introduce key concepts on cellular response and
transcriptional activation to non-biologists with the goal of providing a
context for the use and study of microarrays. The field of gene regulation,
including both transcriptional control (the regulation of the conversion of
DNA to messenger RNA) and translational control (the regulation of the
conversion of messenger RNA to protein), is immense. Although not yet
complete, our understanding of these areas is deep and this tutorial can only
touch upon the basics. A deeper understanding can be gained through
following the references given within the text, which are focused primarily
on recent reviews, or through one of the standard textbooks, such as
Molecular Biology of the Cell [Alberts et al., 2002]. It should be noted that
we are focusing on eukaryotic animal organisms within this tutorial, and not
all aspects of the processes discussed will apply to plants or prokaryotes,
such as bacteria.
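As a rough orientation for readers who think more readily in code, the two conversions named above (DNA to messenger RNA, and messenger RNA to protein) can be sketched as string operations. This is a deliberate simplification that ignores regulation, splicing, and everything else that makes the biology interesting. The input is assumed to be the coding strand written 5’ to 3’, the codon table is a small excerpt of the standard genetic code, and the example sequence is invented.

```python
# Excerpt of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(coding_dna):
    """Return the mRNA corresponding to a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon; return amino acid residues."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in the excerpted table")
        residue = CODON_TABLE[codon]
        if residue is None:  # stop codon ends the polypeptide
            break
        peptide.append(residue)
    return peptide


mrna = transcribe("CCATGTTTGGCAAATAAGG")  # invented sequence
print(mrna)
print(translate(mrna))
```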
* Author to whom correspondence should be addressed
The typical microarray experiment aims to distinguish the differences in
gene expression between different conditions. These can be between
different time points during a biological process, or between different tissue
or tumor types. While standard statistical approaches can provide an
estimate of the reliability of the observed changes in the messenger RNA
levels of genes, interpretation and understanding of the significance of these
changes for the biological system under study requires far more detailed
knowledge. Of particular importance is the issue of nonlinearity in
biological systems. For any analysis beyond the simple comparison of
measurements between two conditions, the fact that the system being studied
is a dynamic nonlinear system implies that the analysis must take into
account the interactions present in the system. This is only possible if the
system is understood at a non-superficial level.
The nomenclature surrounding the biological structures and processes
involved in transcription and translation is substantial. The following table
contains a glossary of terms that can be referenced as needed.
Table 1. A glossary of key terms used for describing cellular processes of transcription and
translation. For human genes and proteins, the convention is to use capital italics (e.g.,
BRCA1) for the gene and capitals for the protein (e.g., BRCA1). Unfortunately the
conventions are organism specific.
GLOSSARY
Transcription
DNA: deoxyribonucleic acid, a polar double-stranded molecule used for the fundamental storage of hereditary information and located in the nucleus (polarity is defined as 5’ – 3’)
GATC: guanine, adenine, thymine, and cytosine, the 4 “bases” that make up the DNA strands
gene: fundamental DNA unit for production of a protein, transcribed along a specific DNA sequence
transcription: the process by which a gene encoded in DNA is converted into single-stranded pre-mRNA
RNA: ribonucleic acid, a single stranded molecule with multiple uses within the cell
promoter: a region of DNA proximal to the 5’ end of a gene, where proteins bind to initiate transcription of a gene
enhancer: a region of DNA where binding of transcriptional regulatory proteins alters the level of transcription of a gene
transcriptional activator: a protein or protein complex that binds to a promoter or enhancer to induce transcription of a gene
transcriptional suppressor: a protein or protein complex that binds to a promoter or enhancer to block transcription of a gene
chromatin: higher-order compaction structure for DNA, based on assembly of DNA around histones in nucleosomes
chromatin remodeling: a process wherein the phasing of nucleosomes is altered, altering the access of transcription factors to the DNA
histones: proteins that provide structural elements for the creation of chromatin
coactivator: proteins that modify histones allowing chromatin remodeling
basal transcriptional apparatus: a protein complex that binds to a transcriptional initiation site and performs conversion of DNA into mRNA
mRNA: messenger RNA, a form of RNA created from DNA generally containing a 5’ untranslated sequence, a protein coding region, and a 3’ untranslated sequence
rRNA: ribosomal RNA, a form of RNA that associates in a complex with ribosomal proteins, which constitute the structural elements of the ribosome
tRNA: transfer RNA, the form of RNA that recognizes individual codons on an mRNA and brings the amino acid specified by that codon for incorporation into a polypeptide
codon: a three nucleotide sequence (RNA or DNA) that specifies a specific amino acid to insert in a protein
mRNA – associated terms
exon: a portion of genomic DNA encoding a sequence of amino acids to be incorporated in a protein (coding region)
intron: a portion of genomic DNA that does not code for a sequence of amino acids to be placed in a protein, but intervenes between exons within a gene (noncoding region)
pre-mRNA: the mRNA polymer that is initially transcribed from DNA, which contains both exons and introns
splicing: the process by which an mRNA is modified to remove introns and sometimes vary the exons included in the mRNA prior to translation
poly-adenylation: the addition of approximately 20-50 adenine residues to the end of an mRNA sequence
splice variants: different mRNAs created from the same DNA sequence through different splicing
mRNA export receptor: a protein that transports the mRNA from the nucleus to the cytoplasm
mRNA export sequence: a portion of mRNA that is recognized by an export receptor allowing the mRNA to be exported from the nucleus
miRNA: micro-RNA, a short RNA (<22 nucleotides) encoded within the genome that can bind to a specific mRNA and block its translation into a protein or alter its stability
siRNA: small interfering RNA, a short, artificial RNA designed to bind a specific mRNA and block its translation into a protein, this is now widely used for gene knockout studies
Translation
amino acid: one of 20 specific molecules used within all living creatures to construct proteins, containing a common core (N-C-COOH) and varying sidechains attached to the central carbon
amino acid residue: the term used to refer to an amino acid following insertion within a polypeptide chain
polypeptide chain: a series of linked amino acid residues produced during the process of translation
protein: a completed polypeptide chain that folds into a three dimensional structure to provide a cellular function specified by a gene
ribosome: a complex of proteins and rRNA that builds a polypeptide chain based on the codons in an mRNA
chaperone: a large molecular machine that assists a polypeptide chain in proper folding into a protein
Proteins
dimer: a small complex formed by the non-covalent association of two proteins
homodimer: a dimer that contains two identical subunits (i.e., is made from two copies of the same protein)
heterodimer: a dimer that contains two different subunits (i.e., is made from two different proteins)
kinase: a protein that adds a phosphate group to specific amino acid residues in a protein in a process called phosphorylation
phosphatase: a protein that removes a phosphate group from specific amino acid residues in a protein
scaffolding protein: a protein that binds multiple other proteins into a specific conformation, enhancing or otherwise controlling their interactions
ubiquitination: the addition of one or more copies of ubiquitin (a short peptidyl sequence) to a protein, either targeting a protein for degradation or contributing to control of protein localization
proteasome: a very large protein complex that degrades ubiquitinated proteins
Signaling
signal: an abstraction of protein modifications indicating the transmission of information within a cell
signal transduction: the use of programmed, frequently sequential changes in protein interaction, modification, and activity status to transmit information within a cell in a “signaling cascade”
signaling activator: a protein that modifies a second protein activating the function of that protein
signaling inhibitor: a protein that modifies a second protein suppressing the function of that protein
receptor: a membrane bound protein that can receive a signal of extracellular origin to activate a signaling cascade
ligand: a small protein, peptide, or hormone that binds to a cognate receptor to initiate signaling
signaling pathway: a group of proteins that provide the physical mechanism for a signaling cascade, associated with a specific biological response
junction: a point where multiple signals combine and can be integrated
node: a point where a single signal diverges and can provide input into multiple downstream points
signaling network: a group of signaling pathways linked together at junctions and nodes creating a nonlinear response system
Cellular Structures
membrane: a lipid bilayer separating compartments, such as the outer cell membrane that separates a cell from its environment
nucleus: a region separated from the rest of the cell by a membrane and containing the DNA
cytoplasm: the portion of the cell outside the nucleus and containing numerous smaller structures
ER, endoplasmic reticulum: a membrane-based cellular structure that serves multiple functions including localization of some ribosomes, and processing of membrane-associated and secreted proteins
Golgi apparatus: a cellular structure that receives proteins following passage through the ER, providing for additional protein modification and transport
cytoskeleton: the complex of multiple structural proteins (e.g., actin, tubulin) that provide structural integrity to the cell
ECM, extracellular matrix: the collection of structural proteins secreted by cells, and forming an external “mesh”, that contributes to cell shape control, survival functions, and signaling processes
2. CELLULAR RESPONSES AND SIGNALING
PATHWAYS
In order to survive, grow, and interact in complex organisms, cells must
interpret signals coming from both the external and internal environments.
Most behaviors, and all complex responses, require signaling pathways that
encode within the cell the control mechanism for response behaviors.
A simple example of such a system is shown in Figure 1. The yeast
Saccharomyces cerevisiae has developed a signaling pathway that responds
to external signals in order to initiate a mating response [Posas et al., 1998].
The external signal is detected by a transmembrane protein, the receptor
Ste2p. When a ligand specific for Ste2p produced by a cell of a different
mating type binds to this receptor, a signaling cascade is induced within the
cytoplasm of the yeast cell. The conformational change induced by the
ligand binding first leads to the cleavage and activation of a G protein, which
in turn activates Ste20p, a protein kinase. Ste20p in turn activates Ste11p,
which activates Ste7p, which activates Fus3p, which activates a transcription
factor, Ste12p. The transcription factor then initiates transcription within the
14 Ochs and Golemis
nucleus leading to the creation of messenger RNA (mRNA). Only the end
result of this cascade is measured with a microarray.
A set of proteins functionally linked in a single pathway is described as a
signaling cascade or pathway, with the process of movement of information
through a signaling cascade described as signal transduction. It is
immediately obvious that the cell is going to a great deal of trouble to
respond to a simple signal, introducing several intermediaries between signal
(ligand binding) and response (transcription). However, one reason for this
complexity is that it allows encoding of complex behavior by the creation of
signaling networks [Jordan et al., 2000]. A signaling network is created by
the intersection of multiple signaling pathways, with these intersections
generally occurring at multiple points within each separate pathway. The
result is a highly nonlinear system capable of responding in multiple ways to
a signal depending on the state of the overall network and the presence of
signals affecting different pathways.
The structure of a representative network is shown in Figure 2. This is a
highly simplified portrayal of an important pathway in human cancer
involving the cancer-inducing gene RAS (italics for a gene); see the review
by Kolch for a full discussion [Kolch, 2000]. In normal cells, RAS (no
italics for a protein) is activated by an external stimulus through the
epidermal growth factor receptor and other growth factors, and then it
interacts with numerous other proteins to determine a cellular response.
There are multiple, distinct receptors that may be activated in response to
different stimuli, and each receptor may activate multiple pathways. The
network is comprised of junctions, where multiple signals come together
(e.g., RAF and RAC in Figure 2), and nodes, where a single signal can
branch to multiple pathways (e.g., AKT and RAS in Figure 2). In addition,
for any component within a pathway, there can be both activators (often
kinases) and inhibitors (often phosphatases). For this diagram, the inhibitors
and activators are not separated. Additionally, there is generally feedback
within the network, either through proteins created as a result of the
transcriptional activators (circles in Figure 2) or directly from loops in the
network (AKT to RAF together with AKT to RAC to PAK to RAF). A
junction can therefore represent a point where the result of one pathway’s
activation is the activation of a kinase, while the result of a second
pathway’s activation is activation of a phosphatase, both targeted at the same
protein. A node may be a kinase that phosphorylates multiple proteins.
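The distinction between junctions and nodes can be made concrete by treating the network as a directed graph. The following is a minimal sketch, not a reconstruction of Figure 2: only the AKT to RAF, AKT to RAC, RAC to PAK, and PAK to RAF links are taken from the text, while the EGFR and RAS edges and the function name `classify` are assumptions added for illustration. A protein with more than one input is reported as a junction, and a protein with more than one output as a node.

```python
from collections import defaultdict

# Directed edges (upstream, downstream). AKT->RAF, AKT->RAC, RAC->PAK and
# PAK->RAF come from the text; every other edge is an assumption added
# only so that the toy network is connected.
EDGES = [
    ("EGFR", "RAS"),
    ("RAS", "RAF"),
    ("RAS", "AKT"),
    ("RAS", "RAC"),
    ("AKT", "RAF"),
    ("AKT", "RAC"),
    ("RAC", "PAK"),
    ("PAK", "RAF"),
]


def classify(edges):
    """Return (junctions, nodes) for a list of directed edges.

    A junction receives signals from more than one upstream protein;
    a node sends signals to more than one downstream protein.
    """
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for src, dst in edges:
        outputs[src].add(dst)
        inputs[dst].add(src)
    proteins = {p for edge in edges for p in edge}
    junctions = sorted(p for p in proteins if len(inputs[p]) > 1)
    nodes = sorted(p for p in proteins if len(outputs[p]) > 1)
    return junctions, nodes


junctions, nodes = classify(EDGES)
print("junctions:", junctions)
print("nodes:", nodes)
```

With this assumed edge list, RAF and RAC come out as junctions and AKT and RAS as nodes, matching the examples named in the text.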
In order for kinases and phosphatases to operate, they must be in close
proximity to the target protein. This adds an additional degree of
complexity, as cellular localization becomes a key issue. While some
localization signals are encoded by the signaling proteins themselves, a set
of proteins, known as scaffolding proteins (e.g., Ste5 in Figure 1), has
evolved to keep the necessary components of a signaling pathway together
[Tzivion et al., 2001]. Scaffolding proteins may not always be absolutely
necessary for signal transduction; however, they increase the efficiency of
signaling. And, as noted above, a signaling network is a complex nonlinear
system, so that signal strength can play a major role in the outcome of signal
initiation.
As multiple signals arrive at junctions, the response of that junction will
depend on the relative strength of various signals and the present overall
state of the cell (reflected as levels of kinase and phosphatase activity in the
local environment, protein interaction profiles, etc.). Each junction then
transduces a signal, until a physiological response occurs. This response can
include changes in cellular movement, intracellular transport or metabolism,
protein state (through degradation or modification), or transcriptional and
translational responses. While transcriptional activation is of interest here, it
is important to note not only that it is only one of a myriad of potential
changes occurring after activation of signaling, but also that transcription is
generally a late, downstream indicator of activity.
3. TRANSCRIPTION
Since gene expression is a downstream indicator of activity, it is
important to understand the basics of how the mammalian cell converts its
3,000,000,000 bases of DNA into usable, small messenger RNA (mRNA).
Within cells, DNA is organized into chromatin. The double-stranded DNA
is wound into coils around protein complexes, composed of proteins called
histones. While it has long been understood that chromatin must be
remodeled (i.e., unwound and rewound in a different pattern) as part of
initiating transcription, it has only lately become clear that the histones play
a larger role in regulation of expression [Berger, 2002]. There are numerous
proteins that modify histone structure, through mechanisms including
acetylation, methylation, phosphorylation, and ubiquitination of the histone
proteins. The proteins that modify histones, called coactivators, are
generally recruited to the DNA by transcriptional activators, which bind the
DNA directly [Featherstone, 2002]. Transcriptional activators bind the
DNA at specific sequences, called promoters and enhancers. Promoters lie
upstream of the gene, usually within a few kilobases of the start site, while
enhancers can occur both upstream and downstream from the gene and can
lie farther from the gene itself.
The overall machinery governing transcription is still an area of very
active study. The signal to transcribe a gene is first transduced to the
nucleus through a signaling pathway activating a transcription factor. This
can be done directly by proteins such as nuclear receptors, which are capable
of directly entering the nucleus and initiating transcription [Dilworth et al.,
2001] following their activation in the cytoplasm, or indirectly by
propagation of signals through a chain of protein interactions culminating in
activation of a nuclear transcription factor. The transcription factor must
then activate the basal transcriptional apparatus, which includes the key
TATA-box binding protein and RNA polymerase II [Green, 2000]. Once
the transcriptional activator is active in the nucleus, it binds to DNA at its
promoter. The transcription factor then recruits the coactivators, which leads
to chromatin remodeling, and activates the transcriptional apparatus,
following which transcription begins. The transcription process is itself
complex, requiring unwinding and opening of the DNA and translocation of
the complex along the DNA [Reines et al., 1999], but we will not detail
these issues here.
While much is shared between prokaryotic organisms, such as bacteria,
and eukaryotic organisms, such as ourselves, there are significant differences
in the transcriptional apparatus and structure of DNA. Within prokaryotes
the genes are arranged on the chromosomes in a linear fashion, each gene
comprised of a continuous DNA sequence that is transcribed into a
corresponding mRNA sequence. There tend to be only short sequences of
DNA between coding regions, and these sequences include the promoters.
Within higher eukaryotes, a gene generally contains both introns (sequences
which are not contained in the final mRNA) and exons (sequences that
remain in the final mRNA). Furthermore, the intergenic DNA becomes
significantly larger, with tens to hundreds of kilobases between transcribed
units, and individual genes spread over considerable distances. The mRNA
of a gene is transcribed first as a long sequence containing both introns and
exons, and then spliced to a continuous message comprised only of exons
[Sharp, 1994]. This introduces the possibility of variant proteins encoded
within a single “gene” in the DNA through splice variants, where different
exons are combined to produce a final mRNA transcript.
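How splice variants arise from one transcribed unit can be illustrated with a toy example. This is a sketch under stated assumptions: the sequences are invented and correspond to no real gene, and the function `splice` and its `skip_exons` parameter are hypothetical names. Introns are always removed, and optionally skipping an exon yields a different final message from the same "gene".

```python
# A toy pre-mRNA as an ordered list of (kind, sequence) segments.
# All sequences are invented for illustration.
PRE_MRNA = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCAG"),
    ("exon", "GAUUCU"),
    ("intron", "GUGAGUUAG"),
    ("exon", "AAAUAA"),
]


def splice(pre_mrna, skip_exons=()):
    """Join exons in order, dropping all introns.

    skip_exons lists exon indices (counting exons only, from 0) that are
    left out of the final message, which models an alternative splice form.
    """
    exons = [seq for kind, seq in pre_mrna if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skip_exons)


full_length = splice(PRE_MRNA)
variant = splice(PRE_MRNA, skip_exons={1})
print(full_length)  # AUGGCUGAUUCUAAAUAA
print(variant)      # AUGGCUAAAUAA
```

A probe designed against the middle exon would hybridize to the full-length message but not to the variant, which is one reason the choice of spotted sequence matters for interpretation.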
The spliced mRNA must then be transported out of the nucleus for
translation into a polypeptide. This is done through a highly conserved
export receptor that appears to be coupled to the mRNA splicing machinery
upstream [Reed et al., 2002]. The export mechanism is not well understood,
but it appears that mRNAs have multiple export paths that depend on
specific export sequences encoded within the mRNA [Stutz et al., 1998].
An additional step in the transcriptional machinery was discovered
recently. The existence of small lengths of RNA capable of silencing the
translation of mRNA was first noted in plants, and later confirmed to be
present in all organisms [McManus et al., 2002]. Essentially, functional
codes for extremely short lengths of RNA appear to be part of our genetic
structure. These micro RNAs (miRNAs) get converted within our cells to
single-stranded RNA units approximately 21 nucleotides long that complement
sequences in the noncoding regions of mRNA. The binding of the
single-stranded RNA to the mRNA targets the mRNA for destruction. Effectively
this “silences” the gene by blocking translation.
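The notion of a short RNA that "complements" a stretch of mRNA can be sketched as a string search. This is a deliberately simplified illustration: it assumes perfect complementarity over the whole short RNA, which is stricter than real miRNA binding, and both sequences and the function names are invented. Because the two strands pair in opposite orientations, the site to look for in the mRNA is the reverse complement of the short RNA.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that pairs with rna, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(small_rna, mrna):
    """Return start positions in mrna that pair perfectly with small_rna."""
    site = reverse_complement(small_rna)
    return [
        i
        for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


# Invented, shortened sequences; a real miRNA is about 21 nucleotides long.
small_rna = "UAGCUUAUC"
noncoding_region = "AUGGCCGAUAAGCUAUGA"

sites = find_target_sites(small_rna, noncoding_region)
print(reverse_complement(small_rna))  # GAUAAGCUA
print(sites)                          # [6]
```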
4. TRANSLATION
Following export, mRNA must then be transported to the ribosomes for
translation into a protein. Ribosomes are large complexes made from
ribosomal RNA (rRNA), another form of RNA present within the cell, and
ribosomal proteins. Ribosomes comprise two subunits that clamp onto the
mRNA chain and process it in a linear fashion [Ramakrishnan, 2002], as
shown in Figure 3. Within an mRNA, “triplets” of nucleotides, termed
codons, are arranged in sequence to specify the amino acids and their order
for the protein encoded by the gene. The ribosome contains three motifs
that bind transfer RNA (tRNA), yet another form of RNA existing within the
cell. Each tRNA comprises an “anticodon”, three bases keyed to bind three
mRNA bases, and a structure to bind a specific amino acid. The three
binding sites on the ribosome, labeled A, P, and E, position the tRNAs to test
a match to the mRNA template, then transfer a growing polypeptide chain,
and release the tRNA. The tRNA is first matched to the mRNA at the A site.
The existing polypeptide chain that is bound to the tRNA at the P site is
transferred to the amino acid on the tRNA on the A site, then this tRNA
translocates to the P site, while the tRNA on the P site translocates to the E
site. It is released as the next tRNA binds to the now vacant A site. In this
way, the ribosome builds a protein translated from the codons encoded in the
mRNA.
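The codon-by-codon reading of an mRNA can be sketched in a few lines. The example below is illustrative only: it uses a small subset of the standard genetic code (enough for the invented test sequence), starts at the first AUG, and stops at the first stop codon. The function name `translate` and the sequences are assumptions, and the A, P, and E site mechanics described above are not modeled.

```python
# Subset of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",
    "GCU": "Ala",
    "GAU": "Asp",
    "UCU": "Ser",
    "AAA": "Lys",
    "UUU": "Phe",
    "GGC": "Gly",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna):
    """Translate from the first AUG to the first stop codon.

    Returns the list of amino acid residues in order. Raises KeyError if a
    codon is not in the (deliberately partial) table above.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:  # stop codon: release the chain
            break
        peptide.append(residue)
    return peptide


print(translate("AUGGCUGAUUCUAAAUAA"))  # ['Met', 'Ala', 'Asp', 'Ser', 'Lys']
print(translate("GGAUGGCUUAA"))         # ['Met', 'Ala']
```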
When completed, the ribosome releases a polypeptide chain that must
then fold to become a functional protein. Protein folding generally involves
a chaperone, a molecular machine which aids the polypeptide chain in
folding into the correct conformation [Zhang et al., 2002]. If the folding
fails to produce the correct structure, the cell has ways to target the protein
for degradation as well as to unfold and refold the protein. Finally, the
protein often requires transport to the correct cellular compartment. For
instance, membrane receptors must be transferred from the ribosomes to the
membrane; nuclear proteins must be moved from the cytoplasm into the
nucleus.
The translation of the proteins can occur in multiple locations.
Generally, proteins that will remain in the cell are translated by ribosomes in
the cytosol, while proteins to be secreted are produced in the endoplasmic
reticulum, processed through the Golgi apparatus, and rapidly exported from
the cell.
5. PROTEIN ACTIVITY
Once the mRNA has been translated into a protein, the subsequent
processing, life expectancy, profile of modifications, and means of function
of the proteins can be extremely complex, and differ greatly from protein to
protein. As proteins accomplish the effective “work” of a cell,
understanding the points necessary to ensure creation of a competent work
“unit” is necessary to begin to think about how expression changes seen in a
microarray can result in functional consequences for a cell or organism. For
some proteins, primarily those involved in metabolism, this can be
straightforward. For most proteins, it is complex.
Protein stability varies considerably following translation. Some proteins
have extremely long half-lives, of the order of 20 hours or more. Other
proteins have extremely short half-lives (~2-3 minutes). This difference in
stability reflects the different biological roles of these proteins, which in
some cases require activity as a stable structure (for instance, as part of the
cytoskeleton providing cellular architecture), but in other cases require rapid
turnover (as with proteins critical for execution of a chronologically limited
step in cell cycle progression). Further, some proteins exist at different
locations within cells, or in association with different partner proteins: based
on their location or pattern of association, some intracellular pools of a given
protein may be subject to more rapid degradation than others. These
differences in control of the lifespan of proteins add to the difficulty in
extrapolating protein levels from mRNA levels.
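The practical effect of these half-life differences can be seen with a short calculation. This sketch assumes simple first-order decay with no new synthesis, which is an assumption made here for illustration and not a claim from the text; the function name is hypothetical. It compares how much of a protein pool remains one hour after synthesis stops for a half-life of 20 hours and a half-life of 3 minutes.

```python
def fraction_remaining(elapsed_minutes, half_life_minutes):
    """Fraction of an initial protein pool left after elapsed_minutes,
    assuming first-order decay and no new synthesis."""
    if half_life_minutes <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_minutes / half_life_minutes)


# One hour after synthesis stops:
stable = fraction_remaining(60, 20 * 60)  # half-life of 20 hours
unstable = fraction_remaining(60, 3)      # half-life of 3 minutes

print(f"stable protein:   {stable:.3f} of the pool remains")
print(f"unstable protein: {unstable:.2e} of the pool remains")
```

Under this simple model the stable pool is nearly intact after an hour while the unstable pool is essentially gone, so the same change in mRNA level can imply very different changes in protein level.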
In addition to changes in protein levels, the majority of signaling proteins
are subject to extensive post-translational modifications that can affect their
stability, localization, and activity. For example, the protein initially
exists as an inactive precursor cytoplasmic 105 kD (kiloDalton, a measure of
mass) form that is modified by covalent attachment of ubiquitin, and then
processed to release a 50 kD form that transits to the cell nucleus and
activates transcription. A number of other important proteins are similarly
cleaved and relocated. Another common form of modification is the
attachment of small peptidyl or lipid moieties (e.g., SUMO; myristylation)
that again control patterns of localization, association, and stability.
However, by far the most common form of protein modification is
phosphorylation. The placement of one or more phosphate groups on
proteins by one or more kinases, and the subsequent removal of these groups
by phosphatases, can have drastic effects on every aspect of protein function,
and lies at the core of studies of signal transduction. A rapid screen through
the CGAP signaling resource sponsored by the NIH* emphasizes the
prevalence of this form of control. Again, different pools of a protein will be
differentially phosphorylated, and hence differentially active against
different targets, within a cell. For some well-studied proteins it is barely
possible to begin to estimate what percentage of an expressed protein pool is
active following a given set of stimuli; for the majority, it is currently an
open question.
Another complicating factor in understanding the activity level of a
protein is the fact that many proteins function for part or all of their
existence as components of complexes involving other proteins. Some
transcription factors only gain their specificity in binding DNA if they
heterodimerize with other proteins. The majority of proteins involved in
signal transduction become activated through association (sometimes
sequential, sometimes simultaneous) with a large number of other proteins.
Further, individual proteins may associate with different sets of partners to
have separate activities in different signaling cascades. Hence, the
expression of one protein, A, in a cell may well have no functional
consequences for output “X”, if the requisite partners for activity in the X
pathway are absent, but function well for output “Y”, because partners
required for that activity are present. In some cases, these requisite partners
may be co-transcriptionally induced with A, and hence predictable by
microarray. In other cases, the partners may pre-exist either ubiquitously (in
all cell types) or selectively (in some cell types), and not be detectable by
methodologies tracing transcriptional induction. Based on two-hybrid or
mass spectrometry proteomics efforts using yeast as a model system, it is
clear that the majority of proteins are engaged in many interactions, with
current estimates suggesting in excess of 10 for any given protein. This is
likely to be an underestimate.
* http://cgap.nci.nih.gov/Pathways
These points of control are briefly summarized in Figure 4. In fact, this
description represents a considerable over-simplification of factors
regulating protein activity. As one example of interest, it has long been
known to cell biologists that cells can have very different responses to
specific growth stimuli when on supports that allow them to form three-
dimensional organized structures rather than on a flat tissue culture plate. In
a number of cases, an appropriate 3-D structure has been shown to be
required for expression of an appropriate transcriptional program [DiPersio
et al., 1991]. In other cases, formation of appropriate cell-cell contacts can
control cellular resistance to drugs and sensitivity to apoptosis [Jacks et al.,
2002], in a process that appears to involve regulation of the availability and
functionality of signaling proteins and transcription factors. As a key issue
in all microarray work is the degree to which in vitro (i.e., cell culture) and
in vivo (e.g., tumor) data can be compared, it is important to keep such
higher order regulatory mechanisms in mind.
6. ISSUES FOR MICROARRAYS
As the above sections indicate, the cell is a complex machine that
evolution has guided to allow it to respond to external threats, survive under
multiple conditions, and, in multicellular organisms, cooperate for the good
of the greater organism. Evolution occurs through the borrowing of
function, the recombination of existing functions into new ones, and the
copying and modification of existing functions. The result is that proteins,
the primary functional components, have multiple roles and multiple
partners allowing the cell to vary the response to a stimulus based on other
stimuli, the external environment including other cells, and on internal state.
Effectively, the cell is a state machine whose response to identical stimuli
can vary according to variables beyond the control of any conceivable
experiment.
The implications for the analysis of microarray data are significant.
First, as is clear from the complexities of transcription, translation, and
protein activity, it is a hopeless task to use changes in the level of expression
of an individual gene as an indicator of changes in the activity of the
corresponding protein without additional information [Chen et al., 2002].
Essentially, the changes in expression are not upstream indicators of protein
activity but instead comprise downstream indicators of changes in the
signaling pathway and cell state.
Second, the signaling networks present within cells allow the cell to
respond differently to the activation of a given pathway, so that the response
to a signal will vary depending on cellular state and the external
environment. Cellular state is impacted by factors such as circadian rhythm
and metabolic oscillations that cannot be well controlled in the laboratory.
Third, the destruction of mRNA can occur in multiple ways. Each of the
mRNA species has a typical half-life, which varies between species, and can
also be targeted for degradation by small RNAs. These issues lead to the
relative concentrations of mRNA being dependent on the timing of mRNA
harvest for some subset of the species within a cell.
Fourth, it is clear that the cell encodes multiple “RNA genes” within a
single “DNA gene” through mRNA splicing. Depending on the specific
sequence spotted onto an array or grown onto a GeneChip, the hybridization
recorded may represent only one of these variants or some or all of these
variants. However, splice variation changes function just as protein
modification does.
Together these biological realities greatly complicate the analysis of
microarray data. Essentially, downstream gene expression is highly
correlated, with pathway activation leading to multiple genes being
expressed, so that expression levels of different genes are not independent.
In addition, each set of genes will involve genes included in multiple
functions, and that therefore respond to multiple stimuli, so that clusters of
genes cannot easily be linked to pathway or function.
Since the cellular responses also vary with cellular state, there is a
variability linked to unobserved variables. The variation will unfortunately
therefore not be stochastic, but instead will represent an underlying
systematic variation within the data. With an adequate number of data
points, the nonstochastic nature should become obvious; however, present
microarray studies are data poor in this regard.
For many microarray studies, the situation is further complicated by the
heterogeneous nature of many in vivo samples. In general, it is not feasible to
obtain pure tissue types from tumors or other tissues. The measurements
made on a microarray then represent expression arising from multiple tissue
types (e.g., tumor and surrounding “normal” tissues) rather than from only
the tissue of interest.
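The effect of a mixed sample on a measured expression level can be sketched with a simple weighted average. The linear mixing model, the function name, and all numbers below are assumptions chosen for illustration; the text does not specify how signals from different tissue types combine.

```python
def mixed_expression(tumor_level, normal_level, tumor_fraction):
    """Expression measured from a sample in which tumor_fraction of the
    material is tumor and the rest is surrounding normal tissue, assuming
    the signals add in proportion to the tissue fractions."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    return tumor_fraction * tumor_level + (1.0 - tumor_fraction) * normal_level


# Invented values: a gene expressed 4-fold higher in tumor than in normal.
tumor_level, normal_level = 8.0, 2.0
observed = mixed_expression(tumor_level, normal_level, tumor_fraction=0.6)

true_fold = tumor_level / normal_level
apparent_fold = observed / normal_level
print(f"true fold change:     {true_fold:.1f}")
print(f"apparent fold change: {apparent_fold:.1f}")
```

With 60% tumor content the 4-fold difference appears as roughly 2.8-fold, and the size of the distortion changes from sample to sample with the tumor fraction.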
In summary, biological systems exhibit complex, highly regulated
behavior with significant feedback in all aspects and at all points in the
response to stimuli, from detection of a signal through activation of a
response to creation of the means to respond to the stimuli. Such systems
include the ability to block the response at multiple stages (e.g., signal
transduction, transcription, translation, protein activation) and to signal
between each stage. This makes the systems highly nonlinear and prone to
the creation of highly correlated errors in derived data. Analysis of the data
should not go forward blind to these realities.
ACKNOWLEDGMENTS
We thank the National Institutes of Health, National Cancer Institute
(CCCG CA06927 to R. Young), the Pennsylvania Department of Health
(grants to mfo and eag), and the Pew Foundation for support.
REFERENCES
Alberts, B, Lewis, J, Raff, M, Johnson, A, and Roberts, K (2002) Molecular Biology of the
Cell, Ed., Taylor and Francis, Inc., London.
Berger, SL (2002) Histone modifications in transcriptional regulation. Curr Opin Genet Dev
12: 142-8.
Chen, G, Gharib, TG, Huang, CC, Taylor, JM, Misek, DE, Kardia, SL, Giordano, TJ,
Iannettoni, MD, Orringer, MB, Hanash, SM and Beer, DG (2002) Discordant protein and
mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 1: 304-13.
Dilworth, FJ and Chambon, P (2001) Nuclear receptors coordinate the activities of chromatin
remodeling complexes and coactivators to facilitate initiation of transcription. Oncogene
20: 3047-54.
DiPersio, CM, Jackson, DA and Zaret, KS (1991) The extracellular matrix coordinately
modulates liver transcription factors and hepatocyte morphology. Mol Cell Biol 11: 4405-14.
Featherstone, M (2002) Coactivators in transcription initiation: here are your orders. Curr
Opin Genet Dev 12: 149-55.
Green, MR (2000) TBP-associated factors (TAFIIs): multiple, selective transcriptional
mediators in common complexes. Trends Biochem Sci 25: 59-63.
Jacks, T and Weinberg, RA (2002) Taking the study of cancer cell survival to a new
dimension. Cell 111: 923-5.
Jordan, JD, Landau, EM and Iyengar, R (2000) Signaling networks: the origins of cellular
multitasking. Cell 103: 193-200.
Kolch, W (2000) Meaningful relationships: the regulation of the Ras/Raf/MEK/ERK pathway
by protein interactions. Biochem J 351 Pt 2: 289-305.
McManus, MT and Sharp, PA (2002) Gene silencing in mammals by small interfering RNAs.
Nat Rev Genet 3: 737-47.
Posas, F, Takekawa, M and Saito, H (1998) Signal transduction by MAP kinase cascades in
budding yeast. Curr Opin Microbiol 1: 175-82.
Ramakrishnan, V (2002) Ribosome structure and the mechanism of translation. Cell 108:
557-72.
Reed, R and Hurt, E (2002) A conserved mRNA export machinery coupled to pre-mRNA
splicing. Cell 108: 523-31.
Reines, D, Conaway, RC and Conaway, JW (1999) Mechanism and regulation of
transcriptional elongation by RNA polymerase II. Curr Opin Cell Biol 11: 342-6.
Sharp, PA (1994) Split genes and RNA splicing. Cell 77: 805-15.
Stutz, F and Rosbash, M (1998) Nuclear RNA export. Genes Dev 12: 3303-19.
Tzivion, G, Shen, YH and Zhu, J (2001) 14-3-3 proteins; bringing new definitions to
scaffolding. Oncogene 20: 6331-8.
Zhang, X, Beuron, F and Freemont, PS (2002) Machinery of protein folding and unfolding.
Curr Opin Struct Biol 12: 231-8.
2
MONITORING THE QUALITY OF
MICROARRAY EXPERIMENTS
Kevin R. Coombes, Jing Wang, Lynne V. Abruzzo
University of Texas M.D. Anderson Cancer Center, Houston, TX.
Abstract: A microarray experiment is a complex, multistep process involving biology,
chemistry, physics, and bioinformatics. Something can go wrong at every step
in the process. In order to obtain good results, one needs a thorough, redundant
system to monitor the quality of microarray experiments. In this article, we
provide an overview of quality control measures that can be applied at
different points during the process of conducting and analyzing microarray
experiments.
Key words: microarrays, quality control, process control, acceptance testing
1. INTRODUCTION
A microarray experiment is a complex, multistep process. Clones of
known DNA sequences must be grown, harvested, and spotted in precise
locations on the microarray substrate. RNA must be extracted from
experimental samples, labeled targets must be produced by reverse
transcription, and the targets must be hybridized with a microarray. The
hybridized microarray must be scanned to produce a computer image. The
image must be loaded into a software package for quantification, a template
grid must be precisely aligned, and estimates must be computed for spot
intensity and local background. The initial intensity estimates must be
background-corrected and normalized. In a project that includes tens or
hundreds of individual microarray experiments, the final intensity estimates
from each individual microarray experiment must be combined into a single
data structure for further analysis.
Something can go wrong at every step in the process. Every time a
person handles a physical microarray, views an image of a microarray in an
26 Coombes et al.
image-editing program, explores the quantifications in a spreadsheet or
database, or saves and labels a file, there is a potential for errors to occur. To
obtain the best possible results, one must devise a thorough, redundant
system to monitor the quality of microarray experiments.
An illustration of the potential difficulties is provided elsewhere in this
volume [Stivers et al., 2003]. In their analysis of the Project Normal data set,
Stivers and colleagues found that the UniGene annotations of the spots on
the microarrays were inconsistent in the data files supplied for the 2002
conference on the Critical Assessment of Microarray Data Analysis. In
response to this finding, the authors of the original Project Normal study
[Pritchard et al., 2002] conducted an extremely careful and detailed review
of their data quality. During the conference, they confirmed that errors had
occurred when merging the quantified expression data into a spreadsheet
with the UniGene annotations. They also uncovered minor problems with a
small number of quantifications, caused by subgrids that had not been
correctly aligned. They prepared a revised data set, which is available from
the conference web site (http://www.camda.duke.edu/camda02/contest.asp),
thus creating perhaps the most accurate microarray data set currently
available.
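One way such merge errors can arise, and be avoided, is sketched below. This is an illustration under assumptions, not a description of what happened with the Project Normal files: the spot identifiers, UniGene-style labels, values, and the function name are all invented. The sketch contrasts pairing rows by position, which silently mislabels every spot when the two tables are sorted differently, with pairing rows by an explicit spot identifier and failing loudly when an identifier is missing or duplicated.

```python
# Invented example data: the two tables list the same spots in different order.
expression_rows = [("spot_003", 1.8), ("spot_001", 0.4), ("spot_002", -0.9)]
annotation_rows = [
    ("spot_001", "Hs.0001"),
    ("spot_002", "Hs.0002"),
    ("spot_003", "Hs.0003"),
]


def merge_by_spot_id(expression_rows, annotation_rows):
    """Attach an annotation to each expression row by spot identifier.

    Raises ValueError on duplicate or missing identifiers instead of
    guessing, so that problems surface at merge time.
    """
    annotations = dict(annotation_rows)
    if len(annotations) != len(annotation_rows):
        raise ValueError("duplicate spot identifiers in annotation table")
    missing = [spot for spot, _ in expression_rows if spot not in annotations]
    if missing:
        raise ValueError(f"no annotation for spots: {missing}")
    return [(spot, annotations[spot], value) for spot, value in expression_rows]


merged = merge_by_spot_id(expression_rows, annotation_rows)

# Pairing by row position, as happens when columns are pasted side by side.
positional = [
    (expr[0], annot[1], expr[1])
    for expr, annot in zip(expression_rows, annotation_rows)
]
mislabeled = [row for row, good in zip(positional, merged) if row != good]
print(f"{len(mislabeled)} of {len(merged)} spots mislabeled by positional merge")
```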
In this article, we provide an overview of quality control measures that
can be applied at different points during the process of conducting and
analyzing microarray experiments.
2. QUALITY CONTROL BY BIOLOGISTS
Microarray experiments undergo a critical phase change during the
scanning process: they pass from the physical world of clones, plates, slides,
and robots into the virtual, computerized world of bioinformatics. During the
physical phase, quality control is driven by the biological and chemical
properties of the reagents. The most effective method for maintaining a
high-quality microarray facility is, in one sense, quite simple: decide on a
standard set of protocols, validate them, and follow them religiously.
Collections of such protocols can be obtained from the Brown lab at
Stanford (http://cmgm.stanford.edu/pbrown/protocols/index.html) or from
The Institute for Genomic Research (http://www.tigr.org/tdb/microarray/
protocolsTIGR.shtml) [Hegde et al., 2001]. During the virtual phase of a
microarray experiment, both the source of errors and our ability to detect and
control them are governed by bioinformatics. Bioinformatics quality can best
be maintained by avoiding spreadsheets or general file naming conventions
in favor of a specialized database [Brazma et al., 2002]. Data should be
stored and transferred in a format that maintains enough detailed structure to
recover biological information about both the samples and the genes on the
array [Brazma et al., 2001].
In this section, we describe how to monitor the quality of two critical
ingredients—the RNA samples and the microarray slides—before the
hybridization step. Because assessment of hybridization quality is typically
based on the scanned image, we will reserve its discussion to a later section.
2.1 Monitoring RNA quality
The basic component of a microarray experiment is the RNA extracted
from samples. Before running a microarray experiment, the quality of this
RNA should be tested. If enough total RNA is available, it can be evaluated by agarose
gel electrophoresis with ethidium bromide. One expects to see crisp bands
from the 28S and 18S ribosomal RNA, in a ratio of two-to-one. When the
amount of RNA is limited, as is often the case with samples obtained by fine
needle aspiration or laser capture microdissection, as little as 5–500 ng total
RNA can be evaluated using a model 2100 Bioanalyzer (Caliper
Technologies, Mountain View, CA) [Dunmire et al., 2002].
2.2 Monitoring physical array quality
The second fundamental component of a microarray experiment is the
physical array itself. In order to obtain good results, the spots on the array
should contain uniform, equivalent amounts of spotted polymerase chain
reaction (PCR) products or oligonucleotides. Pre-scanning the slide before
using it in a hybridization experiment can verify this property. Good results
have been reported using nondestructive staining with a fluorescent dye such
as SYBR green II [Battaglia et al., 2000] or with SYTO 61 [Yue et al.,
2001]. Alternatively, Shearstone and colleagues have described a method for
printing oligonucleotide microarrays spiked with small amounts of dCTP-Cy3
or dCTP-Cy5. The slides can be scanned to verify spot morphology by
detecting the spiked products, which are then washed off before hybridizing
the microarray with the sample of interest [Shearstone et al., 2002].
2.3 Quality control during array manufacturing
Many institutions have established core facilities that manufacture
microarrays by spotting either PCR products from cDNA clones or
chemically synthesized oligonucleotides on glass slides. Managing,
maintaining, and handling large libraries of clones bring their own set of
quality control problems. These problems, once again, include a mixture of
28 Coombes et al.
physical difficulties (contamination from nearby wells; suboptimal
conditions for the PCR reactions) and bioinformatical challenges
(maintaining the correct annotations of the clones). For example, when we
resequenced the cDNA clones being printed on microarrays at the Genomics
Core Facility at the University of Texas M.D. Anderson Cancer Center, we
found an error rate of 21% in the clones supplied by the manufacturer
[Taylor et al., 2001]. Our experience strongly suggests the need to sequence-
verify the clones at the final stage before printing them on microarray slides.
To date, we have found a much lower error rate among the synthesized
oligonucleotides printed on arrays at our facility. Nevertheless, we advise
randomly selecting a few oligonucleotides for sequence verification.
Because slides are printed in batches, we can test the quality of an entire
batch using established procedures. For example, we can randomly select a
small number of slides from each batch. We then test the quality of these
microarrays (destructively) by performing a hybridization experiment. If the
selected microarrays pass the test, then the batch is accepted.
The number of slides that must be tested (and pass) depends on the batch
size, the desired confidence level, and other factors. We use Bayesian
methods to make this determination. We want to estimate the proportion of
“good” slides in a batch, denoted here by $\theta$. If we randomly test N slides, then the probability
that k of those slides pass the test is described by a binomial distribution,
$P(k \mid N, \theta) = \binom{N}{k}\,\theta^{k}(1-\theta)^{N-k}$.
We assume a Beta prior distribution, $\theta \sim \mathrm{Beta}(a, b)$. Then standard
computations show that the posterior distribution is also a Beta distribution,
$\theta \mid k \sim \mathrm{Beta}(a + k,\; b + N - k)$. The Bayesian formulation allows us to
make our assumptions about the prior distribution explicit (Figure 1). For
instance, if we believe that a single batch is likely to consist either almost
entirely of good slides or almost entirely of bad slides, we can choose prior
parameters a = b = ½. In this case, testing a single slide can supply enough
evidence to determine the quality of the entire batch. If we instead assume a
uniform prior on the percentage of good slides, then we will need at least
three slides to pass the test in order to accept the batch. One can, of course,
make more extensive tests on a few batches to get a better idea of how the
quality varies within a batch. This information can be used to make an
informed choice of the prior distribution.
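A minimal sketch of this Beta-binomial update in Python is given here for illustration. The function names, the 80% target for the proportion of good slides, and the numerical integration scheme are assumptions made for this example, not part of the procedure described above. The code assumes prior parameters of at least ½, which covers both priors mentioned in the text.

```python
import math

def posterior_params(a, b, n_tested, n_passed):
    """Combine a Beta(a, b) prior with binomial test results."""
    return a + n_passed, b + (n_tested - n_passed)

def prob_good_fraction_exceeds(threshold, a, b, steps=20000):
    """P(theta > threshold) for theta ~ Beta(a, b), assuming a, b >= 0.5.

    The substitution theta = sin(phi)**2 removes the endpoint
    singularities of the Beta density; a midpoint rule does the rest.
    """
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    lo = math.asin(math.sqrt(threshold))
    hi = math.pi / 2
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        phi = lo + (i + 0.5) * h
        total += 2.0 * math.sin(phi) ** (2 * a - 1) * math.cos(phi) ** (2 * b - 1)
    return total * h / math.exp(log_beta)

# Hypothetical scenario: uniform prior (a = b = 1), three slides tested, all pass.
a_post, b_post = posterior_params(1.0, 1.0, n_tested=3, n_passed=3)
p_uniform = prob_good_fraction_exceeds(0.8, a_post, b_post)

# Hypothetical scenario: a = b = 1/2 prior, one slide tested, it passes.
a_half, b_half = posterior_params(0.5, 0.5, n_tested=1, n_passed=1)
p_half = prob_good_fraction_exceeds(0.8, a_half, b_half)
```

With the uniform prior and three passing slides the posterior is Beta(4, 1), whose tail probability has the closed form 1 − 0.8⁴ ≈ 0.59, which gives a quick check on the numerical result. The acceptance threshold applied to such a probability is a policy choice that the sketch leaves open.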
There are various choices for the standard hybridization experiment used
to test the slides. For instance, all clones spotted on microarrays in our
Genomics Core Facility share a common sequence (complementary to the
primer). We can label copies of the primer with fluorescent dye and
hybridize them to the microarray. We can then assess the attachment of the
probes at all spots by scanning the microarrays after hybridization [Hu et al.,
2002]. Alternatively, one can choose a standard reference material (such as
the Universal Reference available from Stratagene Inc., La Jolla, CA) for
this purpose [Weil et al., 2002]. A known profile of this reference material
can also be used as part of a process control strategy to monitor the quality
of hybridizations over time; we return to this idea later.
3. SPOT LEVEL METRICS
We now turn to quality control methods that apply to individual
microarray experiments just after hybridization and scanning. Some of these
methods are used to assess the quality of individual spots. Others are used to
decide if the overall quality of the hybridization is low enough to reject it
and request that the experiment be repeated. In this section, we discuss
methods to assess spot quality.
Saturation of the fluorescent signal is one of the more common problems
affecting the reliability of spot intensity measurements. Including saturated
spots in the analysis can severely distort the results, whether detecting
differentially expressed genes or using clustering for class discovery [Hsiao
et al., 2002]. Most microarray quantification packages provide a measure of
signal saturation. The recommended procedure is to remove frequently
saturated gene measurements from consideration. If the number of saturated
spots varies widely across experiments, it is usually necessary to revise the
scanning protocol in an effort to achieve greater consistency.
Software quantification packages provide different measures of spot
quality. Yidong Chen has developed a package to quantify microarray
images that includes several spot quality metrics [Chen et al., 2002]. This
package computes quality measures for the fluorescent intensity (quality is
essentially the mean signal pixel intensity divided by the standard deviation
of the background intensity), the target area (a function of the percentage of
the expected spot area occupied by signal above background), the
background flatness (a measure of the extent to which the local background
exceeds the mean background across a large portion of the array), and the
signal intensity consistency (a measure of the variability of the signal pixel
intensity). All four of the quality measures are transformed to lie between 0
and 1, and the overall quality of the spot is taken to be the minimum of the
four measures. This approach allows analysts to exclude measurements at
individual spots from further analysis if the quality is too low.
Other approaches to assessing spot quality have been proposed. Wang
and colleagues have also developed a quantification software package that
provides multiple measures of spot quality [Wang et al., 2001]. Brown and
colleagues use the “spot ratio variability”, which divides the pixel-by-pixel
standard deviation of the ratio by the mean ratio [Brown et al., 2001]. Tran
and colleagues examine the difference between the mean signal intensity and
median signal intensity of the pixels within a spot, requiring that they differ
by less than 10% at good spots [Tran et al., 2002].
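The spot-level checks described in this section can be sketched as follows. This is an illustration only: the function names are hypothetical, the use of the sample standard deviation and of the median as the reference for the 10% comparison are assumptions, and the transformation of each measure onto the interval from 0 to 1 is not reproduced because the text does not specify it.

```python
import statistics

def spot_ratio_variability(pixel_ratios):
    """Pixel-by-pixel standard deviation of the ratio divided by the mean ratio."""
    return statistics.stdev(pixel_ratios) / statistics.mean(pixel_ratios)

def mean_median_agree(pixel_intensities, tolerance=0.10):
    """True if mean and median signal differ by less than `tolerance`.

    Assumption: the difference is measured relative to the median.
    """
    mean = statistics.mean(pixel_intensities)
    median = statistics.median(pixel_intensities)
    return abs(mean - median) / median < tolerance

def overall_quality(component_scores):
    """Overall spot quality as the minimum of component scores in [0, 1]."""
    return min(component_scores)

# Hypothetical pixel values for a uniform spot and a spot with one bright speck.
uniform_spot = [100, 102, 98, 101, 99]
speckled_spot = [100, 102, 98, 101, 400]
uniform_ok = mean_median_agree(uniform_spot)
speckled_ok = mean_median_agree(speckled_spot)
```

Taking the minimum of the component scores means that a spot is only as good as its worst measure, so a single failing criterion is enough to exclude it.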
4. QUALITY CONTROL OF INDIVIDUAL SLIDES
After a hybridized slide has been scanned and quantified, several
methods can be applied to assess the quality of the experiment. The
immediate goal of these tests is usually to decide rapidly if the experiment
needs to be repeated.
4.1 View the images
The simplest test to describe is also the hardest to quantify or to
automate. Before starting the statistical analysis, we look at the scanned
images of the microarray to ensure that there are no obvious gross
anomalies. (This step is sometimes referred to as “cortical filtering”; in order
to pass the test, the data must be successfully processed by a human cortex.)
We visualize the images in MATLAB (The MathWorks, Inc., Natick, MA),
which allows us to alter the color map. Most people do a poor job of
discriminating between different shades of gray, and they are even worse
at distinguishing colors that shade from black into red or green. False color
images prepared in MATLAB can give a good idea of the quality of the
spots and of any strange behavior in the background.
4.2 Density estimates
After deciding that the images are adequate, we load the quantification
data into S-Plus (Insightful Corp., Seattle WA). We compute kernel density
estimates of the distributions of the logarithm of the raw volume intensity
estimates and the raw background estimates for each channel (Figure 2). The
example here shows a fairly typical distribution pattern. The volume has a
high peak with a long right tail. The background has a tighter peak located at
about the same point as the volume density, without the extended tail. The
background-corrected volume is much more spread out. We interpret these
distributions as telling us that a large number of spots are near or below the
background levels; the spots that truly show significant gene expression are
the ones in the long tail at the right.
Any qualitative change in these density plots is taken as an indication of
a potential problem. Bimodal peaks in the background plots, for example,
usually indicate that large regions are subject to excessively high
background noise. They may also indicate that a portion of the grid was
improperly aligned during quantification, causing the signal to be counted as
background over a large subsection of the array. Obvious differences in the
shape of the distributions from the red and green channels typically indicate
a problem (poor quality RNA or faulty labeling) with one of the samples.
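As an illustration of the density check, the following Python sketch computes a Gaussian kernel density estimate of log background values and counts its peaks. It is not the S-Plus procedure referred to in the text: the bandwidth rule (Silverman's rule of thumb), the grid, the peak-counting helper, and the simulated data are all assumptions made for this example.

```python
import math
import statistics

def gaussian_kde(values, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    n = len(values)
    if bandwidth is None:
        # Silverman's rule of thumb; other bandwidth choices are possible.
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]

def count_modes(density):
    """Count strict local maxima of a density evaluated on a grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )

def quantile_sample(mu, sigma, n):
    """Deterministic stand-in for n draws from a normal distribution."""
    dist = statistics.NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

grid = [4.0 + 0.05 * i for i in range(161)]

# Hypothetical log background values: one clean array, one with a high-background region.
clean_background = quantile_sample(6.0, 0.3, 200)
patchy_background = quantile_sample(6.0, 0.3, 150) + quantile_sample(8.0, 0.3, 50)

clean_modes = count_modes(gaussian_kde(clean_background, grid))
patchy_modes = count_modes(gaussian_kde(patchy_background, grid))
```

Counting peaks is a crude substitute for looking at the plot; on real data, small spurious bumps may appear, so a visual check of the curve remains advisable.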
4.3 Topological plots of background
We also look at “cartoon” images of the logarithm of the estimated
background intensity at each spot (Figure 3). The log transformation allows
us to see more detail at lower intensity levels, where the background is more
important. In this example, the red channel background is plotted in the top
figure and shows few significant features; the most visible glitch consists of
a few spots near the left end. As usually happens, the background in the
green channel is both slightly higher on average (as reflected in the estimate
of the median) and more variable. In this case, the background is
substantially higher on the right third of the green channel.
4.4 Signal-to-noise ratio
One of the most effective measures for the overall quality of a microarray
experiment is the percentage of spots with adequate signal-to-noise ratio
(S/N). The precise definition of “adequate” in this setting may vary from
platform to platform. On Affymetrix microarrays, for example, the model
used in their Microarray Analysis Suite version 5.0 provides detection
p-values along with “present” or “absent” calls for each gene. We typically
find that around 40–50% of the genes are present in a successful
hybridization on an Affymetrix microarray.
On our own two-color fluorescent microarrays, we typically require good
spots to have S/N > 2. We have found, for successful hybridizations, that the
percentage of spots with S/N > 2 is typically in the range of 30% to 60%.
We see a similar range of percentages regardless of the tissue of origin for
the RNA sample. We have seen extreme cases where as few as 5% or as
many as 95% of the spots pass the signal-to-noise threshold; both extremes
are indications that something went wrong during the conduct of the
experiment.
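The percentage of spots passing the signal-to-noise threshold is simple to compute once an S/N value is available for each spot. The sketch below assumes such values have already been produced by the quantification software; the function names and the "typical"/"review" labels are hypothetical.

```python
def fraction_passing(signal_to_noise, threshold=2.0):
    """Fraction of spots whose signal-to-noise ratio exceeds `threshold`."""
    passing = sum(1 for sn in signal_to_noise if sn > threshold)
    return passing / len(signal_to_noise)

def classify_hybridization(fraction, typical_range=(0.30, 0.60)):
    """Label an array by where its passing fraction falls.

    The 30%-60% band is the typical range quoted in the text; treating
    everything outside it as "review" is an assumption of this sketch.
    """
    low, high = typical_range
    return "typical" if low <= fraction <= high else "review"

# Hypothetical S/N values for ten spots.
example_sn = [0.5, 1.2, 2.5, 3.0, 8.0, 1.9, 2.1, 0.8, 1.0, 4.2]
example_fraction = fraction_passing(example_sn)
example_label = classify_hybridization(example_fraction)
```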
5. QUALITY CONTROL WITH REPLICATE SPOTS
The microarrays produced in our Genomics Core have the virtue that
every gene of interest is spotted on the array in duplicate. The value of
printing replicate spots to obtain more accurate expression measurements
has been described previously [Ramakrishnan et al., 2002]. Because of the
replications, we can estimate measurement variability within the microarray.
For each channel, we make scatter plots that compare the first and second
member of each pair of replicates (Figure 4, left). More useful, however, are
the Bland-Altman plots obtained by rotating the scatter plots by 45 degrees
(Figure 4, right). To construct a Bland-Altman plot, we graph the difference
between the log intensity of the replicates (whose absolute value is an
estimate of the standard deviation) as a function of the mean log intensity. In
the typical plots shown here, there is a much wider spread (a “fan” or a
“fishtail”) at the left end of the plot (low intensity) and much tighter
reproducibility at the right end (high intensity). We can fit a smooth curve to
this graph that estimates the variability (in the form of the absolute
difference) as a function of the mean intensity. In Figure 4, we have plotted
confidence bands at plus and minus three times those smooth curves.
Replicates whose difference lies outside these confidence bands are
intrinsically less reliable than replicates whose difference lies within the
bands [Baggerly et al., 2000; Tseng et al., 2001; Loos et al., 2001].
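The construction of the Bland-Altman plot and the flagging of unreliable replicate pairs can be sketched as follows. One substitution should be noted: in place of the smooth curve described above, this example uses the median absolute difference within intensity bins, which is a cruder estimate. The function names, the number of bins, and the data are hypothetical.

```python
import math
import statistics

def bland_altman(rep1, rep2):
    """Return (mean log intensity, log difference) for each replicate pair."""
    points = []
    for x, y in zip(rep1, rep2):
        lx, ly = math.log2(x), math.log2(y)
        points.append(((lx + ly) / 2.0, lx - ly))
    return points

def flag_outliers(points, n_bins=4, multiplier=3.0):
    """Flag pairs whose |difference| exceeds `multiplier` times the bin median.

    Assumption: a binned median of |difference| stands in for the
    smooth curve fitted to the absolute differences.
    """
    ordered = sorted(range(len(points)), key=lambda i: points[i][0])
    bin_size = math.ceil(len(ordered) / n_bins)
    flagged = [False] * len(points)
    for start in range(0, len(ordered), bin_size):
        members = ordered[start:start + bin_size]
        typical = statistics.median(abs(points[i][1]) for i in members)
        for i in members:
            flagged[i] = abs(points[i][1]) > multiplier * typical
    return flagged

# Hypothetical duplicate-spot intensities; the last pair disagrees fourfold.
first = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
factors = [1.1, 0.9, 1.1, 0.9, 1.05, 0.95, 1.05, 4.0]
second = [x * f for x, f in zip(first, factors)]

flags = flag_outliers(bland_altman(first, second), n_bins=2)
```

Because the threshold is computed separately for each intensity bin, the wider spread expected at low intensity does not cause low-intensity pairs to be flagged wholesale.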
6. QUALITY CONTROL USING REPLICATE EXPERIMENTS
All microarray experiments must include some degree of replication if
we are to assign any meaningful statistical significance to the results. In the
previous section, we showed how some simple plots can automatically
detect spots where replicated spots give divergent measurements. The same
method can be used with replicated experiments. This method works with
purely technological replicates (separate hybridizations of the same labeled
RNA, separately labeling the same RNA, or separate RNA extractions from
the same mixture of cells) or with biological replicates (in which RNA
samples are independently obtained under similar biological conditions)
[Novak et al., 2002; Raffelsberger et al., 2002]. When replication occurs
later in the process, of course, one expects the scatter plot of the replicates to
follow the identity line more closely.
Scatter plots of replicate experiments can detect two other common
problems. The first problem can occur when merging data into a spreadsheet
for analysis. It is all too easy to sort data from different arrays in different
orders, which causes the data rows to be mismatched. If the entire array is
affected, then a plot of the replicates shows an uncorrelated cloud of points
instead of the expected band along the identity line (Figure 5). Mismatching
a subset of the rows produces a plot that combines part of the expected band
with a smaller cloud. This phenomenon can also occur when a few of the
subgrids on one array were misaligned during quantification (Figure 5).
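The effect of mismatched rows on a replicate comparison can be illustrated numerically. The sketch below uses the Pearson correlation as a one-number summary of the scatter plot, which is a choice made for this example and not a method named in the text; the data are simulated and the noise level is an assumption.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Simulated log intensities for two replicate arrays (hypothetical data).
rng = random.Random(0)
array_a = [rng.uniform(6.0, 14.0) for _ in range(500)]
array_b = [x + rng.gauss(0.0, 0.3) for x in array_a]

matched = pearson(array_a, array_b)

# Simulate a sorting accident: the rows of the second array are reordered.
shuffled_b = array_b[:]
rng.shuffle(shuffled_b)
mismatched = pearson(array_a, shuffled_b)
```

A correlation near zero between supposed replicates corresponds to the uncorrelated cloud of points described above. A partial mismatch would give an intermediate value, so the scatter plot itself remains the more informative diagnostic.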
7. CLUSTERING FOR QUALITY CONTROL
In some array studies, the investigators make a clear distinction between
class discovery and discrimination. Supervised methods (linear discriminant
analysis or support vector machines) are appropriate if the goal of the study
is discrimination or diagnosis. Unsupervised methods (hierarchical
clustering, principal components analysis, or self-organizing maps) are
generally appropriate if the goal of the study is class discovery. In some
published studies, unsupervised methods have been used regardless of the
goal. The attitude seems to have been: “because our unsupervised method
rediscovered known classes, our microarrays are as good at pattern
recognition as a diagnostic pathologist.” In our opinion, such an attitude is
misguided.
Nevertheless, unsupervised classification methods do have an appropriate
place in the analysis of microarray data even when the correct classification
of the experimental samples is known. The avowed purpose of unsupervised
methods is to uncover any structure inherent in the data. If they recover
known structure, then that result is hardly surprising. Because the statistical
and mathematical properties of these methods have not been fully
investigated in the context of microarray data, neither can one draw strong
conclusions from the recovery of known structure. As we have pointed out,
many things can go wrong during the process of collecting microarray data.
If an unsupervised classification method yields deviations from the known
structure, these deviations can be quite revealing of problems in data quality.
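One simple way to look for such deviations, when each array has a known replicate, is to check whether the replicate is also its nearest neighbor. The sketch below does this directly with a correlation-based distance and without building a full hierarchical clustering; the distance measure, function names, and expression profiles are assumptions made for this example.

```python
import math

def correlation_distance(xs, ys):
    """One minus the Pearson correlation between two expression profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 1.0 - cov / math.sqrt(vx * vy)

def nearest_neighbor(profiles, name):
    """Name of the array whose profile is closest to that of `name`."""
    others = (other for other in profiles if other != name)
    return min(
        others,
        key=lambda other: correlation_distance(profiles[name], profiles[other]),
    )

def replicates_not_neighbors(profiles, replicate_of):
    """Arrays whose nearest neighbor is not their designated replicate."""
    return [
        name
        for name, partner in replicate_of.items()
        if nearest_neighbor(profiles, name) != partner
    ]

# Hypothetical log expression profiles for two pairs of replicate arrays.
profiles = {
    "liver_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "liver_2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "kidney_1": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "kidney_2": [6.1, 4.9, 4.2, 3.0, 2.1, 0.9],
}
replicate_of = {
    "liver_1": "liver_2", "liver_2": "liver_1",
    "kidney_1": "kidney_2", "kidney_2": "kidney_1",
}
clean_report = replicates_not_neighbors(profiles, replicate_of)

# Simulate a data-handling error: the rows of one array end up in reverse order.
corrupted = dict(profiles)
corrupted["kidney_2"] = list(reversed(profiles["kidney_2"]))
corrupted_report = replicates_not_neighbors(corrupted, replicate_of)
```

An array that appears in the report is a candidate for closer inspection; the report alone does not say what went wrong.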
We have successfully used clustering methods to identify many of the
problems described in earlier sections of this paper. When the study includes
technological replicates of each experiment, we expect the replicates to be
nearest neighbors (or at least close neighbors), as portrayed, for instance, in
a dendrogram based on a hierarchical cluster analysis. We have seen
instances where replicates failed to be neighbors; in every case, the cause of
the failure could be traced to a problem with the data. The problems detected
using this method have included, among others:
• individual arrays where the data rows had been reordered,
• misalignment of the entire grid or of subgrids,
• analytical errors where “red – green” differences had accidentally been
combined with “green – red” differences,
• differences between microarrays produced in different print lots, and
• differences in the dynamic range of signals from different arrays.
Plotting the samples using the first two or three components from a
principal components analysis (PCA) can also reveal similar anomalies. A
striking application of this method can be found elsewhere in this volume
[Stivers et al., 2003]. By applying PCA to the individual channels of the
Project Normal microarray experiments, we found that the reference
channels could be separated based on the tissue of origin of the sample
hybridized to the other channel. This finding led us to look more closely at
the data. In turn, this led to the discovery that the UniGene annotations in
data files from experiments using liver RNA did not agree with the UniGene
annotations in data files from experiments using kidney RNA.
8. PROCESS CONTROL
All the methods discussed so far deal with quality control rather than
process control. The distinction is simple. Quality control is based primarily
on inspections. Its goal is to identify components of low quality and reject
them. We have described methods to inspect RNA samples and microarray
slides before hybridization. We have described various methods to inspect
the scanned image of a microarray experiment and determine the quality of
the hybridization. We have also looked at methods to inspect the quality of
individual spots. In this context, the use of clustering methods plays the role
Nor view thee with severe, truth-
searching eye,
Melting thy fairy visions into air.
Thy paradise, delighted, let me
rove,
There study nature, and with
grateful heart,
In thy serene, translucent stream
behold
The light of truth reflected, and the
smile
Of heaven's benevolence, and in
that glass
The loveliness of every Virtue woo
And every Grace. There let me, too,
behold
In all her beauty, bright-eyed Poesy,
That heavenly Maid who charm'd
my youthful heart;
And let the love of glory fire my
breast;
And let me see, to stimulate my
powers,
The new-born crescent of my fame
ascend,
While on its pointed horn the Fairy,
Hope,
On tiptoe stands, fluttering her airy
wings
To fan its beams and joyful hails the
hour
When in its full-orb'd glory it shall
shine.
A SUMMER-EVENING.
——————
Come, my dear Love, and let us
climb yon hill,
The prospect, from its height, will
well reward
The toil of climbing; thence we shall
command
The various beauties of the
landscape round.—
Now we have reached the top. O!
what a scene
Opens upon the sight, and swallows
up
The admiring soul! She feels as if
from earth
Uplifted into heaven. Scarce can she
yet
Collect herself, and exercise her
powers.
While o'er heaven's lofty, wide-
extended arch,
And round the vast horizon, the
bold eye
Shoots forth her view, with what
sublime delight
The bosom swells! See, where the
God of day,
Who through the cloudless ether
long has rid
On his bright, fiery car, amidst a
blaze
Of dazzling glory, and in wrath shot
round
His burning arrows, with tyrannic
power
Oppressing Nature, now, his daily
course
Well-nigh completed, toward the
western goal
Declines, and with less awful
majesty
Concludes his reign; his flamy
chariot hid
In floods of golden light that
dazzles still,
Though less intense. O! how these
scenes exalt
The throbbing heart! Louisa, canst
thou bear
These strong emotions? do they not
o'erpower
Thy tender nerves? I fear, my Love,
they do;
Those eyes that, late, with transport
beam'd so bright,
Now veil their rays with the soft,
dewy shade
Of tenderness. Let us repose
awhile;
The roots of yonder tree, cover'd
with moss,
Present a pleasing seat; there let us
sit.
Hark! Zephyr wakes, and sweetly-
whispering, tells
The approach of Eve; already
Nature feels
Her soothing influence, her
refreshing breath;
The fields, the trees, imbibe the
cool, moist air,
Their feverish thirst allay, and smile
revived.
The Soul, too, feels her influence,
sweetly soothed
Into a tender calm. O! let us now,
My loved Louisa! let us now enjoy
The landscape's charms, and all the
nameless sweets
Of this, our favourite hour, for ever
dear
To Fancy and to Love. Cast round
thy sight
Upon the altered scene, nor longer
fear
The dazzling sun; his latest,
lingering beams
Where are they? can'st thou find
them?—see! they gild
The glittering top of yonder village-
spire;
Upon that distant hill they faintly
shine;
And look! the topmost boughs of
this tall oak
Majestic, which o'ercanopies our
heads,
Yet catch their tremulous
glimmerings:—now they
fade,
Fade and expire; and, as they fade,
the Moon,
The full-orb'd Moon, that seem'd,
erewhile, to melt
In the bright azure, from the
darkening sky
Emerging slow, and silent, sheds
around
Her snowy light, that with the day's
last, dim
Reflection, from the broad,
translucid lake,
Insensibly commingles, and unites
In sweetest harmony, o'er all the
scene
Diffusing magic tints, enchanting
power.
How lovely every object now
appears!
Each in itself, and how they all
combine
In one delightful whole! What eye,
what heart,
O Nature! can resist thy potent
charms
When thus in soft, transparent
shade half-veil'd?
Now Beauty and Sublimity,
methinks,
Upon the lap of Eve, embracing
sleep.
Mark the tree-tops, my Love, of
yonder wood,
Whose moonlight foliage fluctuates
in the breeze,
Say, do they not, in figure, motion,
hue,
Resemble the sea-waves at misty
dawn?
What shadowy shape along the
troubled lake
Comes this way moving? how
mysteriously
It glides along! how indistinct its
form!
Imagination views with sweet
surprise
The unknown appearance—
breathless in suspense.
The Spirit of the waters can it be,
On his aerial car? some fairy Power?
Pants not thy heart, Louisa, half-
alarm'd?
It grows upon the sight,—strange,
watery sounds
Attend its course;—hark! was not
that a voice?
O! 'tis a fishing-boat!—its sails and
oars
I now discern. The church-clock
strikes! how loud
Burst forth its sound into the
startled air,
That feels it still, and trembles far
around!
My dearest Love! it summons us
away;
The dew begins to fall; let us
depart:
How sweetly have we spent this
evening-hour!
PROLOGUE.
——————
The piece, to-night, is of peculiar
kind,
For which the appropriate name is
hard to find;
No Comedy, 'tis clear; nor can it be,
With strictest truth, pronounced a
Tragedy;
Since, though predominant the
tragic tone,
It reigns not uniformly and alone;
Then, that its character be best
proclaim'd,
A Tragic-drama let the piece be
named.
But do not, Critics! rashly hence
conclude,
'Tis a mere Farce, incongruous and
rude,
Where incidents in strange
confusion blend,
Without connexion, interest, or end:
Not so;—far different was the bard's
design;
For though, at times, he ventures to
combine
With grave Melpomene's
impassion'd strain
The gay Thalia's more enlivening
vein;
(As all mankind with one consent
agree
How strong the charms of sweet
variety,)
Yet Reason's path he still with care
observes,
And ne'er from Taste with wilful
blindness swerves,
His plot conducting by the rules of
art:
And, above all, he strives to touch
the heart;
Knowing that, void of pathos and of
fire,
Art, Reason, Taste, are vain, and
quickly tire.
Be mindful then, ye Critics! of the
intent;
The poet means not here to
represent
The tragic Muse in all her terrors
drest,
With might tempestuous to
convulse the breast;
Nor in her statelier, unrelaxing
mien,
To stalk, in buskin'd pomp, through
every scene;
But with an air more mild and
versatile,}
Where fear and grief, sometimes,
admit a smile,}
Now loftier, humbler now, the
changing style,}
Resembling in effect an April-night
When from the clouds, by fits, the
moon throws forth her light;
And louder winds, by turns, their
rage appease,
Succeeded by the simply-whispering
breeze.
But, in few words our author ends
his plea,
Already tending to prolixity,
To paint from Nature was his
leading aim;
Let then, the play your candid
hearing claim:
Judge it, impartial, by dramatic
laws;
If good, reward it with deserved
applause;
If bad, condemn; yet be it still
exempt
From your severer blame, for 'tis a
first attempt.
PROLOGUE.
——————
Lo! Time, at last, has brought, with
tardy flight,
The long-anticipated, wish'd-for
night;
How on this blissful night, while yet
remote,
Did Hope and Fancy with fond
rapture doat!
Like eagles, oft, in glory's dazzling
sky,
With full-stretch'd pinions have they
soar'd on high,
To greet the appearance of the
poet's name,
Dawning conspicuous mid the stars
of fame.
Alas! they soar not now;—the
demon, Fear,
Has hurl'd the cherubs from their
heavenly sphere:
Fancy, o'erwhelm'd with terror,
grovelling lies;—
The world of torment opens on her
eyes,
Darkness and hissing all she sees
and hears;—
(The speaker pauses—the
audience are
supposed to clap, when
he continues,)
But Hope, returning to dispel her
fears,
Claps her bright wings; the magic
sound and light
At once have forced their dreaded
foe to flight,
Silenced the hissing, chased the
darkness round,
And charm'd up marvelling Fancy
from the ground.
Say, shall the cherubs dare once
more to fly?
Not, as of late, in glory's dazzling
sky,
To greet the appearance of the
poet's name,
Dawning conspicuous mid the stars
of fame;
Presumptuous flight! but let them
dare to rise,
Cheer'd by the light of your
propitious eyes,
Within this roof, glory's contracted
sphere,
On fluttering pinions, unsubdued by
Fear;
O! let them dare, ere yet the curtain
draws,
Fondly anticipate your kind
applause.
EPILOGUE.
——————
Perplexing case!—your pardon,
Friends, I pray,—
My head so turns, I know not what
to say;—
However, since I've dared to come
before ye,
I'll stop the whirligig,—
(Clapping his hand to his
forehead,)
and tell my story:
Though 'tis so strange, that I've a
pre-conviction
It may by some, perhaps, be judged
a fiction.
Learn, gentle Audience, then, with
just surprise,
That, when, to-night, you saw the
curtain rise,
Our poet's epilogue was still unwrit:
The devil take him for neglecting it!
Nay though,—'twas not neglected;
'twas deferr'd
From certain motives—which were
most absurd;
For, trusting blindly to his rhyming
vein,
And still-prepared inventiveness of
brain,
He'd form'd the whimsical, foolhardy
plan,
To set about it when the play began;
Thus purposing the drama's fate to
know,
Then write his epilogue quite à
propos.
The time at last arrives—the signal
rings,
Sir Bard, alarm'd, to pen and paper
springs,
And, snug in listening-corner, near
the scene,
With open'd ears, eyes, mouth-
suspended mien,—
Watches opinion's breezes as they
blow,
To kindle fancy's fire, and bid his
verses flow.
Now I, kind Auditors! by fortune's
spite
Was doom'd, alack! to speak what
he should write,
And therefore, as you'll naturally
suppose,
Could not forbear, at times, to cock
my nose
Over his shoulder, curiously to trace
His progress;—zounds! how snail-like
was his pace!
Feeling, at length, my sore-tried
patience sicken,
Good Sir, I cried, your tardy motions
quicken:
'Tis the fourth act, high time, Sir, to
have done!
As if his ear had been the touch-hole
of a gun,
My tongue a match, the Bard, on
fire, exploded;
He was—excuse the pun—with grape
high-loaded.
Hence, prating fool! return'd he, in a
roar,
Push'd me out, neck and heels, and
bang'd the door.
But lest, here too, like hazard I
should run;}
I'll end my story. When the play was
done,}
The epilogue was—look! 'tis here—
begun:}
Such as it is, however, if you will,
I'll read it; shall I, Friends?
(They clap.)
Your orders I
fulfil.
(He reads.)
'Tis come! the fateful hour! list! list!
the bell
Summons me—Duncan-like, to
heaven or hell;
See, see, the curtain draws;—it now
commences;
Fear and suspense have frozen up
my senses:
But let me to my task:—what noise is
this?
They're clapping, clapping, O ye
gods, what bliss!
Now then, to work, my pen:—
descend, O Muse!
Thine inspiration through my soul
infuse;
Prompt such an epilogue as ne'er
before
Has been imagined,—never will be
more.
What subject? hark! new louder
plaudits rise,
I'm fired, and, like a rocket, to the
skies
Dart up triumphantly in flames of
light:—
They hiss, I'm quench'd, and sink in
shades of night.
Again they clap, O extacy!—
Having thus far indulged his rhyming
vein,
He halts,—reads,—curses,—and
begins again;
But not a single couplet could he
muster;
How should he, with his soul in such
a fluster,
All rapture, gratitude, for your
applause?
Be then, the effect excused in favour
of the cause!
LINES
ON THE DEATH OF THE REV. MR. B.
(SUPPOSED TO BE WRITTEN BY MISS B***, HIS
SISTER.)
——————
At God's command the vital spirit
fled,
And thou, my Brother! slumber'st
with the dead.
Alas! how art thou changed! I
scarcely dare
To gaze on thee;—dread sight!
death, death is there.
How does thy loss o'erwhelm my
heart with grief!
But tears, kind nature's tears afford
relief.
Reluctant, sad, I take my last
farewell:—
Thy virtues in my mind shall ever
dwell;
Thy tender friendship felt so long
for me,
Thy frankness, truth, thy generosity,
Thy tuneful tongue's persuasive
eloquence,
Thy science, learning, taste, wit,
common sense,
Thy patriot love of genuine liberty,
Thy heart o'erflowing with
philanthropy;
And chiefly will I strive henceforth
to feel
Thy firm religious faith and pious
zeal,
Enlighten'd, liberal, free from
bigotry,
And, that prime excellence, thy
charity.
Farewell!—for ever?—no! forbid it,
Heaven!
A glorious promise is to Christians
given;
Though parted in this world of sin
and pain,
On high, my Brother! we shall meet
again.
LINES TO AN INFIDEL,
AFTER HAVING READ HIS BOOK AGAINST
CHRISTIANITY.
——————
Your book I've read: I would that I
had not!
For what instruction, pleasure, have
I got?
Amid that artful labyrinth of doubt
Long, long I wander'd, striving to
get out;
Your thread of sophistry, my only
clue,
I fondly hoped would guide me
rightly through:
That spider's web entangled me the
more:
With desperate courage onward still
I went,
Until my head was turn'd, my
patience spent:
Now, now, at last, thank God! the
task is o'er.
I've been a child, who whirls himself
about,
Fancying he sees both earth and
heaven turn round;
Till giddy, panting, sick, and
wearied out,
He falls, and rues his folly on the
ground.
LINES
ON HEARING A YOUNG GENTLEMAN, WHO IS
BOTH LAME
AND BLIND, BUT IN OTHER RESPECTS VERY
HANDSOME,
SING AND PLAY ON HIS VIOLIN FOR THE FIRST
TIME.
——————
Crippled his limbs, and sightless are
his eyes;
I view the youth, and feel
compassion rise.
He sings! how sweet the notes! in
pleased amaze
I listen,—listen, and admiring gaze.
Still, as he catches inspiration's fire,
Sweeping with bolder hands the
obedient strings,
That mix, harmonious, with the
strains he sings,
He pours into the music all his soul,
And governs mine with strong, but
soft controul:
Raptured I glow, and more and
more admire.
His mortal ailments I no longer see;
But, of divinities my fancy dreams;
Blind was the enchanting God of
soft desire;
And lame the powerful Deity of fire;
His bow the magic rod of Hermes
seems;
And in his voice I hear the God of
harmony.
LINES TO A PEDANTIC CRITIC.
——————
Critic! should I vouchsafe to learn of
thee,
Correct, no doubt, but cold my
strains would be:
Now, cold correctness!—I despise
the name;
Is that a passport through the gates
of fame?
Thy pedant rules with care I studied
once;
Was I made wiser, or a greater
dunce?
Hence, Critic, hence! I'll study them
no more;
My eyes are open'd, and the folly's
o'er.
When Genius opes the floodgates of
the soul,
Fancy's outbursting tides impetuous
roll,
Onward they rush with unresisted
sway,}
Sweeping fools, pedants, critics, all
away}
Who would with obstacles their
progress stay.}
As mighty Ocean bids his waves
comply
With the great luminaries of the sky,
So Genius, to direct his course
aright,
Owns but one guide, the inspiring
God of light.
LINES ON SHAKSPEARE.
(SUPPOSED TO BE WRITTEN NEAR HIS TOMB.)
——————
Behold! this marble tablet bears
inscribed
The name of Shakspeare!—
What a glorious
theme
For never-ending praise! His
drama's page,
Like a clear mirror, to our wondering
view
Displays the living image of the
world,
And all the different characters of
men:
Still, in the varying scenes, or sad,
or gay,
We take a part; we weep; we
laugh; we feel
All the strong sympathies of real
life.
To him alone, of mortals, Fancy lent
Her magic wand, potent to conjure
up
Ideal Forms, distinctly character'd,
Exciting fear, or wonder, or delight.
The works of Shakspeare! are they
not a fane,
Majestic as the canopy of heaven,
Embracing all created things, a fane
His superhuman genius has
upraised,
To Nature consecrate? The Goddess
there
For ever dwells, and from her
sanctuary,
By Shakspeare's voice, her poet and
high-priest,
Reveals her awful mysteries to man,
And with her power divine rules
every heart.
At Shakspeare's name, then, bow
down all ye sons
Of learning, and of art! ye men,
endow'd
With talent, taste! ye nobler few
who feel
The genuine glow of genius! bow
down all
In admiration! with deep feeling
own
Your littleness, your insignificance;
And with one general voice due
homage pay
To Nature's Poet, Fancy's best-loved
Child!
LINES ON MILTON.
(SUPPOSED TO BE WRITTEN NEAR HIS TOMB.)
——————
Milton!—
the name of that
divinest Bard
Acts on Imagination like a charm
Of holiest power;—with deep,
religious awe
She hails the sacred spot where
sleep entomb'd
The relics that enshrined his godlike
soul.
O! with what heartfelt interest and
delight,
With what astonishment, will all the
sons
Of Adam, till the end of time,
peruse
His lofty, wondrous page! with what
just pride
Will England ever boast her Milton's
name,
The Poet matchless in sublimity!
E'en now in Memory's raptured ear
resound
The deep-toned strains of the
Miltonic lyre;
Inspiring virtuous, heart-ennobling
thought,
They breathe of heaven; the
imaginative Power
No longer treads the guilt-polluted
world,
But soars aloft, and draws empyreal
air:
Rapt Faith anticipates the
judgment-hour,
When, at the Archangel's call, the
dead shall wake
With frames resuscitated, glorified:
Then, then! in strains like these, the
sainted Bard,
Conspicuous mid salvation's earth-
born heirs,
Shall join harmoniously the
heavenly choir,
And sing the Saviour's praise in
endless bliss.
ANACREONTIC.
——————
Still, as the fleeting seasons
change,
From joy to joy poor mortals range,
And as the year pursues its round,
One pleasure's lost, another found;
Time, urging on his envious course,
Still drives them from their last
resource.
So butterflies, when children chase
The gaudy prize with eager pace,
On each fresh flower but just alight,
And, ere they taste, renew their
flight.
Thanks to kind Fortune! I possess
A constant source of happiness,
And am not poorly forced to live
On what the seasons please to give.
Let clouds or sunshine vest the
pole,
What care I, while I quaff the bowl?
In that secure, I can defy
The changeful temper of the sky.
No weatherglass, or if I be,
Thou, Bacchus! art my Mercury.
ANACREONTIC.
——————
Let us, my Friends, our mirth
forbear,
While yonder Censor mounts the
chair:
His form erect, his stately pace,
His huge, white wig, his solemn
face,
His scowling brows, his ken severe,
His haughty pleasure-chiding sneer,
Some high Philosopher declare:—
Hush! let us hear him from the
chair:
'Ye giddy youths! I hate your mirth;
How ill-beseeming sons of earth!
Know ye not well the fate of man?
That death is certain, life a span?
That merriment soon sinks in
sorrow,
Sunshine to-day, and clouds to-
morrow?
Hearken then, fools! to Reason's
voice,
That bids ye mourn, and not
rejoice?'
Such gloomy thoughts, grave Sage!
are thine,
Now, gentle Friends! attend to
mine.
Since mortals must die,
Since life's but a span,
'Tis wisdom, say I,
To live while we can,
And fill up with pleasure
The poor little measure.
Of fate to complain
How simple and vain!
Long faces I hate;
They shorten the date.
My Friends! while ye may,
Be jovial to-day;
The things that will be
Ne'er wish to foresee;
Or, should ye employ
Your thoughts on to-morrow,
Let Hope sing of joy,
Not Fear croak of sorrow.
But see! the Sage flies, so no more.
Now, Friends! drink and sing, as
before.
ANACREONTIC.
——————
Why must Poets, when they sing,
Drink of the Castalian spring?
Sure 'tis chilling to the brain;
Witness many a modern strain:
Poets! would ye sing with fire,
Wine, not water, must inspire.
Come, then, pour thy purple
stream,
Lovely Bottle! thou'rt my theme.
How within thy crystal frame
Does the rosy nectar flame!
Not so beauteous on the vine
Did the clustering rubies shine,
When the potent God of day
Fill'd them with his ripening ray;
When with proudness and delight
Bacchus view'd the charming sight.
Still it keeps Apollo's fires;
Still the vintage-God admires.
Hail sweet antidote of wo!
Chiefest blessing mortals know!
Nay, the mighty powers divine
Own the magic force of wine.
Wearied with the world's affairs,
Jove himself, to drown his cares,
Bids the nectar'd goblet bear:
Lo! the youthful Hebe fair
Pours the living draught around;—
Hark! with mirth the skies resound.
'Tis to wine, for aught I know,
Deities their godship owe;
Don't we mortals owe to wine
Manhood, and each spark divine?
Say, thou life-inspiring Bowl,
Who thy heavenly treasure stole?
Not the hand that stole Jove's fire
Did so happily aspire;
Tell the lucky spoiler's name,
Worthy never-dying fame.
Since it must a secret be,
Him I'll praise, in praising thee.
Glory of the social treat!
Source of friendly converse sweet!
Source of cheerfulness and sense,
Humour, wit, and eloquence,
Courage and sincerity,
Candour and philanthropy!
Source of—O thou bounteous wine!
What the good that is not thine?
Were my nerves relax'd and low?
Did my chill blood toil on slow?
When thy spirit through me flows,
How each vital function glows!
Tuned, my nerves, no longer coy,
Answer to the touch of joy:
On the steams, that from thee rise,
Time on swifter pinions flies;
Fancy gilds them with her rays;
Hope amid the rainbow plays.
But behold! what Image bright
Rises heavenly to my sight!
Could such wondrous charms adorn
Venus, when from ocean born?
Say, my Julia, is it thou,
Ever lovely, loveliest now?
Yet, methinks, the Cyprian Queen
Comes herself, but takes thy mien.
Goddess! I confess thy power,
And to love devote the hour,
Let me but, with grateful soul,
Greet once more the bounteous
Bowl.
SONG.
——————
Ere Reason rose within my breast,
To enforce her sacred law,
Still would some charm, in every
maid,
My veering passions draw.
But now, to calm those gales of
night,
The morn her light displays;
The twinkling stars no more I view,
For only Venus sways:
The spotless heaven of genuine
love
Unveil'd I wondering see,
And all that heaven, transported,
claim
For Julia and for me.
SONG.
——————
Yes, I could love, could softly yield
To passion all my willing breast,
And fondly listen to the voice
That oft invites me to be blest;
That still, when Fancy, lost in bliss,
Stands gazing on the form divine,
So sweetly whispers to my soul,
O make the heavenly Julia thine!
But hush, thou fascinating voice!
Hence visionary extacy!
Yes, I could love, but ah! I fear
She would not deign to smile on
me.
SONG TO BACCHUS.
——————
Come along, jolly Bacchus! no longer delay;
See'st thou not how the table with bottles is
crown'd?
See'st thou not how thy votaries, impatient to
pay
Their devotion to thee, are all waiting around?
O come then, propitious to our invocation,
To preside of thy rites at the solemnization.
Hark! the voice of Champagne, from its prison
set free,
And the music of glasses that merrily ring,
Thy arrival announce, and invite us to glee;
With what gladness we welcome thee, vine-
crowned King!
To honour thee, Bacchus! we pour a libation,
And the lofty roof echoes our loud salutation.
On that wine-loaded altar, erected to thee,
Sherry, burgundy, claret, invitingly shine;
While all thy rich gifts thus collected we see,
We greet thy munificence boundless, divine.
From these we already inhale animation,
Our hearts and heads warmth, and our souls
elevation.
As thy nectar, kind Bacchus! more copiously
flows,
We purge off the cold dregs that are earthy,
profane;
Each breast with thy own godlike character
glows;
There truth, generosity, happiness reign.
Hail Bacchus! we hail thee in high exultation;
Thou hast blest us, kind God! with thy full
inspiration.
ON SEEING THE APOLLO
BELVIDERE.
——————
What majesty! what elegance and
grace!

Methods Of Microarray Data Analysis Iii Papers From Camda 02 1st Edition Simon M Lin

  • 8. METHODS OF MICROARRAY DATA ANALYSIS III
Papers from CAMDA ’02
edited by
Kimberly F. Johnson, Cancer Center Information Systems, Duke University Medical Center, Durham, NC
Simon M. Lin, Duke Bioinformatics Shared Resource, Duke University Medical Center, Durham, NC
KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
  • 9. eBook ISBN: 0-306-48354-8
Print ISBN: 1-4020-7582-0
Print ©2003 Kluwer Academic Publishers
All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.
Created in the United States of America
Boston ©2004 Springer Science + Business Media, Inc.
Visit Springer's eBookstore at: http://www.ebooks.kluweronline.com
and the Springer Global Website Online at: http://www.springeronline.com
  • 10. Contents

Contributing Authors
Preface
Introduction

SECTION I TUTORIALS
THE BIOLOGY BEHIND GENE EXPRESSION: A BASIC TUTORIAL
MICHAEL F. OCHS AND ERICA A. GOLEMIS
MONITORING THE QUALITY OF MICROARRAY EXPERIMENTS
KEVIN R. COOMBES, JING WANG, LYNNE V. ABRUZZO
OUTLIERS IN MICROARRAY DATA ANALYSIS
RONALD K. PEARSON, GREGORY E. GONYE, AND JAMES S. SCHWABER

SECTION II BEST PRESENTATION AWARD
ORGAN-SPECIFIC DIFFERENCES IN GENE EXPRESSION AND UNIGENE ANNOTATIONS DESCRIBING SOURCE MATERIAL
DAVID N. STIVERS, JING WANG, GARY L. ROSNER, AND KEVIN R. COOMBES

SECTION III ANALYZING IMAGES
  • 11. CHARACTERIZATION, MODELING, AND SIMULATION OF MOUSE MICROARRAY DATA
DAVID S. LALUSH
TOPOLOGICAL ADJUSTMENTS TO GENECHIP EXPRESSION VALUES
ANDREY PTITSYN

SECTION IV NORMALIZING RAW DATA
COMPARISON OF NORMALIZATION METHODS FOR CDNA MICROARRAYS
LILING WARREN, BEN LIU

SECTION V CHARACTERIZING TECHNICAL AND BIOLOGICAL VARIANCE
SIMULTANEOUS ASSESSMENT OF TRANSCRIPTOMIC VARIABILITY AND TISSUE EFFECTS IN THE NORMAL MOUSE
SHIBING DENG, TZU-MING CHU, AND RUSS WOLFINGER
HOW MANY MICE AND HOW MANY ARRAYS? REPLICATION IN MOUSE CDNA MICROARRAY EXPERIMENTS
XIANGQIN CUI AND GARY A. CHURCHILL
BAYESIAN CHARACTERIZATION OF NATURAL VARIATION IN GENE EXPRESSION
MADHUCHHANDA BHATTACHARJEE, COLIN PRITCHARD, MIKKO J. SILLANPÄÄ AND ELJA ARJAS

SECTION VI INVESTIGATING CROSS HYBRIDIZATION ON OLIGONUCLEOTIDE MICROARRAYS
QUANTIFICATION OF CROSS HYBRIDIZATION ON OLIGONUCLEOTIDE MICROARRAYS
LI ZHANG, KEVIN R. COOMBES, LIANCHUN XIAO
ASSESSING THE POTENTIAL EFFECT OF CROSS-HYBRIDIZATION ON OLIGONUCLEOTIDE MICROARRAYS
SEMAN KACHALO, ZAREMA ARBIEVA AND JIE LIANG
WHO ARE THOSE STRANGERS IN THE LATIN SQUARE?
WEN-PING HSIEH, TZU-MING CHU, AND RUSS WOLFINGER
  • 12. SECTION VII FINDING PATTERNS AND SEEKING BIOLOGICAL EXPLANATIONS
BAYESIAN DECOMPOSITION CLASSIFICATION OF THE PROJECT NORMAL DATA SET
T. D. MOLOSHOK, D. DATTA, A. V. KOSSENKOV, M. F. OCHS
THE USE OF GO TERMS TO UNDERSTAND THE BIOLOGICAL SIGNIFICANCE OF MICROARRAY DIFFERENTIAL GENE EXPRESSION DATA
RAMÓN DÍAZ-URIARTE, FÁTIMA AL-SHAHROUR, AND JOAQUÍN DOPAZO

Acknowledgments
Index
  • 14. Contributing Authors

Abruzzo, Lynne V., University of Texas M.D. Anderson Cancer Center, Houston, TX
Al-Shahrour, Fátima, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Centre), Madrid, Spain
Arbieva, Zarema, University of Illinois at Chicago, Chicago, IL
Arjas, Elja, Rolf Nevanlinna Institute, University of Helsinki, Finland
Bhattacharjee, Madhuchhanda, Rolf Nevanlinna Institute, University of Helsinki, Finland
Chu, Tzu-Ming, SAS Institute, Cary, NC
Churchill, Gary A., The Jackson Laboratory, Bar Harbor, Maine
Coombes, Kevin R., University of Texas M.D. Anderson Cancer Center, Houston, TX
Cui, Xiangqin, The Jackson Laboratory, Bar Harbor, Maine
Datta, D., Fox Chase Cancer Center, Philadelphia, PA
Deng, Shibing, SAS Institute, Cary, NC
Díaz-Uriarte, Ramón, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Centre), Madrid, Spain
Dopazo, Joaquín, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Centre), Madrid, Spain
Golemis, Erica A., Fox Chase Cancer Center, Philadelphia, PA
Gonye, Gregory E., Thomas Jefferson University, Philadelphia, PA
Hsieh, Wen-Ping, North Carolina State University, Raleigh, NC
Kachalo, Seman, University of Illinois at Chicago, Chicago, IL
Kossenkov, A. V., Fox Chase Cancer Center, Philadelphia, PA and Moscow Physical Engineering Institute, Moscow, Russian Federation
Liang, Jie, University of Illinois at Chicago, Chicago, IL
Liu, Ben, Bio-informatics Group Inc., Cary, NC
Moloshok, T. D., Fox Chase Cancer Center, Philadelphia, PA
Ochs, Michael F., Fox Chase Cancer Center, Philadelphia, PA
Pearson, Ronald K., Thomas Jefferson University, Philadelphia, PA
  • 15. Pritchard, Colin, Fred Hutchinson Cancer Research Centre, Seattle, WA
Ptitsyn, Andrey, Pennington Biomedical Research Center
Rosner, Gary L., University of Texas M.D. Anderson Cancer Center, Houston, TX
Schwaber, James S., Thomas Jefferson University, Philadelphia, PA
Sillanpää, Mikko J., Rolf Nevanlinna Institute, University of Helsinki, Finland
Stivers, David N., University of Texas M.D. Anderson Cancer Center, Houston, TX
Wang, Jing, University of Texas M.D. Anderson Cancer Center, Houston, TX
Warren, Liling, Bio-informatics Group Inc., Cary, NC
Wolfinger, Russ, SAS Institute, Cary, NC
Xiao, Lianchun, The University of Texas MD Anderson Cancer Center, Houston, TX
Zhang, Li, The University of Texas MD Anderson Cancer Center, Houston, TX
  • 16. Preface

As microarray technology has matured, data analysis methods have advanced as well. However, microarray results can vary widely from lab to lab as well as from chip to chip, with many opportunities for errors along the path from sample to data. The third CAMDA conference, held in November 2002, pointed out the increasing need for data quality assurance mechanisms through real-world problems with the CAMDA datasets. Thus, the third volume of Methods of Microarray Data Analysis emphasizes many aspects of data quality assurance. We highlight three tutorial papers to assist with a basic understanding of underlying principles in microarray data analysis, and add twelve papers presented at the conference.

As editors, we have not comprehensively edited these papers, but have provided comments to the authors to encourage clarity and expansion of ideas. Each paper was peer-reviewed and returned to the author for further revision.

We do not propose these methods as the de facto standard for microarray analysis; rather, we present them as starting points for discussion to further the science of microarray data analysis. The CAMDA conference continues to bring to light problems, solutions and new ideas in this arena and offers a forum for continued advancement of the art and science of microarray data analysis.

Kimberly Johnson
Simon Lin
  • 18. Introduction

A comparative study of analytical methodologies using a standard data set has proven fruitful in microarray analysis. To provide a forum for these comparisons the third Critical Assessment of Microarray Data Analysis (CAMDA) conference was held in November, 2002. Over 170 researchers from eleven countries heard twelve presentations on topics such as data quality analysis, image analysis, data normalization, expression variance, cross hybridization and pattern searching.

The conference has evolved in its third year, just as the science of microarrays has developed. While initial microarray data analysis techniques focused on classification exercises (CAMDA ’00), and later on pattern extraction (CAMDA ’01), this year’s conference, by necessity, focused on data quality issues. This shift in focus follows the maturation of microarray technology as the detection of data quality problems has become a prerequisite for data analysis. Problems such as background noise determination, faulty fabrication processes, and, in our case, errors in data handling, were highlighted at the conference. The CAMDA ’02 conference provided a real-world lesson on data quality control and saw significant development of the cross-hybridization models.

In this volume, we present three tutorial chapters and twelve paper presentations. First, Michael Ochs and Erica Golemis present a tutorial called “The Biology Behind Gene Expression.” This discussion is for non-biologists who want to know more about an intelligent machine called the cell. This machinery is extremely complex and a glossary in this tutorial provides the novice with an overview of important terms related to microarrays while the rest of the paper details the biological processes that impact microarray analysis. Next is a tutorial on methods of data quality control by Kevin Coombes. We invited Dr. Coombes to submit this tutorial,
  • 19. titled “Monitoring the Quality of Microarray Experiments,” as an expansion of his presentation at the conference, which is also prominently featured. The last tutorial, by Ronald Pearson, is titled “Outliers in Microarray Data Analysis.” This tutorial addresses the issue of quality control by identifying outliers and suggesting methods to deal with technical and biological variations in microarray data.

As always, we are happy to highlight the paper voted by attendees as the Best Presentation. This year, the award went to:

David N. Stivers, Jing Wang, Gary L. Rosner, and Kevin R. Coombes
University of Texas M.D. Anderson Cancer Center, Houston, TX
“Organ-specific Differences in Gene Expression and UniGene Annotations Describing Source Material”

for their rigorous scrutiny of data quality before starting data analysis. Presented by Kevin Coombes, their paper not only revealed the existence of errors in the Project Normal data set, but also specified the exact nature of the problems and included the methods used to detect these problems. See below for more details on these data set errors.

CAMDA 2002 Data Sets

The scientific committee chose two data sets for CAMDA ’02. The first, called Project Normal, came from The Fred Hutchinson Cancer Center, and it showed the variation of baseline gene expression levels in the liver, kidney and testis of six normal mice. By using a 5406-clone spotted cDNA microarray, Pritchard et al. concluded that replications are necessary in microarray experiments.

The second data set came from the Latin Square Study at Affymetrix Inc. This benchmark data set was created to develop statistical algorithms for microarrays. Sets of fourteen genes with known concentrations were spiked into a complex background solution and hybridized on Affymetrix chips. Data was obtained with replicates and both Human and E. coli chips were studied.
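To make the idea of a Latin square spike-in design concrete, here is a minimal sketch in Python. It is an illustration only, not the actual Affymetrix protocol: the function names are hypothetical, the concentration values are assumed for the example, and the cyclic construction is just one common way to build a Latin square. The point is that each of the fourteen spike-in sets sees each concentration level exactly once across fourteen experiments.

```python
def cyclic_latin_square(levels):
    """Build an n x n Latin square by cyclically shifting `levels`.

    Row i stands for experiment i, column j for spike-in set j.
    Every level appears exactly once in each row and each column.
    """
    n = len(levels)
    return [[levels[(i + j) % n] for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """Check that every row and every column contains each symbol once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


# Assumed concentrations (arbitrary units), chosen only for illustration;
# the study's real values are not given in this text.
concentrations = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
design = cyclic_latin_square(concentrations)
```

Because the spiked concentration of every set is known in every experiment, a design like this gives a ground truth against which a statistical algorithm's estimates can be compared.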
As mentioned above, there were errors in the Project Normal dataset that were undetected until CAMDA abstracts were submitted. Once we received the Stivers et al. abstract, we asked the original Project Normal authors to confirm their findings. The errors in the data set were verified and after much discussion among the Scientific Committee members, a decision was made to keep the contest going to allow the Stivers group to report and discuss their finding of data abnormalities at the conference. Actually, many groups revealed various aspects of the data abnormalities, but the Stivers group not only realized that a problem existed, but also identified the
  • 20. specific problem. Colin Pritchard, representing Project Normal, confirmed that the problems in the dataset were indeed a result of incorrectly merging the data with the annotations, resulting in mismatched row/column combinations. In addition, several slides have a small number of misaligned grids. These problems affected about 1/3 of the genes (though different sets) in the testis and liver data. Pritchard also noted that a re-analysis of the data with the corrected data sets showed that the results were not notably different from the original conclusions. For the record, both the original and the corrected data sets are available at the CAMDA conference website for researchers who might be interested in “data forensics”.

We extend our thanks to Colin Pritchard, Li Hsu, and Peter Nelson at the Fred Hutchinson Cancer Center for their assistance and professionalism in handling this discovery and allowing the conference to proceed as planned. They were most gracious in their contributions to the conference.

Organization of this Volume

After presenting the three tutorial papers, naturally the first conference paper is the one voted as Best Presentation. We then divide the book into subject areas covering image analysis, data normalization, variance characterization, cross hybridization, and finally pattern searching. At the end of this introduction, you will also find a link to the web companion to this volume.

Analyzing Images

Raw microarray data first exists as a scanned image file. Differences in spot size, non-uniformity of spots, heterogeneous backgrounds, dust and scratches all contribute to variations at the image level. In Chapter 5, David Lalush characterizes such parameters and discusses ways to simulate additional microarray images for use in developing image analysis algorithms. On the Affymetrix platform, hybridization operators have observed that the images tend to form some kind of mysterious pattern.
In Chapter 6, Andrey Ptitsyn argues that there indeed is a background pattern. He further postulates that the pattern might be caused by the fluid dynamics in the hybridization chamber.

Normalizing Raw Data

Normalization has been recognized as a crucial step in data pre-processing. Do some mathematical operations truly allow us to remove the systematic variation that might skew our analysis, or are we distorting the data to create illusions? The paper by Liling Warren and Ben Lu investigated seven different ways to normalize microarray data. Results show that normalization has a greater impact than expected on detecting differential expression: the same downstream detection method can report anywhere from 23 to 451 genes, depending on the pre-processing of the data. Suggestions to guide researchers in the normalization process are provided.

Characterizing Technical and Biological Variance

The Project Normal paper [PNAS 98:13266-77, 2001] showed us that even for animals under ‘normal’ conditions, gene expression levels do fluctuate from one animal to another. This biological variation complicates the final genetic variation we find on the microarray. The microarray can also include technical variations produced during the measurement process. Deng et al. describe a two linear mixed model to assess variability and significance in Chapter 8. Using a similar mixed model approach, Cui et al. calculate the number of replicates needed to detect certain changes. This is of great interest to experimental biologists. Usually, resources are limited, either in the total number of microarrays for financial reasons or in the number of cells we can obtain. The optimal resource allocation formula by Cui et al. lets us answer questions such as: Should we use more mice or more arrays? Should we pool mice? Chapter 9 provides some answers to these questions.

Estimating location and scale from experimental measurements has been one of the major themes in statistics. Most of the previous work on microarrays focused on the classification of the expression changes under different conditions. Bhattacharjee et al.
investigated the classification of intrinsic biological variance of gene expression. By using a Bayesian framework, the authors support the hypothesis that some genes by nature exhibit highly varied expression. This work is featured in Chapter 10.

Investigating Cross Hybridization on Oligonucleotide Microarrays

Quantitative binding of genes on the chip surface is a fundamental issue of microarrays [Nature Biotechnology 17:788-792, 1999]. Characterizing
specific versus non-specific binding on the chip surface has been important yet understudied. The Affymetrix Latin Square data set provides an excellent opportunity for such studies. All three papers in the next section of this volume address this issue.

In 2002, Zhang, Miles, and Aldape developed a free energy model of binding on microarrays [Nature Biotechnology 2003, In Press]. This mechanistic model is a major development since the Li-Wong statistical model [PNAS 98:31-36, 2001] for hybridization. At CAMDA ’02, Zhang et al. demonstrated an application of this model to identify spurious cross-hybridization signals. Their work is highlighted in Chapter 11.

Kachalo et al. suggest in Chapter 12 that a match of 7 to 8 nucleotides could potentially contribute to non-specific binding. This observation not only provides clues to the interpretation of hybridization results but also assists in the future design of oligos.

The same cross-hybridization issue also caught the attention of statisticians. Hsieh et al. exposed cross-hybridization problems by studying outliers in the data set. In this case, the cause of cross-hybridization was found to be fragments matching the spiked-in genes. In summary, these three papers expound on potential cross-hybridization problems on microarrays and suggest some solutions.

Finding Patterns and Seeking Biological Explanations

The final section of this volume focuses on utilizing the Gene Ontology (GO) as an explanatory tool, though the final two papers differ in how to group the genes. Moloshok et al. modified their Bayesian decomposition algorithm to identify the patterns in gene expression and to specify which gene belongs to which pattern. The Bayesian framework not only allows encoding prior information in a probabilistic way, but also naturally allows genes to be assigned to multiple classes. In the final chapter, Diaz-Uriarte et al.
use GO to obtain biological information about genes that are differentially expressed between organs in the Project Normal data set. The techniques incorporate a number of statistical tests for possible identification of altered biochemical pathways in different organs.

Summary

The CAMDA ’02 conference again brought together a diverse group of researchers who provide many new perspectives to the study of microarray data analysis. At previous CAMDA conferences we studied expression patterns of yeast and cancers. The most recent CAMDA conference took a step back to find variations in the data and possible problems. Our next
CAMDA conference (’03) will focus on data acquired at different academic centers and the problems in combining that data.

Web Companion

Additional information for many of these chapters can be found at the CAMDA website, which provides links to algorithms, color versions of several figures, and conference presentation slides. Information about future CAMDA conferences is available at this site as well. Please check the website regularly for the call for papers and announcements about the next conference.

http://camda.duke.edu

1

THE BIOLOGY BEHIND GENE EXPRESSION: A BASIC TUTORIAL

Michael F. Ochs* and Erica A. Golemis
Division of Basic Science, Fox Chase Cancer Center, Philadelphia, PA

Abstract: Microarrays measure the relative levels of gene expression within a set of cells isolated through an experimental procedure. Analysis of microarray data requires an understanding of how the mRNA measured with a microarray is generated within a cell, how it is processed to produce a protein that carries out the work of the cell, and how the creation of the protein relates to the changes in the cellular machinery, and thus to the phenotype observed.

Key words: Transcription, translation, signaling pathway, post-translational modification

1. INTRODUCTION

This tutorial will introduce key concepts on cellular response and transcriptional activation to non-biologists with the goal of providing a context for the use and study of microarrays. The field of gene regulation, including both transcriptional control (the regulation of the conversion of DNA to messenger RNA) and translational control (the regulation of the conversion of messenger RNA to protein), is immense. Although not yet complete, our understanding of these areas is deep and this tutorial can only touch upon the basics. A deeper understanding can be gained through following the references given within the text, which are focused primarily on recent reviews, or through one of the standard textbooks, such as Molecular Biology of the Cell [Alberts et al., 2002]. It should be noted that we are focusing on eukaryotic animal organisms within this tutorial, and not all aspects of the processes discussed will apply to plants or prokaryotes, such as bacteria.

* author to whom correspondence should be addressed

The typical microarray experiment aims to distinguish the differences in gene expression between different conditions. These can be between different time points during a biological process, or between different tissue or tumor types. While standard statistical approaches can provide an estimate of the reliability of the observed changes in the messenger RNA levels of genes, interpretation and understanding of the significance of these changes for the biological system under study requires far more detailed knowledge. Of particular importance is the issue of nonlinearity in biological systems. For any analysis beyond the simple comparison of measurements between two conditions, the fact that the system being studied is a dynamic nonlinear system implies that the analysis must take into account the interactions present in the system. This is only possible if the system is understood at a non-superficial level.

The nomenclature surrounding the biological structures and processes involved in transcription and translation is substantial. The following table contains a glossary of terms that can be referenced as needed.

Table 1. A glossary of key terms used for describing cellular processes of transcription and translation. For human genes and proteins, the convention is to use capital italics (e.g., BRCA1) for the gene and capitals for the protein (e.g., BRCA1). Unfortunately the conventions are organism specific.

GLOSSARY

Transcription

DNA: deoxyribonucleic acid, a polar double-stranded molecule used for the fundamental storage of hereditary information and located in the nucleus (polarity is defined as 5’ – 3’)
GATC: guanine, adenine, thymidine, cytidine: the 4 “bases” that comprise the DNA strands
gene: fundamental DNA unit for production of a protein, transcribed along a specific DNA sequence
transcription: the process by which a gene encoded in DNA is converted into single-stranded pre-mRNA
RNA: ribonucleic acid, a single stranded molecule with multiple uses within the cell
promoter: a region of DNA proximal to the 5’ end of a gene, where proteins bind to initiate transcription of a gene
enhancer: a region of DNA where binding of transcriptional regulatory proteins alters the level of transcription of a gene
transcriptional activator: a protein or protein complex that binds to a promoter or enhancer to induce transcription of a gene
transcriptional suppressor: a protein or protein complex that binds to a promoter or enhancer to block transcription of a gene
chromatin: higher-order compaction structure for DNA, based on assembly of DNA around histones in nucleosomes
chromatin remodeling: a process wherein the phasing of nucleosomes is altered, altering the access of transcription factors to the DNA
histones: proteins that provide structural elements for the creation of chromatin
coactivator: proteins that modify histones allowing chromatin remodeling
basal transcriptional apparatus: a protein complex that binds to a transcriptional initiation site and performs conversion of DNA into mRNA
mRNA: messenger RNA, a form of RNA created from DNA generally containing a 5’ untranslated sequence, a protein coding region, and a 3’ untranslated sequence
rRNA: ribosomal RNA, a form of RNA that associates in a complex with ribosomal proteins, which constitute the structural elements of the ribosome
tRNA: transfer RNA, the form of RNA that recognizes individual codons on an mRNA and brings the amino acid specified by that codon for incorporation into a polypeptide
codon: a three nucleotide sequence (RNA or DNA) that specifies a specific amino acid to insert in a protein

mRNA – associated terms

exon: a portion of genomic DNA encoding a sequence of amino acids to be incorporated in a protein (coding region)
intron: a portion of genomic DNA that does not code for a sequence of amino acids to be placed in a protein, but intervenes between exons within a gene (noncoding region)
pre-mRNA: the mRNA polymer that is initially transcribed from DNA, which contains both exons and introns
splicing: the process by which an mRNA is modified to remove introns and sometimes vary the exons included in the mRNA prior to translation
poly-adenylation: the addition of approximately 20-50 adenine residues to the end of an mRNA sequence
splice variants: different mRNAs created from the same DNA sequence through different splicing
mRNA export receptor: a protein that transports the mRNA from the nucleus to the cytoplasm
mRNA export sequence: a portion of mRNA that is recognized by an export receptor allowing the mRNA to be exported from the nucleus
miRNA: micro-RNA, a short RNA (<22 nucleotides) encoded within the genome that can bind to a specific mRNA and block its translation into a protein or alter its stability
siRNA: small interfering RNA, a short, artificial RNA designed to bind a specific mRNA and block its translation into a protein, this is now widely used for gene knockout studies

Translation

amino acid: one of 20 specific molecules used within all living creatures to construct proteins, containing a common core (N-C-COOH) and varying sidechains attached to the central carbon
amino acid residue: the term used to refer to an amino acid following insertion within a polypeptide chain
polypeptide chain: a series of linked amino acid residues produced during the process of translation
protein: a completed polypeptide chain that folds into a three dimensional structure to provide a cellular function specified by a gene
ribosome: a complex of proteins and rRNA that builds a polypeptide chain based on the codons in an mRNA
chaperone: a large molecular machine that assists a polypeptide chain in proper folding into a protein

Proteins

dimer: a small complex formed by the non-covalent association of two proteins
homodimer: a dimer that contains two identical subunits (i.e., is made from two copies of the same protein)
heterodimer: a dimer that contains two different subunits (i.e., is made from two different proteins)
kinase: a protein that adds a phosphate group to specific amino acid residues in a protein in a process called phosphorylation
phosphatase: a protein that removes a phosphate group from specific amino acid residues in a protein
scaffolding protein: a protein that binds multiple other proteins into a specific conformation, enhancing or otherwise controlling their interactions
ubiquitination: the addition of one or more copies of ubiquitin (a short peptidyl sequence) to a protein, either targeting a protein for degradation or contributing to control of protein localization
proteasome: a very large protein complex that degrades ubiquitinated proteins

Signaling

signal: an abstraction of protein modifications indicating the transmission of information within a cell
signal transduction: the use of programmed, frequently sequential changes in protein interaction, modification, and activity status to transmit information within a cell in a “signaling cascade”
signaling activator: a protein that modifies a second protein activating the function of that protein
signaling inhibitor: a protein that modifies a second protein suppressing the function of that protein
receptor: a membrane bound protein that can receive a signal of extracellular origin to activate a signaling cascade
ligand: a small protein, peptide, or hormone that binds to a cognate receptor to initiate signaling
signaling pathway: a group of proteins that provide the physical mechanism for a signaling cascade, associated with a specific biological response
junction: a point where multiple signals combine and can be integrated
node: a point where a single signal diverges and can provide input into multiple downstream points
signaling network: a group of signaling pathways linked together at junctions and nodes creating a nonlinear response system

Cellular Structures

membrane: a lipid bilayer separating compartments, such as the outer cell membrane that separates a cell from its environment
nucleus: a region separated from the rest of the cell by a membrane and containing the DNA
cytoplasm: the portion of the cell outside the nucleus and containing numerous smaller structures
ER, endoplasmic reticulum: a membrane-based cellular structure that serves multiple functions including localization of some ribosomes, and processing of membrane-associated and secreted proteins
Golgi apparatus: a cellular structure that receives proteins following passage through the ER, providing for additional protein modification and transport
cytoskeleton: the complex of multiple structural proteins (e.g., actin, tubulin) that provide structural integrity to the cell
ECM, extracellular matrix: the collection of structural proteins secreted by cells, and forming an external “mesh”, that contributes to cell shape control, survival functions, and signaling processes

2. CELLULAR RESPONSES AND SIGNALING PATHWAYS

In order to survive, grow, and interact in complex organisms, cells must interpret signals coming from both the external and internal environments. Most behaviors, and all complex responses, require signaling pathways that encode within the cell the control mechanism for response behaviors. A simple example of such a system is shown in Figure 1. The yeast Saccharomyces cerevisiae has developed a signaling pathway that responds to external signals in order to initiate a mating response [Posas et al., 1998]. The external signal is detected by a transmembrane protein, the receptor Ste2p.
When a ligand specific for Ste2p produced by a cell of a different mating type binds to this receptor, a signaling cascade is induced within the cytoplasm of the yeast cell. The conformational change induced by the ligand binding first leads to the cleavage and activation of a G protein, which in turn activates Ste20p, a protein kinase. Ste20p in turn activates Ste11p, which activates Ste7p, which activates Fus3p, which activates a transcription factor, Ste12p. The transcription factor then initiates transcription within the
nucleus leading to the creation of messenger RNA (mRNA). Only the end result of this cascade is measured with a microarray. A set of proteins functionally linked in a single pathway is described as a signaling cascade or pathway, with the process of movement of information through a signaling cascade described as signal transduction.

It is immediately obvious that the cell is going to a great deal of trouble to respond to a simple signal, introducing several intermediaries between signal (ligand binding) and response (transcription). However, one reason for this complexity is that it allows encoding of complex behavior by the creation of signaling networks [Jordan et al., 2000]. A signaling network is created by the intersection of multiple signaling pathways, with these intersections generally occurring at multiple points within each separate pathway. The result is a highly nonlinear system capable of responding in multiple ways to a signal depending on the state of the overall network and the presence of signals affecting different pathways.

The structure of a representative network is shown in Figure 2. This is a highly simplified portrayal of an important pathway in human cancer
involving the cancer-inducing gene RAS (italics for a gene); see the review by Kolch for a full discussion [Kolch, 2000]. In normal cells, RAS (no italics for a protein) is activated by an external stimulus through the epidermal growth factor receptor and other growth factors, and then it interacts with numerous other proteins to determine a cellular response. There are multiple, distinct receptors that may be activated in response to different stimuli, and each receptor may activate multiple pathways.

The network is composed of junctions, where multiple signals come together (e.g., RAF and RAC in Figure 2), and nodes, where a single signal can branch to multiple pathways (e.g., AKT and RAS in Figure 2). In addition, for any component within a pathway, there can be both activators (often kinases) and inhibitors (often phosphatases). For this diagram, the inhibitors and activators are not separated. Additionally, there is generally feedback within the network, either through proteins created as a result of the transcriptional activators (circles in Figure 2) or directly from loops in the network (AKT to RAF together with AKT to RAC to PAK to RAF). A junction can therefore represent a point where the result of one pathway’s activation is the activation of a kinase, while the result of a second pathway’s activation is activation of a phosphatase, both targeted at the same protein. A node may be a kinase that phosphorylates multiple proteins.
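
The junction and node vocabulary above maps naturally onto a directed graph. The following is a minimal sketch, not a transcription of Figure 2: the first four edges are the ones named in the text (AKT to RAF, AKT to RAC, RAC to PAK, PAK to RAF), while the edges involving RAS and PI3K, and the function names, are illustrative assumptions added so that the example has more than one input path.

```python
# Directed edges written as (upstream, downstream).
# The first four edges are named in the text; the remaining four are
# illustrative assumptions and are NOT read off Figure 2.
EDGES = [
    ("AKT", "RAF"), ("AKT", "RAC"), ("RAC", "PAK"), ("PAK", "RAF"),
    ("RAS", "RAF"), ("RAS", "PI3K"), ("PI3K", "AKT"), ("PI3K", "RAC"),
]


def degree_counts(edges):
    """Count incoming and outgoing edges for every protein."""
    incoming, outgoing = {}, {}
    for upstream, downstream in edges:
        outgoing[upstream] = outgoing.get(upstream, 0) + 1
        incoming[downstream] = incoming.get(downstream, 0) + 1
    return incoming, outgoing


def junctions(edges):
    """Proteins where several signals converge (more than one input)."""
    incoming, _ = degree_counts(edges)
    return sorted(p for p, n in incoming.items() if n > 1)


def nodes(edges):
    """Proteins where one signal diverges (more than one output)."""
    _, outgoing = degree_counts(edges)
    return sorted(p for p, n in outgoing.items() if n > 1)


def all_paths(edges, source, target, path=None):
    """Enumerate every route from source to target without revisiting a protein."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for upstream, downstream in edges:
        if upstream == source and downstream not in path:
            found.extend(all_paths(edges, downstream, target, path))
    return found


if __name__ == "__main__":
    print("junctions:", junctions(EDGES))
    print("nodes:", nodes(EDGES))
    print("AKT to RAF routes:", all_paths(EDGES, "AKT", "RAF"))
```

In this toy graph RAF and RAC come out as junctions and AKT and RAS as nodes, in line with the examples in the text; PI3K also appears as a node only because of the assumed edges. The two AKT to RAF routes correspond to the direct and the RAC/PAK-mediated connections mentioned above.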

In order for kinases and phosphatases to operate, they must be in close proximity to the target protein. This adds an additional degree of complexity, as cellular localization becomes a key issue. While some localization signals are encoded by the signaling proteins themselves, a set of proteins, known as scaffolding proteins (e.g., Ste5 in Figure 1), has evolved to keep the necessary components of a signaling pathway together [Tzivion et al., 2001]. Scaffolding proteins may not always be absolutely necessary for signal transduction; however, they increase the efficiency of signaling. And, as noted above, a signaling network is a complex nonlinear system, so that signal strength can play a major role in the outcome of signal initiation. As multiple signals arrive at a junction, the response of that junction will depend on the relative strength of the various signals and the present overall state of the cell (reflected as levels of kinase and phosphatase activity in the local environment, protein interaction profiles, etc.). Each junction then transduces a signal, until a physiological response occurs. This response can include changes in cellular movement, intracellular transport or metabolism, protein state (through degradation or modification), or transcriptional and translational responses. While transcriptional activation is of interest here, it is important to note not only that it is only one of a myriad of potential changes occurring after activation of signaling, but also that transcription is generally a late, downstream indicator of activity.

3. TRANSCRIPTION

Since gene expression is a downstream indicator of activity, it is important to understand the basics of how the mammalian cell converts its 3,000,000,000 bases of DNA into usable, small messenger RNA (mRNA). Within cells, DNA is organized into chromatin. The double-stranded DNA is wound into coils around protein complexes, composed of proteins called histones.
While it has long been understood that chromatin must be remodeled (i.e. unwound and rewound in a different pattern) as part of initiating transcription, it has only lately become clear that the histones play a larger role in regulation of expression [Berger, 2002]. There are numerous proteins that modify histone structure, through mechanisms including acetylation, methylation, phosphorylation, and ubiquitination of the histone proteins. The proteins that modify histones, called coactivators, are generally recruited to the DNA by transcriptional activators, which bind the DNA directly [Featherstone, 2002]. Transcriptional activators bind the
DNA at specific sequences, called promoters and enhancers. Promoters lie upstream of the gene, usually within a few kilobases of the start site, while enhancers can occur both upstream and downstream from the gene and can lie farther from the gene itself.

The overall machinery governing transcription is still an area of very active study. The signal to transcribe a gene is first transduced to the nucleus through a signaling pathway activating a transcription factor. This can be done directly by proteins such as nuclear receptors, which are capable of directly entering the nucleus and initiating transcription [Dilworth et al., 2001] following their activation in the cytoplasm, or indirectly by propagation of signals through a chain of protein interactions culminating in activation of a nuclear transcription factor. The transcription factor must then activate the basal transcriptional apparatus, which includes the key TATA-box binding protein and RNA polymerase II [Green, 2000]. Once the transcriptional activator is active in the nucleus, it binds to DNA at its promoter. The transcription factor then recruits the coactivators, which leads to chromatin remodeling, and activates the transcriptional apparatus, following which transcription begins. The transcription process is itself complex, requiring unwinding and opening of the DNA and translocation of the complex along the DNA [Reines et al., 1999], but we will not detail these issues here.

While much is shared between prokaryotic organisms, such as bacteria, and eukaryotic organisms, such as ourselves, there are significant differences in the transcriptional apparatus and structure of DNA. Within prokaryotes the genes are arranged on the chromosomes in a linear fashion, each gene consisting of a continuous DNA sequence that is transcribed into a corresponding mRNA sequence.
There tend to be only short sequences of DNA between coding regions, and these sequences include the promoters. Within higher eukaryotes, a gene generally contains both introns (sequences which are not contained in the final mRNA) and exons (sequences that remain in the final mRNA). Furthermore, the intergenic DNA becomes significantly larger, with tens to hundreds of kilobases between transcribed units, and individual genes spread over considerable distances. The mRNA of a gene is transcribed first as a long sequence containing both introns and exons, and then spliced to a continuous message composed only of exons [Sharp, 1994]. This introduces the possibility of variant proteins encoded within a single “gene” in the DNA through splice variants, where different exons are combined to produce a final mRNA transcript.

The spliced mRNA must then be transported out of the nucleus for translation into a polypeptide. This is done through a highly conserved export receptor that appears to be coupled to the mRNA splicing machinery upstream [Reed et al., 2002]. The export mechanism is not well understood,
but it appears that mRNAs have multiple export paths that depend on specific export sequences encoded within the mRNA [Stutz et al., 1998].

An additional step in the transcriptional machinery was discovered recently. The existence of small lengths of RNA capable of silencing the translation of mRNA was first noted in plants, and later confirmed to be present in all organisms [McManus et al., 2002]. Essentially, functional codes for extremely short lengths of RNA appear to be part of our genetic structure. These micro RNAs (miRNAs) get converted within our cells to single-stranded RNA units of approximately 21 bases that complement sequences in the noncoding regions of mRNA. The binding of the single-stranded RNA to the mRNA targets the mRNA for destruction. Effectively this “silences” the gene by blocking translation.

4. TRANSLATION

Following export, mRNA must then be transported to the ribosomes for translation into a protein. Ribosomes are large complexes made from ribosomal RNA (rRNA), another form of RNA present within the cell, and ribosomal proteins. Ribosomes comprise two subunits that clamp onto the mRNA chain and process it in a linear fashion [Ramakrishnan, 2002], as shown in Figure 3. Within an mRNA, “triplets” of nucleotides, termed
codons, are arranged in sequence to specify the amino acids and their order for the protein encoded by the gene. The ribosome contains three motifs that bind transfer RNA (tRNA), yet another form of RNA existing within the cell. Each tRNA comprises an “anticodon”, three bases keyed to bind three mRNA bases, and a structure to bind a specific amino acid. The three binding sites on the ribosome, labeled A, P, and E, position the tRNAs to test a match to the mRNA template, then transfer a growing polypeptide chain, and release the tRNA. The tRNA is first matched to the mRNA at the A site. The existing polypeptide chain that is bound to the tRNA at the P site is transferred to the amino acid on the tRNA at the A site; this tRNA then translocates to the P site, while the tRNA previously at the P site translocates to the E site. The tRNA at the E site is released as the next tRNA binds to the now vacant A site. In this way, the ribosome builds a protein translated from the codons encoded in the mRNA.

When completed, the ribosome releases a polypeptide chain that must then fold to become a functional protein. Protein folding generally involves a chaperone, a molecular machine which aids the polypeptide chain in folding into the correct conformation [Zhang et al., 2002]. If the folding fails to produce the correct structure, the cell has ways to target the protein for degradation as well as to unfold and refold the protein. Finally, the protein often requires transport to the correct cellular compartment. For instance, membrane receptors must be transferred from the ribosomes to the membrane; nuclear proteins must be moved from the cytoplasm into the nucleus. The translation of the proteins can occur in multiple locations.
Generally, proteins that will remain in the cell are translated by ribosomes in the cytosol, while proteins to be secreted are produced in the endoplasmic reticulum, processed through the Golgi apparatus, and rapidly exported from the cell.

5. PROTEIN ACTIVITY

Once the mRNA has been translated into a protein, the subsequent processing, life expectancy, profile of modifications, and means of function of the proteins can be extremely complex, and differ greatly from protein to protein. As proteins accomplish the effective “work” of a cell, understanding the points necessary to ensure creation of a competent work “unit” is necessary to begin to think about how expression changes seen in a microarray can result in functional consequences for a cell or organism. For some proteins, primarily those involved in metabolism, this can be straightforward. For most proteins, it is complex.
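
Before turning to protein stability, the path from pre-mRNA to polypeptide described in Sections 3 and 4 can be sketched in a few lines of code. This is a toy illustration under stated assumptions: the pre-mRNA segments are invented for the example, the codon table is only a small subset of the standard genetic code, and the function names `splice` and `translate` are hypothetical. Real splicing and translation involve far more machinery than string operations.

```python
# Small subset of the standard genetic code (a full table has 64 codons).
# None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "GCU": "Ala", "UUU": "Phe", "AAA": "Lys", "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,
}

# Hypothetical pre-mRNA, written as (segment type, sequence) pairs.
PRE_MRNA = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCCAG"),
    ("exon", "UUUAAA"),
    ("intron", "GUACGUUCAG"),
    ("exon", "UGGUAA"),
]


def splice(pre_mrna, skip_exons=()):
    """Remove introns; optionally skip exons (by index) to model a splice variant."""
    exons = [seq for kind, seq in pre_mrna if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skip_exons)


def translate(mrna):
    """Read codons from the first AUG until a stop codon; return the residues.

    Raises KeyError for codons outside the small table above.
    """
    start = mrna.find("AUG")
    if start < 0:
        return []
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: release the chain
            break
        residues.append(amino_acid)
    return residues


if __name__ == "__main__":
    full = splice(PRE_MRNA)
    variant = splice(PRE_MRNA, skip_exons={1})
    print(full, translate(full))
    print(variant, translate(variant))
```

Skipping the middle exon yields a shorter message and a different polypeptide from the same "gene", which is the point made above about splice variants.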

Protein stability varies considerably following translation. Some proteins have extremely long half-lives, of the order of 20 hours or more. Other proteins have extremely short half-lives (~2-3 minutes). This difference in stability reflects the different biological roles of these proteins, which in some cases require activity as a stable structure (for instance, as part of the cytoskeleton providing cellular architecture), but in other cases require rapid turnover (as with proteins critical for execution of a chronologically limited step in cell cycle progression). Further, some proteins exist at different locations within cells, or in association with different partner proteins: based on their location or pattern of association, some intracellular pools of a given protein may be subject to more rapid degradation than others. These differences in control of the lifespan of proteins add to the difficulty in extrapolating protein levels from mRNA levels.

In addition to changes in protein levels, the majority of signaling proteins are subject to extensive post-translational modifications that can affect their stability, localization, and activity. For example, the protein initially exists as an inactive precursor cytoplasmic 105 kD (kiloDalton, a measure of mass) form that is modified by covalent attachment of ubiquitin, and then processed to release a 50 kD form that transits to the cell nucleus and activates transcription. A number of other important proteins are similarly cleaved and relocated. Another common form of modification is the attachment of small peptidyl or lipid moieties (e.g., SUMO; myristylation) that again control patterns of localization, association, and stability. However, by far the most common form of protein modification is phosphorylation.
The placement of one or more phosphate groups on proteins by one or more kinases, and the subsequent removal of these groups by phosphatases, can have drastic effects on every aspect of protein function, and lies at the core of studies of signal transduction. A rapid screen through the CGAP signaling resource sponsored by the NIH* emphasizes the prevalence of this form of control. Again, different pools of a protein will be differentially phosphorylated, and hence differentially active against different targets, within a cell. For some well-studied proteins it is barely possible to begin to estimate what percentage of an expressed protein pool is active following a given set of stimuli; for the majority, it is currently an open question.

* http://cgap.nci.nih.gov/Pathways

Another complicating factor in understanding the activity level of a protein is the fact that many proteins function for part or all of their existence as components of complexes involving other proteins. Some transcription factors only gain their specificity in binding DNA if they heterodimerize with other proteins. The majority of proteins involved in signal transduction become activated through association (sometimes sequential, sometimes simultaneous) with a large number of other proteins. Further, individual proteins may associate with different sets of partners to have separate activities in different signaling cascades. Hence, the expression of one protein, A, in a cell may well have no functional consequences for output “X”, if the requisite partners for activity in the X pathway are absent, but function well for output “Y”, because partners required for that activity are present. In some cases, these requisite partners may be co-transcriptionally induced with A, and hence predictable by microarray. In other cases, the partners may pre-exist either ubiquitously (in all cell types) or selectively (in some cell types), and not be detectable by methodologies tracing transcriptional induction. Based on two-hybrid or mass spectrometry proteomics efforts using yeast as a model system, it is clear that the majority of proteins are engaged in many interactions, with current estimates suggesting in excess of 10 for any given protein. This is likely to be an underestimate.
These points of control are briefly summarized in Figure 4. In fact, this description represents a considerable over-simplification of factors regulating protein activity. As one example of interest, it has long been known to cell biologists that cells can have very different responses to specific growth stimuli when on supports that allow them to form three-dimensional organized structures rather than on a flat tissue culture plate. In a number of cases, an appropriate 3-D structure has been shown to be required for expression of an appropriate transcriptional program [DiPersio et al., 1991]. In other cases, formation of appropriate cell-cell contacts can control cellular resistance to drugs and sensitivity to apoptosis [Jacks and Weinberg, 2002], in a process that appears to involve regulation of the availability and functionality of signaling proteins and transcription factors. As a key issue in all microarray work is the degree to which in vitro (i.e., cell culture) and in vivo (e.g., tumor) data can be compared, it is important to keep such higher order regulatory mechanisms in mind.

6. ISSUES FOR MICROARRAYS

As the above sections indicate, the cell is a complex machine that evolution has guided to allow it to respond to external threats, survive under multiple conditions, and, in multicellular organisms, cooperate for the good of the greater organism. Evolution occurs through the borrowing of function, the recombination of existing functions into new ones, and the copying and modification of existing functions. The result is that proteins, the primary functional components, have multiple roles and multiple partners allowing the cell to vary the response to a stimulus based on other stimuli, the external environment including other cells, and on internal state. Effectively, the cell is a state machine whose response to identical stimuli can vary according to variables beyond the control of any conceivable experiment.
The implications for the analysis of microarray data are significant. First, as is clear from the complexities of transcription, translation, and protein activity, it is a hopeless task to use changes in the level of expression of an individual gene as an indicator of changes in the activity of the corresponding protein without additional information [Chen et al., 2002]. Essentially, the changes in expression are not upstream indicators of protein activity but instead comprise downstream indicators of changes in the signaling pathway and cell state. Second, the signaling networks present within cells allow the cell to respond differently to the activation of a given pathway, so that the response to a signal will vary depending on cellular state and the external environment. Cellular state is impacted by factors such as circadian rhythm and metabolic oscillations that cannot be well controlled in the laboratory. Third, the destruction of mRNA can occur in multiple ways. Each mRNA species has a typical half-life, which varies between species, and can also be targeted for degradation by small RNAs. As a result, for some subset of the species within a cell, the relative concentrations of mRNA depend on the timing of mRNA harvest. Fourth, it is clear that the cell encodes multiple “RNA genes” within a single “DNA gene” through mRNA splicing. Depending on the specific sequence spotted onto an array or grown onto a GeneChip, the hybridization recorded may represent only one of these variants or some or all of these variants. However, splice variation changes function just as protein modification does.

Together these biological realities greatly complicate the analysis of microarray data. Essentially, downstream gene expression is highly correlated, with pathway activation leading to multiple genes being expressed, so that expression levels of different genes are not independent. In addition, each set of genes will involve genes that are included in multiple functions and that therefore respond to multiple stimuli, so that clusters of genes cannot easily be linked to pathway or function. Since the cellular responses also vary with cellular state, there is a variability linked to unobserved variables. Unfortunately, this variation will therefore not be stochastic, but instead will represent an underlying systematic variation within the data. With an adequate number of data points, the nonstochastic nature should become obvious; however, present microarray studies are data poor in this regard.

For many microarray studies, the situation is further complicated by the heterogeneous nature of many in vivo samples. In general, it is not feasible to obtain pure tissue types from tumors or other tissues.
The measurements made on a microarray then represent expression arising from multiple tissue types (e.g., tumor and surrounding “normal” tissues) rather than from only the tissue of interest.

In summary, biological systems exhibit complex, highly regulated behavior with significant feedback in all aspects and at all points in the response to stimuli, from detection of a signal through activation of a response to creation of the means to respond to the stimuli. Such systems include the ability to block the response at multiple stages (e.g., signal transduction, transcription, translation, protein activation) and to signal between each stage. This makes the systems highly nonlinear and prone to the creation of highly correlated errors in derived data. Analysis of the data should not go forward blind to these realities.
ACKNOWLEDGMENTS

We thank the National Institutes of Health, National Cancer Institute (CCCG CA06927 to R. Young), the Pennsylvania Department of Health (grants to mfo and eag), and the Pew Foundation for support.

REFERENCES

Alberts, B, Lewis, J, Raff, M, Johnson, A, and Roberts, K (2002) Molecular Biology of the Cell, Ed., Taylor and Francis, Inc., London.
Berger, SL (2002) Histone modifications in transcriptional regulation. Curr Opin Genet Dev 12: 142-8.
Chen, G, Gharib, TG, Huang, CC, Taylor, JM, Misek, DE, Kardia, SL, Giordano, TJ, Iannettoni, MD, Orringer, MB, Hanash, SM and Beer, DG (2002) Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 1: 304-13.
Dilworth, FJ and Chambon, P (2001) Nuclear receptors coordinate the activities of chromatin remodeling complexes and coactivators to facilitate initiation of transcription. Oncogene 20: 3047-54.
DiPersio, CM, Jackson, DA and Zaret, KS (1991) The extracellular matrix coordinately modulates liver transcription factors and hepatocyte morphology. Mol Cell Biol 11: 4405-14.
Featherstone, M (2002) Coactivators in transcription initiation: here are your orders. Curr Opin Genet Dev 12: 149-55.
Green, MR (2000) TBP-associated factors (TAFIIs): multiple, selective transcriptional mediators in common complexes. Trends Biochem Sci 25: 59-63.
Jacks, T and Weinberg, RA (2002) Taking the study of cancer cell survival to a new dimension. Cell 111: 923-5.
Jordan, JD, Landau, EM and Iyengar, R (2000) Signaling networks: the origins of cellular multitasking. Cell 103: 193-200.
Kolch, W (2000) Meaningful relationships: the regulation of the Ras/Raf/MEK/ERK pathway by protein interactions. Biochem J 351 Pt 2: 289-305.
McManus, MT and Sharp, PA (2002) Gene silencing in mammals by small interfering RNAs. Nat Rev Genet 3: 737-47.
Posas, F, Takekawa, M and Saito, H (1998) Signal transduction by MAP kinase cascades in budding yeast. Curr Opin Microbiol 1: 175-82.
Ramakrishnan, V (2002) Ribosome structure and the mechanism of translation. Cell 108: 557-72.
Reed, R and Hurt, E (2002) A conserved mRNA export machinery coupled to pre-mRNA splicing. Cell 108: 523-31.
Reines, D, Conaway, RC and Conaway, JW (1999) Mechanism and regulation of transcriptional elongation by RNA polymerase II. Curr Opin Cell Biol 11: 342-6.
Sharp, PA (1994) Split genes and RNA splicing. Cell 77: 805-15.
Stutz, F and Rosbash, M (1998) Nuclear RNA export. Genes Dev 12: 3303-19.
Tzivion, G, Shen, YH and Zhu, J (2001) 14-3-3 proteins; bringing new definitions to scaffolding. Oncogene 20: 6331-8.
Zhang, X, Beuron, F and Freemont, PS (2002) Machinery of protein folding and unfolding. Curr Opin Struct Biol 12: 231-8.
2
MONITORING THE QUALITY OF MICROARRAY EXPERIMENTS

Kevin R. Coombes, Jing Wang, Lynne V. Abruzzo
University of Texas M.D. Anderson Cancer Center, Houston, TX.

Abstract: A microarray experiment is a complex, multistep process involving biology, chemistry, physics, and bioinformatics. Something can go wrong at every step in the process. In order to obtain good results, one needs a thorough, redundant system to monitor the quality of microarray experiments. In this article, we provide an overview of quality control measures that can be applied at different points during the process of conducting and analyzing microarray experiments.

Key words: microarrays, quality control, process control, acceptance testing

1. INTRODUCTION

A microarray experiment is a complex, multistep process. Clones of known DNA sequences must be grown, harvested, and spotted in precise locations on the microarray substrate. RNA must be extracted from experimental samples, labeled targets must be produced by reverse transcription, and the targets must be hybridized with a microarray. The hybridized microarray must be scanned to produce a computer image. The image must be loaded into a software package for quantification, a template grid must be precisely aligned, and estimates must be computed for spot intensity and local background. The initial intensity estimates must be background-corrected and normalized. In a project that includes tens or hundreds of individual microarray experiments, the final intensity estimates from each individual microarray experiment must be combined into a single data structure for further analysis.

Something can go wrong at every step in the process. Every time a person handles a physical microarray, views an image of a microarray in an image-editing program, explores the quantifications in a spreadsheet or database, or saves and labels a file, there is a potential for errors to occur. To obtain the best possible results, one must devise a thorough, redundant system to monitor the quality of microarray experiments.

An illustration of the potential difficulties is provided elsewhere in this volume [Stivers et al., 2003]. In their analysis of the Project Normal data set, Stivers and colleagues found that the UniGene annotations of the spots on the microarrays were inconsistent in the data files supplied for the 2002 conference on the Critical Assessment of Microarray Data Analysis. In response to this finding, the authors of the original Project Normal study [Pritchard et al., 2002] conducted an extremely careful and detailed review of their data quality. During the conference, they confirmed that errors had occurred when merging the quantified expression data into a spreadsheet with the UniGene annotations. They also uncovered minor problems with a small number of quantifications, caused by subgrids that had not been correctly aligned. They prepared a revised data set, which is available from the conference web site (http://www.camda.duke.edu/camda02/contest.asp), thus creating perhaps the most accurate microarray data set currently available.

In this article, we provide an overview of quality control measures that can be applied at different points during the process of conducting and analyzing microarray experiments. Microarray experiments undergo a critical phase change during the scanning process: they pass from the physical world of clones, plates, slides, and robots into the virtual, computerized world of bioinformatics. During the physical phase, quality control is driven by the biological and chemical properties of the reagents.
The most effective method for maintaining a high quality microarray facility is, in one sense, quite simple: decide on a standard set of protocols, validate them, and follow them religiously. Collections of such protocols can be obtained from the Brown lab at Stanford (http://cmgm.stanford.edu/pbrown/protocols/index.html) or from The Institute for Genomic Research (http://www.tigr.org/tdb/microarray/protocolsTIGR.shtml) [Hegde et al., 2001]. During the virtual phase of a microarray experiment, both the source of errors and our ability to detect and control them are governed by bioinformatics. Bioinformatics quality can best be maintained by avoiding spreadsheets or general file naming conventions in favor of a specialized database [Brazma et al., 2002]. Data should be stored and transferred in a format that maintains enough detailed structure to recover biological information about both the samples and the genes on the array [Brazma et al., 2001].

2. QUALITY CONTROL BY BIOLOGISTS

In this section, we describe how to monitor the quality of two critical ingredients (the RNA samples and the microarray slides) before the hybridization step. Because assessment of hybridization quality is typically based on the scanned image, we will reserve its discussion to a later section.

2.1 Monitoring RNA quality

The basic component of a microarray experiment is the RNA extracted from samples. Before running a microarray experiment, its quality should be tested. If enough total RNA is available, it can be evaluated by agarose gel electrophoresis with ethidium bromide. One expects to see crisp bands from the 28S and 18S ribosomal RNA, in a ratio of two-to-one. When the amount of RNA is limited, as is often the case with samples obtained by fine needle aspiration or laser capture microdissection, as little as 5–500 ng total RNA can be evaluated using a model 2100 Bioanalyzer (Caliper Technologies, Mountain View, CA) [Dunmire et al., 2002].

2.2 Monitoring physical array quality

The second fundamental component of a microarray experiment is the physical array itself. In order to obtain good results, the spots on the array should contain uniform, equivalent amounts of spotted polymerase chain reaction (PCR) products or oligonucleotides. Pre-scanning the slide before using it in a hybridization experiment can verify this property. Good results have been reported using nondestructive staining with a fluorescent dye such as SYBR green II [Battaglia et al., 2000] or with SYTO 61 [Yue et al., 2001]. Alternatively, Shearstone and colleagues have described a method for printing oligonucleotide microarrays spiked with small amounts of dCTP-Cy3 or dCTP-Cy5. The slides can be scanned to verify spot morphology by detecting the spiked products, which are then washed off before hybridizing the microarray with the sample of interest [Shearstone et al., 2002].

2.3 Quality control during array manufacturing

Many institutions have established core facilities that manufacture microarrays by spotting either PCR products from cDNA clones or chemically synthesized oligonucleotides on glass slides. Managing, maintaining, and handling large libraries of clones bring their own set of quality control problems. These problems, once again, include a mixture of physical difficulties (contamination from nearby wells; suboptimal conditions for the PCR reactions) and bioinformatical challenges (maintaining the correct annotations of the clones). For example, when we resequenced the cDNA clones being printed on microarrays at the Genomics Core Facility at the University of Texas M.D. Anderson Cancer Center, we found an error rate of 21% in the clones supplied by the manufacturer [Taylor et al., 2001]. Our experience strongly suggests the need to sequence-verify the clones at the final stage before printing them on microarray slides. To date, we have found a much lower error rate among the synthesized oligonucleotides printed on arrays at our facility. Nevertheless, we advise randomly selecting a few oligonucleotides for sequence verification.

Because slides are printed in batches, we can test the quality of an entire batch using established procedures. For example, we can randomly select a small number of slides from each batch. We then test the quality of these microarrays (destructively) by performing a hybridization experiment. If the selected microarrays pass the test, then the batch is accepted.
The number of slides that must be tested (and pass) depends on the batch size, the desired confidence level, and other factors. We use Bayesian methods to make this determination. We want to estimate the proportion, $\theta$, of “good” slides in a batch. If we randomly test N slides, then the probability that k of those slides pass the test is described by a binomial distribution,

$$P(k \mid \theta) = \binom{N}{k}\,\theta^{k}(1-\theta)^{N-k}.$$

We assume a Beta prior distribution,

$$\theta \sim \mathrm{Beta}(a, b), \qquad p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}.$$

Then standard computations show that the posterior distribution is also a Beta distribution,

$$\theta \mid k \sim \mathrm{Beta}(a + k,\; b + N - k).$$

The Bayesian formulation allows us to make our assumptions about the prior distribution explicit (Figure 1). For instance, if we believe that a single batch is likely to consist either almost entirely of good slides or almost entirely of bad slides, we can choose prior parameters a = b = ½. In this case, testing a single slide can supply enough evidence to determine the quality of the entire batch. If we instead assume a uniform prior on the percentage of good slides, then we will need at least three slides to pass the test in order to accept the batch. One can, of course, make more extensive tests on a few batches to get a better idea of how the quality varies within a batch. This information can be used to make an informed choice of the prior distribution.

There are various choices for the standard hybridization experiment used to test the slides. For instance, all clones spotted on microarrays in our Genomics Core Facility share a common sequence (complementary to the primer). We can label copies of the primer with fluorescent dye and hybridize them to the microarray. We can then assess the attachment of the probes at all spots by scanning the microarrays after hybridization [Hu et al., 2002]. Alternatively, one can choose a standard reference material (such as the Universal Reference available from Stratagene Inc., La Jolla, CA) for this purpose [Weil et al., 2002].
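The Beta-binomial update used for batch acceptance can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the 0.5 threshold on the proportion of good slides, and the use of a continued-fraction routine for the Beta cumulative distribution are all assumptions made here for illustration, using only the standard library.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Cumulative distribution function of Beta(a, b) evaluated at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def posterior_params(a, b, n_tested, n_passed):
    """Beta(a, b) prior plus k passes out of N tests gives Beta(a + k, b + N - k)."""
    return a + n_passed, b + (n_tested - n_passed)


def prob_good_fraction_exceeds(threshold, a, b, n_tested, n_passed):
    """Posterior probability that the proportion of good slides exceeds threshold."""
    a_post, b_post = posterior_params(a, b, n_tested, n_passed)
    return 1.0 - beta_cdf(threshold, a_post, b_post)


# Hypothetical comparison: uniform prior with 3 of 3 passes versus
# an a = b = 1/2 prior with 1 of 1 pass, both at an assumed threshold of 0.5.
uniform_case = prob_good_fraction_exceeds(0.5, 1.0, 1.0, 3, 3)
u_shaped_case = prob_good_fraction_exceeds(0.5, 0.5, 0.5, 1, 1)
```

The acceptance rule itself (which posterior probability is high enough, and for which threshold) is a policy choice that the text does not specify, so the sketch only reports the posterior probability and leaves the decision to the facility.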
A known profile of this reference material can also be used as part of a process control strategy to monitor the quality of hybridizations over time; we return to this idea later.

3. SPOT LEVEL METRICS

We now turn to quality control methods that apply to individual microarray experiments just after hybridization and scanning. Some of these methods are used to assess the quality of individual spots. Others are used to decide if the overall quality of the hybridization is low enough to reject it and request that the experiment be repeated. In this section, we discuss methods to assess spot quality.
Saturation of the fluorescent signal is one of the more common problems affecting the reliability of spot intensity measurements. Including saturated spots in the analysis can severely distort the results, whether detecting differentially expressed genes or using clustering for class discovery [Hsiao et al., 2002]. Most microarray quantification packages provide a measure of signal saturation. The recommended procedure is to remove frequently saturated gene measurements from consideration. If the number of saturated spots varies widely across experiments, it is usually necessary to revise the scanning protocol in an effort to achieve greater consistency.

Software quantification packages provide different measures of spot quality. Yidong Chen has developed a package to quantify microarray images that includes several spot quality metrics [Chen et al., 2002]. This package computes quality measures for the fluorescent intensity (quality is essentially the mean signal pixel intensity divided by the standard deviation of the background intensity), the target area (a function of the percentage of the expected spot area occupied by signal above background), the background flatness (a measure of the extent to which the local background exceeds the mean background across a large portion of the array), and the signal intensity consistency (a measure of the variability of the signal pixel intensity). All four of the quality measures are transformed to lie between 0 and 1, and the overall quality of the spot is taken to be the minimum of the four measures. This approach allows analysts to exclude measurements at individual spots from further analysis if the quality is too low.

Other approaches to assessing spot quality have been proposed. Wang and colleagues have also developed a quantification software package that provides multiple measures of spot quality [Wang et al., 2001].
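As a small illustration of the composite score described above (four measures scaled to lie between 0 and 1, with the overall quality taken as their minimum), the following Python sketch may help. The function names and the 0.5 cutoff are hypothetical; the text does not state how each measure is transformed or which cutoff is used in practice.

```python
def overall_spot_quality(intensity_q, area_q, flatness_q, consistency_q):
    """Overall spot quality as the minimum of four measures, each in [0, 1]."""
    scores = (intensity_q, area_q, flatness_q, consistency_q)
    for score in scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("each quality measure must lie between 0 and 1")
    return min(scores)


def keep_spot(intensity_q, area_q, flatness_q, consistency_q, cutoff=0.5):
    """Keep a spot only if its weakest measure reaches the (assumed) cutoff."""
    return overall_spot_quality(
        intensity_q, area_q, flatness_q, consistency_q) >= cutoff


# A spot with strong signal but a poor local background is excluded,
# because the minimum is driven by the weakest measure.
example_quality = overall_spot_quality(0.9, 0.8, 0.3, 1.0)
example_kept = keep_spot(0.9, 0.8, 0.3, 1.0)
```

Taking the minimum means a single poor measure is enough to exclude a spot, which is a conservative choice compared with averaging the four scores.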
Brown and colleagues use the “spot ratio variability”, which divides the pixel-by-pixel standard deviation of the ratio by the mean ratio [Brown et al., 2001]. Tran and colleagues examine the difference between the mean signal intensity and median signal intensity of the pixels within a spot, requiring that they differ by less than 10% at good spots [Tran et al., 2002].

4. QUALITY CONTROL OF INDIVIDUAL SLIDES

After a hybridized slide has been scanned and quantified, several methods can be applied to assess the quality of the experiment. The immediate goal of these tests is usually to decide rapidly if the experiment needs to be repeated.
4.1 View the images

The simplest test to describe is also the hardest to quantify or to automate. Before starting the statistical analysis, we look at the scanned images of the microarray to ensure that there are no obvious gross anomalies. (This step is sometimes referred to as “cortical filtering”; in order to pass the test, the data must be successfully processed by a human cortex.) We visualize the images in MATLAB (The Math Works, Inc., Natick MA), which allows us to alter the color map. Most people do a poor job of discriminating between different shades of gray, and they are even less good at distinguishing colors that shade from black into red or green. False color images prepared in MATLAB can give a good idea of the quality of the spots and of any strange behavior in the background.
4.2 Density estimates

After deciding that the images are adequate, we load the quantification data into S-Plus (Insightful Corp., Seattle WA). We compute kernel density estimates of the distributions of the logarithm of the raw volume intensity estimates and the raw background estimates for each channel (Figure 2). The example here shows a fairly typical distribution pattern. The volume has a high peak with a long right tail. The background has a tighter peak located at about the same point as the volume density, without the extended tail. The background-corrected volume is much more spread out. We interpret these distributions as telling us that a large number of spots are near or below the background levels; the spots that truly show significant gene expression are the ones in the long tail at the right.

Any qualitative change in these density plots is taken as an indication of a potential problem. Bimodal peaks in the background plots, for example, usually indicate that large regions are subject to excessively high background noise. They may also indicate that a portion of the grid was improperly aligned during quantification, causing the signal to be counted as background over a large subsection of the array. Obvious differences in the shape of the distributions from the red and green channels typically indicate a problem (poor quality RNA or faulty labeling) with one of the samples.
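The density check described above can be sketched in Python with the standard library alone. This is an illustrative stand-in for the S-Plus analysis, not a reproduction of it: the Gaussian kernel, the rule-of-thumb bandwidth, and the crude mode counter used to flag bimodal background distributions are all assumptions introduced here.

```python
import math
import statistics


def log2_intensities(raw_values):
    """Log-transform raw intensities, skipping non-positive values."""
    return [math.log2(v) for v in raw_values if v > 0]


def gaussian_kde(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumed default, not from the text.
        bandwidth = 1.06 * statistics.stdev(values) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def count_modes(density):
    """Count strict local maxima of a density evaluated on an ordered grid."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i - 1] < density[i] > density[i + 1]
    )


# Hypothetical log2 background values: one tight peak versus two peaks.
grid = [2.0 + 0.1 * i for i in range(121)]
single_peak = [5.9, 5.95, 6.0, 6.05, 6.1]
double_peak = single_peak + [9.9, 9.95, 10.0, 10.05, 10.1]
modes_single = count_modes(gaussian_kde(single_peak, grid, bandwidth=0.5))
modes_double = count_modes(gaussian_kde(double_peak, grid, bandwidth=0.5))
```

A mode count above one for the background channel would be a prompt to look at the image and the grid alignment, not a verdict by itself; the count depends on the bandwidth, so small bandwidths can produce spurious extra modes.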
4.3 Topological plots of background

We also look at “cartoon” images of the logarithm of the estimated background intensity at each spot (Figure 3). The log transformation allows us to see more detail at lower intensity levels, where the background is more important. In this example, the red channel background is plotted in the top figure and shows few significant features; the most visible glitch consists of a few spots near the left end. As usually happens, the background in the green channel is both slightly higher on average (as reflected in the estimate of the median) and more variable. In this case, the background is substantially higher on the right third of the green channel.

4.4 Signal-to-noise ratio

One of the most effective measures for the overall quality of a microarray experiment is the percentage of spots with adequate signal-to-noise ratio (S/N). The precise definition of “adequate” in this setting may vary from platform to platform. On Affymetrix microarrays, for example, the model used in their Microarray Analysis Suite version 5.0 provides detection p-values along with “present” or “absent” calls for each gene. We typically find that around 40–50% of the genes are present in a successful hybridization on an Affymetrix microarray. On our own two-color fluorescent microarrays, we typically require good spots to have S/N > 2. We have found, for successful hybridizations, that the percentage of spots with S/N > 2 is typically in the range of 30% to 60%. We see a similar range of percentages regardless of the tissue of origin for the RNA sample. We have seen extreme cases where as few as 5% or as many as 95% of the spots pass the signal-to-noise threshold; both extremes are indications that something went wrong during the conduct of the experiment.

5. QUALITY CONTROL WITH REPLICATE SPOTS

The microarrays produced in our Genomics Core have the virtue that every gene of interest is spotted on the array in duplicate. The value of printing replicate spots to obtain more accurate expression measurements has been described previously [Ramakrishnan et al., 2002]. Because of the replications, we can estimate measurement variability within the microarray. For each channel, we make scatter plots that compare the first and second member of each pair of replicates (Figure 4, left). More useful, however, are the Bland-Altman plots obtained by rotating the scatter plots by 45 degrees (Figure 4, right). To construct a Bland-Altman plot, we graph the difference between the log intensity of the replicates (whose absolute value is an estimate of the standard deviation) as a function of the mean log intensity. In the typical plots shown here, there is a much wider spread (a “fan” or a “fishtail”) at the left end of the plot (low intensity) and much tighter reproducibility at the right end (high intensity). We can fit a smooth curve to this graph that estimates the variability (in the form of the absolute difference) as a function of the mean intensity.
In Figure 4, we have plotted confidence bands at plus and minus three times those smooth curves. Replicates whose difference lies outside these confidence bands are intrinsically less reliable than replicates whose difference lies within the bands [Baggerly et al., 2000; Tseng et al., 2001; Loos et al., 2001].
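A minimal Python sketch of the replicate-spot check follows. It computes the Bland-Altman coordinates (mean and difference of the log intensities) and flags pairs whose difference exceeds three times a local estimate of the typical absolute difference. The binned median used as that local estimate is a deliberately simple stand-in, assumed here, for the smooth curve described in the text; the function names and the number of bins are likewise hypothetical.

```python
import math
import statistics


def bland_altman_points(rep1, rep2):
    """Return (mean log2 intensity, log2 difference) for each replicate pair."""
    points = []
    for x, y in zip(rep1, rep2):
        lx, ly = math.log2(x), math.log2(y)
        points.append(((lx + ly) / 2.0, lx - ly))
    return points


def flag_unreliable_pairs(points, n_bins=10, k=3.0):
    """Flag pairs whose |difference| exceeds k times the binned median |difference|.

    Pairs are sorted by mean intensity and split into bins of equal count, so
    the band is wider where low-intensity replicates disagree more.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    bin_size = max(1, math.ceil(len(order) / n_bins))
    flags = [False] * len(points)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        typical = statistics.median(abs(points[i][1]) for i in members)
        for i in members:
            flags[i] = abs(points[i][1]) > k * typical
    return flags


# Hypothetical duplicate spots: nine agree to within 10%, one differs 8-fold.
first = [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 1000]
second = [v * 1.1 for v in first[:-1]] + [first[-1] * 8]
flags = flag_unreliable_pairs(bland_altman_points(first, second), n_bins=1)
```

With real arrays one would use enough bins (or a proper smoother) to follow the fan shape at low intensity; a single bin is used above only to keep the toy example easy to check by hand.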
  • 52. Methods of Microarray Data Analysis III 35 All microarray experiments must include some degree of replication if we are to assign any meaningful statistical significance to the results. In the previous section, we showed how some simple plots can automatically detect spots where replicated spots give divergent measurements. The same method can be used with replicated experiments. This method works with purely technological replicates (separate hybridizations of the same labeled RNA, separately labeling the same RNA, or separate RNA extractions from the same mixture of cells) or with biological replicates (in which RNA samples are independently obtained under similar biological conditions [Novak et al., 2002; Raffelsberger et al., 2002]. When replication occurs later in the process, of course, one expects the scatter plot of the replicates to follow the identity line more closely. Scatter plots of replicate experiments can detect two other common problems. The first problem can occur when merging data into a spreadsheet for analysis. It is all too easy to sort data from different arrays in different orders, which causes the data rows to be mismatched. If the entire array is affected, then a plot of the replicates shows an uncorrelated cloud of points instead of the expected band along the identity line (Figure 5). Mismatching a subset of the rows produces a plot that combines part of the expected band 6. QUALITY CONTROL USING REPLICATE EXPERIMENTS
  • 53. 36 Coombes et al. with a smaller cloud. This phenomenon can also occur when a few of the subgrids on one array were misaligned during quantification (Figure 5). 7. CLUSTERING FOR QUALITY CONTROL In some array studies, the investigators make a clear distinction between class discovery and discrimination. Supervised methods (linear discriminant analysis or support vector machines) are appropriate if the goal of the study is discrimination or diagnosis. Unsupervised methods (hierarchical clustering, principal components analysis, or self-organizing maps) are generally appropriate if the goal of the study is class discovery. In some published studies, unsupervised methods have been used regardless of the goal. The attitude seems to have been: “because our unsupervised method rediscovered known classes, our microarrays are as good at pattern recognition as a diagnostic pathologist.” In our opinion, such an attitude is misguided. Nevertheless, unsupervised classification methods do have an appropriate place in the analysis of microarray data even when the correct classification of the experimental samples is known. The avowed purpose of unsupervised methods is to uncover any structure inherent in the data. If they recover known structure, then that result is hardly surprising. Because the statistical and mathematical properties of these methods have not been fully investigated in the context of microarray data, neither can one draw strong conclusions from the recovery of known structure. As we have pointed out,
many things can go wrong during the process of collecting microarray data. If an unsupervised classification method yields deviations from the known structure, these deviations can be quite revealing of problems in data quality. We have successfully used clustering methods to identify many of the problems described in earlier sections of this paper.

When the study includes technological replicates of each experiment, we expect the replicates to be nearest neighbors (or at least close neighbors), as portrayed, for instance, in a dendrogram based on a hierarchical cluster analysis. We have seen instances where replicates failed to be neighbors; in every case, the cause of the failure could be traced to a problem with the data. The problems detected using this method have included, among others:
  • individual arrays where the data rows had been reordered,
  • misalignment of the entire grid or of subgrids,
  • analytical errors where “red – green” differences had accidentally been combined with “green – red” differences,
  • differences between microarrays produced in different print lots, and
  • differences in the dynamic range of signals from different arrays.

Plotting the samples using the first two or three components from a principal components analysis (PCA) can also reveal similar anomalies. A striking application of this method can be found elsewhere in this volume [Stivers et al., 2003]. By applying PCA to the individual channels of the Project Normal microarray experiments, we found that the reference channels could be separated based on the tissue of origin of the sample hybridized to the other channel. This finding led us to look more closely at the data. In turn, this led to the discovery that the UniGene annotations in data files from experiments using liver RNA did not agree with the UniGene annotations in data files from experiments using kidney RNA.

8. PROCESS CONTROL

All the methods discussed so far deal with quality control rather than process control. The distinction is simple. Quality control is based primarily on inspections. Its goal is to identify components of low quality and reject them. We have described methods to inspect RNA samples and microarray slides before hybridization. We have described various methods to inspect the scanned image of a microarray experiment and determine the quality of the hybridization. We have also looked at methods to inspect the quality of individual spots. In this context, the use of clustering methods plays the role
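The replicate check described in Section 7 (replicates should be nearest neighbors, and a failure points to a data problem) can be illustrated with a small sketch. This is not the authors' procedure: it replaces a full hierarchical cluster analysis with a plain nearest-neighbor lookup under a correlation distance, and the array names, the log-ratio values, and the helper functions `pearson_distance`, `nearest_neighbor`, and `flag_suspect_replicates` are all hypothetical.

```python
import math

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def nearest_neighbor(name, arrays):
    """Return the name of the array closest to `name` (excluding itself)."""
    others = (k for k in arrays if k != name)
    return min(others, key=lambda k: pearson_distance(arrays[name], arrays[k]))

def flag_suspect_replicates(arrays, replicate_of):
    """List arrays whose nearest neighbor is not their declared replicate."""
    return [name for name in sorted(arrays)
            if nearest_neighbor(name, arrays) != replicate_of[name]]

# Hypothetical log-ratio profiles: two samples, each hybridized twice.
arrays = {
    "liver_a":  [1.0, 2.1, -0.5, 0.3, 1.8, -1.2],
    "liver_b":  [1.1, 2.0, -0.4, 0.2, 1.9, -1.1],
    "kidney_a": [-0.8, 0.4, 1.5, -1.0, 0.1, 2.0],
    # Sign flipped on purpose, mimicking a "green - red" difference
    # that was combined with "red - green" differences.
    "kidney_b": [0.9, -0.5, -1.4, 1.1, -0.2, -1.9],
}
replicate_of = {
    "liver_a": "liver_b", "liver_b": "liver_a",
    "kidney_a": "kidney_b", "kidney_b": "kidney_a",
}

suspects = flag_suspect_replicates(arrays, replicate_of)
print(suspects)
```

In this toy data the two liver arrays find each other, while the sign-flipped kidney array and its partner do not, so both kidney arrays are flagged for inspection. A flag only says that something deserves a closer look; tracing the cause still requires going back to the data files.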