Informatics In Proteomics 1st Edition Sudhir Srivastava
Informatics
in Proteomics
Edited by
Sudhir Srivastava
Published in 2005 by
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2005 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-10: 1-57444-480-8 (Hardcover)
International Standard Book Number-13: 978-1-57444-480-3 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reprinted material is
quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts
have been made to publish reliable data and information, but the author and the publisher cannot assume
responsibility for the validity of all materials or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration
for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate
system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only
for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Catalog record is available from the Library of Congress
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Taylor & Francis Group
is the Academic Division of T&F Informa plc.
Dedication
Dedicated to my daughters, Aditi and Jigisha, and to my
lovely wife, Dr. Rashmi Gopal Srivastava
Foreword
A remarkable development in the post-genome era is the re-emergence of proteomics
as a new discipline with roots in old-fashioned chemistry and biochemistry, but with
new branches in genomics and informatics. The appeal of proteomics stems from
the fact that proteins are the most functional components encoded in the genome
and thus represent a direct path to functionality. Proteomics emphasizes the global
profiling of cells, tissues, and biological fluids, but there is a long road from applying
various proteomics tools to the discovery, for example, of proteins that have clinical
utility as disease markers or as therapeutic targets. Given the complexity of various
cell and tissue proteomes and the challenges of identifying proteins of particular
interest, informatics is central to all aspects of proteomics. However, protein
informatics is still in its early stages, as is the entire field of proteomics.
Although collections of protein sequences preceded genomic sequence databases
by more than two decades, there is still a substantial need for protein databases as basic
protein information resources. There is also a need to implement algorithms, statistical
methods, and computer applications that facilitate pattern recognition and biomarker
discovery by integrating data from multiple sources. This book, which is dedicated to
protein informatics, is intended to serve as a valuable resource for people interested in
protein analysis, particularly in the context of biomedical studies. The expert group of
authors assembled here brings proteomics informatics–related expertise that is highly
valuable in guiding proteomic studies, particularly because the analysis of proteomics
data is currently rather informal and largely dependent on the idiosyncrasies of the analyst.
Several chapters address the need for infrastructures for proteomic research and
cover the status of public protein databases and interfaces. The creation of a national
virtual knowledge environment and information management systems for proteomic
research is timely and clearly addressed. Issues surrounding data standardization
and integration are very well presented. They are captured in a chapter that describes
ongoing initiatives within the Human Proteome Organization (HUPO). A major
strength of the book is in the detailed review and discussion of applications of
statistical and bioinformatic tools to data analysis and data mining. Much concern
at the present time surrounds the analysis of proteomics data by mass spectrometry
for a variety of applications. The book shines in its presentation in several chapters
of various approaches and issues surrounding mass spectrometry data analysis.
Although the field of proteomics and related informatics is evolving rapidly, this
book not only captures the current state of the art but also presents a vision for
where the field is heading. As a result, the contributions of the book and its
component chapters will have long-lasting value.
Sam Hanash, M.D.
Fred Hutchinson Cancer Center
Seattle, Washington
Preface
The biological dictates of an organism are largely governed through the structure
and function of the products of its genes, the most functional of which is the
proteome. Originally defined as the analysis of the entire protein complement of a
cell or tissue, proteomics now encompasses the study of expressed proteins including
the identification and elucidation of their structure–function relationships under
normal and disease conditions. In combination with genomics, proteomics can
provide a holistic understanding of the biology underlying disease processes.
Information at the level of the proteome is critical for understanding the function of
specific cell types and their roles in health and disease. Bioinformatic tools are
needed at all levels of proteomic analysis. The main databases serving as the targets
for mass spectrometry data searches are the expressed sequence tag (EST) and the
protein sequence databases, which contain protein sequence information translated
from DNA sequence data. It is thought that virtually any protein that can be detected
on a 2DE gel can be identified through the EST database, which contains over
2 million cDNA sequences. However, ESTs cover only a partial sequence of the
protein. This poses a formidable challenge for the proteomic community and calls
for databases with extensive coverage and for search algorithms that identify
proteins and peptides accurately.
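As a rough illustration of how protein sequence information is derived from DNA sequence data, and of why a partial EST yields only a partial protein sequence, the following sketch translates a nucleotide fragment in its three forward reading frames using the standard genetic code. The function names and the example input are hypothetical and are not taken from any database or tool discussed in this book; a real search tool would also consider the reverse strand and sequencing errors.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon; a trailing partial codon is dropped.

    Stop codons are written as '*' and unrecognized codons as 'X'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


if __name__ == "__main__":
    # Hypothetical fragment, for illustration only.
    for frame, protein in enumerate(forward_frames("ATGGCCTAA")):
        print(frame, protein)
```

Because the correct reading frame of an EST is usually unknown, each frame produces a different candidate peptide sequence, which is one reason search algorithms need to be both sensitive and selective.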
The handling and analysis of data generated by proteomic investigations
represent an emerging and challenging field. New techniques and collaborations between
computer scientists, biostatisticians, and biologists are called for. There is a need to
develop and integrate a variety of different types of databases; to develop tools for
translating raw primary data into forms suitable for public dissemination and formal
data analysis; to obtain and develop user interfaces to store, retrieve, and visualize
data from databases; and to develop efficient and valid methods of data analysis.
The sheer volume of data to be collected and processed will challenge the usual
approaches. Analyzing data of this dimension is a fairly new endeavor for
statisticians, and the technical statistical literature on it is not yet extensive.
There are several levels of complexity in the investigation of proteomic data,
from the day-to-day interpretation of protein patterns generated by individual
measurement systems to the query and manipulation of data from multiple experiments
or information sources. Interaction with data warehouses represents another level of
data interrogation. Users typically retrieve data and formulate queries to test
hypotheses and generate conclusions. Formulating queries can be a difficult task requiring
extensive syntactic and semantic knowledge. Syntactic knowledge is needed to
ensure that a query is well formed and references existing relations and attributes.
Semantic knowledge is needed to ensure that a query satisfies user intent. Because
a user often has an incomplete understanding of the contents and structure of the data
warehouse, it is necessary to provide automated techniques for query formulation
that significantly reduce the amount of knowledge required by data warehouse users.
This book intends to provide a comprehensive view of informatic approaches to data
storage, curation, retrieval, and mining as well as application-specific bioinformatic
tools in disease detection, diagnosis, and treatment.
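The distinction between syntactic and semantic knowledge drawn above can be made concrete with a small sketch. It checks only the syntactic side: whether a query references a relation and attributes that actually exist. The schema, relation names, and attribute names are invented for illustration and do not describe any real data warehouse; whether a well-formed query satisfies the user's intent is a semantic question that a check of this kind cannot answer.

```python
# Hypothetical warehouse schema: relation name -> set of attribute names.
SCHEMA = {
    "protein": {"accession", "name", "mass_da"},
    "sample": {"sample_id", "tissue", "disease_state"},
}


def check_query(relation, attributes, schema=SCHEMA):
    """Return a list of reference problems; an empty list means all names resolve."""
    if relation not in schema:
        return [f"unknown relation: {relation}"]
    missing = sorted(set(attributes) - schema[relation])
    return [f"unknown attribute: {relation}.{name}" for name in missing]


if __name__ == "__main__":
    print(check_query("protein", ["name", "mass_da"]))  # all references resolve
    print(check_query("protein", ["name", "pI"]))       # 'pI' is not in the schema
```

An automated query-formulation aid could run a check like this before execution and suggest the closest existing attribute, reducing how much of the schema a user has to memorize.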
Rapid technological advances are yielding abundant data in many formats that,
because of their vast quantity and complexity, are becoming increasingly difficult
to analyze. A strategic objective is to streamline the transfer of knowledge and
technology to allow for data from disparate sources to be analyzed, providing new
inferences about the complex role of proteomics in disease processes. Data mining,
the process of knowledge extraction from data and the exploration of available data
for patterns and relationships, is increasingly needed for today’s high-throughput
technologies. Data architectures that support the integration of biological data files
with epidemiologic profiles of human clinical responses need to be developed. The
ability to develop and analyze metadata will stimulate new research theories and
streamline the transfer of basic knowledge into clinical applications. It is my belief
that this book will serve as a unique reference for researchers, biologists,
technologists, clinicians, and other health professionals, as it provides information on the
informatics needs of proteomic research on molecular targets relevant to disease
detection, diagnosis, and treatment.
The nineteen chapters in this volume are contributed by eminent researchers in
the field and critically address various aspects of bioinformatics and proteomic
research. The first two chapters are introductory: they discuss the biological rationale
for proteomic research and provide a brief overview of technologies that allow for
rapid analysis of the proteome. The next five chapters describe the infrastructures
that provide the foundations for proteomic research: these include the creation of a
national, virtual knowledge environment and information management systems for
proteomic research; the availability of public protein databases and interfaces; and
the need for collaboration and interaction between academia, industry, and government
agencies. Chapter 6 illustrates the power of proteomic knowledge in furthering
hypothesis-driven cancer biomarker research through data extraction and curation.
Chapter 7 and Chapter 8 provide the conceptual framework for data standardization and
integration and give an example of an ongoing collaborative research effort within the Human
Proteome Organization. Chapter 9 identifies genomic and proteomic informatic tools
used in deciphering functional pathways. The remaining ten chapters describe
applications of statistical and bioinformatic tools in data analysis, data presentation, and
data mining. Chapter 10 provides an overview of a variety of proteomic data mining
tools, and subsequent chapters provide specific examples of data mining approaches
and their applications. Chapter 11 describes methods for quantitative analysis of a
large number of proteins in a relatively large number of lung cancer samples using
two-dimensional gel electrophoresis. Chapter 12 discusses the analysis of mass
spectrometric data by nonparametric inference for high-dimensional comparisons
involving two or more groups, based on a few samples and very few replicates from within
each group. Chapter 13 discusses bioinformatic tools for the identification of proteins
by searching a collection of sequences with mass spectrometric data and describes
several critical steps that are necessary for successful protein identification:
(a) the masses of peaks in the mass spectrum corresponding to the monoisotopic
peptide masses have to be assigned; (b) a collection of sequences has to be
searched using a sensitive and selective algorithm; (c) the significance of the results
has to be tested; and (d) the function of the identified proteins has to be assigned.
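A toy version of steps (a) and (b) may help readers unfamiliar with mass-based searching. The sketch below computes monoisotopic peptide masses from standard residue masses, performs a simplified tryptic digest, and counts how many observed masses fall within a tolerance of a candidate sequence's theoretical peptides. It is an illustrative assumption, not the algorithm presented in Chapter 13: the function names and the tolerance are hypothetical, missed cleavages and modifications are ignored, and the significance testing of step (c) is not attempted.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide for the free termini


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def tryptic_peptides(protein):
    """Cleave after every K or R (simplified: no proline rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def count_matches(observed_masses, protein, tolerance=0.01):
    """Count observed masses lying within `tolerance` Da of a theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical)
        for mass in observed_masses
    )
```

Ranking every sequence in a collection by a count like this gives a crude candidate list. A raw match count says nothing about how often such matches arise by chance, which is why the significance testing of step (c) is treated as a separate, necessary step.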
In Chapter 14, two types of approaches are described: one based on statistical
theories and another on machine learning and computational data mining
techniques. In Chapter 15, the author discusses the problems with the currently
available disease classifier algorithms and puts forward approaches for scaling the data
set, searching for outliers, choosing relevant features, building classification
models, and then determining the characteristics of the models. Chapter 16 discusses
currently available computer tools that support data collection, analysis, and
validation in a high-throughput LC-MS/MS–based proteome research environment
and subsequent protein identification and quantification with minimal false-positive
error rates. Chapter 17 and Chapter 18 describe experimental designs,
statistical methodologies, and computational tools for the analysis of spectral patterns
in the diagnosis of ovarian and prostate cancer. Finally, Chapter 19 illustrates how
quantitative analysis of fluorescence microscope images augments mainstream
proteomics by providing information about the abundance, localization,
movement, and interactions of proteins inside cells.
This book has brought together a mix of scientific disciplines and specializations,
and I encourage readers to expand their knowledge by reading how the combination
of proteomics and bioinformatics is used to uncover interesting biology and discover
clinically significant biomarkers. In a field with rapidly changing technologies, it is
difficult to ever feel that one has knowledge that is current and definitive. Many
chapters in this book are conceptual in nature but have been included because
proteomics is an evolving science that offers much hope to researchers and patients
alike.
Last, but not least, I would like to acknowledge the authors for their contributions
and patience. When I accepted the offer to edit this book, I was not sure we were
ready for a book on proteomics as the field is continuously evolving, but the excellent
contributions and enthusiasm of my colleagues have allayed my fears. The chapters
in the book describe the current state of the art in informatics and reflect the
interests, experience, and creativity of the authors. Many chapters are closely related,
and therefore there may be some overlap in the material presented in individual
chapters. I would also like to acknowledge Dr. Asad Umar for his help in designing
the cover for this book. Finally, I would like to express my sincere gratitude to Dr.
Sam Hanash, the past president of HUPO, for his encouragement and support.
Sudhir Srivastava, Ph.D., MPH, MS
Bethesda, Maryland
Contributors
Bao-Ling Adam
Department of Microbiology and
Molecular Cell Biology
Eastern Virginia Medical School
Norfolk, Virginia, USA
Marcin Adamski
Bioinformatics Program
Department of Human Genetics
School of Medicine
University of Michigan
Ann Arbor, Michigan, USA
Ruedi Aebersold
Institute for Systems Biology
Seattle, Washington, USA
R.C. Beavis
Beavis Informatics
Winnipeg, Manitoba, Canada
David G. Beer
General Thoracic Surgery
University of Michigan
Ann Arbor, Michigan, USA
Guoan Chen
General Thoracic Surgery
University of Michigan
Ann Arbor, Michigan, USA
Chad Creighton
Pathology Department
University of Michigan
Ann Arbor, Michigan, USA
Daniel Crichton
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, California, USA
Cim Edelstein
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, Washington, USA
Jimmy K. Eng
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
Seattle, Washington, USA
J. Eriksson
Department of Chemistry
Swedish University of Agricultural
Sciences
Uppsala, Sweden
Ziding Feng
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
D. Fenyö
Amersham Biosciences AB
Uppsala, Sweden
The Rockefeller University
New York, New York, USA
R. Gangal
SciNova Informatics
Pune, Maharashtra, India
Gary L. Gilliland
Biotechnology Division
National Institute of Standards and
Technology
Gaithersburg, Maryland, USA
Samir M. Hanash
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Ben A. Hitt
Correlogic Systems, Inc.
Bethesda, Maryland, USA
J. Steven Hughes
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, California, USA
Donald Johnsey
National Cancer Institute
National Institutes of Health
Bethesda, Maryland, USA
Andrew Keller
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Sean Kelly
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, California, USA
Heather Kincaid
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Jeanne Kowalski
Division of Oncology Biostatistics
Johns Hopkins University
Baltimore, Maryland, USA
Peter A. Lemkin
Laboratory of Experimental and
Computational Biology
Center for Cancer Research
National Cancer Institute
Frederick, Maryland, USA
Xiao-jun Li
Institute for Systems Biology
Seattle, Washington, USA
Chenwei Lin
Department of Computational Biology
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Lance Liotta
FDA-NCI Clinical Proteomics Program
Laboratory of Pathology
National Cancer Institute
Bethesda, Maryland, USA
Stephen Lockett
NCI–Frederick/SAIC–Frederick
Frederick, Maryland, USA
Brian T. Luke
SAIC-Frederick
Advanced Biomedical Computing
Center
NCI Frederick
Frederick, Maryland, USA
Dale McLerran
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Djamel Medjahed
Laboratory of Molecular Technology
SAIC-Frederick Inc.
Frederick, Maryland, USA
Alexey I. Nesvizhskii
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Jane Meejung Chang Oh
Wayne State University
Detroit, Michigan, USA
Gilbert S. Omenn
Departments of Internal Medicine
and Human Genetics
Medical School and School
of Public Health
University of Michigan
Ann Arbor, Michigan, USA
Emanuel Petricoin
FDA-NCI Clinical Proteomics Program
Office of Cell Therapy
CBER/Food and Drug Administration
Bethesda, Maryland, USA
Veerasamy Ravichandran
Biotechnology Division
National Institute of Standards
and Technology
Gaithersburg, Maryland, USA
John Semmes
Department of Microbiology and
Molecular Cell Biology
Eastern Virginia Medical School
Norfolk, Virginia, USA
Ram D. Sriram
Manufacturing Systems Integration
Division
National Institute of Standards and
Technology
Gaithersburg, Maryland, USA
Sudhir Srivastava
Cancer Biomarkers Research Group
Division of Cancer Prevention
National Cancer Institute
Bethesda, Maryland, USA
David J. States
Bioinformatics Program
Department of Human Genetics
School of Medicine
University of Michigan
Ann Arbor, Michigan, USA
Mark Thornquist
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Mukesh Verma
Cancer Biomarkers Research Group
Division of Cancer Prevention
National Cancer Institute
Bethesda, Maryland, USA
Paul D. Wagner
Cancer Biomarkers Research Group
Division of Cancer Prevention
National Cancer Institute
Bethesda, Maryland, USA
Denise B. Warzel
Center for Bioinformatics
National Cancer Institute
Rockville, Maryland, USA
Nicole White
Department of Pathology
Johns Hopkins University
Baltimore, Maryland, USA
Marcy Winget
Department of Population Health and
Information
Alberta Cancer Board
Edmonton, Alberta, Canada
Yutaka Yasui
Division of Public Health Sciences
Fred Hutchinson Cancer Research
Center
Seattle, Washington, USA
Mei-Fen Yeh
Division of Oncology Biostatistics
Johns Hopkins University
Baltimore, Maryland, USA
Zhen Zhang
Center for Biomarker Discovery
Department of Pathology
Johns Hopkins University
Baltimore, Maryland, USA
Contents
Chapter 1
The Promise of Proteomics: Biology, Applications, and Challenges
Paul D. Wagner and Sudhir Srivastava
Chapter 2
Proteomics Technologies and Bioinformatics
Sudhir Srivastava and Mukesh Verma
Chapter 3
Creating a National Virtual Knowledge Environment
for Proteomics and Information Management
Daniel Crichton, Heather Kincaid, Sean Kelly, Sudhir Srivastava,
J. Steven Hughes, and Donald Johnsey
Chapter 4
Public Protein Databases and Interfaces
Jane Meejung Chang Oh
Chapter 5
Proteomics Knowledge Databases: Facilitating Collaboration and
Interaction between Academia, Industry, and Federal Agencies
Denise B. Warzel, Marcy Winget, Cim Edelstein,
Chenwei Lin, and Mark Thornquist
Chapter 6
Proteome Knowledge Bases in the Context of Cancer
Djamel Medjahed and Peter A. Lemkin
Chapter 7
Data Standards in Proteomics: Promises and Challenges
Veerasamy Ravichandran, Ram D. Sriram,
Gary L. Gilliland, and Sudhir Srivastava
Chapter 8
Data Standardization and Integration in Collaborative Proteomics Studies
Marcin Adamski, David J. States, and Gilbert S. Omenn
Chapter 9
Informatics Tools for Functional Pathway Analysis Using
Genomics and Proteomics
Chad Creighton and Samir M. Hanash
Chapter 10
Data Mining in Proteomics
R. Gangal
Chapter 11
Protein Expression Analysis
Guoan Chen and David G. Beer
Chapter 12
Nonparametric, Distance-Based, Supervised Protein
Array Analysis
Mei-Fen Yeh, Jeanne Kowalski, Nicole White,
and Zhen Zhang
Chapter 13
Protein Identification by Searching Collections of Sequences
with Mass Spectrometric Data
D. Fenyö, J. Eriksson, and R.C. Beavis
Chapter 14
Bioinformatics Tools for Differential Analysis of Proteomic
Expression Profiling Data from Clinical Samples
Zhen Zhang
Chapter 15
Sample Characterization Using Large Data Sets
Brian T. Luke
Chapter 16
Computational Tools for Tandem Mass Spectrometry–Based
High-Throughput Quantitative Proteomics
Jimmy K. Eng, Andrew Keller, Xiao-jun Li,
Alexey I. Nesvizhskii, and Ruedi Aebersold
Chapter 17
Pattern Recognition Algorithms and Disease Biomarkers
Ben A. Hitt, Emanuel Petricoin, and Lance Liotta
Chapter 18
Statistical Design and Analytical Strategies for Discovery of
Disease-Specific Protein Patterns
Ziding Feng, Yutaka Yasui, Dale McLerran, Bao-Ling Adam,
and John Semmes
Chapter 19
Image Analysis in Proteomics
Stephen Lockett
Index
1 The Promise of Proteomics: Biology, Applications, and Challenges
Paul D. Wagner and Sudhir Srivastava
CONTENTS
1.1 Introduction
1.2 Why Is Proteomics Useful?
1.3 Gene–Environment Interactions
1.4 Organelle-Based Proteomics
1.5 Cancer Detection
1.6 Why Proteomics Has Not Succeeded in the Past: Cancer as an Example
1.7 How Have Proteomic Approaches Changed over the Years?
1.8 Future of Proteomics in Drug Discovery, Screening, Early Detection, and Prevention
References
1.1 INTRODUCTION
In the 19th century, the light microscope opened a new frontier in the study of
diseases, allowing scientists to look deep into the cell. The science of pathology (the
branch of medicine that deals with the essential nature of disease) expanded to
include the study of structural and functional changes in cells, and diseases could
be attributed to recognizable changes in the cells of the body. At the start of the 21st
century, the molecular-based methods of genomics and proteomics are bringing
about a new revolution in medicine. Diseases will be described in terms of patterns
of abnormal genetic and protein expression in cells and how these cellular alterations
affect the molecular composition of the surrounding environment. This new
pathology will have a profound impact on the practice of medicine, enabling physicians
to determine who is at risk for a specific disease, to recognize diseases before they
have invaded tissues, to intervene with agents or treatments that may prevent or
delay disease progression, to guide the choice of therapies, and to assess how well
a treatment is working.
Cancer is one of the many diseases whose treatment will be affected by these
molecular approaches. Currently available methods can only detect cancers that have
achieved a certain size threshold, and in many cases, the tumors, however small,
have already invaded blood vessels or spread to other parts of the body. Molecular
markers have the potential to find tumors in their earliest stages of development,
even before the cell’s physical appearance has changed. Molecular-based detection
methods will also change our definition of cancer. For example, precancerous
changes in the uterine cervix are called such because of specific architectural and
cytological changes. In the future, we may be able to define the expression patterns
of specific cellular proteins induced by human papillomavirus that indicate the cells
are beginning to progress to cancer. We may also be able to find molecular changes
that affect all the tissues of an organ, putting the organ at risk for cancer.
In addition to improving the physician’s ability to detect cancers early, molecular
technologies will help doctors determine which neoplastic lesions are most likely
to progress and which are not destined to do so — a dilemma that confronts urologists
in the treatment of prostate cancer. Accurate discrimination will help eliminate
overtreatment of harmless lesions. By revealing the metastatic potential of tumors
and their corresponding preneoplastic lesions, molecular-based methods will fill a
knowledge gap impossible to close with traditional histopathology. If these advances
are made and new screening tests are developed, then one day we may be able to
identify and eliminate the invasive forms of most malignant epithelial tumors.
1.2 WHY IS PROTEOMICS USEFUL?
Mammalian systems are much more complex than can be deciphered by their genes
alone, and the biological dictates of an organism are largely governed through the
function of proteins. In combination with genomics, proteomics can provide a
holistic understanding of the biology of cells, organisms, and disease processes. The
term “proteome” came into use in the mid-1990s and is defined as the protein
complement of the genome. Although proteomics was originally used to describe
methods for large-scale, high-throughput protein separation and identification,1 today
proteomics encompasses almost any method used to characterize proteins and
determine their functions. Information at the level of the proteome is critical for
understanding the function of specific cell types and their roles in health and disease. This
is because proteins are often expressed at levels and in forms that cannot be predicted
from mRNA analysis. Proteomics also provides an avenue to understand the
interaction between a cell’s functional pathways and its environmental milieu,
independent of any changes at the RNA level. It is now generally recognized that expression
analysis directly at the protein level is necessary to unravel the critical changes that
occur as part of disease pathogenesis.
Currently there is much interest in the use of molecular markers or biomarkers
for disease diagnosis and prognosis. Biomarkers are cellular, biochemical, and
molecular alterations by which normal, abnormal, or simply biologic processes can
be recognized or monitored. These alterations should be able to objectively measure
and evaluate normal biological processes, pathogenic processes, or pharmacologic
responses to a therapeutic intervention. Proteomics is valuable in the discovery of
biomarkers as the proteome reflects both the intrinsic genetic program of the cell
and the impact of its immediate environment. Protein expression and function are
subject to modulation through transcription as well as through translational and
posttranslational events. More than one messenger RNA can result from one gene
through differential splicing, and proteins can undergo more than 200 types of
posttranslational modifications that can affect function, protein–protein and protein–
ligand interactions, stability, targeting, or half-life.2 During the transformation of a
normal cell into a neoplastic cell, distinct changes occur at the protein level,
including altered expression, differential modification, changes in specific activity,
and aberrant localization, all of which affect cellular function. Identifying and
understanding these changes is the underlying theme in cancer proteomics. The
deliverables include identification of biomarkers that have utility both for early
detection and for determining therapy.
While proteomics has traditionally dealt with quantitative analysis of protein
expression, more recently proteomics has been viewed to encompass structural
analyses of proteins.3 Quantitative proteomics strives to investigate the changes in
protein expression in different physiological states such as in healthy and diseased
tissue or at different stages of the disease. This enables the identification of state-
and stage-specific proteins. Structural proteomics attempts to uncover the structure
of proteins and to unravel and map protein–protein interactions. Proteomics provides
a window to pathophysiological states of cells and their microenvironments and
reflects changes that occur as disease-causing agents interact with the host environ-
ment. Some examples of proteomics are described below.
1.3 GENE–ENVIRONMENT INTERACTIONS
Infectious diseases result from interactions between the host and pathogen, and
understanding these diseases requires understanding not only alterations in gene
and protein expressions within the infected cells but also alterations in the sur-
rounding cells and tissues. Although genome and transcriptome analyses can pro-
vide a wealth of information on global alterations in gene expression that occur
during infections, proteomic approaches allow the monitoring of changes in protein
levels and modifications that play important roles in pathogen–host interactions.
During acute stages of infection, pathogen-coded proteins play a significant role,
whereas in chronic infection, host proteins play the dominant role. Viruses,
such as hepatitis B (HBV), hepatitis C (HCV), and human papillomavirus (HPV),
are suitable for proteomic analysis because they express only eight to ten major
genes.4,5 Analyzing a smaller number of genes is easier than analyzing the proteome
of an organism with thousands of genes.6–8 For example, herpes simplex virus type 1
(HSV-1) infection induces severe alterations of the translational apparatus, includ-
ing phosphorylation of ribosomal proteins and the association of several nonribo-
somal proteins with the ribosomes.9–12 Whether ribosomes themselves could con-
tribute to the HSV-1–induced translational control of host and viral gene expression
has been investigated. As a prerequisite to test this hypothesis, the investigators
undertook the identification of nonribosomal proteins associated with the ribosomes
during the course of HSV-1 infection. Two HSV-1 proteins, VP19C and VP26, that
associate with ribosomes with different kinetics were identified. Another nonri-
bosomal protein identified was the poly(A)-binding protein 1 (PAB1P). Newly
synthesized PAB1P continued to associate with ribosomes throughout the course of
infection. This finding attests to the need for proteomic information for structural
and functional characterization.
Approximately 15% of human cancers (about 1.5 million cases per year, world-
wide) are linked to viral, bacterial, or other pathogenic infections.13 For cancer
development, infectious agents interact with host genes and sets of infectious
agent-specific or host-specific genes are expressed. Oncogenic infections increase
the risk of cancer through expression of their genes in the infected cells. Occasion-
ally, these gene products have paracrine effects, leading to neoplasia in neighboring
cells. More typically, it is the infected cells that become neoplastic. These viral,
bacterial, and parasitic genes and their products are obvious candidates for pharma-
cologic interruptions or immunologic mimicry, promising approaches for drugs and
vaccines. By understanding the pathways involved in the infectious agent–host
interaction leading to cancer, it would be possible to identify targets for intervention.
1.4 ORGANELLE-BASED PROTEOMICS
Eukaryotic cells contain a number of organelles, including nucleoli, mitochondria,
smooth and rough endoplasmic reticula, Golgi apparatus, peroxisomes, and lysosomes.
The mitochondria are among the largest organelles in the cell. Mitochondrial dys-
function has been frequently reported in cancer, neurodegenerative diseases, diabetes,
and aging syndromes.14–16 The mitochondrial genome (16.5 kb) codes for only a
small fraction (estimated to be 1%) of the proteins housed within this organelle.
The other proteins are encoded by the nuclear DNA (nDNA) and transported into the
mitochondria. Thus, a proteomic approach is needed to fully understand the nature
and extent of mutated and modified proteins found in the mitochondria of diseased
cells. According to a recent estimate, there are 1000 to 1500 polypeptides in the
human mitochondria.17–20 This estimate is based on several lines of evidence,
including the existence of at least 800 distinct proteins in yeast and Arabidopsis
thaliana mitochondria18,19 and the identification of 591 abundant mouse mitochondrial
proteins.20
Investigators face a number of challenges in organelle proteome characterization
and data analysis. A complete characterization of the posttranslational modifications
that mitochondrial proteins undergo is an enormous and important task, as all of
these modifications cannot be identified by a single approach. Differences in post-
translational modifications are likely to be associated with the onset and progression
of various diseases. In addition, the mitochondrial proteome, although relatively
simple, is made up of complex proteins located in submitochondrial compartments.
Researchers will need to reduce the complexity to subproteomes by fractionation
and analysis of various compartments. A number of approaches are focusing on
specific components of the mitochondria, such as isolation of membrane proteins,
affinity labeling, and isolation of redox proteins,21 or isolation of large complexes.22
Other approaches may combine expression data from other species, such as yeast,
to identify and characterize the human mitochondrial proteome.23,24
The need to identify mitochondrial proteins associated with or altered during the
development and progression of cancer is compelling. For example, mitochondrial
dysfunction has been frequently associated with transport of proteins, such as cyto-
chrome c. Mitochondrial outer membrane permeabilization by pro-apoptotic proteins,
such as Bax or Bak, results in the release of cytochrome c and the induction of
apoptosis. An altered ratio of anti-apoptotic proteins (e.g., Bcl-2) to pro-apoptotic
proteins (e.g., Bax and Bak) promotes cell survival and confers resistance to therapy.25
1.5 CANCER DETECTION
Molecular markers or biomarkers are currently used for cancer detection, diagnosis,
and monitoring therapy and are likely to play larger roles in the future. In cancer
research, a biomarker refers to a substance or process that is indicative of the presence
of cancer in the body. It might be a molecule secreted by the malignancy itself, or
it can be a specific response of the body to the presence of cancer. The biological
basis for usefulness of biomarkers is that alterations in gene sequence or expression
and in protein expression and function are associated with every type of cancer and
with its progression through the various stages of development.
Genetic mutations, changes in DNA methylation, alterations in gene expression,
and alterations in protein expression or modification can be used to detect cancer,
determine prognosis, and monitor disease progression and therapeutic response.
Currently, DNA-based, RNA-based, and protein-based biomarkers are used in cancer
risk assessment and detection. The type of biomarker used depends both on the
application (i.e., risk assessment, early detection, prognosis, or response to therapy)
and the availability of appropriate biomarkers. The relative advantages and disad-
vantages of genomic and proteomic approaches have been widely discussed, but
since a cell’s ultimate phenotype depends on the functions of expressed proteins,
proteomics has the ability to provide precise information on a cell’s phenotype.
Tumor protein biomarkers are produced either by the tumor cells themselves or by
the surrounding tissues in response to the cancer cells.
More than 80% of human tumors (colon, lung, prostate, oral cavity, esophagus,
stomach, uterine cervix, and bladder) originate from epithelial cells, often at the
mucosal surface. Cells in these tumors secrete proteins or spontaneously slough off
into blood, sputum, or urine. Secreted proteins include growth factors, angiogenic
proteins, and proteases. Free DNA is also released by both normal and tumor cells
into the blood, and patients with cancer have elevated levels of circulating DNA.
Thus, body fluids such as blood and urine are good sources for cancer biomarkers.
That these fluids can be obtained using minimally invasive methods is a great
advantage if the biomarker is to be used for screening and early detection.
From a practical point of view, assays of protein tumor biomarkers, due to their
ease of use and robustness, lend themselves to routine clinical practice, and histor-
ically tumor markers have been proteins. Indeed, most serum biomarkers used today
are antibody-based tests for epithelial cell proteins. Two of the earliest and most
widely used cancer biomarkers are PSA and CA125. Prostate-specific antigen (PSA)
is a secreted protein produced by epithelial cells within the prostate. In the early
1980s it was found that sera from prostate cancer patients contain higher levels of
PSA than do the sera of healthy individuals. Since the late 1980s, PSA has been
used to screen asymptomatic men for prostate cancer and there has been a decrease
in mortality rates due to prostate cancer. How much of this decrease is attributable
to screening with PSA and how much is due to other factors, such as better therapies,
is uncertain. Although PSA is the best available serum biomarker for prostate cancer
and the only one approved by the FDA for screening asymptomatic men, it is far
from ideal. Not all men with prostate cancer have elevated levels of PSA; 20 to 30%
of men with prostate cancer have normal PSA levels and are misdiagnosed. Con-
versely, because PSA levels are increased in other conditions, such as benign pros-
tatic hypertrophy and prostatitis, a significant fraction of men with elevated levels
of PSA do not have cancer and undergo needless biopsies.
The CA125 antigen was first detected over 20 years ago; CA125 is a mucin-like
glycoprotein present on the cell surface of ovarian tumor cells that is released into
the blood.26 Serum CA125 levels are elevated in about 80% of women with epithelial
ovarian cancer but in less than 1% of healthy women. However, the CA125 test only
returns a positive result for about 50% of Stage I ovarian cancer patients and is,
therefore, not useful by itself as an early detection test.27 Also, CA125 is elevated
in a number of benign conditions, which diminishes its usefulness in the initial
diagnosis of ovarian cancer. Despite these limitations, CA125 is considered to be
one of the best available cancer serum markers and is used primarily in the man-
agement of ovarian cancer. Falling CA125 following chemotherapy indicates that
the cancer is responding to treatment.28 Other serum protein biomarkers, such as
alpha fetoprotein (AFP) for hepatocellular carcinoma and CA15.3 for breast cancer,
are also of limited usefulness as they are elevated in some individuals without cancer,
and not all cancer patients have elevated levels.
1.6 WHY PROTEOMICS HAS NOT SUCCEEDED IN THE PAST: CANCER AS AN EXAMPLE
The inability of these protein biomarkers to detect all cancers (false negatives)
reflects both the progressive nature of cancer and its heterogeneity. Cancer is not a
single disease but rather an accumulation of several events, genetic and epigenetic,
arising in a single cell over a long period of time. Proteins overexpressed in late
stage cancers may not be overexpressed in earlier stages and, therefore, are not
useful for early cancer detection. For example, the CA125 antigen is not highly
expressed in many Stage I ovarian cancers. Also, because tumors are heterogeneous,
the same sets of proteins are not necessarily overexpressed in each individual tumor.
For example, while most patients with high-grade prostate cancers have increased
levels of PSA, approximately 15% of these patients do not have an elevated PSA
level. The reciprocal problem of biomarkers indicating the presence of cancer when
none is present (false positives) results because these proteins are not uniquely
produced by tumors. For example, PSA levels are raised by prostatitis (inflammation of
the prostate) and benign prostatic hyperplasia (BPH), and elevated CA125 levels are
caused by endometriosis and pelvic inflammation.
The performance of any biomarker can be described in terms of its specificity
and sensitivity. In the context of cancer biomarkers, sensitivity refers to the proportion
of case subjects (individuals with confirmed disease) who test positive for the biom-
arker, and specificity refers to the proportion of control subjects (individuals without
disease) who test negative for the biomarker. An ideal biomarker test would have
100% sensitivity and specificity; i.e., everyone with cancer would have a positive
test, and everyone without cancer would have a negative test. None of the currently
available protein biomarkers achieve 100% sensitivity and specificity. For example,
as described above, PSA tests achieve 70 to 90% sensitivity and only about 25%
specificity, which results in many men having biopsies when they do not have
detectable prostate cancer. The serum protein biomarker for breast cancer, CA15.3,
has only 23% sensitivity and 69% specificity. Other frequently used terms are positive
predictive value (PPV), the chance that a person with a positive test has cancer, and
negative predictive value (NPV), the chance that a person with a negative test does
not have cancer. PPV is affected by the prevalence of disease in the screened popu-
lation. For a given sensitivity and specificity, the higher the prevalence, the higher
the PPV. Even when a biomarker provides high specificity and sensitivity, it may not
be useful for screening the general population if the cancer has low prevalence. For
example, a biomarker with 100% sensitivity and 95% specificity has a PPV of only
17% for a cancer with 1% prevalence (only 17 out of 100 people with a positive test
for the biomarker actually have cancer) and 2% for a cancer with 0.1% prevalence.
The prevalence of ovarian cancer in the general population is about 0.04%. Thus, a
biomarker used to screen the general population must have significantly higher spec-
ificity and sensitivity than a biomarker used to monitor an at-risk population.
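The dependence of PPV on prevalence follows directly from the definitions above. The short Python sketch below recomputes the figures quoted in this paragraph; the function and argument names are illustrative and do not come from the original text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a screened population.

    All three arguments are fractions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # 100% sensitivity and 95% specificity, as in the example in the text.
    for prevalence in (0.01, 0.001, 0.0004):
        ppv, npv = predictive_values(1.0, 0.95, prevalence)
        print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At the 0.04% prevalence quoted for ovarian cancer, the same hypothetical test (100% sensitivity, 95% specificity) would give a PPV of under 1%.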
1.7 HOW HAVE PROTEOMIC APPROACHES CHANGED OVER THE YEARS?
Currently investigators are pursuing three different approaches to develop biomarkers
with increased sensitivity and specificity. The first is to improve on a currently used
biomarker. For instance, specificity and sensitivity of PSA may be improved by
measurement of its complex with alpha(1)-antichymotrypsin; patients with benign
prostate conditions have more free PSA than bound, while patients with cancer have
more bound PSA than free.29 This difference is thought to result from differences in
the type of PSA released into the circulation by benign and malignant prostatic cells.
Researchers are also trying to improve the specificity and sensitivity of PSA by
incorporating age- and race-specific cut points and by adjusting serum PSA concen-
tration by prostatic volume (PSA density). The second approach is to discover and
validate new biomarkers that have improved sensitivity and specificity. Many inves-
tigators are actively pursuing new biomarkers using a variety of new and old tech-
nologies. The third approach is to use a panel of biomarkers, either by combining
several individually identified biomarkers or by using mass spectrometry to identify
a pattern of protein peaks in sera that can be used to predict the presence of cancer
or other diseases. High-throughput proteomic methodologies have the potential to
revolutionize protein biomarker discovery and to allow for multiple markers to be
assayed simultaneously.
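One consideration when combining individually identified biomarkers is how the panel's sensitivity and specificity relate to those of its members. The sketch below assumes a panel that is called positive when either of two markers is positive, and it assumes, purely for illustration, that the two markers err independently of each other. Real markers are often correlated, so the result is indicative at best. The function name is hypothetical.

```python
def or_panel(sens_a, spec_a, sens_b, spec_b):
    """Sensitivity and specificity of an 'either marker positive' panel.

    Assumes the two markers are conditionally independent given disease
    status (an illustrative simplification, not a property of real markers).
    """
    # A case is missed only if both markers miss it.
    sensitivity = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)
    # A control is called negative only if both markers are negative.
    specificity = spec_a * spec_b
    return sensitivity, specificity
```

Under these assumptions the panel is more sensitive than either marker alone but less specific than either, which is why panel design also has to consider the decision rule and the cutoff for each marker.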
In the past, researchers have mostly used a one-at-a-time approach to biomarker
discovery. They have looked for differences in the levels of individual proteins in
tissues or blood from patients with disease and from healthy individuals. The choice
of proteins to examine was frequently based on biological knowledge of the cancer
and its interaction with surrounding tissues. This approach is laborious and time
consuming, and most of the biomarkers discovered thus far do not have sufficient
sensitivity and specificity to be useful for early cancer detection. A mainstay of
protein biomarker discovery has been two-dimensional gel electrophoresis (2DE).
The traditional 2DE method is to separately run extracts from control and diseased
tissues or cells and to compare the relative intensities of the various protein spots
on the stained gels. Proteins whose intensities are significantly increased or decreased
in diseased tissues are identified using mass spectrometry. For example, 2DE was
recently used to identify proteins that are specifically overexpressed in colon cancer.30
The limitations of the 2DE approach are well known: the gels are difficult to run
reproducibly, a significant fraction of the proteins either do not enter the gels or are
not resolved, low-abundance proteins are not detected, and relatively large amounts
of sample are needed. A number of modifications have been made to overcome these
limitations, including fractionation of samples prior to 2DE, the use of immobilized
pH gradients, and labeling proteins from control and disease cells with different
fluorescent dyes and then separating them on the same gel (differential in-gel elec-
trophoresis; DIGE). An additional difficulty is contamination from neighboring
stromal cells that can confound the detection of tumor-specific markers. Laser
capture microdissection (LCM) can be used to improve the specificity of 2DE, as it
allows for the isolation of pure cell populations; however, it further reduces the
amount of sample available for analysis. Even with these modifications, 2DE is a
relatively low throughput methodology that only samples a subset of the proteome,
and its applicability for screening and diagnosis is very limited.
A number of newer methods for large-scale protein analysis are being used or
are under development. Several of these rely on mass spectrometry and database
interrogation. Mass spectrometers work by imparting an electrical charge to the
analytes (e.g., proteins or peptides) and then sending the charged particles through
a mass analyzer. A time of flight (TOF) mass spectrometer measures the time it
takes a charged particle (protein or peptide) to reach the detector; the higher the
mass the longer the flight time. A mixture of proteins or peptides analyzed by TOF
generates a spectrum of protein peaks. TOF mass spectrometers are used to analyze
peptide peaks generated by protease digestion of proteins resolved on 2DE. A major
advance in this methodology is matrix-assisted laser desorption ionization (a form
of soft ionization), which allows for the ionization of larger biomolecules such as
proteins and peptides. TOF mass spectrometers are also used to identify peptides
eluted from HPLC columns.
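The relationship between mass and flight time can be made concrete with the idealized equation for a linear TOF analyzer: an ion of mass m and charge z, accelerated through a potential U, crosses a drift length L in time t = L * sqrt(m / (2 z e U)). The sketch below applies that textbook relation. The 20 kV accelerating potential and 1 m drift length are assumed values chosen for illustration, not parameters of any instrument described here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time_us(mass_da, charge=1, accel_volts=20_000.0, drift_m=1.0):
    """Idealized drift time (microseconds) of an ion in a linear TOF tube."""
    mass_kg = mass_da * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_volts  # joules
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)       # metres/second
    return drift_m / velocity * 1e6


if __name__ == "__main__":
    for mass in (1_000.0, 4_000.0, 16_000.0):
        print(f"{mass:>8.0f} Da -> {flight_time_us(mass):6.1f} microseconds")
```

Because the flight time scales with the square root of the mass-to-charge ratio, a fourfold increase in mass at fixed charge doubles the flight time in this idealized model.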
With tandem mass spectrometers (MS/MS), a mixture of charged peptides is
separated in the first MS according to their mass-to-charge ratios, generating a list
of peaks. In the second MS, the spectrometer is adjusted so that a single
mass-to-charge species is directed to a collision cell to generate fragment ions, which
are then separated by their mass-to-charge ratios. These patterns are compared to
databases to identify the peptide and its parent protein. Liquid chromatography
combined with MS or MS/MS (LC-MS and LC-MS/MS) is currently being used as
an alternative to 2DE to analyze complex protein mixtures. In this approach, a mixture
of proteins is digested with a protease, and the resulting peptides are then fractionated
by liquid chromatography (typically reverse-phase HPLC) and analyzed by MS/MS
and database interrogation. A major limitation to this approach is the vast number of
peptides generated when the initial samples contain a large number of proteins. Even
the most advanced LC-MS/MS systems cannot resolve and analyze these complex
peptide mixtures, and currently it is necessary to either prefractionate the proteins
prior to proteolysis or to enrich for certain types of peptides (e.g., phosphorylated,
glycosylated, or cysteine-containing) prior to liquid chromatography.
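The protease digestion step, and the mass-to-charge values that the first MS stage would report for the resulting peptides, can be sketched in a few lines. The example assumes trypsin as the protease (cleaving after lysine or arginine, except before proline) and uses standard monoisotopic residue masses. It ignores missed cleavages and modifications, which real database-search software must handle. The function names are illustrative.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added once per positive charge


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def mz(peptide, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge
```

Even a single protein yields several peptides, each observable at more than one charge state, which gives a sense of how quickly the number of peaks grows for a complex sample.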
Although the use of mass spectrometry has accelerated the pace of protein
identification, it is not inherently quantitative and the amounts of peptides ionized
vary. Thus, the signal obtained in the mass spectrometer cannot be used to measure
the amount of protein in the sample. Several comparative mass spectrometry methods
have been developed to determine the relative amounts of a particular peptide or
protein in two different samples. These approaches rely on labeling proteins in one
sample with a reagent containing one stable isotope and labeling the proteins in the
other sample with the same reagent containing a different stable isotope. The samples
are then mixed, processed, and analyzed together by mass spectrometry. The mass
of a peptide from one sample will be different by a fixed amount from the same
peptide from the other sample. One such method (isotope-coded affinity tags; ICAT)
modifies cysteine residues with an affinity reagent that contains either eight hydrogen
or eight deuterium atoms.31 Other methods include digestion in 16O and 18O water
and culturing cells in 12C- and 13C-labeled amino acids.
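The fixed mass offset between the two labeled forms of a peptide is what allows their signals to be paired and compared. The sketch below assumes an ICAT-style tag that differs by eight deuterium-for-hydrogen substitutions per labeled cysteine. The helper names are illustrative, and the calculation is a simplification of what quantification software does.

```python
# Mass difference between deuterium and hydrogen (protium), in daltons.
DEUTERIUM_SHIFT = 2.014102 - 1.007825


def expected_mz_shift(n_cysteines, charge, atoms_per_tag=8):
    """m/z spacing between the light- and heavy-labeled forms of a peptide."""
    return n_cysteines * atoms_per_tag * DEUTERIUM_SHIFT / charge


def relative_abundance(light_intensity, heavy_intensity):
    """Heavy-to-light intensity ratio for a co-analyzed peptide pair."""
    return heavy_intensity / light_intensity
```

Because the two forms are mixed and analyzed together, run-to-run variation in ionization affects both members of a pair in the same analysis, so their intensity ratio can be compared even though the absolute signal is not quantitative.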
Although the techniques described thus far are useful for determining proteins
that are differently expressed in control and disease, they are expensive, relatively
low throughput, and not suitable for routine clinical use. Surface-enhanced laser
desorption ionization time-of-flight (SELDI-TOF) and protein chips are two pro-
teomic approaches that have the potential to be high throughput and adaptable to
clinical use. In the SELDI-TOF mass spectrometry approach, protein fractions or
body fluids are spotted onto chromatographic surfaces (ion exchange, reverse phase,
or metal affinity) that selectively bind a subset of the proteins (Ciphergen® Protein-
Chip Arrays). After washing to remove unbound proteins, the bound proteins are
ionized and analyzed by TOF mass spectrometry. This method has been used to
identify disease-related biomarkers, including the alpha chain of haptoglobin
(Hp-alpha) for ovarian cancer32 and alpha defensin for bladder cancer. Other inves-
tigators are using SELDI-TOF to acquire proteomic patterns from whole sera, urine,
or other body fluids. The complex patterns of proteins obtained by the TOF mass
spectrometer are analyzed using pattern recognition algorithms to identify a set of
protein peaks that can be used to distinguish disease from control. With this approach,
protein identification and characterization are not necessary for development of clin-
ical assays, and a SELDI protein profile may be sufficient for screening. For example,
this method has been reported to identify patients with Stage I ovarian cancer with
100% sensitivity and 95% specificity.27 Similar, albeit less dramatic, results have
been reported for other types of cancer.28,33–36 At this time, it is uncertain whether
SELDI protein profiling will prove to be as valuable a diagnostic tool as the initial
reports have suggested. A major technical issue is the reproducibility of the protein
profiles. Variability between SELDI-TOF instruments, in the extent of peptide ion-
ization, in the chips used to immobilize the proteins, and in sample processing, can
contribute to the lack of reproducibility. There is concern that the protein peaks
identified by SELDI and used for discriminating between cancer and control are not
derived from the tumor per se but rather from the body’s response to the cancer
(epiphenomena) and that they may not be specific for cancer; inflammatory condi-
tions and benign pathologies may elicit the same bodily responses.37,38 Most known
tumor marker proteins in the blood are on the order of ng/ml (PSA above 4 ng/ml
and alpha fetoprotein above 20 ng/ml are considered indicators of, respectively,
prostate and hepatocellular cancers). The SELDI-TOF peptide peaks typically used
to distinguish cancer from control are relatively large peaks representing proteins
present in the serum on the order of μg to mg/ml; these protein peaks may result
from cancer-induced proteolysis or posttranslational modification of proteins nor-
mally present in sera. Although identification of these discriminating proteins may
not be necessary for this “black-box” approach to yield a clinically useful diagnostic
test, identifying these proteins may help elucidate the underlying pathology and lead
to improved diagnostic tests. Potential advantages of the SELDI for clinical assays
are that it is high throughput, it is relatively inexpensive, and it uses minimally
invasive specimens (blood, urine, sputum).
Interest in protein chips in part reflects the success of DNA microarrays. While
these two methodologies have similarities, a number of technical and biological
differences exist that make the practical application of protein chips or arrays chal-
lenging. Proteins, unlike DNA, must be captured in their native conformation and
are easily denatured irreversibly. There is no method to amplify their concentrations,
and their interactions with other proteins and ligands are less specific and of variable
affinity. Current bottlenecks in creating protein arrays include the production (expres-
sion and purification) of the huge diversity of proteins that will form the array
elements, methods to immobilize proteins in their native states on the surface, and
lack of detection methods with sufficient sensitivity and accuracy. To date, the most
widely used application of protein chips is the antibody microarray, which has the
potential for high-throughput profiling of a fixed number of proteins. A number of
purified, well-characterized antibodies are spotted onto a surface and then cell extracts
or sera are passed over the surface to allow for the antigen to bind to the specific,
immobilized antibodies. The bound proteins are detected either by using secondary
antibodies against each antigen or by using lysates that are tagged with fluorescent
or radioactive labels. A variation that allows for direct comparison between two
different samples is to label each extract with a different fluorescent dye and then
mix the extracts prior to exposure to the antibody array. A significant problem with
antibody arrays is lack of specificity; the immobilized antibodies cross react with
proteins other than the intended target. The allure of protein chips is their potential
to rapidly analyze multiple protein markers simultaneously at a moderate cost.
As discussed earlier, most currently available cancer biomarkers lack sufficient
sensitivity and specificity for use in early detection, especially to screen asymptom-
atic populations. One approach to improve sensitivity and specificity is to use a
panel of biomarkers. It is easy to envision how combining biomarkers can increase
sensitivity if they detect different pathological processes or different stages of cancer,
and one factor to consider in developing such a panel is whether the markers are
complementary. However, simply combining two biomarkers will more than likely
decrease specificity and increase the number of false positives. Raising their cutoff
values (the concentration of a biomarker that is used as an indication of the presence
of cancer) can reduce the number of false positives, at some cost in sensitivity. A useful test for
evaluating a single biomarker or panel of biomarkers is the receiver operating
characteristic (ROC) curve. An ROC curve is a graphical display of false-positive
rates and true-positive rates from multiple classification rules (different cutoff values
for the various biomarkers). Each point on the graph corresponds to a different
classification rule. In addition to analyzing individually measured markers, ROC
curves can be used to analyze SELDI-TOF proteomic profiles.39
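How an ROC curve is built from cutoff values can be shown with a short sketch. It assumes a single marker for which higher values indicate disease, treats each observed value as a candidate cutoff, and summarizes the curve by its trapezoidal area. The function names are illustrative, and any marker levels passed in are the caller's own data.

```python
def roc_points(case_values, control_values):
    """Return (false-positive rate, true-positive rate) pairs, one per cutoff.

    A sample is called positive when its marker value is >= the cutoff.
    Cutoffs are visited from highest to lowest, so both rates only increase.
    """
    cutoffs = sorted(set(case_values) | set(control_values), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every observed value
    for cutoff in cutoffs:
        tpr = sum(v >= cutoff for v in case_values) / len(case_values)
        fpr = sum(v >= cutoff for v in control_values) / len(control_values)
        points.append((fpr, tpr))
    return points


def area_under_curve(points):
    """Trapezoidal area under ROC points ordered by false-positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An area of 1.0 corresponds to a marker that separates cases from controls perfectly, and an area near 0.5 to one that performs no better than chance. Moving along the curve toward the origin corresponds to raising the cutoff, which trades sensitivity for fewer false positives.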
The measurement and analysis of biomarker panels will be greatly facilitated
by high-throughput technologies such as protein arrays, microbeads with multiple
antibodies bound to them, and mass spectrometry. It is in these areas that a number
of companies are concentrating their efforts, as not only must a biomarker or panel
of biomarkers have good specificity and sensitivity, there must be an efficient and
cost-effective method to assay them.
1.8 FUTURE OF PROTEOMICS IN DRUG DISCOVERY, SCREENING, EARLY DETECTION, AND PREVENTION
Proteomics has benefited greatly from the development of high-throughput meth-
ods to simultaneously study thousands of proteins. The successful application of
proteomics to medical diagnostics will require the combined efforts of basic
researchers, physicians, pathologists, technology developers, and information sci-
entists (Figure 1.1). However, its application in clinics will require development
FIGURE 1.1 Application of medical proteomics: Interplay between various disciplines and
expertise is the key to developing tools for detection, diagnosis, and treatment of cancer.
[Figure 1.1 labels: Technologist; Information Scientist; Basic Scientist; Physician/Scientist; Cancer Biorepository; Biomarkers; Diagnostics; Therapeutics]
of test kits based on pattern analysis, single molecule detection, or multiplexing
of several clinically acceptable tests, such as ELISA, for various targets in a sys-
tematic way under rigorous quality control regimens (Figure 1.2). Interperson
heterogeneity is a major hurdle when attempting to discover a disease-related
biomarker within biofluids such as serum. However, the coupling of high-through-
put technologies with protein science now enables samples from hundreds of
patients to be rapidly compared. Admittedly, proteomic approaches cannot remove
the “finding a needle in a haystack” requirement for discovering novel biomarkers;
however, we now possess the capability to inventory components within the
“haystack” at an unprecedented rate. Indeed, such capabilities have already begun
to bear fruits as our knowledge of the different types of proteins within serum is
growing exponentially and novel technologies for diagnosing cancers using pro-
teomic technologies are emerging.
Is the development of methods capable of identifying thousands of proteins in
a high-throughput manner going to lead to novel biomarkers for the diagnosis of
early stage diseases or is the amount of data that is accumulated in such studies
going to be overwhelming? The answer to this will depend on our ability to develop
and successfully deploy bioinformatic tools. Based on the rate at which interesting
leads are being discovered, it is likely that not only will biomarkers with better
sensitivity and specificity be identified but individuals will be treated using custom-
ized therapies based on their specific protein profile. The promise of proteomics for
discovery is its potential to elucidate fundamental information on the biology of
cells, signaling pathways, and disease processes; to identify disease biomarkers and
new drug targets; and to profile drug leads for efficacy and safety. The promise of
proteomics for clinical use is the refinement and development of protein-based assays
that are accurate, sensitive, robust, and high throughput. Since many of the proteomic
technologies and data management tools are still in their infancy, their validation
and refinement will be the most important tasks in the future.
FIGURE 1.2 Strategies in medical proteomics: Steps in identification of detection targets
and the development of clinical assays.
[Figure panels: Protein Profiling / Define Protein Changes (1. 2DE; 2. SELDI-TOF-MS;
3. LC-coupled MS); Bio-informatics, Bio-computation, Databases; Protein Identification
(1. Nano-LC-coupled SELDI-MS; 2. CapLC-MS/MS; 3. TOF-MS); Assay Development
(1. ELISA; 2. SELDI-based; 3. Ab arrays); Functional Analysis (1. Protein-protein
interaction; 2. Cellular targeting; 3. Protein-ligand interactions)]
REFERENCES
1. Wasinger, V.C., Cordwell, S.J., Cerpa-Poljak, A., et al. Progress with gene-product
mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis, 16, 1090–1094,
1995.
2. Banks, R.E., Dunn, M.J., Hochstrasser, D.F., et al. Proteomics: New perspectives,
new biomedical opportunities. Lancet, 356, 1749–1756, 2000.
3. Anderson, N.L., Matheson, A.D., and Steiner, S. Proteomics: Applications in basic
and applied biology. Curr. Opin. Biotechnol., 11, 408–412, 2000.
4. Genther, S.M., Sterling, S., Duensing, S., Munger, K., Sattler, C., and Lambert, P.F.
Quantitative role of the human papillomavirus type 16 E5 gene during the productive
stage of the viral life cycle. J. Virol., 77, 2832–2842, 2003.
5. Middleton, K., Peh, W., Southern, S., et al. Organization of human papillomavirus
productive cycle during neoplastic progression provides a basis for selection of
diagnostic markers. J. Virol., 77, 10186–10201, 2003.
6. Verma, M., Lambert, P.F., and Srivastava, S.K. Meeting highlights: National Cancer
Institute workshop on molecular signatures of infectious agents. Dis. Markers, 17,
191–201, 2001.
7. Verma, M. and Srivastava, S. New cancer biomarkers deriving from NCI early detec-
tion research. Recent Results Canc. Res., 163, 72–84; discussion, 264–266, 2003.
8. Verma, M. and Srivastava, S. Epigenetics in cancer: implications for early detection
and prevention. Lancet Oncol., 3, 755–763, 2002.
9. Diaz, J.J., Giraud, S., and Greco, A. Alteration of ribosomal protein maps in herpes
simplex virus type 1 infection. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci.,
771, 237–249, 2002.
10. Greco, A., Bausch, N., Coute, Y., and Diaz, J.J. Characterization by two-dimensional
gel electrophoresis of host proteins whose synthesis is sustained or stimulated during
the course of herpes simplex virus type 1 infection. Electrophoresis, 21, 2522–2530,
2000.
11. Greco, A., Bienvenut, W., Sanchez, J.C., et al. Identification of ribosome-associated
viral and cellular basic proteins during the course of infection with herpes simplex
virus type 1. Proteomics, 1, 545–549, 2001.
12. Laurent, A.M., Madjar, J.J., and Greco, A. Translational control of viral and host
protein synthesis during the course of herpes simplex virus type 1 infection: evidence
that initiation of translation is the limiting step. J. Gen. Virol., 79, 2765–2775, 1998.
13. Gallo, R.C. Thematic review series. XI: Viruses in the origin of human cancer.
Introduction and overview. Proc. Assoc. Am. Phys., 111, 560–562, 1999.
14. Wallace, D.C. Mitochondrial diseases in man and mouse. Science, 283, 1482–1488,
1999.
15. Enns, G.M. The contribution of mitochondria to common disorders. Mol. Genet.
Metab., 80, 11–26, 2003.
16. Maechler, P. and Wollheim, C.B. Mitochondrial function in normal and diabetic
beta-cells. Nature, 414, 807–812, 2001.
17. Lopez, M.F. and Melov, S. Applied proteomics: mitochondrial proteins and effect on
function. Circ. Res., 90, 380–389, 2002.
18. Kumar, A., Agarwal, S., Heyman, J.A., et al. Subcellular localization of the yeast
proteome. Genes Dev., 16, 707–719, 2002.
19. Werhahn, W. and Braun, H.P. Biochemical dissection of the mitochondrial proteome
from Arabidopsis thaliana by three-dimensional gel electrophoresis. Electrophoresis,
23, 640–646, 2002.
20. Mootha, V.K., Bunkenborg, J., Olsen, J.V., et al. Integrated analysis of protein com-
position, tissue diversity, and gene regulation in mouse mitochondria. Cell, 115,
629–640, 2003.
21. Lin, T.K., Hughes, G., Muratovska, A., et al. Specific modification of mitochondrial
protein thiols in response to oxidative stress: A proteomics approach. J. Biol. Chem.,
277, 17048–17056, 2002.
22. Brookes, P.S., Pinner, A., Ramachandran, A., et al. High throughput two-dimensional
blue-native electrophoresis: A tool for functional proteomics of mitochondria and
signaling complexes. Proteomics, 2, 969–977, 2002.
23. Richly, E., Chinnery, P.F., and Leister, D. Evolutionary diversification of mitochon-
drial proteomes: Implications for human disease. Trends Genet., 19, 356–362, 2003.
24. Koc, E.C., Burkhart, W., Blackburn, K., Moseley, A., Koc, H., and Spremulli, L.L.
A proteomics approach to the identification of mammalian mitochondrial small sub-
unit ribosomal proteins. J. Biol. Chem., 275, 32585–32591, 2000.
25. Newmeyer, D.D. and Ferguson-Miller, S. Mitochondria: Releasing power for life and
unleashing the machineries of death. Cell, 112, 481–490, 2003.
26. Yin, B.W., Dnistrian, A., and Lloyd, K.O. Ovarian cancer antigen CA125 is encoded
by the MUC16 mucin gene. Int. J. Canc., 98, 737–740, 2002.
27. Petricoin, E.F., Ardekani, A.M., Hitt, B.A., et al. Use of proteomic patterns in serum
to identify ovarian cancer. Lancet, 359, 572–577, 2002.
28. Li, J., Zhang, Z., Rosenzweig, J., Wang, Y.Y., and Chan, D.W. Proteomics and
bioinformatics approaches for identification of serum biomarkers to detect breast
cancer. Clin. Chem., 48, 1296–1304, 2002.
29. Martinez, M., Espana, F., Royo, M., et al. The proportion of prostate-specific antigen
(PSA) complexed to alpha(1)-antichymotrypsin improves the discrimination between
prostate cancer and benign prostatic hyperplasia in men with a total PSA of 10 to
30 microg/L. Clin. Chem., 48, 1251–1256, 2002.
30. Brunagel, G., Schoen, R.E., and Getzenberg, R.H. Colon cancer specific nuclear
matrix protein alterations in human colonic adenomatous polyps. J. Cell Biochem.,
91, 365–374, 2004.
31. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. Quan-
titative analysis of complex protein mixtures using isotope-coded affinity tags. Nat.
Biotechnol., 17, 994–999, 1999.
32. Ye, B., Cramer, D.W., Skates, S.J., et al. Haptoglobin-alpha subunit as potential serum
biomarker in ovarian cancer: Identification and characterization using proteomic
profiling and mass spectrometry. Clin. Canc. Res., 9, 2904–2911, 2003.
33. Adam, B.L., Qu, Y., Davis, J.W., et al. Serum protein fingerprinting coupled with a
pattern-matching algorithm distinguishes prostate cancer from benign prostate hyper-
plasia and healthy men. Canc. Res., 62, 3609–3614, 2002.
34. Poon, T.C., Yip, T.T., Chan, A.T., et al. Comprehensive proteomic profiling identifies
serum proteomic signatures for detection of hepatocellular carcinoma and its sub-
types. Clin. Chem., 49, 752–760, 2003.
35. Kozak, K.R., Amneus, M.W., Pusey, S.M., et al. Identification of biomarkers for
ovarian cancer using strong anion-exchange ProteinChips: Potential use in diagnosis
and prognosis. Proc. Natl. Acad. Sci. USA, 100, 12343–12348, 2003.
36. Petricoin, E.F., III, Ornstein, D.K., Paweletz, C.P., et al. Serum proteomic patterns
for detection of prostate cancer. J. Natl. Canc. Inst., 94, 1576–1578, 2002.
37. Diamandis, E.P. Point: Proteomic patterns in biological fluids: Do they represent the
future of cancer diagnostics? Clin. Chem., 49, 1272–1275, 2003.
38. Petricoin, E., III and Liotta, L.A. Counterpoint: The vision for a new diagnostic
paradigm. Clin. Chem., 49, 1276–1278, 2003.
39. Baker, S.G. The central role of receiver operating characteristic (ROC) curves in
evaluating tests for the early detection of cancer. J. Natl. Canc. Inst., 95, 511–515,
2003.
2 Proteomics Technologies
and Bioinformatics
Sudhir Srivastava and Mukesh Verma
CONTENTS
2.1 Introduction: Proteomics in Cancer Research
2.1.1 Two-Dimensional Gel Electrophoresis (2DE)
2.1.2 Mass Spectrometry
2.1.3 Isotope-Coded Affinity Tags (ICAT)
2.1.4 Differential 2DE (DIGE)
2.1.5 Protein-Based Microarrays
2.2 Current Bioinformatics Approaches in Proteomics
2.2.1 Clustering
2.2.2 Artificial Neural Networks
2.2.3 Support Vector Machine (SVM)
2.3 Protein Knowledge System
2.4 Market Opportunities in Computational Proteomics
2.5 Challenges
2.6 Conclusion
References
2.1 INTRODUCTION: PROTEOMICS
IN CANCER RESEARCH
Proteomics is the study of all expressed proteins. A major goal of proteomics is a
complete description of the protein interaction networks underlying cell physiology.
Before we discuss protein computational tools and methods, we will give a brief
background of current proteomic technologies used in cancer diagnosis. For cancer
diagnosis, both surface-enhanced laser desorption ionization (SELDI) and
two-dimensional gel electrophoresis (2DE) approaches have been used.1,2 Recently,
protein-based microarrays have been developed that show great promise for analyzing
small amounts of sample while yielding the maximum data on the cell’s
microenvironment.3–5
2.1.1 TWO-DIMENSIONAL GEL ELECTROPHORESIS (2DE)
The recent upsurge in proteomics research has been facilitated largely by stream-
lining of 2DE technology and parallel developments in MS for analysis of peptides
and proteins. Two-dimensional gel electrophoresis is used to separate proteins based
on charge and mass and can be used to identify posttranslationally modified proteins.
A major limitation of this technology in proteomics is that membrane proteins
contain a considerable number of hydrophobic amino acids, causing them to precip-
itate during the isoelectric focusing of standard 2DE.6 In addition, information
regarding protein–protein interactions is lost during 2DE due to the denaturing
conditions used in both gel dimensions. To overcome these limitations, two-dimen-
sional blue-native gel electrophoresis has been used to resolve membrane proteins.
In this process, membrane protein complexes are solubilized and resolved in the
native forms in the first dimension. The separation in the second dimension is
performed by sodium dodecyl sulfate polyacrylamide gel electrophoresis
(SDS-PAGE), which denatures the complexes and resolves them into their separate
subunits. Protein spots are digested with trypsin and analyzed by matrix-assisted
laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS).
The 2DE blue-native gel electrophoresis is suitable for small biological samples and
can detect posttranslational modifications (PTMs) in proteins. Common PTMs
include phosphorylation, oxidation and nitrosation, fucosylation and galactosylation,
reaction with lipid-derived aldehydes, and tyrosine nitration. Improvements are
needed to resolve low-molecular-mass proteins, especially those with isoelectric
points below pH 3 and above pH 10. This technique has low throughput (at the most
30 samples can be run simultaneously), and most of the steps are manual. Automatic
spot-picking also needs improvement.
2.1.2 MASS SPECTROMETRY
Mass spectrometry (MS) is an integral part of the proteomic analysis. MS instruments
are made up of three primary components: the source, which produces ions for analysis;
the mass analyzer, which separates the ions based on their mass-to-charge ratios (m/z);
and the detector, which quantifies the ions resolved by the analyzer. Multiple subtypes
of ion sources, analyzers, and detectors have been developed, and different components
can be combined to create different instruments, but the principle remains the same—
the spectrometers create ion mixtures from a sample and then resolve them into their
component ions based on their m/z values. Significant improvements have been made
in spectrometric devices during the past two decades, allowing precise analysis of
biomolecules too fragile to survive earlier instrumentation. For ionization of peptides
and proteins, these ionization sources are usually coupled to time-of-flight (TOF)2,7,8
spectrometers. Historically, MS has been limited to the analysis of small molecules.
Larger biomolecules, such as peptides or proteins, simply do not survive the harsh
ionization methods available to create the ions. ESI (electrospray ionization),9 MALDI,
and SELDI techniques permit a gentler ionization of large biomolecules, called soft
ionization, without too much fragmentation of the principal ions. ESI and MALDI were
both developed during the late 1980s and were the foundation for the emergence of
MS as a tool for investigating biological samples. Although MALDI equipment is
expensive, quantitative high throughput can be achieved (about 100 samples per day
can be run by a single laboratory).
SELDI, developed in the early 1990s, is a modification of the MALDI approach
to ionization. All the ionization techniques described above are sensitive in the
picomole-to-femtomole range that is required for application to biological samples,
including carbohydrates, oligonucleotides, small polar molecules, peptides, proteins,
and posttranslationally modified proteins.
Tandem mass analyzers are instruments used for detailed structural analysis of
selected peptides. An example of this kind of analyzer is ABI’s QSTAR® (Applied
Biosystems, Foster City, CA), a hybrid system that joins two quadrupoles in tandem
with a TOF analyzer.10 Particular tryptic peptide fragments can be sequentially
selected and subfragmented in the two quadrupoles, and then the subfragments can
be measured in the analyzer. The resulting pattern is somewhat like the sequence-ladder
pattern obtained in DNA sequencing. Although the analysis of the protein pattern
is more complex than DNA sequencing, software is available that allows the direct
determination of the amino acid sequence of peptides. Based on the peptide sequence
information, it is possible to identify the parent protein in the database.
2.1.3 ISOTOPE-CODED AFFINITY TAGS (ICAT)
Isotope-coded affinity tags (ICAT)11 is a technology that facilitates quantitative
proteomic analysis. This approach uses an isotopically tagged reagent with a
thiol-reactive group to label reduced cysteine residues, and a biotin affinity tag to
isolate the labeled peptides. These
two functional groups are joined by linkers that contain either eight hydrogen atoms
(light reagent) or eight deuterium atoms (heavy reagent). Proteins in a sample (cancer)
are labeled with the isotopically light version of the ICAT reagent, while proteins in
another sample (control) are labeled by the isotopically heavy version of the ICAT
reagent. The two samples are combined, digested to generate peptide fragments, and
the cysteine-containing peptides are enriched by avidin affinity chromatography. This
results in an approximately tenfold enrichment of the labeled peptides. The peptides
may be further purified and analyzed by reverse-phase liquid chromatography, fol-
lowed by MS. The ratio of the isotopic molecular mass peaks that differ by 8 Da
provides a measure of the relative amounts of each protein in the original samples.
This technology is good for detection of differentially expressed proteins between two
pools. Recently the method has been modified to use 16O and 18O water, or to culture
cells in 12C- and 13C-labeled amino acids. Problems with ICAT include its dependency
on isotopically labeled reagents, its low throughput (about 30 samples per day), its
restriction to proteins that contain cysteine, and a decrease in labeling over time
(see also Chapter 16).
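The ratio calculation at the heart of ICAT quantification can be illustrated with a short sketch. It assumes a simplified peak list of (m/z, intensity) pairs, singly charged peptides that each carry exactly one labeled cysteine (so the heavy and light forms differ by 8 Da), and an arbitrary matching tolerance. The function name `pair_icat_peaks` and all values are hypothetical.

```python
HEAVY_LIGHT_SHIFT_DA = 8.0  # eight deuterium versus eight hydrogen atoms in the linker


def pair_icat_peaks(peaks, charge=1, tol=0.05):
    """Pair light and heavy peaks whose m/z values differ by 8 Da / charge.

    peaks: list of (mz, intensity) tuples.
    Returns a list of (light_mz, heavy_mz, light_to_heavy_ratio).
    Assumes one labeled cysteine per peptide; peptides with several
    cysteines would differ by multiples of 8 Da and are not handled here.
    """
    shift = HEAVY_LIGHT_SHIFT_DA / charge
    ordered = sorted(peaks)
    pairs = []
    for light_mz, light_int in ordered:
        for heavy_mz, heavy_int in ordered:
            if abs(heavy_mz - light_mz - shift) <= tol and heavy_int > 0:
                pairs.append((light_mz, heavy_mz, light_int / heavy_int))
    return pairs


# Hypothetical peak list: (m/z, intensity)
peaks = [
    (1000.50, 300.0), (1008.50, 100.0),  # light form three times the heavy form
    (1500.75, 50.0), (1508.75, 200.0),   # light form one quarter of the heavy form
    (1800.00, 80.0),                     # no partner: not quantified
]

pairs = pair_icat_peaks(peaks)
for light_mz, heavy_mz, ratio in pairs:
    print(f"{light_mz:.2f} / {heavy_mz:.2f}: light/heavy = {ratio:.2f}")
```

With the labeling scheme described above (cancer sample light, control heavy), a light/heavy ratio above 1 would suggest higher abundance in the cancer sample. Peaks without a partner, such as peptides lacking cysteine, yield no ratio, which reflects the cysteine limitation noted in the text.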
2.1.4 DIFFERENTIAL 2DE (DIGE)
Differential 2DE (DIGE) allows for a comparison of differentially expressed proteins
in up to three samples. In this technology, succinimidyl esters of the cyanine dyes, Cy2,
Cy3, and Cy5, are used to fluorescently label proteins in up to three different pools of
proteins. After labeling, samples are mixed and run simultaneously on the same 2DE.12
Images of the gel are obtained using three different excitation/emission filters, and the
ratios of different fluorescent signals are used to find protein differences among the
samples. The problem with DIGE is that only 2% of the lysine residues in the proteins
can be fluorescently modified, so that the solubility of the labeled proteins is maintained
during electrophoresis. An additional problem with this technology is that the labeled
proteins migrate with slightly higher mass than the bulk of the unlabeled proteins. DIGE
technology is more sensitive than silver stain formulations optimized for MS. SYPRO
Ruby dye staining detects 40% more protein spots than the Cy dyes.
2.1.5 PROTEIN-BASED MICROARRAYS
DNA microarrays have proven to be a powerful technology for large-scale gene
expression analysis. A related objective is the study of selective interactions between
proteins and other biomolecules, including other proteins, lipids, antibodies, DNA,
and RNA. Therefore, the development of assays that could detect protein-directed
interactions in a rapid, inexpensive way using a small number of samples is highly
desirable. Protein-based microarrays provide such an opportunity. Proteins are sep-
arated using any separation mode, which may consist of ion exchange liquid
chromatography (LC), reverse-phase LC, or carrier ampholyte–based separations,
such as Rotophor. Each fraction obtained after the first dimensional separation can
be further resolved by other methods to yield either purified protein or fractions
containing a limited number of proteins that can directly be arrayed or spotted. A
robotic arrayer is used for spotting provided the proteins remain in liquid form
throughout the separation procedure. These slides are hybridized with primary anti-
bodies against a set of proteins and the resulting immune complex detected. The
resulting image shows only those fractions that react with a specific antibody. The use
of multidimensional techniques to separate thousands of proteins enhances the utility
of protein microarray technology. This approach is sensitive enough to detect specific
proteins in individual fractions that have been spotted directly without further
concentration of the proteins in the individual fractions. However, one of the limitations of
the nitrocellulose-based array chip is the lack of control over orientation in the
immobilization process and optimization of physical interactions between immobi-
lized macromolecules and their corresponding ligands, which can affect sensitivity
of the assay.
Molecular analysis of cells in their native tissue microenvironment can provide
the most desirable situation of in vivo states of the disease. However, the availability
of low numbers of cells of specific populations in the tissue poses a challenge. Laser
capture microdissection (LCM) helps alleviate this matter as this technology is
capable of procuring specific, pure subpopulations of cells directly from the tissue.
Protein profiling of cancer progression within a single patient using selected longi-
tudinal study sets of highly purified normal, premalignant, and carcinoma cells
provides the unique opportunity to not only ascertain altered protein profiles but
also to determine at what point in the cancer progression these alterations in protein
patterns occur. Preliminary results from one such study suggest complex cellular
communication between epithelial and stroma cells. A majority of the proteins in
this study are signal transduction proteins.5 Protein-based microarrays were used in
this study. Advantages and disadvantages of some proteomic-relevant technologies
are listed in Table 2.1.
TABLE 2.1
Comparisons of Various Proteomic Technologies

ELISA: chemiluminescence or fluorescence-based
  Sensitivity: Highest
  Direct identification of markers: N/A
  Use: Detection of single, well-characterized specific analyte in plasma/serum, tissue; gold standard of clinical assays
  Throughput: Moderate
  Advantages/drawbacks: Very robust, well-established use in clinical assays; requires well-characterized antibody for detection; requires extensive validation; not amenable to direct discovery; calibration (standard) dependent; FDA regulated for clinical diagnostics
  Bioinformatic needs: Moderate, standardized

2DE PAGE: 2DE serological proteome analysis (SERPA); 2DGE + serum immunoblotting
  Sensitivity: Low, particularly for less abundant proteins; sensitivity limited by detection method; difficult to resolve hydrophobic proteins
  Direct identification of markers: Yes
  Use: Identification and discovery of biomarkers; not a direct means for early detection in itself
  Throughput: Low
  Advantages/drawbacks: Requires a large number of samples; all identifications require validation and testing before clinical use; reproducible and more quantitative combined with fluorescent dyes; not amenable for high throughput or automation; limited resolution, multiple proteins may be positioned at the same location on the gel
  Bioinformatic needs: Moderate; mostly home grown, some proprietary

Isotope-Coded Affinity Tag (ICAT)™: ICAT/LC-EC-MS/MS; ICAT/LC-MS/MS/MALDI
  Sensitivity: High
  Direct identification of markers: Yes
  Use: Quantification of relative abundance of proteins from two different cell states
  Throughput: Moderate/low
  Advantages/drawbacks: Robust, sensitive, and automated; suffers from the demand for continuous on-the-fly selection of precursor ions for sequencing; coupling with MALDI promises to overcome this limitation and increase efficiency of proteomic comparison of biological cell states; still not highly quantitative and difficult to measure sub-pg/ml concentrations
  Bioinformatic needs: Moderate

Multidimensional Protein Identification Technology (MudPIT)™: 2D LC-MS/MS
  Sensitivity: High
  Direct identification of markers: Yes
  Use: Detection and ID of potential biomarkers
  Throughput: Very low
  Advantages/drawbacks: Significantly higher sensitivity than 2D-PAGE; much larger coverage of the proteome for biomarker discovery; not reliable for low abundance proteins and low-molecular-weight fractions
  Bioinformatic needs: Moderate

Proteomic Pattern Diagnostics: MALDI-TOF; SELDI-TOF; SELDI-TOF/QStar™
  Sensitivity: Medium sensitivity with diminishing yield at higher molecular weights; improved with fitting of high-resolution QStar mass spectrometer to SELDI
  Direct identification of markers: No; possible with additional high-resolution MS
  Use: Diagnostic pattern analysis in body fluids and tissues (LCM) (a); potential biomarker identification
  Throughput: High
  Advantages/drawbacks: SELDI protein identification not necessary for biomarker pattern analysis; reproducibility problematic, improved with QStar addition; revolutionary tool; 1-2 μl of material needed; upfront fractionation of protein mixtures and downstream purification methods necessary to obtain absolute protein quantification; MALDI crystallization of protein can lack reproducibility and be matrix dependent; high MW proteins require MS/MS
  Bioinformatic needs: Moderate to extensive; home grown, not standardized

Protein Microarrays: Antibody arrays: chemiluminescence multi-ELISA platforms; glass fluorescence based (Cy3/Cy5); tissue arrays
  Sensitivity: Medium to highest (depending on detection system)
  Direct identification of markers: Possible when coupled to MS technologies; or probable, if antibodies have been highly defined by epitope mapping and neutralization
  Use: Multiparametric analysis of many analytes simultaneously
  Throughput: High
  Advantages/drawbacks: Format is flexible; can be used to assay for multiple analytes in a single specimen or a single analyte in a number of specimens; requires prior knowledge of analyte being measured; limited by antibody sensitivity and specificity; requires extensive crossvalidation for antibody crossreactivity; requires use of an amplified tag detection system; requires more sample to measure low abundant proteins; needs to be measured undiluted
  Bioinformatic needs: Extensive, home grown; not standardized

(a) LCM: Laser Capture Microdissection
2.2 CURRENT BIOINFORMATICS APPROACHES
IN PROTEOMICS
Most biological databases have been generated by the biological community,
whereas most computational databases have been generated by the mathematical
and computational community. As a result, biological databases are not easily
amenable to automated data mining methods and are not always machine-readable,
and computational tools are nonintuitive to biologists. A list of database search tools
is presented in Table 2.2, and some frequently used databases to study protein-protein
interaction are shown in Table 2.3. A number of bioinformatic approaches have been
discussed elsewhere in the book (see Chapters 10 and 14); therefore, we have
described only the basic principles of some of these approaches.
TABLE 2.2
Database Search Tools for 2DE and MS
Name of the Software               Web Site
Delta2D (a)                        www.decodon.com/Solutions/Delta2D.html
GD Impressionist (a)               www.genedata.com/productsgell/Gellab.html
Investigator HT PC Analyzer (a)    www.genomicsolutions.com/proteomics/2dgelanal.html
Phortix 2D (a)                     www.phortix.com/products/2d_products.htm
Z3 2D-Gel Analysis System (a)      www.2dgels.com
Mascot                             www.matrixscience.com
MassSearch                         www.Cbrg.inf.ethz/Server/MassSearch.html
MS-FIT                             www.Prospector.ucsf.edu
PeptIdent                          www.expasy.ch/tools/peptident.html
(a) Software for 2DE.

TABLE 2.3
Database for Protein Interaction
Name of the Database    Web Site
CuraGen                 Portal.curagen.com
DIP                     Dipdoe-mbi.ucla.edu
Interact                Bioinf.man.ac.uk/interactso.htm
MIPS                    www.mips.biochem.mpg.de
ProNet                  Pronet.doublewist.com

An important goal of bioinformatics is to develop robust, sensitive, and specific
methodologies and tools for the simultaneous analysis of all the proteins expressed
by the human genome, referred to as the human proteome, and to establish
“biosignature” profiles that discriminate between disease states. Artifacts can be
introduced into spectra from physical, electrical, or chemical sources. Each spectrum in
MALDI or SELDI-TOF could be composed of three components: (1) true peak signal,
(2) exponential baseline, and (3) white noise.
Low-level processing is usually used to disentangle these components, remove
systematic artifacts, and isolate the true protein signal.
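The three-component view of a TOF spectrum can be made concrete with a small simulation. The sketch below is illustrative only: all amplitudes, widths, and window sizes are arbitrary assumptions. The moving-average smoother and rolling-minimum baseline estimate are simple stand-ins chosen for brevity; they are not the low-level processing methods used by any particular instrument or by the studies cited here.

```python
import math
import random


def simulate_spectrum(n=400, seed=0):
    """Synthetic spectrum = true peak signal + exponential baseline + white noise."""
    rng = random.Random(seed)
    spectrum = []
    for i in range(n):
        peak = 50.0 * math.exp(-((i - 200) ** 2) / (2 * 3.0 ** 2))  # true peak signal
        baseline = 30.0 * math.exp(-i / 100.0)                      # exponential baseline
        noise = rng.gauss(0.0, 0.5)                                 # white noise
        spectrum.append(peak + baseline + noise)
    return spectrum


def moving_average(y, half_window=2):
    """Reduce white noise by averaging each point with its neighbors."""
    n = len(y)
    out = []
    for i in range(n):
        window = y[max(0, i - half_window): min(n, i + half_window + 1)]
        out.append(sum(window) / len(window))
    return out


def rolling_min_baseline(y, half_window=25):
    """Crude baseline estimate: the local minimum in a window wider than any peak."""
    n = len(y)
    return [min(y[max(0, i - half_window): min(n, i + half_window + 1)])
            for i in range(n)]


spectrum = simulate_spectrum()
smoothed = moving_average(spectrum)
baseline = rolling_min_baseline(smoothed)
corrected = [s - b for s, b in zip(smoothed, baseline)]

# Index of the largest baseline-corrected intensity.
peak_index = max(range(len(corrected)), key=corrected.__getitem__)
print(peak_index, round(corrected[peak_index], 1))
```

The baseline window must be wider than the broadest peak of interest; otherwise, the rolling minimum follows the peak itself and subtracts real signal.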
A key for successful biomarker discovery is the bioinformatic approach that
enables thorough, yet robust, analysis of a massive database generated by modern
biotechnologies, such as microarrays for genetic markers and time-of-flight mass
spectrometry for proteomic spectra.
Prior to a statistical analysis of marker discovery, TOF-MS data require a
pre-analysis processing: this enables extraction of relevant information from the
data. This can be thought of as a way to standardize and summarize the data for a
subsequent statistical analysis. For example, based on some salient properties of
the data, pre-analytical processing first identifies all protein signals that are distin-
guishable from noise, then calibrates mass (per charge) values of proteins for poten-
tial measurement errors, and finally aggregates, as a single signal, multiple protein
signals that are within the range of measurement errors. The above discussion is
specifically relevant to serum-based analysis prone to all types of artifacts and errors.
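The three pre-analysis steps just described (peak detection, mass calibration, and aggregation of signals within measurement error) might be sketched as follows. This is a simplified illustration under stated assumptions: the signal-to-noise threshold of 3, the single-point multiplicative calibration, the 0.3% relative mass tolerance, and all function names and values are hypothetical choices, not parameters from the chapter.

```python
def detect_peaks(mz, intensity, noise_level, snr=3.0):
    """Step 1: keep local maxima whose intensity exceeds snr * noise_level."""
    peaks = []
    for i in range(1, len(mz) - 1):
        is_local_max = intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
        if is_local_max and intensity[i] > snr * noise_level:
            peaks.append(mz[i])
    return peaks


def calibrate(peaks, observed_ref, true_ref):
    """Step 2: rescale m/z so that a known calibrant lands on its true mass."""
    factor = true_ref / observed_ref
    return [p * factor for p in peaks]


def aggregate(peaks, rel_tol=0.003):
    """Step 3: merge peaks lying within rel_tol (relative error) of a cluster mean.

    Returns a list of (mean_mz, count) tuples, one per aggregated signal.
    """
    clusters = []
    for p in sorted(peaks):
        if clusters and abs(p - clusters[-1][0]) <= rel_tol * clusters[-1][0]:
            mean, count = clusters[-1]
            clusters[-1] = ((mean * count + p) / (count + 1), count + 1)
        else:
            clusters.append((p, 1))
    return clusters


# Hypothetical data for each step.
found = detect_peaks([1000, 1001, 1002, 1003, 1004], [1, 2, 10, 2, 1], noise_level=1.0)
shifted = calibrate([1002.0], observed_ref=2004.0, true_ref=2000.0)
clusters = aggregate([5000.0, 5005.0, 5012.0, 8000.0, 8100.0])

print(found)
print(shifted)
print(clusters)
```

In this toy input, the three signals near 5000 fall within the tolerance and are reported as one aggregated signal, whereas 8000 and 8100 differ by more than 0.3% and remain separate.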
Serum proteomic pattern analysis is an emerging technology that is increasingly
employed for the early detection of disease, the measurement of therapeutic toxicity
and disease responses, and the discovery of new drug targets for therapy. Various
bioinformatics algorithms have been used for protein pattern discovery, but all studies
have used the SELDI ionization technique along with low-resolution TOF-MS anal-
ysis. Earlier studies demonstrated proof-of-principle of biomarker development for
prostate cancer using SELDI-TOF, but some of the studies relied on the isolation
of actual malignant cells from pathology specimens.13–16 Body-fluid-based diagnos-
tics, using lavage, effluent, or effusion material, offers a less invasive approach to
biomarker discovery than biopsy or surgical-specimen-dependent approaches.17
Additionally, serum-based approaches may offer a superior repository of biomarkers
because serum is easy and inexpensive to obtain.18–21
Several preprocessing and postprocessing steps are needed in the protein chip
data analysis. For data analysis we must process the mass spectra in such a way that
they are conducive to downstream multidimensional methods (clustering and
classification, for example). The binding to protein chip spots used for general profiling is
specific only to a class of proteins that share a physical or chemical property that
creates an affinity for a given protein chip array surface. As a result, mass spectra
can contain hundreds of protein expression levels encoded in their peaks.
Bioinformatics tools have promise in aiding early cancer detection and risk
assessment. Some of the useful areas in bioinformatics tools are pattern clustering,
classification, array analysis, decision support, and data mining. A brief application
of these approaches is described below.
2.2.1 CLUSTERING
Two major approaches to clustering methods are bottom-up and top-down. An
example of the bottom-up approach includes hierarchical clustering where each gene
has its own profile.22 The basis of the clustering is that closest pairs are clustered
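The bottom-up idea can be sketched in a few lines: every profile starts as its own cluster, and the two closest clusters are merged repeatedly. This is a minimal illustration, not the method of the cited reference. Euclidean distance, single linkage (distance between the nearest members), and the stopping rule `n_clusters` are assumptions chosen for the example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles, n_clusters):
    """Bottom-up clustering: merge the closest pair until n_clusters remain.

    Returns a list of clusters, each a list of indices into `profiles`.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Each profile begins as its own cluster.
    clusters = [[i] for i in range(len(profiles))]

    def linkage(pair):
        # Single linkage: distance between the nearest members.
        first, second = pair
        return min(
            euclidean(profiles[i], profiles[j])
            for i in clusters[first]
            for j in clusters[second]
        )

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


groups = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
```

With the four toy profiles above, the two points near the origin end up in one cluster and the two points near (10, 10) in the other. The search over all cluster pairs is quadratic at each step, which is acceptable for a sketch but not for large data sets.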
Informatics In Proteomics 1st Edition Sudhir Srivastava

  • 8. Published in 2005 by CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2005 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group No claim to original U.S. Government works Printed in the United States of America on acid-free paper 10 9 8 7 6 5 4 3 2 1 International Standard Book Number-10: 1-57444-480-8 (Hardcover) International Standard Book Number-13: 978-1-57444-480-3 (Hardcover) This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use. No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. 
Library of Congress Cataloging-in-Publication Data Catalog record is available from the Library of Congress Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com Taylor & Francis Group is the Academic Division of T&F Informa plc.
  • 9. Dedication Dedicated to my daughters, Aditi and Jigisha, and to my lovely wife, Dr. Rashmi Gopal Srivastava
  • 11. Foreword A remarkable development in the post-genome era is the re-emergence of proteomics as a new discipline with roots in old-fashioned chemistry and biochemistry, but with new branches in genomics and informatics. The appeal of proteomics stems from the fact that proteins are the most functional component encoded for in the genome and thus represent a direct path to functionality. Proteomics emphasizes the global profiling of cells, tissues, and biological fluids, but there is a long road from applying various proteomics tools to the discovery, for example, of proteins that have clinical utility as disease markers or as therapeutic targets. Given the complexity of various cell and tissue proteomes and the challenges of identifying proteins of particular interest, informatics is central to all aspects of proteomics. However, protein informatics is still in its early stages, as is the entire field of proteomics. Although collections of protein sequences have preceded genomic sequence databases by more than two decades, there is a substantial need for protein databases as basic protein information resources. There is a need for implementing algorithms, statistical methods, and computer applications that facilitate pattern recognition and biomarker discovery by integrating data from multiple sources. This book, which is dedicated to protein informatics, is intended to serve as a valuable resource for people interested in protein analysis, particularly in the context of biomedical studies. An expert group of authors has been assembled with proteomics informatics–related expertise that is highly valuable in guiding proteomic studies, particularly since currently the analysis of proteomics data is rather informal and largely dependent on the idiosyncrasies of the analyst. Several chapters address the need for infrastructures for proteomic research and cover the status of public protein databases and interfaces. 
The creation of a national virtual knowledge environment and information management systems for proteomic research is timely and clearly addressed. Issues surrounding data standardization and integration are very well presented. They are captured in a chapter that describes ongoing initiatives within the Human Proteome Organization (HUPO). A major strength of the book is in the detailed review and discussion of applications of statistical and bioinformatic tools to data analysis and data mining. Much concern at the present time surrounds the analysis of proteomics data by mass spectrometry for a variety of applications. The book shines in its presentation in several chapters of various approaches and issues surrounding mass spectrometry data analysis. Although the field of proteomics and related informatics is highly evolving, this book captures not only the current state-of-the-art but also presents a vision for where the field is heading. As a result, the contributions of the book and its component chapters will have long-lasting value. Sam Hanash, M.D. Fred Hutchinson Cancer Center Seattle, Washington
  • 13. Preface
The biological dictates of an organism are largely governed through the structure and function of the products of its genes, the most functional of which is the proteome. Originally defined as the analysis of the entire protein complement of a cell or tissue, proteomics now encompasses the study of expressed proteins including the identification and elucidation of their structure–function relationships under normal and disease conditions. In combination with genomics, proteomics can provide a holistic understanding of the biology underlying disease processes. Information at the level of the proteome is critical for understanding the function of specific cell types and their roles in health and disease. Bioinformatic tools are needed at all levels of proteomic analysis. The main databases serving as the targets for mass spectrometry data searches are the expressed sequence tag (EST) and the protein sequence databases, which contain protein sequence information translated from DNA sequence data. It is thought that virtually any protein that can be detected on a 2DE gel can be identified through the EST database, which contains over 2 million cDNA sequences. However, ESTs cover only a partial sequence of the protein. This poses a formidable challenge for the proteomic community and necessitates databases with extensive coverage and search algorithms for identifying proteins/peptides with accuracy. The handling and analysis of data generated by proteomic investigations represent an emerging and challenging field. New techniques and collaborations between computer scientists, biostatisticians, and biologists are called for.
There is a need to develop and integrate a variety of different types of databases; to develop tools for translating raw primary data into forms suitable for public dissemination and formal data analysis; to obtain and develop user interfaces to store, retrieve, and visualize data from databases; and to develop efficient and valid methods of data analysis. The sheer volume of data to be collected and processed will challenge the usual approaches. Analyzing data of this dimension is a fairly new endeavor for statisticians, for which there is not an extensive technical statistical literature. There are several levels of complexity in the investigation of proteomic data, from the day-to-day interpretation of protein patterns generated by individual measurement systems to the query and manipulation of data from multiple experiments or information sources. Interaction with data warehouses represents another level of data interrogation. Users typically retrieve data and formulate queries to test hypotheses and generate conclusions. Formulating queries can be a difficult task requiring extensive syntactic and semantic knowledge. Syntactic knowledge is needed to ensure that a query is well formed and references existing relations and attributes. Semantic knowledge is needed to ensure that a query satisfies user intent. Because a user often has an incomplete understanding of the contents and structure of the data warehouse, it is necessary to provide automated techniques for query formulation that significantly reduce the amount of knowledge required by data warehouse users.
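The distinction between syntactic and semantic knowledge in query formulation can be illustrated with a small sketch. The example below is not taken from this book: the table `protein`, its columns, and the three rows are hypothetical, and an in-memory SQLite database stands in for a data warehouse. It shows one query that fails because it references an attribute that does not exist, one that is well formed but does not match the user's intent, and one that satisfies both requirements.

```python
import sqlite3

# Hypothetical in-memory "warehouse" with a single illustrative relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (accession TEXT, name TEXT, mass_da REAL)")
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [
        ("A1", "protein alpha", 12000.0),
        ("B2", "protein beta", 34000.0),
        ("C3", "protein gamma", 56000.0),
    ],
)


def run(query):
    """Return (rows, error); error is None when SQLite accepts the query."""
    try:
        return conn.execute(query).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


# Syntactic problem: 'weight' is not an attribute of the relation.
rows_bad, err_bad = run("SELECT name FROM protein WHERE weight > 30000")

# Well formed, but semantically wrong if the user wanted proteins
# heavier than 30 kDa: the comparison is reversed.
rows_wrong, err_wrong = run("SELECT name FROM protein WHERE mass_da < 30000")

# Well formed and consistent with the stated intent.
rows_ok, err_ok = run(
    "SELECT name FROM protein WHERE mass_da > 30000 ORDER BY name"
)

print(err_bad)
print(rows_wrong)
print(rows_ok)
```

Only the first problem is caught by the database itself; the second query runs without complaint, which is why automated support for query formulation has to address user intent as well as syntax.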
  • 14. This book intends to provide a comprehensive view of informatic approaches to data storage, curation, retrieval, and mining as well as application-specific bioinformatic tools in disease detection, diagnosis, and treatment. Rapid technological advances are yielding abundant data in many formats that, because of their vast quantity and complexity, are becoming increasingly difficult to analyze. A strategic objective is to streamline the transfer of knowledge and technology to allow for data from disparate sources to be analyzed, providing new inferences about the complex role of proteomics in disease processes. Data mining, the process of knowledge extraction from data and the exploration of available data for patterns and relationships, is increasingly needed for today’s high-throughput technologies. Data architectures that support the integration of biological data files with epidemiologic profiles of human clinical responses need to be developed. The ability to develop and analyze metadata will stimulate new research theories and streamline the transfer of basic knowledge into clinical applications. It is my belief that this book will serve as a unique reference for researchers, biologists, technologists, clinicians, and other health professionals as it provides information on the informatics needs of proteomic research on molecular targets relevant to disease detection, diagnosis, and treatment. The nineteen chapters in this volume are contributed by eminent researchers in the field and critically address various aspects of bioinformatics and proteomic research. The first two chapters are introductory: they discuss the biological rationale for proteomic research and provide a brief overview of technologies that allow for rapid analysis of the proteome.
The next five chapters describe the infrastructures that provide the foundations for proteomic research: these include the creation of a national, virtual knowledge environment and information management systems for proteomic research; the availability of public protein databases and interfaces; and the need for collaboration and interaction between academia, industry, and government agencies. Chapter 6 illustrates the power of proteomic knowledge in furthering hypothesis-driven cancer biomarker research through data extraction and curation. Chapter 7 and Chapter 8 provide the conceptual framework for data standardization and integration and give an example of ongoing collaborative research within the Human Proteome Organization. Chapter 9 identifies genomic and proteomic informatic tools used in deciphering functional pathways. The remaining ten chapters describe applications of statistical and bioinformatic tools in data analysis, data presentation, and data mining. Chapter 10 provides an overview of a variety of proteomic data mining tools, and subsequent chapters provide specific examples of data mining approaches and their applications. Chapter 11 describes methods for quantitative analysis of a large number of proteins in a relatively large number of lung cancer samples using two-dimensional gel electrophoresis. Chapter 12 discusses the analysis of mass spectrometric data by nonparametric inference for high-dimensional comparisons involving two or more groups, based on a few samples and very few replicates from within each group. Chapter 13 discusses bioinformatic tools for the identification of proteins by searching a collection of sequences with mass spectrometric data and describes several critical steps that are necessary for successful protein identification: (a) the masses of peaks in the mass spectrum corresponding to the monoisotopic peptide masses have to be assigned; (b) a collection of sequences has to be
  • 15. searched using a sensitive and selective algorithm; (c) the significance of the results has to be tested; and (d) the function of the identified proteins has to be assigned. In Chapter 14, two types of approaches are described: one based on statistical theories and another on machine learning and computational data mining techniques. In Chapter 15, the author discusses the problems with the currently available disease classifier algorithms and puts forward approaches for scaling the data set, searching for outliers, choosing relevant features, building classification models, and then determining the characteristics of the models. Chapter 16 discusses currently available computer tools that support data collection, analysis, and validation in a high-throughput LC-MS/MS–based proteome research environment and subsequent protein identification and quantification with minimal false-positive error rates. Chapter 17 and Chapter 18 describe experimental designs, statistical methodologies, and computational tools for the analysis of spectral patterns in the diagnosis of ovarian and prostate cancer. Finally, Chapter 19 illustrates how quantitative analysis of fluorescence microscope images augments mainstream proteomics by providing information about the abundance, localization, movement, and interactions of proteins inside cells. This book has brought together a mix of scientific disciplines and specializations, and I encourage readers to expand their knowledge by reading how the combination of proteomics and bioinformatics is used to uncover interesting biology and discover clinically significant biomarkers. In a field with rapidly changing technologies, it is difficult to ever feel that one has knowledge that is current and definitive. Many chapters in this book are conceptual in nature but have been included because proteomics is an evolving science that offers much hope to researchers and patients alike.
Last, but not least, I would like to acknowledge the authors for their contributions and patience. When I accepted the offer to edit this book, I was not sure we were ready for a book on proteomics as the field is continuously evolving, but the excellent contributions and enthusiasm of my colleagues have allayed my fears. The chapters in the book describe the current state of the art in informatics and reflect the interests, experience, and creativity of the authors. Many chapters are intimately related and therefore there may be some overlap in the material presented in each individual chapter. I would also like to acknowledge Dr. Asad Umar for his help in designing the cover for this book. Finally, I would like to express my sincere gratitude to Dr. Sam Hanash, the past president of HUPO, for his encouragement and support.
Sudhir Srivastava, Ph.D., MPH, MS
Bethesda, Maryland
  • 17. Contributors Bao-Ling Adam Department of Microbiology and Molecular Cell Biology Eastern Virginia Medical School Norfolk, Virginia, USA Marcin Adamski Bioinformatics Program Department of Human Genetics School of Medicine University of Michigan Ann Arbor, Michigan, USA Ruedi Aebersold Institute for Systems Biology Seattle, Washington, USA R.C. Beavis Beavis Informatics Winnipeg, Manitoba, Canada David G. Beer General Thoracic Surgery University of Michigan Ann Arbor, Michigan, USA Guoan Chen General Thoracic Surgery University of Michigan Ann Arbor, Michigan, USA Chad Creighton Pathology Department University of Michigan Ann Arbor, Michigan, USA Daniel Crichton Jet Propulsion Laboratory California Institute of Technology Pasadena, California, USA Cim Edelstein Division of Public Health Services Fred Hutchinson Cancer Research Center Seattle, Washington, USA Jimmy K. Eng Division of Public Health Services Fred Hutchinson Cancer Research Center Seattle, Washington, USA J. Eriksson Department of Chemistry Swedish University of Agricultural Sciences Uppsala, Sweden Ziding Feng Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA D. Fenyö Amersham Biosciences AB Uppsala, Sweden The Rockefeller University New York, New York, USA R. Gangal SciNova Informatics Pune, Maharashtra, India
  • 18. Gary L. Gilliland Biotechnology Division National Institute of Standards and Technology Gaithersburg, Maryland, USA Samir M. Hanash Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Ben A. Hitt Correlogic Systems, Inc. Bethesda, Maryland, USA J. Steven Hughes Jet Propulsion Laboratory California Institute of Technology Pasadena, California, USA Donald Johnsey National Cancer Institute National Institutes of Health Bethesda, Maryland, USA Andrew Keller Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Sean Kelly Jet Propulsion Laboratory California Institute of Technology Pasadena, California, USA Heather Kincaid Fred Hutchinson Cancer Research Center Seattle, Washington, USA Jeanne Kowalski Division of Oncology Biostatistics Johns Hopkins University Baltimore, Maryland, USA Peter A. Lemkin Laboratory of Experimental and Computational Biology Center for Cancer Research National Cancer Institute Frederick, Maryland, USA Xiao-jun Li Institute for Systems Biology Seattle, Washington, USA Chenwei Lin Department of Computational Biology Fred Hutchinson Cancer Research Center Seattle, Washington, USA Lance Liotta FDA-NCI Clinical Proteomics Program Laboratory of Pathology National Cancer Institute Bethesda, Maryland, USA Stephen Lockett NCI–Frederick/SAIC–Frederick Frederick, Maryland, USA Brian T. Luke SAIC-Frederick Advanced Biomedical Computing Center NCI Frederick Frederick, Maryland, USA Dale McLerran Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Djamel Medjahed Laboratory of Molecular Technology SAIC-Frederick Inc. Frederick, Maryland, USA
  • 19. Alexey I. Nesvizhskii Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Jane Meejung Chang Oh Wayne State University Detroit, Michigan, USA Gilbert S. Omenn Departments of Internal Medicine and Human Genetics Medical School and School of Public Health University of Michigan Ann Arbor, Michigan, USA Emanuel Petricoin FDA-NCI Clinical Proteomics Program Office of Cell Therapy CBER/Food and Drug Administration Bethesda, Maryland, USA Veerasamy Ravichandran Biotechnology Division National Institute of Standards and Technology Gaithersburg, Maryland, USA John Semmes Department of Microbiology and Molecular Cell Biology Eastern Virginia Medical School Norfolk, Virginia, USA Ram D. Sriram Manufacturing Systems Integration Division National Institute of Standards and Technology Gaithersburg, Maryland, USA Sudhir Srivastava Cancer Biomarkers Research Group Division of Cancer Prevention National Cancer Institute Bethesda, Maryland, USA David J. States Bioinformatics Program Department of Human Genetics School of Medicine University of Michigan Ann Arbor, Michigan, USA Mark Thornquist Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Mukesh Verma Cancer Biomarkers Research Group Division of Cancer Prevention National Cancer Institute Bethesda, Maryland, USA Paul D. Wagner Cancer Biomarkers Research Group Division of Cancer Prevention National Cancer Institute Bethesda, Maryland, USA Denise B. Warzel Center for Bioinformatics National Cancer Institute Rockville, Maryland, USA Nicole White Department of Pathology Johns Hopkins University Baltimore, Maryland, USA Marcy Winget Department of Population Health and Information Alberta Cancer Board Edmonton, Alberta, Canada
  • 20. Yutaka Yasui Division of Public Health Sciences Fred Hutchinson Cancer Research Center Seattle, Washington, USA Mei-Fen Yeh Division of Oncology Biostatistics Johns Hopkins University Baltimore, Maryland, USA Zhen Zhang Center for Biomarker Discovery Department of Pathology Johns Hopkins University Baltimore, Maryland, USA
  • 21. Contents
Chapter 1 The Promise of Proteomics: Biology, Applications, and Challenges
Paul D. Wagner and Sudhir Srivastava
Chapter 2 Proteomics Technologies and Bioinformatics
Sudhir Srivastava and Mukesh Verma
Chapter 3 Creating a National Virtual Knowledge Environment for Proteomics and Information Management
Daniel Crichton, Heather Kincaid, Sean Kelly, Sudhir Srivastava, J. Steven Hughes, and Donald Johnsey
Chapter 4 Public Protein Databases and Interfaces
Jane Meejung Chang Oh
Chapter 5 Proteomics Knowledge Databases: Facilitating Collaboration and Interaction between Academia, Industry, and Federal Agencies
Denise B. Warzel, Marcy Winget, Cim Edelstein, Chenwei Lin, and Mark Thornquist
Chapter 6 Proteome Knowledge Bases in the Context of Cancer
Djamel Medjahed and Peter A. Lemkin
Chapter 7 Data Standards in Proteomics: Promises and Challenges
Veerasamy Ravichandran, Ram D. Sriram, Gary L. Gilliland, and Sudhir Srivastava
Chapter 8 Data Standardization and Integration in Collaborative Proteomics Studies
Marcin Adamski, David J. States, and Gilbert S. Omenn
  • 22. Chapter 9 Informatics Tools for Functional Pathway Analysis Using Genomics and Proteomics
Chad Creighton and Samir M. Hanash
Chapter 10 Data Mining in Proteomics
R. Gangal
Chapter 11 Protein Expression Analysis
Guoan Chen and David G. Beer
Chapter 12 Nonparametric, Distance-Based, Supervised Protein Array Analysis
Mei-Fen Yeh, Jeanne Kowalski, Nicole White, and Zhen Zhang
Chapter 13 Protein Identification by Searching Collections of Sequences with Mass Spectrometric Data
D. Fenyö, J. Eriksson, and R.C. Beavis
Chapter 14 Bioinformatics Tools for Differential Analysis of Proteomic Expression Profiling Data from Clinical Samples
Zhen Zhang
Chapter 15 Sample Characterization Using Large Data Sets
Brian T. Luke
Chapter 16 Computational Tools for Tandem Mass Spectrometry–Based High-Throughput Quantitative Proteomics
Jimmy K. Eng, Andrew Keller, Xiao-jun Li, Alexey I. Nesvizhskii, and Ruedi Aebersold
Chapter 17 Pattern Recognition Algorithms and Disease Biomarkers
Ben A. Hitt, Emanuel Petricoin, and Lance Liotta
  • 23. Chapter 18 Statistical Design and Analytical Strategies for Discovery of Disease-Specific Protein Patterns
Ziding Feng, Yutaka Yasui, Dale McLerran, Bao-Ling Adam, and John Semmes
Chapter 19 Image Analysis in Proteomics
Stephen Lockett
Index
  • 25. 1 The Promise of Proteomics: Biology, Applications, and Challenges
Paul D. Wagner and Sudhir Srivastava
CONTENTS
1.1 Introduction
1.2 Why Is Proteomics Useful?
1.3 Gene–Environment Interactions
1.4 Organelle-Based Proteomics
1.5 Cancer Detection
1.6 Why Proteomics Has Not Succeeded in the Past: Cancer as an Example
1.7 How Have Proteomic Approaches Changed over the Years?
1.8 Future of Proteomics in Drug Discovery, Screening, Early Detection, and Prevention
References
1.1 INTRODUCTION
In the 19th century, the light microscope opened a new frontier in the study of diseases, allowing scientists to look deep into the cell. The science of pathology (the branch of medicine that deals with the essential nature of disease) expanded to include the study of structural and functional changes in cells, and diseases could be attributed to recognizable changes in the cells of the body. At the start of the 21st century, the molecular-based methods of genomics and proteomics are bringing about a new revolution in medicine. Diseases will be described in terms of patterns of abnormal genetic and protein expression in cells and how these cellular alterations affect the molecular composition of the surrounding environment.
This new pathology will have a profound impact on the practice of medicine, enabling physicians to determine who is at risk for a specific disease, to recognize diseases before they have invaded tissues, to intervene with agents or treatments that may prevent or
  • 26. delay disease progression, to guide the choice of therapies, and to assess how well a treatment is working. Cancer is one of the many diseases whose treatment will be affected by these molecular approaches. Currently available methods can only detect cancers that have achieved a certain size threshold, and in many cases, the tumors, however small, have already invaded blood vessels or spread to other parts of the body. Molecular markers have the potential to find tumors in their earliest stages of development, even before the cell’s physical appearance has changed. Molecular-based detection methods will also change our definition of cancer. For example, precancerous changes in the uterine cervix are called such because of specific architectural and cytological changes. In the future, we may be able to define the expression patterns of specific cellular proteins induced by human papillomavirus that indicate the cells are beginning to progress to cancer. We may also be able to find molecular changes that affect all the tissues of an organ, putting the organ at risk for cancer. In addition to improving the physician’s ability to detect cancers early, molecular technologies will help doctors determine which neoplastic lesions are most likely to progress and which are not destined to do so, a dilemma that confronts urologists in the treatment of prostate cancer. Accurate discrimination will help eliminate overtreatment of harmless lesions. By revealing the metastatic potential of tumors and their corresponding preneoplastic lesions, molecular-based methods will fill a knowledge gap impossible to close with traditional histopathology. If these advances are made and new screening tests are developed, then one day we may be able to identify and eliminate the invasive forms of most malignant epithelial tumors.
1.2 WHY IS PROTEOMICS USEFUL?
Mammalian systems are much more complex than can be deciphered by their genes alone, and the biological dictates of an organism are largely governed through the function of proteins. In combination with genomics, proteomics can provide a holistic understanding of the biology of cells, organisms, and disease processes. The term “proteome” came into use in the mid-1990s and is defined as the protein complement of the genome. Although proteomics was originally used to describe methods for large-scale, high-throughput protein separation and identification,1 today proteomics encompasses almost any method used to characterize proteins and determine their functions. Information at the level of the proteome is critical for understanding the function of specific cell types and their roles in health and disease. This is because proteins are often expressed at levels and forms that cannot be predicted from mRNA analysis. Proteomics also provides an avenue to understand the interaction between a cell’s functional pathways and its environmental milieu, independent of any changes at the RNA level. It is now generally recognized that expression analysis directly at the protein level is necessary to unravel the critical changes that occur as part of disease pathogenesis. Currently there is much interest in the use of molecular markers or biomarkers for disease diagnosis and prognosis. Biomarkers are cellular, biochemical, and molecular alterations by which normal, abnormal, or simply biologic processes can be recognized or monitored. These alterations should be able to objectively measure
  • 27. and evaluate normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. Proteomics is valuable in the discovery of biomarkers as the proteome reflects both the intrinsic genetic program of the cell and the impact of its immediate environment. Protein expression and function are subject to modulation through transcription as well as through translational and posttranslational events. More than one messenger RNA can result from one gene through differential splicing, and proteins can undergo more than 200 types of posttranslational modifications that can affect function, protein–protein and protein–ligand interactions, stability, targeting, or half-life.2 During the transformation of a normal cell into a neoplastic cell, distinct changes occur at the protein level that range from altered expression, differential modification, and changes in specific activity to aberrant localization, all of which affect cellular function. Identifying and understanding these changes is the underlying theme in cancer proteomics. The deliverables include identification of biomarkers that have utility both for early detection and for determining therapy. While proteomics has traditionally dealt with quantitative analysis of protein expression, more recently proteomics has been viewed to encompass structural analyses of proteins.3 Quantitative proteomics strives to investigate the changes in protein expression in different physiological states such as in healthy and diseased tissue or at different stages of the disease. This enables the identification of state- and stage-specific proteins. Structural proteomics attempts to uncover the structure of proteins and to unravel and map protein–protein interactions.
Proteomics provides a window to pathophysiological states of cells and their microenvironments and reflects changes that occur as disease-causing agents interact with the host environment. Some examples of proteomics are described below.
1.3 GENE–ENVIRONMENT INTERACTIONS
Infectious diseases result from interactions between the host and pathogen, and understanding these diseases requires understanding not only alterations in gene and protein expressions within the infected cells but also alterations in the surrounding cells and tissues. Although genome and transcriptome analyses can provide a wealth of information on global alterations in gene expression that occur during infections, proteomic approaches allow the monitoring of changes in protein levels and modifications that play important roles in pathogen–host interactions. During acute stages of infection, pathogen-coded proteins play a significant role, whereas in chronic infection, host proteins play the dominating role. Viruses, such as hepatitis B (HBV), hepatitis C (HCV), and human papillomavirus (HPV), are suitable for proteomic analysis because they express only eight to ten major genes.4,5 Analyzing a smaller number of genes is easier than analyzing the proteome of an organism with thousands of genes.6–8 For example, herpes simplex virus type 1 (HSV-1) infection induces severe alterations of the translational apparatus, including phosphorylation of ribosomal proteins and the association of several nonribosomal proteins with the ribosomes.9–12 Whether ribosomes themselves could contribute to the HSV-1–induced translational control of host and viral gene expression has been investigated. As a prerequisite to test this hypothesis, the investigators
  • 28. undertook the identification of nonribosomal proteins associated with the ribosomes during the course of HSV-1 infection. Two HSV-1 proteins, VP19C and VP26, that are associated with ribosomes with different kinetics were identified. Another nonribosomal protein identified was the poly(A)-binding protein 1 (PAB1P). Newly synthesized PAB1P continued to associate with ribosomes throughout the course of infection. This finding attests to the need for proteomic information for structural and functional characterization. Approximately 15% of human cancers (about 1.5 million cases per year, worldwide) are linked to viral, bacterial, or other pathogenic infections.13 For cancer development, infectious agents interact with host genes and sets of infectious agent-specific or host-specific genes are expressed. Oncogenic infections increase the risk of cancer through expression of their genes in the infected cells. Occasionally, these gene products have paracrine effects, leading to neoplasia in neighboring cells. More typically, it is the infected cells that become neoplastic. These viral, bacterial, and parasitic genes and their products are obvious candidates for pharmacologic interruptions or immunologic mimicry, promising approaches for drugs and vaccines. By understanding the pathways involved in the infectious agent–host interaction leading to cancer, it would be possible to identify targets for intervention.
1.4 ORGANELLE-BASED PROTEOMICS
Eukaryotic cells contain a number of organelles, including nucleoli, mitochondria, smooth and rough endoplasmic reticula, Golgi apparatus, peroxisomes, and lysosomes. The mitochondria are among the largest organelles in the cell. Mitochondrial dysfunction has been frequently reported in cancer, neurodegenerative diseases, diabetes, and aging syndromes.14–16 The mitochondrial genome (16.5 kb) codes only for a small fraction (estimated to be 1%) of the proteins housed within this organelle.
The other proteins are encoded by the nuclear DNA (nDNA) and transported into the mitochondria. Thus, a proteomic approach is needed to fully understand the nature and extent of mutated and modified proteins found in the mitochondria of diseased cells. According to a recent estimate, there are 1000 to 1500 polypeptides in the human mitochondria.17–20 This estimate is based on several lines of evidence, including the existence of at least 800 distinct proteins in yeast and Arabidopsis thaliana mitochondria18,19 and the identification of 591 abundant mouse mitochondrial proteins.20 Investigators face a number of challenges in organelle proteome characterization and data analysis. A complete characterization of the posttranslational modifications that mitochondrial proteins undergo is an enormous and important task, as all of these modifications cannot be identified by a single approach. Differences in posttranslational modifications are likely to be associated with the onset and progression of various diseases. In addition, the mitochondrial proteome, although relatively simple, is made up of complex proteins located in submitochondrial compartments. Researchers will need to reduce the complexity to subproteomes by fractionation and analysis of various compartments. A number of approaches are focusing on specific components of the mitochondria, such as isolation of membrane proteins, affinity labeling, and isolation of redox proteins,21 or isolation of large complexes.22
  • 29. Other approaches may combine expression data from other species, such as yeast, to identify and characterize the human mitochondrial proteome.23,24 The need to identify mitochondrial proteins associated with or altered during the development and progression of cancer is compelling. For example, mitochondrial dysfunction has been frequently associated with transport of proteins, such as cytochrome c. Mitochondrial outer membrane permeabilization by pro-apoptotic proteins, such as Bax or Bak, results in the release of cytochrome c and the induction of apoptosis. An altered ratio of anti-apoptotic proteins (e.g., Bcl-2) to pro-apoptotic proteins (e.g., Bax and Bak) promotes cell survival and confers resistance to therapy.25
1.5 CANCER DETECTION
Molecular markers or biomarkers are currently used for cancer detection, diagnosis, and monitoring therapy and are likely to play larger roles in the future. In cancer research, a biomarker refers to a substance or process that is indicative of the presence of cancer in the body. It might be a molecule secreted by the malignancy itself, or it can be a specific response of the body to the presence of cancer. The biological basis for usefulness of biomarkers is that alterations in gene sequence or expression and in protein expression and function are associated with every type of cancer and with its progression through the various stages of development. Genetic mutations, changes in DNA methylation, alterations in gene expression, and alterations in protein expression or modification can be used to detect cancer, determine prognosis, and monitor disease progression and therapeutic response. Currently, DNA-based, RNA-based, and protein-based biomarkers are used in cancer risk assessment and detection.
The type of biomarker used depends both on the application (i.e., risk assessment, early detection, prognosis, or response to therapy) and the availability of appropriate biomarkers. The relative advantages and disadvantages of genomic and proteomic approaches have been widely discussed, but since a cell’s ultimate phenotype depends on the functions of expressed proteins, proteomics has the ability to provide precise information on a cell’s phenotype.

Tumor protein biomarkers are produced either by the tumor cells themselves or by the surrounding tissues in response to the cancer cells. More than 80% of human tumors (colon, lung, prostate, oral cavity, esophagus, stomach, uterine, cervix, and bladder) originate from epithelial cells, often at the mucosal surface. Cells in these tumors secrete proteins or spontaneously slough off into blood, sputum, or urine. Secreted proteins include growth factors, angiogenic proteins, and proteases. Free DNA is also released by both normal and tumor cells into the blood, and patients with cancer have elevated levels of circulating DNA. Thus, body fluids such as blood and urine are good sources for cancer biomarkers. That these fluids can be obtained using minimally invasive methods is a great advantage if the biomarker is to be used for screening and early detection.

From a practical point of view, assays of protein tumor biomarkers, due to their ease of use and robustness, lend themselves to routine clinical practice, and historically tumor markers have been proteins. Indeed, most serum biomarkers used today are antibody-based tests for epithelial cell proteins. Two of the earliest and most widely used cancer biomarkers are PSA and CA125. Prostate-specific antigen (PSA)
is a secreted protein produced by epithelial cells within the prostate. In the early 1980s it was found that sera from prostate cancer patients contain higher levels of PSA than do the sera of healthy individuals. Since the late 1980s, PSA has been used to screen asymptomatic men for prostate cancer, and there has been a decrease in mortality rates due to prostate cancer. How much of this decrease is attributable to screening with PSA and how much is due to other factors, such as better therapies, is uncertain. Although PSA is the best available serum biomarker for prostate cancer and the only one approved by the FDA for screening asymptomatic men, it is far from ideal. Not all men with prostate cancer have elevated levels of PSA; 20 to 30% of men with prostate cancer have normal PSA levels and are misdiagnosed. Conversely, because PSA levels are increased in other conditions, such as benign prostatic hypertrophy and prostatitis, a significant fraction of men with elevated levels of PSA do not have cancer and undergo needless biopsies.

The CA125 antigen was first detected over 20 years ago; CA125 is a mucin-like glycoprotein present on the cell surface of ovarian tumor cells that is released into the blood.26 Serum CA125 levels are elevated in about 80% of women with epithelial ovarian cancer but in less than 1% of healthy women. However, the CA125 test only returns a positive result for about 50% of Stage I ovarian cancer patients and is, therefore, not useful by itself as an early detection test.27 Also, CA125 is elevated in a number of benign conditions, which diminishes its usefulness in the initial diagnosis of ovarian cancer. Despite these limitations, CA125 is considered to be one of the best available cancer serum markers and is used primarily in the management of ovarian cancer.
Falling CA125 following chemotherapy indicates that the cancer is responding to treatment.28 Other serum protein biomarkers, such as alpha fetoprotein (AFP) for hepatocellular carcinoma and CA15.3 for breast cancer, are also of limited usefulness as they are elevated in some individuals without cancer, and not all cancer patients have elevated levels.

1.6 WHY PROTEOMICS HAS NOT SUCCEEDED IN THE PAST: CANCER AS AN EXAMPLE

The inability of these protein biomarkers to detect all cancers (false negatives) reflects both the progressive nature of cancer and its heterogeneity. Cancer is not a single disease but rather an accumulation of several events, genetic and epigenetic, arising in a single cell over a long period of time. Proteins overexpressed in late stage cancers may not be overexpressed in earlier stages and, therefore, are not useful for early cancer detection. For example, the CA125 antigen is not highly expressed in many Stage I ovarian cancers. Also, because tumors are heterogeneous, the same sets of proteins are not necessarily overexpressed in each individual tumor. For example, while most patients with high-grade prostate cancers have increased levels of PSA, approximately 15% of these patients do not have an elevated PSA level. The reciprocal problem of biomarkers indicating the presence of cancer when none is present (false positives) arises because these proteins are not uniquely produced by tumors. For example, PSA levels are also raised in prostatitis (inflammation of the prostate) and benign prostatic hyperplasia (BPH), and elevated CA125 levels are caused by endometriosis and pelvic inflammation.
The performance of any biomarker can be described in terms of its specificity and sensitivity. In the context of cancer biomarkers, sensitivity refers to the proportion of case subjects (individuals with confirmed disease) who test positive for the biomarker, and specificity refers to the proportion of control subjects (individuals without disease) who test negative for the biomarker. An ideal biomarker test would have 100% sensitivity and specificity; i.e., everyone with cancer would have a positive test, and everyone without cancer would have a negative test. None of the currently available protein biomarkers achieve 100% sensitivity and specificity. For example, as described above, PSA tests achieve 70 to 90% sensitivity and only about 25% specificity, which results in many men having biopsies when they do not have detectable prostate cancer. The serum protein biomarker for breast cancer CA15.3 has only 23% sensitivity and 69% specificity.

Other frequently used terms are positive predictive value (PPV), the chance that a person with a positive test has cancer, and negative predictive value (NPV), the chance that a person with a negative test does not have cancer. PPV is affected by the prevalence of disease in the screened population. For a given sensitivity and specificity, the higher the prevalence, the higher the PPV. Even when a biomarker provides high specificity and sensitivity, it may not be useful for screening the general population if the cancer has low prevalence. For example, a biomarker with 100% sensitivity and 95% specificity has a PPV of only 17% for a cancer with 1% prevalence (only 17 out of 100 people with a positive test for the biomarker actually have cancer) and 2% for a cancer with 0.1% prevalence. The prevalence of ovarian cancer in the general population is about 0.04%.
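The predictive-value arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch rather than anything taken from the chapter: the function name `predictive_values` is hypothetical, and the formulas are the standard definitions of PPV and NPV in terms of sensitivity, specificity, and prevalence. The inputs are the 100% sensitivity and 95% specificity used in the example, evaluated at the three prevalences quoted in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    All three arguments are fractions between 0 and 1. The name and
    signature are illustrative, not part of any library.
    """
    # Expected fraction of the screened population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)   # P(cancer | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no cancer | negative test)
    return ppv, npv


if __name__ == "__main__":
    # Prevalences from the text: 1%, 0.1%, and 0.04% (ovarian cancer).
    for prevalence in (0.01, 0.001, 0.0004):
        ppv, npv = predictive_values(1.0, 0.95, prevalence)
        print(f"prevalence {prevalence:.2%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these inputs the PPV comes out near 17% at 1% prevalence and near 2% at 0.1% prevalence, matching the figures in the text, and it falls below 1% at the 0.04% prevalence quoted for ovarian cancer.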
Thus, a biomarker used to screen the general population must have significantly higher specificity and sensitivity than a biomarker used to monitor an at-risk population.

1.7 HOW HAVE PROTEOMIC APPROACHES CHANGED OVER THE YEARS?

Currently investigators are pursuing three different approaches to develop biomarkers with increased sensitivity and specificity. The first is to improve on a currently used biomarker. For instance, specificity and sensitivity of PSA may be improved by measurement of its complex with alpha(1)-antichymotrypsin; patients with benign prostate conditions have more free PSA than bound, while patients with cancer have more bound PSA than free.29 This difference is thought to result from differences in the type of PSA released into the circulation by benign and malignant prostatic cells. Researchers are also trying to improve the specificity and sensitivity of PSA by incorporating age- and race-specific cut points and by adjusting serum PSA concentration by prostatic volume (PSA density). The second approach is to discover and validate new biomarkers that have improved sensitivity and specificity. Many investigators are actively pursuing new biomarkers using a variety of new and old technologies. The third approach is to use a panel of biomarkers, either by combining several individually identified biomarkers or by using mass spectrometry to identify a pattern of protein peaks in sera that can be used to predict the presence of cancer or other diseases. High-throughput proteomic methodologies have the potential to revolutionize protein biomarker discovery and to allow for multiple markers to be assayed simultaneously.
In the past, researchers have mostly used a one-at-a-time approach to biomarker discovery. They have looked for differences in the levels of individual proteins in tissues or blood from patients with disease and from healthy individuals. The choice of proteins to examine was frequently based on biological knowledge of the cancer and its interaction with surrounding tissues. This approach is laborious and time consuming, and most of the biomarkers discovered thus far do not have sufficient sensitivity and specificity to be useful for early cancer detection.

A mainstay of protein biomarker discovery has been two-dimensional gel electrophoresis (2DE). The traditional 2DE method is to separately run extracts from control and diseased tissues or cells and to compare the relative intensities of the various protein spots on the stained gels. Proteins whose intensities are significantly increased or decreased in diseased tissues are identified using mass spectrometry. For example, 2DE was recently used to identify proteins that are specifically overexpressed in colon cancer.30 The limitations of the 2DE approach are well known: the gels are difficult to run reproducibly, a significant fraction of the proteins either do not enter the gels or are not resolved, low-abundance proteins are not detected, and relatively large amounts of sample are needed. A number of modifications have been made to overcome these limitations, including fractionation of samples prior to 2DE, the use of immobilized pH gradients, and labeling proteins from control and disease cells with different fluorescent dyes and then separating them on the same gel (differential in-gel electrophoresis; DIGE). An additional difficulty is contamination from neighboring stromal cells that can confound the detection of tumor-specific markers.
Laser capture microdissection (LCM) can be used to improve the specificity of 2DE, as it allows for the isolation of pure cell populations; however, it further reduces the amount of sample available for analysis. Even with these modifications, 2DE is a relatively low throughput methodology that only samples a subset of the proteome, and its applicability for screening and diagnosis is very limited.

A number of newer methods for large-scale protein analysis are being used or are under development. Several of these rely on mass spectrometry and database interrogation. Mass spectrometers work by imparting an electrical charge to the analytes (e.g., proteins or peptides) and then sending the charged particles through a mass analyzer. A time of flight (TOF) mass spectrometer measures the time it takes a charged particle (protein or peptide) to reach the detector; the higher the mass, the longer the flight time. A mixture of proteins or peptides analyzed by TOF generates a spectrum of protein peaks. TOF mass spectrometers are used to analyze peptide peaks generated by protease digestion of proteins resolved on 2DE. A major advance in this methodology is matrix-assisted laser desorption ionization (MALDI, a form of soft ionization), which allows for the ionization of larger biomolecules such as proteins and peptides. TOF mass spectrometers are also used to identify peptides eluted from HPLC columns. With tandem mass spectrometers (MS/MS), a mixture of charged peptides is separated in the first MS according to their mass-to-charge ratios, generating a list of peaks. In the second MS, the spectrometer is adjusted so that a single mass-to-charge species is directed to a collision cell to generate fragment ions, which are then separated by their mass-to-charge ratios. These patterns are compared to databases to identify the peptide and its parent protein. Liquid chromatography
combined with MS or MS/MS (LC-MS and LC-MS/MS) is currently being used as an alternative to 2DE to analyze complex protein mixtures. In this approach, a mixture of proteins is digested with a protease, and the resulting peptides are then fractionated by liquid chromatography (typically reverse-phase HPLC) and analyzed by MS/MS and database interrogation. A major limitation to this approach is the vast number of peptides generated when the initial samples contain a large number of proteins. Even the most advanced LC-MS/MS systems cannot resolve and analyze these complex peptide mixtures, and currently it is necessary either to prefractionate the proteins prior to proteolysis or to enrich for certain types of peptides (e.g., phosphorylated, glycosylated, or cysteine-containing) prior to liquid chromatography.

Although the use of mass spectrometry has accelerated the pace of protein identification, it is not inherently quantitative, and the amounts of peptides ionized vary. Thus, the signal obtained in the mass spectrometer cannot be used to measure the amount of protein in the sample. Several comparative mass spectrometry methods have been developed to determine the relative amounts of a particular peptide or protein in two different samples. These approaches rely on labeling proteins in one sample with a reagent containing one stable isotope and labeling the proteins in the other sample with the same reagent containing a different stable isotope. The samples are then mixed, processed, and analyzed together by mass spectrometry. The mass of a peptide from one sample will be different by a fixed amount from the same peptide from the other sample.
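The fixed mass offset between labeled samples is what makes relative quantitation possible: peaks separated by exactly that offset are assumed to be the same peptide from the two samples, and the ratio of their intensities estimates relative abundance. The sketch below illustrates the idea under several simplifying assumptions that are not from the chapter: peak masses are neutral peptide masses (charge already accounted for), each peptide carries a single label, and the peak list is invented for illustration. The function name `pair_labeled_peaks` is hypothetical. The illustrative shift is the mass difference between eight deuterium and eight hydrogen atoms, the kind of light/heavy pair used by isotope-coded affinity tags.

```python
# Monoisotopic masses in daltons: deuterium 2.014102, hydrogen 1.007825.
ICAT_D8_SHIFT = 8 * (2.014102 - 1.007825)  # about 8.05 Da


def pair_labeled_peaks(peaks, mass_shift, tolerance=0.01):
    """Pair light and heavy peaks and return heavy/light intensity ratios.

    peaks      -- list of (neutral_mass_da, intensity) tuples
    mass_shift -- expected heavy-minus-light mass difference in Da
    tolerance  -- allowed deviation from mass_shift in Da (assumed value)

    Returns a list of (light_mass, heavy_to_light_ratio) tuples.
    """
    ordered = sorted(peaks)
    pairs = []
    for light_mass, light_intensity in ordered:
        if light_intensity <= 0:
            continue  # cannot form a ratio against an empty light peak
        for heavy_mass, heavy_intensity in ordered:
            if abs((heavy_mass - light_mass) - mass_shift) <= tolerance:
                pairs.append((light_mass, heavy_intensity / light_intensity))
    return pairs


if __name__ == "__main__":
    # Hypothetical peak list: two light/heavy pairs and one unpaired peak.
    example_peaks = [
        (1000.50, 200.0), (1008.55, 400.0),
        (1500.75, 90.0), (1508.80, 30.0),
        (1800.00, 50.0),
    ]
    for mass, ratio in pair_labeled_peaks(example_peaks, ICAT_D8_SHIFT):
        print(f"light peptide {mass:.2f} Da: heavy/light ratio {ratio:.2f}")
```

A real pipeline would also have to handle multiply charged ions, peptides with more than one labeled residue, and overlapping isotope envelopes, none of which this sketch attempts.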
One such method (isotope-coded affinity tags; ICAT) modifies cysteine residues with an affinity reagent that contains either eight hydrogen or eight deuterium atoms.31 Other methods include digestion in 16O and 18O water and culturing cells in 12C- and 13C-labeled amino acids.

Although the techniques described thus far are useful for determining proteins that are differently expressed in control and disease, they are expensive, relatively low throughput, and not suitable for routine clinical use. Surface-enhanced laser desorption ionization time-of-flight (SELDI-TOF) and protein chips are two proteomic approaches that have the potential to be high throughput and adaptable to clinical use. In the SELDI-TOF mass spectrometry approach, protein fractions or body fluids are spotted onto chromatographic surfaces (ion exchange, reverse phase, or metal affinity) that selectively bind a subset of the proteins (Ciphergen® ProteinChip Arrays). After washing to remove unbound proteins, the bound proteins are ionized and analyzed by TOF mass spectrometry. This method has been used to identify disease-related biomarkers, including the alpha chain of haptoglobin (Hp-alpha) for ovarian cancer32 and alpha defensin for bladder cancer. Other investigators are using SELDI-TOF to acquire proteomic patterns from whole sera, urine, or other body fluids. The complex patterns of proteins obtained by the TOF mass spectrometer are analyzed using pattern recognition algorithms to identify a set of protein peaks that can be used to distinguish disease from control. With this approach, protein identification and characterization are not necessary for development of clinical assays, and a SELDI protein profile may be sufficient for screening.
For example, this method has been reported to identify patients with Stage I ovarian cancer with 100% sensitivity and 95% specificity.27 Similar, albeit less dramatic, results have been reported for other types of cancer.28,33–36 At this time, it is uncertain whether SELDI protein profiling will prove to be as valuable a diagnostic tool as the initial
reports have suggested. A major technical issue is the reproducibility of the protein profiles. Variability between SELDI-TOF instruments, in the extent of peptide ionization, in the chips used to immobilize the proteins, and in sample processing, can contribute to the lack of reproducibility. There is concern that the protein peaks identified by SELDI and used for discriminating between cancer and control are not derived from the tumor per se but rather from the body’s response to the cancer (epiphenomena) and that they may not be specific for cancer; inflammatory conditions and benign pathologies may elicit the same bodily responses.37,38 Most known tumor marker proteins in the blood are on the order of ng/ml (PSA above 4 ng/ml and alpha fetoprotein above 20 ng/ml are considered indicators of, respectively, prostate and hepatocellular cancers). The SELDI-TOF peptide peaks typically used to distinguish cancer from control are relatively large peaks representing proteins present in the serum on the order of μg to mg/ml; these protein peaks may result from cancer-induced proteolysis or posttranslational modification of proteins normally present in sera. Although identification of these discriminating proteins may not be necessary for this “black-box” approach to yield a clinically useful diagnostic test, identifying these proteins may help elucidate the underlying pathology and lead to improved diagnostic tests. Potential advantages of SELDI for clinical assays are that it is high throughput, it is relatively inexpensive, and it uses minimally invasive specimens (blood, urine, sputum).

Interest in protein chips in part reflects the success of DNA microarrays. While these two methodologies have similarities, a number of technical and biological differences exist that make the practical application of protein chips or arrays challenging.
Proteins, unlike DNA, must be captured in their native conformation and are easily denatured irreversibly. There is no method to amplify their concentrations, and their interactions with other proteins and ligands are less specific and of variable affinity. Current bottlenecks in creating protein arrays include the production (expression and purification) of the huge diversity of proteins that will form the array elements, methods to immobilize proteins in their native states on the surface, and lack of detection methods with sufficient sensitivity and accuracy. To date, the most widely used application of protein chips is the antibody microarray, which has the potential for high-throughput profiling of a fixed number of proteins. A number of purified, well-characterized antibodies are spotted onto a surface, and then cell extracts or sera are passed over the surface to allow the antigens to bind to the specific, immobilized antibodies. The bound proteins are detected either by using secondary antibodies against each antigen or by using lysates that are tagged with fluorescent or radioactive labels. A variation that allows for direct comparison between two different samples is to label each extract with a different fluorescent dye and then mix the extracts prior to exposure to the antibody array. A significant problem with antibody arrays is lack of specificity; the immobilized antibodies cross react with proteins other than the intended target. The allure of protein chips is their potential to rapidly analyze multiple protein markers simultaneously at a moderate cost.

As discussed earlier, most currently available cancer biomarkers lack sufficient sensitivity and specificity for use in early detection, especially to screen asymptomatic populations. One approach to improve sensitivity and specificity is to use a panel of biomarkers. It is easy to envision how combining biomarkers can increase
sensitivity if they detect different pathological processes or different stages of cancer, and one factor to consider in developing such a panel is whether the markers are complementary. However, simply combining two biomarkers will more than likely decrease specificity and increase the number of false positives. Raising their cutoff values (the concentration of a biomarker that is used as an indication of the presence of cancer) can help reduce the number of false positives, though at some cost in sensitivity. A useful test for evaluating a single biomarker or panel of biomarkers is the receiver operating characteristic (ROC) curve. An ROC curve is a graphical display of false-positive rates and true-positive rates from multiple classification rules (different cutoff values for the various biomarkers). Each point on the graph corresponds to a different classification rule. In addition to analyzing individually measured markers, ROC curves can be used to analyze SELDI-TOF proteomic profiles.39 The measurement and analysis of biomarker panels will be greatly facilitated by high-throughput technologies such as protein arrays, microbeads with multiple antibodies bound to them, and mass spectrometry. It is in these areas that a number of companies are concentrating their efforts, as not only must a biomarker or panel of biomarkers have good specificity and sensitivity, but there must also be an efficient and cost-effective method to assay them.

1.8 FUTURE OF PROTEOMICS IN DRUG DISCOVERY, SCREENING, EARLY DETECTION, AND PREVENTION

Proteomics has benefited greatly from the development of high-throughput methods to simultaneously study thousands of proteins. The successful application of proteomics to medical diagnostics will require the combined efforts of basic researchers, physicians, pathologists, technology developers, and information scientists (Figure 1.1).
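The ROC construction described in Section 1.7 can be sketched in a few lines. This is an illustrative example only: the marker values and disease labels are invented, the function names `roc_points` and `area_under_curve` are hypothetical, and a subject is assumed to test positive when the marker value is at or above the cutoff. Each distinct marker value is used as one cutoff, so each returned point corresponds to one classification rule.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per cutoff.

    scores -- biomarker measurements, higher meaning more suspicious (assumed)
    labels -- 1 for case subjects, 0 for control subjects
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nobody tests positive
    # Lower the cutoff step by step; each step admits more positives.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical serum marker levels for four cases and four controls.
    marker = [9.1, 7.4, 6.8, 5.0, 4.2, 3.9, 2.5, 1.1]
    disease = [1, 1, 0, 1, 0, 1, 0, 0]
    curve = roc_points(marker, disease)
    for fpr, tpr in curve:
        print(f"FPR {fpr:.2f}  TPR {tpr:.2f}")
    print(f"area under curve: {area_under_curve(curve):.4f}")
```

The area under the curve is one common single-number summary of such a plot; a value of 0.5 corresponds to a marker that separates cases from controls no better than chance, and 1.0 to perfect separation.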
However, its application in clinics will require development of test kits based on pattern analysis, single molecule detection, or multiplexing of several clinically acceptable tests, such as ELISA, for various targets in a systematic way under rigorous quality control regimens (Figure 1.2). Interperson heterogeneity is a major hurdle when attempting to discover a disease-related biomarker within biofluids such as serum. However, the coupling of high-throughput technologies with protein science now enables samples from hundreds of patients to be rapidly compared. Admittedly, proteomic approaches cannot remove the “finding a needle in a haystack” requirement for discovering novel biomarkers; however, we now possess the capability to inventory components within the “haystack” at an unprecedented rate. Indeed, such capabilities have already begun to bear fruit as our knowledge of the different types of proteins within serum is growing exponentially and novel technologies for diagnosing cancers using proteomic technologies are emerging.

FIGURE 1.1 Application of medical proteomics: Interplay between various disciplines and expertise is the key to developing tools for detection, diagnosis, and treatment of cancer. [Figure labels: Technologist; Information Scientist; Basic Scientist; Physician/Scientist; Cancer Biorepository; Biomarkers; Diagnostics; Therapeutics.]

Is the development of methods capable of identifying thousands of proteins in a high-throughput manner going to lead to novel biomarkers for the diagnosis of early stage diseases or is the amount of data that is accumulated in such studies going to be overwhelming? The answer to this will depend on our ability to develop and successfully deploy bioinformatic tools. Based on the rate at which interesting leads are being discovered, it is likely that not only will biomarkers with better sensitivity and specificity be identified but individuals will be treated using customized therapies based on their specific protein profile. The promise of proteomics for discovery is its potential to elucidate fundamental information on the biology of cells, signaling pathways, and disease processes; to identify disease biomarkers and new drug targets; and to profile drug leads for efficacy and safety.
The promise of proteomics for clinical use is the refinement and development of protein-based assays that are accurate, sensitive, robust, and high throughput. Since many of the proteomic technologies and data management tools are still in their infancy, their validations and refinements are going to be the most important tasks in the future.

FIGURE 1.2 Strategies in medical proteomics: Steps in identification of detection targets and the development of clinical assays. [Figure panels: Protein Profiling / Define Protein Changes (1. 2DE; 2. SELDI-TOF-MS; 3. LC-coupled MS); Bio-informatics (Bio-computation, Databases); Protein Identification (1. Nano-LC-coupled SELDI-MS; 2. CapLC-MS/MS; 3. TOF-MS); Assay Development (1. ELISA; 2. SELDI-based; 3. Ab arrays); Functional Analysis (1. Protein-protein interaction; 2. Cellular targeting; 3. Protein-ligand interactions).]

REFERENCES

1. Wasinger, V.C., Cordwell, S.J., Cerpa-Poljak, A., et al. Progress with gene-product mapping of the Mollicutes: Mycoplasma genitalium. Electrophoresis, 16, 1090–1094, 1995.
2. Banks, R.E., Dunn, M.J., Hochstrasser, D.F., et al. Proteomics: New perspectives, new biomedical opportunities. Lancet, 356, 1749–1756, 2000.
3. Anderson, N.L., Matheson, A.D., and Steiner, S. Proteomics: Applications in basic and applied biology. Curr. Opin. Biotechnol., 11, 408–412, 2000.
4. Genther, S.M., Sterling, S., Duensing, S., Munger, K., Sattler, C., and Lambert, P.F. Quantitative role of the human papillomavirus type 16 E5 gene during the productive stage of the viral life cycle. J. Virol., 77, 2832–2842, 2003.
5. Middleton, K., Peh, W., Southern, S., et al. Organization of human papillomavirus productive cycle during neoplastic progression provides a basis for selection of diagnostic markers. J. Virol., 77, 10186–10201, 2003.
6. Verma, M., Lambert, P.F., and Srivastava, S.K. Meeting highlights: National Cancer Institute workshop on molecular signatures of infectious agents. Dis. Markers, 17, 191–201, 2001.
7. Verma, M. and Srivastava, S. New cancer biomarkers deriving from NCI early detection research. Recent Results Canc. Res., 163, 72–84; discussion, 264–266, 2003.
8. Verma, M. and Srivastava, S. Epigenetics in cancer: implications for early detection and prevention. Lancet Oncol., 3, 755–763, 2002.
9. Diaz, J.J., Giraud, S., and Greco, A. Alteration of ribosomal protein maps in herpes simplex virus type 1 infection. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci., 771, 237–249, 2002.
10. Greco, A., Bausch, N., Coute, Y., and Diaz, J.J. Characterization by two-dimensional gel electrophoresis of host proteins whose synthesis is sustained or stimulated during the course of herpes simplex virus type 1 infection. Electrophoresis, 21, 2522–2530, 2000.
11. Greco, A., Bienvenut, W., Sanchez, J.C., et al. Identification of ribosome-associated viral and cellular basic proteins during the course of infection with herpes simplex virus type 1. Proteomics, 1, 545–549, 2001.
12. Laurent, A.M., Madjar, J.J., and Greco, A. Translational control of viral and host protein synthesis during the course of herpes simplex virus type 1 infection: evidence that initiation of translation is the limiting step. J. Gen. Virol., 79, 2765–2775, 1998.
13. Gallo, R.C. Thematic review series. XI: Viruses in the origin of human cancer. Introduction and overview. Proc. Assoc. Am. Phys., 111, 560–562, 1999.
14. Wallace, D.C. Mitochondrial diseases in man and mouse. Science, 283, 1482–1488, 1999.
15. Enns, G.M. The contribution of mitochondria to common disorders. Mol. Genet. Metab., 80, 11–26, 2003.
16. Maechler, P. and Wollheim, C.B. Mitochondrial function in normal and diabetic beta-cells. Nature, 414, 807–812, 2001.
17. Lopez, M.F. and Melov, S. Applied proteomics: mitochondrial proteins and effect on function. Circ. Res., 90, 380–389, 2002.
18. Kumar, A., Agarwal, S., Heyman, J.A., et al. Subcellular localization of the yeast proteome. Genes Dev., 16, 707–719, 2002.
19. Werhahn, W. and Braun, H.P. Biochemical dissection of the mitochondrial proteome from Arabidopsis thaliana by three-dimensional gel electrophoresis. Electrophoresis, 23, 640–646, 2002.
20. Mootha, V.K., Bunkenborg, J., Olsen, J.V., et al. Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria. Cell, 115, 629–640, 2003.
21. Lin, T.K., Hughes, G., Muratovska, A., et al. Specific modification of mitochondrial protein thiols in response to oxidative stress: A proteomics approach. J. Biol. Chem., 277, 17048–17056, 2002.
22. Brookes, P.S., Pinner, A., Ramachandran, A., et al. High throughput two-dimensional blue-native electrophoresis: A tool for functional proteomics of mitochondria and signaling complexes. Proteomics, 2, 969–977, 2002.
23. Richly, E., Chinnery, P.F., and Leister, D. Evolutionary diversification of mitochondrial proteomes: Implications for human disease. Trends Genet., 19, 356–362, 2003.
24. Koc, E.C., Burkhart, W., Blackburn, K., Moseley, A., Koc, H., and Spremulli, L.L. A proteomics approach to the identification of mammalian mitochondrial small subunit ribosomal proteins. J. Biol. Chem., 275, 32585–32591, 2000.
25. Newmeyer, D.D. and Ferguson-Miller, S. Mitochondria: Releasing power for life and unleashing the machineries of death. Cell, 112, 481–490, 2003.
26. Yin, B.W., Dnistrian, A., and Lloyd, K.O. Ovarian cancer antigen CA125 is encoded by the MUC16 mucin gene. Int. J. Canc., 98, 737–740, 2002.
27. Petricoin, E.F., Ardekani, A.M., Hitt, B.A., et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359, 572–577, 2002.
28. Li, J., Zhang, Z., Rosenzweig, J., Wang, Y.Y., and Chan, D.W. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin. Chem., 48, 1296–1304, 2002.
29. Martinez, M., Espana, F., Royo, M., et al. The proportion of prostate-specific antigen (PSA) complexed to alpha(1)-antichymotrypsin improves the discrimination between prostate cancer and benign prostatic hyperplasia in men with a total PSA of 10 to 30 microg/L. Clin. Chem., 48, 1251–1256, 2002.
30. Brunagel, G., Schoen, R.E., and Getzenberg, R.H. Colon cancer specific nuclear matrix protein alterations in human colonic adenomatous polyps. J. Cell Biochem., 91, 365–374, 2004.
31. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol., 17, 994–999, 1999.
32. Ye, B., Cramer, D.W., Skates, S.J., et al. Haptoglobin-alpha subunit as potential serum biomarker in ovarian cancer: Identification and characterization using proteomic profiling and mass spectrometry. Clin. Canc. Res., 9, 2904–2911, 2003.
33. Adam, B.L., Qu, Y., Davis, J.W., et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Canc. Res., 62, 3609–3614, 2002.
34. Poon, T.C., Yip, T.T., Chan, A.T., et al. Comprehensive proteomic profiling identifies serum proteomic signatures for detection of hepatocellular carcinoma and its subtypes. Clin. Chem., 49, 752–760, 2003.
35. Kozak, K.R., Amneus, M.W., Pusey, S.M., et al. Identification of biomarkers for ovarian cancer using strong anion-exchange ProteinChips: Potential use in diagnosis and prognosis. Proc. Natl. Acad. Sci. USA, 100, 12343–12348, 2003.
36. Petricoin, E.F., III, Ornstein, D.K., Paweletz, C.P., et al. Serum proteomic patterns for detection of prostate cancer. J. Natl. Canc. Inst., 94, 1576–1578, 2002.
37. Diamandis, E.P. Point: Proteomic patterns in biological fluids: Do they represent the future of cancer diagnostics? Clin. Chem., 49, 1272–1275, 2003.
38. Petricoin, E., III and Liotta, L.A. Counterpoint: The vision for a new diagnostic paradigm. Clin. Chem., 49, 1276–1278, 2003.
39. Baker, S.G. The central role of receiver operating characteristic (ROC) curves in evaluating tests for the early detection of cancer. J. Natl. Canc. Inst., 95, 511–515, 2003.
2 Proteomics Technologies and Bioinformatics
Sudhir Srivastava and Mukesh Verma

CONTENTS
2.1 Introduction: Proteomics in Cancer Research
  2.1.1 Two-Dimensional Gel Electrophoresis (2DE)
  2.1.2 Mass Spectrometry
  2.1.3 Isotope-Coded Affinity Tags (ICAT)
  2.1.4 Differential 2DE (DIGE)
  2.1.5 Protein-Based Microarrays
2.2 Current Bioinformatics Approaches in Proteomics
  2.2.1 Clustering
  2.2.2 Artificial Neural Networks
  2.2.3 Support Vector Machine (SVM)
2.3 Protein Knowledge System
2.4 Market Opportunities in Computational Proteomics
2.5 Challenges
2.6 Conclusion
References

2.1 INTRODUCTION: PROTEOMICS IN CANCER RESEARCH
Proteomics is the study of all expressed proteins. A major goal of proteomics is a complete description of the protein interaction networks underlying cell physiology.
Before we discuss protein computational tools and methods, we give a brief background on the proteomic technologies currently used in cancer diagnosis. For cancer diagnosis, both surface-enhanced laser desorption ionization (SELDI) and two-dimensional gel electrophoresis (2DE) approaches have been used.1,2 Recently, protein-based microarrays have been developed that show great promise for analyzing small amounts of sample and yielding the maximum data on the cell’s microenvironment.3–5
2.1.1 TWO-DIMENSIONAL GEL ELECTROPHORESIS (2DE)
The recent upsurge in proteomics research has been facilitated largely by streamlining of 2DE technology and parallel developments in MS for analysis of peptides and proteins. Two-dimensional gel electrophoresis is used to separate proteins based on charge and mass and can be used to identify posttranslationally modified proteins. A major limitation of this technology in proteomics is that membrane proteins contain a considerable number of hydrophobic amino acids, causing them to precipitate during the isoelectric focusing of standard 2DE.6 In addition, information regarding protein–protein interactions is lost during 2DE due to the denaturing conditions used in both gel dimensions.

To overcome these limitations, two-dimensional blue-native gel electrophoresis has been used to resolve membrane proteins. In this process, membrane protein complexes are solubilized and resolved in their native forms in the first dimension. The separation in the second dimension is performed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), which denatures the complexes and resolves them into their separate subunits. Protein spots are digested with trypsin and analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Two-dimensional blue-native gel electrophoresis is suitable for small biological samples and can detect posttranslational modifications (PTMs) in proteins. Common PTMs include phosphorylation, oxidation and nitrosation, fucosylation and galactosylation, reaction with lipid-derived aldehydes, and tyrosine nitration.

Improvements are needed to resolve low-molecular-mass proteins, especially those with isoelectric points below pH 3 and above pH 10. This technique has low throughput (at most 30 samples can be run simultaneously), and most of the steps are manual. Automatic spot-picking also needs improvement.
2.1.2 MASS SPECTROMETRY
Mass spectrometry (MS) is an integral part of proteomic analysis. MS instruments are made up of three primary components: the source, which produces ions for analysis; the mass analyzer, which separates the ions based on their mass-to-charge ratios (m/z); and the detector, which quantifies the ions resolved by the analyzer. Multiple subtypes of ion sources, analyzers, and detectors have been developed, and different components can be combined to create different instruments, but the principle remains the same: the spectrometers create ion mixtures from a sample and then resolve them into their component ions based on their m/z values.

Significant improvements have been made in spectrometric devices during the past two decades, allowing precise analysis of biomolecules too fragile to survive earlier instrumentation. For ionization of peptides and proteins, these ionization sources are usually coupled to time-of-flight (TOF)2,7,8 spectrometers.

Historically, MS has been limited to the analysis of small molecules. Larger biomolecules, such as peptides or proteins, simply do not survive the harsh ionization methods available to create the ions. ESI (electrospray ionization),9 MALDI, and SELDI techniques permit a gentler ionization of large biomolecules, called soft ionization, without too much fragmentation of the principal ions. ESI and MALDI were both developed during the late 1980s and were the foundation for the emergence of MS as a tool for the investigation of biological samples. Although MALDI equipment is
expensive, quantitative high throughput can be achieved (about 100 samples per day can be run by a single laboratory). SELDI, developed in the early 1990s, is a modification of the MALDI approach to ionization. All the ionization techniques described above are sensitive in the picomole-to-femtomole range that is required for application to biological samples, including carbohydrates, oligonucleotides, small polar molecules, peptides, proteins, and posttranslationally modified proteins.

Tandem mass analyzers are instruments used for detailed structural analysis of selected peptides. An example of this kind of analyzer is ABI’s QSTAR® (Applied Biosystems, Foster City, CA), a hybrid system that joins two quadrupoles in tandem with a TOF analyzer.10 Particular tryptic peptide fragments can be sequentially selected and subfragmented in the two quadrupoles, and then the subfragments can be measured in the analyzer. The resulting pattern is somewhat like the sequence-ladder pattern obtained in DNA sequencing. Although the analysis of the protein pattern is more complex than DNA sequencing, software is available that allows the direct determination of the amino acid sequence of peptides. Based on the peptide sequence information, it is possible to identify the parent protein in the database.

2.1.3 ISOTOPE-CODED AFFINITY TAGS (ICAT)
Isotope-coded affinity tags (ICAT)11 is a technology that facilitates quantitative proteomic analysis. This approach uses an isotope-tagged thiol-reactive group to label reduced cysteine residues and a biotin affinity tag to isolate the labeled peptides. These two functional groups are joined by linkers that contain either eight hydrogen atoms (light reagent) or eight deuterium atoms (heavy reagent).
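As a rough illustration of how the light and heavy linkers support quantification, the sketch below pairs peaks that are separated by the mass of eight hydrogen-to-deuterium substitutions (about 8 Da per labeled cysteine) and reports a light/heavy intensity ratio for each pair. It is a minimal sketch and not the chapter's software: the function names, the matching tolerance, and the example peak list are all hypothetical, and real ICAT data would also require charge-state handling and chromatographic alignment.

```python
# Mass difference between deuterium (2H) and protium (1H), in daltons.
DEUTERIUM_SHIFT = 2.014102 - 1.007825


def icat_mass_shift(n_cysteines=1, n_labels=8):
    """Expected heavy-minus-light mass difference for a labeled peptide.

    Assumes each labeled cysteine carries one linker with n_labels
    hydrogen (light) or deuterium (heavy) atoms.
    """
    return n_cysteines * n_labels * DEUTERIUM_SHIFT


def pair_ratios(peaks, n_cysteines=1, tol=0.05):
    """Find light/heavy peak pairs and return their intensity ratios.

    peaks: list of (mass, intensity) tuples for singly charged peptides.
    tol:   hypothetical mass tolerance in daltons.
    Returns a list of (light_mass, heavy_mass, light/heavy ratio).
    """
    shift = icat_mass_shift(n_cysteines)
    pairs = []
    for light_mass, light_int in peaks:
        for heavy_mass, heavy_int in peaks:
            matches = abs(heavy_mass - light_mass - shift) <= tol
            if matches and heavy_int > 0:
                pairs.append((light_mass, heavy_mass, light_int / heavy_int))
    return pairs


if __name__ == "__main__":
    # Made-up peak list: one light/heavy pair plus an unrelated peak.
    example = [(1000.00, 200.0), (1008.05, 100.0), (1500.00, 50.0)]
    for light, heavy, ratio in pair_ratios(example):
        print(f"light {light:.2f} / heavy {heavy:.2f}: ratio {ratio:.2f}")
```

A ratio above 1 for a pair would suggest that the peptide is more abundant in the sample labeled with the light reagent; which sample receives which reagent is an experimental choice.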
Proteins in one sample (cancer) are labeled with the isotopically light version of the ICAT reagent, while proteins in another sample (control) are labeled with the isotopically heavy version. The two samples are combined and digested to generate peptide fragments, and the cysteine-containing peptides are enriched by avidin affinity chromatography. This results in an approximately tenfold enrichment of the labeled peptides. The peptides may be further purified and analyzed by reverse-phase liquid chromatography, followed by MS. The ratio of the isotopic molecular mass peaks that differ by 8 Da provides a measure of the relative amounts of each protein in the original samples. This technology is good for detection of differentially expressed proteins between two pools. Recently the method has been modified to use 16O- and 18O-labeled water and to culture cells in 12C- and 13C-labeled amino acids. Problems with ICAT include its dependency on radioactive materials, its low throughput (about 30 samples per day), its restriction to proteins that contain cysteine, and a decrease in labeling over time (see also Chapter 16).

2.1.4 DIFFERENTIAL 2DE (DIGE)
Differential 2DE (DIGE) allows for a comparison of differentially expressed proteins in up to three samples. In this technology, succinimidyl esters of the cyanine dyes Cy2, Cy3, and Cy5 are used to fluorescently label proteins in up to three different pools of proteins. After labeling, samples are mixed and run simultaneously on the same 2DE.12 Images of the gel are obtained using three different excitation/emission filters, and the ratios of different fluorescent signals are used to find protein differences among the
samples. The problem with DIGE is that only 2% of the lysine residues in the proteins can be fluorescently modified, so that the solubility of the labeled proteins is maintained during electrophoresis. An additional problem with this technology is that the labeled proteins migrate with slightly higher mass than the bulk of the unlabeled proteins. DIGE technology is more sensitive than silver stain formulations optimized for MS. SYPRO Ruby dye staining detects 40% more protein spots than the Cy dyes.

2.1.5 PROTEIN-BASED MICROARRAYS
DNA microarrays have proven to be a powerful technology for large-scale gene expression analysis. A related objective is the study of selective interactions between proteins and other biomolecules, including other proteins, lipids, antibodies, DNA, and RNA. Therefore, the development of assays that could detect protein-directed interactions in a rapid, inexpensive way using a small number of samples is highly desirable. Protein-based microarrays provide such an opportunity.

Proteins are separated using any separation mode, which may consist of ion exchange liquid chromatography (LC), reverse-phase LC, or carrier ampholyte–based separations, such as Rotophor. Each fraction obtained after the first dimensional separation can be further resolved by other methods to yield either purified protein or fractions containing a limited number of proteins that can directly be arrayed or spotted. A robotic arrayer is used for spotting, provided the proteins remain in liquid form throughout the separation procedure. These slides are hybridized with primary antibodies against a set of proteins, and the resulting immune complex is detected. The resulting image shows only those fractions that react with a specific antibody. The use of multidimensional techniques to separate thousands of proteins enhances the utility of protein microarray technology.
This approach is sensitive enough to detect specific proteins in individual fractions that have been spotted directly without further concentration of the proteins in the individual fractions. However, one of the limitations of the nitrocellulose-based array chip is the lack of control over orientation in the immobilization process and over optimization of physical interactions between immobilized macromolecules and their corresponding ligands, which can affect the sensitivity of the assay.

Molecular analysis of cells in their native tissue microenvironment can provide the most desirable situation of in vivo states of the disease. However, the availability of low numbers of cells of specific populations in the tissue poses a challenge. Laser capture microdissection (LCM) helps alleviate this matter, as this technology is capable of procuring specific, pure subpopulations of cells directly from the tissue. Protein profiling of cancer progression within a single patient using selected longitudinal study sets of highly purified normal, premalignant, and carcinoma cells provides the unique opportunity not only to ascertain altered protein profiles but also to determine at what point in the cancer progression these alterations in protein patterns occur. Preliminary results from one such study suggest complex cellular communication between epithelial and stroma cells. A majority of the proteins in this study are signal transduction proteins.5 Protein-based microarrays were used in this study. Advantages and disadvantages of some proteomic-relevant technologies are listed in Table 2.1.
TABLE 2.1 Comparisons of Various Proteomic Technologies

ELISA
Chemiluminescence or fluorescence-based
Sensitivity: Highest
Direct identification of markers: N/A
Use: Detection of single, well-characterized specific analyte in plasma/serum, tissue; gold standard of clinical assays
Throughput: Moderate
Advantages/drawbacks: Very robust, well-established use in clinical assays; requires well-characterized antibody for detection; requires extensive validation; not amenable to direct discovery; calibration (standard) dependent; FDA regulated for clinical diagnostics
Bioinformatic needs: Moderate, standardized

2DE PAGE
2DE serological proteome analysis (SERPA); 2DGE + serum immunoblotting
Sensitivity: Low, particularly for less abundant proteins; sensitivity limited by detection method; difficult to resolve hydrophobic proteins
Direct identification of markers: Yes
Use: Identification and discovery of biomarkers; not a direct means for early detection in itself
Throughput: Low
Advantages/drawbacks: Requires a large number of samples; all identifications require validation and testing before clinical use; reproducible and more quantitative combined with fluorescent dyes; not amenable for high throughput or automation; limited resolution, multiple proteins may be positioned at the same location on the gel
Bioinformatic needs: Moderate; mostly home grown, some proprietary

Isotope-Coded Affinity Tag (ICAT)™
ICAT/LC-EC-MS/MS; ICAT/LC-MS/MS/MALDI
Sensitivity: High
Direct identification of markers: Yes
Use: Quantification of relative abundance of proteins from two different cell states
Throughput: Moderate/low
Advantages/drawbacks: Robust, sensitive, and automated; suffers from the demand for continuous on-the-fly selection of precursor ions for sequencing; coupling with MALDI promises to overcome this limitation and increase efficiency of proteomic comparison of biological cell states; still not highly quantitative and difficult to measure sub-pg/ml concentrations
Bioinformatic needs: Moderate

Multidimensional Protein Identification Technology (MudPIT)™
2D LC-MS/MS
Sensitivity: High
Direct identification of markers: Yes
Use: Detection and ID of potential biomarkers
Throughput: Very low
Advantages/drawbacks: Significantly higher sensitivity than 2D-PAGE; much larger coverage of the proteome for biomarker discovery; not reliable for low abundance proteins and low-molecular-weight fractions
Bioinformatic needs: Moderate

Proteomic Pattern Diagnostics
MALDI-TOF; SELDI-TOF; SELDI-TOF/QStar™
Sensitivity: Medium sensitivity with diminishing yield at higher molecular weights; improved with fitting of high-resolution QStar mass spectrometer to SELDI
Direct identification of markers: No; possible with additional high-resolution MS
Use: Diagnostic pattern analysis in body fluids and tissues (LCM); potential biomarker identification
Throughput: High
Advantages/drawbacks: SELDI protein identification not necessary for biomarker pattern analysis; reproducibility problematic, improved with QStar addition; revolutionary tool; 1–2 μl of material needed; upfront fractionation of protein mixtures and downstream purification methods necessary to obtain absolute protein quantification; MALDI crystallization of protein can lack reproducibility and be matrix dependent; high MW proteins require MS/MS
Bioinformatic needs: Moderate to extensive; home grown, not standardized

Protein Microarrays
Antibody arrays: chemiluminescence multi-ELISA platforms; glass fluorescence based (Cy3/Cy5); tissue arrays
Sensitivity: Medium to highest (depending on detection system)
Direct identification of markers: Possible when coupled to MS technologies; or probable, if antibodies have been highly defined by epitope mapping and neutralization
Use: Multiparametric analysis of many analytes simultaneously
Throughput: High
Advantages/drawbacks: Format is flexible; can be used to assay for multiple analytes in a single specimen or a single analyte in a number of specimens; requires prior knowledge of analyte being measured; limited by antibody sensitivity and specificity; requires extensive cross-validation for antibody cross-reactivity; requires use of an amplified tag detection system; requires more sample to measure low abundant proteins; needs to be measured undiluted
Bioinformatic needs: Extensive, home grown; not standardized

Note: LCM = laser capture microdissection.
2.2 CURRENT BIOINFORMATICS APPROACHES IN PROTEOMICS
Most biological databases have been generated by the biological community, whereas most computational databases have been generated by the mathematical and computational community. As a result, biological databases are not readily amenable to automated data mining methods and are unintelligible to some computers, and computational tools are nonintuitive to biologists. A list of database search tools is presented in Table 2.2, and some frequently used databases to study protein-protein interaction are shown in Table 2.3. A number of bioinformatic approaches have been discussed elsewhere in the book (see Chapters 10 and 14); therefore, we have described only the basic principles of some of these approaches.

TABLE 2.2 Database Search Tools for 2DE and MS
Name of the Software              Web Site
Delta2D (a)                       www.decodon.com/Solutions/Delta2D.html
GD Impressionist (a)              www.genedata.com/productsgell/Gellab.html
Investigator HT PC Analyzer (a)   www.genomicsolutions.com/proteomics/2dgelanal.html
Phortix 2D (a)                    www.phortix.com/products/2d_products.htm
Z3 2D-Gel Analysis System (a)     www.2dgels.com
Mascot                            www.matrixscience.com
MassSearch                        www.Cbrg.inf.ethz/Server/MassSearch.html
MS-FIT                            www.Prospector.ucsf.edu
PeptIdent                         www.expasy.ch/tools/peptident.html
(a) Software for 2DE.

TABLE 2.3 Database for Protein Interaction
Name of the Database   Web Site
CuraGen                Portal.curagen.com
DIP                    Dipdoe-mbi.ucla.edu
Interact               Bioinf.man.ac.uk/interactso.htm
MIPS                   www.mips.biochem.mpg.de
ProNet                 Pronet.doublewist.com

An important goal of bioinformatics is to develop robust, sensitive, and specific methodologies and tools for the simultaneous analysis of all the proteins expressed by the human genome, referred to as the human proteome, and to establish “biosignature” profiles that discriminate between disease states. Artifacts can be introduced into spectra from physical, electrical, or chemical sources. Each spectrum in MALDI or SELDI-TOF could be composed of three components: (1) true peak signal, (2) exponential baseline, and (3) white noise. Low-level processing is usually used to disentangle these components, remove systematic artifacts, and isolate the true protein signal.

A key for successful biomarker discovery is a bioinformatic approach that enables thorough, yet robust, analysis of the massive data generated by modern biotechnologies, such as microarrays for genetic markers and time-of-flight mass spectrometry for proteomic spectra. Prior to a statistical analysis for marker discovery, TOF-MS data require pre-analysis processing, which enables extraction of relevant information from the data. This can be thought of as a way to standardize and summarize the data for a subsequent statistical analysis. For example, based on some salient properties of the data, pre-analytical processing first identifies all protein signals that are distinguishable from noise, then calibrates mass (per charge) values of proteins for potential measurement errors, and finally aggregates, as a single signal, multiple protein signals that are within the range of measurement errors. The above discussion is specifically relevant to serum-based analysis, which is prone to all types of artifacts and errors.

Serum proteomic pattern analysis is an emerging technology that is increasingly employed for the early detection of disease, the measurement of therapeutic toxicity and disease responses, and the discovery of new drug targets for therapy. Various bioinformatics algorithms have been used for protein pattern discovery, but all studies have used the SELDI ionization technique along with low-resolution TOF-MS analysis.
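The three-component description of a spectrum (peak signal, exponential baseline, white noise) and the pre-analysis steps described above can be sketched in a few lines of code. The example below is a minimal illustration under stated assumptions, not the processing used in any of the cited studies: the running-median baseline estimate, the median-absolute-deviation noise estimate, the relative m/z tolerance, and all function names and parameter values are hypothetical choices. The mass calibration step is omitted.

```python
import math
import random
import statistics


def simulate_spectrum(mz, peaks, baseline_amp=20.0, baseline_decay=0.001,
                      noise_sd=1.0, seed=0):
    """Return intensities = peak signal + exponential baseline + white noise.

    peaks is a list of (center, height, width) tuples for Gaussian peaks.
    """
    rng = random.Random(seed)
    intensities = []
    for x in mz:
        signal = sum(h * math.exp(-0.5 * ((x - c) / w) ** 2)
                     for c, h, w in peaks)
        baseline = baseline_amp * math.exp(-baseline_decay * (x - mz[0]))
        intensities.append(signal + baseline + rng.gauss(0.0, noise_sd))
    return intensities


def subtract_baseline(y, half_window=50):
    """Subtract a running-median estimate of the slowly varying baseline."""
    n = len(y)
    corrected = []
    for i in range(n):
        window = y[max(0, i - half_window):min(n, i + half_window + 1)]
        corrected.append(y[i] - statistics.median(window))
    return corrected


def estimate_noise(y):
    """Robust noise level: scaled median absolute deviation (MAD)."""
    center = statistics.median(y)
    return 1.4826 * statistics.median([abs(v - center) for v in y])


def detect_peaks(mz, y, threshold):
    """Return (m/z, intensity) for local maxima that rise above threshold."""
    found = []
    for i in range(1, len(y) - 1):
        if y[i] > threshold and y[i] > y[i - 1] and y[i] >= y[i + 1]:
            found.append((mz[i], y[i]))
    return found


def merge_peaks(peaks, rel_tol=0.003):
    """Aggregate peaks lying within rel_tol (relative m/z) of the current
    group representative; each group keeps its most intense member."""
    merged = []
    for m, inten in sorted(peaks):
        if merged and (m - merged[-1][0]) <= rel_tol * merged[-1][0]:
            if inten > merged[-1][1]:
                merged[-1] = (m, inten)
        else:
            merged.append((m, inten))
    return merged


if __name__ == "__main__":
    mz = [float(v) for v in range(2000, 4000)]
    raw = simulate_spectrum(mz, [(2500.0, 30.0, 3.0), (3200.0, 40.0, 3.0)])
    corrected = subtract_baseline(raw)
    threshold = 6.0 * estimate_noise(corrected)
    for m, inten in merge_peaks(detect_peaks(mz, corrected, threshold)):
        print(f"m/z {m:.0f}: intensity {inten:.1f}")
```

The merging step matters because noise near the top of a broad peak can produce several neighboring local maxima; treating maxima within the assumed mass tolerance as one signal mirrors the aggregation step described in the text.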
Earlier studies demonstrated proof-of-principle of biomarker development for prostate cancer using SELDI-TOF, but some of the studies relied on the isolation of actual malignant cells from pathology specimens.13–16 Body-fluid-based diagnostics, using lavage, effluent, or effusion material, offers a less invasive approach to biomarker discovery than biopsy or surgical-specimen-dependent approaches.17 Additionally, serum-based approaches may offer a superior repository of biomarkers because serum is easy and inexpensive to obtain.18–21

Several preprocessing and postprocessing steps are needed in protein chip data analysis. For data analysis we must process the mass spectra in a way that is conducive to downstream multidimensional methods (clustering and classification, for example). The binding to protein chip spots used for general profiling is specific only to a class of proteins that share a physical or chemical property that creates an affinity for a given protein chip array surface. As a result, mass spectra can contain hundreds of protein expression levels encoded in their peaks.

Bioinformatics tools have promise in aiding early cancer detection and risk assessment. Some of the useful areas in bioinformatics tools are pattern clustering, classification, array analysis, decision support, and data mining. A brief application of these approaches is described below.

2.2.1 CLUSTERING
Two major approaches to clustering methods are bottom-up and top-down. An example of the bottom-up approach is hierarchical clustering, where each gene has its own profile.22 The basis of the clustering is that closest pairs are clustered
“I think I will,” replied Lady Elizabeth, with a little yawn, and giving her father a kiss, she went upstairs to her bedroom.

“Oh, dear,” she exclaimed, as she proceeded to undress herself, “what an unfortunate girl I am. Fancy an earl’s daughter having no maid to help her to bed when she is sleepy. Bah!” and here she stamped her little foot, “I wish everything were gold, that I could sell it.”

Having made this foolish remark, she was naughty enough to break the strings of her petticoat, for they had become knotted. Then she jumped into bed, and before her pretty head had touched the white pillow she was fast asleep, beyond even the land of dreams.

She slept soundly all the night through, not waking up till the sun was shining in at her window, in all his golden glory; indeed it was a glorious day, golden, bright, and beautiful!

Lady Elizabeth jumped from her bed with a song on her lips, and her eyes bright with health and beauty. But of a sudden the song ceased, as she cried out in wonder and alarm, and her eyes became fixed with extraordinary astonishment.

She had poured the water from the jug into the basin, and as soon as she touched it with her pink fingers it had frozen hard. Frozen quite solid, not into ice, but into pure gold. Pure gold, worth hundreds of pounds!

It was the same in the bath, a bath both deep and wide. As soon as her little pink toe touched the water it froze into a large block of yellow gold, worth thousands upon thousands of pounds.
[Illustration: Lady Elizabeth Buys the Magic Fish.]

She was so bewildered, so excited, so delighted that she could hardly dress herself, but she managed to do so somehow, and then ran downstairs to tell her father the good news. He was a rich man now, and could have servants, and horses and carriages and everything else that he desired!

Lady Elizabeth and the Earl gloated over the gold, and the household came and stared at it in mute wonder. More water was poured into the bath and the same thing happened as before; when touched by Lady Elizabeth’s fair fingers it turned into the precious metal.

But wonder must give way to other feelings. The Earl’s daughter began to feel hungry, very hungry in fact, for she had a good appetite and it was long past breakfast-time; she had had nothing to eat since her supper of Magic Fish the night before.

It was a nice breakfast, coffee and rolls, fresh butter and eggs, and jams and other nice things. Lady Elizabeth said her grace, sat down, poured herself out a cup of coffee and raised it to her rosy lips.

Lady Elizabeth let the cup fall with a crash, breaking it to atoms, as she sprang to her feet with a scream, while the Earl fell off his
chair in amazement. He was an elderly earl, and rather nervous, and sudden shocks upset him. But really it was enough to upset anybody, for as soon as his daughter’s lips touched the coffee it had turned into solid gold. No wonder she dropped the cup, it was so heavy.

She tried a second cup with the same result; then, with trembling fingers, she touched the loaf of bread, when it turned to gold immediately; eggs, jam, butter, even the very crumbs turned into golden nuggets, and as Lady Elizabeth found it impossible to eat gold, she went without any breakfast whatsoever.

Her father was much concerned. Magicians were sent for from all over the country, but they could do nothing but stare with wonder and help themselves to the golden eggs to pay for their travelling expenses.
[Illustration: The Poodle turns into a Golden Dog.]

The same thing happened at luncheon, at dinner, tea and supper. Lady Elizabeth was starving.

In the evening another remarkable event took place. She happened to touch the pet poodle, when it immediately became a golden dog. The Earl, at this, became more nervous than ever, and shrieked whenever his daughter came near him. The servants shunned her, too, fearful of the consequences of touching her.

Poor Elizabeth; a more unhappy girl did not go to bed that night! But she had eaten the Magic Fish and wished for gold, and her wish had been fulfilled.

The same happened the next day. Crowds of people came from far and near to see the wonder of the age, and while they wondered, Lady Elizabeth was slowly starving to death.

“Oh,” she cried, “if only I could be like an ordinary girl again. I vow I would never be discontented any more. I would do my best to be cheerful and never, never grumble again.”

As she made this vow there came a peal of thunder, and of a sudden the golden water, the golden bread, jam, butter, and even the eggs the Magicians had taken for their travelling expenses, turned back into their natural state. And to the joy of Lady Elizabeth,
her father, and the people who loved her, she once more could work, eat, and drink again. From that day to this she was never discontented, and never once longed for the gold which was hers for so short a while.

By the way, I was nearly forgetting to say that the pet poodle did not turn into a live dog again. He remained a golden one, and made an exceedingly handsome ornament for the fireplace.
THE PRINCESS AND THE FROG.

There was once a Frog. He lay in a pool near the horse-pond in the farmyard, behind the King’s Castle. To look at, he was not by any means a remarkable frog. He was neither bigger nor smaller than other frogs of his kind; neither was he greener, browner, nor more yellow. He certainly was a perfect swimmer, and his croak was perhaps just a little more musical than the croak of the other frogs, but in other respects he was exactly like them. He spent his days catching worms and flies, and dodging ducks who were always on the lookout to catch him. His was the usual frog’s life—and yet, and yet he was no ordinary frog.

There was once a Princess. She lived in the Castle beyond the pool, on the other side of the horse-pond. She was no ordinary Princess. Princesses, of course, are always beautiful; but this one was more beautiful than any. Her hair was more golden than real gold; her eyes as blue as an eastern sky;
her teeth as white as the whitest of pearls, while her smile was as sweet as an angel’s. She was as good as she was beautiful. Indeed, she was no ordinary Princess. She loved the world and everybody in it. She loved her dear old father, the King (she had no mother and brothers and sisters to love, poor Princess); she loved all the King’s subjects, from the oldest old man to the youngest new baby, and she loved all animals—yes, all animals, from the noble horses to—well, even to the frogs in the pool beyond the horse-pond, in the farmyard at the back of the Castle.

Now, the King was very rich, and so his daughter had everything she desired, and what she desired most was the means to do good to others, and to be able to care for all the maimed and injured animals in her father’s kingdom. She had comfortable stables built for the poor old horses, kennels for the poor old dogs, almshouses for the poor old men and women, and happy homes for homeless babies. The Princess was the ministering angel of the country.

In the Castle itself she had aviaries filled with beautiful birds, and aquariums full of fish and all sorts of queer animals, including even a frog with an injured foot, that the Princess herself had found in the pool in the farmyard behind her father’s Castle. This was the Frog
that was no ordinary frog, except in appearance. He lived in the Castle, and was happy; and his foot got quite well, except when he hopped he had a slight limp.

Now, everything went happily until the lovely Princess was eighteen years old, and then something fearful happened. A terrible and cruel war broke out between the King, her father, and a neighbouring Emperor, and alas! the King got the worst of it. He lost every battle from the very beginning; town after town fell into the hands of the enemy; the happy villages were burnt down; the crops and the cattle were seized, and the King and his daughter sat in the Castle with only a few soldiers to guard them, expecting every moment the arrival of the Emperor’s victorious army.

They had no money—all their treasures had been sold to pay for the horrid war. The old men and women were miserable in the almshouses; the babies cried in their homes; the horses and birds and fishes had been set free, for there was no money with which to buy them food, and there was misery over all the land. The poor Princess had no pets except one that had been left behind in the aquarium—the Frog that was no ordinary frog, and that had a limp when he hopped, and whose croak was rather more musical than the croak of other frogs.

Well, it came at last, the Emperor’s conquering army, and it swept all before it; the Castle was taken, and the King and the Princess had only just time to escape by the back door, and through the farmyard by the pool, near the horse-pond, and so on to the woods, where they hid themselves from their
enemies. The Frog was with them—yes, in a safety-matchbox, in the Princess’s pocket. It was certainly not comfortable there, but he preferred it to being left behind in a castle filled with strangers.

The next day found the King and his daughter miles away from their old home, seated hand in hand upon a bank, hungry and miserable. No one would have taken them for a King and a Princess, for he wore an ordinary felt hat, instead of a crown, and she wore nothing on her head but her own beautiful golden hair, which was more beautiful and brilliant than the finest gold.

Well, they went all that day without anything to eat but berries, and at night they slept in the woods again; and so they journeyed on, more miserable and hungry. The Frog, too, was not very happy, and having the cramp in his lame foot, kicked somewhat vigorously in his matchbox, so that the Princess heard him, and pitied him, and determined to let him go when they came to some water.

Now, they had not gone much farther before they came to a pond, and here, I think, comes the wonderful part of the story. The Princess took the Frog from the matchbox and held it for a moment in her hand, and as she did so, she burst into tears, and her tears fell upon the little creature.

“Alas!” she cried, “you are the last of my poor pets I loved so dearly.”
  • 59. Then there suddenly came a flash of light, and a noise like terrible thunder, and the King, in his fright, fell on his back, while the Princess opened her dark blue eyes in wonder. There stood before her a handsome Prince, who smiled and held out his hands to her. “The spell of a wicked fairy is broken,” he said. “The Frog you took from the pool was no ordinary frog—in reality, he was an enchanted Prince; your love for, and the tears that fell on him, have restored him to his own form again.” “Come,” he continued, “we three will go over those blue hills together, to my lovely country. And you shall be my Princess, and we will rule the land together.” And so they went away, hand in hand, the Princess between her father and the Prince, and they went over the blue hills to the most beautiful country you can imagine. And then, before long, the Princess built stables and kennels for the old horses and poor dogs, and almshouses for the old men and old women, and houses for the homeless babies; and she was never so happy as when doing good to others, and everybody loved her, for, truly, she was the ministering angel of the land.
  • 60. THE THREE SNOWFLAKES Once upon a time there were three snowflakes, and they were called Faith, Hope, and Charity. When I say three snowflakes, I don’t quite mean that, but three little girls dressed in white, and looking like snow Princesses as they trudged along across the white covered country. They were the Earl’s daughters, and, as I have just said, their names were Faith, Hope, and Charity. I wonder what the Earl would have called a fourth daughter, supposing he had had one. The three snowflakes lived at the Castle, which was on a hillside, surrounded by a beautiful park, and overlooking the valley. In the summer it was a lovely valley, with a river running through it, and beautiful green woods coming down to the edges of the water. Now the winter had come it was all white, except the river, which looked grey in the distance. In one corner of the valley lay the
  • 61. village, and in the last cottage of the village there lived a little girl called Ruth. Ruth was very poor, indeed, she was so poor that she possessed nothing. The tiny cottage she stood in had been rented by her grandmother, and now her grandmother was dead; the only relation she had left in the world had been taken from her. There was not a crumb of bread in the cupboard, not a stick with which to make a fire, not a penny in the girl’s pocket, so no wonder she stood looking out of the window with dismay in her face. The window was a little open, and through the opening came three flakes of snow. They fell upon the brick floor and melted slowly away. Ruth shuddered; it was the first snow of the year, it might mean the beginning of a long, hard, cruel winter. She shuddered again, and then of a sudden knelt on the brick floor and clasped her hands in prayer, and this showed she had Faith in her heart. And as she prayed the sun broke through the snow clouds, and poured in through the window, and shone on the girl’s brown hair. She rose with a smile on her lips and a light dancing in her eyes, for there was Hope in her breast. Ruth opened the window and took in the withered flowers on the sill.
  • 62. “Poor flowers,” she said, “you will be warmer inside.” Now this was Charity, for kindness is Charity, and we can be kind even to flowers. Then, of a sudden, there came shouts of laughter from the lane without, and the sound of merry voices; the door of the cottage flew open, and in ran the Earl’s daughters, the three snowflakes. “Oh, Ruth,” said Charity, “we have heard of your trouble, and our father has sent us to help you.” And Charity kissed Ruth on the cheek. “And you are to come and live in the lodge by the gates,” said Faith, putting her arms round the poor girl’s waist, and leading her to the door of the cottage. “And you are to be happy the whole year long,” cried Hope, clapping her hands, and turning, she led the way, skipping and laughing, up the lane. And so it happened that Ruth went and lived in the lodge of the great lord’s beautiful estate, and there she may be living, contented and happy, to this day.
  • 65. A SELECTION FROM RAPHAEL TUCK & SONS’ PUBLICATIONS.
  • 66. The Children’s Gem Library. A series of six cloth bound Story Books by the most popular Writers for Children.
    1. Effie’s Little Mother, by Rosa Nouchette Carey.
    2. Tic-tac-too, by L. T. Meade.
    3. Betsy Brian’s Needle, by M. A. Hoyer.
    4. The Seven Plaits of Nettles, by Edric Vredenburg.
    5. The Rainbow Queen, by E. Nesbit.
    6. Mildred and her Mills, by Nora Chesson.
    All the above Illustrated in colour and black and white. 64 pages. 25c. each. Complete, in a neat case, $1.50.
  • 67. Humorous Books by Louis Wain.
    Big Dogs, Little Dogs, Cats and Kittens. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards, 1.50. Bound in Cloth, bevelled, 2.00.
    Pa Cats, Ma Cats and their Kittens. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards, 1.50. Bound in Cloth, bevelled, 2.00.
    With Louis Wain to Fairyland. Described by Nora Chesson. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards, 1.50. Bound in Cloth, bevelled, 2.00.
    Louis Wain’s Cats and Dogs. Untearable linen leaves. Twenty-four full-page coloured pictures, and four black and white. Bound in Picture boards, 1.50.
    These books are in Louis Wain’s inimitable style, and will amuse both old and young alike.
  • 68. New and Amusing Books By T. E. Donnison, etc. Odds and Ends and Old Friends. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00 Old Fairy Legends in New Colours, with Verses by Nora Chesson. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00 Old Friends in New Frocks, with Verses by Nora Chesson. Untearable linen leaves. Twenty-four full-page coloured pictures, and four black and white. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00 The familiar Nursery Tales and Rhymes treated in a very clever and entirely new manner. Rhymes without Reason. Pictured and penned by E. M. and M. F. Taylor. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00
  • 69. Wallypug Tales. A novel and extremely humorous creation of G. E. Farrow, illustrated with 36 full-paged pictures in colour, by Alan Wright. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00 The Wallypug stories have brought the author into the front rank of writers for children. Proverbs Old, Newly Told, by Clifton Bingham. Thirty-six pages of coloured and black and white pictures. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00 The well-known proverbs treated in a very original and humorous fashion.
  • 70. Books by the Rev. Canon Duckworth, D.D., C.V.O., Sub-Dean of Westminster; Chaplain-in-Ordinary to the King. The Holy Land. Illustrated with forty-nine pictures in colour and black and white, from original drawings, painted in Palestine, by W. J. Webb. Coloured map. Thirty-six pages. Bound in Picture boards 1.50 Bound in Cloth, bevelled 2.00 Through the Holy Land. Thirty-two pictures in colour and black and white, by W. J. Webb. Paper 40c. Linen leaves 75c. By the late Rev. H. R. Haweis, M.A., Author of “Music and Morals,” “Arrows in the Air,” “Christ and Christianity,” etc. The Child’s Life of Jesus. Illustrated with twenty full-paged coloured and forty-three black and white pictures. One hundred pages. Bound in Picture boards 1.50 Bound in Cloth, gilt edges 2.00 Written in Mr. Haweis’s charming and forcible language, which makes the life of our Saviour readily understood by children.