Bioinformatics And Data Analysis In Microbiology Ozlem Tastan Bishop
Contributors
Lilija V. Avdeeva
Department of Antibiotics
Institute of Microbiology and Virology NAS of
Ukraine
Kiev
Ukraine
avdeeva@imv.kiev.ua
Sofia Barreira
School of Maths, Applied Maths and Statistics
College of Science
NUI Galway
Ireland
s.depereirabarreira1@nuigalway.ie
Oliver K.I. Bezuidt
Department of Biochemistry, Bioinformatics and
Computational Biology Unit
University of Pretoria
Pretoria
South Africa
bezuidt@gmail.com
Rainer Borriss
ABiTEP GmbH
Glienicker Weg 185
Berlin
Germany
rborriss@abitep.de
Wai Y. Chan
Department of Microbiology and Plant Pathology
University of Pretoria
Pretoria
South Africa
Annie.Chan@fabi.up.ac.za
Simone Coughlan
School of Maths, Applied Maths and Statistics
College of Science
NUI Galway
Ireland
s.coughlan9@nuigalway.ie
Don A. Cowan
Centre for Microbial Ecology and Genomics
Department of Genetics
University of Pretoria
Pretoria
South Africa
don.cowan@up.ac.za
Pieter De Maayer
Centre for Microbial Ecology and Genomics
Department of Genetics
University of Pretoria
Pretoria
South Africa
pieter.demaayer@fabi.up.ac.za
Rosemary A. Dorrington
Department of Biochemistry and Microbiology
Rhodes University
Grahamstown
South Africa
r.dorrington@ru.ac.za
Tim Downing
School of Maths, Applied Maths and Statistics
College of Science
NUI Galway
Ireland
tim.downing@nuigalway.ie
Anthony Fodor
Department of Bioinformatics and Genomics
UNC Charlotte
Charlotte, NC
USA
afodor@uncc.edu
Johannes B. Goll
The EMMES Corporation
N Washington St
Rockville, MD
USA
johannes.goll@gmail.com
Morag Graham
National Microbiology Laboratory
Public Health Agency of Canada
Winnipeg, MB
Canada
morag.graham@phac-aspc.gc.ca
Jade Hotchkiss
Institute of Infectious Disease and Molecular
Medicine
Faculty of Health Sciences
University of Cape Town
Cape Town
South Africa
giant.plankton@gmail.com
Meesbah Jiwaji
Department of Biochemistry and Microbiology
Rhodes University
Grahamstown
South Africa
m.jiwaji@ru.ac.za
Konstantinos Krampis
The J. Craig Venter Institute
Rockville
MD
USA
agbiotec@gmail.com
Svitlana V. Lapa
Department of Antibiotics
Institute of Microbiology and Virology NAS of
Ukraine
Kiev
Ukraine
slapa@ukr.net
Jonathan McCafferty
Department of Bioinformatics and Genomics
UNC Charlotte
Charlotte, NC
USA
jmccaff2@uncc.edu
Gwynneth F. Matcher
Department of Biochemistry and Microbiology
Rhodes University
Grahamstown
South Africa
g.matcher@ru.ac.za
Nicola J. Mulder
Institute of Infectious Disease and Molecular
Medicine
Faculty of Health Sciences
University of Cape Town
Cape Town
South Africa
nicola.mulder@uct.ac.za
Karen E. Nelson
The J. Craig Venter Institute
Rockville, MD
USA
kenelson@jcvi.org
Oleg Paliy
Boonshoft School of Medicine
Wright State University
Dayton, OH
USA
oleg.paliy@wright.edu
Oleg N. Reva
Department of Biochemistry, Bioinformatics and
Computational Biology Unit
University of Pretoria
Pretoria
South Africa
oleg.reva@up.ac.za
Larisa A. Safronova
Department of Antibiotics
Institute of Microbiology and Virology NAS of
Ukraine
Kiev
Ukraine
safronova_larisa@ukr.net
Marketa Sagova-Mareckova
Crop Research Institute
Prague
Czech Republic
sagova@vurv.cz
Cathal Seoighe
School of Maths, Applied Maths and Statistics
College of Science
NUI Galway
Ireland
cathal.seoighe@nuigalway.ie
Vijay Shankar
Boonshoft School of Medicine
Wright State University
Dayton, OH
USA
shankar.5@wright.edu
Paul Stothard
Department of Agricultural, Food and Nutritional
Science
University of Alberta
Edmonton, AB
Canada
stothard@ualberta.ca
Sebastian Szpakowski
The J. Craig Venter Institute
Rockville, MD
USA
shpakoo@gmail.com
Özlem Taştan Bishop
Research Unit in Bioinformatics (RUBi)
Department of Biochemistry and Microbiology
Rhodes University
Grahamstown
South Africa
ozlem.tastanbishop@gmail.com
Angel Valverde
Centre for Microbial Ecology and Genomics
Department of Genetics
University of Pretoria
Pretoria
South Africa
angel.valverde@up.ac.za
Gary Van Domselaar
National Microbiology Laboratory
Public Health Agency of Canada
Winnipeg, MB
Canada
gary.vandomselaar@phac-aspc.gc.ca
Current books of interest
Microarrays: Current Technology, Innovations and Applications 2014
Metagenomics of the Microbial Nitrogen Cycle: Theory, Methods and Applications 2014
Proteomics: Targeted Technology, Innovations and Applications 2014
Biofuels: From Microbes to Molecules 2014
Human Pathogenic Fungi: Molecular Biology and Pathogenic Mechanisms 2014
Applied RNAi: From Fundamental Research to Therapeutic Applications 2014
Halophiles: Genetics and Genomes 2014
Phage Therapy: Current Research and Applications 2014
The Cell Biology of Cyanobacteria 2014
Pathogenic Escherichia coli: Molecular and Cellular Microbiology 2014
Campylobacter Ecology and Evolution 2014
Burkholderia: From Genomes to Function 2014
Myxobacteria: Genomics, Cellular and Molecular Biology 2014
Next-generation Sequencing: Current Technologies and Applications 2014
Omics in Soil Science 2014
Applications of Molecular Microbiological Methods 2014
Mollicutes: Molecular Biology and Pathogenesis 2014
Genome Analysis: Current Procedures and Applications 2014
Bacterial Toxins: Genetics, Cellular Biology and Practical Applications 2013
Bacterial Membranes: Structural and Molecular Biology 2014
Cold-Adapted Microorganisms 2013
Fusarium: Genomics, Molecular and Cellular Biology 2013
Prions: Current Progress in Advanced Research 2013
RNA Editing: Current Research and Future Trends 2013
Real-Time PCR: Advanced Technologies and Applications 2013
Microbial Efflux Pumps: Current Research 2013
Cytomegaloviruses: From Molecular Pathogenesis to Intervention 2013
Oral Microbial Ecology: Current Research and New Perspectives 2013
Bionanotechnology: Biological Self-assembly and its Applications 2013
Real-Time PCR in Food Science: Current Technology and Applications 2013
Bacterial Gene Regulation and Transcriptional Networks 2013
Bioremediation of Mercury: Current Research and Industrial Applications 2013
Neurospora: Genomics and Molecular Biology 2013
Rhabdoviruses 2012
Full details at www.caister.com
Preface
We are at the door of an exciting future for
microbiology. The rapid advancement of sequencing
techniques, coupled with new bioinformatics
methodologies for large-scale data analysis,
is providing exciting opportunities for us to
understand microbial communities from a variety
of environments beyond previous imagination.
Data analysis is extremely important for a deeper
knowledge of microbes and their habitats, and
for many applications of microbiology ranging
from understanding the basis of diseases or host
pathogen interactions so as to design drugs and
develop vaccines, to many other biotechnology
applications, including barcoding, microbial bio-
remediation and bio-fuel production.
This book aims to present up-to-date and
detailed information on various aspects of bio-
informatics data analysis with applications to
microbiology. It describes a number of different
useful bioinformatics tools, highlights the links
to some wet-lab techniques, explains different
approaches to tackle a problem, talks about cur-
rent challenges and limitations, demonstrates
the applications, and discusses future trends.
A brief summary of each chapter is as follows:
Chapter 1 provides a review of microbes and
their importance in the context of ecosystems,
and an overview of the methods applied to study
microbes within ecosystems. It aims to give an
introduction to newcomers to the fields of micro-
biology and bioinformatics. Chapter 2 reviews
sequencing technologies and discusses the
challenges of assembly and ways of tackling the
problems. Chapter 3 highlights the scope of
microbial variations and explores the importance of
accurate genome assembly for structural variation
and single nucleotide polymorphism analysis.
Chapter 4 deals with genome annotation, and
reviews computational methods that have been
developed for annotation of bacterial and archaeal
genomes. Chapter 5 explores the methods for
comparative analysis of microbial genomes, and
focuses on the analysis of Mycobacteria as a case
study. Microbial community profiling is the topic
of Chapter 6, in which fingerprinting techniques
and construction of phylogenetic trees are interro-
gated. Metagenomics, one of the fastest advancing
fields of microbiology, is discussed in Chapter 7
with examples of the Human Microbiome Project
and a Baltic Sea study. Chapter 8 further inter-
rogates human microbiome analysis and presents
the pros and cons of using 16S rRNA based
sequencing studies. Chapter 9 takes us to another
interesting topic, phylogenetic microarrays, in
which 16S rRNA sequences are used to determine
the composition of microbial communities, yet
from the microarray aspect. The last chapter is
about genetic barcoding with its applications to
microbiology and biotechnology. Overall, this
is an essential book for researchers, lecturers
and students involved in microbiology and/or
different aspects of bioinformatics including next-
generation sequencing, sequence data analysis,
comparative genome analysis, metagenomics and
more.
The book has been peer reviewed. Each chap-
ter of the book has been reviewed by at least one
reviewer and by the editor. The reviewing process
was undertaken by other authors contributing
to the book, and I believe the constructive com-
ments of the reviewers significantly improved
the quality of the book. I thank the authors of
the book chapters for their participation in this
process. My special thanks go to Tim Downing
who was always available for the reviewing pro-
cess, and always responded exceptionally fast. I
am especially grateful to Gary Van Domselaar,
Morag Graham and Paul Stothard who agreed to
write an extra chapter in a very short time, when
another author dropped out. I also would like to
thank Fourie Joubert for his initial contribution
in discussing the chapter topics and suggesting
some potential authors. My deepest thanks go to
Sabanci University, Istanbul, Turkey, for hosting
me and providing a friendly working environment
during my sabbatical which made it possible to
finalize the book. Last but not least, I would like
to thank my husband, Nigel T. Bishop, for his con-
tinuous support in my first time as an editor.
Özlem Taştan Bishop
1
Understanding the Unseen Majority
Around Us: An Overview of [...]
Meesbah Jiwaji, Gwynneth F. Matcher and Rosemary A. Dorrington
Abstract
Of all the living organisms on the planet, microor-
ganisms are the most numerically abundant and
diverse in nature. Despite their ubiquity, research-
ers have only begun to understand the diversity
profiles, metabolic functioning and potential
economic value of these organisms.
Classical investigation of microorganisms
involves the culture and study of selected
microbes in the laboratory setting. While this
approach has yielded much information, there
are two major drawbacks. Firstly, most microbes
present in the environment are unculturable using
currently available media/methodologies and,
secondly, such studies focus on one, often attenuated,
isolate/species and/or a set of genes at a time.
To overcome these problems, researchers have
focussed on the development of new technolo-
gies that yield large, reliable and robust datasets
in fields that include genomics, transcriptomics
and proteomics. Importantly, the development
of high-throughput sequencing technologies has
dramatically advanced the analysis of microbial
species diversity and their functioning within
ecosystems. The large volumes of information-
rich data require intelligent, and often repetitive,
computational analysis, stressing the need for
development of suitable bioinformatics analysis
tools. This chapter provides an overview of
microbes and why we need to understand them,
as well as the methods applied
to studying microbiota within ecosystems.
Introduction
The field of microbiology is undergoing a renais-
sance. The discipline has changed dramatically
since Antonie van Leeuwenhoek, a pioneer of
microscopy, watched bacteria that
he recovered from his own teeth through his
homemade microscope (van Leeuwenhoek,
1677). Traditional microbiology has focussed
on the study of individual organisms, but now,
while microorganisms and their processes still
need to be understood at the level of the indi-
vidual, increasingly, the aim is to understand the
cumulative influence of microorganisms on the
functioning of biological systems. As a consequence,
microbiology has progressed from the
study of individual genomes to that of microbial
populations in ecosystems. Scientific observations
at the subcellular level are now being interpreted
at rapidly increasing levels of complexity in order
to gain a better understanding of the complex
metabolic pathways within cells as well as their
interactions in the context of the ecosystem. In
order to gain understanding of system organiza-
tion on such a large scale, new methodologies
are constantly being developed, most of which
generate extremely large sets of raw data which
need to be curated and analysed using the tools of
bioinformatics.
What is a microorganism?
The term ‘microorganism’ is used to describe any
small organism, typically an entity that has a mass
of less than 10⁻⁵ g and a length of less than 500 μm,
with a largest dimension of 100–150 μm (Hughes
Martiny et al., 2006; Karl, 2007). It is important to
note that currently organisms are assigned to the
‘micro’ class, based only on their size. This ignores
any differences in their evolutionary histories and
metabolic capabilities. Given time, and with the
increasing awareness of the diversity of microor-
ganisms, this method of classification may need to
be revisited.
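The size-only definition above (a mass below 10^-5 g and a length below 500 μm) can be written as a simple predicate. The following is a minimal sketch, not a method from the chapter; the class name, function name and example organisms with their masses and lengths are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

MAX_MASS_G = 1e-5       # mass threshold quoted in the text (grams)
MAX_LENGTH_UM = 500.0   # length threshold quoted in the text (micrometres)


@dataclass
class Organism:
    name: str
    mass_g: float
    length_um: float


def is_microorganism(org: Organism) -> bool:
    """Size-only test: both mass and length must fall below the thresholds."""
    return org.mass_g < MAX_MASS_G and org.length_um < MAX_LENGTH_UM


# Hypothetical organisms; the values are illustrative, not measurements.
examples = [
    Organism("hypothetical bacterium", mass_g=1e-12, length_um=2.0),
    Organism("hypothetical small invertebrate", mass_g=1e-4, length_um=1000.0),
]
for org in examples:
    print(org.name, is_microorganism(org))
```

Note that the predicate uses nothing but size, which is exactly the limitation the text points out: evolutionary history and metabolic capability play no part in the decision.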
The classification of organisms involves placing
them within collections/groups of related species.
Traditionally, microorganisms were classified
within the Domains Prokarya or Eukarya based
on their morphology, the environment they were
isolated from, the means by which they generated
energy, their nutrient requirements and their
mode of replication (Woese et al., 1990; Karl,
2007; Schleifer, 2009). Since the advent of molecular
biology and the collection of sequence data,
microorganisms are being classified into three
Domains: the Bacteria, Archaea and Eukarya (Fig.
1.1) (Woese et al., 1990). Bacteria and Archaea are
grouped together as prokaryotes based on several
cellular characteristics. These include the absence
of a membrane surrounding the genome, the lack
of introns in the encoded genes, the absence of
intracellular organelles (e.g. chloroplasts, mito-
chondria, etc.) and ribosomal subunits of 30S
and 50S (eukaryotic subunits are larger at 60S and
40S). While Archaea and Bacteria do share several
similarities, significant differences do exist. For
example, bacterial cell walls contain peptidogly-
can whilst archaeal cell walls do not and bacterial
cell membranes contain ester links whilst those of
Archaea have ether links. While Archaea and Bacteria
are both classified as prokaryotes, Archaea
do in fact share a number of morphological
characteristics and DNA sequence similarities with
Eukaryotes, prompting the speculation
that Archaea and Eukarya diverged from Bacteria
before they diverged from one another (Fig. 1.1)
(Zillig, 1991; Schleifer, 2009).
Microbes in the world around
us – the unseen majority
Microorganisms are very old; they are thought to
have inhabited the Earth for more than 3.5 billion
years as evidenced by their presence in fossils, for
example the Burgess Shale in Canada (Schopf,
2001). It is interesting that through their evolu-
tionary history, microorganisms have remained
simple and small and that their classification is
often expressed in terms of their physiology and
metabolism rather than their morphologies.
Microorganisms are integral components of all
ecosystems and there is no environment stud-
ied to date in which microorganisms have not
been found. They occur in extreme environments,
where temperatures exceed 100°C or fall below
0°C, where pH ranges from below 2 to over 10, or
where pressures exceed 100 MPa, and in every conceivable
environment in between (Ulukanli and Digrak,
2002; Baker-Austin and Dopson, 2007; Moyer
and Morita, 2007; Fang et al., 2010; Synowiecki,
2010). From the polar regions to the deserts, from
the shallow to the deep oceans, from exotic
environments to one’s own backyard, there exists an
unexpected abundance and diversity of
microorganisms (Pace, 1997). In terms of abundance, for
example, one gram of soil may harbour up to 10
billion microorganisms, potentially representing
thousands of different species (Roselló-Mora
and Amann, 2001). It is now well recognized that
microorganisms are both taxonomically diverse
and metabolically complex. In fact, microorganisms
dominate their environments in terms of
biomass, diversity and metabolic activity and are
thus truly the ‘unseen majority’ (DeLong and
Karl, 2005).
[Figure 1.1: caption not recoverable from the source; the text cites this figure for the three Domains, Bacteria, Archaea and Eukarya.]
Of particular interest is the role of the microor-
ganism in ecosystem biology. Ecosystems consist
of interconnected systems whose components
interact over a broad range of physical and bio-
logical states (DeLong, 2007). While several
physico-chemical parameters may be similar
between different ecosystems, the microbial com-
munities within these ecosystems differ. Thus,
while microorganisms are ubiquitous, their distri-
bution is not uniform and their diversity profiles
differ significantly between ecological niches. A
strong body of evidence shows that the individual
environment selects, and is partly responsible for,
the spatial variation in the diversity and popula-
tion structure of microorganisms (Hughes
Martiny et al., 2006). In addition to the effect of
a wide range of physico-chemical conditions on
microbial diversity profiles, factors such as
interspecies competition and predation also play a role
in affecting relative abundances and distributions
of microbial species (DeLong, 2007; Hibbing et
al., 2010).
Microorganisms form a considerable propor-
tion of the total biomass in all ecosystems and
so are supremely important in the functioning
of global ecosystem processes. For example,
in marine ecosystems, it is estimated that the
microbes are responsible for up to 98% of the
primary productivity and the mediation of all
biogeochemical processes (Sogin et al., 2006).
Microorganisms harvest and convert solar energy,
catalyse important biogeochemical transformations
of free nutrients and trace elements to sustain
ecosystems, form crucial links in the carbon
and nitrogen cycles, and both produce and
consume greenhouse gases including carbon
dioxide (CO₂), nitrous oxide (N₂O) and methane
(CH₄) (McGrady-Steed et al., 1997; Naeem and
Li, 1997; van der Heijden et al., 1998; Cavigelli
and Robertson, 2000; Horz et al., 2004; Bell et
al., 2005). In addition, cellular microorganisms
are pivotal in driving the sulfur and phosphorus
cycles and in producing secondary metabolites,
including vitamins, co-factors and bioactive
compounds such as antibiotics, that confer
selective advantages in the highly competitive
environmental niches (Fagerbakke et al., 1996;
Demain, 2007).
In the past, the majority of microbial genetic
data generated has been defined by the researcher’s
individual areas of interest and has often focused
on individual species of microorganisms, for
example the study of Acidithiobacillus ferrooxidans
and its application to bioremediation (Umrania,
2006). More recently, there has been a shift in the
approach to studies and there has been increasing
interest in the study of environmental ecosystems
as a whole. As examples, Desai et al. (2010), Xu et
al. (2010) and Petrić et al. (2011) examined the
microbial communities in xenobiotic-contami-
nated soils during bioremediation.
Microorganisms and climate
change
The study of climate change includes green-
house-gas-induced warming, and in the case of
water bodies, water acidification (Doney, 2006).
Both of these are destructive processes that
are likely to alter macro-faunal and -floral
community structure and dynamics and to drive
the extinction of species. To date, the effect of
climate change on microbial species distribution
and abundances has not been shown. However,
considering the crucial contribution of microbial
metabolism to ecosystem processes, a shift in
microbial population dynamics is likely to have
severe implications.
Microorganisms both consume and produce
major greenhouse gases, thus they have central
roles both as effectors and monitors of global
climate change (DeLong, 2007). The effect of
climate change on microorganisms is easiest to
monitor in Antarctica, the coldest, windiest,
driest and most isolated continent on Earth.
Due to the environment, Antarctic food webs are
relatively simple, and the absence of insect and
mammalian herbivores means that most energy
and biomass is channelled into a detritus trophic
pathway that is dominated by microorganisms
(Davis, 1981). Thus, soil microorganisms have a
disproportionate importance in nutrient cycling
and other ecosystem processes in the ice-free ter-
restrial Antarctic ecosystems. The simplicity of the
Antarctic ecosystem makes it particularly vulner-
able to environmental perturbations like global
warming and the Antarctic Peninsula is among the
most rapidly warming regions on the planet. This
vulnerability has been demonstrated by significant
increases in the abundance of fungi and bacteria,
and particularly in the Alphaproteobacteria-to-Acidobacteria
ratio, which is indicative of higher soil nutrient
availability (Thomson et al., 2010). The observed
shifts are consistent with increased turnover of
carbon and nitrogen in the soil upon warming of
the environment (Thomson et al., 2010; Yergeau
et al., 2012). In the oceans, phytoplankton can
also be used as monitors of climate change. These
eukaryotic microorganisms influence the biological
pump, which cycles carbon, removing it from the
upper ocean and transporting it to the deep sea.
Falkowski and Oliver (2007) used the behaviour
of phytoplankton to predict the potential effects
of climate change on the microorganism commu-
nity structure. As these microorganisms are major
participants in oceanic primary production, any
changes to the phytoplankton community have
knock-on effects and implications for fishery pro-
ductivity as well as global ocean ecology.
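The Alphaproteobacteria-to-Acidobacteria ratio mentioned above is a simple quotient of two taxon abundances. A minimal sketch of how such a ratio might be computed from per-taxon read counts follows; the function name, the two sample names and all counts are hypothetical and are not data from Thomson et al. (2010) or Yergeau et al. (2012).

```python
def taxon_ratio(counts: dict[str, int], numerator: str, denominator: str) -> float:
    """Return the ratio of two taxon counts; fail if the denominator is absent or zero."""
    denom = counts.get(denominator, 0)
    if denom == 0:
        raise ValueError(f"no reads assigned to {denominator}")
    return counts.get(numerator, 0) / denom


# Hypothetical read counts per taxon for a control and a warmed soil plot.
control = {"Alphaproteobacteria": 120, "Acidobacteria": 300, "Other": 580}
warmed = {"Alphaproteobacteria": 240, "Acidobacteria": 200, "Other": 560}

for label, sample in (("control", control), ("warmed", warmed)):
    ratio = taxon_ratio(sample, "Alphaproteobacteria", "Acidobacteria")
    print(f"{label}: Alphaproteobacteria/Acidobacteria = {ratio:.2f}")
```

In these made-up counts the ratio rises from 0.40 to 1.20, the direction of change that the text associates with higher soil nutrient availability.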
The impact of microorganisms
in industry
From a health and economic point of view,
microorganisms are of great interest and they
may impact our lives both positively and nega-
tively. For example, microbes may be pathogenic
and can cause serious diseases (Karlen, 1995).
Many microbial species are responsible for the
degradation of food produce and can result in
significant economic losses. On the positive side,
for example, microorganisms are indispensable
for their symbiotic roles in the human digestive
tract (Eckburg et al., 2005). In addition, they can
be harnessed for the production of fuels, chemical
compounds, animal feeds, human food, antibiot-
ics and pharmaceuticals (Zhu et al., 2012).
Microorganisms also represent a vast and
dynamic reservoir of genetic variability, and
harnessing this genetic diversity has allowed
scientists to unlock their economic potential.
Some of the important metabolic processes in
which microorganisms are involved
include:
Nitrogen fixation
Nitrogen fixation, the assimilation of atmospheric
nitrogen (N₂) into ammonia, is an economically
and environmentally important natural microbial
function, which yields hundreds of tonnes of bio-
logically available nitrogen and is worth billions
of dollars to global agriculture (Herridge et al.,
2008).
Bioremediation
Bioremediation, the use of microorganisms to
sequester or remove pollutants, is increasing both
as a concept and as an economically viable
application. Remediating known hazardous waste
sites by non-biological processes is estimated to
cost ten-fold more than bioremediation, which
could be completed in the same time frame.
This is a developing area and has the potential of
becoming economically important. For example,
hydrocarbon cold seeps and ecologically devas-
tating oil spills have led to the development of
marine microorganisms that will utilize alkanes as
carbon sources. These bacteria and their enzymes
have a potential role in bioremediation and oil
processing (Xu et al., 2008; Wasmund et al., 2009;
Augustinovic et al., 2012).
Bioprospecting
Bioprospecting, defined as the discovery and
subsequent commercialization of useful products
from environmental isolates, has been central in
the search for novel pharmaceuticals and com-
pounds of industrial importance (Dionisi et al.,
2012). More than 75% of all antibacterial com-
pounds and approximately 50% of all anticancer
compounds that are in use clinically are either
natural products or derivatives thereof (Newman
and Cragg, 2007). The percentage of biologically
active natural products is much higher than that
of synthetic compounds primarily due to the fact
that natural products have evolved bioactivity in a
biological context (Firn and Jones, 2003).
Since Alexander Fleming’s observation of the
antibacterial activity of penicillin in 1928, and its
subsequent application to treating bacterial infec-
tions, a large number of microbe-derived natural
products have been used in the pharmaceutical
sector for a diverse range of medical applications.
These include biologically active compounds such as
the antibiotics cephalosporins, tetracyclines, ami-
noglycosides, rifamycins and chloramphenicol,
the mycotoxin asperlicin, the immunosuppres-
sant cyclosporine and the cholesterol-lowering
agent lovastatin (Pan et al., 2010). In addition to
the production of pharmaceutically important
metabolites, microorganisms also represent
a rich source of enzymes with applications as
biocatalysts in the production of a wide range of
industrially useful compounds. While the pro-
duction of these compounds in most instances
can be achieved via chemical processes, bio-
catalysts are an attractive alternative due to the
lower temperatures and neutral pH at which
they function, as well as the production of fewer
toxic by-products (Azerad, 1995; Koeller and
Wong, 2001). Furthermore, biocatalysts often
exhibit specific regioselectivity, thereby removing
the need for substrate functional-group protection,
which in chemical synthesis adds several steps to
the pathway to block and unblock substituents
(Schmid et al., 2001).
Research into interesting enzymes with
potential industrial applications, for example
psychrophilic enzymes, is incredibly competitive
and lucrative (Struvay and Feller, 2012; Feller,
2013). The range of enzymes with unique abilities
includes lipases (de Pascale et al., 2008; Jeon et
al., 2009), cellulases (Ekborg et al., 2007; Shan-
mughapriya et al., 2009), chitinases (Cottrell et
al., 1999; Hobel et al., 2005), proteases, alkane
hydroxylases (Xu et al., 2008; Wasmund et
al., 2009), esterases with high tolerances for salt,
organic solvents, cold and high pressure (Aurilia et
al., 2008; Chu et al., 2008), and metalloproteases
with high temperature optima (Lee et al., 2007).
Lipases have wide-ranging applications in the
food, detergent, and pharmaceutical industries
(de Pascale et al., 2008; Jeon et al., 2009). Cel-
lulases have been the subject of intense study
because of their role in the generation of biofuels
from renewable cellulosic substrates (Ohkuma,
2003; Ekborg et al., 2007; Shanmughapriya et
al., 2009). Esterases have great commercial value
because of their use in industrial biotransforma-
tions (Aurilia et al., 2008; Chu et al., 2008).
Some of these enzymes have been isolated
using sequence-based approaches by screening for
genes with known bioactivity. These methods rely
on similarity to known protein families. Others
have been isolated using functional screens for
enzyme activity (Ferrer et al., 2005). The bio-
technological and industrial microbiological
potential for new natural products and processes
arising from mining microbial diversity is set to
increase in the coming years, especially with the
application of newly developed techniques, and
in particular high-throughput robotic screening
technology (Bornscheuer et al., 2012).
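The sequence-based screening described above relies on similarity to known protein families. As a toy illustration only, the sketch below scans hypothetical protein sequences for the G-x-S-x-G pentapeptide, a motif widely reported around the catalytic serine of many lipases and esterases. This is an assumed stand-in for a real similarity search, which would typically use profile- or alignment-based tools; the sequence names and sequences are invented.

```python
import re

# G-x-S-x-G, where x is any residue. A lookahead is used so that
# overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=G.S.G)")


def screen(sequences: dict[str, str]) -> dict[str, list[int]]:
    """Map each sequence name to the 0-based start positions of motif hits."""
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() for m in MOTIF.finditer(seq.upper())]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical translated open reading frames from a metagenomic library.
candidates = {
    "orf_001": "MKTLLAVGHSMGGALAT",  # contains GHSMG
    "orf_002": "MSTNPKPQRKTKRNT",    # no motif
}
print(screen(candidates))
```

A short motif like this matches many unrelated proteins by chance, so a hit is at most a reason to examine a candidate further, for example with the functional screens for enzyme activity that the text also mentions.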
Assessing microbial
populations
Traditionally, microorganisms have been studied
by staining to visualize the organisms, followed by
microscopy and image analysis. At
first, stains were used to quantify the abundance
of microorganisms and their biomass (Francisco
et al., 1973; Porter and Feig, 1980). Visualization
of stained cells was then enhanced by powerful
image analysis (Psenner, 1990). More recently,
stains have been developed to visualize microor-
ganisms like viruses (Ortmann and Suttle, 2009)
and to quantify aspects of microbial metabolism
(including respiration and enzyme degradation).
In addition, radioisotopes (Simon and Azam,
1989) and fluorescent stains (Cotner et al., 2001)
have been applied to the measurement of bacterial
growth rates.
21. Jiwaji et al.
6 |
In addition to direct visualization of micro-
bial populations, microorganisms have been
grown under laboratory conditions, often as
pure cultures, and been used for sequence or
function-based biological studies. Advances in
environmental chemistry (including stable iso-
topes, ultrafiltration, measurement of dissolved
organic carbon (DOC) and dissolved inorganic
carbon (DIC), high performance liquid chro-
matography (HPLC) and mass spectrometry
(MS)) have also enhanced our ability to study
microorganisms and their interaction with their
environment (Cotner and Biddanda, 2002; Han-
delsman, 2004).
Up until the 1980s, identification of micro-
organisms was achieved by culturing individual
isolates in the laboratory, viewing the organisms
microscopically and subjecting these cultures to
a variety of biochemical tests. While much valu-
able data has been generated by this approach,
there are several major limitations. Firstly, rep-
lication of the complex environment in which
microorganisms exist in a laboratory setting is
often extremely difficult and it is estimated that
less than 0.1% of microorganisms are currently
culturable (Stres, 2007). Furthermore, these
approaches are time-consuming, labour-intensive
and often subjective. Added to this, microorgan-
isms that are abundant and/or those that can be
cultured under some environmental conditions
may change into dormant or possibly uncultur-
able forms under other conditions (Hattori et al.,
1997). The severity of the problem of relying on
culture-dependent studies has been highlighted
by cultivation-independent surveys where a
major discrepancy can be observed between
viable plate count technology and direct methods
such as epifluorescence microscopic counts and
SSU rRNA phylogenetic analysis. For example, the
observation that marine bacterioplankton are
infected by huge viral numbers (tens of billions
of phage per litre) and that this phage predation
is remarkably specific was undetected primarily
because of a lack of representative pure cultures
(DeLong, 2007).
Viruses, as technically ‘non-living’ entities,
cannot be studied with the same methods used to
study other microorganisms. Classical methods
used for the identification of viruses include in
vitro viral amplification followed by cell culture
followed by visual observation of cytopathic
effects. Virus discovery can also be based on
specific nucleic acid hybridization and antigenic
cross-reactivity, for example on DNA or antibody
microarrays. This can be followed by visualiza-
tion using electron microscopy or by testing for
immunological cross-reactivity using panels of
sera, which is a powerful and rapid method for the
identification of the unknown viral agents. Tenta-
tive identification of the virus allows the use of
more specific molecular approaches like the use of
PCR with primers that target the likely viral group
for definitive genetic characterization (Delwart,
2007; Mokili et al., 2012).
The application of molecular methods bypasses
the drawbacks presented by culture-dependent
methodologies and has led to an improved and
deeper understanding of the microbial compo-
nents of ecosystems. These new approaches have
made apparent our lack of knowledge about envi-
ronmental biodiversity. It is estimated that there
are ca. 1.5 million taxa that have been described
at the species level (de Meeus and Renaud, 2002)
yet this represents only a small proportion of the
estimated diversity. Currently, there is also a large
discrepancy between the microorganism diversity
that we detect in biological samples and what is
actually present. Molecular ecology studies sug-
gest that ca. 1–5% of microbial species have been
isolated (Floyd et al., 2005; Hughes Martiny et al.,
2006). And mycologists estimate that there are 1.5
million species of fungi despite the fact that only
72,000 species have been isolated or described
(Hawksworth, 1997).
Molecular approaches to investigating micro-
bial processes can focus either on individual
isolates or on the microbial population as a whole
within an ecosystem. Either way, a key aim in
the molecular biological study of microorgan-
isms is to determine the genetic information
present within the cell followed by correlation
of the encoding genetic material to the complex
biological processes which occur within the cell.
Bioinformatics analyses are a crucial require-
ment, not only in curating and analysing large
scale sequence data but also in bridging the gap
between genetic code and the encoded function-
ality of genes.
22. Microbes: An Unseen Majority Around Us | 7
DNA sequence determination
Up until recently, sequence analysis of DNA
has been achieved by Sanger sequencing. In this
method, DNA polymerase-dependent synthesis
of a complementary DNA strand occurs in the
presence of both natural 2′-deoxynucleotides
(dNTPs) and 2′,3′-dideoxynucleotides (ddNTPs),
which function as irreversible terminators of syn-
thesis (Sanger et al., 1977). The DNA synthesis
reaction is terminated at random whenever a
ddNTP is added to a growing oligonucleotide
chain. This results in products of varying lengths
with an appropriate ddNTP at the 3′ terminus.
These truncated products are separated based on
size, using capillary electrophoresis, and the terminal
ddNTPs are read to reveal the DNA sequence
of the DNA template strand. This technology has
been in use since 1977 and is well established.
However, in addition to being lengthy and labour
intensive, this methodology suffers from the limi-
tation that only one sequence can be generated
per capillary which in turn impacts on the amount
of genetic information that can be generated.
Furthermore, amplification of DNA fragments
is required before they can be sequenced. This
amplification can be achieved either in vivo, by
clonal amplification, or in vitro by PCR, and con-
sequently is susceptible to both host-related and
PCR biases (Hall, 2007).
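The chain-termination principle described above can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not a description of any instrument's software: the function names and the example template are hypothetical, and instead of terminating at random it simply enumerates every possible terminated product.

```python
# Toy model of Sanger chain termination (illustrative; all names are hypothetical).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def synthesized_strand(template):
    """Strand built by the polymerase, written 5'->3' (reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def terminated_fragments(template):
    """Every possible chain-terminated product: each prefix of the new strand,
    ending in the ddNTP that stopped synthesis at that position."""
    strand = synthesized_strand(template)
    return [strand[:i] for i in range(1, len(strand) + 1)]

def read_from_fragments(fragments):
    """Order fragments by size (the role of electrophoresis) and read each terminal base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

template = "ATGGCGTAC"                      # assumed example template
fragments = terminated_fragments(template)
read = read_from_fragments(fragments)       # equals synthesized_strand(template)
```

Reading the terminal base of each fragment in order of increasing length recovers the newly synthesized strand, from which the template sequence follows by complementarity.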
To deal with the shortcomings of Sanger
sequencing, faster, cheaper, and simpler methods
for sequencing that bypass the cloning bias and
time- and labour-intensive nature of the Sanger
method have been developed. These new tech-
nologies differ in their approach to generating
sequence data, the average read length generated,
and the error rate distribution (Shendure and Ji,
2008); thus the appropriate sequencing method
needs to be selected based on the application.
A brief description of three of these sequencing
technologies, namely pyrosequencing, reversible
terminator sequencing chemistry and ligation
based sequencing chemistry will be covered in
this chapter. Detailed information on sequencing
techniques can be found in Chapter 2.
Pyrosequencing is a sequencing-by-synthesis
technique whereby enzyme driven biochemical
reactions and chemiluminescence are used to
generate sequence data. In this method, the DNA
template is sheared and the resultant DNA frag-
ments immobilized on beads at a ratio such that
a single DNA molecule is immobilized per bead.
Individual DNA fragment-carrying beads are then
captured into separate emulsion droplets and these
droplets act as individual amplification chambers
which are capable of producing 10^7 clonal copies
of a unique DNA template per bead (Margulies et
al., 2005). Each template-containing bead is then
transferred into a well of a picotitre plate along
with smaller packing beads containing the immo-
bilized enzymes ATP sulfurylase and luciferase.
The use of the picotitre plate allows hundreds of
thousands of pyrosequencing reactions to be car-
ried out in parallel, thus increasing the sequencing
throughput (Margulies et al., 2005). Once the
clonally amplified bead-immobilized DNA frag-
ments have been deposited into the picotitre plate
wells, the picotitre plate is flooded with dNTP
solutions in a sequential manner. As a result of
the action of the polymerase, inorganic pyrophosphate
(PPi) is released whenever a complementary
nucleotide is incorporated into the growing DNA
strand. This PPi is converted to ATP by the sul-
furylase enzyme and the luciferase in turn utilizes
the ATP to convert luciferin to oxyluciferin with
light as a by-product. The number of bases incor-
porated into the extending DNA strand is directly
proportional to the amount of pyrophosphate
released which in turn is reflected in the amount
of light generated. This chemiluminescent signal
is detected by an extremely sensitive camera and
interpreted as sequence data (Margulies et al.,
2005; Rothberg and Leamon, 2008). Pyrose-
quencing is valuable for sequencing regions of
DNA that are technically difficult due to strong
secondary structure or high GC content as well as
for sequencing regions that are resistant to cloning
in Escherichia coli (Goldberg et al., 2006). One of
the drawbacks of pyrosequencing is the reduced
reading accuracy over homopolymeric stretches
of identical nucleotides (Ansorge, 2009).
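The relationship between sequential dNTP flows and light signal can be sketched in Python as an idealized "flowgram". This is a minimal illustration, assuming noise-free signals and a T, A, C, G flow order; the function names and the example read are hypothetical and do not reproduce any vendor's base-calling software.

```python
# Idealized pyrosequencing flowgram (illustrative; names and flow order are assumptions).
def flowgram(read, flow_order="TACG", max_flows=400):
    """Return a list of (nucleotide, signal) pairs, one per dNTP flow.
    'read' is the strand being synthesized (5'->3'); the signal is the number of
    bases incorporated in that flow, i.e. the length of the homopolymer run."""
    flows = []
    position = 0
    flow_index = 0
    while position < len(read) and flow_index < max_flows:
        nucleotide = flow_order[flow_index % len(flow_order)]
        run_length = 0
        while position < len(read) and read[position] == nucleotide:
            run_length += 1
            position += 1
        flows.append((nucleotide, run_length))
        flow_index += 1
    return flows

def call_bases(flows):
    """Convert ideal integer signals back into a base sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)

flows = flowgram("TTAGGGC")
# flows == [("T", 2), ("A", 1), ("C", 0), ("G", 3), ("T", 0), ("A", 0), ("C", 1)]
```

In a real run the signal is a noisy light intensity rather than an integer, so distinguishing, say, seven from eight identical bases becomes unreliable; this is one way to picture the homopolymer limitation noted above.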
The 454 GS FLX+ Platform, which utilizes
pyrosequencing chemistry and is currently marketed
by Roche Applied Science, is capable of
generating 700 Mb of sequence with a consensus
accuracy of 99.997% and a read length of up to
1000 bp (http://454.com/products/gs-flx-system/index.asp,
April 2013). The data generated
on the 454 platform correlates well with other well
established gene expression profiling technologies
including microarrays (Torres et al., 2008).
An alternative to pyrosequencing is the reversi-
ble terminator chemistry (Bennett, 2004; Bennett
et al., 2005; Bentley, 2006) which involves what is
termed solid-phase bridge amplification of single-molecule
DNA templates. Here, one end of
a single DNA molecule hybridizes to a complementary
adaptor which is immobilized on a solid
surface, known as a flowcell. With the adaptor
functioning as a primer, the polymerase extends
the DNA fragment thereby generating a copy of
the original template. This copy is attached to
the flowcell via the adaptor/primer which is now
incorporated into the DNA strand. Following
denaturation, the flowcell is rinsed to remove the
original template leaving the immobilized DNA
copy behind. This DNA fragment then bends over
and the free terminal end hybridizes to a nearby
complementary adaptor forming a DNA ‘bridge’.
Once again, the polymerase extends the adaptor/
primer generating a new copy of the DNA which
is also attached to the flowcell. This process is
repeated until a cluster of ~1000 clonal copies of
a single template molecule is formed. After amplification,
a flowcell with approximately 40 million
clusters of DNA amplicons is then subjected to a
DNA sequencing-by-synthesis methodology that
uses a custom DNA polymerase that can incor-
porate specialized reversible terminators with
removable fluorescent moieties into growing oli-
gonucleotide chains. The terminators are labelled
with fluorophores of four different wavelengths
to distinguish between the different nucleotide
bases and the sequence of the template in each
cluster is determined by detecting the wavelength
generated at each successive nucleotide addition
step. While this methodology is more effective
at sequencing homopolymeric stretches than
pyrosequencing, the sequence reads are shorter
(Bennett et al., 2005; Bentley, 2006; Ansorge,
2009). Also, the custom DNA polymerases and
the use of the reversible terminators do result in a
higher number of substitution errors (Hutchison,
2007). The HiSeq 2500/1500 platform, which
utilizes reversible terminator chemistry, is cur-
rently marketed by Illumina. Depending on the
run mode (i.e. Rapid-Run Mode or High-Output
Run Mode), the HiSeq generates between 95 and
600 Gb of sequence with read lengths of between
35 and 150 bp (http://www.illumina.com/systems/hiseq_2500_1500.ilmn,
April 2013).
The third approach to high-throughput
sequencing utilizes ligation based chemistry. This
approach differs from other next generation tech-
nologies in that it is based on hybridization and
ligation of fluorescently labelled oligonucleotides
rather than polymerase-dependent incorporation
of nucleotides. For this approach, an emulsion
PCR single-molecule amplification step similar to
the one used in the pyrosequencing technique is
conducted. The amplification products are then
transferred onto a glass surface and sequence
analysis occurs via sequential rounds of hybridi-
zation and ligation with 16 oligonucleotide
octamers labelled with four different fluorescent
dyes. After addition, ligation, and fluorescent
detection of the complementary octamer, the last
two nucleotides with the fluorescent dye attached
are cleaved off and the next octamer flowed over
the immobilized template DNA. Once the entire
length of the DNA fragment has been covered, it
is stripped and the ligation of the octamers begins
anew from one nucleotide back (n–1) from where
the previous hybridizations occurred. As a result,
each position on the template is effectively probed
twice, and the identity of the nucleotide is deter-
mined by analysing the colour that results from
two successive ligation reactions. Most impor-
tantly, this two-base encoding scheme allows the
differentiation between a sequencing error and a
sequence polymorphism because an error would
be detected in one particular ligation reaction
but a polymorphism would be detected in both
reactions. This methodology is carried out on the
supported oligonucleotide ligation and detection
system (SOLiD) supplied by Applied Biosystems.
The SOLiD 4™ instrument is capable of generating
80–100 Gb of sequence data in 35–50 bp
reads per 8- to 13-day sequencing run (http://
www.appliedbiosystems.com/absite/us/en/
home/applications-technologies/solid-next-gen-
eration-sequencing/next-generation-systems/
solid-4-system.html, April 2013).
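The two-base encoding idea can be made concrete with a small Python sketch. It assumes the commonly described colour-space scheme in which each of four colours represents a set of adjacent base pairs (modelled here as the XOR of two-bit base codes); the function names and example sequences are hypothetical, and the sketch ignores the octamer chemistry itself.

```python
# Two-base ("colour space") encoding sketch; names and examples are illustrative assumptions.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"

def encode(sequence):
    """One colour (0-3) per pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(sequence, sequence[1:])]

def decode(first_base, colours):
    """Rebuild the base sequence from a known first base and the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ colour])
    return "".join(bases)

def colour_differences(reference, observed_colours):
    """Positions at which the observed colours differ from the reference colours."""
    return [i for i, (ref, obs) in enumerate(zip(encode(reference), observed_colours))
            if ref != obs]

reference = "ATGGCA"
snp_positions = colour_differences(reference, encode("ATGTCA"))   # true polymorphism
error_positions = colour_differences(reference, [3, 1, 0, 0, 1])  # one miscalled colour
```

Because every base contributes to two neighbouring colours, a genuine single-base polymorphism changes two adjacent colours, whereas an isolated colour change is more plausibly a measurement error.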
The availability of these high-throughput
sequencing technologies has democratized
genomics by substantially reducing the cost of
the technology. In addition, they have removed
the need for in vivo cloning by clonal amplifica-
tion of spatially separated single molecules using
either emulsion PCR or bridge amplification on
a solid surface. As these methodologies use single
molecule templates, they allow for the detection
of heterogeneity in a DNA sample thus provid-
ing a powerful advantage over Sanger sequencing
(Bentley, 2006; Thomas et al., 2006).
Bioinformatics analysis of
sequence data
Bioinformatics is a rapidly growing field of science
that incorporates aspects of biology, mathematics
and computer science, and part of it involves
the management and analysis of large-scale
sequence data. Once sequence data have been
generated, bioinformatics analysis is required, as
explained in detail in Chapters 2, 3 and 4.
Depending on the application, input template,
and platform utilized, this initial
processing would include removal of substandard
reads and the alignment of reads into contigs (in
the case of whole genome sequencing). The final
role played by bioinformaticists is the curation of
the huge datasets generated by next generation
sequencing technologies.
Determination of the nucleotide sequence
of a target organism or population of organisms’
genetic material on its own is relatively uninform-
ative. Defining how this nucleotide sequence is
responsible for structural and metabolic function-
ality is more important. Bioinformatics forms the
bridge between the sequence data and the biologi-
cal functioning in an organism/organisms. Once
the nucleotide sequence has been determined,
the first step in the bioinformatics analysis of the
sequence is gene prediction by detecting potential
open reading frames (ORFs). This is achieved by
identifying conserved sequences responsible for
the initiation and termination of transcription
as well as the site for the initiation of translation.
Once a potential ORF is identified, the next step is
to annotate the putative gene by assigning a func-
tion to the sequence. By comparing the encoded
query protein sequence with a database of proteins
with known functions, putative proteins with suf-
ficient homology to the known proteins can then
be assigned the corresponding function. If the
query protein sequence does not show high levels
of similarity with known proteins, annotation by
function can be carried out whereby the putative
protein's domains can be assigned a function. For
example, a particular arrangement of hydrophobic
regions within a protein may indicate a membrane
protein. While assignment of function based
on similarity to other proteins does provide an
extremely valuable starting point when correlat-
ing DNA sequence to cellular metabolism, it is
important to keep in mind that subsequent bio-
logical validation is required for confirmation.
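As a rough illustration of the first step, the following Python sketch scans both strands of a nucleotide sequence for open reading frames. It is a deliberately simplified model: it assumes the standard ATG start codon and the TAA, TAG and TGA stop codons, ignores promoters, ribosome-binding sites and codon statistics that real gene finders also use, and all function names and the example sequence are hypothetical.

```python
# Naive ORF scan over both strands (illustrative; not a production gene finder).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    return sequence.translate(COMPLEMENT)[::-1]

def orfs_in_strand(sequence, min_codons=3):
    """Return (start, end, orf) tuples for ATG...stop ORFs in the three frames
    of one strand. min_codons counts the start and stop codons as well."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3, sequence[start:i + 3]))
                start = None                   # look for the next ORF
    return found

def find_orfs(sequence, min_codons=3):
    """ORFs on the forward ('+') and reverse-complement ('-') strands."""
    sequence = sequence.upper()
    return {"+": orfs_in_strand(sequence, min_codons),
            "-": orfs_in_strand(reverse_complement(sequence), min_codons)}

orfs = find_orfs("CCATGAAATTTTAGGC")
```

Each candidate ORF would then be translated and compared against protein databases for annotation, and, as noted above, any assigned function remains putative until validated experimentally.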
Application of molecular
approaches to the study of
microorganisms
The genomes of living organisms are essentially
barcodes and contain sufficient information to
both identify the microorganism as well as outline
its physiological functionality (Blaxter, 2003;
Blaxter and Floyd, 2003). Genomes contain
areas of high and low identity with the differences
between the corresponding genes typically clus-
tered in sections such as the third (wobble) bases
of codons, intronic, and intergenic DNA. As sig-
nificant stretches of the genome are maintained by
selection to be identical or near-identical between
members within a taxon, but which vary between
taxa, these segments can be applied to both iden-
tification and taxonomy. In addition, as these
sequences evolve, they represent both specific
and systematic data (Woese, 1987). This makes
sequence-based methods incredibly powerful,
and this field has revolutionized the ways that we
both classify and study microorganisms in their
ecosystems, as well as how we screen for novel
products and processes.
Whole genome sequencing
The first microbial genome to be sequenced to
completion was that of the human pathogen Hae-
mophilus influenzae (Fleischmann et al., 1995). In
the same year, the smallest genome of a free-living
microorganism, the pathogenic microorganism
Mycoplasma genitalium, was also sequenced (Fraser
et al., 1995). Since then, the complete sequences
of more than 4328 microbial genomes have been
elucidated (complete and permanent draft) of
which 187 are archaeal, 3958 are bacterial and
183 are eukaryotic (www.genomesonline.org,
April 2013). Initially, genome sequence projects
focused on sequencing pathogenic bacteria; now
biotechnological consortia are catching up: cur-
rently 47% of all bacterial-sequencing projects
deal with microorganisms that have industrial
applications, 52% of the projects are focused on
sequencing pathogens and approximately 1%
have targeted exotic organisms (Bode and Muller,
2005; www.genomesonline.org, May 2013; www.
tigr.org, May 2013).
Knowledge of the genome of a given organism
provides tremendous biological insight into cellu-
lar processes that may not be evident when using
classical culturing/assay techniques which are
limited to the study of phenotypic characteristics
under the culture/assay conditions. This allows
for the identification of genes encoding for poten-
tially economically useful metabolites or proteins
which may not be produced or expressed under
current culture/assay conditions. Furthermore,
by increasing our understanding of cellular pro-
cesses and mechanisms of gene regulation within
target organisms, the ability to optimize microbial
metabolism for enhanced application in industry
is increased (Xu et al., 2013). In addition to the
information generated when sequencing a single
genome, the large number of genome sequences
which are currently publicly available on databases
allows for comparative studies between genomes
of different organisms. Such comparative studies
can provide valuable information with respect to
the encoded function of genes particularly when
genomes of closely related strains with differing
phenotypes are compared against one another. In
addition to novel species, multiple isolates of the
same species are also being sequenced. This is due
to the fact that even well-known species such as
Escherichia coli show large levels of heterogene-
ity between strains. For example, comparison of
E. coli genomes sequenced to completion shows
a discrepancy in size from 4.6 to 5.5 Mbp. This
means that there are close to one million nucleotides'
worth of sequence data that is present in
one strain but absent in another (Binnewies et
al., 2006). Comparative genomics is discussed in
detail in Chapter 5.
Currently, the most widely utilized approach
to whole genome sequencing is termed ‘shotgun
sequencing’ (see Chapter 2). In this technique, the
genome of a chosen microbe is randomly sheared
into millions of DNA fragments which are then
sequenced. Owing to the random nature of the
DNA shearing, many of these fragments will over-
lap in terms of sequence data. By aligning these
overlapping fragments against one another, it is
possible to assemble a larger contiguous sequence
(contig). If there are regions of the genome which
are not represented in the sequenced fragment
library, this will result in contigs which do not
overlap and can therefore not be joined together
to form a full length sequence of the genome. In
this instance, targeted re-sequencing of the miss-
ing region is then done by amplifying this region
of the genome using primers specific to the termi-
nal sequence in the contig. Once the genome has
been assembled, the annotation of the genome is
begun, where the structural and functional features
of the genome are identified using bioinformatics
tools (discussed in detail in Chapter 4).
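The overlap-and-merge idea behind shotgun assembly can be sketched with a greedy Python example. This is a teaching sketch under strong assumptions (error-free reads from one strand, no repeats, no reads contained within other reads); real assemblers use far more sophisticated graph-based methods, and the function names, minimum overlap and example reads are hypothetical.

```python
# Greedy overlap assembly sketch (illustrative; assumes error-free, same-strand reads).
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.
    Returns the remaining contigs; more than one contig indicates a gap."""
    contigs = list(dict.fromkeys(reads))          # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs                        # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

full = greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"])
gapped = greedy_assemble(["ATGCGTAC", "TTAGCCAT"])
```

In the first call the three overlapping reads collapse into a single contig; in the second, the reads share no sufficient overlap, so two separate contigs remain, which corresponds to the situation where targeted re-sequencing of the missing region is needed.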
Microbial species diversity
Microbial species diversity in a given environment
is determined by the physico-chemical nature of
the environment as well as by changes that are
caused by the metabolic activities of the microor-
ganisms within the community (see Chapter 6).
Ideally, a study designed to analyse the biodiver-
sity in an environmental sample would examine
all the species within the community as well as the
size of these species populations. As more envi-
ronments are being sampled and analysed, it is
evident that the majority of the microbes have yet
to be cultured (Rappe and Giovannoni, 2003).
This is driving the need for technologies to cul-
tivate and enrich formerly ‘not-yet-culturable’
organisms (Stewart, 2012). This is a laborious
and time-consuming process and so techniques
that circumvent the cultivation of organisms but
still allow the exploration of diversity are being
explored.
When selecting specific genetic regions for
analysis of microbial populations, factors such as
ubiquity (the target gene must be present in all
species), species sequence conservation (to allow
for species identification) and evolution-induced
interspecies variability (to allow for differentia-
tion between species as well as to infer taxonomic
relatedness) need to be considered (Petti, 2007).
The sequences that have been applied to molecu-
lar barcoding include the:
1 nuclear small subunit ribosomal RNA gene
(SSU, also known as 16S rRNA in prokary-
otes, and 18S rRNA in most eukaryotes);
2 nuclear large-subunit ribosomal RNA gene
(LSU, also known as 23S rRNA and 28S
rRNA gene);
3 internal-transcribed spacer section of the
ribosomal RNA cistron (ITS, separated by
the 5.8S ribosomal RNA gene into ITS1 and
ITS2 regions);
4 mitochondrial cytochrome c oxidase 1 (CO1
or COX1) gene; and
5 chloroplast ribulose bisphosphate
carboxylase large subunit (rbcL) gene
(Blaxter, 2004).
Analysis of the 16S rRNA gene is discussed
in detail in Chapter 8 and genomic barcoding is
covered further in Chapter 10.
Each gene target has both its benefits and
pitfalls and these have to be weighed up when
selecting the most appropriate gene. The useful-
ness of each of these genes is determined by (i)
the ease with which it can be isolated from the
sample, (ii) the level of variation between indi-
viduals, (iii) the ease with which the amplified
sequences can be aligned and analysed and (iv)
the availability and number of other sequences
from known and identified specimens. It should
be kept in mind that sequence-based analyses can
be adversely affected if the DNA has extreme base
composition biases, polynucleotide runs or very
stable secondary structures; thus this needs to be
taken into account when the targets are selected
for analysis. In addition, with a sequence-based
system, it is essential that a repository of sequence
information, whether these sequences are housed
by annexes to databases like EMBL/GenBank or
in a freestanding effort, is available (Hebert et al.,
2003).
For these targets, the PCR can amplify suf-
ficient quantities of DNA for study and there are
numerous approaches to analyse these amplified
gene targets. For example, randomly amplified
polymorphic DNA (RAPD), terminal restriction
fragment length polymorphism (TRFLP), or
denaturing gradient gel electrophoresis (DGGE)
allow for the assessment of microbial diversity
as well as comparison of community structure
between different ecosystems or over time (Liu
et al., 1997). While these approaches are rapid
and provide useful information in terms of shifts
in microbial population dynamics, their major
disadvantage lies in the fact that identification
of the individual microbes cannot be carried
out. In order to identify the species, analysis of
the nucleotide sequence must be done. Up until
recently, this has been achieved by Sanger-based
sequencing of individual clones representing
PCR-amplified genes from a few hundred repre-
sentatives in a microbial population. The major
limitation of this is that only the dominant
genotypes within the target population would be
sampled. By contrast, next-generation sequencing
technologies generate several thousand sequences
per sample which consequently allows for the
detection of the rare biosphere within a given ecosystem
community. This rare biosphere is important
in terms of ecosystem functioning as its members may
represent critical components of complex con-
sortia or be dormant vestiges of a past ecological
setting with the potential to become dominant
should environmental conditions shift in favour
of their growth. An additional advantage of next-
generation sequencing technologies is that, due to
the large number of sequences generated, a more
accurate representation of the relative abundances
of the bacterial phylotypes within a target ecosys-
tem can be provided (Sogin et al., 2006; Huse et
al., 2008).
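The effect of sequencing depth on detecting rare community members can be illustrated with a simple probability calculation in Python. The sketch assumes that reads are drawn independently and without amplification bias, which real PCR-based surveys do not guarantee; the community composition, read numbers and function names are hypothetical.

```python
# Detection probability of a rare taxon versus sampling depth (illustrative assumptions).
def relative_abundances(counts):
    """Convert per-taxon counts into fractions of the whole community."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def detection_probability(rel_abundance, n_reads):
    """Probability that a taxon at the given relative abundance appears at least
    once among n_reads independently sampled sequences: 1 - (1 - f)^n."""
    return 1.0 - (1.0 - rel_abundance) ** n_reads

community = {"dominant_taxon": 9990, "rare_taxon": 10}   # assumed toy community
freq = relative_abundances(community)                     # rare_taxon -> 0.001

p_clone_library = detection_probability(freq["rare_taxon"], 300)    # a few hundred clones
p_deep_sequencing = detection_probability(freq["rare_taxon"], 5000) # several thousand reads
```

Under these assumptions a taxon making up 0.1% of the community would be seen in roughly a quarter of 300-clone surveys but in more than 99% of 5000-read surveys, which is one way to quantify why deeper sequencing reveals the rare biosphere.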
The first studies of bacterial diversity using
culture-independent methods have completely
changed our view of the biosphere. Many new
prokaryotic divisions have been described based
on culture-independent studies (Giovannoni et
al., 1990; Schmidt et al., 1991; Barns et al., 1994).
While some of the sequences were close to known
cultured taxa, others showed deeply divergent
lineages with no known cultured members. Many
of these newly identified lineages are now known
to be widespread in many different environments
and are sometimes dominant, be it numerically
or ecologically. While it is true that some globally
distributed ecosystems have a low bacterial diver-
sity (Hentschel et al., 2002), most environments,
including some of the most inhospitable regions
of our planet, showed an unexpected and deep
diversity (Pace, 1997; Rothschild and Mancinelli,
2001; Furlong et al., 2002; Venter et al., 2004).
However, one must bear in mind that there
are pitfalls in assessing microbial diversity using
PCR-based techniques as PCR amplification may
introduce a bias with each physical, chemical
and biological step resulting in a distorted view
of the actual environment. This may be caused
by the choice of the primers, the conditions of
PCR, potential inhibitors that are present in the
reactions and variable amplification efficiencies of
target genes between species to name a few poten-
tial issues (Wintzingerode et al., 1997).
Metagenomics
Metagenomics provides a gene-based explora-
tion of the microbial community as a whole on
the basis of genetic material (DNA or RNA) and
it returns high-resolution, data-rich information
(Moore et al., 2011). The value of this information
has been enhanced by the availability of sequence
data on a large number of genomes. This topic is
discussed in detail in Chapter 7.
Metagenomic analysis involves isolating DNA
from an environmental sample and analysing
the totality of the DNA. As a consequence, the
DNA libraries contain the genetic information of
all organisms present at a specific location at the
sampling time (Daniel, 2004; Streit et al., 2004;
Grant et al., 2006). Previously, this DNA would
be cloned into suitable vectors, the clones trans-
formed into an appropriate host bacterium, and
the resulting transformants screened. The clones
could be screened for phylogenetic markers, for
conserved genes, for expression of specific traits
such as enzyme activity or antibiotic production
or they could be sequenced via Sanger sequencing.
In the case of sequencing data, these sequences
can be used to query databases allowing for the
inference of phylogeny or the identification of
putative functional genes. With the availability
of next generation sequencing technologies,
metagenomes can be analysed without the need
for time-consuming and labour intensive cloning
steps. Instead, DNA isolated from environmental
samples can be sequenced directly (Tuffin et al.,
2009). This approach has been successfully applied
to terrestrial and aquatic environments resulting
in the discovery of genes for antibiotics, antibiotic
resistance and industrial enzymes (D’Costa et al.,
2007; Lammle et al., 2007; Suenaga et al., 2007).
Examples of enzymes isolated from microorgan-
isms using a functional metagenomics approach
include lipases, esterases, amylases, amidases and
chitinases (Hardeman et al., 2007; Lee et al., 2007;
Chu et al., 2008; Xu et al., 2008). Thus the applica-
tion of metagenomics has paved the way for the
discovery of new genes, proteins and biochemi-
cal pathways (Prakash and Taylor, 2012). This
technology has also been very important for the
identification of new biocatalysts that have been
developed by nature, isolated by bioprospecting
and optimized by directed evolution (Fernández-
Arrojo et al., 2010; Yeh et al., 2011; de Pascale et
al., 2012). Viral metagenomics studies have shown
that up to 60% of the sequences in a viral preparation
are unique; these virus sequences represent
unknown viral species that would be missed by
traditional Sanger sequencing approaches but are
detected by the application of next generation
sequencing technologies (Delwart, 2007; Mokili
et al., 2012).
The full potential of metagenomics is yet to
be fully realized. This can be attributed to the
inability of some metagenomic clones to produce
active enzymes. Also, functional metagenomic
approaches rely on E. coli as an expression host
for metagenome-encoded proteins. While expression
may succeed for a large number of genes, others
from more distantly related organisms may not
be expressed because of differences in the gene
promoters, or the levels of expression may be det-
rimentally affected due to differences in the codon
usage. If transcription and translation of foreign
genes results in the production of protein, it is
possible that the limitations of E. coli to post-translationally
modify or export the protein may result
in the lack of detectable active protein. The availa-
bility of suitable hosts for heterologous expression
remains a barrier to efficient mining of functional
metagenomic data (Handelsman, 2004). It is pos-
sible to couple functional screening approaches
to other approaches like substrate-induced gene
expression screening (SIGEX) to enhance data
mining (Uchiyama et al., 2005; Uchiyama and
Watanabe, 2007). SIGEX was initially developed
to detect metagenomic clones that expressed
catabolic genes of interest in the presence of
appropriate substrates. The time-consuming and
labour-intensive nature of functional screens can
also be alleviated by using robotic instrumenta-
tion to screen large clone libraries for functional
activities in a high-throughput manner (Kennedy
et al., 2008).
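The codon-usage barrier to heterologous expression
mentioned above can be illustrated with a short
sketch. It counts in-frame codons in a foreign gene
and reports the fraction that are rare in a host
reference. The two sequences, the 0.05 threshold and
all function names are hypothetical and were
invented for illustration; a real analysis would use
a codon table derived from many host genes.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence, read in frame from position 0."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Relative frequency of each codon among all codons in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rare_codon_fraction(foreign_cds, host_freqs, threshold=0.05):
    """Fraction of codons in foreign_cds whose host frequency is below threshold.

    The threshold is an arbitrary illustrative cut-off, not a published value.
    """
    counts = codon_counts(foreign_cds)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    rare = sum(n for codon, n in counts.items()
               if host_freqs.get(codon, 0.0) < threshold)
    return rare / total


# Toy sequences (assumptions, not real genes)
host_reference = "ATGAAAGCGCTGAAAGCGCTGGAAAAACTGGCG"
foreign_gene = "ATGAGGAGACTAAGGGCGAAA"

host_freqs = codon_frequencies(host_reference)
print(round(rare_codon_fraction(foreign_gene, host_freqs), 3))
```

A high fraction would only suggest that expression in
the chosen host may be poor; promoter recognition,
folding and export are not captured by this measure.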
Metagenomics, in addition to being a powerful
technique in itself, can be paired with metatran-
scriptomics and metaproteomics to generate
complementary datasets for a more thorough
analysis of the microorganisms in a community
(Ram et al., 2005; Frias-Lopez et al., 2008) includ-
ing the human microbial biome (discussed in
Chapter 8). Despite the recent exciting research
advances involving next generation sequenc-
ers, it should be noted that the application of
these methods is still in its infancy. Efficient and
rigorous data analysis pipelines need to be imple-
mented and further studies are required to verify
the robustness of these techniques as well as the
correlation of these results with those obtained by
previous methods.
Metatranscriptomics and metaproteomics
Adaptive responses are driven by changing levels
of transcription in the cell as well as changes
in the levels of translation. Metatranscriptom-
ics, and by extension metaproteomics, focuses
on microbial gene expression within complex
natural habitats, allowing for culture-independent
whole-genome expression profiling of complex
microbial communities (Moran, 2009; Sorek and
Cossart, 2010; Gosalbes et al., 2011; Mader et al.,
2011). However, mining the ‘transcriptome’ and
the ‘proteome’, which represent the collection of
transcribed sequences and the translated proteins
respectively, poses a significant challenge, particu-
larly when it comes to comparing data generated
on different ‘omics’ platforms. There are both
technical and biological hurdles to overcome.
Efficient techniques to isolate total
environmental RNA and protein are still being
developed; however, the inherent complexities of
RNA and proteins mean that these techniques lag
behind those that have been developed for DNA.
The relationship
between RNA and protein is complex; thus, it is
important to be aware of biases in the techniques,
for example the differential lifetimes of mRNA
and protein. This requires the analysis of temporal
changes in transcript and protein levels. The chal-
lenge with transcriptomic and proteomic datasets
remains the identification of true mRNA–protein
concordance and discordance (Hack, 2004; Fang
et al., 2008). For this reason, metatranscriptomics
and metaproteomics are still developing fields,
particularly with respect to studying gene
expression and protein translation in natural
environments, an ability that holds special
promise for studying microorganism function in
ecosystems.
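One simple way to quantify mRNA–protein concordance
is a rank correlation between transcript and protein
abundances for the same genes. The sketch below uses
Spearman's coefficient, which is an assumed choice
rather than a method named in the text, and the
abundance values are invented. It is written with the
standard library only.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances for five genes (arbitrary units)
mrna_levels = [10, 20, 30, 40, 50]
protein_levels = [1, 3, 2, 5, 4]
print(spearman(mrna_levels, protein_levels))
```

Because mRNA and protein have different lifetimes, a
single-time-point correlation can understate the true
relationship; repeating the calculation with the
protein series shifted in time is one way to probe
this.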
The transcriptome refers to coding RNA
(mRNA) and noncoding RNA (including rRNA,
tRNA, structural RNA, regulatory RNA and
other RNA species). In addition, when discuss-
ing RNA species, it is important to differentiate
between de novo synthesized RNA (capped pri-
mary transcripts) and post-transcriptionally
modified (uncapped secondary) transcripts as
only the processed RNAs will be represented
in the proteome of the microorganism. Under-
standing transcriptome dynamics is essential for
a clearer understanding of the functional output
of the genome and can provide valuable insight
into gene expression patterns, gene function and
regulation (van Vliet, 2010).
Early studies of transcriptional activity in
microbial cells relied on the sequence analysis
of cDNA libraries. Currently, microarrays are
the most widely used technique for studying
transcriptomes. Arrays provide specific and
relative quantification of gene expression free of
the bias that is associated with cloning and they
allow high-throughput analysis of relative tran-
script levels (Hinton et al., 2004). The technique
involves the conversion of cellular RNA into
labelled cDNA, which in turn is used for hybridi-
zation to short oligonucleotides that represent the
coding sequences within a genome. To analyse
the transcriptome, annotated genome sequences
have been used to construct microarrays that
represent the majority of all of the predicted genes
in a genome. For example, the transcriptomes of
Mycoplasma pneumoniae, Halobacterium salinarum,
Caulobacter crescentus, Bacillus subtilis, Escherichia
coli and Listeria monocytogenes have all been ana-
lysed on arrays (Selinger et al., 2000; McGrath
et al., 2007; Güell et al., 2009; Koide et al., 2009;
Rasmussen et al., 2009; Toledo-Arana et al., 2009;
Wurtzel et al., 2009).
Recently there have been dramatic advances
enabling the extension of DNA microarray tech-
nology applications to studying environmental
transcriptomics (Parro et al., 2007; Moreno-Paz
et al., 2010; Pinto et al., 2011). Microarrays that
target the 16S rRNA gene have been used to study
the population structure of microbial communi-
ties (Brodie et al., 2006). Functional gene arrays,
for example the Geochip array (He et al., 2007),
have been used to study nitrogen and carbon
metabolism in Antarctic and PCB contaminated
soils (Leigh et al., 2007; Yergeau et al., 2007)
and the cellular processes in microbial biofilms
(Duran-Pinedo et al., 2011). Phylogenetic micro-
arrays are discussed in detail in Chapter 9.
While arrays have been instrumental in
developing the current understanding of tran-
scriptomes, we have started to reach the limits of
the applicability of this technology (Bloom et al.,
2009). Arrays, like other hybridization-depend-
ent techniques, have a relatively limited dynamic
range for the detection of the levels of transcripts
due to background, saturation and spot density
and quality. Arrays need to include sequences
that cover multiple strains as mismatches
negatively affect hybridization efficiency and the
design of appropriate probes is critical to avoid
a high background due to nonspecific- or cross-
hybridization. The comparison of transcription
levels between experiments is challenging and
usually requires complex normalization methods
(Hinton et al., 2004). Finally, the sensitivity of
scanning instruments determines the quality
of data that is collected. As probe design limits
detection to sequences that are already known,
microarrays are more appropriate when they are
applied to characterize ecosystems rather than to
discover new genes and functions. Also important
is the difficulty in differentiating between de novo
synthesized transcripts and modified transcripts.
While there are techniques that allow the
differentiation of the two sets of transcripts
(Hashimoto et al., 2009), the alternative is to
couple array-based transcript analysis to
high-throughput sequencing of cDNA libraries (Hoen
et al., 2008), yielding more transcriptomic data
that have been validated on independent platforms
(Roh et al., 2010).
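As an illustration of the kind of normalization
needed before transcript levels from different arrays
can be compared, the sketch below implements quantile
normalization. This is an assumed example of one
widely used method, not necessarily the approach
described by Hinton et al. (2004). The intensities
are invented, and ties are broken by position rather
than averaged, which keeps the code short.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities
    per array, with probes in the same order in every list.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same probes")

    # Mean intensity at each rank across arrays
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[r] for s in sorted_arrays) / len(arrays)
        for r in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each value by the cross-array mean for its rank
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays measuring the same three probes
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 9.0]]
print(quantile_normalize(raw))
```

After normalization each array has identical sorted
values, so remaining differences between arrays
reflect changes in the rank of individual probes
rather than overall brightness.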
All next generation sequencing technologies
can be used for transcriptome sequencing. In this
case, total RNA is extracted from the organism and
converted into cDNA by reverse transcription.
Because prokaryotic mRNAs lack the poly(A)
tail that is typically used for reverse transcription
priming in eukaryotic RNA-seq applications,
alternative priming approaches are used. These
include random hexamer priming (Passalacqua
et al., 2009), oligo(dT) priming from artificially
polyadenylated mRNAs (Frias-Lopez et al., 2008)
and priming from a specific RNA probe ligated
to mRNAs (Wurtzel et al., 2009). Sequencing
platforms have been used to detect genes, intron
usage and alternative initiation codons in yeast
(Nagalakshmi et al., 2008; Sorek et al., 2010; van
Vliet, 2010; Mader et al., 2011). The application
of these platforms is incredibly powerful. In a
recent article, 512 new genes were predicted for
the nitrogen-fixing plant symbiont Sinorhizobium
meliloti after analysis of its transcriptome (Mao et
al., 2008).
There are, however, technical challenges with
metatranscriptomics approaches, which include
the need for the standardization of protocols such
that datasets can be compared. With environmen-
tal samples, technical choices can dramatically
affect results and molecular techniques are prone
to artefacts and biases no matter what platforms
are chosen for sample analysis and data collection
(Raz et al., 2011). The reproducible recovery of
mRNA and proteins from natural environments
is technically challenging as bacterial mRNA
often has a very short half-life and hence can
be highly unstable (Deutscher, 2003; Condon,
2007). Furthermore, obtaining sufficient mRNA
for replicated studies with environmental samples
is often difficult. It is essential that future stud-
ies consider how to meet accepted standards for
independent replication, as this will allow robust
statistical analyses of changes or differences in
expression patterns (Moran, 2009). Where too little
mRNA can be recovered, it is necessary first to
amplify the RNA. Cao et
al. (2010) performed a comparison of several
methods capable of amplifying RNA for transcriptome
analyses and found that the polyadenylation
of bacterial RNAs and subsequent oligo-dT prim-
ing for amplification was sensitive and specific for
the measurement of differential gene expression
as well as metatranscriptome analyses.
Bacterial and archaeal mRNAs are generally
unstable and typically do not carry poly(A) tails;
thus, methods for the specific capture of
eukaryotic cDNAs are not applicable (Deutscher,
2003; Wang et al., 2009). Extraction of microbial
RNA results in co-extraction of the more abundant
and stable rRNAs (which represent more than 80%
of total prokaryotic RNA) and tRNAs (Liu et al.,
2009; Yoder-Himes et al., 2009). This can lead to
low yields of expressed gene sequences in a large-
scale sequencing run, potentially as low as 10%
(Moran, 2009). Many sequencing protocols using
RNA as the template rely on the removal of rRNA
by subtractive hybridization (Chen and Duan,
2011; Liu and Camilli, 2011), or exonuclease
treatment (Sharma et al., 2010) before the
preparation of cDNA libraries. It should be borne
in mind that any treatment, including rRNA
depletion, will probably introduce unknown biases,
so alternative approaches are being considered.
One example is NSR-seq (‘not-so-random’ priming),
which uses computationally designed hexamers to
selectively enrich for mRNA transcripts during
cDNA synthesis (Armour et al., 2009; Hirakawa
et al., 2011).
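The idea of computationally designed hexamers can be
sketched as follows: start from all 4096 possible
hexamer primers and discard those that would anneal
perfectly to rRNA. This is a simplified illustration
and does not claim to reproduce the published design
procedure. The function names are hypothetical and
the rRNA sequence is a placeholder; real use would
pass the organism's actual rRNA sequences.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence given as A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=6):
    """All distinct k-mers of a sequence (RNA input is read as DNA)."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def not_so_random_hexamers(rrna_seqs, k=6):
    """Hexamer primers with no perfect annealing site in the given rRNAs."""
    blocked = set()
    for rrna in rrna_seqs:
        # A primer anneals to the site that is its reverse complement
        blocked |= {reverse_complement(site) for site in kmers(rrna, k)}
    all_primers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(p for p in all_primers if p not in blocked)


# Placeholder "rRNA" sequence, invented for illustration
toy_rrna = ["AAAAAAC"]
primers = not_so_random_hexamers(toy_rrna)
print(len(primers))
```

With genuine rRNA sequences many more hexamers are
excluded, and the remaining pool still primes most
mRNAs because it is not restricted to any one
transcript.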
Metaproteomics promises to be an exciting
technique that is complementary to
metatranscriptomics (Maron et al., 2007; Siggins
et al., 2011). Metaproteomic data provide a
less-resolved view of instantaneous regulatory
responses than metatranscriptomic data, but
provide a better link to metabolic function.
Successful application
of this protein-based technique relies on effective
recovery of proteins from samples. Ideally, a pro-
cedure for recovering microbial proteins should
allow highly efficient protein recovery to obtain
a protein pool that is (1) of sufficient purity for
analysis using the biochemical methods available,
and (2) representative of the total proteins within
the natural microbial community (Vieites et al.,
2009).
This approach was first used to analyse the
community proteome in a natural biofilm growing
inside the Richmond Mine at Iron Mountain,
northern California, in the USA, and successfully
combined mass spectrometry proteomics with
community genomic analyses to yield rich and
robust data (Ram et al., 2005). This approach
is becoming increasingly important in under-
standing the role of the microbiota in ecosystem
functioning in the environment (Abram et al.,
2011; Wang et al., 2011). In a more recent study,
Kolmeder et al. (2012) used the same approach to
investigate the stability and function of the human
intestinal microbiota.
The profiling of biological samples for
biochemical reaction products, the so-called
metabolome, serves to elucidate the main metabolic
pathways and metabolic bottlenecks and, in
the context of microbial communities, it may be
helpful to access and track the complex metabolic
interactions between microorganisms. Being a
post-genomics tool, metabolomics is a young and
vibrant field of research still in its growth phase
(Motti, 2012). At present, the technical issues of
protein extraction, separation, and identification
make metaproteomics more challenging than
metatranscriptomics. Much remains to be done in
developing metaproteomic technologies before they
can be routinely applied to analysis of the
metaproteome in environmental samples.
As sequencing information accumulates,
bioinformatics techniques are being applied to
turn this information into knowledge such that
scientists can assess microbes at the molecular and
the functional level. This is assisted by the
development of specific computational biology tools
to assess the phylogenetic relationships and meta-
bolic functions on the basis of the comparison of
gene sequences (Prakash and Taylor, 2012).
Many technical challenges remain in metagen-
omic, metatranscriptomic and metaproteomic
approaches, including extraction of the sample
material (be it DNA, RNA, or protein), mRNA
instability, low abundance, and low proportion
of mRNA in total RNA (Wooley and Ye, 2009;
Carvalhais et al., 2012; Muth et al., 2012). Micro-
fluidics, the manipulation of small volumes and by
extrapolation, smaller samples, will enhance all the
high throughput meta techniques. The combina-
tion of high throughput screening, microfluidics,
and dilution-to-extinction techniques can make
accessible previously unculturable species such
that they can be characterized and their biotech-
nological potential evaluated. As these are still
relatively new approaches, solutions to these tech-
nical problems will be developed as researchers
are required to address them.
In addition, there are the bioinformatic chal-
lenges (Schneider and Orchard, 2011). With the
advent of the ‘omics’ era, metagenomic, metatran-
scriptomic, and metaproteomic technologies are
producing more and more large datasets from
the analysis of environmental samples. Analysis
resources and tools have been developed in an
attempt to maximize the meaningful data that can
be obtained from these vast datasets. This requires
sufficient capacity in the public databases for
data storage. Currently, there are a few public
databases that store annotated microbial genomes
and allow various types of searches and sequence
analyses. However, the number of data repositories
for environmental sequences is limited. Although
GenBank serves as the main repository
for all public sequences, the annotation quality
for environmental data is poor and the options
for comparative analyses are limited (Vieites et al.,
2009).
The lack of reference sequences and genomes
presents an additional difficulty when approaching
metagenomic data; however, the increase in
available data and analysis tools should steadily
reduce this problem. Also, the inclusion of
comprehensive sampling and processing meth-
odologies will support the contextualization of
sequence data thereby improving interpretation
of data obtained (Kennedy et al., 2010). Another
important and difficult challenge for all ‘meta’
approaches is that only a small percentage of the
vast number of ecologically important genes have
been correctly annotated and sequence datasets
often contain only the most abundant genes from
a very limited number of natural microbial com-
munities. Furthermore, many sequences cannot
be confidently assigned a function as they have no
close matches in existing databases. For example,
in a metatranscriptomics study by Poretsky et al.
(2009), only 33% of the putative protein-encod-
ing sequences exhibited homology to annotated
proteins in the NCBI RefSeq database, only 24%
had matches to a KEGG pathway, and only 16%
to a COG category. Accurate sequence determination
is essential for mapping cDNA reads onto the genome
and for removing poor-quality sequences (Marioni
et al., 2008; Jiang and Wong, 2009; Oshlack and
Wakefield, 2009), so sequence quality is itself a
limitation to detailed analysis of ‘meta’ data.
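Removal of poor-quality sequences before read mapping
can be sketched as a filter on per-base quality
scores. The example assumes FASTQ-style quality
strings in Phred+33 encoding and uses arbitrary
thresholds; the read names, sequences and default
values are invented for illustration and are not
taken from the studies cited.

```python
def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(seq, quality, min_mean_q=20, min_length=30):
    """True if the read is long enough and its mean quality is high enough.

    min_mean_q and min_length are illustrative defaults, not recommendations.
    """
    if len(seq) != len(quality) or len(seq) < max(min_length, 1):
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_q


def filter_reads(records, **thresholds):
    """Keep (name, sequence, quality) records that pass the filter."""
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if passes_filter(seq, qual, **thresholds)
    ]


# Hypothetical reads: high quality, low quality, and too short
reads = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACGT", "####"),
    ("r3", "AC", "II"),
]
kept = filter_reads(reads, min_mean_q=20, min_length=4)
print([name for name, _, _ in kept])
```

Mean quality is a coarse criterion; trimming
low-quality ends before filtering would retain more
reads, at the cost of a slightly longer routine.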
While the larger datasets will allow more
accurate determination of transcript levels and
associated statistics, they also increase the risk of
large volumes of meaningless data. Visualization,
analysis and interpretation of these large datasets
require significant levels of expertise and some-
times appropriate programming skills. There are
some bioinformatics tools available, for example
ARTEMIS (Carver et al., 2008), LASERGENE
(DNAstar) and CAMERA (Seshadri et al., 2007);
however, continued and coordinated effort will
be required to develop new tools for analysis of
burgeoning datasets as these large datasets need
to be critically analysed. In particular, databases
and software tools are essential and their further
development is needed to deal with the grow-
ing bottleneck in the analysis of metagenomic,
metatranscriptomic and metaproteomic data.
Concluding remarks
Biodiversity is defined as ‘all hereditarily based
variation at all levels of organization, from the
genes within a single local population or species,
to the species composing all or part of a local
community, and finally to the communities themselves
that compose the living parts of the multifarious
ecosystems of the world’ (Wilson, 1997). By
this definition, microbial diversity represents the
genetic composition of the microorganisms and
the environment or habitat in which they are
found, as well as their ecological or functional role
within the ecosystem.
To better understand the environmental biodi-
versity, new approaches are being developed that
are likely to have a major impact on biodiscovery
of novel enzymes from microorganisms that are
difficult to culture. These new methodologies
include single cell analysis (Fritzsch et al., 2012),
high-throughput nanoscale sequencing (Wanunu,
2012) and sequencing single molecules (Kumar et
al., 2012).
While the metagenomics approaches described
above are useful for exploiting the biochemistry
of microorganisms in a community, they are unable
to access the metabolic capabilities associated
with specific microorganisms within the consortia,
given that they largely rely on a ‘take all’
approach (Kennedy et al., 2010). In this era of
large datasets, in which metagenomics provides a
gene-based exploration of the community as a
whole, the classical techniques still have their
place: the cultivation and analysis of individual
members of the community will continue to be the
mainstay for testing metabolic abilities and for
more detailed genomic studies.
Emerging DNA, RNA, and protein sequenc-
ing technologies are exponentially increasing our
understanding of the microbial world from the
level of a single cell to that of a complex
microbial consortium, and across environments
ranging from deep sea thermal vents to acidic hot
springs, permafrost to desert soils, and the human
mouth to termite guts. With the new technologies,
the limitation is no longer the ability to produce
biological data but the development of effective
tools to analyse the data and generate meaningful
information. The demand for improved
bioinformatics tools has stimulated an extremely
active
field of science which is continually expanding
and enriching our understanding of microorgan-
isms.
References
Abram, F., Enright, A.M., O’Reilly, J., Botting, C.H., Col-
lins, G., and O’Flaherty, V. (2011). A metaproteomic
approach gives functional insights into anaerobic
digestion. J. Appl. Microbiol. 110, 1550–1560.
Ansorge, W.J. (2009). Next-generation DNA sequencing
techniques. New Biotechnol. 25, 195–203.
Armour, C.D., Castle, J.C., Chen, R., Babak, T., Loerch,
P., Jackson, S., Shah, J.K., Dey, J., Rohl, C.A., Johnson,
J.M., and Raymond, C.K. (2009). Digital transcrip-
tome profiling using selective hexamer priming for
cDNA synthesis. Nat. Methods 6, 647–649.
Augustinovic, Z., Birketveit, O., Clements, K., Freeman,
M., Gopi, S., Ishoey, T., Jackson, G., Kubala, G., Larsen,
J., Marcotte, B.W.G., et al. (2012). Microbes- oilfield
enemies or allies? Oilfield Rev. 24, 4–17.
Aurilia, V., Parracino, A., and D’Auria, S. (2008). Microbial
carbohydrate esterases in cold adapted environments.
Gene 410, 234–240.
Azerad, R. (1995). Application of biocatalysis in organic
synthesis. Bull. Soc. Chim. Fr. 132, 17–51.
Baker-Austin, C., and Dopson, M. (2007). Life in acid:
pH homeostasis in acidophiles. Trends Microbiol. 15,
165–171.
Barns, S.M., Fundyga, R.E., Jeffries, M.W., and Pace,
N.R. (1994). Remarkable archaeal diversity detected
in a Yellowstone National Park hot spring environ-
ment. Proc. Natl. Acad. Sci. U.S.A. 91, 1609–1613.
Bell, T., Newman, J.A., Silverman, B.W., Turner, S.L., and
Lilley, A.K. (2005). The contribution of species rich-
ness and composition to bacterial services. Nature 436,
1157–1160.
Bennett, S. (2004). Solexa Ltd. Pharmacogenomics J. 5,
433–438.
Bennett, S.T., Barnes, C., Cox, A., Davies, L., and Brown,
C. (2005). Toward the 1,000 dollars human genome.
Pharmacogenomics J. 6, 373–382.
Bentley, D.R. (2006). Whole-genome re-sequencing.
Curr. Opin. Genet. Dev. 16, 545–552.
Binnewies, T.T., Motro, Y., Hallin, P.F., Lund, O., Dunn, D.,
La, T., Hampson, D.J., Bellgard, M., Wassenaar, T.M.,
and Ussery, D.W. (2006). Ten years of bacterial genome
sequencing: comparative-genomics-based discoveries.
Funct. Integr. Genomics 6, 165–85.
Blaxter, M.L. (2003). Molecular systematics: counting
angels with DNA. Nature 421, 122–124.
Blaxter, M.L. (2004). The promise of a DNA taxonomy.
Phil. Trans. R. Soc. Lond. B 359, 669–679.
Blaxter, M.L., and Floyd, R. (2003). Molecular taxonom-
ics for biodiversity surveys: already a reality. Trends
Ecol. Evol. 18, 268–269.
Bloom, J.S., Khan, Z., Kruglyak, L., Singh, M., and Caudy,
A.A. (2009). Measuring differential gene expression
by short read sequencing: quantitative comparison to
2-channel gene expression microarrays. BMC Genom.
10, 221.
Bode, H.B., and Muller, R. (2005). The impact of bacterial
genomics on natural product research. Angew. Chem.
44, 6828–6846.
Bornscheuer, U.T., Huisman, G.W., Kazlauskas, R.J., Lutz,
S., Moore, J.C., and Robins, K. (2012). Engineering
the third wave of biocatalysis. Nature 485, 185–194.
Brodie, E.L., Desantis, T.Z., Joyner, D.C., Baek, S.M.,
Larsen, J.T., Andersen, G.L., Hazen, T.C., Richardson,
P.M., Herman, D.J., Tokunaga, T.K., Wan, J.M., and
Firestone, M.K. (2006). Application of a high-density
oligonucleotide microarray approach to study bacterial
population dynamics during uranium reduction and
reoxidation. Appl. Environ. Microbiol. 72, 6288–6298.
Cao, F.L., Liu, H.H., Wang, Y.H., Liu, Y., Zhang, X.Y.,
Zhao, J.Q., Sun, Y.M., Zhou, J., and Zhang, L. (2010).
An optimized RNA amplification method for prokary-
otic expression profiling analysis. Appl. Microbiol.
Biotechnol. 87, 343–352.
Carvalhais, L.C., Dennis, P.G., Tyson, G.W., and Schenk,
P.M. (2012). Application of metatranscriptomics to
soil environments. J. Microbiol. Methods 91, 246–51.
Carver, T., Berriman, M., Tivey, A., Patel, C., Bohme,
U., Barrell, B.G., Parkhill, J., and Rajandream, M.A.
(2008). Artemis and ACT: viewing, annotating and
comparing sequences stored in a relational database.
Bioinformatics 24, 2672–2676.
Cavigelli, M.A., and Robertson, G.P. (2000). The
functional significance of denitrifier community
composition in a terrestrial ecosystem. Ecology 81,
1402–1414.
Chen, Z., and Duan, X. (2011). Ribosomal RNA depletion
for massively parallel bacterial RNA-sequencing appli-
cations. Methods Mol. Biol. 733, 93–103.
Chu, X., He, H., Guo, C., and Sun, B. (2008).
Identification of two novel esterases from a marine
metagenomic library derived from South China Sea.
Appl. Microbiol.
Biotechnol. 80, 615–625.
Condon, C. (2007). Maturation and degradation of RNA
in bacteria. Curr. Opin. Microbiol. 10, 271–278.
Cotner, J.B., and Biddanda, B.A. (2002). Small Players,
Large Role: Microbial Influence on Biogeochemical
Processes in Pelagic Aquatic Ecosystems. Ecosystems
5, 105–121.
Cotner, J.B., Ogdahl, M.L., and Biddanda, B.A. (2001).
Double-stranded DNA measurement in lakes with
the fluorescent stain PicoGreen and the application to
bacterial bioassays. Aquat. Microb. Ecol. 25, 65–74.
Cottrell, M.T., Moore, J.A., and Kirchman, D.L. (1999).
Chitinases from uncultured marine microorganisms.
Appl. Environ. Microbiol. 65, 2553–2557.
Daniel, R. (2004). The soil metagenome – a rich resource
for the discovery of novel natural products. Curr. Opin.
Biotechnol. 15, 199–204.
Davis, R.C. (1981). Structure and function of two Ant-
arctic terrestrial moss communities. Ecol. Monogr. 51,
125–143.
D’Costa, V.M., Griffiths, E., and Wright, G.D. (2007).
Expanding the soil antibiotic resistome: exploring
environmental diversity. Curr. Opin. Microbiol. 10,
481–489.
DeLong, E.F. (2007). Modern microbial seascapes. Nat.
Rev. Microbiol. 5, 755–757.
DeLong, E.F., and Karl, D.M. (2005). Genomic perspec-
tives in microbial oceanography. Nature 437, 336–342.
Delwart, E.L. (2007). Viral metagenomics. Rev. Med.
Virol. 17, 115–131.
Demain, A.L. (2007). The business of biotechnology. Ind.
Biotechnol. 3, 269–283.
Desai, C., Pathak, H., and Madamwar, D. (2010). Advances
in molecular and ‘-omics’ technologies to gauge micro-
bial communities and bioremediation at xenobiotic/
anthropogen contaminated sites. Bioresour. Technol.
101, 1558–69.
Deutscher, M.P. (2003). Degradation of stable RNA in
bacteria. J. Biol. Chem. 278, 45041–45044.
De Pascale, D., Cusano, A.M., Autore, F., Parrilli, E., di
Prisco, G., Marino, G., and Tutino, M.L. (2008). The
cold-active Lip1 lipase from the Antarctic bacterium
Pseudoalteromonas haloplanktis TAC125 is a member
of a new bacterial lipolytic enzyme family. Extremo-
philes 12, 311–323.
De Pascale, D., de Santi, C., Fu, J., and Landfald, B. (2012).
The microbial diversity of Polar environments is a
fertile ground for bioprospecting. Mar. Genomics 8,
15–22.
Dionisi, H.M., Lozada, M., and Olivera, N.L. (2012).
Bioprospection of marine microorganisms: biotech-
nological applications and methods. Rev. Argentina
Microbiol. 44, 49–60.
Doney, S.C. (2006). The dangers of ocean acidification.
Sci. Am. 294, 58–65.
Duran-Pinedo, A.E., Paster, B., Teles, R., and Frias-Lopez,
J. (2011). Correlation network analysis applied to
complex biofilm communities. PLoS One 6, e28438.
Eckburg, P.B., Bik, E.M., Bernstein, C.N., Purdom, E.,
Dethlefsen, L., Sargent, M., Gill, S.R., Nelson, K.E.,
and Relman, D.A. (2005). Diversity of the human
intestinal microbial flora. Science 308, 1635–1638.
Ekborg, N.A., Morrill, W., Burgoyne, A.M., Li, L., and
Distel, D.L. (2007). CelAB, a multifunctional cellulase
encoded by Teredinibacter turnerae T7902T, a cultur-
able symbiont isolated from the wood-boring marine
bivalve Lyrodus pedicellatus. Appl. Environ. Microbiol.
73, 7785–7788.
Fagerbakke, K.M., Heldal, M., and Norland, S. (1996).
Content of carbon, nitrogen, oxygen, sulfur and phos-
phorus in native aquatic and cultured bacteria. Aquat.
Microb. Ecol. 10, 15–27.
Falkowski, P.G., and Oliver, M.J. (2007). Mix and match:
how climate selects phytoplankton. Nat. Rev. Micro-
biol. 5, 813–819.
Fang, H., Wang, K., and Zhang, J. (2008). Transcriptome
and Proteome Analyses of Drug Interactions with
Natural Products. Curr. Drug Metab. 9, 1038–1048.
Fang, J., Zhang, L., and Bazylinski, D.A. (2010). Deep-sea
piezosphere and piezophiles: geomicrobiology and
biogeochemistry. Trends Microbiol. 18, 413–422.
Feller, G. (2013). Psychrophilic Enzymes: From Folding
to Function and Biotechnology. Scientifica http://
dx.doi.org/10.1155/2013/512840
Fernández-Arrojo, L., Guazzaroni, M.E., López-Cortés,
N., Beloqui, A., and Ferrer, M. (2010). Metagenomic
era for biocatalyst identification. Curr. Opin. Biotech-
nol. 21, 725–33.
Ferrer, M., Golyshina, O.V., Chernikova, T.N., Khachane,
A.N., Martins Dos Santos, V.A., Yakimov, M.M.,
Timmis, K.N., and Golyshin, P.N. (2005). Microbial
enzymes mined from the Urania deepsea hypersaline
anoxic basin. Chem. Biol. 12, 895–904.
Firn, R.D., and Jones, C.G. (2003). Natural products – a
simple model to explain chemical diversity. Nat. Prod.
Rep. 20, 382–391.
Fleischmann, R.D., Adams, M.D., White, O., Clayton,
R.A., Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb,
J.F., Dougherty, B.A., Merrick, J.M., et al. (1995).
Whole genome random sequencing and assembly of
Haemophilus influenzae Rd. Science 269, 496–512.
Floyd, M.M., Tang, J., Kane, M., and Emerson, D.
(2005). Captured diversity in a culture collection:
case study of the geographic and habitat distribu-
tions of environmental isolates held at the American
type culture collection. Appl. Environ. Microbiol.
71, 2813–2823.
Francisco, D., Mah, R., and Rabin, A. (1973). Acridine
orange–epifluorescence technique for counting
bacteria in natural waters. Trans. Am. Microscop. Soc.
92, 416–21.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D.,
Clayton, R.A., Fleischmann, R.D., Bult, C.J.,
Kerlavage, A.R., Sutton, G., Kelley, J.M., et al. (1995). The
minimal gene complement of Mycoplasma genitalium.
Science 270, 397–403.
Frias-Lopez, J., Shi, Y., Tyson, G.W., Coleman, M.L.,
Schuster, S.C., Chisholm, S.W., and DeLong, E.F.
(2008). Microbial community gene expression in
ocean surface waters. Proc. Natl. Acad. Sci. U.S.A. 105,
3805–3810.
Fritzsch, F.S., Dusny, C., Frick, O., and Schmid, A. (2012).
Single-cell analysis in biotechnology, systems biology,
and biocatalysis. Annu. Rev. Chem. Biomol. Eng. 3,
129–155.
Furlong, M.A., Singleton, D.R., Coleman, D.C., and
Whitman, W.B. (2002). Molecular and culture-based
analyses of prokaryotic communities from an agricul-
tural soil and the burrows and casts of the earthworm
Lumbricus rubellus. Appl. Environ. Microbiol. 68,
1265–1279.
Giovannoni, S.J., Britschgi, T.B., Moyer, C.L., and Field,
K.G. (1990). Genetic diversity in Sargasso Sea bacte-
rioplankton. Nature 345, 60–63.
Goldberg, S.M., Johnson, J., Busam, D., Feldblyum, T.,
Ferriera, S., Friedman, R., Halpern, A., Khouri, H.,
Kravitz, S.A., Lauro, F.M., et al. (2006). A Sanger/
pyrosequencing hybrid approach for the generation
of high-quality draft assemblies of marine microbial
genomes. Proc. Natl. Acad. Sci. U.S.A. 103, 11240–
11245.
Gosalbes, M.J., Durban, A., Pignatelli, M., Abellan, J.J.,
Jimenez-Hernandez, N., Perez-Cobas, A.E., Latorre,
A., and Moya, A. (2011). Metatranscriptomic
approach to analyze the functional human gut micro-
biota. PLoS One 6, e17447.
Grant, S., Grant, W.D., Cowan, D.A., Jones, B.E., Ma, Y.,
Ventosa, A., and Heaphy, S. (2006). Identification
of eukaryotic open reading frames in metagenomic
cDNA libraries made from environmental samples.
Appl. Environ. Microbiol. 72, 135–43.
Güell, M., van Noort, V., Yus, E., Chen, W.-H., Leigh-Bell,
J., Michalodimitrakis, K., Yamada, T., Arumugam, M.,
Doerks, T., Kühner, S., et al. (2009). Transcriptome
complexity in a genome-reduced bacterium. Science
326, 1268–1271.
Hack, C.J. (2004). Integrated transcriptome and proteome
data: The challenges ahead. Brief. Funct. Genom. Pro-
teom. 3, 212–219.
Hall, N. (2007). Advanced sequencing technologies and
their wider impact in microbiology. J. Exp. Biol. 210,
1518–1525.
Handelsman, J. (2004). Metagenomics: Application of
Genomics to Uncultured Microorganisms. Microbiol.
Mol. Biol. Rev. 68, 669–685.
Hardeman, F., and Sjoling, S. (2007). Metagenomic
approach for the isolation of a novel
low-temperature-active lipase from uncultured bacteria of marine
sediment. FEMS Microbiol. Ecol. 59, 524–534.
Hashimoto, S., Qu, W., Ahsan, B., Ogoshi, K., Sasaki, A.,
Nakatani, Y., Lee, Y., Ogawa, M., Ametani, A., Suzuki,
Y., et al. (2009). High-resolution analysis of the 5′-end
transcriptome using a next generation DNA sequencer.
PLoS One 4, e4108.
Hattori, T., Mitsui, H., Haga, H., Wakao, N., Shikano, S.,
Gorlach, K., Kasahara, Y., El-Beltagy, A., and Hattori,
R. (1997). Advances in soil microbial ecology and the
biodiversity. Antonie van Leeuwenhoek 72, 21–28.
Hawksworth, D.L. (1997). The fascination of fungi:
exploring fungal diversity. Mycologist 11, 18–22.
He, Z., Gentry, T.J., Schadt, C.W., Wu, L., Liebich, J.,
Chong, S.C., Huang, Z., Wu, W., Gu, B., Jardine, P.,
Criddle, C., and Zhou, J. (2007). GeoChip: a compre-
hensive microarray for investigating biogeochemical,
ecological and environmental processes. ISME J. 1,
67–77.
Hebert, P.D.N., Cywinska, A., Ball, S.L., and de Waard,
J.R. (2003). Biological identifications through DNA
barcodes. Proc. R. Soc. Lond. B 270, 313–321.
van der Heijden, M.G.A., Klironomos, J.N., Ursic, M.,
Moutoglis, P., Streitwolf-Engel, R., Boller, T., Wiem-
ken, A., and Sanders, I.R. (1998). Mycorrhizal fungal
diversity determines plant biodiversity, ecosystem
variability and productivity. Nature 396, 69–72.
Hentschel, U., Hopke, J., Horn, M., Friedrich, A.B.,
Wagner, M., Hacker, J., and Moore, B.S. (2002).
Molecular evidence for a uniform microbial commu-
nity in sponges from different oceans. Appl. Environ.
Microbiol. 68, 4431–4440.
Herridge, D.F., Peoples, M.B., and Boddey, R.M. (2008).
Global inputs of biological nitrogen fixation in agricul-
tural systems. Plant Soil 311, 1–18.
Hibbing, M.E., Fuqua, C., Parsek, M.R., and Peterson, S.B.
(2010). Bacterial competition: surviving and thriving
in the microbial jungle. Nat. Rev. Microbiol. 8, 15–25.
Hinton, J.C., Hautefort, I., Eriksson, S., Thompson, A., and
Rhen, M. (2004). Benefits and pitfalls of using micro-
arrays to monitor bacterial gene expression during
infection. Curr. Opin. Microbiol. 7, 277–282.
Hirakawa, H., Oda, Y., Phattarasukol, S., Armour, C.D.,
Castle, J.C., Raymond, C.K., Lappala, C.R., Schaefer,
A.L., Harwood, C.S., and Greenberg, E.P. (2011).
Activity of the Rhodopseudomonas palustris p-cou-
maroyl-homoserine lactone-responsive transcription
factor RpaR. J. Bacteriol. 193, 2598–2607.
Hobel, C.F., Hreggvidsson, G.O., Marteinsson, V.T., Bah-
rani-Mougeot, F., Einarsson, J.M., and Kristjansson,
J.K. (2005). Cloning, expression, and characterization
of a highly thermostable family 18 chitinase from Rho-
dothermus marinus. Extremophiles 9, 53–64.
Hoen, P.A.T., Ariyurek, Y., Thygesen, H.H., Vreugdenhil,
E., Vossen, R.H., de Menezes, R.X., Boer, J.M., van
Ommen, G.J., and den Dunnen, J.T. (2008). Deep
sequencing-based expression analysis shows major
advances in robustness, resolution and inter-lab port-
ability over five microarray platforms. Nucleic Acids
Res. 36, e141.
Horz, H.-P., Barbrook, A., Field, C.B., and Bohannan,
B.J.M. (2004). Ammonia-oxidizing bacteria respond
to multifactorial global change. Proc. Natl Acad. Sci.
U.S.A. 101, 15136–15141.
Hughes Martiny, J.B., Bohannan, B.J.M., Brown, J.H., Col-
well, R.K., Fuhrman, J.A., Green, J.L., Horner-Devine,
M.C., Kane, M., Adams Krumins, J., Kuske, C.R., et al.
(2006). Microbial biogeography: putting microorgan-
isms on the map. Nat. Rev. Microbiol. 4, 102–112.
Huse, S.M., Dethlefsen, L., Huber, J.A., Welch, D.M.,
Relman, D.A., and Sogin, M.L. (2008). Exploring
microbial diversity and taxonomy Using SSU rRNA
Hypervariable Tag Sequencing. PLoS Genet. 4,
e1000255.
Hutchison III, C.A. (2007). DNA sequencing: bench to
bedside and beyond. Nucleic Acids Res. 35, 6227–
6237.
Jeon, J.H., Kim, J.T., Kim, Y.J., Kim, H.K., Lee, H.S., Kang,
S.G., Kim, S.J., and Lee, J.H. (2009). Cloning and char-
acterization of a new cold-active lipase from a deep-sea
sediment metagenome. Appl. Microbiol. Biotechnol.
81, 865–874.
Jiang, H., and Wong, W.H. (2009). Statistical inferences
for isoform expression in RNA-Seq. Bioinf. 25,
1026–1032.
Karl, D.M. (2007). Microbial oceanography: Paradigms,
processes and promise. Nat. Rev. Microbiol. 5,
759–769.
Karlen, A. (1995). Man and Microbes: Disease and
Plagues in History and Modern Times. (New York:
G.P. Putnam’s).
Kennedy, J., Marchesi, J.R., and Dobson, A.D. (2008).
Marine metagenomics: strategies for the discovery
of novel enzymes with biotechnological applications
from marine environments. Microb. Cell Fact. 7, 27.
Kennedy, J., Flemer, B., Jackson, S.A., Lejon, D.P.H., Mor-
rissey, J.P., O’Gara, F., and Dobson, A.D.W. (2010).
Marine Metagenomics: New Tools for the Study and
Exploitation of Marine Microbial Metabolism. Mar.
Drugs 8, 608–628.
Koeller, K.M., and Wong, C. (2001). Enzymes for chemi-
cal synthesis. Nature 409, 232–240.
Koide, T., Reiss, D.J., Bare, J.C., Pang, W.L., Facciotti, M.T.,
Schmid, A.K., Pan, M., Marzolf, B., Van, P.T., Lo, F.Y.,
et al. (2009). Prevalence of transcription promoters
within archaeal operons and coding sequences. Mol.
Syst. Biol. 5, 285.
Kolmeder, C.A., de Been, M., Nikkilä, J., Ritamo, I., Mättö,
J., Valmu, L., Salojärvi, J., Palva, A., Salonen, A., and
de Vos, W.M. (2012). Comparative metaproteomics
and diversity analysis of human intestinal microbiota
testifies for its temporal stability and expression of core
functions. PLoS One 7, e29913.
Kumar, S., Tao, C., Chien, M., Hellner, B., Balijepalli,
A., Robertson, J.W., Li, Z., Russo, J.J., Reiner, J.E.,
Kasianowicz, J.J., and Ju, J. (2012). PEG-labeled
nucleotides and nanopore detection for single molecule
DNA sequencing by synthesis. Sci. Rep. 2, 684.
Lammle, K., Zipper, H., Breuer, M., Hauer, B., Buta, C.,
Brunner, H., and Rupp, S. (2007). Identification of
novel enzymes with different hydrolytic activities by
metagenome expression cloning. J. Biotechnol. 127,
575–592.
Lee, D.G., Jeon, J.H., Jang, M.K., Kim, N.Y., Lee, J.H.,
Lee, J.H., Kim, S.J., Kim, G.D., and Lee, S.H. (2007).
Screening and characterization of a novel fibrinolytic
metalloprotease from a metagenomic library. Biotech-
nol. Lett. 29, 465–472.
van Leeuwenhoek, A. (1677). Concerning little animals
by him observed in rain-, well-, sea- and snow-water;
as also in water wherein pepper had lain infused. Phil.
Trans. Royal Soc. London 12, 821–831.
Leigh, M.B., Pellizari, V.H., Uhlik, O., Sutka, R., Rodrigues,
J., Ostrom, N.E., Zhou, J., and Tiedje, J.M. (2007).
Biphenyl-utilizing bacteria and their functional genes
in a pine root zone contaminated with polychlorinated
biphenyls (PCBs). ISME J. 1, 134–148.
Liu, J.M., and Camilli, A. (2011). Discovery of bacterial
sRNAs by high-throughput sequencing. Methods Mol.
Biol. 733, 63–79.
Liu, J.M., Livny, J., Lawrence, M.S., Kimball, M.D., Waldor,
M.K., and Camilli, A. (2009). Experimental discovery
of sRNAs in Vibrio cholerae by direct cloning, 5S/
tRNA depletion and parallel sequencing. Nucleic
Acids Res. 37, e46.
Liu, W.-T., Marsh, T.L., Cheng, H., and Forney, L. (1997).
Characterization of microbial diversity by determining
terminal restriction fragment length polymorphisms
of genes encoding 16S rRNA. Appl. Environ. Microbiol.
63, 4516–4522.
McGrady-Steed, J., Harris, P.M., and Morin, P.J. (1997).
Biodiversity regulates ecosystem predictability. Nature
390, 162–165.
McGrath, P.T., Lee, H., Zhang, L., Iniesta, A.A., Hottes,
A.K., Tan, M.H., Hillson, N.J., Hu, P., Shapiro, L., and
McAdams, H.H. (2007). High-throughput identifica-
tion of transcription start sites, conserved promoter
motifs and predicted regulons. Nat. Biotech. 25,
584–592.
Mader, U., Nicolas, P., Richard, H., Bessieres, P., and
Aymerich, S. (2011). Comprehensive identification
and quantification of microbial transcriptomes by
genome-wide unbiased methods. Curr. Opin. Biotech-
nol. 22, 32–41.
Mao, C., Evans, C., Jensen, R.V., and Sobral, B.W. (2008).
Identification of new genes in Sinorhizobium meliloti
using the Genome Sequencer FLX system. BMC
Microbiol. 8, 72.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader,
J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen,
Y.-J., Chen, Z., et al. (2005). Genome sequencing in
microfabricated high-density picolitre reactors. Nature
437, 376–380.
Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M., and
Gilad, Y. (2008). RNA-seq: an assessment of technical
reproducibility and comparison with gene expression
arrays. Genome Res. 18, 1509–1517.
Maron, P.A., Ranjard, L., Mougel, C., and Lemanceau, P.
(2007). Metaproteomics: a new approach for study-
ing functional microbial ecology. Microb. Ecol. 53,
486–93.
de Meeus, T., and Renaud, F. (2002). Parasites within the
new phylogeny of eukaryotes. Trends Parasitol. 18,
247–251.
Mokili, J.L., Rohwer, F., and Dutilh, B.E. (2012). Metagen-
omics and future perspectives in virus discovery. Curr.
Opin. Virol. 2, 63–77.
Moore, R.A., Warren, R.L., Freeman, J.D., Gustavsen, J.A.,
Chénard, C., Friedman, J.M., Suttle, C.A., Zhao, Y., and
Holt, R.A. (2011). The sensitivity of massively parallel
sequencing for detecting candidate infectious agents
associated with human tissue. PLoS One 6, e19838.
Moran, M.A. (2009). Metatranscriptomics: eavesdrop-
ping on complex microbial communities. Microbe 4,
329–335.
Moreno-Paz, M., Gómez, M.J., Arcas, A., and Parro,
V. (2010). Environmental transcriptome analysis
reveals physiological differences between biofilm and
planktonic modes of life of the iron oxidizing bacteria
Leptospirillum spp. in their natural microbial commu-
nity. BMC Genomics 11, 404.
Motti, C. (2012). Environmental marine metabolomics:
from whole organism system biology to ecosystem
management. J. Mar. Sci. Res. Dev. 2, 3.
Moyer, C.L., and Morita, R.Y. (2007). Psychrophiles and
Psychrotrophs. In Encyclopedia Life Sciences (John
Wiley & Sons Ltd, Chichester, UK). Available from:
http://www.els.net [doi: 10.1002/9780470015902.
a0000402.pub2].
Muth, T., Benndorf, D., Reichl, U., Rapp, E., and Martens,
L. (2012). Searching for a needle in a stack of needles:
challenges in metaproteomics data analysis. Mol. Bio-
syst. 9, 578–585.
Naeem, S., and Li, S.B. (1997). Biodiversity enhances
ecosystem reliability. Nature 390, 507–509.
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D.,
Gerstein, M., and Snyder, M. (2008). The transcrip-
tional landscape of the yeast genome defined by RNA
sequencing. Science 320, 1344–1349.
Newman, D.J., and Cragg, G.M. (2007). Natural products
as sources of new drugs over the last 25 years. J. Nat.
Prod. 70, 461–477.
Ohkuma, M. (2003). Termite symbiotic systems: efficient
bio-recycling of lignocellulose. Appl. Microbiol. Bio-
technol. 61, 1–9.
Ortmann, A.C., and Suttle, C.A. (2009). Determination
of virus abundance by epifluorescence microscopy.
Methods Mol. Biol. 501, 87–95.
Oshlack, A., and Wakefield, M.J. (2009). Transcript length
bias in RNAseq data confounds systems biology. Biol.
Direct. 4, 14.
Pace, N.R. (1997). A molecular view of microbial diversity
and the biosphere. Science 276, 734–740.
Pan, S.Y., Pan, S., Yu, Z.L., Ma, D.L., Chen, S.B., Fong, W.F.,
Han, Y.F., and Ko, K.M. (2010). New perspectives
on innovative drug discovery: an overview. J. Pharm.
Pharm. Sci. 13, 450–71.
Parro, V., Moreno-Paz, M., and González-Toril, E. (2007).
Analysis of environmental transcriptomes by DNA
microarrays. Environ. Microbiol. 9, 453–64.
Passalacqua, K.D., Varadarajan, A., Ondov, B.D., Okou,
D.T., Zwick, M.E., and Bergman, N.H. (2009). Structure
and complexity of a bacterial transcriptome. J.
Bacteriol. 191, 3203–3211.
Petrić, I., Bru, D., Udiković-Kolić, N., Hršak, D., Philippot,
L., and Martin-Laurent, F. (2011). Evidence for shifts
in the structure and abundance of the microbial com-
munity in a long-term PCB-contaminated soil under
bioremediation. J. Hazard Mater. 195, 254–60.
Petti, C.A. (2007). Detection and identification of micro-
organisms by gene amplification and sequencing. Clin.
Infect. Dis. 44, 1108–1114.
Pinto, A.C., Melo-Barbosa, H.P., Miyoshi, A., Silva, A., and
Azevedo, V. (2011). Application of RNA-seq to reveal
the transcript profile in bacteria. Genet. Mol. Res. 10,
1707–18.
Poretsky, R.S., Hewson, I., Sun, S., Allen, A.E., Zehr, J.P.,
and Moran, M.A. (2009). Comparative day/night
metatranscriptomic analysis of microbial communities
in the North Pacific Subtropical Gyre. Environ. Micro-
biol. 11, 1358–1375.
Porter, K.G., and Feig, Y.S. (1980). The use of DAPI for
identifying and counting aquatic microflora. Limnol.
Oceanogr. 25, 943–948.
Prakash, T., and Taylor, T.D. (2012). Functional
assignment of metagenomic data: challenges and
applications. Brief. Bioinf. 13, 711–27.
Psenner, R. (1990). From image analysis to chemical anal-
ysis of bacteria: a long-term study? Limnol. Oceanogr.
35, 234–237.
Ram, R.J., Verberkmoes, N.C., Thelen, M.P., Tyson,
G.W., Baker, B.J., Blake II, R.C., Shah, M., Hettich,
R.L., and Banfield, J.F. (2005). Community prot-
eomics of a natural microbial biofilm. Science 308,
1915–1920.
Rappe, M.S., and Giovannoni, S.J. (2003). The uncul-
tured microbial majority. Annu. Rev. Microbiol. 57,
369–394.
Rasmussen, S., Nielsen, H.B., and Jarmer, H. (2009). The
transcriptionally active regions in the genome of Bacil-
lus subtilis. Mol. Microbiol. 73, 1043–1057.
Raz, T., Kapranov, P., Lipson, D., Letovsky, S., Milos, P.M.,
and Thompson, J.F. (2011). Protocol dependence of
sequencing-based gene expression measurements.
PLoS One 6, e19287.
Roh, S.W., Abell, G.C., Kim, K.H., Nam, Y.D., and Bae,
J.W. (2010). Comparing microarrays and next-gener-
ation sequencing technologies for microbial ecology
research. Trends Biotechnol. 28, 291–299.
Roselló-Mora, R., and Amann, R. (2001). The species
concept for prokaryotes. FEMS Microbiol. Rev. 25,
39–67.
Rothberg, J.M., and Leamon, J.H. (2008). The develop-
ment and impact of 454 sequencing. Nat. Biotechnol.
26, 1117–1124.
Rothschild, L.J., and Mancinelli, R.L. (2001). Life in
extreme environments. Nature 409, 1092–1101.
Sanger, F., Nicklen, S., and Coulson, A.R. (1977). DNA
sequencing with chain-terminating inhibitors. Proc.
Natl. Acad. Sci. U.S.A. 74, 5463–5467.
Schleifer, K.H. (2009). Classification of Bacteria and
Archaea: past, present and future. Syst. Appl. Micro-
biol. 32, 533–542.
Schmid, A., Dordick, J.S., Hauer, B., Kiener, A., Wubbolts,
M., and Witholt, B. (2001). Industrial biocatalysis
today and tomorrow. Nature 409, 258–268.
Schmidt, T.M., DeLong, E.F., and Pace, N.R. (1991).
Analysis of a marine picoplankton community by 16S
rRNA gene cloning and sequencing. J. Bacteriol. 173,
4371–4378.
Schneider, M.V., and Orchard, S. (2011). Omics technolo-
gies, data and bioinformatics principles. Methods Mol.
Biol. 719, 3–30.
Schopf, J.W. (2001). Cradle of Life – The Discovery of
Earth’s Earliest Fossils. (New Jersey, USA: Princeton
University Press).
Selinger, D.W., Cheung, K.J., Mei, R., Johansson, E.M.,
Richmond, C.S., Blattner, F.R., Lockhart, D.J., and
Church, G.M. (2000). RNA expression analysis using
a 30 base pair resolution Escherichia coli genome array.
Nat. Biotech. 18, 1262–1268.
Seshadri, R., Kravitz, S.A., Smarr, L., Gilna, P., and Frazier,
M. (2007). CAMERA: A Community Resource for
Metagenomics. PLoS Biol. 5, e75.
Shanmughapriya, S., Kiran, G.S., Selvin, J., Thomas,
T.A., and Rani, C. (2009). Optimization, purification,
and characterization of extracellular mesophilic
alkaline cellulase from sponge-associated Marino-
bacter sp. MSI032. Appl. Biochem. Biotechnol. 162,
625–640.
Sharma, C.M., Hoffmann, S., Darfeuille, F., Reignier, J.,
Findeiss, S., Sittka, A., Chabas, S., Reiche, K., Hack-
ermuller, J., Reinhardt, R., Stadler, P.F., and Vogel,
J. (2010). The primary transcriptome of the major
human pathogen Helicobacter pylori. Nature 464,
250–255.
Shendure, J., and Ji, H. (2008). Next-generation DNA
sequencing. Nat. Biotechnol. 26, 1135–1145.
Siggins, A., Gunnigle, E., and Abram, F. (2011). Explor-
ing mixed microbial community functioning: recent
advances in metaproteomics. FEMS Microbiol. Ecol.
80, 265–280.
Simon, M., and Azam, F. (1989). Protein content and
protein synthesis rates of planktonic marine bacteria.
Mar. Ecol. Progr. Ser. 51, 201–213.
Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M.,
Huse, S.M., Neal, P.R., Arrieta, J.M., and Herndl, G.J.
(2006). Microbial diversity in the deep sea and the
underexplored ‘rare biosphere’. Proc. Natl. Acad. Sci.
U.S.A. 103, 12115–12120.
Sorek, R., and Cossart, P. (2010). Prokaryotic transcrip-
tomics: a new view on regulation, physiology and
pathogenicity. Nat. Rev. Genet. 11, 9–16.
Stewart, E.J. (2012). Growing unculturable bacteria. J.
Bacteriol. 194, 4151–60.
Streit, W.R., Daniel, R., and Jaeger, K.E. (2004). Pros-
pecting for biocatalysts and drugs in the genomes of
non-cultured microorganisms. Curr. Opin. Biotech-
nol. 15, 285–290.
Stres, B. (2007). The relationship between total and cul-
turable bacteria in cold soils. Acta agriculturae Slov. 90,
25–31.
Struvay, C., and Feller, G. (2012). Optimization to Low
Temperature Activity in Psychrophilic Enzymes. Int. J.
Mol. Sci. 13, 11643–11665.
Suenaga, H., Ohnuki, T., and Miyazaki, K. (2007). Func-
tional screening of a metagenomic library for genes
involved in microbial degradation of aromatic com-
pounds. Environ. Microbiol. 9, 2289–2297.
Synowiecki, J. (2010). Some applications of thermophiles
and their enzymes for protein processing. Afr. J. Bio-
technol. 9, 7020–7025.
Thomas, R.K., Nickerson, E., Simons, J.F., Jänne, P.A.,
Tengs, T., Yuza, Y., Garraway, L.A., LaFramboise, T.,
Lee, J.C., Shah, K., et al. (2006). Sensitive mutation
detection in heterogeneous cancer specimens by mas-
sively parallel picoliter reactor sequencing. Nat. Med.
12, 852–855.
Thomson, B., Ostle, N., McNamara, N., Bailey, M., White-
ley, A., and Griffiths, R. (2010). Vegetation affects the
relative abundances of dominant soil bacterial taxa
and soil respiration rates in an upland grassland soil.
Microb. Ecol. 59, 335–343.
Toledo-Arana, A., Dussurget, O., Nikitas, G., Sesto, N.,
Guet-Revillet, H., Balestrino, D., Loh, E., Gripenland,
J., Tiensuu, T., Vaitkevicius, K., et al. (2009). The Lis-
teria transcriptional landscape from saprophytism to
virulence. Nature 459, 950–956.
Torres, T.T., Metta, M., Ottenwalder, B., and Schlotterer,
C. (2008). Gene expression profiling by massively
parallel sequencing. Genome Res. 18, 172–177.
Tuffin, M., Anderson, D., Heath, C., and Cowan, D.A.
(2009). Metagenomic gene discovery: how far have
we moved into novel sequence space? Biotechnol. J. 4,
1671–83.
Uchiyama, T., and Watanabe, K. (2007). The SIGEX
scheme: high throughput screening of environmental
metagenomes for the isolation of novel catabolic
genes. Biotechnol. Genet. Eng. Rev. 24, 107–116.
Uchiyama, T., Abe, T., Ikemura, T., and Watanabe, K.
(2005). Substrate-induced gene-expression screening
of environmental metagenome libraries for isolation of
catabolic genes. Nat. Biotechnol. 23, 88–93.
Ulukanli, Z., and Digrak, M. (2002). Alkaliphilic Micro-
organisms and Habitats. Turk. J. Biol. 26, 181–191.
Umrania, V.V. (2006). Bioremediation of toxic heavy
metals using acidothermophilic autotrophies. Biore-
sour. Technol. 97, 1237–1242.
Venter, J.C., Remington, K., Heidelberg, J.F., Halpern,
A.L., Rusch, D., Eisen, J.A., Wu, D., Paulsen, I., Nelson,
K.E., Nelson, W., et al. (2004). Environmental genome
shotgun sequencing of the Sargasso Sea. Science 304,
66–74.
Vieites, J.M., Guazzaroni, M.E., Beloqui, A., Golyshin,
P.N., and Ferrer, M. (2009). Metagenomics approaches
in systems microbiology. FEMS Microbiol. Rev. 33,
236–255.
van Vliet, A.H. (2010). Next generation sequencing of
microbial transcriptomes: challenges and opportuni-
ties. FEMS Microbiol. Lett. 302, 1–7.
Wang, H.B., Zhang, Z.X., Li, H., He, H.B., Fang, C.X.,
Zhang, A.J., Li, Q.S., Chen, R.S., Guo, X.K., Lin,
H.F., et al. (2011). Characterization of metaprot-
eomics in crop rhizospheric soil. J. Proteome Res. 10,
932–940.
Wang, Z., Gerstein, M., and Snyder, M. (2009). RNA-Seq:
a revolutionary tool for transcriptomics. Nat. Rev.
Genet. 10, 57–63.
Wanunu, M. (2012). Nanopores: A journey towards DNA
sequencing. Phys. Life Rev. 9, 125–58.
Wasmund, K., Burns, K.A., Kurtboke, D.I., and Bourne,
D.G. (2009). Novel alkane hydroxylase gene (alkB)
diversity in sediments associated with hydrocarbon
seeps in the Timor Sea, Australia. Appl. Environ.
Microbiol. 75, 7391–7398.
Wilson, E.O. (1997). Introduction. in biodiver-
sity. In Understanding and Protecting our
Biological Resources, Reaka-Kudla, M.L., Wilson,
D.E., and Wilson, E.O., eds. (Washington DC, USA:
Joseph Henry Press) pp. 1–3.
Wintzingerode, F., Gobel, U.B., and Stackebrandt, E.
(1997). Determination of microbial diversity in
environmental samples: pitfalls of PCR based rRNA
analysis. FEMS Microbiol. Rev. 21, 213–229.
Woese, C.R. (1987). Bacterial evolution. Microbiol. Rev.
51, 221–271.
Woese, C.R., Kandler, O., and Wheelis, M.L. (1990).
Towards a natural system of organisms: Proposal for
the domains Archaea, Bacteria, and Eucarya. Proc.
Natl. Acad. Sci. U.S.A. 87, 4576–4579.
Wooley, J.C., and Ye, Y. (2009). Metagenomics: Facts and
Artifacts, and Computational Challenges. J. Comput.
Sci. Technol. 25, 71–81.
Wurtzel, O., Sapra, R., Chen, F., Zhu, Y., Simmons,
B.A., and Sorek, R. (2009). A single-base resolution
map of an archaeal transcriptome. Genome Res. 20,
133–141.
Xu, C., Liu, L., Zhang, Z., Jin, D., Qiu, J., and Chen, M.
(2013). Genome-scale metabolic model in guiding
metabolic engineering of microbial improvement.
Appl. Microbiol. Biotechnol. 97, 519–539.
Xu, M., Xiao, X., and Wang, F. (2008). Isolation and
characterization of alkane hydroxylases from a
metagenomic library of Pacific deep-sea sediment.
Extremophiles 12, 255–262.
Xu, M., Wu, W.M., Wu, L., He, Z., Van Nostrand, J.D.,
Deng, Y., Luo, J., Carley, J., Ginder-Vogel, M., Gentry,
T.J., et al. (2010). Responses of microbial community
functional structures to pilot-scale uranium in situ
bioremediation. ISME J. 4, 1060–1070.
Yeh, W.-K., Yang, H.-C., and McCarthy, J.R. (2011).
Enzyme Technologies: Metagenomics, Evolution,
Biocatalysis, and Biosynthesis (New Jersey, USA: John
Wiley & Sons, Inc.).
Yergeau, E., Kang, S., He, Z., Zhou, J., and Kowalchuk,
G.A. (2007). Functional microarray analysis of nitro-
gen and carbon cycling genes across an Antarctic
latitudinal transect. ISME J. 1, 163–179.
Yergeau, E., Bokhorst, S., Kang, S., Zhou, J., Greer, C.W.,
Aerts, R., and Kowalchuk, G.A. (2012). Shifts in soil
microorganisms in response to warming are consistent
across a range of Antarctic environments. ISME J. 6,
692–702.
Yoder-Himes, D.R., Chain, P.S., Zhu, Y., Wurtzel, O.,
Rubin, E.M., Tiedje, J.M., and Sorek, R. (2009). Map-
ping the Burkholderia cenocepacia niche response via
high-throughput sequencing. Proc. Natl. Acad. Sci.
U.S.A. 106, 3976–3981.
Zhu, L., Zhu, Y., Zhang, Y., and Li, Y. (2012). Engineering
the robustness of industrial microbes through syn-
thetic biology. Trends Microbiol. 20, 94–101.
Zillig, W. (1991). Comparative biochemistry of Archaea
and Bacteria. Curr. Opin. Genet. Dev. 1, 544–551.
2
Prokaryotic Genome Sequencing and
Assembly
Morag Graham, Gary Van Domselaar and Paul Stothard
Abstract
Researchers can now readily obtain millions of
sequence reads from the genomes of their favourite
prokaryotic organisms thanks to the development
of next-generation sequencing technologies.
Through sequence assembly, it is possible to
reconstruct large portions of a genome from the
overlapping sequence reads. However, assembly
is challenging because the sequence reads are
generally quite short and genomes often contain
internally repeated segments that may confound
the complete reconstruction of a genome from its
constituent reads. There are different approaches
for addressing these challenges that involve, for
example, more advanced assembly tools, refer-
ence genome sequences, and directed follow-up
sequencing. Regardless of the strategy employed,
there are many steps and programs involved, and
the final outputs need to be annotated and inter-
preted with the known shortcomings of the data
and methodologies in mind.
Introduction
Modern sequence technologies – termed ‘next
generation’ or ‘next-gen’ sequencing (NGS) –
have revolutionized the field of biology with
their ability to rapidly and cheaply generate vast
amounts of genomic sequence data. Up until
the last half-decade, generating whole-genome
sequence data required significant investments
of time and resources generally available only at
large sequencing centres; today NGS allows even
small laboratories to routinely generate genomic
sequence data for organisms under study. The
widespread adoption of these technologies has
created tremendous new opportunities for bio-
logical research, but also new challenges owing
to the bioinformatics required to process the vast
amounts of sequence data that the platforms so
readily generate.
Current NGS technologies generate read
lengths that sample merely a small fraction of the
genome size of most microorganisms. Although
it is possible to identify and annotate coding
sequence regions and other features of interest on
individual reads (as is often done in the context
of metagenomics research), there is much to be
gained from combining overlapping reads into
larger contiguous sequences, or contigs, through
a process called sequence assembly. The creation
of contigs allows for more accurate and complete
annotation of genomic features, and also permits
more in-depth analyses of sequence evolution,
gene structure, and various other sequence prop-
erties and features (see Chapter 4 for further
discussion). Initial sequence assembly approaches
used a labour intensive technique called primer
walking to iteratively extend and connect contigs.
Later, a higher throughput, automated approach
called shotgun sequence assembly was introduced.
Today, most genome sequencing projects primar-
ily adopt the shotgun sequencing strategy or a
hybrid strategy utilizing both shotgun sequencing
and primer walking to acquire genomes. Depend-
ing on the goals of the sequencing project and the
availability of a closely related reference genome
sequence, so-called reference mapping may be
employable, in which sequence reads from the
newly sequenced genome are aligned and mapped
to the reference genome to identify differences
(and similarities) in the target genome.
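The idea of combining overlapping reads into contigs can be illustrated with a minimal sketch. It assumes error-free reads and uses a simple greedy merge of the best suffix-prefix overlap; the function names and the `min_len` threshold are hypothetical, chosen only for illustration, and real assemblers use far more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences sharing the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No remaining overlaps: what is left are the final contigs.
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Three overlapping 8-base reads sampled from a 16-base toy sequence.
reads = ["ATGGCGTT", "CGTTCAAG", "CAAGCTAG"]
contigs = greedy_assemble(reads)
print(contigs)
```

With repeated segments longer than the reads, a greedy merge like this one can join the wrong pieces, which is one way to see why repeats confound assembly.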
In this chapter we provide an overview of
sequencing technologies as well as the downstream
processing steps that can be used to convert
the raw reads into larger contigs suitable for anno-
tation of genes and other genomic features (Fig.
2.1). Strategies for specific sequencing objectives
are presented, which include suggestions for
sequencing technology, library types, assembly
methods and software. Given the rapid rate at
which new sequencing technologies and bioinfor-
matics tools are being produced, such strategies
need to be regularly revised, but these guidelines
should serve as a useful starting point. Lastly, the
chapter is intended to introduce the method-
ologies associated with genome sequencing and
assembly; it is not intended to be comprehensive
with respect to existing bioinformatics tools.
Sequencing technologies
First generation sequencing
approaches
The first widely adopted approaches to DNA
sequencing were developed in the mid-seventies.
The pioneering method invented by Frederick
Sanger and Alan Coulson (Sanger and Coulson,
1975), commonly referred to as the Sanger method
or chain-termination sequencing, is based on the
incorporation of chain-terminating 2′,3′-dide-
oxyribonucleotide 5′-triphosphates (also called
dideoxynucleotides; ddNTPs) during replication of
a single-stranded DNA template fragment. In this
approach, the DNA fragment to be sequenced is
separated into four different reactions containing
radiolabelled DNA primer, DNA polymerase,
and one of the four dideoxynucleotides (con-
taining adenosine (ddATP), cytosine (ddCTP),
guanosine (ddGTP), or thymidine (ddTTP),
respectively). Because a dideoxynucleotide lacks
the 3′-hydroxyl group needed to form the next
phosphodiester bond, its incorporation blocks
further extension of the chain. Thus, the polymerase
extends the labelled DNA primer until a dide-
oxynucleotide is randomly incorporated, at which
point the sequencing reaction is terminated.
These fragments are then denatured, separated by
electrophoresis and visualized using radiography.
The random incorporation of chain-terminating
ddNTPs guarantees that every fragment length
(corresponding to every possible single nucleotide
addition) will be represented as a distinct band in
the electropherogram, thus providing a mecha-
nism to deduce the genomic sequence. A different
approach, developed around the same time by
Alan Maxam and Walter Gilbert (Maxam and Gil-
bert, 1977) employs radiolabelling at the 5′ end of
the DNA fragment to be sequenced, followed by
chemical treatments to cleave the DNA fragment
at specific residues or residue pairs. The fragments
are separated by size using electrophoresis, and
then detected by autoradiography. Analysis of
the fragment sizes permits reconstruction of the
DNA fragment sequence. A large advancement
was achieved in 1986 with the introduction of
fluorescent labels to replace the radiolabelled
DNA (Smith et al., 1986). In addition to being
safer than radiolabelling, fluorescent labels
facilitate detection by optical systems that are more
amenable to automation and miniaturization.
Figure 2.1 Microbial WGS project flow, showing the process and time frame for each stage: study design, sample collection, culture, NGS library construction, sequencing, and data analysis.
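The read-out logic of chain-termination sequencing can be sketched in a few lines. This is a simplified model under stated assumptions: every possible termination position is represented, primer length is ignored, and fragment lengths are resolved perfectly. The function names are hypothetical.

```python
def sanger_ladder(new_strand: str) -> dict[str, list[int]]:
    """For each of the four ddNTP reactions, list the terminated fragment lengths.

    `new_strand` is the newly synthesized strand (5'->3'). A fragment of
    length k ends in the base at position k, so it shows up in that base's lane.
    """
    lanes: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Deduce the sequence by reading bands from shortest to longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_ladder("GATTACA")
print(lanes)               # fragment lengths seen in each ddNTP lane
print(read_ladder(lanes))  # sequence recovered from the ladder
```

Because each fragment length occurs in exactly one lane, sorting the bands by size across the four lanes recovers the order of bases.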
The chemical cleavage approach developed by
Maxam and Gilbert was initially more popular
than the biochemical chain-terminator approach
developed by Sanger and Coulson. However,
improvements to the Sanger method, such as the
introduction of dye-terminator sequencing where
the ddNTPs are fluorescently labelled (instead of
the DNA primer) allowing sequencing to occur
in a single reaction, ultimately made the Sanger
approach faster, easier to use, and better suited
to automated sequencing. Today, the majority of
first-generation automated sequence analysers use
the dye-terminator sequencing approach.
Automated sequencers employing Sanger
sequencing chemistry, commonly called Sanger
sequencers, apply capillary electrophoresis separa-
tion and fluorescence detection to generate DNA
sequences. Completed sequencing reactions
undergo electrokinetic injection into the capil-
lary sequencer, which separates the fluorescently
labelled, dye-terminated sequences based on frag-
ment size. The instrument records the fluorescent
signals from each capillary, generating sequence
traces, or chromatograms, that show the signal
detected simultaneously for each of the four
possible terminating bases at each position. On-
board software performs the signal processing for
nucleotide base calling and confidence scoring.
High quality traces, typically 300–1200 bases in
length, are automatically converted into DNA
sequence and reported as a string of nucleotides
(a sequence read) along with their associated
quality scores (often called Phred scores) (Ewing
and Green, 1998; Ewing et al., 1998). A typical
modern automated Sanger sequencer can accom-
modate 96 or 384 DNA samples at a time and
can run unattended with multiple sample plates
queued, generating a theoretical throughput
of nearly 1 Mb per day at a cost of ~$2500 per
Mb. Despite having been largely supplanted by
next-generation sequencing (NGS), and being
dramatically less cost-effective per base than NGS,
the Sanger method is still the preferred approach
for many small-scale projects, and remains in
widespread use today.
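The Phred quality scores mentioned above are related to the estimated probability p that a base call is wrong by Q = −10 log10(p) (Ewing and Green, 1998). The following minimal Python sketch shows the conversion in both directions; the function names are illustrative only and are not taken from any sequencing software package.

```python
import math

def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# Print the error probability implied by a few common score thresholds.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

A score of 20 therefore corresponds to an expected error rate of 1 in 100 base calls, and a score of 30 to 1 in 1000.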
Next-generation sequencing (NGS)
approaches
The first genome to be sequenced was bacte-
riophage PhiX174 (Sanger et al., 1977). This viral
genome, with only 5375bp, is within the range of
early (pre-automated) sequencing technologies.
Prokaryotic genomes, with sizes ranging from
0.5–10Mb, require higher throughput automated
methods. Advances in sequence analysis made
during the early 1990s increased the throughput of
automated Sanger sequence analysers sufficiently
to make practical the generation of whole-genome
sequences of prokaryotes, and even higher organ-
isms, including the first 3.3Gb draft human
genome, completed in 2000 (Lander et al., 2001;
Venter et al., 2001). However, these early efforts
required massive resources to complete, in some
cases requiring hundreds of scientists, thousands
of Sanger sequencing instruments, billions of
dollars, and many years of labour. The continuous
demand for low-cost, low-labour, whole-genome
sequence data combined with the limitations of
automated Sanger sequencing drove the devel-
opment of newer next-generation sequencing
(NGS) platforms, of which a variety are now
commercially available. In these, nanofluidic
mechanisms provide small-volume reactions for
economical molecular sequencing at a massively
parallel scale.
NGS occurs via in situ sequencing by synthesis
(SBS). Most platforms apply a DNA polymerase
or ligase enzymes to simultaneously synthesize
the new DNA strands. SBS most often occurs
on multiple identical copies of a DNA template
molecule, usually amplified as a group on beads
(or an isolated surface). Alternatively, SBS can
occur as a single molecule-based process in which
sequencing detection occurs on a single molecule.
Real-time SBS may occur, in which a free-running
DNA polymerase is given all nucleotides required.
More often, SBS platforms control the sequenc-
ing process in a ‘stop-and-go’ iterative fashion
– in order to assist with the identification of the
incorporated nucleotide or oligonucleotide in
the growing strand. This can be accomplished by
supplying substrates (nucleotides or short oligo-
nucleotides) that are modified with an identifying
tag (such as a fluorophore) and also reversibly
blocked or else by providing only a single kind of
substrate (e.g. dATP) at a time. Nucleotides and
reagents are cyclically and individually washed
across the immobilized templates, whereupon the
specific incorporation of nucleotide(s) is detected
by high-resolution digital imaging. Thereafter,
reagents are washed away and a new reagent cycle
is initiated. Thus, most NGS is currently achieved
on spatially distinct, immobilized DNA template
libraries via cycles of nucleotide incorporation/
washing/imaging. In contrast, single-molecule
sequencing (SMS) requires no such nucleotide/
washing/imaging cycles; instead SMS directly
monitors and detects nucleotide addition on a
single DNA fragment strand. Other proprietary
features of each NGS technology are related to the
specific automated fluidic and optic technologies
applied to capture each nucleotide series for each
template library. Lastly, it is worth mentioning
that regeneration of raw image data is less expen-
sive than storage of raw data owing to the sheer
size of NGS image data files produced; hence the
collected raw data files are generally deleted once
processed.
All current commercial NGS technologies
require DNA template preparation (NGS library
construction), immobilization of said libraries,
followed by massively parallel sequencing. The
overall NGS procedure is summarized in Fig.
2.2. Proprietary nuances rest in how the stages
are achieved. In general terms, bacterial genome
sequencing is accomplished by highly redundant/
deep sequencing of random fragments making up
the genome (Fig. 2.3). Random genomic library
sequencing is sometimes referred to as ‘shotgun
sequencing’, and is often achieved with ‘single-end’
sequencing of the library fragments (although
not always – see below). In single-end sequenc-
ing, the library fragment is sequenced exclusively
from one direction. Random fragmentation of
genomic DNA is achieved using mechanical
shearing, sonication or enzymatic processes. The
random DNA fragment ends are then made ame-
nable to immobilization and/or sequencing via
end polishing and incorporation of universal but
platform-specific sequences (called ‘adaptors’).
Following immobilization, each adaptor-contain-
ing template library most often requires clonal
amplification to achieve sufficient quantities prior
to massively parallel sequencing and detection.
Amplification can be performed on tethered
libraries or in solution, but the efficiency of any
NGS platform is tied to the relative assurance that
each amplified library clone contains multiple
copies of only a single and identical DNA frag-
ment. The sole exception is the single-molecule
sequencing approach commercialized as PacBio
RS by Pacific Biosciences, which as the name sug-
gests operates on a single DNA fragment.
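The 'highly redundant/deep sequencing' of random fragments described above is commonly quantified as the average depth of coverage, c = N × L / G, for N reads of mean length L over a genome of G bases. The sketch below applies this standard relationship, together with the Poisson approximation exp(−c) for the fraction of bases left unsequenced under idealized, uniformly random sampling. The function names and the example genome size and read length are hypothetical and are not taken from any platform specification.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced (c = N * L / G)."""
    return n_reads * read_length / genome_size

def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases covered by no read."""
    return math.exp(-coverage)

def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

# Hypothetical example: a 5 Mb bacterial genome sequenced with 100-base reads.
genome_size = 5_000_000
n = reads_needed(30, 100, genome_size)
print(n, "reads give", mean_coverage(n, 100, genome_size), "x coverage")
print("expected uncovered fraction:", expected_uncovered_fraction(30))
```

Real libraries are never perfectly random, so the uncovered fraction observed in practice is generally higher than this idealized estimate.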
Modification of the standard ‘shotgun’ tem-
plate library procedure can be applied to generate
mate-pair libraries, comprised of originally dis-
continuous sequence fragments from the genome
separated by intervening nucleotides that are later
enzymatically linked together in order to undergo
sequencing. As such, mate-pair sequencing
Figure 2.2 The overall NGS procedure. Template QC and
preparation (randomly shear; end repair; size select);
NGS library construction (append sequencing adaptors and
multiplex identifiers); clonal amplification of libraries
(lay out library on sequencing slide/wells); massively
parallel sequencing (determine the order and identity of
bases at either end of each fragment); raw data analysis
(image processing and base calling, with associated
qualities).
provides a cost-efficient means of co-sequencing
discontinuous fragments (mates) from across
the genome. Knowing that the mated reads must
face each other (in most cases) across an average
known gap size, even without determining the
intervening sequence within the gap the mated
reads can be applied to generate a ‘genomic scaf-
fold’ that is useful for orienting gapped contigs
(Fig. 2.3). Paired-end sequencing refers to the
sequencing of both ends of DNA library frag-
ments and is thus performed when mate-pair
libraries are used. Paired-end sequencing can also
be applied to fragments prepared through stand-
ard, non-mate-pair (i.e. shotgun) template library
procedures. In this case, the paired reads are from
each end of the same NGS library fragment, and
are thus separated on the original genome by
only the small NGS fragment insert size. In the
literature the term ‘paired-end’ is sometimes used
to refer exclusively to this latter scenario, and
sometimes to refer to long-insert, mate-pair libraries
(e.g. in Roche pyrosequencing). Owing to differences in library
Figure 2.3 Bacterial genome sequencing by random
(shotgun) library sequencing. Genomic DNA is randomly
fragmented and used for NGS library construction,
optionally following a mate-pair library protocol;
single-end or paired-end sequencing reads are de novo
assembled into contigs; mate-pair reads provide a
genomic scaffold; gap-filling and sequence polishing
take a ‘draft’ genome to a ‘finished’ genome.
preparation protocols and sequencing method-
ologies, the expected orientations of the reads
with respect to the underlying genome can differ.
It is important to be aware of these methodologi-
cal and technology-specific details when working
with the sequence reads. It is also important to
note that discovered inconsistencies between
mate-pair read orientations and a given reference
sequence may indicate structural variation in
the new sequence or the presence of erroneous
chimeric pairings inadvertently generated during
library preparation (see ‘Evaluating a genome
assembly’ and Chapter 3 for more detail).
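The consistency check described above can be sketched in a few lines of Python. The example assumes that both reads of a pair have already been mapped to a reference, that the library protocol is expected to produce inward-facing pairs, and that an expected span and tolerance are known; for protocols that yield outward-facing mates the orientation test would need to be reversed. The class, function and category names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MappedRead:
    start: int      # leftmost reference coordinate of the alignment
    length: int     # aligned length in bases
    forward: bool   # True if aligned to the forward strand

def classify_pair(r1, r2, expected_span, tolerance):
    """Classify a mapped read pair against the expected library geometry."""
    left, right = sorted((r1, r2), key=lambda r: r.start)
    span = right.start + right.length - left.start
    if left.forward == right.forward:
        return "same-strand"        # reads do not face each other
    if not left.forward:
        return "outward-facing"     # reads point away from each other
    if abs(span - expected_span) > tolerance:
        return "unexpected-span"    # possible indel, rearrangement or chimera
    return "consistent"

pair = (MappedRead(100, 50, True), MappedRead(3050, 50, False))
print(classify_pair(*pair, expected_span=3000, tolerance=500))
```

Pairs falling outside the "consistent" category are candidates for either genuine structural variation or chimeric library artefacts, and usually need supporting evidence from several independent pairs before being interpreted.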
Failure to incorporate a single nucleotide in
a given NGS cycle may result in off-phasing, in
which some library molecules lag in their exten-
sion. When too many nucleotides are added, it
is called pre-phasing. Off-phasing and pre-phasing
cause the extracted signal intensities for a given
cycle to be muddled with noise from the preceding
or following cycles of sequencing, respectively.
Pre-phasing and off-phasing contribute to loss of
synchrony in the readout of the sequence copies
for a given library fragment. The detected signals
associated with the growing DNA strand become
increasingly muddled over time, resulting in a
quality drop-off for the called bases. Sequence
base calling accuracy generally declines as the
NGS read length increases; eventually no bases
can be accurately called. As phasing susceptibil-
ity varies with each NGS technology, the various
NGS platforms vary in the lengths and qualities of
their output sequence reads, and whether all reads
are of the same or of variable lengths. Owing to
this quality variation, it is important and neces-
sary to assess data quality at multiple instances
within any NGS project (Fig. 2.4).
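The effect of phasing on read length can be illustrated with a deliberately simplified model. Assume that in every cycle a fixed fraction of the template copies in a clone lags (off-phasing) and another fixed fraction runs ahead (pre-phasing), and that copies never fall back into step. The fraction of copies still in synchrony after n cycles is then approximately (1 − p_lag − p_lead)^n. The per-cycle rates used below are hypothetical values chosen for illustration, not measured platform figures.

```python
def in_phase_fraction(cycles, p_lag, p_lead):
    """Approximate fraction of template copies still in step after `cycles`."""
    return (1.0 - p_lag - p_lead) ** cycles

# Hypothetical per-cycle rates of 0.5% lagging and 0.5% leading strands.
for n in (25, 50, 100, 250):
    print(f"cycle {n:3d}: {in_phase_fraction(n, 0.005, 0.005):.2f} in phase")
```

Because the in-phase fraction decays geometrically, even small per-cycle phasing rates eventually leave too little coherent signal for reliable base calling, which is consistent with the quality drop-off towards the end of reads.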
With the caveat that this chapter merely
represents a snapshot in time – as dramatic
advancements in sequencing technology are con-
tinuously occurring – here we review, in general
terms, the main commercialized NGS technol-
ogy platforms in use today. Table 2.1 contains a
summary of the performance metrics (current at
Figure 2.4 Stages of NGS data analysis. Primary analysis:
analysis of hardware-generated data and sequencing run
statistics; generation of sequence reads and quality
scores. Secondary analysis: QA/QC filtering of raw reads
(read cleaning); alignment and mapping or de novo
assembly of reads; QA and variant calling on aligned and
mapped reads. Tertiary analysis: multi-sample processing;
QA/QC of variant calls; mate-pair/assembly assessment;
genome annotation and feature identification; annotation
and filtering of variants; data aggregation, curation and
improvement; interpretation and hypothesis generation
(the ‘sense making’).
Table 2.1 Performance metrics of the NGS platforms
described in this section. Readers are referred to
Liu et al., 2012; Loman et al., 2012; Quail et al., 2012
and to the manufacturers for the latest specifications.
the time of writing) for the various sequencing
approaches described in this section. Readers are
also referred to recent reviews (Liu et al., 2012;
Loman et al., 2012; Quail et al., 2012) and the
NGS technology manufacturers for the latest
specifications.
Pyrosequencing
Pyrosequencing differs from Sanger sequencing in
that the former detects direct release of a pyroph-
osphate molecule as a by-product of nucleotide
incorporation; whereas the latter detects a fluores-
cent signal emitted from size-sorted fluorescently
labelled sequence fragments. The pyrosequencing
technique, first developed in 1996 (Ronaghi et
al., 1996), has been modified for miniaturization
and automation and was incorporated into the
Roche/454 line of Genome Sequencers (GS)
made commercial in 2005 (Margulies et al., 2005).
The commercialized platforms (GS FLX/GS Jr)
are based on the sequencing by synthesis princi-
ple. As previously described, the first step in the
procedure is referred to as template preparation.
Template preparation begins with the
shearing of whole genomic DNA into random,
shorter fragments, typically by mechanical shearing,
nebulization or enzymatic processes.
Adaptors containing sequences for (A) amplifica-
tion and (B) pyrosequencing are then ligated to
the ends of the fragment. Fragments containing
the A/B adaptors are recovered and captured onto
oligo-adapted capture beads under conditions
that favour the binding of one DNA fragment
molecule per bead.
The bead-fragment library and PCR ampli-
fication reagents are placed into a ‘water-in-oil’
emulsion under conditions that favour one frag-
ment-bound bead per water droplet microreactor.
Emulsion PCR (emPCR) is employed to generate
millions of bead-bound, clonally amplified single
stranded genomic templates in each microreactor.
The emulsion is then broken and the bead-bound
templates recovered. After immobilization
and enrichment, the next GS pyrosequencing
procedural step is the actual sequencing of bead-
bound single stranded DNA templates. DNA
beads are incubated with a mixture containing
DNA polymerase and deposited into a PicoTiter
Plate™ (PTP) containing millions of wells, with
each well accommodating a single sequencing
bead. Enzyme beads containing sulfurylase and
luciferase are then added to the wells, along with
packing beads. On instrument, the four DNA
deoxyribonucleotide triphosphates (dNTPs)
are added sequentially in a controlled flow order
(e.g. T then C, A, G) across the PTP plate, during
which only those dNTPs complementary to the
template strand are incorporated into the growing
strand. Incorporation of one or more nucleotides
liberates hydrogen and pyrophosphate. The gen-
erated pyrophosphate is converted into a light
signal by the enzymatic action of sulfurylase and
luciferase, and the light signal is recorded by a
charge-coupled device (CCD) camera at the end of
every nucleotide flow. The total light signal gener-
ated is proportional to the number of nucleotides
incorporated into the growing strand during the
single flow cycle; thus, homonucleotide stretches
generate an enhanced signal relative to single
nucleotide incorporations, and corresponding to
the number of added nucleotides. The successive
addition of dNTPs creates a series of light signals
for each well, which at the completion of each flow
are recorded initially as TIFF images with signal
quality information. When non-complementary
dNTPs are flowed and are not incorporated, no
light signal is generated for that nucleotide flow
and the well has no light emitted for that flow
cycle. Later, these TIFF images are computation-
ally processed yielding called bases in a sequence
‘flowgram’ with corresponding quality scores for
each called base.
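The conversion of per-flow light signals into bases can be sketched as follows. The example assumes that signals have already been normalized so that a single incorporation gives a value close to 1.0, and it simply rounds each value to the nearest whole number of incorporated nucleotides. The flow order follows the T, C, A, G example given above; the signal values and function name are hypothetical, and production base callers use considerably more elaborate signal models.

```python
FLOW_ORDER = "TCAG"  # nucleotide flowed in each successive flow, repeated

def call_flowgram(signals, flow_order=FLOW_ORDER):
    """Convert normalized flow signals into the synthesized base sequence."""
    bases = []
    for i, signal in enumerate(signals):
        count = int(signal + 0.5)  # nearest whole number of incorporations
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)

# Hypothetical signals for two rounds of T, C, A, G flows.
signals = [1.02, 0.05, 0.96, 2.10, 0.03, 1.07, 0.00, 0.98]
print(call_flowgram(signals))  # TAGGCG
```

The value 2.10 in the fourth flow is called as a run of two G nucleotides. As signals for longer homopolymers become harder to distinguish (for example 7.4 versus 7.6), rounding errors of this kind translate directly into insertion or deletion errors.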
A major disadvantage for pyrosequencing
is difficulty in accurately quantitating signal
strengths for stretches of homonucleotides owing
to a phenomenon of light signal decay; homo-
nucleotide tracts (termed homopolymers) are
accurately called only to roughly eight consecutive
nucleotides. When inaccurate base calling occurs,
the result is an incorrectly inferred insertion or
deletion (indel) error.
Semiconductor sequencing
Semiconductor sequencing uses chemistry simi-
lar to pyrosequencing to generate sequence reads.
The major difference between the two approaches
is that semiconductor sequencing employs instru-
mentation that detects the produced hydrogen
ions during nucleotide incorporation, rather than
pyrophosphate. The resultant concentration of
hydrogen ion liberated during dNTP incorporation
affects the pH within the microwell, which is in
turn detected by an ion-sensitive field-effect tran-
sistor. As with pyrosequencing, the concentration
of hydrogen ions is proportional to the number of
incorporated dNTPs in the growing strand, and
as such the semiconductor sequencing platform is
still subject to homopolymer-induced indel errors.
However, as the nucleotide incorporation signal is
measured directly as an electrical pulse, no signal
conversion is required. The resulting benefit is
substantially reduced run times relative to Roche
pyrosequencing and simplified detector construc-
tion. Although work is under way to enhance
clonal enrichment, semiconductor sequencing
currently still requires emPCR for library enrich-
ment prior to sequencing. The technology has
been incorporated into the Life Technologies
line of commercial sequencers, including the Ion
Torrent Personal Genome Machine® (PGM™) and
Ion Proton™.
Sequencing by synthesis
As with Roche pyrosequencing and semiconduc-
tor sequencing, Illumina sequencing is based on
the in situ sequencing by synthesis (SBS) principle.
The main distinct features of the Illumina-based
SBS approach are a solid-support amplification
method deployed for template fragment library
enrichment (termed ‘bridge-PCR’) and use of
fluorophore-labelled reversible terminator nucleotide
chemistry during massively parallel SBS. Template
preparation incorporates universal adaptors that
are complementary to ‘anchor’ oligonucleotides
covalently linked to an immobilized glass surface
termed a ‘flow cell’. Adapted genomic fragment
libraries are then affixed to the glass slide surface
via annealing to these anchor oligos on the flow
cell. After annealing, adaptor-containing template
DNA molecules are then clonally amplified in a
modified, isothermal PCR reaction (the bridge-
PCR step) in which the DNA molecules are free
to flex and form a ‘bridge’ with a second nearby
flow cell functionalized oligonucleotide. Formed
bridges then undergo isothermal extension in situ
on the glass surface. This ‘cluster generation’ process
generates individualized clusters each containing
approximately 1000 copies of identical, clonally
amplified DNA molecules right on the flow cell
surface, and millions of library clusters with a
diameter of ~1μm per flow cell. DNA clusters are
then readied for sequencing via denaturation to
a single molecular strand (termed ‘linearization’)
followed by blocking of the free 3′ ends of the
cluster fragments and hybridization of a sequenc-
ing primer. During each sequencing cycle, all
four (reversibly) 3′-blocked dNTPs with corre-
sponding fluorescent dye-bound terminators are
simultaneously flowed and incorporated into the
growing nucleotide chain by DNA polymerase at
each cluster. The dye-terminator dNTPs ensure
that each chain is extended by a single nucleotide,
which is identified via its (label) fluorescence
emission. At the end of the flow cycle, the flow cell
is digitally imaged by a CCD and then the fluores-
cent dye is removed by washing, enabling another
sequencing cycle.
Base calling is achieved by measuring the
signal intensity via fluorescence emission at
each cluster during each sequencing cycle. Such
base-by-base sequencing shows fewer base call-
ing errors associated with homonucleotide runs
than pyrosequencing. In contrast, this short read
SBS technology is primarily subject to random
base substitution errors and so-called G-motif
(GGCxG) errors. Sequence reads generated with
SBS technology are shorter than pyrosequencing
reads, but are less expensive to generate and can
be produced readily with higher throughput; fea-
tures that have made this NGS platform extremely
popular. Illumina Inc. markets a number of
sequencers incorporating this technology; popu-
lar models include the HiSeq line of instruments
and the more recently introduced MiSeq Personal
Sequencer.
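In its simplest form, the base-by-base calling described above amounts to choosing, for each cluster and cycle, the fluorescence channel with the strongest signal. The sketch below does this and also masks calls whose brightest channel does not clearly dominate the second brightest. The intensity values, the purity measure and its threshold are hypothetical; this is not the vendor's base-calling algorithm, which additionally corrects for channel cross-talk and phasing.

```python
CHANNELS = "ACGT"  # order of the four intensity values in each cycle

def call_cycle(intensities):
    """Return (base, purity) for one cluster in one sequencing cycle."""
    ranked = sorted(zip(intensities, CHANNELS), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    # Purity: share of the two brightest channels taken by the brightest.
    purity = top / (top + second) if top + second > 0 else 0.0
    return base, purity

def call_read(cycles, min_purity=0.6):
    """Call one base per cycle, masking ambiguous cycles with 'N'."""
    return "".join(base if purity >= min_purity else "N"
                   for base, purity in map(call_cycle, cycles))

# Hypothetical intensities (A, C, G, T) for four cycles of one cluster.
cycles = [(900, 40, 30, 20), (30, 35, 850, 25),
          (400, 390, 20, 10), (10, 20, 30, 950)]
print(call_read(cycles))  # AGNT
```

In the third cycle two channels are almost equally bright, as might happen in a mixed or badly phased cluster, so the call is masked rather than reported as a likely substitution error.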
Sequencing by oligonucleotide ligation
and detection
The SOLiD (Sequencing by Oligonucleotide
Ligation and Detection) system commercialized
by Applied Biosystems applies a collection of
fluorophore-labelled oligonucleotides and liga-
tion enzyme chemistry to achieve sequencing.
Sequencing is actually comprised of multiple
rounds during which eight base probes contain-
ing a fluorophore are sequentially ligated to the
template sequence to build up a complementary
strand. Each round consists of the following: a
priming step; a repeated cycle of ligating probes
to the template; excitation and imaging of the
fluorescence emission; and lastly, cleavage of the
fluorophore and a terminal part of the bound
probe prior to the next cycle of chemistry. Probe/
fluorophore combinations have been designed
to interrogate the first two of the eight ligated
positions in the template, with each of four fluo-
rophore colours used to indicate four of the 16
possible nucleotide pairs at these positions. The
emission of the fluorophore for each template is
recorded and applied later to infer the nucleotide
sequence of the template strand (output read).
SOLiD template libraries are prepared via
random fragmentation to an appropriate size
range; end repair, followed by ligation of ‘P1’ and
‘P2’ DNA adaptors to the ends of the fragments.
EmPCR is deployed to immobilize the adapted
DNA onto ‘P1’-coated paramagnetic beads. High-
density semi-ordered arrays (polonies) of the
DNA templates are then generated by function-
alizing the 3′-ends of libraries and immobilizing
the beads onto a solid glass slide. Sequencing is
achieved by cyclic ligation of the pool of uniquely
labelled, partially degenerate, fluorescently
labelled DNA octamers, containing all possible
dinucleotide variations of the two-base ‘recogni-
tion core’. When these complementary detection
probes anneal to the immobilized template, they
are ligated to the primer. After imaging the fluores-
cence emission, strands not extended are capped
and the fluorophore and last three bases are
cleaved from the probe. With this cleavage, the
strand is available for extension in the next cycle
of ligation-based sequencing; but now being initi-
ated 5 bases upstream (n minus 5; i.e. n−5) from
the priming site. The repeated ligation of eight
additional bases and the cleaving of a terminal
three bases ensure that the pair of bases of tem-
plate sequence being examined then moves on
by five positions every cycle. After seven cycles,
the complementary strand is melted from the
template to leave the template ready to be primed
for the next round. Each round applies a different
primer such that positions interrogated on the
target library by the probes change each time; for
example, on the first round the first position of the
probes on cycles 1, 2, … corresponds to the tem-
plate sequence positions 1, 6, …; on round 2 these
become template sequence positions 2, 7, …, etc.
After seven sequencing cycles, the first sequenc-
ing primer is removed and a second sequencing
primer is hybridized to the template strand (at the
n-1 site). In total, five distinct sequencing primers
(n, n-1, n-2, n-3 and n-4) are required for SOLiD.
The SOLiD system uses a complex algorithm
to deconvolute the fluorescence signals from
labelled oligonucleotides in the off-set sequencing
series, and reports the output in ‘colour space’. This
peculiar output data is one downside of SOLiD
since it generally necessitates the use of a reference
sequence and many bioinformatics programs do
not support colour space reads. However, newer
SOLiD instruments can generate conventional
base-space output through the use of an ECC
(Exact Call Chemistry) module. ECC augments
the two-base-encoding chemistry (achieving 2×
redundant sampling of each base by the dinu-
cleotide recognition core structure of the octamer
(detection) oligonucleotides) with an additional
round of ligation, using an alternative set of three-
base encoding probes interrogating three positions.
Each cycle of this round then interrogates posi-
tions 1, 2 and 4,… of each five-nucleotide block
of the template. The same four fluorophores are
used, each now indicating the presence of one of
16 of the possible 64 combinations of nucleotides
at the positions interrogated. These three-base
encoded colour calls are then used to detect and
improve base miscalls made in previous (two-base
encoded) rounds. After imaging, templates to
which a probe failed to ligate have their previous
probe de-capped (i.e. dephosphorylated), such
that they cannot be extended in future cycles.
Thus, owing to ECC, the SOLiD system achieves
good sequencing accuracy and fewer ‘phasing’
issues compared to other NGS technologies.
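Two-base encoding and its consequences for downstream software can be illustrated with the scheme commonly used to describe colour space: each base is assigned a two-bit code (A=0, C=1, G=2, T=3) and the colour reported for a pair of adjacent bases is the exclusive-or of their codes, giving four colours for the 16 possible dinucleotides. The sketch below assumes this description; decoding requires a known first base (in practice the last base of the adaptor), and the function names are illustrative.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Translate a base sequence into colours, one per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colours):
    """Rebuild the base sequence from a known first base and the colours."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASE[CODE[bases[-1]] ^ colour])
    return "".join(bases)

colours = encode("ATGGCA")
print(colours)                 # [3, 1, 0, 3, 1]
print(decode("A", colours))    # ATGGCA

# A single miscalled colour corrupts every base downstream of it.
print(decode("A", [3, 2, 0, 3, 1]))  # ATCCGT
```

This error propagation is why colour-space reads are usually aligned to a reference in colour space and translated to bases only afterwards: a true single-nucleotide variant changes two adjacent colours, whereas an isolated colour change is more likely a measurement error.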
Several features of the SOLiD platform are
noteworthy. The SOLiD chemistry is non-intui-
tive. A simple and rapid on-instrument ‘Wildfire’
template preparation technology (released in
2012) has accelerated the previous 8 hour sample
preparation methodology to a 2 hour preparation
time, and has reduced the cost per base of SOLiD a
further 50%. Life Technologies markets the Wild-
fire chemistry for use with 5500 W and 5500 XL
W Genetic Analyzers. A downside of sequencing
by ligation is a fundamental limitation in achiev-
able read length, which restricts the technology to
shorter reads relative to other conventional NGS
technologies (75bp), which in turn complicates
sequence assembly.
Single-molecule sequencing
Single-molecule sequencing (SMS) is considered
the next evolution of sequencing. Although a
number of SMS approaches are under develop-
ment, to date only one has been commercialized.
The single-molecule real-time (SMRT) sequenc-
ing approach, developed by Pacific Biosciences,
applies zero-mode waveguide (ZMW) technology
to tightly constrain the signal area generated from
the sequencing of a single DNA template within a
specially fabricated, deep well nanostructure on a
glass surface (Korlach et al., 2010). During library
preparation, hairpin loop adaptors are ligated to
both ends of double-stranded DNA inserts. A
tethered Phi29 polymerase, a highly processive,
strand-displacing enzyme capable of performing
rolling circle amplification, acts on the resulting
topologically circular SMRTbell™ DNA template libraries.
Sequencing is performed on a chip containing
SMRT cells, each containing 150,000 arrayed
ZMWs. The four nucleotide bases are each con-
structed with a unique fluorescent dye attached
to the phosphate chain. Applying an anchored,
unmodified polymerase that is physically tethered
to the base of a deep ZMW well, only fluorescence
intensity resulting from addition of a fluorescently
labelled nucleotide at the bottom of the well is
recorded. Upon base incorporation, the fluores-
cent dye is cleaved. The dye then diffuses out of
the small ZMW detection volume such that signal
is no longer detected. Nucleotide incorporation
is recorded by CCD, in real-time, for each DNA
template in the form of a time-trace of detected
fluorescence intensity from each ZMW; the poly-
merase then translocates to the next position and
the sequencing cycle is repeated.
The SMRT technology can generate long reads
(average 5000 bases; up to 20,000 bases with
the latest instrumentation, PacBio RS II). When
first introduced, the platform exhibited relatively
low read accuracy (~87%); however, this has
since improved (Table 2.1). Continuous long
reads (CLRs) are generated when the adaptors
located within a raw sequence read from a ZMW
are removed; the read is split into multiple ‘sub-
reads’. A full-pass subread is a subread with two
observed adjacent adaptors. Large DNA inserts
allow for lower accuracy CLRs that can help span
gaps or larger repetitive regions, and are gaining
popularity for scaffolding existing de novo genome
assemblies.
For variant detection applications and to
overcome high error rates associated with SMRT
CLR sequencing, an alternative SMS sequencing
approach termed ‘circular consensus sequenc-
ing’ (CCS) may be applied. Again leveraging
topologically circularized SMRTbell™ library
constructs, for CCS – shorter inserts are used. A
CCS sequence is generated by collapsing multiple
subreads from a single ZMW to form a single,
higher-accuracy consensus read; the same insert
is read on both the sense and the antisense strands
multiple times (redundancy varies with insert
size), thereby improving the final consensus read
accuracy. Using 1–2kb insert SMRTbell™ libraries
and CCS, one can achieve error correction for the
longer CLRs prior to assembly.
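The gain in accuracy from reading the same insert several times can be estimated with a simple binomial model. The sketch assumes that each pass miscalls a given position independently with the same probability, and that the consensus is wrong only when more than half of the passes are wrong. A per-pass error of 0.13 corresponds to the ~87% raw read accuracy quoted above. The model ignores insertions and deletions and treats all wrong calls as agreeing with one another, so it is a rough, pessimistic illustration rather than a description of the actual consensus algorithm.

```python
from math import comb

def consensus_error(per_pass_error, passes):
    """P(a majority of an odd number of independent passes is wrong)."""
    needed = passes // 2 + 1
    return sum(comb(passes, k)
               * per_pass_error ** k
               * (1 - per_pass_error) ** (passes - k)
               for k in range(needed, passes + 1))

for passes in (1, 3, 5, 9, 15):
    print(f"{passes:2d} passes: consensus error {consensus_error(0.13, passes):.5f}")
```

Because the errors are assumed to be random rather than systematic, the consensus error falls rapidly with the number of passes, which is also why modest coverage can yield high consensus accuracy for this platform.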
The large footprint and capital expenditure
requirement, as well as the earlier low read
accuracy has limited the popularity of this SMS
platform relative to other mainstream NGS
technologies. However, SMRT sequencing
errors generally occur randomly throughout the
sequence read, which means high consensus accu-
racy can be achieved at relatively low coverage
relative to other NGS technologies. The platform
offers short run times (30–180min) and samples
need not be amplified prior to sequencing. Con-
sequently, there is growing interest in the PacBio
platform for rapid de novo microbial genome
sequencing, structural variation analysis, and
genome closure.
Emerging sequencing technologies
Novel sequencing technologies are constantly
under development, and expected to offer faster
and cheaper sequencing. Several additional
approaches of single-molecule sequencing,
through nanopore structures, have been demon-
strated with the hope of rapidly sequencing larger
(native) DNA fragments without the need for
prior time-consuming library template work-ups.
A commercial version of nanopore sequencing
will likely emerge in the not too distant future.
Owing to space constraints, the reader is referred
to excellent summaries of emerging sequencing
technologies provided elsewhere (Venkatesan
and Bashir, 2011; Wanunu, 2012).
Quality assurance of sequence
reads
Regardless of the sequencing technology applied
to generate NGS data, before using the raw
sequences end users should conduct an objec-
tive check to assess the quality of each dataset.
For example, it is important to ensure that reads
are free of adaptor sequences, contaminant
sequences, notable artefacts such as low complex-
ity or excessive duplicate reads, and sequences of
unacceptably low quality; otherwise these issues
may confound downstream analyses. This not only
ensures that the best quality data goes forward to
later analysis, it may also save time in the long run
by flagging problematic reads before downstream
analysis problems are encountered. The adage
‘garbage in, garbage out’ is very true in the
context of genome sequencing and assembly.
A useful means for accomplishing assessment
of NGS data soundness is FastQC
(http://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc),
a quality control tool for high
throughput sequence data. FastQC takes an input
file in FASTQ format (a widely used format for
NGS data that includes read sequences and qual-
ity scores) and runs a series of tests to generate
a comprehensive collection of useful QC assess-
ments, statistics, and plots for visualizing the data.
Other NGS utilities are increasingly available,
and now in pipeline format (Blankenberg et al.,
2010; Dai et al., 2010; Brouwer et al., 2012). Users
should routinely process data with
data analysis pipelines, as such systematic and
recordable approaches make documentation
and troubleshooting of analysis procedures much
easier.
It is highly advisable to remove problematic
data prior to embarking on time-consuming downstream
analyses. Read quality is initially assessed
by looking at the read length distribution, Phred
quality distribution, nucleotide frequencies, and
read complexity. Low-quality read sets (showing
uncharacteristically short read distributions or
low quality scores) should be investigated or pos-
sibly regenerated prior to downstream analysis.
Read cleaning is achieved either by read filtering,
wherein reads that fail to meet certain criteria are
removed, or by read trimming which removes low-
quality 3′ bases. It is relatively easy to find adaptor
contamination or biases produced during the
library construction protocol by plotting the fre-
quency of each nucleotide for each position or by
conducting homology searches (e.g. via BLAST
analysis). If not all positions exhibit
equivalent nucleotide frequencies in what
should be random genomic DNA fragments, then
the presence of adaptors within the reads should
be suspected and the reads may need trimming.
By trimming and removing low quality reads, the
average sequencing quality of the remaining reads
is improved, albeit with the loss of some minimal
information.
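The read cleaning steps described above (3′ quality trimming, length filtering, and per-position nucleotide frequencies) can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular tool; the thresholds `min_q=20` and `min_len=30` are assumed defaults chosen for the example, and all function names are hypothetical.

```python
from collections import Counter


def trim_3prime(seq, quals, min_q=20):
    """Remove trailing (3') bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def clean_reads(reads, min_q=20, min_len=30):
    """Trim each (sequence, qualities) pair, then drop reads shorter than min_len."""
    kept = []
    for seq, quals in reads:
        seq, quals = trim_3prime(seq, quals, min_q)
        if len(seq) >= min_len:
            kept.append((seq, quals))
    return kept


def base_frequencies(seqs):
    """Return, for each read position, the fraction of reads showing each base."""
    columns = []
    for seq in seqs:
        for pos, base in enumerate(seq):
            if pos == len(columns):
                columns.append(Counter())
            columns[pos][base] += 1
    return [
        {base: n / sum(col.values()) for base, n in col.items()}
        for col in columns
    ]
```

A position at which one base dominates across reads of supposedly random genomic fragments would be a candidate for adaptor contamination, in line with the reasoning in the text.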
Knowing that NGS technologies are subject
to base calling inaccuracies, error correction to
improve sequence accuracy can be attempted to
correct errors in the individual reads. Programs
with this aim collect all reads (or parts of reads)
that correspond to the same (likely) genomic
region based on multiple sequence alignments
and assume that any low frequency and random
changes observed have resulted from sequenc-
ing errors. Error correction is made based on
the majority nucleotide call in the read col-
lection. Methods employing analysis of DNA
k-mer (DNA ‘words’ of length k) frequencies in
whole-genome sequence are also available. Read-
ers are referred to an excellent review of error
correction methods for NGS (Yang et al., 2013).
These authors have made their evaluation tools
available (http://aluru-sun.ece.iastate.edu/doku.php?id=ecr#reference). Another useful program
for read cleaning is Nesoni (http://www.vicbioinformatics.com/software.nesoni.shtml).
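The majority-call idea described above can be illustrated with a toy Python sketch. It assumes the reads have already been placed on a shared coordinate system (the multiple alignment step is not shown), and the 0.7 support threshold and the name `correct_pileup` are assumptions made for this example rather than values taken from any published tool.

```python
from collections import Counter


def correct_pileup(aligned_reads, min_support=0.7):
    """Correct bases that disagree with a well-supported majority call.

    aligned_reads: list of (start, sequence) tuples on a shared coordinate system.
    A base is replaced only if the majority base at that column accounts for
    at least min_support of all reads covering the column.
    """
    # Count the bases observed at every covered coordinate.
    columns = {}
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            columns.setdefault(start + i, Counter())[base] += 1

    corrected = []
    for start, seq in aligned_reads:
        fixed = []
        for i, base in enumerate(seq):
            counts = columns[start + i]
            top_base, top_n = counts.most_common(1)[0]
            if base != top_base and top_n / sum(counts.values()) >= min_support:
                base = top_base  # treat the rare variant as a sequencing error
            fixed.append(base)
        corrected.append((start, "".join(fixed)))
    return corrected
```

Note that such a scheme cannot distinguish a sequencing error from a genuine low-frequency variant, which is one reason real error-correction programs are considerably more elaborate.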
Sequence assembly
Here we provide a brief overview of sequence
assembly and read mapping approaches. It is
important to note that sequence assembly is cur-
rently a very active area of bioinformatics research
owing to the availability of low-cost, high-throughput
sequencing technologies. Consequently,
there is a vast array of bioinformatics tools cur-
rently available, and there will undoubtedly be
better ones created as the field evolves. Many of
these programs and their underlying algorithms
are specialized for particular types of sequence
technology or sequence libraries. Thus, it is
important to choose programs and approaches
that are appropriate for the dataset(s) in hand and
questions of interest, and to also regularly revisit
(and likely revise) assembly strategies as new
52. Genome Sequencing and Assembly | 37
software tools are constantly released. Moreover,
inherent flexibility is encouraged as NGS data
characteristics improve at a rapid pace with each
technology enhancement (for example, increasing
read lengths and accuracies), and as new technol-
ogy becomes available. Hence, the field is likely
to remain unsettled for some time. We discuss
some of the currently available tools and their
applications at the end of this chapter. Additional
information is found in Chapter 3, ‘Genome
assembly – methods, tools and improvement’.
Primer walking
Early genome sequence assembly was carried
out using the technique of primer walking, first
introduced in 1993 (Voss et al., 1993). The pro-
cess begins with the preparation of a library of
large genomic DNA fragments. Genomic DNA
is partially digested by endonuclease restriction
enzymes, the fragments separated by gel electro-
phoresis, and the separated fragments ligated into
large-insert DNA vectors such as cosmids, fos-
mids, bacterial artificial chromosomes (BACs), or
yeast artificial chromosomes (YACs). The vector
DNA is then inserted into a suitable host organism
– typically bacteria – and propagated for template
augmentation and retrieval. Genomic libraries are
designed to contain several-fold, random coverage
of the genome.
The large genomic fragments contained in the
aforementioned, low-copy genomic clone librar-
ies are not directly suitable for Sanger sequencing
and generally require enrichment prior to
sequencing. Template augmentation can be
achieved by sub-cloning insert DNAs as random
fragments into higher-copy sequencing vectors.
DNA inserts from each low-copy vector are iso-
lated and randomly sheared to a size of 2–10kb.
These randomly overlapping fragments are then
selectively subcloned into a sequencing vector
library containing annealing sites complementary
to the universal primers used to initiate sequenc-
ing reactions. The sequencing vector libraries
are then sequenced usually from both ends (on
both strands) by Sanger sequencing to acquire
the unknown sequence for each DNA insert. This
initial sequencing step identifies the first ~1000
bases of genome sequence from each end. Primers
are subsequently designed (based on the acquired
novel insert sequence) to anneal to the newly
acquired (known) end(s) of high-confidence
sequence, and used to extend further into the
unknown insert region in subsequent rounds of
sequencing. The procedure is then repeated with
each cycle of primer ‘walking’ into the unknown
portion of the insert DNA until sufficient overlap
is achieved and it is completely sequenced, or
found to match sequence from another genomic
library fragment with which it can be merged and
extended.
The main drawback of primer walking is
the requirement for identifying and synthesiz-
ing different primers after each new sequence
data acquisition step, making it a slow and
largely manual approach for acquiring unknown
sequences and closing sequence gaps. Hence,
to overcome this limiting primer synthesis/
sequence/hold/repeat strategy, shotgun sequenc-
ing assembly was developed and has since become
adopted as the routine, current-day strategy for
genome sequence assembly.
Shotgun sequence assembly
The origins of whole-genome shotgun sequencing
can be traced as far back as 1979 (Staden, 1979),
although this initial method was suitable only for
relatively small genomes (under 10kb). The first
large-scale approach to whole-genome shotgun
sequence assembly was used to assemble the
1.8Mb genome of the bacterium Haemophilus
influenzae (Venter et al., 1996). Briefly, two librar-
ies consisting of ~2kb and 15–20kb inserts were
prepared from random, mechanically sheared
DNA, and then nearly 30,000 dye-terminator
sequencing reactions were completed over a
period of three months. The more than 11Mb of
accumulated sequence read data were assembled
using the TIGR ASSEMBLER program, which
uses a greedy merging heuristic (problem solving
technique) to combine overlapping fragments.
This software identifies and scores all possible
overlaps between reads, with larger overlaps
receiving higher scores. The two reads with the
highest scoring overlap are then merged into a
single sequence. This merging step is repeated
using the previously merged sequences and
reads until no more sequences can be combined.
Approximately 210 H. influenzae contigs were
obtained in this manner, which were further
merged into 140 contigs using lower-stringency
sequence comparisons among contigs. The
contigs were then ordered using a variety of
confirmatory wet laboratory approaches such as
Southern blotting and PCR, and the remaining
gaps were filled with a primer walking strategy.
This landmark achievement set the stage for the
widespread adoption of shotgun sequencing,
although present day sequencing and assembly
methods have since matured.
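The greedy merging heuristic described above can be sketched in Python as follows. This is a toy illustration of the general idea (score all overlaps, merge the best pair, repeat), not a reimplementation of TIGR ASSEMBLER; it assumes error-free reads from a single strand, exact suffix-prefix overlaps, and a hypothetical minimum overlap of 3 bases.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads wholly contained in another read add no new information.
    seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return seqs  # no more sequences can be combined
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)
```

Because each merge is chosen locally, a repeat longer than the reads can cause two unrelated regions to be joined, which is the weakness of greedy assembly discussed in the following section.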
Modern de novo assembly approaches
A challenge when reconstructing a genome
sequence from overlapping sequence reads is that
repeats cause ambiguities that make the task of
deducing the correct assembly difficult (or even
impossible) depending on the characteristics of
the reads and of the repeats (Nagarajan and Pop,
2009). Although greedy assemblers (such as used
for the H. influenzae genome) successfully apply
heuristics to avoid some issues caused by repeat
sequences, they have been superseded by more
advanced programs that better consider the global
relationships among reads, making them better
suited to the abundant short reads generated
by modern NGS instruments. Here we briefly
describe two widely used assembly strategies:
overlap–layout–consensus and De Bruijn graph.
For more detailed descriptions of these and other
assembly methods see Chapter 3, and recent
reviews of assembly techniques for NGS data
(Miller et al., 2010; Nagarajan and Pop, 2013).
The overlap–layout–consensus (OLC)
strategy takes advantage of techniques from the
field of graph theory, which uses mathemati-
cal structures, called graphs, to model pairwise
relationships between objects. These graphs,
unlike the plots used to visualize functions or
relationships between variables, consist of nodes
(also called vertices) some of which may be con-
nected by links (also called edges) indicating a
relationship between the nodes (overlaps in the
case of sequence reads). Once data is depicted in
this manner, a variety of algorithms can be used
to analyse the graph and infer the relationships of
the sequences represented by nodes. OLC assem-
blers construct a graph by first comparing each
read to every other read, to identify base overlaps
using a ‘seed-and-extend’ approach. This step
is computationally intensive, but lends itself to
parallelization. Overlaps are then represented as
links in the graph, connecting nodes (reads). The
assembler then refines and interprets the graph
to identify potential contigs, which are identified
as nonintersecting paths in the graph. Finally,
these paths are converted to consensus sequences
(consisting of the most abundant nucleotide
at each overlapping position)
through the alignment of the constituent reads.
An unordered listing of non-intersecting contigs
is generated, with gaps of unknown sequence (and
of unknown size) between contigs.
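The three OLC stages can be illustrated with a small Python sketch. It makes strong simplifying assumptions: reads are error-free and from one strand, overlaps are exact suffix-prefix matches found by brute force rather than seed-and-extend, and the consensus step simply concatenates the non-overlapping parts instead of aligning reads and taking a majority call. All function names and the minimum overlap of 3 are hypothetical.

```python
def suffix_prefix(a, b, min_len):
    """Longest proper suffix of a matching a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """Overlap: nodes are reads, graph[a][b] is the overlap length of a -> b."""
    graph = {r: {} for r in reads}
    for a in graph:
        for b in graph:
            if a != b:
                n = suffix_prefix(a, b, min_len)
                if n:
                    graph[a][b] = n
    return graph


def layout(graph):
    """Layout: return non-intersecting paths of unambiguous overlaps."""
    indeg = {r: 0 for r in graph}
    for a in graph:
        for b in graph[a]:
            indeg[b] += 1

    def unique_next(a):
        # Follow an edge only if it is the sole way out of a and the sole way into b.
        if len(graph[a]) == 1:
            (b,) = graph[a]
            if indeg[b] == 1:
                return b
        return None

    continues_a_path = {unique_next(a) for a in graph} - {None}
    paths = []
    for start in graph:
        if start in continues_a_path:
            continue  # this read is reached from another read's path
        path, nxt = [start], unique_next(start)
        while nxt is not None and nxt not in path:
            path.append(nxt)
            nxt = unique_next(nxt)
        paths.append(path)
    return paths  # note: purely circular paths are not reported by this sketch


def consensus(path, graph):
    """Consensus (simplified): spell the contig by appending non-overlapping tails."""
    contig = path[0]
    for a, b in zip(path, path[1:]):
        contig += b[graph[a][b]:]
    return contig
```

The all-against-all loop in `overlap_graph` is the computationally intensive step mentioned in the text; each pair can be compared independently, which is why it parallelizes well.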
De Bruijn graph assemblers draw heavily from
previous developments in the field of computing
science, and also construct a graph, albeit from
overlapping, fixed-size subregions of the reads
called k-mers, where k can typically be specified by
the user and is generally substantially shorter than
the read lengths themselves. There is an important
computational advantage to this approach, as all
reads do not need to be explicitly compared to all
others using sequence alignment tools but instead
can be rapidly summarized using an efficient
data structure called a hash table, which is in this
case an organized set of sequences all of length k.
The assembly process, as in the case of the OLC
approach, involves refining and interpreting the
graph and then returning sequences based on
the remaining paths. De Bruijn graph assemblers
look for exact overlaps of fixed length between
fragments and are consequently sensitive to
sequencing errors, although some sequence errors
can be identified and eliminated by a topological
analysis of the graph. Thus, it is recommended that
reads be quality filtered and trimmed to remove
low-confidence bases prior to assembly using a De
Bruijn graph-based tool (Salzberg et al., 2012).
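A De Bruijn graph can be built from reads with very little code, as the following Python sketch suggests. It uses one common formulation in which nodes are (k-1)-mers and each distinct k-mer forms an edge; it assumes error-free, single-stranded reads, ignores k-mer coverage, and does not perform the graph refinement that real assemblers apply. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a De Bruijn graph: each distinct k-mer links its prefix to its suffix."""
    kmers = set()  # Python sets are hash tables, so no read-to-read alignment is needed
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs(graph):
    """Spell the maximal non-branching paths of the graph as contig sequences."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(node):
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    result = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior of a path, visited from its start node
        for succ in sorted(graph.get(node, ())):
            contig = node + succ[-1]
            while one_in_one_out(succ):
                (succ,) = graph[succ]
                contig += succ[-1]
            result.append(contig)
    return result  # isolated cycles are not reported by this sketch
```

A single miscalled base in a read introduces up to k erroneous k-mers, and therefore spurious branches in the graph, which illustrates why quality filtering before assembly is recommended.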
While OLC and De Bruijn approaches are both
theoretically capable of producing high-quality
assemblies, the short sequence read lengths of
NGS technologies make assembly much more
challenging than for conventional (Sanger)
sequence data. One of the reasons for the diver-
sity of assembly programs available is that each
applies different strategies to address particular
real-world nuances associated with each NGS
technology. For example, programs are adapted
to the read lengths, error characteristics, or insert
sizes produced by particular sequencing plat-
forms. Just as the greedy algorithms employ extra
strategies to deal with repeated elements, so too
must these assembly approaches for NGS data;
many assemblers are able to incorporate informa-
tion from other sources, such as information from
mate-pair reads. Newer iterative approaches, such
as IMAGE (Tsai et al., 2010) and PRICE (Ruby et
al., 2013), which use mate-pair (or paired-end) read
information, have been developed to facilitate
contig gap closure. Other challenges include
non-uniform sequence coverage, regions com-
pletely lacking sequence data (non-sampled or
of problematic quality), and intrinsic sequencing
errors. Both the OLC and De Bruijn assemblers
are able to recognize certain graph structures that
tend to arise from such issues, and apply a variety
of strategies to attempt to resolve problematic
structures. Most assemblers have several adjust-
able parameters that can sometimes drastically
alter assembly outcome; these parameters need
to be determined empirically for some bacterial
datasets. For these reasons, extensive testing of
assembly programs and parameters on real data (as
opposed to simulated data) is necessary to make
informed decisions regarding their performance
in bacterial genome assembly. Fortunately, there
are ongoing efforts in the research community to
standardize such parameter testing and improve
genome assembly evaluation; for example, see
below and Magoc et al. (2013).
Evaluating a genome assembly
Once a draft genome is assembled from sequence
reads, it is important to examine the quality of
the result prior to embarking on time consuming
downstream work. An assembly represents a best
approximation of the genome determined in silico,
based on the input data and assembly parameters
selected. Although there may be existing stud-
ies of the performance of a particular assembly
program, likely conducted through comparing
high-quality finished genomes to those recon-
structed automatically from raw or simulated
sequence data, the results may not predict well
the quality for another new genome assembly
made for a different bacterial species harbouring
different repetitive sequences. Even when such evaluations
are available, owing to the rapid evolution of the
NGS field, the characteristics of any newly acquired
raw reads likely differ from those of the data previously
used to evaluate the assembly software.
Computational methods that reliably assess
the accuracy of genome sequence assemblies
are currently lacking, but becoming increasingly
important as assembly processes become more
automated and larger numbers of genomes are
being simultaneously processed. A variety of
assembly metrics have been used in the literature,
and the choice of which metric(s) to evaluate
and report is the subject of debate. Most genome
assemblers report information on the numbers
and sizes of the resultant contigs. Typically
included among these measures is a value called
the N50 size, which is the length of the smallest
contig such that 50% of the total contig length
is contained in contigs of that length or greater. Larger
N50 sizes are typically interpreted as indicating
a better assembly; however, N50 sizes only tell
part of the story. Incorrect merging of contigs will
lead to larger N50 values but generate an assem-
bly that is not consistent with the true genome
sequence. Overall, such crude metrics are only
qualitative and hence, insufficient to describe
genome assembly completeness; as mentioned,
they also overlook potential mis-assemblies. For
this reason additional assembly metrics should be
examined. If a reference genome is already availa-
ble for the newly sequenced species, reads can be
aligned and mapped to this reference. If no refer-
ence genome is available, the reads can be aligned
and mapped against the generated contigs – in
order to examine characteristics such as aligned
read depth, insert sizes between paired-end or
mate-pair reads, and read pair orientation. The
idea here is that mis-assemblies can be identified
when inconsistent or unusual mapping results
are noted, such as mate-pairs being separated
within the in silico assembly by incorrectly large
distances or having incorrect pair orientation.
Fortunately, tools for conducting these types
of genome assembly quality assessments are
becoming available (Phillippy et al., 2008; Vezzi
et al., 2012; Gurevich et al., 2013; Rahman and
Pachter, 2013), which weigh the likelihood that
an assembly is computed accurately by evaluating
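The N50 size defined earlier in this section can be computed directly from a list of contig lengths. The short Python sketch below follows that definition; the function name `n50` and the example lengths are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50: the length L of the smallest contig such that contigs
    of length L or greater contain at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Hypothetical assembly: six contigs totalling 290 kb.
before = [80, 70, 50, 40, 30, 20]
# The same assembly after the 50 kb and 40 kb contigs are joined,
# whether or not that join is correct.
after = [90, 80, 70, 30, 20]
```

With these made-up lengths, `n50(before)` is 70 and `n50(after)` is 80: joining two contigs raises the N50 regardless of whether the join reflects the true genome, which is why the metric cannot detect mis-assemblies on its own.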
Title: Kaksintaistelu
Author: Anton Pavlovich Chekhov
Translator: Emil Mannstén
Release date: February 17, 2013 [eBook #42116]
Language: Finnish
Credits: Produced by Jukka Aakula
THE DUEL (KAKSINTAISTELU)
By
Anton Tshehov
Translated from the Russian [Duelj] into Finnish by Emil Mannstén
Arvi A. Karisto Oy, Hämeenlinna, 1921.
I.
It was eight o'clock in the morning, the hour when the officers, the officials, and the visitors usually bathed in the sea after a hot, stifling night and then came to the pavilion to drink coffee or tea. Ivan Andreitsh Lajevski, a thinnish, fair-haired young man of twenty-eight, wearing the cap of a Ministry of Finance official and slippers, met plenty of acquaintances on the beach as he went down to bathe, among them his friend, the army doctor Samoilenko.
Samoilenko had a large head, closely cropped hair, hardly any neck, a ruddy face, bushy black eyebrows, and grey side-whiskers; add to this a stout figure and a hoarse military bass, and it is no wonder that he made an unpleasant impression on a stranger arriving in the town for the first time. But two or three days after the first meeting his face began to seem exceptionally kind, gentle, even handsome. In spite of his stiff movements and rather rough voice, he was a mild, extraordinarily tender-hearted, generous, and courteous man. He was on familiar terms with everyone in the town, lent money to everyone, was their doctor, arranged their marriages, settled their petty quarrels, and organized picnics at which he grilled chunks of mutton over coals and cooked an excellent soup of spiny-finned fish; he was always busy pleading on somebody's behalf, and he always had something to be glad about. By general opinion he was a man without fault, known to have only two weaknesses: first, he was ashamed of his own kind-heartedness and tried to hide it behind a stern look and deliberate gruffness; and second, he wanted the medical orderlies and soldiers to address him as "Your Excellency", although he held only the rank of state councillor.
"Answer me one question, Aleksander Daviditsh," said Lajevski, when both he and Samoilenko had waded in up to their shoulders. "Suppose you had fallen in love with a woman and set up house with her; then, after a couple of years, as can happen, you no longer loved her and began to feel that she was a stranger to you. What would you do in such a case?"
"Quite simply. I would say: go, my good woman, in whichever direction you please, and no more talk about it."
"Easily said! But what if she has nowhere to go? If she is a simple woman, without relatives, without money, if she cannot work…"
"In that case … put five hundred roubles in her hand, or pay her twenty-five roubles a month, and let her be content with that. It is plain enough!"
"Suppose you have the five hundred, or the twenty-five a month, but the woman I am speaking of is intelligent, refined, proud by nature. Would you really bring yourself to offer her money? And in what form?"
Samoilenko was about to answer, but at that moment a large wave broke over them both, splashed onto the shore, and rolled back roaring over the pebbles. The friends went ashore and began to dress.
"Of course it is hard to live with a woman one does not love," said Samoilenko, shaking the sand out of his boot. "But let us consider the matter from the standpoint of humanity, Vanja. If it were my case, I would not show at all that I had ceased to love her, but would live with her until my dying day."
He was suddenly ashamed of his own words; to soften their effect he then said:
"For all I care, womenfolk need not exist at all. The devil take them!"
The friends dressed and went to the pavilion. Here Samoilenko was quite at home, and he even had his own special crockery there. Every morning a tray was brought to him with a cup of coffee, a tall faceted glass of iced water, and a tot of cognac; he drank the cognac first, then the hot coffee, then the iced water, and it must have tasted very good, for after drinking them his eyes took on a particular shine, he stroked his side-whiskers with both hands, and said, gazing at the sea:
"A magnificent sight indeed!"
After a long night spent in fruitless and dreary thoughts, which would not let him sleep and seemed to add to the heat and darkness of the night, Lajevski felt dejected and listless. The bathe and the coffee did not make him feel any better.
"Let us continue our earlier conversation, Aleksander Daviditsh," he said. "I will not conceal anything but will speak to you frankly, as to a friend: things between me and Nadeshda Feodorovna are bad … utterly bad! Forgive me for confiding my secret to you, but I must unburden my heart."
Samoilenko, sensing what was coming, lowered his eyes and began drumming on the table with his fingers.
"I have lived with her for two years, and now my love for her is at an end," Lajevski went on, "or rather, I have come to understand that I never loved her at all… These two years have been a deception."
When talking, Lajevski had a habit of closely examining his bright pink palms, biting his nails, or squeezing his cuffs with his fingers. And he did so now.
"I know perfectly well that you cannot help me, but I am telling you because for an unlucky and superfluous man like me the only salvation lies in talking. I have to make every one of my acts known; I have to find an explanation and a justification for my senseless life in somebody's theories, in literary types, in the idea, for instance, that we noblemen are degenerating, and so on… Last night, for example, I comforted myself the whole time by thinking: ah, how right Tolstoi is, how mercilessly right! And that brought me relief. A truly great writer! Say what you will."
Samoilenko, who had never read Tolstoi, though he was constantly meaning to read his works, was embarrassed and replied:
"Yes, all writers write from imagination, but he takes his subjects straight from nature…"
"Ah me," sighed Lajevski, "how far civilization has corrupted us! I fell in love with a married woman, and she with me… At first we had kisses, quiet evenings, vows, Spencer, ideals, and shared interests… What a lie! In reality we were running away from her husband, but we lied to ourselves that we were fleeing the emptiness of our cultured life. We pictured our future like this: at first, in the Caucasus, while we got to know the country and the people, I would put on an official's coat and enter the service; then we would take a piece of uncleared land, set to work in the sweat of our brows, plant a vineyard, till a field, and so on. If you or that zoologist von Coren had been in my place, you would perhaps have lived with Nadeshda Feodorovna for thirty years and left your heirs a rich vineyard and a thousand desyatinas of maize, whereas I felt myself a ruined man from the very first day. In the town there is unbearable heat, boredom, and no society; and if one goes outside the town, scorpions and snakes lurk under every bush and stone, and beyond that lie the mountains and the wilderness. The people are strange, nature is strange, the cultivation is wretched; all this is something quite different from walking along the Nevski in a fur coat, arm in arm with Nadeshda Feodorovna, dreaming of warm countries. It calls for a life-and-death struggle, and what sort of fighter am I? A wretched, neurasthenic creature unused to work… From the first day I realized that my ideas about a life of toil and a vineyard were rubbish. As for love, I must tell you that life with a woman who has read Spencer and has followed you to the ends of the earth is no more interesting than life with any Anisa or Akulina. The same smell of the flat-iron, face powder, and medicines, the same curl-papers every single morning, and the same self-deception…"
"One cannot manage a household without a flat-iron," remarked Samoilenko, blushing because Lajevski spoke so openly about a lady he knew. "You are in a bad mood today, Vanja, I can see… Nadeshda Feodorovna is an excellent, cultivated woman, and you are a man of splendid intellect. True, you are not married," said Samoilenko, glancing in embarrassment towards the table, "but that is not your fault, and besides … one must be able to free oneself of prejudices and take the modern view… I myself am in favour of civil marriage, and … but I am of the opinion that once you have come together, you must live together until death."
"Without love?"
"I will explain at once," said Samoilenko. "About eight years ago we had an agent here, an old fellow, a very wise man. He used to say: the most important thing in family life is patience. Do you hear, Vanja? Not love, but patience… After living a couple of years in love, your family life has evidently reached the stage at which, to keep the balance, you must bring all your patience into play…"
"You believe the pronouncement of your agent, that old wreck. To me, though, his advice is madness. Your old man could play the hypocrite, could practise patience, treating the person he did not love as an object necessary for his exercises in patience, but I have not yet sunk so low; if I feel like exercising my patience, I will buy myself dumb-bells or a restive horse, but I will leave human beings in peace."
Samoilenko ordered iced white wine. When each had drunk a glass, Lajevski suddenly asked:
"Tell me, please, what does softening of the brain mean?"
"It is, how shall I explain it to you … a morbid condition in which the brain tissue softens … as though it were liquefying."
"Can it be cured?"
"Yes, if the disease has not gone too far… Cold showers, Spanish-fly plasters… Something taken internally."
"I see… So you see what my position is. I cannot live with her: it is beyond my strength. As long as I am with you, I still philosophize and smile, but at home I lose heart completely. I am so utterly wretched that if I were told, for instance, that I had to live with her one more month, I should probably put a bullet through my forehead. And yet it is impossible for me to part from her… She is alone, she cannot work, neither of us has any money… What will become of her? To whom could she go? I can think of nothing … Tell me now: what am I to do?"
"Hm …" growled Samoilenko, not knowing what to answer. "Does she love you?"
"Yes, as far as a man is necessary to a woman of her age and temperament. It would be as hard for her to part from me as from her face powder and her curl-papers. I am an indispensable, integral part of her boudoir."
Samoilenko was embarrassed.
"You are out of sorts today, Vanja," he said. "You cannot have slept."
"That is just it, I slept badly… I feel thoroughly unwell in general. My head is empty, my heart is heavy, a sort of general weakness… I simply must run away!"
"Where to?"
"There, to the north. To the pines, the mushrooms, people, ideas… I would give half my life to be bathing in a river somewhere in the province of Moscow or Tula at this moment, to shiver with cold, you understand, and then to wander for three hours or so even with the most insignificant student and chatter, chatter… And the hay, how it smells! Do you remember? And in the evenings, when one strolls in the garden, the sound of a piano comes from indoors, one hears a train approaching…"
Lajevski laughed with pleasure; tears came into his eyes, and to hide them he reached, without getting up, for the matches on the next table.
"I have not been in Russia for eighteen years," said Samoilenko. "I have forgotten what it is like there. To my mind there is no land more beautiful than the Caucasus."
"There is a painting by Vereshtshagin: at the bottom of an immensely deep well, some men condemned to death are languishing in agony. Your beautiful Caucasus looks to me like just such a deep well. If I had to choose between being a chimney-sweep in Petersburg and being a prince here, I would choose the former."
Lajevski sank into thought. Looking at his stooped figure, his eyes fixed on one spot, his pale, perspiring face and sunken temples, his bitten nails, and the slipper that hung loosely from his heel, exposing a badly darned sock, Samoilenko was filled with pity for him, and, probably because Lajevski reminded him of a helpless child, he asked:
"Is your mother living?"
"Yes, but we are on bad terms. She could not forgive me this love affair."
Samoilenko was warmly attached to his friend. He saw in Lajevski a scapegrace, a student, a good-natured young man with whom one could drink, laugh heartily, and talk straight from the heart. What he understood in Lajevski did not please him at all. Lajevski drank a good deal, even at unsuitable times, played cards, despised his work, lived beyond his means, often used improper expressions in conversation, walked about the streets in slippers, and quarrelled with Nadeshda Feodorovna in front of other people, and this Samoilenko found distasteful. But the fact that Lajevski had studied in the faculty of philology, subscribed to two thick journals, often spoke so learnedly that only a few could understand him, and lived with a cultivated woman: all this Samoilenko did not understand, yet it pleased him, and he held Lajevski in high esteem, as his superior.
"One more thing," said Lajevski, shaking his head. "But let it stay between ourselves… I have not told even Nadeshda Feodorovna, so mind you keep it to yourself… The day before yesterday I received a letter saying that her husband has died of softening of the brain."
"Good heavens …" sighed Samoilenko. "Why are you hiding it from her?"
"Showing her the letter would be the same as saying: off to church to be married. But first our relations must be cleared up. Once she has come to see that we cannot go on living together, I will show her the letter. Then it will no longer be dangerous."
"Do you know what, Vanja?" said Samoilenko, and a sad, imploring look suddenly came over his face, as though he were about to ask for something very sweet and feared a refusal. "Marry her."
— Minkä tähden?
— Täytä velvollisuutesi tätä viehättävää naista kohtaan! Hänen
miehensä kuoli pois, joten siis itse kaitselmus ikäänkuin viittaa
sinulle, mitä on tehtävä!
— Kummallinen ihminen, pitäisihän sinun käsittää, että se on
mahdotonta. Yhtä halpamaista ja ihmisarvoa alentavaa on naida
ilman rakkautta kuin toimittaa jumalanpalvelusta, jos puuttuu uskoa.
— Mutta se on velvollisuutesi!
— Miksi minä olisin velvollinen? — kysyi Lajevski kiihtyneenä.
"Because you lured her away from her husband and took responsibility
for her."
"I have already said quite plainly: I do not love her!"
"If you do not love her, then honor her, revere her…"
"Honor her, revere her…" mimicked Lajevski. "As if she were some
abbess… You are a poor psychologist and physiologist if you think that
to live with a woman you can get by on respect and reverence alone.
What a woman needs above all is the bedroom."
"Vanja, Vanja…" Samoilenko chided, embarrassed.
"You are an old child, a theorist, while I am a young old man and a
practical one; we will never learn to understand each other. Best we
end the whole conversation… Mustaa!" Lajevski shouted to the waiter,
"how much do we owe?"
"No, no," said the doctor hastily, catching Lajevski by the arm. "I
will pay for this. I was the one who ordered. Put it on my account!" he
called to Mustaa.
The friends rose and set off in silence along the embankment. Turning
onto the boulevard, they stopped and shook hands in farewell.
"You gentlemen are too spoiled!" sighed Samoilenko. "Fate has sent you
a young, beautiful, cultured woman, and you try to be rid of her;
whereas if God gave me even a crooked old woman, so long as she were
gentle and good-natured, how happy I would be! I would live with her in
my vineyard and…"
Samoilenko realized that his last remark was unseemly and added:
"The old witch would at least see to getting the samovar boiling
there!"
Having said goodbye to Lajevski, he set off along the boulevard. As he
walked down the street, massive, with a stern expression on his face,
in a snow-white linen tunic and carefully polished boots, puffing out
his chest, on which the Vladimir star with its ribbon was displayed, he
was very pleased with himself, and it seemed to him that the whole
world looked on him with pleasure. Without turning his head he glanced
from side to side and noted that the boulevard was in good order, that
the young cypresses, the gum trees, and the ugly, sickly palms were
very handsome and promised to cast thick shade in time to come, and
that the Circassians were an honest, hospitable people. "Strange that
Lajevski does not like the Caucasus," he thought, "very strange." Just
then five soldiers with rifles on their shoulders came toward him and
saluted. On the right side of the boulevard, along the pavement, walked
the wife of a certain official with her son, a schoolboy.
"Maria Konstantinovna, good morning!" Samoilenko called to her, smiling
warmly. "Have you been bathing? Ha-ha-ha… My regards to Nikodim
Aleksandritsh!"
And he walked on, still smiling warmly; but on seeing the army medical
orderly coming toward him, he suddenly frowned, stopped the orderly,
and asked:
"Is there anyone in the hospital?"
"No one, Your Excellency."
"What?"
"No one, Your Excellency."
"Good, you may go…"
Swaying majestically, he made his way toward the soda-water stall,
where behind the counter sat an elderly, stout Jewish woman who passed
herself off as a Georgian, and said to her as loudly as if he were
commanding a whole regiment:
"Please give me some soda water!"
II
Lajevski's lack of love for Nadeshda Feodorovna showed itself chiefly
in that everything she said or did seemed to him a lie or something
close to a lie, and that everything he had read against women and love
seemed to him to apply perfectly to himself, to Nadeshda Feodorovna,
and to her husband. When Lajevski returned home, she was already
sitting by the window, fully dressed and with her hair done, drinking
coffee with a preoccupied air and leafing through a thick journal; and
Lajevski thought that drinking coffee was not so remarkable an event as
to call for a preoccupied air, and that she had wasted her time on an
artful coiffure, since there was no one here worth trying to please. In
the journal, too, he saw falsehood. He thought that Nadeshda Feodorovna
dressed and did her hair in order to look beautiful, and read in order
to look clever.
"What do you think, would it be all right for me to go bathing today?"
asked Nadeshda Feodorovna.
"What of it? Whether you go or not, I do not suppose it will cause an
earthquake…"
"No, I only ask because I would not want the doctor to be angry."
"Then ask the doctor. I am not a doctor."
This time what Lajevski found most repellent in Nadeshda Feodorovna was
her white, bare neck and the curls at the nape, and he remembered that
when Anna Karenina's love for her husband cooled, what she disliked
were his ears, and he thought: "How true that is! How true!" Feeling
weak and empty-headed, he went to his study, lay down on the sofa, and
covered his face with a handkerchief so that the flies would not bother
him. Sluggish, drowsy thoughts, always about one and the same thing,
moved through his brain like a long train of wagons on a rainy autumn
evening, and he sank into a gloomy half-sleep. He felt guilty before
Nadeshda Feodorovna and her husband, guilty even that the latter had
died. He felt guilty before his own life, which he had ruined, and
before the world of high ideas, knowledge, and work; and that wonderful
world seemed to him possible and real not here on the coast, where
hungry Turks and lazy Abkhazians roam, but there, in the north, where
there are opera, theaters, newspapers, and every form of intellectual
work. One can be honest, wise, noble, and pure only there, not here. He
reproached himself for having no ideals and no guiding idea in life,
though he now dimly grasped what that meant. Two years earlier, when he
had fallen in love with Nadeshda Feodorovna, it had seemed to him that
as soon as he began living with her and they went to the Caucasus, he
would be saved from degradation and the emptiness of life; in the same
way he was now certain that he had only to abandon Nadeshda Feodorovna
and leave for Petersburg, and he would have everything he longed for.
"Escape!" he muttered, sitting up and biting his nails. "Escape!"
His imagination pictured how he would board the steamer and then have
breakfast, drink cold beer, chat with the ladies on deck, then board a
train at Sevastopol and be off. Hail, freedom! Stations flash past one
after another, the air grows ever colder and sharper, now there are
birches and firs, there already are Kursk and Moscow… In the
restaurants cabbage soup, mutton with porridge, sturgeon, beer; in a
word, no more Asiatic barbarism, but Russia, the real Russia. The
passengers on the train talk of trade, of new singers, of
Franco-Russian friendship; everywhere one already senses a lively,
intelligent, vigorous cultured life… Faster, faster! Here at last is
the Nevsky, the Great Morskaya, and there is Kovno Lane, where he once
lived with other students, there the familiar gray sky, the drizzle,
the wet cabmen…
"Ivan Andreitsh!" someone called from the next room. "Are you at home?"
"I am here," answered Lajevski. "What do you want?"
"I have some papers here!"
Lajevski got up reluctantly, felt his head swim, and, yawning and
shuffling in his slippers, went into the next room. There, out in the
street at the open window, stood one of his young colleagues, spreading
official papers on the windowsill.
"Just a moment, my friend," said Lajevski amiably and went to look for
the inkwell; coming back to the window, he signed the papers without
reading them and said:
"It certainly is hot!"
"It is hot. Are you coming in today?"
"Hardly… I am not quite well. Tell Sheshkovski, my good friend, that I
will drop in on him after dinner…"
The clerk went away. Lajevski lay down on the sofa again and began to
think.
"So all the circumstances must be carefully weighed and considered…
Before I leave here, I must pay my debts. I owe about two thousand
rubles. I have no money… That, of course, is not important: part I will
pay now somehow, part I will send later from Petersburg… The main thing
is Nadeshda Feodorovna… First of all our relations must be cleared up…
Yes."
A little later he wondered whether it would not be best to go and
consult Samoilenko.
"I could go," he thought, "but what use would it be? I would only start
talking to him again, to no purpose, about boudoirs, about women, and
about what is honorable and what is not. Is it fitting, devil take it,
to talk of honor or dishonor when the question is how to save my own
life as quickly as possible, when I am perishing in this cursed bondage
and will end by killing myself?… It must at last be understood that to
go on living a life like mine is base and cruel, and that beside this
everything else is trivial and worthless. Escape!" he muttered, sitting
up. "Escape!"
The deserted seashore, the relentless heat, and the monotonous,
purplish, hazy mountains, always the same and always gloomy, wearied
him and seemed to lull him into sleep and oblivion. Perhaps he was very
intelligent, talented, and remarkably honest; perhaps, if the sea and
the mountains had not hemmed him in on every side, he might have become
an excellent figure in local government, a statesman, an orator, a
journalist, a champion of ideas. Who knows! That being so, is it not
absurd to argue over whether it is honorable or dishonorable when a
gifted and useful man, a musician or a painter, say, breaks through a
wall and deceives his guards in order to escape from captivity? For a
man in such a position everything is honorable.
At two o'clock Lajevski and Nadeshda Feodorovna sat down to dinner.
When the servant brought rice soup with tomatoes to the table, Lajevski
said:
"The same thing every day. Why not make cabbage soup?"
"There is no cabbage."
"Strange. At Samoilenko's they make cabbage soup, and at Maria
Konstantinovna's they cook cabbage. I do not know why I alone must
always eat this sickly sweet mess. This really will not do, my dove."
As with the vast majority of married couples, Lajevski and Nadeshda
Feodorovna had formerly not got through a single dinner without
caprices and bickering, but ever since Lajevski had decided that he no
longer loved her, he tried to give way to her in everything, spoke to
her gently and politely, smiled, and called her his dove.
"This soup tastes of licorice," he said with a little laugh; he was
making an effort to seem friendly, but could not keep it up, and said:
"No one looks after the household here… If you really are so ill, or
your reading prevents you, then I will take charge of the kitchen."
Formerly Nadeshda Feodorovna would have answered, "Go ahead," or, "I
see you want to make a maid of me," but now she only glanced timidly at
Lajevski and blushed.
"Well, how are you feeling today?" Lajevski asked kindly.
"Not bad. I only feel a little weak."
"You must take care of your health, my dove. I am really afraid for
you."
Nadeshda Feodorovna had some illness. Samoilenko said she had
intermittent fever and prescribed quinine; the other doctor,
Ustimovitsh, a tall, lean, unsociable man who sat at home by day and in
the evenings strolled slowly along the embankment, coughing, with his
hands behind him and his walking stick held up along his back,
maintained that she had a women's complaint and prescribed cold
compresses. In the days when Lajevski still loved her, Nadeshda
Feodorovna's illness had aroused pity and fear in him, but now he saw
falsehood even in her illness. The yellowish, sleepy face, the listless
gaze, and the yawning that followed her attacks of fever, and the fact
that during an attack she lay under a travelling rug and looked more
like a boy than a woman, and that her room was stuffy and had a
peculiarly unpleasant smell: all this, in Lajevski's view, destroyed
the illusion of happiness and was at the same time a strong protest
against love and marriage.
For the second course he was served spinach and hard-boiled eggs, while
Nadeshda Feodorovna, as an invalid, had berry jelly and milk. When
Nadeshda Feodorovna, with a preoccupied air, first touched the jelly
with her spoon and then began languidly eating it, sipping milk with
it, Lajevski, hearing her swallow, was seized by such hatred that his
head actually began to itch. He admitted that such a feeling would be
insulting even toward a dog, yet he was angry not with himself but with
Nadeshda Feodorovna for arousing that feeling in him, and he now
understood why lovers sometimes kill their mistresses. He himself did
not intend to murder anyone, but as a juror he would have acquitted the
murderer.
"Thank you, my dove," he said after rising from dinner, and kissed
Nadeshda Feodorovna on the forehead.
Back in his study, he paced from corner to corner for about five
minutes, glancing sidelong at his boots, then sat down on the sofa and
muttered:
"Escape, escape! Settle things and escape!"
He lay down on the sofa, and it came back to him that he might have
been the cause of the death of Nadeshda Feodorovna's husband.
"To blame a man for having fallen in love or fallen out of love is
stupid," he persuaded himself, and, still lying down, bent his legs to
pull on his boots. "Love and hatred are not ours to command. As for the
husband, perhaps I was indirectly one of the causes of his death, but
then again, am I to blame that I fell in love with his wife and she
with me?"
Then he got up, took his cap, and set off for the house of his
colleague Sheshkovski. Officials gathered there every day to play whist
and drink cold beer.
"In my indecision I resemble Hamlet," thought Lajevski on the way. "How
truly Shakespeare hit the mark!"
III
To relieve his boredom, and out of sympathy for the plight of newcomers
and men without families, who had nowhere to dine since there was no
restaurant in the town, Dr. Samoilenko kept a sort of common dinner
table at his house. At the time in question he had only two boarders:
the young zoologist von Coren, who had come to the Black Sea in the
summer to study the embryology of jellyfish, and the deacon Pobedov,
who had recently finished at the seminary and been assigned to this
small town to stand in for the old assistant priest, who had gone away
for his health. Each paid twelve rubles a month for dinner and supper,
and Samoilenko had made them promise on their word of honor that they
would arrive for dinner at exactly two o'clock.
Von Coren was usually the first to arrive. He would sit down in silence
in the drawing room and, taking an album from the table, begin
attentively examining the faded photographs of unknown men in wide
trousers and top hats and of ladies in caps and crinolines; Samoilenko
remembered only a few of them by name, and of those he had forgotten he
would say with a sigh: "An excellent man, of the greatest
intelligence!" Having finished with the album, von Coren would take a
pistol from the shelf and, screwing up his left eye, aim for a long
time at the portrait of Prince Vorontsov, or look in the mirror at his
swarthy face, his broad forehead, and his black, tightly curled hair,
and at his shirt of dull cotton with large flowers, resembling a
Persian rug, and the wide leather belt he wore instead of a waistcoat.
Contemplating himself gave him almost more pleasure than looking at the
photographs or the richly mounted pistol. He was very pleased with his
face, his neatly trimmed beard, and his broad shoulders, which were
plain evidence of good health and a strong build. He was pleased, too,
with his dandified costume, from the tie chosen to match the color of
his shirt down to his yellow shoes. While he looked through the album
and stood before the mirror, there was great noise and bustle in the
kitchen and in the passage opening off it; there Samoilenko himself,
without coat or waistcoat, his chest bare, irritable and streaming with
sweat, bustled about the tables preparing a salad or some sauce, or
meat, cucumbers, and onion for a favorite dish called okroshka, and as
he did so he glared at the soldier-servant helping him and brandished
now a knife, now a ladle at him.
"Vinegar here!" he ordered. "No, not vinegar, salad oil!" he snapped,
stamping his foot. "Well, where have you got to, you blockhead?"
"I went to fetch the butter, Your Excellency," the flustered servant
answered in a cracked tenor.
"Quick now! The butter is in the cupboard! And tell Darja to add dill
to the cucumber jar! Dill! Cover the sour cream so the flies do not get
in, you dolt!"
And it seemed as if the whole house were filled with the roar of his
shouting. Some ten or fifteen minutes before two the deacon arrived, a
young man of twenty-two, thin, beardless, with a mustache only just
sprouting. On entering the drawing room he crossed himself before the
icon and, smiling, held out his hand to von Coren.
"Good day," said the zoologist coldly. "Where have you been?"
"On the pier, fishing for gobies."
"Of course, where else… You, deacon, seem incapable of ever taking up
serious work."
"How so? Work is not a bear, it will not run off into the forest,"
answered the deacon, smiling and thrusting his hands deep into the
pockets of his white cassock.
"You deserve a thrashing!" sighed the zoologist.
Another quarter of an hour or so passed, but they were not called to
dinner; all that could be heard was the soldier clattering back and
forth in his boots between the passage and the kitchen, and Samoilenko
shouting:
"Do not shove it there! Put it on the table! Wash it first!"
Hungry, the deacon and von Coren began drumming on the floor with their
heels to show their impatience, like the audience in the upper tier of
a theater. At last the door opened and the soldier-servant, utterly
worn out, announced: "Dinner is served." In the dining room they met
Samoilenko, his face red from the heat of the kitchen; he looked at
them sternly and, lifting the lid of the soup tureen with an expression
of horror, served each of them a plateful of soup, and only when he had
made sure that they were eating with appetite and liked the food did he
give a sigh of relief and sit down in his deep armchair. His expression
softened, and he slowly poured himself a glass of vodka and said:
"To the health of the younger generation!"
After his conversation with Lajevski, Samoilenko had felt a certain
heaviness in the depths of his soul all the time from morning until
midday, although he was in excellent spirits. He was sorry for Lajevski
and wanted to help him. Having drunk a glass of vodka before the soup,
he sighed and said:
"I saw Vanja Lajevski today. The man has a hard life. The material side
is nothing to boast of, but what is worse, he is utterly crushed in
spirit. I am sorry for the lad!"
"Well, I am not sorry for him!" said von Coren. "If that agreeable
fellow were drowning, I would give him a push with a stick: drown,
brother, drown…"
"That is not true. You would not do it."
"What makes you think so?" asked the zoologist, shrugging his
shoulders. "I am as capable of a good deed as anyone."
"Drowning a man, is that a good deed?" said the deacon with a laugh.
"At least when the man is Lajevski."
"The soup lacks something," said Samoilenko, to change the subject.
"Lajevski is unquestionably harmful and as dangerous to society as the
cholera microbe," von Coren went on. "Whoever drowns him does a
service."
"It does you no credit to speak of your neighbor like that. Tell me,
why do you hate him?"
"Do not talk nonsense, doctor. Of course it would be absurd to hate and
despise a microbe, but to regard whoever comes along as your neighbor,
without distinction, means not reasoning at all, refusing to take a
just attitude toward people; in a word, washing your hands. I consider
Lajevski a scoundrel, I make no secret of it, and I treat him as a
scoundrel with a perfectly clear conscience. You consider him your
neighbor; well, kiss him then. You consider him your neighbor, which
means you treat him the same way you treat me and the deacon, that is,
neither one way nor the other. You are equally indifferent to
everyone."
"To call a man a scoundrel!" muttered Samoilenko, frowning with
distaste. "That is so improper that I cannot even find words for it!"
"People are judged by their deeds," von Coren continued. "Judge for
yourself, deacon… I am speaking to you now, deacon. Mr. Lajevski's
activity is spread out openly before you like a long Chinese scroll,
and you can read it from beginning to end. What has he done in the two
years he has lived here? First, he has taught the townspeople to play
whist. Two years ago the game was unknown here; now everyone plays it
from morning till late at night, even women and adolescents. Second, he
has taught the inhabitants to drink beer, which was also unknown here;
they have him to thank for their knowledge of the various kinds of
vodka, and they can now tell Koshelev's vodka from Smirnov's No. 21
with their eyes shut. Third, formerly men here carried on with other
men's wives in secret, for the same reasons that thieves ply their
trade in secret and not openly; adultery was regarded as something one
was ashamed to display in public; but Lajevski has proved a pioneer in
this respect too: he lives openly with another man's wife. Fourth…"
Von Coren quickly finished his soup and handed his plate to the
servant.
"I saw through Lajevski in the first month of our acquaintance," he
went on, turning to the deacon. "We arrived here at about the same
time. People like him are very fond of friendship, intimacy,
solidarity, and the like, because they always need companions for their
card parties and drinking bouts; besides, they are very talkative and
so must have listeners. We became friends, that is, he hung about my
place every day, kept me from working, and poured out confidences about
his mistress. From the very start I was struck by his extraordinary
falseness, which simply sickened me. As a friend I reproached him for
drinking too much, living beyond his means and running up debts, for
idling away his time, not reading, being uncultured and ignorant, and
in answer to all my reproaches he would smile bitterly, sigh, and say:
'I am a failure, a superfluous man,' or: 'What can you expect of us,
the leftovers of serfdom?' or: 'We are degenerating…' Or he would
launch into a long rigmarole about Onegin, Petshorin, Byron's Cain, and
Bazarov, of whom he would say: 'They are our fathers in flesh and in
spirit.' So we are to understand that it is not his fault if official
letters lie unopened for weeks and if he himself drinks and gets others
drunk, but that the blame lies with Onegin, Petshorin, and Turgenjev,
who was the first to portray those failures and superfluous men. The
cause of his extraordinary depravity and corruption, you see, is not in
himself but somewhere outside him, in space. And besides (a clever
trick!) it is not he alone who is dissolute, false, and vile, but we…
'we people of the eighties,' 'we, the limp, nervous offspring of
serfdom,' 'we, whom civilization has crippled'… In a word, we are to
understand that such a great man as Lajevski is great even in his fall;
that his dissoluteness, lack of education, and slovenliness are to be
regarded as a phenomenon of natural history, sanctified by necessity;
that the cause is to be sought in supernatural world forces; and that
an icon lamp should be hung before Lajevski, since he is the fated
victim of the age and its influences, of heredity, and so forth. All
the officials and ladies who listened to his talk let out no end of ohs
and ahs, but for a long time I could not make out whether I was dealing
with a cynic or a cunning rogue. People like him, intelligent to look
at, with some education, and loud about their own noble birth, know how
to pretend to be profound natures."