Introduction of bioinformatics

Introduction
• Bioinformatics is the science concerned with the development and
application of computer hardware and software to the acquisition,
storage, analysis, and visualization of biological information.
• It has the following three components:
- The development of new algorithms and statistics for
assessing the relationships among large sets of biological data,
e.g. DNA sequence data.
- The application of these tools to the analysis and interpretation of
various biological data, e.g. nucleotide sequences and amino acid
sequences.
- The development of databases for the efficient
storage, access, and management of various kinds of biological
information.
• The term ‘bioinformatics’ is a combination of ‘biology’ and
‘informatics’.
NEETHUASOKAN

Definition
• Bioinformatics derives knowledge from computer analysis of
biological data.
• These data can consist of the information stored in the genetic code,
as well as experimental results from various sources, patient
statistics, and scientific literature.
• Research in bioinformatics includes method development for
storage, retrieval, and analysis of the data.
• Bioinformatics is a rapidly developing branch of biology and
is highly interdisciplinary, using techniques and concepts from
informatics, statistics, mathematics, chemistry, biochemistry,
and physics.
• It has many practical applications in different areas of
biology and medicine.

History of bioinformatics
• A collection of amino acid sequences was compiled in the
‘Atlas of Protein Sequence and Structure’ by the National
Biomedical Research Foundation.
• This collection was edited by Margaret O. Dayhoff from 1965
to 1978.
• Dayhoff and coworkers contributed to the comparison of
amino acid sequences by developing computer software for
detecting distantly related sequences.
• The EMBL established its data library in 1980 to collect,
organize, and distribute nucleotide sequence data and related
information.
• NCBI was established in the U.S.A. in 1988. It serves as a primary
information databank and provider of information.
• The National Biomedical Research Foundation established the
Protein Information Resource (PIR) in 1984.

DNA Sequences
• Standard symbols are used to represent DNA sequence data.
• The four bases are denoted by the single letters A (adenine), C
(cytosine), G (guanine), and T (thymine).
• Sequence data often contain ambiguities, in that it is not
clear which of the four bases is present at certain positions.
• For example, the sequence data may indicate that the base
present at a specific position is either G or A, i.e. a purine (symbol R).
• Similarly, if a position may have either C or T, it is a
pyrimidine (symbol Y).
• The base sequences of the two complementary strands of a
DNA molecule are represented by this system of symbols.
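
The symbol system described on this slide can be sketched in Python. This is a minimal illustration, assuming the standard IUPAC codes R (purine, A or G), Y (pyrimidine, C or T), and N (any base). The names IUPAC, COMPLEMENT, matches, and reverse_complement are chosen here for illustration and do not come from the slides.

```python
# Subset of the IUPAC nucleotide codes: each symbol maps to the bases it may stand for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "N": {"A", "C", "G", "T"},  # any base
}

# Complement of each symbol; R and Y swap because purines pair with pyrimidines.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "R": "Y", "Y": "R", "N": "N"}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if an unambiguous sequence is compatible with a pattern
    that may contain ambiguity codes."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, sequence))


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[symbol] for symbol in reversed(sequence))


print(matches("ARGT", "AGGT"))      # R allows G at the second position
print(matches("ARGT", "ACGT"))      # R does not allow C
print(reverse_complement("ARGTY"))  # ambiguity codes are complemented too
```

Storing each code as a set of allowed bases keeps the compatibility check to a single membership test per position.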

Amino Acid Sequences of Proteins
• Amino acids were conventionally represented by three-letter
symbols, e.g. Ala for alanine, Val for valine, etc.
• In bioinformatics, they are denoted by single letters, e.g. A
for alanine, C for cysteine, D for aspartic acid, etc.
• Some positions in protein sequences are ambiguous; this
situation is comparable to that for DNA sequences.
• For example, if it is not clear whether a position has glutamine or
glutamic acid, the position is given the symbol Z.
• Protein synthesis begins at the N-terminus and proceeds
to the C-terminus.
• The amino acid sequences in databases are listed from the
N-terminus to the C-terminus of the polypeptide.

Conti...
Single-letter code   Amino acid                     Three-letter code
A                    Alanine                        Ala
B                    Aspartic acid or asparagine    Asx
C                    Cysteine                       Cys
D                    Aspartic acid                  Asp
E                    Glutamic acid                  Glu
F                    Phenylalanine                  Phe
G                    Glycine                        Gly
H                    Histidine                      His
I                    Isoleucine                     Ile
K                    Lysine                         Lys
L                    Leucine                        Leu
M                    Methionine                     Met
N                    Asparagine                     Asn
P                    Proline                        Pro
Q                    Glutamine                      Gln
R                    Arginine                       Arg
S                    Serine                         Ser
T                    Threonine                      Thr
V                    Valine                         Val
W                    Tryptophan                     Trp
Y                    Tyrosine                       Tyr
Z                    Glutamic acid or glutamine     Glx
X                    Any amino acid                 Xaa
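
The code table can be used to convert between the two notations. The sketch below assumes the standard three-letter and one-letter symbols and a hyphen-separated input written from the N-terminus to the C-terminus. The names THREE_TO_ONE and to_one_letter are hypothetical, chosen for this illustration.

```python
# Three-letter to one-letter amino acid codes, including the ambiguity
# symbols Asx (B), Glx (Z), and Xaa (X).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xaa": "X",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert a hyphen-separated three-letter sequence, written from the
    N-terminus to the C-terminus, into one-letter notation."""
    residues = three_letter_sequence.split("-")
    return "".join(THREE_TO_ONE[residue] for residue in residues)


# Glx marks a position that may be glutamic acid or glutamine.
print(to_one_letter("Met-Ala-Glx-Val"))
```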

Types of Sequences in Nucleotide Sequence
Databases
• DNA sequence databases contain several different types of sequences.
cDNA sequences :
• A cDNA molecule is obtained by reverse transcription of an
RNA molecule.
• The cDNA sequences therefore represent the part of the
genome that is transcribed into RNA.
• If the cDNA is obtained from mRNA, it will represent only the
exon sequences of the genes expressed in the cell,
tissue, or organism concerned.
Genomic DNA sequences :
• These sequences represent the complete genome of an
organism.
• When genome sequencing is completed, the database will contain the
sequence of the entire genome of the organism.

Cont...
• In prokaryotes, the genome usually consists of a single
chromosome, while in eukaryotes the term refers to the
nuclear DNA.
Expressed Sequence Tag (EST) sequences :
• These sequences are obtained by sequencing only a part of the
cDNA molecules produced from mRNA.
• These sequences are dubbed ‘tags’ because they can be used
as probes for the isolation of the corresponding genes from
genomic DNA.
• This approach was used by J. Craig Venter and his group to
obtain the sequence of the expressed portion of the human
genome.
• The EST technique generated enormous amounts of sequence data, which
permitted the construction of a preliminary transcript map of
the human genome.

Conti...
Genome Sequence Tag (GST) Sequences:
• GSTs were developed for identifying the genes of Plasmodium
falciparum.
• It was observed that the enzyme mung bean nuclease (Mnase)
cuts P. falciparum genomic DNA between genes.
• GSTs are obtained by sequencing the DNA fragments on
either side of the cut sites generated by Mnase.
Organellar DNA Sequences:
• Organellar DNA is the DNA found in mitochondria (mtDNA)
and chloroplasts (cpDNA).
• These sequence data are compiled in databases.

Branches of Bioinformatics
• A living cell is a system in which cellular components such as the
genome, the gene transcripts, and the proteins interact with
each other, and these interactions determine the fate of the
cell, e.g. whether a stem cell is going to become a liver cell
or a cancer cell.
The three branches of bioinformatics:
1. Genomics
2. Transcriptomics
3. Proteomics

Conti....
[Figure: The three major branches of bioinformatics]
DNA (Genomics) makes RNA (Transcriptomics) makes Protein (Proteomics)
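
The flow from DNA to RNA to protein in this figure can be sketched in Python. This is a simplified illustration, assuming the standard genetic code and a coding-strand DNA sequence as input, so that transcription reduces to replacing T with U. The names CODON_TABLE, transcribe, and translate are chosen here for illustration, and the example sequence is made up.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """DNA makes RNA: rewrite the coding strand as mRNA (T becomes U)."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> str:
    """RNA makes protein: read codons from the first base until a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCAAGTGGTGA"  # made-up example sequence
mrna = transcribe(dna)
print(mrna)
print(translate(mrna))
```

Real transcripts also involve a template strand, splicing, and untranslated regions, which this sketch deliberately leaves out.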

Genomics :
• Genomics plays a significant role in modern biological research,
in which the nucleotide sequences of all the chromosomes of
an organism are mapped and the locations of different genes
and their sequences are determined.
• This involves extensive analysis of the nucleic acids through
molecular biology techniques before the data are ready for
processing by computer.
• It is a science that attempts to describe a living organism in
terms of the sequence of its genome.
• Estimating the number of genes in an
organism from the number of nucleotide base pairs was unreliable
because of the presence of large numbers of redundant copies
of many genes.
• Genomics has helped to rectify this problem.

Conti...
• Genomics uses techniques of molecular biology and bioinformatics
to identify cellular components such as proteins, rRNA, tRNA, etc.,
and to analyse the sequences attributed to structural genes,
regulatory sequences, and non-coding sequences.
• The first automatic DNA sequencer was developed in 1986 by
Leroy Hood.
• In 1995, Haemophilus influenzae became the first bacterium to have
its genome sequenced.
• Even if one can identify all the genes in a genome, the genes
only indicate that, at some point in time, they might be transcribed to
produce cellular components.
• E.g. the human genome contains about 20,000 protein-coding
genes (earlier estimates ranged from 30,000 to 60,000), but only a
subset of them is expressed in a particular cell type at a particular time.

Transcriptomics
• Transcriptomics is the study of the transcriptome, which
includes the whole set of mRNA molecules in one cell or a
population of biological cells.
• This study helps us to depict the expression levels of genes,
often using techniques such as DNA microarrays, which are
capable of sampling tens of thousands of different mRNAs at a
time.
• This kind of new technique has helped biologists to routinely
compare gene expression between control cells and
treatment cells.
• Transcriptomics has a few limitations.
• The relative abundance of transcripts, as characterized by
serial analysis of gene expression (SAGE) or microarray
experiments, is not always a good predictor of the relative
abundance of proteins. Contributing factors and related limitations include:

Conti....
1. Differential adaptation of transcripts to the translational machinery.
2. Differential usage of amino acids of different abundances.
3. The lack of information on post-translational modification of
amino acid residues, although post-translational
modifications such as acetylation, hydroxylation,
glycosylation, phosphorylation, and cleavage are
fundamental to understanding the interactions of cellular
components.
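
The comparison of gene expression between control and treatment cells mentioned for transcriptomics is often summarized as a log2 fold change. The sketch below is one common way to compute it, not a method taken from the slides. The gene names and expression values are hypothetical, and the pseudocount of 1.0 is an assumption made here to avoid division by zero.

```python
import math


def log2_fold_change(control: float, treatment: float, pseudocount: float = 1.0) -> float:
    """Return log2((treatment + pseudocount) / (control + pseudocount)).

    Positive values mean higher expression in the treatment cells,
    negative values mean lower expression.
    """
    return math.log2((treatment + pseudocount) / (control + pseudocount))


# Hypothetical expression levels (arbitrary units) for three made-up genes.
control_levels = {"geneA": 99.0, "geneB": 49.0, "geneC": 7.0}
treatment_levels = {"geneA": 399.0, "geneB": 49.0, "geneC": 1.0}

for gene in sorted(control_levels):
    change = log2_fold_change(control_levels[gene], treatment_levels[gene])
    print(gene, round(change, 2))
```

As the limitations above suggest, a change at the transcript level does not guarantee a matching change in protein abundance.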
Proteomics :
• Proteomics represents the earliest attempt to identify a major
subclass of cellular components, the proteins, and their
interactions.
• Proteomics involves sequencing the amino acids in a
protein, determining its 3D structure, and relating the structure to the
function of the protein.

Cont...
• Before computer processing comes into the picture, extensive
data must be gathered, particularly through crystallography and nuclear
magnetic resonance (NMR).
• With such data on known proteins, the structure and its
relationship to function can be inferred for newly discovered proteins.
• In such areas, bioinformatics has enormous analytical and
predictive potential.
• Metabolic proteins such as haemoglobin and insulin have been
subjected to intensive proteomic investigation.
• The term ‘proteomics’ was coined by analogy with
genomics.
• Scientists feel that the bioinformatics of proteins is crucial to
understanding the cellular components and their interactions
completely.

Aims of Bioinformatics
• Bioinformatics can be used in various important ways.
• The aim of bioinformatics is fourfold and includes data
acquisition, tool and database development, data analysis, and
data integration.
Data Acquisition:
• Data acquisition is primarily concerned with accessing and
storing data generated directly from biological
experiments.
• The data generated by various sequencing projects have to be
retrieved in the appropriate format, and must be capable of being
linked to all the information related to the DNA samples.
• The data are organized in different databases so that
researchers can access existing information.

Tool and Database Development
• Many laboratories generate large volumes of data such as DNA
sequences, gene expression information, 3D molecular structures,
and high-throughput screening results.
• Consequently, they must develop effective databases for storing
and quickly accessing data. The other aim is to develop tools and
resources that aid in the analysis of data.
Data Analysis:
• The third aim is to use these tools to analyse the data and interpret
the results in a biologically meaningful manner. Efficient analysis
requires an efficiently designed database.
• It must allow researchers to pose their queries effectively and
provide them with all the information they need to begin their data
analysis.

Conti...
• If queries cannot be performed, or if the performance is too
slow, the whole system breaks down, since scientists will not
be inclined to use the database.
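
The point about databases that can be queried effectively can be illustrated with a small sketch using Python's built-in sqlite3 module. The table layout, the accession numbers, the sequences, and the function name find_by_organism are hypothetical, invented for this example. The index on the organism column is one simple way to keep lookups fast as the table grows.

```python
import sqlite3

# In-memory database with a hypothetical table of sequence records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)
# An index lets the database find rows by organism without scanning every row.
conn.execute("CREATE INDEX idx_organism ON sequences (organism)")

records = [
    ("HYP0001", "Homo sapiens", "ATGGCC"),           # made-up records
    ("HYP0002", "Homo sapiens", "ATGAAG"),
    ("HYP0003", "Plasmodium falciparum", "ATGTTT"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)


def find_by_organism(connection: sqlite3.Connection, organism: str) -> list:
    """Return the accession numbers stored for one organism, sorted."""
    cursor = connection.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [row[0] for row in cursor]


print(find_by_organism(conn, "Homo sapiens"))
```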
Data Integration :
• Once information has been analysed , a researcher must often
associate or integrate it with the related data from the other
databases.
• For example, a scientist may run a series of gene expression analysis
experiments and observe that a particular set of 100 genes is
more highly expressed in cancerous lung tissue than in
normal lung tissue.
• The scientist may wonder which of the genes is most likely to
be truly related to the disease.
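
One way to approach this question is to cross-reference the experimental gene set with annotations from another database. The sketch below shows the idea with a plain set intersection. The gene identifiers, the annotation texts, and the function name integrate are hypothetical, and a real analysis would query actual annotation resources rather than a hard-coded dictionary.

```python
# Hypothetical result of the expression experiment: genes more highly
# expressed in the cancerous tissue (shortened to four entries here).
overexpressed_genes = {"GENE1", "GENE2", "GENE3", "GENE4"}

# Hypothetical annotations taken from a second, independent database.
disease_annotations = {
    "GENE2": "reported in lung tumour studies",
    "GENE4": "cell-cycle regulator",
    "GENE9": "metabolic enzyme",
}


def integrate(expressed: set, annotations: dict) -> dict:
    """Keep only the genes found in both sources, with their annotations."""
    shared = expressed & set(annotations)
    return {gene: annotations[gene] for gene in sorted(shared)}


for gene, note in integrate(overexpressed_genes, disease_annotations).items():
    print(gene, "-", note)
```

Genes that appear in both sources are natural candidates for closer study, although overlap alone does not establish a causal link to the disease.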

Bioinformatics Applications
Molecular medicine :
• The sequencing of the human genome will have profound effects on the fields of
biomedical research and clinical medicine. Every disease has a
genetic component.
• This may be inherited or a result of the body's response to an
environmental stress which causes alterations in the genome
(e.g. cancers, heart disease, diabetes).
• The completion of the human genome means that we can
search for the genes directly associated with different diseases
and begin to understand the molecular basis of these diseases
more clearly.
• This new knowledge of the molecular mechanisms of disease
will enable better treatments, cures and even preventative tests
to be developed.

Conti...
Personalised medicine :
• Clinical medicine will become more personalised with the
development of the field of pharmacogenomics.
• This is the study of how an individual's genetic inheritance
affects the body's response to drugs.
• At present, some drugs fail to make it to the market because a
small percentage of the clinical patient population shows
adverse effects to a drug due to sequence variants in their
DNA.
• As a result, potentially life-saving drugs never make it to the
marketplace. Today, doctors have to use trial and error to find
the best drug to treat a particular patient, as those with the same
clinical symptoms can show a wide range of responses to the
same treatment.

Conti...
Drug development :
• At present all drugs on the market target only about 500
proteins.
• With an improved understanding of disease mechanisms and
using computational tools to identify and validate new drug
targets, more specific medicines that act on the cause, not
merely the symptoms, of the disease can be developed.
• These highly specific drugs promise to have fewer side effects
than many of today's medicines.

Conti...
Gene therapy :
• In the not too distant future, the potential for using genes
themselves to treat disease may become a reality.
• Gene therapy is the approach used to treat, cure, or even
prevent disease by changing the expression of a person's genes.
• Currently, this field is in its infancy, with clinical trials
for many different types of cancer and other diseases ongoing.

Conti...
The reality of bioweapon creation :
• Scientists have recently built the poliomyelitis virus using
entirely artificial means.
• They did this using genomic data available on the Internet and
materials from a mail-order chemical supplier.
• The research was financed by the US Department of Defence
as part of a biowarfare response program to prove to the world
the reality of bioweapons. The researchers also hope their
work will discourage officials from ever relaxing programs of
immunisation. This project has been met with very mixed
feelings.

Conti.....
Antibiotic resistance :
• Scientists have been examining the genome of Enterococcus
faecalis, a leading cause of bacterial infection among hospital
patients.
• They have discovered a virulence region made up of a number
of antibiotic-resistance genes that may contribute to the
bacterium's transformation from a harmless gut bacterium to a
menacing invader.
• The discovery of the region, known as a pathogenicity island,
could provide useful markers for detecting pathogenic strains
and help to establish controls to prevent the spread of infection
in wards.

