INTRODUCTION TO
SEQUENCE ALIGNMENT
PART 1
CONTENT
1-SEQUENCE ALIGNMENT
2-APPLICATIONS OF SEQUENCE
ALIGNMENT
3-DEFINITIONS FOR ALIGNED
SEQUENCES
(a) ALGORITHM (b) DIVERGENT
EVOLUTION (c) CONSERVATION
(d) PROGRAM (e) IDENTITY
(f) SIMILARITY (g) HOMOLOGS
(h) HETEROLOGS (i) ANALOGS
(j) ORTHOLOGS (k) PARALOGS
(l) XENOLOGS
4-PHYSICOCHEMICAL RELATIONSHIPS
BETWEEN AMINO ACIDS
5-MEASURES OF SEQUENCE
SIMILARITY
(a) HAMMING AND LEVENSHTEIN
DISTANCES
(b) CONCEPT OF SIMILARITY AND
DISTANCE TABLE
(c) SIMPLE MATCHING COEFFICIENT
SEQUENCE ALIGNMENT
• Sequence alignment describes the relationship between
biological sequences by designating portions of sequences that
correspond to each other.
• It is the method used to analyze the similarities and
differences at the level of individual bases or amino acids
with the aim of inferring structural, functional and
evolutionary relationships or random events among the
sequences.
• It is the identification of residue-residue correspondences.
OR
• Any assignment of correspondences that preserves the order
of the residues within the sequences is an alignment.
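The last definition can be made concrete with a small check. The Python sketch below treats an alignment as two gapped rows of equal length and verifies that removing the gaps gives back each original sequence with its residues in order. The names `is_valid_alignment` and `aligned_pairs`, and the use of `-` as the gap symbol, are assumptions made for illustration and do not come from the slides.

```python
GAP = "-"  # assumed gap symbol


def is_valid_alignment(seq_a, seq_b, row_a, row_b):
    """Return True if (row_a, row_b) is an order-preserving alignment of seq_a and seq_b."""
    # Both rows must have the same number of columns.
    if len(row_a) != len(row_b):
        return False
    # A column made of two gaps aligns nothing with nothing.
    if any(x == GAP and y == GAP for x, y in zip(row_a, row_b)):
        return False
    # Removing the gaps must give back each sequence, residues in their original order.
    return row_a.replace(GAP, "") == seq_a and row_b.replace(GAP, "") == seq_b


def aligned_pairs(row_a, row_b):
    """List the residue-residue correspondences (columns that contain no gap)."""
    return [(x, y) for x, y in zip(row_a, row_b) if GAP not in (x, y)]


if __name__ == "__main__":
    print(is_valid_alignment("AGTCC", "CGCTCA", "AG-TCC", "CGCTCA"))
    print(aligned_pairs("AG-TCC", "CGCTCA"))
```

Any pair of rows that passes this check is an alignment in the sense above; whether it is a biologically meaningful one is a separate question that scoring schemes address.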
APPLICATIONS OF SEQUENCE ALIGNMENT
Information gained by aligning DNA, RNA and protein sequences
is used for:
● Searching for patterns and informative elements within
a sequence.
● Obtaining statistical information on a sequence.
● Searching for similarities between two or more sequences.
● Constructing phylogenetic trees based on sequences.
● Predicting and analyzing secondary/tertiary structure and
folding on the basis of the sequence.
● Identifying unknown sequences.
● Finding other members of multigene families.
● Gaining information for primer design.
● Reconstructing long DNA sequences from overlapping
sequence fragments.
● Determining physical and genetic maps from probe data
under various experimental protocols.
● Predicting the function of gene products.
● Getting information for molecular modelling.
DEFINITIONS FOR ALIGNED SEQUENCES
ALGORITHM
• An algorithm is a logical sequence of steps by which a task
can be performed. It is a set of rules for calculating or
solving a problem, normally carried out by a computer program.
• Important features of an algorithm are:
(i) It should stop after a finite number of steps.
(ii) All steps of the algorithm must be precisely defined.
(iii) The input to the algorithm must be specified.
(iv) The output of the algorithm must be specified.
(v) The algorithm must be effective, i.e. every step must be
simple enough to be carried out in practice.
Thus an algorithm is a complete and precise specification
of a method for solving a problem.
DIVERGENT EVOLUTION
• Similarity among sequences could arise by chance; it could
arise by convergence towards a common sequence and structure,
and therefore function, through evolution; or it could arise
from divergent evolution of the two sequences from a common
ancestral sequence. Similarity that arises from this last
mechanism alone (i.e. divergent evolution) is called homology.
Homologs, heterologs, analogs, orthologs, paralogs and
xenologs are words that describe the different ways in which
sequence similarity could arise.
CONSERVATION
• A change at a specific amino acid position in a protein
(less commonly, at a nucleotide position in a DNA sequence)
that preserves the physicochemical properties of the original
residue is known as conservation.
PROGRAM
• A program is an implementation of an algorithm.
IDENTITY
• The extent to which two (nucleotide or amino acid)
sequences are invariant expresses the identity between the
sequences.
SIMILARITY
• The extent to which nucleotide or protein sequences are
related is known as similarity. The extent of similarity
between two sequences can be based on percent sequence
identity and/or conservation. In BLAST, similarity refers to
a positive matrix score.
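Percent identity is straightforward to compute once two sequences have been aligned. The Python sketch below counts the columns in which both rows carry the same residue. The function name `percent_identity`, the `-` gap symbol and the choice of the full alignment length as denominator are assumptions; other conventions (for example, dividing by the length of the shorter sequence) are also in use.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percentage of aligned columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned sequences must have the same length")
    if not row_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if the residues match and neither is a gap.
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
    # Assumption: the denominator is the full alignment length, gap columns included.
    return 100.0 * matches / len(row_a)


if __name__ == "__main__":
    print(percent_identity("AGTC", "CGTA"))      # 2 of 4 columns are identical
    print(percent_identity("AG-TCC", "CGCTCA"))  # 3 of 6 columns are identical
```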
HOMOLOGS
• 'Homologs' is a general term for sequences with a common
origin.
• It is a qualitative term and is somewhat arbitrary, in the
sense that it is up to the researcher to decide what level of
similarity indicates homology.
• Numerical indicators of similarity determined by sequence
alignment programs, together with other parameters derived
from biochemistry or other areas of biology, may contribute
to the decision on whether a pair of sequences is homologous.
HETEROLOGS
• The opposite of 'homologs' is 'heterologs'. Heterologous
sequences may still be similar, but they have neither a
common origin nor a common function or activity.
ANALOGS
• Sequences that have the same function but lack sufficient
similarity to imply a common origin are said to be 'analogs'.
That is, analogous sequences followed evolutionary pathways
from different origins to converge upon the same function and
activity. They have homologous function but heterologous
origins, and may be considered a product of convergent
evolution. Examples are the proteins chymotrypsin and
subtilisin.
ORTHOLOGS
• 'Orthologs' are homologs that arise by speciation.
• The implication is that the two sequences are taken from
two different species, but show similarity because both are
derived from the same ancestral sequence that was present in
the ancestral organism.
• The amount of difference between two orthologous sequences
may be taken as a rough indicator of the amount of time that
has passed since the speciation event took place. An example
is the hemoglobin sequences of horse and zebra.
PARALOGS
• 'Paralogs' are homologs that arise by gene duplication,
without this duplication event being followed by a speciation
event.
• This means that paralogs are homologous sequences that
exist in the same organism and that have different functions.
• Paralogs have homologous origin but heterologous function,
for example myoglobin and hemoglobin in humans.
• In DNA sequences, the functional difference between
paralogs may simply be that one is a functional gene while
the other is a silent, non-expressed gene or a pseudogene.
XENOLOGS
• 'Xenologs' are homologs resulting from horizontal gene
transfer.
• They are an exception to the rule that homologs are always
descended from a common ancestor.
• Horizontal (lateral) exchange of genes may take place
between different species. The function of such genes would
be homologous.
PHYSICOCHEMICAL RELATIONSHIPS BETWEEN AMINO ACIDS
Construction of biologically significant alignments should take into
account the fact that protein evolution is constrained by the chemical
properties of amino acids and by the degeneracy of the genetic code.
Chemically conservative replacements tend to occur more frequently than
replacements with amino acids that are chemically different.
For example, a substitution of leucine with isoleucine, both of which
are nonpolar, is more likely to be seen than a substitution of leucine
with aspartic acid, which is negatively charged. Conservative changes
are less likely to affect the structure and function of the protein.
The figure given below gives an idea of allowed substitutions among
amino acids.
[Figure: allowed substitutions among amino acids]
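The idea of a chemically conservative replacement can be sketched as a lookup over physicochemical classes. The grouping in the Python example below is an assumed, simplified textbook classification (one-letter codes) and is not taken from the figure in the slides; references differ on borderline residues such as G, C, Y and H. The names `GROUPS`, `residue_class` and `is_conservative` are hypothetical.

```python
# Assumed, simplified grouping of the 20 standard amino acids (one-letter codes).
# References differ on borderline residues such as G, C, Y and H.
GROUPS = {
    "nonpolar": set("AVLIMFWPG"),
    "polar_uncharged": set("STCYNQ"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def residue_class(residue):
    """Return the name of the physicochemical class a residue belongs to."""
    for name, members in GROUPS.items():
        if residue in members:
            return name
    raise ValueError(f"unknown amino acid: {residue!r}")


def is_conservative(original, replacement):
    """A replacement is treated as conservative if both residues share a class."""
    return residue_class(original) == residue_class(replacement)


if __name__ == "__main__":
    print(is_conservative("L", "I"))  # leucine -> isoleucine, both nonpolar
    print(is_conservative("L", "D"))  # leucine -> aspartic acid, nonpolar -> negative
```

A yes/no class test is a coarse stand-in for what substitution matrices express with graded scores.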
MEASURES OF SEQUENCE SIMILARITY
• Two quantitative measures of the distance (difference)
between two sequences are:
(1) The Hamming distance: the number of positions with
mismatching characters between two strings of equal length.
(2) The Levenshtein (edit) distance: between two strings of
not necessarily equal length, the minimal number of 'edit
operations' required to change one string into the other,
where an edit operation is a deletion, insertion or
alteration (substitution) of a single character in either
sequence. It is desirable to assign variable weights to
different edit operations, since certain changes are more
likely to happen naturally than others.
• A given sequence of edit operations induces a unique
alignment, but not vice versa.
Example:
AGTC
CGTA
Hamming distance = 2
AG-TCC
CGCTCA
Levenshtein distance = 3
• Hamming and Levenshtein distances measure the
dissimilarity of two sequences: similar sequences give small
distances and dissimilar sequences give large distances.
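Both distances can be computed in a few lines. The Python sketch below reproduces the two examples above; the Levenshtein function uses the usual dynamic-programming recurrence, and its `sub_cost` and `indel_cost` parameters are hypothetical names added to illustrate weighted edit operations (both default to 1, which gives the plain edit distance).

```python
def hamming(a, b):
    """Number of mismatching positions between two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b, sub_cost=1, indel_cost=1):
    """Minimal total cost of edit operations that turn string a into string b."""
    # prev[j] is the cost of turning the prefix of a handled so far into b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]  # turning a[:i] into the empty string: i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,                        # delete x
                curr[j - 1] + indel_cost,                    # insert y
                prev[j - 1] + (0 if x == y else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(hamming("AGTC", "CGTA"))
    print(levenshtein("AGTCC", "CGCTCA"))
```

Keeping only two rows of the table makes the memory use proportional to the length of `b`; recovering the alignment itself would require keeping the full table and tracing back through it.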
CONCEPT OF SIMILARITY AND DISTANCE TABLE
The concepts of similarity and distance may be explained using
hypothetical tables. Here a-e are five organisms that have been scored
for resemblance.

Table 1: Similarity table
       a     b     c     d     e
a    100    65    50    50    50
b     65   100    50    50    50
c     50    50   100    97    65
d     50    50    97   100    65
e     50    50    65    65   100

The similarity table shows the percentage of matches; thus the diagonal
(in which each species is compared to itself) consists of 100% values.
(Such data form the basis of Adansonian analysis, or numerical
taxonomy, defined as "the classification of organisms based on giving
equal weight to every character of the organism; this principle has its
greatest application in numeric taxonomy.")

Table 2: Distance table
       a     b     c     d     e
a      0     6    11    11    11
b      6     0    11    11    11
c     11    11     0     2     6
d     11    11     2     0     6
e     11    11     6     6     0

The numbers in the distance table show percent differences, or
distances; thus the diagonal consists of 0% values.
Distance tables are in general use. A common measure of difference
between macromolecular sequences is 100 - S, where S is the percentage
of identical monomers when the sequences have been optimally aligned.
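As an illustration of the 100 - S measure, the Python sketch below applies it to the hypothetical similarity table for organisms a-e. The printed values differ from the distance table shown above, whose numbers are not 100 - S of the similarity table. The names `to_distance` and `closest_pair` are assumptions made for this example.

```python
LABELS = ["a", "b", "c", "d", "e"]

# Hypothetical similarity table from the slide (percent matches).
SIMILARITY = [
    [100, 65, 50, 50, 50],
    [65, 100, 50, 50, 50],
    [50, 50, 100, 97, 65],
    [50, 50, 97, 100, 65],
    [50, 50, 65, 65, 100],
]


def to_distance(similarity):
    """Convert percent similarities S into percent differences 100 - S."""
    return [[100 - s for s in row] for row in similarity]


def closest_pair(distance, labels):
    """Return (distance, label_i, label_j) for the two most similar distinct entries."""
    best = None
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if best is None or distance[i][j] < best[0]:
                best = (distance[i][j], labels[i], labels[j])
    return best


DISTANCE = to_distance(SIMILARITY)

if __name__ == "__main__":
    for label, row in zip(LABELS, DISTANCE):
        print(label, row)
    print(closest_pair(DISTANCE, LABELS))
```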
SIMPLE MATCHING COEFFICIENT
• The simple matching coefficient (SMC) is a statistic used for
comparing the similarity and diversity of sample sets. For two
objects A and B described by the same binary attributes:
SMC = (M00 + M11) / (M00 + M01 + M10 + M11)
where
M11 is the total number of attributes where A and B both
have a value of 1.
M01 is the total number of attributes where the attribute
of A is 0 and the attribute of B is 1.
M10 is the total number of attributes where the attribute
of A is 1 and the attribute of B is 0.
M00 is the total number of attributes where A and B both
have a value of 0.
• The simple matching distance (SMD), which measures
dissimilarity between sample sets, is given by 1 - SMC.
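A minimal Python sketch of the coefficient follows, computed as the number of matching attributes (both 0 or both 1) divided by the total number of attributes. The function name `simple_matching` and the example vectors `A` and `B` are hypothetical and chosen only for illustration.

```python
def simple_matching(a, b):
    """Return (SMC, SMD) for two equal-length sequences of binary attributes."""
    if len(a) != len(b):
        raise ValueError("both objects need the same number of attributes")
    if not a:
        raise ValueError("at least one attribute is required")
    if any(v not in (0, 1) for v in list(a) + list(b)):
        raise ValueError("attributes must be 0 or 1")
    m11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    m00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    m10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    m01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    smc = (m00 + m11) / (m00 + m01 + m10 + m11)
    return smc, 1 - smc  # SMD = 1 - SMC


# Hypothetical attribute vectors for two objects.
A = [1, 0, 0, 1, 1, 0, 1, 0]
B = [1, 0, 1, 1, 0, 0, 1, 1]

if __name__ == "__main__":
    print(simple_matching(A, B))
```

Unlike coefficients that ignore joint absences, SMC counts attributes that are 0 in both objects as agreement, which matters when absence of a character is itself informative.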
