SCORING SCHEMES IN
BIOINFORMATICS
(PAM)
CONTENT
• INTRODUCTION TO SCORING SCHEMES
• PAM-PERCENT ACCEPTED MUTATION
• CONSTRUCTION OF PAM
• SCORES IN PAM MATRICES
• PAM MATRICES’ NUMBERS
• MORE ABOUT PAM MATRICES
• REFERENCES
INTRODUCTION TO SCORING SCHEMES
• A scoring system is a set of values that quantifies the likelihood of one
residue being substituted by another in an alignment.
• For residue pairs, these scoring systems are called substitution matrices. They score
residue identities and substitutions, while insertions and deletions are scored separately with gap penalties.
• Substitution matrices are derived from statistical analysis of residue substitution data
from sets of reliable alignments of related sequences.
• Two popular empirical scoring models for protein sequences are PAM
and BLOSUM.
INTRODUCTION TO SCORING SCHEMES (cont.)
• In molecular biology, scores are defined as measures of sequence
similarity.
• Similar sequences give high scores, and dissimilar sequences give low
scores.
• Deletions, or gaps in a sequence, have scores (penalties) that depend on their
lengths.
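To make the idea of an alignment score concrete, here is a minimal Python sketch that adds up substitution scores column by column and charges a fixed penalty for every gap position, so longer gaps cost more. The entries of `TOY_SCORES` and the gap penalty of -2 are hypothetical values chosen for illustration (they are not PAM or BLOSUM entries), and the linear gap model is an assumption; many aligners instead use affine penalties with separate gap-opening and gap-extension costs.

```python
GAP = "-"

# Hypothetical symmetric scores for a tiny alphabet (illustration only;
# these are NOT PAM or BLOSUM values). Each unordered pair is stored once.
TOY_SCORES = {
    ("A", "A"): 2, ("C", "C"): 3, ("G", "G"): 2,
    ("A", "C"): -1, ("A", "G"): 1, ("C", "G"): -2,
}


def pair_score(a, b):
    """Look up a substitution score, treating the matrix as symmetric."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def alignment_score(top, bottom, gap_per_residue=-2):
    """Sum substitution scores over aligned columns. Every gap column costs
    gap_per_residue, so a gap's total penalty grows with its length."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == GAP or b == GAP:
            total += gap_per_residue
        else:
            total += pair_score(a, b)
    return total


if __name__ == "__main__":
    print(alignment_score("ACG", "ACG"))        # 7  (similar: high score)
    print(alignment_score("ACG", "CGA"))        # -2 (dissimilar: low score)
    print(alignment_score("AACCGG", "AA-CGG"))  # 9  (gap of length 1)
    print(alignment_score("AACCGG", "AA--GG"))  # 4  (gap of length 2)
```

The identical pair scores highest, the scrambled pair scores lowest, and lengthening the gap from one to two positions lowers the score further.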
PAM-PERCENT ACCEPTED MUTATION
• Margaret Dayhoff and coworkers originally proposed the PAM model of evolution
in the 1960s.
• 'Accepted' means fixed in the population; acceptance is therefore a more complex
process than mutation alone.
• A point accepted mutation is the replacement of a single amino acid in the
primary structure of a protein with another single amino acid that is
accepted by the processes of natural selection.
• This definition does not include all point mutations in the DNA of an
organism. In particular, silent mutations are not point accepted mutations,
nor are mutations that are lethal or that are otherwise rejected by natural
selection.
PAM-PERCENT ACCEPTED MUTATION (cont.)
• 'PAM matrix' refers to one of a family of matrices whose scores
represent the likelihood of two amino acids being
aligned as a result of a series of mutation events rather than by random
chance.
• A 'PAMn matrix' is the PAM matrix corresponding to a time frame long enough for
n accepted mutation events to occur per 100 amino acids.
• Each row and each column represents one of the
twenty standard amino acids.
• The value in each cell of a PAM matrix is related to the probability that the row
amino acid before the mutation is aligned with the column amino acid
afterwards.
• Different PAM matrices correspond to different lengths of time
in the evolution of protein sequences.
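The row-before, column-after reading of a PAM cell can be sketched with a small mutation probability matrix. This is a toy illustration under stated assumptions: the three-letter alphabet, the background frequencies `FREQ` and the probabilities in `M1` are hypothetical, not Dayhoff's data. Each row is a probability distribution, and the expected fraction of changed positions is the sum over i of f_i × (1 − M_ii), which for a PAM1-style matrix should be 1%.

```python
# Toy illustration of a PAM1-style mutation probability matrix.
# The alphabet, frequencies and probabilities are hypothetical values.

ALPHABET = ["a", "b", "c"]          # stand-ins for amino acids
FREQ = [0.5, 0.3, 0.2]              # assumed background frequencies f_i

# M1[i][j]: probability that residue ALPHABET[i] (row, before) is found as
# ALPHABET[j] (column, after) once 1 PAM of evolution has elapsed.
M1 = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]


def rows_sum_to_one(m, tol=1e-9):
    """Each row is a probability distribution over the 'after' residue."""
    return all(abs(sum(row) - 1.0) < tol for row in m)


def fraction_changed(m, freq):
    """Expected fraction of positions that change: sum_i f_i * (1 - M_ii)."""
    return sum(f * (1.0 - m[i][i]) for i, f in enumerate(freq))


if __name__ == "__main__":
    print(rows_sum_to_one(M1))              # True
    print(round(fraction_changed(M1), 6))   # 0.01, one change per 100 residues
```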
CONSTRUCTION OF PAM
• Construction of the PAM1 matrix involves aligning full-length sequences and
clustering them by phylogenetic reconstruction using the parsimony principle.
• The reconstruction allows the ancestral sequence at each internal node of
the trees to be computed. This information, in turn, is used to count the number of
substitutions along each branch of the tree.
• The other PAM matrices are subsequently derived from PAM1 to represent greater
evolutionary divergence between sequences.
CONSTRUCTION OF PAM (cont.)
• The PAM score for a particular residue pair is derived by a multistep
procedure:
1. Calculation of relative mutability: the number of mutational changes from a common ancestor for a particular
amino acid residue divided by the total number of such residues occurring in
the alignment.
2. Normalization of the residue substitution frequencies against those expected by random
chance.
3. Logarithmic transformation (base 10) of the normalized mutation probability,
derived from the mutability values, divided by the frequency of the replacing residue.
• The resulting value is rounded to the nearest integer and entered into the
substitution matrix, which reflects the likelihood of amino acid substitutions.
• This completes the log-odds score computation. After the substitution
probabilities of all possible amino acid mutations are compiled, a 20 × 20 PAM matrix
is established.
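The arithmetic of steps 1 to 3 can be sketched in Python. This is a simplified sketch, not Dayhoff's full procedure: the normalization that turns mutabilities and substitution counts into a mutation probability P(a -> b) is skipped, so that probability is simply passed in. The scale factor of 10 applied before rounding is an assumption (we believe it matches the convention of Dayhoff's printed log-odds tables), and all numbers in the example are hypothetical.

```python
import math


def relative_mutability(changes, occurrences):
    """Step 1: number of observed changes of a residue divided by the number
    of times that residue occurs in the alignments."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return changes / occurrences


def log_odds_score(p_mutation, replacing_freq, scale=10):
    """Steps 2-3: compare the mutation probability P(a -> b) with the
    background frequency of the replacing residue b, take log10 of the
    ratio, scale it (assumed factor of 10) and round to the nearest integer."""
    odds = p_mutation / replacing_freq
    return round(scale * math.log10(odds))


if __name__ == "__main__":
    print(relative_mutability(12, 400))   # 0.03
    print(log_odds_score(0.20, 0.05))     # 6  (4x more often than chance)
    print(log_odds_score(0.05, 0.05))     # 0  (same as chance)
    print(log_odds_score(0.01, 0.05))     # -7 (5x less often than chance)
```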
CONSTRUCTION OF PAM (cont.)
[Figure: PAM250 amino acid similarity matrix]
Substitution of aspartic acid for glutamic acid (both acidic) adds 3
to the score. Substitution of the positively charged Lys for the
nonpolar Pro adds -1 to the score.
Generally, the more conservative the replacement, the higher the
value added to the score.
SCORES IN PAM MATRICES
• A positive score denotes a substitution that occurs more
frequently in a dataset of homologous sequences than would be expected by random chance;
such replacements are typically evolutionarily conserved.
• A zero score means that the frequency of the amino acid substitution found
in the homologous sequence dataset is equal to that expected by chance.
• In this case, the two amino acids are at best weakly similar
in terms of physicochemical properties.
• A negative score corresponds to a substitution that is found in the homologous
sequence dataset less frequently than would be expected by random
chance.
• This normally occurs with substitutions between dissimilar residues.
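The sign of a score follows directly from comparing an observed pair frequency with what chance predicts, namely f_a × f_b for residues a and b. For a PAM matrix, the observed pair frequency f_a × P(a -> b) divided by f_a × f_b reduces to P(a -> b) / f_b, the same ratio used in the log-odds step. The sketch below uses base-2 logarithms for readability (the base only rescales scores and never changes their sign); the background frequencies of 0.25 are hypothetical values chosen so the arithmetic is exact.

```python
import math


def pair_log_odds(observed_pair_freq, freq_a, freq_b):
    """log2 of the observed pair frequency over the chance expectation."""
    expected = freq_a * freq_b     # chance of pairing a with b at random
    return math.log2(observed_pair_freq / expected)


def interpret(score, tol=1e-9):
    """Map the sign of a log-odds score to its meaning."""
    if score > tol:
        return "more often than chance"
    if score < -tol:
        return "less often than chance"
    return "as often as chance"


if __name__ == "__main__":
    # Hypothetical background frequencies of 0.25 for both residues,
    # so chance alone predicts a pair frequency of 0.0625.
    for observed in (0.125, 0.0625, 0.03125):
        s = pair_log_odds(observed, 0.25, 0.25)
        print(observed, s, interpret(s))
    # 0.125 1.0 more often than chance
    # 0.0625 0.0 as often as chance
    # 0.03125 -1.0 less often than chance
```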
PAM MATRICES’ NUMBERS
• PAM1 corresponds to an evolutionary distance at which 1% of the residues have undergone
accepted point mutations.
• Increasing PAM numbers correspond to increasing numbers of PAM units and thus to greater
evolutionary distances between protein sequences.
• PAM250 represents 250% mutation, i.e., an average of 2.5 accepted mutations per
residue, which indicates a very distant relationship.
• PAM matrices with lower serial numbers are more suitable for aligning more
closely related sequences.
• PAM matrices with higher numbers, intended for more divergent sequences, are
extrapolated from PAM1 by matrix multiplication. For example, PAM80 is obtained by
raising the PAM1 mutation probability matrix to the 80th power and then converting the
result to log-odds scores.
• This mathematical transformation accounts for multiple substitutions having
occurred at the same amino acid position during evolution.
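The extrapolation from PAM1 to PAMn can be sketched with plain Python matrix powers. As before, `FREQ` and `M1` are hypothetical toy values, not Dayhoff's matrix. Raising the mutation probability matrix to the n-th power keeps every row a probability distribution, and the expected fraction of changed positions ends up below n% because some positions change more than once.

```python
# Toy extrapolation from a PAM1-style matrix to PAMn by matrix powers.
# FREQ and M1 are hypothetical; they are not Dayhoff's values.

FREQ = [0.5, 0.3, 0.2]
M1 = [
    [0.990, 0.006, 0.004],
    [0.005, 0.990, 0.005],
    [0.004, 0.006, 0.990],
]


def mat_mult(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Return m raised to the n-th power (n >= 1) by repeated multiplication."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = m
    for _ in range(n - 1):
        result = mat_mult(result, m)
    return result


def fraction_changed(m, freq):
    """Expected fraction of positions that differ: sum_i f_i * (1 - M_ii)."""
    return sum(f * (1.0 - m[i][i]) for i, f in enumerate(freq))


if __name__ == "__main__":
    for n in (1, 80, 250):
        pam_n = mat_power(M1, n)
        # Fewer than n% of positions differ, because some sites change more
        # than once (and may even revert) along the way.
        print(n, round(100 * fraction_changed(pam_n, FREQ), 1))
```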
PAM MATRICES’ NUMBERS (cont.)
• If a mutation is observed as F replaced by I, the residue may actually have
passed through a number of intermediate steps before becoming I, for example
F -> M -> L -> I. As a result, a PAM80 distance corresponds to only about 50% observed
difference between the sequences.
• PAM tries to model what happens at long evolutionary distances using a simple
Markov model derived from closely related sequences.
• PAM250, which corresponds to about 20% amino acid identity, represents 250
mutations per 100 residues. In theory, this number of evolutionary changes
approximately corresponds to an expected evolutionary span of 2,500 million years.
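The gap between observed differences and PAM distance can be illustrated with Kimura's empirical approximation for protein distances, d = -ln(1 - p - 0.2p²), where p is the observed fraction of differing positions. This is a swapped-in technique from the molecular-evolution literature, not part of the PAM construction described here; it is generally treated as reliable only for moderate p and breaks down entirely near p ≈ 0.85. Multiplying d by 100 to express it in PAM units is our assumption for the comparison.

```python
import math


def kimura_protein_distance(p):
    """Kimura's empirical correction: estimated substitutions per site
    from the observed fraction p of differing positions."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("p is too large for this approximation")
    return -math.log(inner)


def observed_difference_to_pam(p):
    """Convert an observed fraction of differences to approximate PAM units."""
    return 100.0 * kimura_protein_distance(p)


if __name__ == "__main__":
    for p in (0.2, 0.5):
        print(p, round(observed_difference_to_pam(p)))
    # 0.2 -> 23 and 0.5 -> 80: the estimated PAM distance exceeds the
    # observed percent difference, reflecting hidden multiple substitutions.
```

With p = 0.5 the estimate is about 80 PAMs, in line with the PAM80 figure above.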
MORE ABOUT PAM MATRICES
• Substitution matrices like PAM250 are constructed by observing
the frequencies of amino acid replacements in large samples of protein
sequences.
• For a given replacement, the PAM value is proportional to the logarithm of the
ratio between the frequency with which that replacement was observed and the
frequency expected by chance.
• For closely related sequences, it is appropriate to use a lower-numbered matrix such as
PAM100, in which PAM units have been extrapolated to 100 accepted mutations per
100 residues.
• PAM250 is often used for database searches, since larger
databases tend to contain more distantly related sequences.
MORE ABOUT PAM MATRICES (cont.)
• Nucleotide scoring matrices have been developed for
scoring DNA sequence alignments along similar lines to
amino acid scoring matrices.
• DNA matrices can include ambiguous DNA symbols (such as N) and
information from mutational analysis.
• A typical piece of mutational information they include is that
transitions (purine to purine, or pyrimidine to pyrimidine) are more probable than
transversions (purine to pyrimidine, or vice versa).
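A nucleotide scoring scheme that favours transitions over transversions can be sketched as a small function. The default values (match +1, transition -1, transversion -2, ambiguous 0) are hypothetical, and only the ambiguity symbol N is handled; other IUPAC ambiguity codes such as R or Y would need their own rules.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES | {"N"}


def dna_score(a, b, match=1, transition=-1, transversion=-2, ambiguous=0):
    """Score one aligned nucleotide pair (hypothetical default values)."""
    a, b = a.upper(), b.upper()
    if a not in VALID or b not in VALID:
        raise ValueError(f"unsupported symbol in pair {a!r}/{b!r}")
    if a == "N" or b == "N":
        return ambiguous          # 'N' = any base; no evidence either way
    if a == b:
        return match
    both_purines = a in PURINES and b in PURINES
    both_pyrimidines = a in PYRIMIDINES and b in PYRIMIDINES
    if both_purines or both_pyrimidines:
        return transition         # A<->G or C<->T: the more probable change
    return transversion           # purine <-> pyrimidine: penalized more


if __name__ == "__main__":
    for pair in [("A", "A"), ("A", "G"), ("C", "T"), ("A", "C"), ("N", "G")]:
        print(pair, dna_score(*pair))
```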
REFERENCES
1. Ashwini S. Mushunuri. Scoring matrices. BBI-2-13010.
https://www.slideshare.net/ashwinimushunuri96/scoring-matrices
2. Point accepted mutation. Wikipedia.
https://en.wikipedia.org/wiki/Point_accepted_mutation
3. Adansonian classification. Medical definition from MediLexicon.
www.medilexicon.com/dictionary/18016
4. S. C. Rastogi, Namita Mendiratta, Parag Rastogi. Bioinformatics: Concepts, Skills
& Applications. CBS Publishers & Distributors, New Delhi. http://www.cbspd.com
