ZIAUDDIN UNIVERSITY
Faculty Of Engineering Science, Technology & Management
(ZUFESTM)
Department Of Biomedical Engineering
Dr. Syeda Bushra Zafar
Assistant Professor
Bioinformatics
Multiple Sequence Alignment
Multiple sequence alignment
• Multiple sequence alignment (MSA) aligns multiple related sequences to achieve optimal matching of the sequences.
• Multiple sequence alignment has several unique advantages:
  • It reveals more biological information than pairwise alignments can, such as conserved sequence patterns and motifs.
  • It is a prerequisite for carrying out phylogenetic analysis.
  • It supports the design of degenerate polymerase chain reaction (PCR) primers.
Scoring Function
• Multiple sequence alignment arranges sequences so that the maximum number of residues from each sequence are matched up according to a particular scoring function.
• The scoring function for multiple sequence alignment is based on the concept of sum of pairs (SP): the score of an alignment is the sum of the scores of all possible pairs of sequences in it, under a particular scoring matrix.
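The sum-of-pairs idea can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by any particular program: the match, mismatch, and gap values are assumptions chosen for readability, whereas real tools use a substitution matrix (for example BLOSUM62) and more elaborate gap penalties.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one residue pair; the three values are illustrative assumptions."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(alignment):
    """Sum-of-pairs score: add pair scores over every column and every sequence pair."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


alignment = ["ACG-T", "ACGAT", "A-GAT"]
print(sp_score(alignment))  # prints 3
```

Each fully conserved column contributes +3 here (three matching pairs), while each column with one gap contributes 1 - 2 - 2 = -3, which gives the total of 3.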
Approaches used in multiple sequence alignment
• Exhaustive algorithm
• Heuristic algorithm
Exhaustive algorithm
• The exhaustive alignment method examines all possible aligned positions simultaneously.
• It extends the dynamic programming used for pairwise alignment to more than two sequences, but the computation is practical only for small datasets of fewer than ten short sequences.
• A program called divide-and-conquer alignment (DCA) uses some exhaustive components.
• Divide-and-conquer alignment:
• It is a semi-exhaustive approach.
• It works by breaking the sequences into smaller fragments.
• When the lengths of the fragments reach a predefined threshold, dynamic programming is applied to align them.
• The resulting short alignments are joined together head to tail.
• It offers the option of using a more heuristic procedure (fastDCA) to choose optimal cutting points, so that it can handle a greater number of sequences more rapidly.
• It performs global alignment and requires the input sequences to be of similar lengths and domain structures.
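To see why exhaustive dynamic programming is limited to a few short sequences, it helps to count the work involved. The sketch below assumes n sequences of equal length L and the textbook n-dimensional lattice, in which each cell looks back at up to 2^n - 1 neighbouring cells; the function names are hypothetical.

```python
def dp_cells(num_sequences, length):
    """Number of cells in a full dynamic-programming lattice for equal-length sequences."""
    return (length + 1) ** num_sequences


def predecessors_per_cell(num_sequences):
    """Each cell can be reached from up to 2^n - 1 neighbouring cells."""
    return 2 ** num_sequences - 1


for n in (2, 3, 5, 10):
    print(n, dp_cells(n, 100), predecessors_per_cell(n))
```

For two sequences of 100 residues the lattice has 10,201 cells; for three it already has 1,030,301, and the count keeps multiplying by 101 for every sequence added.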
Heuristic algorithm
• Because dynamic programming is not feasible for routine multiple sequence alignment, faster heuristic algorithms have been developed.
• Categories of heuristic algorithm:
1. Progressive alignment type
2. Iterative alignment type
3. Block-based alignment type
Progressive Alignment Method
• It first conducts pairwise alignments for each possible pair of sequences and generates a score for each pair.
• The scores are then converted into evolutionary distances to generate a distance matrix for all the sequences involved.
• A guide tree (a simple phylogenetic tree) is generated from the distance matrix using the neighbor-joining method.
• According to the guide tree, the two most closely related sequences are realigned first.
• The two aligned sequences are converted to a consensus sequence with gap positions fixed. The consensus is then treated as a single sequence in the subsequent step.
• The next closest sequence based on the guide tree is aligned with the consensus sequence using dynamic programming.
• The process is repeated until all the sequences are aligned.
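The first two steps above (pairwise alignment, then a distance matrix) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a basic Needleman-Wunsch alignment with a linear gap penalty, and it takes distance to be 1 minus fractional identity, which is a stand-in for the distance measures real programs use. The scoring values and the example sequences are made up, and the sequences are assumed to be non-empty.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scores)."""
    n, m = len(s), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def distance(s, t):
    """Assumed distance measure: 1 - (identical columns / alignment length)."""
    a, b = global_align(s, t)
    matches = sum(x == y for x, y in zip(a, b))
    return 1 - matches / len(a)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances."""
    k = len(seqs)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = distance(seqs[i], seqs[j])
    return d


def closest_pair(d):
    """Indices of the two sequences with the smallest distance (aligned first)."""
    pairs = [(i, j) for i in range(len(d)) for j in range(i + 1, len(d))]
    return min(pairs, key=lambda p: d[p[0]][p[1]])


seqs = ["ACGTACGT", "ACGTACGA", "TTGTCCGA"]
d = distance_matrix(seqs)
print(closest_pair(d))  # prints (0, 1)
```

A full progressive aligner would go on to build the guide tree from `d` and merge sequences in tree order; that part is omitted here to keep the sketch short.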
Lec 4-multiple sequence alignment.pptx..
Clustal
• Clustal is probably the best-known progressive alignment program.
• It is available either as a stand-alone or as an online program.
• The stand-alone program, which runs on UNIX and Macintosh, has two variants, Clustal W and Clustal X.
• One of the most important features of this program is its flexibility in using substitution matrices.
• For closely related sequences, the BLOSUM62 or PAM120 matrix is used, whereas for more divergent sequences, the BLOSUM45 or PAM250 matrix may be used.
• It uses adjustable gap penalties that allow more insertions and deletions in regions outside the conserved domains, but fewer in conserved regions.
• The program also applies a weighting scheme to increase the reliability of aligning divergent sequences.
Drawbacks and Solutions
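• The progressive method is unsuitable for comparing sequences of different lengths, because it is based on global alignment; accuracy is compromised in such cases.
• Another major limitation is the “greedy” nature of the algorithm: errors made in the early steps cannot be corrected later, so the final alignment could be far from optimal.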
T-Coffee (Tree-based Consistency Objective Function for alignment Evaluation)
• T-Coffee performs both global and local pairwise alignments for all possible sequence pairs.
• The global pairwise alignments are performed using the Clustal program. The local pairwise alignments are generated by the Lalign program, from which the top ten scoring alignments are selected.
• The global and local alignments are pooled to form a library.
• The consistency of the alignments is evaluated and a score is generated.
• Each pairwise alignment is further aligned with a possible third sequence.
• The result is used to refine the original pairwise alignment based on a consistency criterion, in a process known as library extension.
• A distance matrix is built to derive a guide tree.
• The multiple sequence alignment is then performed using the progressive approach.
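The library extension step can be illustrated with a small sketch. It assumes the commonly described triplet rule: the extended weight of a residue pair is its primary weight plus, for every residue of a third sequence that is aligned to both, the smaller of the two connecting weights. The data layout, the function name, and the weights in the example are hypothetical, and unlike the full method this sketch only rescores pairs that already exist in the primary library.

```python
from collections import defaultdict


def extend_library(primary):
    """Rescore residue pairs using support from third sequences.

    primary maps ((seq_name, position), (seq_name, position)) -> weight.
    """
    # Build a symmetric lookup: residue -> {aligned residue: weight}.
    neighbours = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = {}
    for (x, y), w in primary.items():
        total = w
        for z, w_xz in neighbours[x].items():
            if z[0] in (x[0], y[0]):
                continue  # the supporting residue must come from a third sequence
            w_zy = neighbours[y].get(z)
            if w_zy is not None:
                total += min(w_xz, w_zy)
        extended[(x, y)] = total
    return extended


# Hypothetical primary library for three sequences A, B, and C.
primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 1), ("B", 2)): 88,
}
for pair, weight in extend_library(primary).items():
    print(pair, weight)
```

In this example the pair A0-B0 rises from 88 to 165 because C0 is aligned to both residues (min(77, 100) = 77 is added), while A1-B2 stays at 88 because no third sequence supports it.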
• T-Coffee reduces the risk of getting stuck in suboptimal alignments, because the initial alignment is chosen from many alternative alignments.
• T-Coffee generally outperforms Clustal; however, it is slower than Clustal because of the time required to calculate the consistency scores.
• T-Coffee provides a graphical output of the alignment results.
Iterative alignment
• The iterative approach is based on the idea that an optimal solution can be found
by repeatedly modifying existing suboptimal solutions.
• The order of the sequences used for alignment is different in each iteration.
• This method may alleviate the “greedy” problem of the progressive strategy, because the order changes every time.
• Because it is also heuristic in nature, this method does not guarantee finding the optimal alignment.
PRRN
• PRRN uses a double nested iterative strategy for multiple alignment. It performs multiple alignment through two sets of iterations: an inner iteration and an outer iteration.
• In the outer iteration, an initial random alignment is generated and used to derive a UPGMA tree.
• In the inner iteration, the sequences are first divided randomly into two groups.
• The two groups, each treated as a single sequence, are then aligned to each other using global dynamic programming.
• The process is repeated through many cycles until the total SP score no longer increases.
• At this point, a new UPGMA tree is generated from the resulting alignment.
• New weights are applied to optimize the alignment scores. The newly optimized alignment is subject to further realignment in the inner iteration.
• This process is repeated over many cycles until there is no further improvement in the overall alignment scores.
Block-Based Alignment
• It is a local alignment-based strategy that identifies a block of ungapped alignment shared by all the sequences.
DIALIGN2
• The method breaks each of the sequences down to smaller segments and
performs all possible pairwise alignments between the segments.
• High-scoring segments, called blocks, among different sequences are then
compiled in a progressive manner to assemble a full multiple alignment
• It places emphasis on block-to-block comparison rather than residue-to-residue
comparison
Match-Box
• The program compares segments of every nine residues of all possible pairwise
alignments.
• If the similarity of particular segments is above a certain threshold across all
sequences, they are used as an anchor to assemble multiple alignments
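The segment comparison idea can be sketched for a single pair of sequences. This is a rough illustration rather than Match-Box's actual procedure: it counts identical residues instead of using a similarity score, it looks at two sequences rather than requiring conservation across all of them, and the threshold of 7 out of 9 is an arbitrary assumption. Only the window length of nine comes from the text above.

```python
def conserved_segments(seq_a, seq_b, window=9, min_identity=7):
    """Return (start_a, start_b, matches) for ungapped window pairs that
    share at least min_identity identical residues."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        seg_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            seg_b = seq_b[j:j + window]
            matches = sum(x == y for x, y in zip(seg_a, seg_b))
            if matches >= min_identity:
                hits.append((i, j, matches))
    return hits


# Made-up protein fragments sharing a common stretch.
seq_a = "MKTAYIAKQRQISFVK"
seq_b = "GGMKTAYIAKQRGG"
for hit in conserved_segments(seq_a, seq_b):
    print(hit)
```

Windows that pass the threshold in every pairwise comparison would be the candidates for anchoring a multiple alignment; residues between such anchors are left for a later step.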
