Short introduction to Bioinformatics
What are the Probabilistic Models?
Sequence Alignment
Pairwise Alignment
Multiple Sequence Alignment Models
What is Phylogenetics?
Building Phylogenetic Trees
Other Models
Contact Us




Introduction to Probabilistic Models for Bioinformatics

              Igor Bogicevic (igor.bogicevic@sbgenomics.com)




                                          July 3, 2011




              Seven Bridges Genomics, LLC






Short introduction to Bioinformatics




       Bioinformatics is the application of statistics and computer science to the field of
       molecular biology.




       Major research efforts in the field include sequence alignment, gene finding,
       genome assembly, drug design, drug discovery, protein structure alignment,
       protein structure prediction, prediction of gene expression and protein-protein
       interactions, genome-wide association studies and the modeling of evolution.




       Today, given the enormous volume of sequence data, one of the biggest
       challenges is not producing the data but understanding it.






What are the Probabilistic Models?

       There are two basic definitions:




       A statistical analysis tool that estimates, on the basis of past (historical) data,
       the probability of an event occurring again.
       A system that simulates the object under consideration and produces different
       outcomes with different probabilities.




       A simple example: rolling a die.




       A more relevant example: the random sequence model for DNA.
       Biological sequences are strings from a finite alphabet of residues, most
       commonly either four nucleotides or twenty amino acids.
       Suppose a residue $a$ occurs with probability $q_a$. If a protein or DNA sequence
       is denoted $x_1 \ldots x_n$, then, treating the residues as independent, the
       probability of the whole sequence is:
       \[ q_{x_1} q_{x_2} \cdots q_{x_n} = \prod_{i=1}^{n} q_{x_i} \]
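The product over residue probabilities can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the uniform frequencies in `q_dna` are assumptions chosen for the example. The log form is included because products of many small probabilities underflow for long sequences.

```python
import math

def sequence_prob(seq, q):
    """Probability of seq under the random sequence model: product of q[x_i]."""
    return math.prod(q[x] for x in seq)

def sequence_log_prob(seq, q):
    """Same quantity in log space, which avoids underflow on long sequences."""
    return sum(math.log(q[x]) for x in seq)

# Hypothetical residue frequencies (an assumption: uniform over the four nucleotides).
q_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

p = sequence_prob("ACGT", q_dna)          # 0.25 ** 4
log_p = sequence_log_prob("ACGT", q_dna)  # 4 * log(0.25)
```

With non-uniform frequencies the same functions apply unchanged; only the dictionary `q` differs.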


Sequence Alignment




       Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein
       to identify regions of similarity that may be a consequence of functional,
       structural, or evolutionary relationships between the sequences.




       A variety of computational algorithms have been applied to the sequence
       alignment problem, including dynamic programming, heuristic algorithms, and
       probabilistic methods.




       Common formats for representing alignments are FASTA and GenBank.






Pairwise Alignment


       Pairwise sequence alignment methods are used to find the best-matching
       piecewise (local) or global alignments of two query sequences.




       The three primary methods of producing pairwise alignments are dot-matrix
       methods, dynamic programming, and word methods.




       Needleman-Wunsch algorithm (Global Alignment)




       Smith-Waterman algorithm (Local Alignment)




       FASTA/BLAST algorithms (k-tuple heuristic methods, often combined with
       dynamic programming)




       Gap penalties: modeling the cost of a gap in aligned sequences (linear, affine,
       etc.)
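The two gap-penalty schemes named above can be written as small functions. This is a sketch under stated assumptions: the function names and the default values of `d` and `e` are illustrative and not taken from the slides. It follows the common textbook convention in which a linear penalty charges the same cost for every gap position, while an affine penalty charges a gap-open cost for the first position and a smaller gap-extend cost for each further one.

```python
def linear_gap(length, d=1):
    """Linear penalty: each of the `length` gap positions costs d."""
    return -d * length

def affine_gap(length, d=3, e=1):
    """Affine penalty: d to open the gap, e for each additional position."""
    if length == 0:
        return 0
    return -(d + (length - 1) * e)

# A gap of length 3 under each scheme (parameter values are hypothetical).
linear_cost = linear_gap(3)   # -(1 * 3)
affine_cost = affine_gap(3)   # -(3 + 2 * 1)
```

The affine form makes one long gap cheaper than several short gaps of the same total length, which is often the more plausible model for insertions and deletions.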







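Example: Smith-Waterman. For two sequences $a = a_1 \ldots a_m$ and $b = b_1 \ldots b_n$, a matrix $H$ is built as follows:

\[ H(i, 0) = 0, \quad 0 \le i \le m \]
\[ H(0, j) = 0, \quad 0 \le j \le n \]

\[ w(a_i, b_j) = \begin{cases} w(\text{match}) & \text{if } a_i = b_j \\ w(\text{mismatch}) & \text{if } a_i \ne b_j \end{cases} \]

\[ H(i, j) = \max \left\{ \begin{array}{ll}
0 & \\
H(i-1, j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\
H(i-1, j) + w(a_i, -) & \text{Deletion} \\
H(i, j-1) + w(-, b_j) & \text{Insertion}
\end{array} \right\}, \quad 1 \le i \le m, \; 1 \le j \le n \]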
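The recurrence translates almost line for line into code. The following is a minimal sketch, assuming a single linear gap score for both $w(a_i, -)$ and $w(-, b_j)$; the function name and the default score values are assumptions made for illustration.

```python
def smith_waterman_matrix(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the Smith-Waterman matrix H for sequences a (rows) and b (columns)."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay at zero: H(i, 0) = H(0, j) = 0.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + w,  # match/mismatch
                H[i - 1][j] + gap,    # deletion
                H[i][j - 1] + gap,    # insertion
            )
    return H

H = smith_waterman_matrix("AGCACACA", "ACACACTA")
best = max(max(row) for row in H)
print(best)  # 12
```

The two strings used here are the sequences of the worked example in this deck, with AGCACACA indexing the rows and ACACACTA the columns.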






Sequence 1 = ACACACTA, Sequence 2 = AGCACACA




w(match) = +2
w(a,-) = w(-,b) = w(mismatch) = -1

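\[
H = \begin{array}{c|ccccccccc}
  & - & A & C & A & C & A & C & T & A \\ \hline
- & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
A & 0 & 2 & 1 & 2 & 1 & 2 & 1 & 0 & 2 \\
G & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
C & 0 & 0 & 3 & 2 & 3 & 2 & 3 & 2 & 1 \\
A & 0 & 2 & 2 & 5 & 4 & 5 & 4 & 3 & 4 \\
C & 0 & 1 & 4 & 4 & 7 & 6 & 7 & 6 & 5 \\
A & 0 & 2 & 3 & 6 & 6 & 9 & 8 & 7 & 8 \\
C & 0 & 1 & 4 & 5 & 8 & 8 & 11 & 10 & 9 \\
A & 0 & 2 & 3 & 6 & 7 & 10 & 10 & 10 & 12
\end{array}
\]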





In the example, the highest value, 12, is in cell (8,8), indexed as (row, column). The
traceback visits (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0),
which gives the alignment:
Sequence 1 = A-CACACTA
Sequence 2 = AGCACAC-A
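The traceback can be reproduced with a short script. This is a sketch rather than a reference implementation: the function name, the tie-breaking order (diagonal, then up, then left) and the choice of the first maximal cell are assumptions. Rows index Sequence 2 and columns index Sequence 1, matching the matrix layout of the example.

```python
def smith_waterman_align(rows, cols, match=2, mismatch=-1, gap=-1):
    """Return (aligned_cols, aligned_rows, path) for one best local alignment."""
    m, n = len(rows), len(cols)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if rows[i - 1] == cols[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + w,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)

    # Start at the highest-scoring cell (the first one found if there are ties).
    cells = ((r, c) for r in range(m + 1) for c in range(n + 1))
    i, j = max(cells, key=lambda rc: H[rc[0]][rc[1]])

    path, out_rows, out_cols = [(i, j)], [], []
    while H[i][j] > 0:  # a zero cell ends the local alignment
        w = match if rows[i - 1] == cols[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:   # diagonal: residues aligned
            out_rows.append(rows[i - 1])
            out_cols.append(cols[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:   # up: gap in the column sequence
            out_rows.append(rows[i - 1])
            out_cols.append("-")
            i -= 1
        else:                                # left: gap in the row sequence
            out_rows.append("-")
            out_cols.append(cols[j - 1])
            j -= 1
        path.append((i, j))
    return "".join(reversed(out_cols)), "".join(reversed(out_rows)), path

seq1, seq2, path = smith_waterman_align("AGCACACA", "ACACACTA")
print(seq1)  # A-CACACTA
print(seq2)  # AGCACAC-A
```

When several cells or several predecessors tie, other equally scoring alignments exist; this sketch reports only one of them.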


Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is a sequence alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.




                                                                                                                   EVEN BRIDGES
                                                                                                                       G E N O M I C S, LLC




            Igor Bogicevic (igor.bogicevic@sbgenomics.com)   Introduction to Probabilistic Models for Bioinformatics
Short introduction to Bioinformatics
                        What are the Probabilistic Models?
                                       Sequence Alignment
                                        Pairwise Alignment
                       Multiple Sequence Alignment Models
                                    What is Phylogenetics?
                                Building Phylogenetic Trees
                                              Other Models
                                               Conctact Us


Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is a sequence alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.
       We usually want to do multiple alignments to find a homologous sequences that
       point to a shared evolutionary origins that can be used for further phylogenetic
       analysis.
       Progressive Alignment Methods - constructing succession of a pairwise alignment.




                                                                                                                    EVEN BRIDGES
                                                                                                                        G E N O M I C S, LLC




             Igor Bogicevic (igor.bogicevic@sbgenomics.com)   Introduction to Probabilistic Models for Bioinformatics
Short introduction to Bioinformatics
                        What are the Probabilistic Models?
                                       Sequence Alignment
                                        Pairwise Alignment
                       Multiple Sequence Alignment Models
                                    What is Phylogenetics?
                                Building Phylogenetic Trees
                                              Other Models
                                               Conctact Us


Multiple Sequence Alignment Models



       A multiple sequence alignment (MSA) is a sequence alignment of three or more
       biological sequences, commonly protein, DNA, or RNA.
       We usually build multiple alignments to find homologous sequences, which
       point to a shared evolutionary origin and can be used for further phylogenetic
       analysis.
       Progressive Alignment Methods - build the MSA through a succession of pairwise
       alignments.
       Hidden Markov Models - represent the MSA as a directed acyclic graph (DAG); the
       observed states are individual alignment columns and the hidden states
       represent the presumed ancestral sequence.
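
The progressive idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a production aligner: sequences are added in input order (real progressive aligners typically order the merges with a guide tree), each new sequence is aligned to the growing alignment with Needleman-Wunsch, and a residue is scored against a column by its average score over that column. The function names and the scores (match +1, mismatch -1, gap -1) are hypothetical.

```python
def align_to_profile(profile, seq, match=1, mismatch=-1, gap=-1):
    """Add seq to an alignment (list of equal-length gapped strings)."""
    n, m = len(profile[0]), len(seq)
    columns = list(zip(*profile))

    def col_score(col, ch):
        # Average score of residue ch against every character in the column.
        total = 0
        for c in col:
            total += gap if c == "-" else (match if c == ch else mismatch)
        return total / len(col)

    # F[i][j]: best score for the first i columns vs the first j residues.
    # P[i][j]: the move that produced F[i][j], used for the traceback.
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    P = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], P[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], P[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (F[i - 1][j - 1] + col_score(columns[i - 1], seq[j - 1]), "diag"),
                (F[i - 1][j] + gap, "up"),
                (F[i][j - 1] + gap, "left"),
            ]
            F[i][j], P[i][j] = max(options, key=lambda o: o[0])

    # Traceback from the bottom-right corner (global alignment).
    rows = [[] for _ in profile]
    new_row = []
    i, j = n, m
    while i > 0 or j > 0:
        move = P[i][j]
        if move == "left":
            # New residue opens a gap column in every existing row.
            for r in rows:
                r.append("-")
            new_row.append(seq[j - 1])
            j -= 1
        else:
            # Existing column is kept; new row gets a residue or a gap.
            for r, c in zip(rows, columns[i - 1]):
                r.append(c)
            if move == "diag":
                new_row.append(seq[j - 1])
                j -= 1
            else:
                new_row.append("-")
            i -= 1
    return ["".join(reversed(r)) for r in rows] + ["".join(reversed(new_row))]


def progressive_msa(sequences):
    """Build an MSA by a succession of pairwise (sequence-to-profile) alignments."""
    alignment = [sequences[0]]
    for seq in sequences[1:]:
        alignment = align_to_profile(alignment, seq)
    return alignment


for row in progressive_msa(["ACGT", "AGT", "ACT"]):
    print(row)
```

Every row of the result has the same length, and removing the gap characters from a row recovers the corresponding input sequence. The result depends on the order in which sequences are added, which is the main weakness of progressive methods.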






What is Phylogenetics?



       Phylogenetics is the study of evolutionary relatedness among groups of organisms
       (e.g. species, populations), which is discovered through molecular sequencing
       data and morphological data matrices.
       Evolution is regarded as a branching process, whereby populations are altered
       over time and may speciate into separate branches, hybridize together, or
       terminate by extinction. This may be visualized in a phylogenetic tree.
       Ernst Haeckel’s recapitulation theory (“ontogeny recapitulates phylogeny”) is a
       hypothesis that in developing from embryo to adult, animals go through stages
       resembling or representing successive stages in the evolution of their remote
       ancestors.





Building Phylogenetic Trees


       Phylogenetic trees among a nontrivial number of input sequences are constructed
       using computational phylogenetics methods.
       A common method is to search for the maximum-likelihood tree, often within a
       Bayesian framework, and to apply an explicit model of evolution to phylogenetic
       tree estimation.
       Identifying the optimal tree using many of these techniques is NP-hard, so
       heuristic search and optimization methods are used in combination with
       tree-scoring functions to identify a reasonably good tree that fits the data.
       The resulting trees do not necessarily represent the species' evolutionary
       history accurately, because the data on which they are based is noisy; the
       analysis can be confounded by horizontal gene transfer, hybridisation between
       species that were not nearest neighbors on the tree before the hybridisation
       took place, convergent evolution, and conserved sequences.
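
To see why exhaustive search is impractical, it helps to count the candidate topologies. A standard combinatorial result is that n labeled taxa admit (2n-5)!! unrooted and (2n-3)!! rooted bifurcating trees. The Python sketch below evaluates those counts; the function names are illustrative and do not come from the slides.

```python
def odd_double_factorial(k):
    """Product 1 * 3 * 5 * ... * k for odd k; returns 1 when k <= 1."""
    result = 1
    for factor in range(3, k + 1, 2):
        result *= factor
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees on n labeled taxa: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa):
    """Number of rooted bifurcating trees on n labeled taxa: (2n-3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


for n in (4, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Four taxa give 3 unrooted and 15 rooted trees, while ten taxa already give 2,027,025 unrooted and 34,459,425 rooted trees, so scoring every topology quickly becomes infeasible as the number of sequences grows.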



Other Models




       Transformational Grammars (Chomsky Hierarchy)
       RNA Structure Analysis Models (RNA conserves base-pairing interactions rather
       than the sequence itself)






Contact Us




       We are Hiring!




Introduction to Probabilistic Models for Bioinformatics

  • 1. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Introduction to Probabilistic Models for Bioinformatics Igor Bogicevic (igor.bogicevic@sbgenomics.com) July 3, 2011 EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 2. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Short introduction to Bioinformatics Bioinformatics is the application of statistics and computer science to the field of molecular biology. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 3. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Short introduction to Bioinformatics Bioinformatics is the application of statistics and computer science to the field of molecular biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 4. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Short introduction to Bioinformatics Bioinformatics is the application of statistics and computer science to the field of molecular biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution. At the current moment, given the enormous volumes of sequenced data, one of the biggest challenges is not producing, but actually understanding the data. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 5. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us What are the Probabilistic Models? There are 2 basic definitions: EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 6. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us What are the Probabilistic Models? There are 2 basic definitions: Statistical analysis tool that estimates, on the basis of past (historical) data, the probability of an event occurring again. Probabilistic model is a system that simulates the object under the consideration and produces different outcomes with different probabilities. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 7. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us What are the Probabilistic Models? There are 2 basic definitions: Statistical analysis tool that estimates, on the basis of past (historical) data, the probability of an event occurring again. Probabilistic model is a system that simulates the object under the consideration and produces different outcomes with different probabilities. Simple example - rolling a die. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 8. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us What are the Probabilistic Models? There are 2 basic definitions: Statistical analysis tool that estimates, on the basis of past (historical) data, the probability of an event occurring again. Probabilistic model is a system that simulates the object under the consideration and produces different outcomes with different probabilities. Simple example - rolling a die. A bit more relevant example - random sequence model in DNA . Biological sequences are strings from a finite alphabet of residues, most commonly either four nucleotides, or twenty amino acids. Imagine that a residue a occurs with probability qa , if protein or DNA sequence is denoted x1 ...xn , then probability of the whole sequence is: n Y qx1 qx2 ...qxn = qxi i=1 EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 9. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Sequence Alignment Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 10. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Sequence Alignment Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. A variety of computational algorithms have been applied to the sequence alignment problem, i.e. dynamic programming, heuristic algorithms, probabilistic methods. EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 11. Short introduction to Bioinformatics What are the Probabilistic Models? Sequence Alignment Pairwise Alignment Multiple Sequence Alignment Models What is Phylogenetics? Building Phylogenetic Trees Other Models Conctact Us Sequence Alignment Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. A variety of computational algorithms have been applied to the sequence alignment problem, i.e. dynamic programming, heuristic algorithms, probabilistic methods. Common formats for representing alignments are FASTA and GenBank format EVEN BRIDGES G E N O M I C S, LLC Igor Bogicevic (igor.bogicevic@sbgenomics.com) Introduction to Probabilistic Models for Bioinformatics
  • 18. Pairwise Alignment: Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or global alignments of two query sequences. The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming, and word methods. Key algorithms: Needleman-Wunsch (global alignment); Smith-Waterman (local alignment); FASTA/BLAST (k-tuple heuristic methods, often combined with dynamic programming). Gap penalties model the cost of a gap in the aligned sequences (linear, affine, etc.).
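The difference between linear and affine gap penalties can be sketched in a few lines of Python. The function names and the default costs (`d`, `gap_open`, `gap_extend`) are assumptions chosen for illustration, not values from the slides, and the affine form used here (opening cost for the first gap position, extension cost for each further one) is only one of several conventions in use.

```python
def linear_gap(length, d=1):
    """Linear penalty: every gap position costs the same amount d."""
    return -d * length


def affine_gap(length, gap_open=5, gap_extend=1):
    """Affine penalty: opening a gap costs gap_open, each further
    position costs gap_extend (assumed convention, see lead-in)."""
    if length == 0:
        return 0
    return -(gap_open + (length - 1) * gap_extend)


# One gap of length 3 versus three separate gaps of length 1.
one_long_linear = linear_gap(3)
three_short_linear = 3 * linear_gap(1)
one_long_affine = affine_gap(3)
three_short_affine = 3 * affine_gap(1)
```

Under the linear model the two cases score the same; under the affine model a single long gap is penalized less than several short ones, which is the usual motivation for using it.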
  • 19. Example - Smith-Waterman: A matrix H is built as follows:
    $$H(i, 0) = 0,\quad 0 \le i \le m \qquad\qquad H(0, j) = 0,\quad 0 \le j \le n$$
    If $a_i = b_j$ then $w(a_i, b_j) = w(\text{match})$; if $a_i \neq b_j$ then $w(a_i, b_j) = w(\text{mismatch})$.
    $$H(i, j) = \max \left\{ \begin{array}{ll} 0 & \\ H(i-1, j-1) + w(a_i, b_j) & \text{Match/Mismatch} \\ H(i-1, j) + w(a_i, -) & \text{Deletion} \\ H(i, j-1) + w(-, b_j) & \text{Insertion} \end{array} \right\}, \quad 1 \le i \le m,\ 1 \le j \le n$$
  • 22. Sequence 1 = ACACACTA, Sequence 2 = AGCACACA
    w(match) = +2, w(a,-) = w(-,b) = w(mismatch) = -1

    H =       -   A   C   A   C   A   C   T   A
          -   0   0   0   0   0   0   0   0   0
          A   0   2   1   2   1   2   1   0   2
          G   0   1   1   1   1   1   1   0   1
          C   0   0   3   2   3   2   3   2   1
          A   0   2   2   5   4   5   4   3   4
          C   0   1   4   4   7   6   7   6   5
          A   0   2   3   6   6   9   8   7   8
          C   0   1   4   5   8   8  11  10   9
          A   0   2   3   6   7  10  10  10  12

    In the example, the highest value corresponds to the cell in position (8,8). The walk back corresponds to (8,8), (7,7), (7,6), (6,5), (5,4), (4,3), (3,2), (2,1), (1,1), and (0,0).
    Sequence 1 = A-CACACTA, Sequence 2 = AGCACAC-A
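The matrix fill and the walk back above can be reproduced with a short Python sketch. This is a minimal illustration, not the presenter's code: the function name `smith_waterman`, its parameters, and the tie-breaking order in the traceback (diagonal, then up, then left) are assumptions. Rows of `H` follow the first argument and columns follow the second, so Sequence 2 is passed first to match the layout of the matrix on the slide.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment with a linear gap cost.

    Returns (best score, aligned a, aligned b, score matrix H).
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)

    # Fill the matrix with the recurrence H(i, j) = max(0, diag, up, left).
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + w,   # match / mismatch
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Walk back from the highest cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + w:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), H


# Rows = Sequence 2, columns = Sequence 1, as in the matrix above.
score, seq2_aligned, seq1_aligned, H = smith_waterman("AGCACACA", "ACACACTA")
```

With the scores from the slide, `score` is 12, `seq1_aligned` is `A-CACACTA`, and `seq2_aligned` is `AGCACAC-A`.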
  • 25. Multiple Sequence Alignment Models: A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, commonly protein, DNA, or RNA. Multiple alignments are usually performed to find homologous sequences that point to shared evolutionary origins and can be used for further phylogenetic analysis. Progressive alignment methods build the MSA through a succession of pairwise alignments. Hidden Markov Models represent the MSA as a directed acyclic graph (DAG): the observed states are the individual alignment columns, and the hidden states represent the presumed ancestral sequence.
  • 29. What is Phylogenetics? Phylogenetics is the study of evolutionary relatedness among groups of organisms (e.g. species, populations), which is discovered through molecular sequencing data and morphological data matrices. Evolution is regarded as a branching process, whereby populations are altered over time and may speciate into separate branches, hybridize together, or terminate by extinction. This may be visualized in a phylogenetic tree. Ernst Haeckel's recapitulation theory ("ontogeny recapitulates phylogeny") is a hypothesis that in developing from embryo to adult, animals go through stages resembling or representing successive stages in the evolution of their remote ancestors.
  • 33. Building Phylogenetic Trees: Phylogenetic trees among a nontrivial number of input sequences are constructed using computational phylogenetics methods. A common method is to search for the maximum-likelihood tree, often within a Bayesian framework, applying an explicit model of evolution to phylogenetic tree estimation. Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data. The resulting trees do not necessarily represent the species' evolutionary history accurately, because the data on which they are based is noisy; the analysis can be confounded by horizontal gene transfer, hybridisation between species that were not nearest neighbors on the tree before the hybridisation took place, convergent evolution, and conserved sequences.
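To give a feel for why exhaustive tree search is impractical, the sketch below counts the distinct unrooted, fully bifurcating tree topologies for n labelled taxa, which is the double factorial (2n - 5)!!. This count is a standard combinatorial result rather than something stated on the slides, and the function name `num_unrooted_trees` is an assumption. A large search space is not the same thing as NP-hardness, but it illustrates why heuristic search is used in practice.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted binary tree topologies on n_taxa labelled leaves.

    Equals (2n - 5)!! = 3 * 5 * 7 * ... * (2n - 5) for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


# Growth of the search space for a few input sizes.
counts = {n: num_unrooted_trees(n) for n in (4, 5, 10, 20)}
```

Already at 10 taxa there are 2,027,025 candidate topologies, and the count grows faster than exponentially from there.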
  • 35. Other Models: Transformational Grammars (Chomsky Hierarchy); RNA Structure Analysis Models (RNA tends to preserve its base-pairing interactions rather than the sequence itself).
  • 36. Contact Us: We are Hiring! Igor Bogicevic (igor.bogicevic@sbgenomics.com)