Comparative genomics
in eukaryotes
Gene family analysis



  Klaas Vandepoele, PhD


Professor Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium


                 http://www.bits.vib.be
Workflow




Applications of clustering the proteome(s)
       Gene families form the basis for evolutionary
        (or phylogenetic) analyses, including:
          • Detection of orthologs and paralogs
          • Gene duplication, family expansions,
            pseudogene formation and gene loss
          • Species taxonomies
          • Horizontal gene transfer (HGT)
          • Evolution of gene structure
              • Introns
              • Protein domain organisation and
                (re)arrangements
          • Base composition and codon usage

I. Structural annotation: genome-wide versus family-wise
       Rationale for family-wise annotation
           • Every gene has different (sequence)
             characteristics, and different genes evolve at
             different rates. Using these family-specific
             characteristics to determine homologous gene
             models improves the overall quality of the
             structural annotation.
       Properties:
           • Slow and nearly manual procedure
           • High-quality gene models that can reveal
             biologically novel findings

Workflow of the family-wise annotation procedure

[Flowchart; box labels as recovered from the slide, arrows shown only where the order is unambiguous]
  Collecting experimental representatives -> MSA of experimental representatives -> HMMbuild -> Family HMM profile
  Family HMM profile + Species X proteome (ab initio gene prediction) -> HMMsearch / BLAST -> Putative homologs
  EST/cDNA and protein motifs: correction of the gene model
  Classification using phylogenetic trees
  Detailed characterization

HMMER: http://hmmer.janelia.org/

Experimental representatives

[Screenshots: InterProScan; Pfam HMM logo; ClustalW + Jalview]

BLAST / HMMsearch


    1. Use multiple sequence
       alignment to create HMM profile
    2. Use HMM profile to search for
       similar proteins
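
The two steps above can be illustrated with a deliberately simplified stand-in. The sketch below builds a position-specific scoring matrix (log-odds scores per alignment column) from an ungapped toy alignment and uses it to rank candidate sequences. This is not a profile HMM as built by HMMER: it has no insert or delete states, assumes a uniform background distribution, and only scores sequences of exactly the profile length. The alignment, the candidate names and the pseudocount are made up for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column."""
    background = 1.0 / len(ALPHABET)  # uniform background: a simplification
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def score(profile, sequence):
    """Sum the per-column log-odds scores of an ungapped sequence."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(col[res] for col, res in zip(profile, sequence))


# Step 1: toy alignment of hypothetical family representatives (not real data).
representatives = ["HAKF", "HSKF", "HGKF", "HAKY"]
profile = build_profile(representatives)

# Step 2: rank hypothetical candidates by their score against the profile.
candidates = {"cand1": "HTKF", "cand2": "WWWW"}
ranked = sorted(candidates, key=lambda name: score(profile, candidates[name]),
                reverse=True)
print(ranked)
```

A candidate that matches the conserved columns gets a positive score, while an unrelated sequence scores below zero; real searches would use HMMbuild and HMMsearch on full-length, gapped alignments.
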




Representatives + putative homologs

[Screenshot: BioEdit Sequence Editor]

The suffix "finalcds" indicates a corrected gene model, as opposed to the original gene model
generated by the ab initio gene prediction.

             Multiple sequence alignments assist in the detection and
              correction of errors in the structural annotation (missed exon)

Representatives + putative homologs

The suffix "finalcds" indicates a corrected gene model, as opposed to the original gene model
generated by the ab initio gene prediction.

             Multiple sequence alignments assist in the detection of errors
              in the structural annotation (false first exon)

Examples of family-specific protein motifs

        • B-type cyclins have an HxKF signature
        • Cyclin destruction boxes (B1-type cyclins: R-[AV]LGDIGN)

Examples of family-specific protein motifs

[Figure labels: Arabidopsis, Rice]

                      • D-type cyclins contain the LxCxE Rb-binding motif
                      • Low conservation of phylogenetic signal at the primary sequence level
                      • General rules are rarely general: exceptions (i.e. missing protein
                        motifs) are frequent and might indicate functional divergence
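
Motif signatures such as HxKF, the B1-type destruction box and LxCxE can be screened for with regular expressions. The following is a minimal sketch, assuming that "x" means any residue and that "R-[AV]LGDIGN" reads as R, then A or V, then LGDIGN. The peptide fragments are invented for illustration and are not real cyclin sequences.

```python
import re

# Slide motifs written as regular expressions; "x" is read as "any residue".
# Reading "R-[AV]LGDIGN" as R + (A or V) + LGDIGN is an assumption.
MOTIFS = {
    "HxKF (B-type cyclin signature)": r"H.KF",
    "destruction box (B1-type)": r"R[AV]LGDIGN",
    "LxCxE (Rb-binding, D-type)": r"L.C.E",
}


def find_motifs(sequence):
    """Return {motif name: [0-based start positions]} for motifs that occur."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # The lookahead reports overlapping occurrences as well.
        positions = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
        if positions:
            hits[name] = positions
    return hits


# Made-up peptide fragments for illustration only.
toy_proteins = {
    "toy_B_type": "MSRALGDIGNLVAAHSKFQ",
    "toy_D_type": "MELLCCEGTR",
    "toy_no_motif": "MKTAYIAKQR",
}
for name, seq in toy_proteins.items():
    print(name, find_motifs(seq))
```

An empty result for a family member is not necessarily an annotation error: as the slide notes, missing motifs are frequent and may point to functional divergence.
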
Classification using phylogenetic tree construction

[Figure: phylogenetic tree with the following clade annotations]
        • A- and B-type cyclins are mitotic cyclins
        • D-type cyclins are G1-specific
        • H-type cyclins regulate the activity of CDK-activating kinases

         • The complexity of the cyclin gene family appears to be higher in plants than in
         mammals
         • Whether there is functional redundancy within A- and B-type cyclins or different
         regulation (and expression) of some cyclin subclasses remains to be analyzed

Unraveling functional divergence using large-scale expression compendia

[Figure: expression compendium; axes: genes, plant tissues]

Unraveling functional divergence using large-scale expression compendia

[Figure: expression compendium; axes: genes, plant tissues; panels: A-type cyclin, B-type cyclin, D-type cyclin; source: Genevestigator]

II. Orthology & paralogy

        • A major goal of sequence analysis is evolutionary
          reconstruction. It is critical to distinguish between two
          principal types of homologous relationships, which differ
          in their evolutionary history and functional implications.

        • Orthologs are homologous genes that evolved
          through speciation (i.e. evolutionary counterparts derived
          from a single ancestral gene in the last common ancestor
          of the two species being compared).

        • Paralogs are homologous genes that evolved through
          duplication within the same (perhaps ancestral) genome.

        • These definitions were first introduced by Fitch (1970).
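
The definitions above translate directly into a rule on a gene tree whose internal nodes are already labelled with events: two genes are orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. The sketch below assumes a hypothetical nested-tuple tree format, (event, left, right), and a toy tree with made-up gene names.

```python
def leaves(node):
    """Collect the gene names below a node (a bare string is a leaf)."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor.

    Both genes are assumed to be present in the tree.
    """
    event, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {gene_a, gene_b} <= leaves(child):
            return relationship(child, gene_a, gene_b)  # the LCA lies deeper
    return "orthologs" if event == "speciation" else "paralogs"


# Toy tree: a duplication predating the split of species A and B.
toy_tree = ("duplication",
            ("speciation", "a1", "b1"),
            ("speciation", "a2", "b2"))

print(relationship(toy_tree, "a1", "b1"))  # separated by speciation
print(relationship(toy_tree, "a1", "a2"))  # separated by duplication
print(relationship(toy_tree, "a1", "b2"))  # different species, still paralogs
```

The last pair shows why the distinction matters: genes from two different species are not automatically orthologs.
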

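Orthology & paralogy inference

[Figure: organism phylogeny (species tree) of species A, B and C with speciation nodes, next to two gene phylogenies containing a gene duplication: (a) genes a1, a2, b1, b2, c1, labelled "Inparalogs"; (b) genes a1, b1, c1, a2, b2, c2, labelled "Outparalogs"]
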
In- and outparalogy




[Figure source: Sonnhammer & Koonin, "Orthology, paralogy and proposed classification for paralog subtypes"]

Tree reconciliation

        • The automatic detection of speciation and duplication
          events using a species tree and a gene family tree
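
One common way to do this is LCA reconciliation: each gene-tree node is mapped to the smallest species-tree clade that contains all species found below it, and a node is called a duplication when it maps to the same clade as one of its children. The following is a minimal sketch under stated assumptions: trees are nested tuples, species names are unique, the species of a gene is taken from the first letter of its (hypothetical) name, and gene losses are not counted.

```python
def leaves(tree):
    """Leaf labels of a nested-tuple tree (a bare string is a leaf)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def lca(species_tree, species):
    """Smallest subtree of species_tree that contains every given species."""
    if isinstance(species_tree, str):
        return species_tree
    for child in species_tree:
        if species <= leaves(child):
            return lca(child, species)
    return species_tree


def reconcile(gene_tree, species_tree, species_of):
    """Label each internal gene-tree node as a duplication or a speciation."""
    def mapped(node):
        return lca(species_tree, {species_of(gene) for gene in leaves(node)})

    events = []

    def visit(node):
        if isinstance(node, str):
            return
        # A node mapping to the same species-tree clade as one of its
        # children cannot be explained by speciation alone: duplication.
        is_dup = any(mapped(child) == mapped(node) for child in node)
        events.append((sorted(leaves(node)),
                       "duplication" if is_dup else "speciation"))
        for child in node:
            visit(child)

    visit(gene_tree)
    return events


species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), "c1"), (("a2", "b2"), "c2"))
events = reconcile(gene_tree, species_tree,
                   species_of=lambda gene: gene[0].upper())
for genes, kind in events:
    print(kind, genes)
```

In this toy gene tree only the root is inferred as a duplication; the four nodes below it mirror the species tree and are inferred as speciations.
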




III. Types of proteome analysis




The evolution of multi-domain proteins

Interpreting the output of an all-against-all similarity search




     Metrics for sequence similarity:
     • E-value, bit score or percent identity
     • Alignment coverage
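
These metrics can be read from BLAST tabular output, whose 12 standard columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below parses such lines and combines an E-value cutoff with a query-coverage cutoff. The two hit lines, the query length of 250 and both thresholds are hypothetical values chosen for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    percent_identity: float
    query_start: int
    query_end: int
    evalue: float
    bit_score: float


def parse_hit(line):
    """Parse one line with the 12 standard BLAST tabular columns."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
               float(f[10]), float(f[11]))


def query_coverage(hit, query_length):
    """Fraction of the query covered by the aligned region."""
    return (abs(hit.query_end - hit.query_start) + 1) / query_length


def keep(hit, query_length, max_evalue=1e-5, min_coverage=0.5):
    """Keep hits that are both significant and cover enough of the query."""
    return (hit.evalue <= max_evalue
            and query_coverage(hit, query_length) >= min_coverage)


# Two made-up hits for a hypothetical query "p1" of length 250.
lines = [
    "p1\tp2\t62.5\t200\t70\t5\t11\t210\t8\t205\t3e-60\t230",
    "p1\tp3\t90.0\t30\t3\t0\t100\t129\t1\t30\t2e-08\t55.1",
]
for line in lines:
    hit = parse_hit(line)
    print(hit.subject, round(query_coverage(hit, 250), 2), keep(hit, 250))
```

The second hit has a higher percent identity and a significant E-value but covers only a short stretch of the query, which is why coverage is listed as a separate metric.
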
Clustering of similar sequences




             Proteins = vertices (nodes)
        Sequence similarity relationships = edges
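
With proteins as nodes and similarity relationships as edges, the simplest clustering is to take the connected components of the graph (single-linkage clustering). The following is a minimal sketch using a union-find structure; the protein identifiers and edges are made up, and the edges are assumed to have already passed a similarity filter.

```python
def cluster(proteins, edges):
    """Group proteins into connected components of the similarity graph."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the representative, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in edges:
        parent[find(a)] = find(b)  # merge the two components

    families = {}
    for p in proteins:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())


proteins = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(cluster(proteins, edges))
```

Note that p1 and p3 end up in the same family without a direct edge between them. With multi-domain proteins this chaining can merge unrelated families, which is one motivation for more advanced clustering methods.
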
Clustering of similar sequences




Advanced methods for protein (orthology) clustering
        • Sequence similarity-based
            • COG (RBH)         [Tatusov 1997]
            • InParanoid        [Remm et al., 2001]
            • Tribe-MCL         [Van Dongen 2000]
            • OrthoMCL          [Li et al., 2003]

        • Phylogenetic tree-based
            • PhylomeDB         [Huerta-Cepas et al., 2007]
            • Ensembl Compara   [Vilella et al., 2008]
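
The reciprocal best hit (RBH) idea underlying several of the similarity-based methods can be sketched in a few lines: two proteins from different species form a pair when each is the other's highest-scoring hit. The code below is a generic illustration, not the procedure of COG, InParanoid or any other listed tool; the score dictionaries are hypothetical bit scores between proteins of two species A and B.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores, keyed by (query, subject).
a_vs_b = {("a1", "b1"): 300, ("a1", "b2"): 120,
          ("a2", "b2"): 250, ("a3", "b1"): 200}
b_vs_a = {("b1", "a1"): 300, ("b1", "a3"): 200,
          ("b2", "a2"): 250, ("b2", "a1"): 120}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Protein a3 finds b1 as its best hit, but b1 prefers a1, so a3 is left without a partner. Pure RBH yields one-to-one pairs only and therefore misses inparalogs.
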


Overview methodologies

[Figure (Gabaldon, 2008); labels: BBH, Inparanoid, COG, species overlap, reconciliation]

IV. Resources




Resources (bis)

        • Ensembl (Vertebrates)
        • Ensembl Genomes (Metazoa, Protists,
          Fungi, Plants & Bacteria)

        • OrthoMCLDB 5 (150 genomes)
        • YGOB (>15 Fungi)




Hands-on

        • Goal: identify and characterize gene family
          members encoding talin 2 (TLN2)

         1.   Select query gene
         2.   Retrieve homologs/orthologs
         3.   Create multiple sequence alignment
         4.   Identify conserved positions
         5.   Create phylogenetic tree and identify
              orthologous/paralogous genes
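
Step 4 can be sketched as a per-column scan of the alignment. The code below reports columns in which the most frequent residue reaches a chosen fraction of the sequences, ignoring columns dominated by gaps. The three aligned fragments are invented for illustration (they are not talin sequences), and the function name and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return (0-based column index, residue) for conserved columns."""
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            conserved.append((index, residue))
    return conserved


# Made-up aligned fragments of equal length; "-" marks a gap.
alignment = [
    "MKL-CE",
    "MRLACE",
    "MKLGCD",
]
print(conserved_columns(alignment))        # fully conserved columns only
print(conserved_columns(alignment, 0.6))   # also columns with a clear majority
```

Alignment viewers such as Jalview offer more refined conservation scores that take residue properties into account; the strict identity criterion used here is the simplest possible choice.
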




