Clustering Microarray Data

Heather Turner
Department of Statistics
University of Warwick, UK




Heather Turner (University of Warwick)                               1/9
Overview of Microarray Experiment

          Array of p genes (×n)  −→  Scanned image (×n)  −→  n × p matrix

Example: Serum Stimulation of Human Fibroblasts
(Eisen, Spellman, Brown & Botstein, PNAS, 1998)

          9,800 spots representing 8,600 genes
          12 samples taken over a 24-hour period
          Highlighted clusters can be roughly categorised as genes involved in
                  A cholesterol biosynthesis
                  B the cell cycle
                  C the immediate–early response
                  D signaling and angiogenesis
                  E wound healing and tissue remodelling

Why the need for specialised techniques?

          Application
                  Dimensions of the data are nonstandard (small n, large p: few samples, many genes)
          Structure
                  Both gene and sample clusters may be of interest
                  Co-expression may be restricted to a subset of the attributes
                  Genes/samples may belong to more than one group
                  Many “uninteresting” genes
          Nature
                  Clusters of interest may not be characterised by similar
                  expression profiles
                  Samples may be taken over time

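The point that co-expression may be restricted to a subset of the attributes can be illustrated with a small sketch. The two expression profiles below are made-up values for two hypothetical genes, and the choice of the first four samples as the "co-expressed" subset is an assumption for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up expression values for two hypothetical genes across 8 samples.
gene_a = [1, 2, 3, 4, 0, 5, -2, 3]
gene_b = [2, 4, 6, 8, 3, -4, 1, -3]

# Assumed for illustration: the genes are co-expressed in the first 4 samples only.
subset = range(4)
r_all = pearson(gene_a, gene_b)
r_subset = pearson([gene_a[i] for i in subset], [gene_b[i] for i in subset])

print(f"all samples: r = {r_all:.2f}")
print(f"subset only: r = {r_subset:.2f}")
```

With these values the correlation is essentially 1 on the subset and close to 0 over all samples, so a method that compares full profiles would be unlikely to group the two genes.
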
One-way Clustering Techniques

          Increased structural flexibility
                  Overlapping non-exhaustive clusters
                          Gene shaving: Hastie et al, Genome Biol., 2000
                  Context-specific clusters
                          Clustering On Subsets of Attributes (COSA): Friedman
                          and Meulman, JRSS B, 2004

Two-way Clustering Techniques

          Use conventional one-way methods iteratively
                  Sample clusters within gene clusters
                          Inter-related two-way clustering: Tang et al, BIBE 01
                          EMMIX-GENE: McLachlan et al, Bioinformatics, 2002
                  Clusters within two-way clusters
                          Coupled Two-Way Clustering (CTWC): Getz et al, PNAS,
                          2003

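A minimal sketch of the "sample clusters within gene clusters" idea: genes are clustered first, then samples are clustered again using only the genes in each gene cluster. This is a single pass with a plain average-linkage routine, not an implementation of any of the cited methods; the function names `average_linkage` and `two_way`, the cluster counts, and the toy matrix are all assumptions for illustration.

```python
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(points, k):
    """Agglomerative clustering: merge the closest pair of clusters
    (average pairwise distance) until k clusters remain.
    Returns a list of sorted index lists."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [sorted(c) for c in clusters]

def two_way(X, k_genes, k_samples):
    """Cluster genes (columns), then cluster samples (rows) within each gene cluster."""
    genes = list(zip(*X))  # transpose: one profile per gene
    result = {}
    for gene_cluster in average_linkage(genes, k_genes):
        sub = [[row[g] for g in gene_cluster] for row in X]
        result[tuple(gene_cluster)] = average_linkage(sub, k_samples)
    return result

# Toy n x p matrix (4 samples x 4 genes), made up for illustration.
X = [
    [5.0, 5.3, 4.9, 5.1],
    [5.1, 4.9, 0.1, 0.0],
    [0.0, 0.2, 5.0, 5.2],
    [0.1, 0.0, 0.3, 0.1],
]

clusters = two_way(X, k_genes=2, k_samples=2)
for gene_cluster, sample_clusters in clusters.items():
    print("genes", gene_cluster, "-> sample clusters", sample_clusters)
```

In this toy matrix the samples split differently depending on which gene cluster is used, which is the kind of structure a single one-way clustering of the samples would miss.
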
Co-clustering Techniques

          Simultaneously cluster both genes and samples
                  Two-way partition
                          Spectral bi-clustering: Kluger, Genome Res., 2003
                          Co-clustering: Cho, SIAM ICDM 04
                  Conjugate clusters
                          Double Conjugated Clustering (DCC): Busygin et al,
                          SIAM ICDM 02

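To make the notion of a two-way partition concrete, the sketch below takes a row labelling and a column labelling, summarises each resulting block by its mean, and scores the partition by the residual sum of squares around those block means. This is one simple way to compare candidate partitions, assumed here for illustration; it is not presented as the objective of any cited method, and the names `block_means` and `residual_ss` and the toy matrix are hypothetical.

```python
def block_means(X, row_labels, col_labels):
    """Mean of each (row cluster, column cluster) block of the matrix X."""
    sums, counts = {}, {}
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def residual_ss(X, row_labels, col_labels):
    """Sum of squared deviations of each entry from its block mean."""
    means = block_means(X, row_labels, col_labels)
    return sum((value - means[(row_labels[i], col_labels[j])]) ** 2
               for i, row in enumerate(X)
               for j, value in enumerate(row))

# Toy matrix with an exact 2 x 2 block structure, made up for illustration.
X = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [9, 9, 3, 3],
    [9, 9, 3, 3],
]

good = residual_ss(X, row_labels=[0, 0, 1, 1], col_labels=[0, 0, 1, 1])
poor = residual_ss(X, row_labels=[0, 1, 0, 1], col_labels=[0, 0, 1, 1])
print("matching partition:", good)
print("mismatched row partition:", poor)
```

The partition that matches the block structure leaves no residual, while the mismatched row labelling mixes unlike rows within each block and is penalised accordingly.
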
Biclustering Techniques

          Retrieve isolated two-way clusters: biclusters
                  Clusters based on latent model
                          Rich probabilistic models: Segal et al,
                          Bioinformatics, 2001
                  Biclusters
                          SAMBA: Tanay et al, Bioinformatics, 2002
                          Plaid models: Lazzeroni and Owen, Statist. Sinica, 2002

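As an illustration of what an isolated, possibly overlapping bicluster looks like in model form, the plaid model of Lazzeroni and Owen is usually written as a sum of layers. The notation below follows the common presentation of that model and is not taken from the slides: Y_ij is the expression of gene i in sample j, mu_0 is a background level, and rho_ik and kappa_jk are 0/1 indicators of whether gene i and sample j belong to layer (bicluster) k.

```latex
Y_{ij} = \mu_0 + \sum_{k=1}^{K} \left( \mu_k + \alpha_{ik} + \beta_{jk} \right) \rho_{ik} \kappa_{jk},
\qquad \rho_{ik}, \kappa_{jk} \in \{0, 1\}
```

Because the indicators are not constrained to sum to one over k, a gene or sample may belong to several layers or to none, which matches the earlier remarks about overlapping groups and many "uninteresting" genes.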
Current Situation

          Many novel methods, few used in practice
                  Molecular biologists often have limited (access to) statistical
                  expertise
                  Limited number of methods in publicly available software
          Little work on performance evaluation
          Development of methods continues
                  Improved algorithms
                  Time series
                  Three-way data
                  Integration of other sources of data
