Subtypes of Associated Protein-DNA (TF-TFBS) Patterns

    Prepared by: Cyrus Tak-Ming Chan (tmchan@cse.cuhk.edu.hk)

    Tak-Ming Chan, Kwong-Sak Leung, Kin-Hong Lee, Man-Hon Wong, Chi-Kong Lau, Stephen Kwok-Wing Tsui.
    Subtypes of Associated Protein-DNA (Transcription Factor-Transcription Factor Binding Site) Patterns.
    Nucleic Acids Research, 2012, doi: 10.1093/nar/gks749.

    17/Sep/2012 Version 1.2 (Typos corrected on P12)




                                                                                                  1
Introduction

 Proteins bind to DNA fragments to regulate genes
   i.e. Transcription Factors (TFs) bind to Transcription Factor Binding Sites (TFBSs)

 Finding the binding cores (only several residues each) is fundamental


                                                                                 2
Motivations

 Finding patterns/motifs from one side only is challenging
   e.g. TFBS motif discovery: noise, variation through mutations, and unknown
   locations leave only weak signals to be recovered




 [Figure: predicted vs. true TFBS locations]

 Tak-Ming Chan et al., IEEE Transactions on Evolutionary Computation, 2012 /
 BMC Bioinformatics, 2009, 10: 321 / Bioinformatics, 2007, 24(3)
                                                                          3
Introduction

 Finding associated patterns on both sides has been shown to be promising
  when many diverse binding sequences are available (e.g. TRANSFAC)
   Associated TF-TFBS patterns are found from sequences:

   TF sequences: 7664 in TRANSFAC; 408 AAs on average
   TFBS sequences: 26786 bound TFBSs, 1225 matrices in TRANSFAC; 25 bp on average

   Associated pattern discovery yields pairs such as:
     …NRIAA… …TGACA…
     …NRAAA… …TGACA…
     …
     …NREAA… …TGTGA…

 Tak-Ming Chan et al., Discovering approximate-associated sequence patterns
 for protein-DNA interactions. Bioinformatics, 2011, 27(4)
                                                                          4
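 The idea of associated patterns can be illustrated with a minimal sketch. It
 is not the published algorithm: it assumes exact k-mer matching (the cited
 paper handles approximate patterns), and the function names, the toy
 records, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def associated_pairs(records, k_tf=5, k_dna=5, min_support=2):
    """Count (TF k-mer, TFBS k-mer) pairs that co-occur in binding records.

    records: iterable of (tf_sequence, [tfbs_sequence, ...]) tuples.
    A pair is counted at most once per record.
    """
    support = Counter()
    for tf_seq, tfbs_list in records:
        tf_kmers = kmers(tf_seq, k_tf)
        dna_kmers = set()
        for site in tfbs_list:
            dna_kmers |= kmers(site, k_dna)
        for pair in product(tf_kmers, dna_kmers):
            support[pair] += 1
    # Keep only pairs seen in at least min_support records.
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy records made up for illustration; these are not TRANSFAC entries.
records = [
    ("MKNRIAAQK", ["CCTGACAGG"]),
    ("LSNRIAAEE", ["ATTGACATT"]),
    ("QQNREAAKK", ["GGTGTGACC"]),
]
pairs = associated_pairs(records)
```

 With these toy records, only the NRIAA / TGACA pair is shared by two
 records, so it is the only pair that survives the support filter.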
Introduction

 Finding associated patterns on both sides has been shown to be promising
  when many diverse binding sequences are available (e.g. TRANSFAC)
   Associated TF-TFBS patterns found from sequences are verified on 3D
   structures to be binding cores

   Verified on 3D structures (binding cores <3.5Å):
   40222 binding pairs from 1290 PDB protein-DNA complexes

     …NRIAA… …TGACA…
     …NRAAA… …TGACA…
     …
     …NREAA… …TGTGA…

 Tak-Ming Chan et al., Discovering approximate-associated sequence patterns
 for protein-DNA interactions. Bioinformatics, 2011, 27(4)
                                                                          5
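 A distance check of this kind can be sketched as follows. The slide gives
 only the 3.5Å threshold, so the use of the minimum atom-to-atom distance,
 the data layout, and the residue and nucleotide labels are assumptions made
 for illustration.

```python
import math

CUTOFF = 3.5  # angstroms; the threshold quoted on the slide


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two lists of (x, y, z) atoms."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def binding_pairs(residues, nucleotides, cutoff=CUTOFF):
    """Return (residue, nucleotide) label pairs closer than cutoff.

    residues, nucleotides: dict mapping a label to a list of atom coordinates.
    """
    pairs = []
    for r_label, r_atoms in residues.items():
        for n_label, n_atoms in nucleotides.items():
            if min_distance(r_atoms, n_atoms) < cutoff:
                pairs.append((r_label, n_label))
    return pairs


# Hypothetical coordinates, not taken from any PDB entry.
residues = {
    "ARG12": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "ALA13": [(10.0, 10.0, 10.0)],
}
nucleotides = {
    "G5": [(4.0, 0.0, 0.0)],
    "T6": [(20.0, 0.0, 0.0)],
}
contacts = binding_pairs(residues, nucleotides)
```

 Here only ARG12 and G5 come within the cutoff (2.5Å apart), so they form
 the single reported binding pair. `math.dist` requires Python 3.8 or later.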
Introduction—Motivations

 We can go further with these promising associated TF-TFBS patterns
   Discovering and analyzing the binding variations (subtypes)

   Subtypes may
   • Lead to changed binding preferences
   • Distinguish conserved from flexible binding residues
   • Reveal novel binding mechanisms

     …NRIAA… …TGACA…
     …NRAAA… …TGACA…
     …
     …NREAA… …TGTGA…
                                                                          6
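 To make the notion of conserved versus variant positions concrete, here is
 a small sketch that compares the example patterns column by column. The
 helper name `variant_positions` is hypothetical, and the sketch only locates
 differing positions; it does not reproduce the subtype discovery method.

```python
def variant_positions(patterns):
    """Return the 0-based positions at which equal-length patterns differ."""
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("all patterns must have the same length")
    return [i for i in range(length) if len({p[i] for p in patterns}) > 1]


# The example pattern pairs shown on the slide.
subtypes = [("NRIAA", "TGACA"), ("NRAAA", "TGACA"), ("NREAA", "TGTGA")]

tf_variant = variant_positions([tf for tf, _ in subtypes])
dna_variant = variant_positions([dna for _, dna in subtypes])
```

 For these patterns the TF side varies only at its third residue (I/A/E),
 while the TFBS side varies at its third and fourth bases.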
Methods & Materials




                      7
Methods & Materials

 Both the L2 distance and the p-value of a chi-squared test are used to
  shortlist subtypes (3rd: G-C; 4th: G/C-G)
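 The two measures can be sketched for a single TFBS position as below. This
 is an illustration under assumptions, not the authors' implementation: the
 slide does not say how the compared vectors are built or which thresholds
 are applied, so the per-position nucleotide counts, the function names, and
 the example numbers are hypothetical. The chi-squared tail probability is
 computed with the standard library only, for integer degrees of freedom.

```python
import math

ALPHABET = "ACGT"


def frequencies(counts):
    """Normalize a dict of nucleotide counts into a frequency vector."""
    total = sum(counts.get(s, 0) for s in ALPHABET)
    return [counts.get(s, 0) / total for s in ALPHABET]


def l2_distance(counts_a, counts_b):
    """Euclidean (L2) distance between two nucleotide frequency vectors."""
    fa, fb = frequencies(counts_a), frequencies(counts_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable, integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_p_value(counts_a, counts_b):
    """P-value of a chi-squared test on a 2 x k table of nucleotide counts.

    Assumes both groups contain at least one observation.
    """
    symbols = [s for s in ALPHABET
               if counts_a.get(s, 0) + counts_b.get(s, 0) > 0]
    rows = [[c.get(s, 0) for s in symbols] for c in (counts_a, counts_b)]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(col) for col in zip(*rows)]
    n = sum(row_tot)
    df = len(symbols) - 1  # (2 - 1) * (k - 1)
    if df == 0:
        return 1.0
    stat = 0.0
    for i in range(2):
        for j in range(len(symbols)):
            expected = row_tot[i] * col_tot[j] / n
            stat += (rows[i][j] - expected) ** 2 / expected
    return chi2_sf(stat, df)


# Hypothetical counts at one TFBS position for two candidate subtypes.
subtype_a = {"G": 18, "C": 2}
subtype_b = {"G": 3, "C": 17}
distance = l2_distance(subtype_a, subtype_b)
p_value = chi2_p_value(subtype_a, subtype_b)
```

 A large distance together with a small p-value would mark the position as
 a candidate subtype difference; where to set either cutoff is left open
 here because the slide does not state it.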




                                                8
Results

 Sample results from
  http://www.cse.cuhk.edu.hk/~tmchan/subtypes/




                                                  9
Results

 Subtypes with evidence of changed binding preferences
   More than 70% of subtypes (and pairs) reflect changed binding
   preferences according to PDB structure evidence.




                                                           10
Results

 Subtype clusters show that the more conserved (invariant) residues are
  important for protein-DNA interactions, while variant residues show
  specific properties




                                                    11
Results

 Case study shows subtypes that are potentially critical for regulation
  through dimerization and thus TF-TFBS binding

 PKVEIL-CAGCTG
   Myogenic regulatory factor (MRF) family: PDB 1MDY
   PKVEIL appears in TFs of MRF4, Myf-5, Myf-6, MyoD… in TRANSFAC

 PKVVIL-CACGTG
   Myc family (oncogene): PDB 1NKP
   PKVVIL appears in TFs of c/L/v-Myc in TRANSFAC

 • The subtypes are discovered without family information, yet they reflect
   strong familial specificity
 • Wet-lab literature supports that if V is mutated to D (MycV394D), an
   amino acid similar to E, the dimerization of Myc-Max will be abolished
   (Miz1 binding deficient)
                                                                               12
Discussion
 Further applications
   Application to TFBS (motif) matching by adding TF-associated subtype
    information

   Extension of the method to high-throughput sequencing data
    (e.g. ChIP-Seq, Protein Binding Microarrays)

   Integration of other information to enhance TF-TFBS prediction

   Incorporation of 3D homology modeling to better model protein-DNA
    interactions

   Analysis with other data, e.g. allele-specific mRNA data, to reveal
    more detailed regulatory mechanisms
                                                                      13
