Unravelling transcription factor functions
through integrative inference of transcriptional
networks in Arabidopsis
Klaas Vandepoele
Department of Plant Biotechnology and Bioinformatics, UGent
VIB Center for Plant Systems Biology
Utrecht
Artificial intelligence in plant science and breeding
24 February 2021
plaza_genomics
Comparative Network Biology - Vandepoele lab
• Extract biological knowledge from large-scale experimental data sets using data
integration, comparative sequence & expression analysis, and network biology, to improve
our understanding of gene functions and regulation in plants and diatoms.
Plant Gene Regulatory Networks Comparative functional genomics
• PLAZA 4.0: an integrative resource for functional, evolutionary and
comparative plant genomics. Van Bel et al., 2018, Nucleic Acids Res
• Curse: building expression atlases and co-expression networks
from public RNA-Seq data. Vaneechoutte D, Vandepoele K.
Bioinformatics. 2019
• TF2Network: predicting transcription factor regulators and gene
regulatory networks in Arabidopsis using publicly available binding
site information. Kulkarni et al., 2018, Nucleic Acids Res
• Enhanced maps of transcription factor binding sites improve
regulatory networks learned from accessible chromatin data.
Kulkarni et al., Plant Physiol. 2019
Mapping of Gene Regulatory Networks (GRNs)
Mejia-Guerra et al., 2012
Arabidopsis
- 1,700-2,500 Transcription Factors
- 180-791 miRNA
- 2,708 expressed lncRNA
- 49 Mb non-coding DNA
AtRegNet:
17,224 regulatory interactions
Experimental characterization of transcriptional activity and
regulatory control
ENCODE
How to integrate the biological knowledge captured by different -omics layers
to build better networks reporting functional regulatory interactions?
1. TF ChIP-Seq
• In vivo method to measure protein-DNA interactions using chromatin immunoprecipitation
• Different cellular conditions can be profiled
TF ChIP-Seq
Furey et al., 2012
ChIP
[Figure: genome browser tracks with gene annotation, sample reads, control reads, read pileup, peak and motif]
Output of the ChIP-Seq peak calling procedure displayed in a genome browser
Position Weight Matrix (PWM)
TF
target gene
2. In vitro TF binding specificities
Wang et al., 2011
Model TF binding site as Position Weight Matrix (PWM) based on k-mer signals
Protein binding microarray
target gene
Arabidopsis: PWMs for 990 TFs
PWM
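
A PWM turns a binding site model into a per-position score table that can be slid along a promoter. The sketch below is a minimal illustration, not the pipeline used in the talk: it builds a log-odds PWM from a few made-up aligned sites, assuming a uniform background and a pseudocount of 1, and reports the best-scoring window on one strand. The names `build_pwm`, `score_window` and `best_hit`, and all sequences, are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PWM (one dict per position) from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    hits = [(score_window(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical aligned sites and promoter fragment (illustrative only)
sites = ["TTGACC", "TTGACT", "TTGACC", "CTGACC"]
pwm = build_pwm(sites)
score, start = best_hit(pwm, "AAGCTTGACCGGTA")
```

A real scan would also score the reverse strand and apply a score threshold. Without further filtering, such scans tend to report many matches that are not functional.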
3. DNase-seq - Profiling of accessible chromatin
Hesselberth et al., 2009
[Figure labels: TF protein, binding site, DH footprint, DNase I hypersensitive site (DHS)]
DHS+PWM
4. Detection of conserved TF binding sites using phylogenetic footprinting
• Map all known PWMs on the promoters of the Arabidopsis query gene and its orthologs
• Count per PWM position the number of species that support a TF binding site
• Significance estimation (FDR < 10%)
Van de Velde et al., Plant Physiol. 2016 - A Collection of Conserved Non-Coding Sequences to Study Gene Regulation in Flowering Plants.
Conserved PWM
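
The counting step of phylogenetic footprinting can be sketched as follows. This is a simplified illustration under stated assumptions: a PWM match is reduced to an exact consensus match on either strand, support is counted per promoter rather than per aligned position, and the FDR estimation is left out. Species names, sequences and the `min_species` cutoff are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def species_support(motif, promoters_by_species):
    """Species whose promoter contains the motif on either strand."""
    rc = reverse_complement(motif)
    return sorted(species for species, promoter in promoters_by_species.items()
                  if motif in promoter or rc in promoter)

def conserved_sites(motifs, promoters_by_species, min_species=3):
    """Keep motifs supported by at least min_species promoters."""
    result = {}
    for name, motif in motifs.items():
        supporters = species_support(motif, promoters_by_species)
        if len(supporters) >= min_species:
            result[name] = supporters
    return result

# Hypothetical promoters of a query gene and its orthologs
promoters = {
    "Arabidopsis": "AAGCTTGACCGGTA",
    "poplar": "CCGGTCAAGGTT",   # carries the motif on the reverse strand
    "tomato": "TTTTTGACCAAA",
    "rice": "ACGTACGTACGT",
}
motifs = {"W-box": "TTGACC", "G-box": "CACGTG"}
conserved = conserved_sites(motifs, promoters)
```

In this toy example only the W-box-like motif is retained, because it is found in three of the four promoters.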
5-6. Network inference based on expression data
TF
target
Expression-based network inference
GENIE3 - Huynh-Thu et al., 2013
GENIE3
COE TF-target
GENIE3
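
As a rough sketch of an expression-based network such as COE TF-target, TF-target pairs can be ranked by how strongly their expression profiles correlate. The slides do not state which co-expression measure was used, so Pearson correlation is an assumption here, and the profiles and gene names are made up. GENIE3 replaces this pairwise measure with feature importances from tree-based regression.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_tf_targets(expression, tfs):
    """Rank TF-target pairs by absolute correlation, strongest first."""
    edges = []
    for tf in tfs:
        for gene, profile in expression.items():
            if gene != tf:
                edges.append((abs(pearson(expression[tf], profile)), tf, gene))
    return sorted(edges, reverse=True)

# Hypothetical expression profiles across five samples
expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_1": [2.1, 3.9, 6.2, 8.0, 9.8],  # follows TF_A closely
    "gene_2": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related
    "gene_3": [9.0, 7.5, 6.1, 4.0, 2.2],  # anti-correlated with TF_A
}
ranked = rank_tf_targets(expression, tfs=["TF_A"])
```

Using the absolute value keeps candidate repressors (anti-correlated targets) in the ranking, which is one possible design choice.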
7. Co-expression + PWM enrichment
• Integrate co-regulatory gene expression data with TF binding sites (PWMs)
PWM enrichment in kNN co-expression cluster (hypergeometric distribution)
COE+PWM
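
The enrichment test can be illustrated with a small hypergeometric tail calculation: given how many genes in the genome carry a PWM match, how surprising is the number of matches inside one co-expression cluster? This is a sketch with hypothetical counts, not the values used in the talk. The function name `hypergeom_sf` is illustrative.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes without replacement from N genes,
    K of which carry a match for the PWM."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical numbers: 20,000 genes, 2,000 with a PWM match,
# a co-expression cluster of 50 genes containing 15 matches (5 expected).
p_value = hypergeom_sf(k=15, N=20000, K=2000, n=50)
```

In practice one p-value is computed per PWM and cluster, so a multiple-testing correction is needed before calling an enrichment significant.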
Benchmarking of different methods to map gene regulatory
networks
Jan Van de Velde
Gold standard: 5.7k interactions covering 522 TFs (AtRegNet+literature)
Test set: 20% of gold standard (80% used for training later)
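
The 80/20 division of the gold standard can be sketched as a seeded random split. How the interactions were actually partitioned is not stated on the slide, so the random shuffle, the seed and the placeholder edge list below are assumptions for illustration only.

```python
import random

def split_gold_standard(edges, test_fraction=0.2, seed=0):
    """Shuffle the known interactions and hold out a fraction for testing."""
    shuffled = sorted(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder gold standard: 5,700 hypothetical TF-target pairs over 522 TFs
gold = [(f"TF_{i % 522}", f"gene_{i}") for i in range(5700)]
train, test = split_gold_standard(gold)
```

Keeping the test interactions out of training is what allows the held-out 20% to be used for an unbiased performance estimate.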
Supervised learning: a network-based approach for large-scale functional data integration
Marbach et al., Genome Research 2012
Gradient Boosting Machine
• 1000 trees (shrinkage of 0.01, interaction depth 3, 10-fold CV training)
• 80% training data with True:False sampling ratio of 3:1
• 7 input networks
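
To make the integration step concrete, the sketch below turns the seven input networks into one feature row per TF-target pair, labelled with whether the pair is a known interaction. Binary presence/absence features are an assumption (the talk may have used scores), the edges are made up, and the 3:1 sampling of training examples is not reproduced here.

```python
INPUT_NETWORKS = ["ChIP", "PWM", "DHS+PWM", "Conserved PWM",
                  "COE TF-target", "GENIE3", "COE+PWM"]

def feature_vector(edge, networks):
    """1 if the TF-target pair is present in an input network, else 0."""
    return [1 if edge in networks[name] else 0 for name in INPUT_NETWORKS]

def build_dataset(networks, positives, negatives):
    """Return (X, y): one row per TF-target pair, one column per input network."""
    X, y = [], []
    for label, edges in ((1, positives), (0, negatives)):
        for edge in sorted(edges):
            X.append(feature_vector(edge, networks))
            y.append(label)
    return X, y

# Hypothetical edges for three of the seven networks
networks = {name: set() for name in INPUT_NETWORKS}
networks["ChIP"] = {("TF_A", "gene_1")}
networks["PWM"] = {("TF_A", "gene_1"), ("TF_A", "gene_2")}
networks["GENIE3"] = {("TF_A", "gene_1"), ("TF_B", "gene_3")}

positives = {("TF_A", "gene_1")}                       # known interactions
negatives = {("TF_A", "gene_2"), ("TF_B", "gene_3")}   # sampled non-interactions
X, y = build_dataset(networks, positives, negatives)
```

Rows like these could then be passed to a gradient boosting classifier. In scikit-learn, for example, `GradientBoostingClassifier(n_estimators=1000, learning_rate=0.01, max_depth=3)` would be a rough analogue of the settings on the slide, although that mapping is an assumption rather than the configuration used in the talk.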
[Workflow figure. Input features: motifs of TFs, ChIP binding of TFs, TF motif occurrence in open chromatin, TF-gene co-expression. Other labels: TF gene, target gene, targets, 1800 TFs, known interactions, training data, test data, cross-validation, classifier, supervised network.]
Input networks: PWM, DHS+PWM, Conserved PWM, COE TF-target, GENIE3, COE+PWM
Supervised Learning max F1: 1793k interactions – 1766 TFs
Gold standard
Max F1 network - Test set
Recall: 46%
Precision: 71%
F1-measure: 57%
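
The F1-measure is the harmonic mean of precision and recall. As a quick check on the figures above, the sketch below recomputes it from the rounded percentages on the slide. The function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(precision=0.71, recall=0.46)
```

With these rounded inputs the result is about 0.56, close to the 57% on the slide. The small gap may reflect rounding or how the values were derived.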
ChIP
Performance supervised learning network (iGRN)
Supervised Learning max F1
Different support of input networks for iGRN
Supervised Learning max F1
iGRN captures functional TF - target gene interactions
-- : overlap target genes not significant (p-value hypergeometric distribution > 0.05)
34/40 TFs have significant overlap between predicted target genes and DE genes after TF perturbation
iGRN-based functional annotation of TFs
Recovery of experimental Gene Ontology Biological Process annotations for TFs with known function
TF
TF function
Target genes
• Recovery of known experimentally-supported functions for >600 TFs
• Novel functional predictions for 268 unknown TFs
• Highly complementary with AraNet v2
iGRN
[Figure labels: catalase (H2O2, H2O), 3-AT, PSI, MV, photorespiration, CIII, AntA, O2-, H2O2; retrograde signaling, defense response, PCD, oxidative stress signaling]
In house dataset of ROS marker genes
Willems et al., 2016, Plant Physiology
Van Breusegem lab - VIB
Prediction and evaluation of novel oxidative stress TFs (“ROS-TFs”)
Target gene enrichment:
• ROS wheel
• GO-BP ‘response to oxidative stress’
Novel TF functions (e.g. oxidative stress responses)
TF rank | ID | Gene name | q-val | Enrichment | Phenotype
1 | AT5G63790 | ANAC102,NAC102 | 1.84E-35 | 22.74 | Oxidative (our data)
2 | AT3G55980 | ATSZF1,SZF1 | 5.10E-26 | 32.71 | Salt
3 | AT2G37430 | ZAT11 | 2.81E-25 | 14.41 | Oxidative (paraquat, Ni)
4 | AT2G40140 | ATSZF2,CZF1 | 6.41E-25 | 17.45 | Salt
5 | AT5G59820 | AtZAT12,RHL41,ZAT12 | 8.70E-25 | 18.64 | Oxidative and abiotic
6 | AT5G24110 | ATWRKY30,WRKY30 | 1.42E-24 | 6.82 | Oxidative, salt (at early developmental stage)
7 | AT2G38470 | ATWRKY33,WRKY33 | 5.03E-24 | 6.85 | Pathogen, salt, heat stress
8 | AT2G46400 | ATWRKY46,WRKY46 | 9.78E-24 | 5.79 | Osmotic/salt
9 | AT4G17500 | ATERF-1,ERF-1 | 2.57E-23 | 9.42 | Biotic …
10 | AT1G28370 | ATERF11,ERF11 | 3.10E-23 | 6.75 | Osmotic, ET signaling
11 | AT4G18880 | AT-HSFA4A,HSF21 | 5.09E-23 | 6.39 | Oxidative, salt
12 | AT2G23320 | AtWRKY15,WRKY15 | 7.46E-23 | 6.31 | Oxidative, salt
13 | AT5G04340 | C2H2,CZF2,ZAT6 | 7.52E-23 | 14.09 | Cadmium, salt, osmotic stress, P deficiency, pathogen, drought, cold
14 | AT1G80840 | ATWRKY40,WRKY40 | 8.93E-23 | 6.51 | Biotic, MRS
15 | AT3G23250 | ATMYB15,ATY19 | 1.86E-22 | 5.71 | Drought, cold
16 | AT1G27730 | STZ,ZAT10 | 4.90E-22 | 9.99 | Oxidative transcripts, abiotic, ….
17 | AT5G49520 | ATWRKY48,WRKY48 | 5.36E-22 | 11.26 | Biotic
18 | AT5G13080 | ATWRKY75,WRKY75 | 1.40E-21 | 5.81 | Phosphate starvation, biotic
19 | AT4G17230 | SCL13 | 3.76E-21 | 16.60 | Phytochrome dependent light signaling
20 | AT5G59450 | AT5G59450 | 5.14E-21 | 19.95 | Cell division
21 | AT1G42990 | ATBZIP60,BZIP60 | 1.36E-20 | 5.46 | ER, unfolded protein
22 | AT1G18570 | AtMYB51,BW51A | 1.84E-20 | 5.82 | Glucosinolate biosynthesis
23 | AT4G23810 | ATWRKY53,WRKY53 | 6.64E-20 | 7.56 | Leaf senescence and regulation of oxidative stress genes
24 | AT1G66550 | ATWRKY67,WRKY67 | 3.25E-19 | 8.88 | None; no lines available
25 | AT3G23220 | ESE1 | 3.54E-19 | 7.50 | Salt, ET signaling
26 | AT5G47230 | ATERF5,ATERF-5 | 3.60E-19 | 6.00 | Biotic
27 | AT3G54810 | BME3,BME3-ZF,GATA8 | 2.01E-18 | 14.92 | Germination, salt/drought
28 | AT5G47220 | ATERF2,ATERF-2,ERF2 | 4.16E-18 | 5.57 | Biotic
29 | AT2G40740 | ATWRKY55,WRKY55 | 6.69E-18 | 5.70 | None
30 | AT4G36990 | ATHSF4,AT-HSFB1,HSF4 | 1.94E-17 | 5.98 | Similar to heat shock factor, no known phenotype
31 | AT2G30250 | ATWRKY25,WRKY25 | 1.59E-16 | 4.81 | Biotic, salt
32 | AT4G31550 | ATWRKY11,WRKY11 | 2.03E-16 | 5.33 | Biotic
33 | AT5G01380 | GT3a | 2.15E-16 | 4.93 | None
34 | AT5G22570 | ATWRKY38,WRKY38 | 2.19E-16 | 6.01 | Biotic
35 | AT3G23240 | ATERF1,ERF1 | 2.36E-15 | 4.16 | ET
36 | AT4G17490 | ATERF6,ERF6,ERF-6-6 | 3.13E-15 | 5.26 | Oxidative
37 | AT4G22950 | AGL19,GL19 | 3.37E-15 | 15.26 | Flowering
38 | AT3G10500 | ANAC053,NAC053,NTL4 | 3.63E-15 | 5.02 | Oxidative/ROS
39 | AT3G49530 | ANAC062,NAC062,NTL6 | 1.76E-14 | 7.24 | ER, unfolded protein
40 | AT5G62020 | AT-HSFB2A,HSF6,HSFB2A | 5.87E-14 | 5.87 |
41 | AT1G22070 | TGA3 | 6.60E-14 | 5.33 | Biotic
42 | AT1G67970 | AT-HSFA8,HSFA8 | 7.97E-14 | 11.40 | Redox dependent nucleus translocation
…
Legend: Unknown or no stress-related function; Oxidative stress function; Other (a/biotic) stress function
Ranking based on ROS wheel target genes enrichment (n=124 TFs)
Functional validation of the predicted ROS-TFs
Inge De Clercq
13/32 regulators were
validated for a function
in ROS responses by
phenotyping
Rank – TF - perturbation
Phenotypes for predicted ROS-TFs
iGRN identified novel ROS TFs from the GRAS, BES1 and GATA families
Expression patterns for novel ROS-TFs
Responsiveness to a wide range of
oxidative stress conditions?
• 14/17 known ROS TFs
• 6/13 novel ROS TFs
Many novel ROS TFs would not have been
predicted solely relying on differential
expression at the whole plant or organ
level!
Conclusions
Jan Van de Velde, Inge De Clercq
• Different regulatory -omics data types as well as advanced computational integration methods contribute significantly to the improved delineation of high-quality gene regulatory networks
• TF binding site-based as well as expression-based regulatory networks offer a complementary view on functional gene regulatory interactions
• Gene regulatory networks obtained by supervised learning are a starting point for
  - the systematic functional/regulatory annotation of all Arabidopsis genes
  - new biological discoveries
Li Liu, Dries Vaneechoutte
Robin Pottie, Xiaopeng Liu, Frank Van Breusegem
Further reading
• Curse: building expression atlases and co-expression networks from public RNA-Seq data. Vaneechoutte and Vandepoele (2019). Bioinformatics
• TF2Network: predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information. Kulkarni, Vaneechoutte, Van de Velde and Vandepoele (2018). Nucleic Acids Research

Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators


Editor's Notes

  • #2: Good afternoon. Thanks to the organizers for the invitation to present our work today.
  • #3: Through the development and application of various bioinformatics methods, we try to identify new aspects of genome biology, especially in the area of gene function prediction, gene regulation and evolutionary/systems biology.
  • #4: For today, the playground will be Arabidopsis, a plant model with a fairly simple genome, but still containing thousands of regulators. When mapping gene regulatory networks, the goal is to identify which TF binds to the promoter of which target gene, and to determine what is the functional regulatory consequence for growth, development or stress response. Especially the latter criterion is important, as we know that many in vivo TF binding events do not lead to changes in transcriptional activity of the target gene.
  • #5: Thus, if we focus on different molecular profiling methods assessing chromatin state, TF binding or transcriptional activity, a major challenge is … In the next slides, I will very briefly introduce different experimental and computational methods to map GRNs, which will be discussed, compared and integrated using ML. In a second part, I will show how such an integrated network has a good predictive power to study TF functions.
  • #6: In total I will present 7 approaches that serve as input networks. Each has its strengths and weaknesses, for example with respect to completeness and accuracy.
  • #11: Apart from capturing potential TF binding events, which can be considered an input signal to transcriptional control, it is also possible to focus on output signals, for example the spatial-temporal expression of all genes in the genome. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each regression problem, the expression pattern of one gene (the target gene) is predicted from the expression patterns of all the other genes (the input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in predicting the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed.
  • #12: PWM mapping suffers from a false-positive (FP) problem; this is reduced through integration of co-regulatory information.
  • #13: Explain all input networks again + size of the networks. How good do these capture true regulatory events?
  • #15: Given the strengths and weaknesses of these different networks, a network-based approach for large-scale functional data integration was applied. Gradient boosting produces a prediction model for classification using an ensemble of weaker prediction models. The family of boosting methods is based on a constructive strategy of ensemble formation: new models are added to the ensemble sequentially, and at each iteration a new weak base-learner model is trained with respect to the error of the whole ensemble learnt so far. Model = decision trees; posterior probability to be a true interaction: 0.35
  • #16: 22, we can then show results for recent ChIP-Seq data + TF DE gene sets (incl. significance overlap using hypergeometric test).
  • #17: 70% of regulatory interactions are confirmed by PWM and 50% by PWM in DHS (accessible TFBS). 50% of regulatory interactions are confirmed by GENIE3, so expression-based regulatory information is also well integrated in the network!
  • #18: Also very good overlap with recent TF ChIP-Seq datasets.
  • #19: >600 known TF-GO BP annotations recovered; examples cover abiotic stress, development and hormone signalling.
  • #20: Given the good performance to predict known TFs functions, we next wanted to experimentally evaluate novel functional predictions. Excessive production of ROS can cause damage and lead to cell death, but on the other hand, ROS can also serve as signaling molecules in oxidative stress responses. We recently obtained a dataset of robust ROS marker genes, called the ROS wheel, and we were interested in identifying the regulators of these ROS genes.
  • #21: We identified TFs for which the predicted target genes are enriched for ROS wheel markers and oxidative stress functions. We obtained a list of 124 TFs, which we call the ROS-TFs, ranked based on ROS wheel target gene enrichment. In dark blue are 17 TFs that have been shown to have a phenotype under oxidative stress conditions. The 58 TFs in light blue play a role under other stress conditions. The 49 TFs in orange have no known role in oxidative stress, and these are our novel candidate ROS-TFs that we want to functionally validate. Summary of the top 124 TFs: 17 TFs with known oxidative stress function (dark blue; low ranks!), 58 TFs with other stress function, 49 TFs with no known role in ROS/environmental stress.
  • #22: For experimental validation, we selected the top 32 novel ROS-TFs for which at least one Arabidopsis loss- or gain-of-function transgenic line was available. Plant performance under oxidative stress was assessed based on rosette growth of the mutant lines compared to the wild type when grown at low and high MV and 3-AT concentrations, respectively, using automated image analysis. In addition, bleaching, indicating chlorosis symptoms, was visually scored. ROS-causing agents: MV: methyl viologen; 3-AT: 3-amino triazole.
  • #23: For 13 of the 32 phenotyped novel ROS TFs we found at least one mutant allele with significant oxidative stress-induced changes in rosette area relative to the wild type, thus indicating a genotype effect dependent on the oxidative stress treatment and/or genotype-by-treatment effect under at least one of the tested stress treatments. Explain figure. For SCL13, WRKY45, BEH2 and PHL1, experimental evidence is provided by at least two mutant alleles.