A Systematic Approach to the Large-Scale Analysis of Genotype-Phenotype Correlations
Paul Fisher, Dr. Robert Stevens, Prof. Andrew Brass
Genotype: the entire genetic identity of an individual, which is not itself visible as outward characteristics, e.g. genes and mutations.
[Figure: a DNA sequence (ACTGCACTGACTGTACGTATATCT) shown alongside a variant (ACTGCACTG TG TGTACGTATATCT); labels: Mutations, Genes]
Phenotype (harder to characterise): the observable expression of an individual's genes as notable characteristics, e.g. hair or eye colour, body mass, resistance to disease.
[Figure: genotype vs. phenotype; phenotype examples labelled "Brown" and "White and Brown"]
Genotype to Phenotype
Current Methods
[Figure: genotype linked to phenotype through unknown processes (?); ~200 genes]
What processes should we investigate?
Microarray + QTL
[Figure: ~200 genes linking genotype to phenotype through metabolic pathways]
- Genes captured in the microarray experiment and present in the QTL (Quantitative Trait Loci) region
- The phenotypic response is investigated through the genes expressed in the microarray experiment, or through evidence provided by QTL mapping
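A minimal sketch of this selection step, assuming each gene is a simple record with a chromosome and base-pair coordinates: keep the genes that overlap the QTL interval, then intersect them with the genes captured in the microarray experiment. The `Gene` record, gene names, coordinates and QTL interval below are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: identifier plus genomic location."""
    name: str
    chromosome: str
    start: int  # base-pair start coordinate
    end: int    # base-pair end coordinate


def genes_in_qtl(genes, chromosome, qtl_start, qtl_end):
    """Names of genes that overlap the QTL interval on the given chromosome."""
    return {
        g.name
        for g in genes
        if g.chromosome == chromosome and g.start <= qtl_end and g.end >= qtl_start
    }


def qtl_and_expressed(genes, chromosome, qtl_start, qtl_end, expressed_names):
    """Genes that lie in the QTL region AND were captured by the microarray experiment."""
    return genes_in_qtl(genes, chromosome, qtl_start, qtl_end) & set(expressed_names)


if __name__ == "__main__":
    # Hypothetical genome annotation and microarray result, for illustration only.
    genome = [
        Gene("geneA", "1", 33_000_000, 33_010_000),
        Gene("geneB", "1", 34_500_000, 34_520_000),
        Gene("geneC", "2", 10_000_000, 10_005_000),
    ]
    expressed = ["geneB", "geneC"]
    print(sorted(qtl_and_expressed(genome, "1", 30_000_000, 40_000_000, expressed)))
```

Running the script prints `['geneB']`: geneA lies in the QTL but was not captured by the microarray, and geneC was captured but lies on another chromosome.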
[Figure: from genotype to phenotype. Genes A and B lie within a QTL on the chromosome (CHR); Gene C lies outside it. The genes map to Pathways A, B and C, and the literature links pathways to the phenotype.]
- Pathway linked to phenotype: high priority
- Pathway not linked to phenotype: medium priority
- Pathway not linked to QTL: low priority
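The three priority rules in this diagram can be written down explicitly. This is a sketch under stated assumptions: pathways are given as a hypothetical mapping from pathway name to annotated genes, and whether the literature links a pathway to the phenotype is supplied as a set of pathway names (in the slides, that judgement came from reading the literature).

```python
from enum import Enum


class Priority(Enum):
    HIGH = 3
    MEDIUM = 2
    LOW = 1


def prioritise_pathway(pathway_genes, qtl_genes, literature_linked):
    """Rank one pathway using the three rules from the diagram (sketch only).

    pathway_genes     -- genes annotated to the pathway
    qtl_genes         -- genes located inside the QTL region
    literature_linked -- True if the literature links the pathway to the phenotype
    """
    if not set(pathway_genes) & set(qtl_genes):
        return Priority.LOW      # pathway not linked to the QTL
    if literature_linked:
        return Priority.HIGH     # QTL pathway that is also linked to the phenotype
    return Priority.MEDIUM       # QTL pathway with no phenotype link (yet)


def rank_pathways(pathways, qtl_genes, linked_pathways):
    """Return (pathway, priority) pairs, highest priority first."""
    ranked = [
        (name, prioritise_pathway(genes, qtl_genes, name in linked_pathways))
        for name, genes in pathways.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1].value, reverse=True)
```

For example, with pathways A, B and C where only A and B contain QTL genes and the literature links A and C to the phenotype, the ranking is A (high), B (medium), C (low): C stays low because it has no QTL gene, whatever the literature says.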
Issues with current approaches
Huge amounts of data
- QTL region on a chromosome: 200+ genes
- Microarray: 1000+ genes
How do I look at ALL the genes systematically?
Hypothesis-Driven Analyses
Case: African sleeping sickness, a parasitic infection with a known immune response
- Start: 200 QTL genes
- Pick the genes involved in immunological processes: 40 QTL genes
- Pick the genes I am most familiar with: 2 QTL genes
Result: a biased view, whereas African sleeping sickness involves the immune response, cholesterol control and cell death.
Manual methods of data analysis
- Navigating through hyperlinks
- No explicit methods
- Human error
- Tedious and repetitive
Implicit methods
Issues with current approaches
- Scale of the analysis task
- User bias and premature filtering
- Hypothesis-driven approach to data analysis
- Constant flux of data: problems with re-analysis of data
- Implicit methodologies (hyperlinking through web pages)
- Error proliferation from any of the listed issues
Solution: automate through workflows
The Two W’s
Web services
- Technology and standards for exposing code or databases through an interface that a third party can consume remotely
- Describe how to interact with them
Workflows
- A general technique for describing and executing a process
- Describe what you want to do
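As a concrete illustration of consuming a web service remotely, the sketch below asks a pathway resource which pathways a gene belongs to. The slides do not name a specific service; the KEGG REST API and its `link` operation (`https://rest.kegg.jp/link/pathway/<gene id>`, returning tab-separated `gene<TAB>pathway` lines) are assumed here, and the gene identifier mentioned below is hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

KEGG_REST = "https://rest.kegg.jp"  # assumed pathway web service


def kegg_link_url(gene_id):
    """Build the KEGG REST 'link' URL that maps one gene to its pathways."""
    return f"{KEGG_REST}/link/pathway/{quote(gene_id, safe=':')}"


def parse_link_response(text):
    """Parse tab-separated 'gene<TAB>pathway' lines into {gene: [pathways]}."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, _, pathway = line.partition("\t")
        mapping.setdefault(gene, []).append(pathway)
    return mapping


def fetch_pathways(gene_id, timeout=30):
    """Call the web service and return the pathways for one gene (needs network)."""
    with urlopen(kegg_link_url(gene_id), timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return parse_link_response(body).get(gene_id, [])
```

`fetch_pathways("mmu:12345")` (a hypothetical identifier) needs network access, but the parsing step can be checked offline against a pasted response. Keeping URL construction and parsing separate from the network call is what lets a workflow swap or re-run the service step without touching the rest of the analysis.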
Taverna Workflow Workbench
http://taverna.sf.net
Hypothesis
Utilising the capabilities of workflows and the pathway-driven approach, we are able to provide a more:
- systematic
- efficient
- scalable
- unbiased
- unambiguous
approach to the analysis. The benefit will be that new biological results are derived, increasing community knowledge of genotype-phenotype interactions.
[Figure: overview of the pathway-driven workflow]
- QTL mapping study: identify genes in QTL regions (Genomic Resource), then annotate the genes with biological pathways (Pathway Resource)
- Microarray gene expression study: identify differentially expressed genes, then annotate the genes with biological pathways
- Select the biological pathways common to both studies
- Statistical analysis
- Hypothesis generation and verification (wet lab, literature)
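The annotate-and-intersect part of this workflow reduces to set operations once genes have been mapped to pathways. For the statistical-analysis step the slides do not say which test was used; a one-sided hypergeometric (over-representation) test is shown only as one common choice, not necessarily the authors' method. The gene-to-pathway mapping is a hypothetical input, for example one built with a pathway web service.

```python
from math import comb


def annotate(genes, gene_to_pathways):
    """Collect every pathway annotated to any gene in `genes`."""
    pathways = set()
    for gene in genes:
        pathways.update(gene_to_pathways.get(gene, ()))
    return pathways


def common_pathways(qtl_genes, expressed_genes, gene_to_pathways):
    """Pathways hit by both the QTL gene list and the differentially expressed gene list."""
    return annotate(qtl_genes, gene_to_pathways) & annotate(expressed_genes, gene_to_pathways)


def hypergeometric_tail(k, n, K, N):
    """P(X >= k): chance of at least k pathway genes when drawing n genes from N,
    of which K belong to the pathway (one-sided over-representation test)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```

For instance, `hypergeometric_tail(2, 2, 3, 10)` is the chance that both genes in a 2-gene list fall in a 3-gene pathway out of 10 genes: 3/45, about 0.067.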
Replicated the original chain of data analysis
Trypanosomiasis in Africa
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
Andy Brass, Steve Kemp + many others
Preliminary Results: trypanosomiasis resistance
- A strong candidate gene, Daxx, was found that had not been found using manual investigation methods
- The gene was identified from analysis of biological pathway information
- Possible candidate identified by Yan et al. (2004): Daxx SNP information
- Sequencing of the Daxx gene in the wet lab showed mutations that are thought to change the structure of the protein
- The mutation was published in the scientific literature, noting its effect on the binding of the Daxx protein to the p53 protein; p53 plays a direct role in cell death and apoptosis, one of the trypanosomiasis phenotypes
- More genes to follow (hopefully) in publications being written
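Whether a sequenced mutation could change the protein starts with asking whether it changes the encoded amino acid. The sketch below is a generic illustration, not the analysis performed on Daxx: it compares two aligned, in-frame coding sequences using the standard genetic code and labels each changed codon as synonymous or non-synonymous. The toy sequences in the usage note are hypothetical, and insertions or deletions are not handled.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_changes(reference, sample):
    """Compare two aligned, in-frame coding sequences codon by codon.

    Returns (codon_number, ref_codon, sample_codon, ref_aa, sample_aa, kind)
    tuples, where kind is 'synonymous' or 'non-synonymous'.
    """
    if len(reference) != len(sample) or len(reference) % 3:
        raise ValueError("sequences must be aligned, of equal length and in frame")
    changes = []
    for pos in range(0, len(reference), 3):
        ref_codon = reference[pos:pos + 3].upper()
        alt_codon = sample[pos:pos + 3].upper()
        if ref_codon == alt_codon:
            continue  # no change in this codon
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        changes.append((pos // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return changes
```

`coding_changes("ATGGCTTAA", "ATGGATTAA")` returns `[(2, 'GCT', 'GAT', 'A', 'D', 'non-synonymous')]`: codon 2 changes alanine to aspartate.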
Shameless Plug!
Fisher et al. (2007). A Systematic Strategy for Large-Scale Analysis of Genotype-Phenotype Correlations: Identification of candidate genes involved in African Trypanosomiasis. Nucleic Acids Research. doi:10.1093/nar/gkm623
- Explicitly discusses the methods we used for the trypanosomiasis use case
- Discusses the results for Daxx and shows the mutation
- Sharing of workflows for re-use and re-purposing
Recycling, Reuse, Repurposing
Here’s the Science!
- Identified a candidate gene (Daxx) for trypanosomiasis resistance
- Manual analysis of the microarray and QTL data failed to identify this gene as a candidate
- Unbiased analysis, confirmed by the wet lab
Here’s the e-Science!
- The trypanosomiasis mouse workflow was reused without change for Trichuris muris infection in mice
- Identified biological pathways involved in sex dependence; a previous manual two-year study of candidate genes had failed to do this
- Workflows are now being run over colitis/inflammatory bowel disease in mice (without change)
Recycling, Reuse, Repurposing
http://www.myexperiment.org/
Share, search, re-use, re-purpose, execute, communicate, record
What next?
- More use cases? Can be done, but not for my project
- Text mining!
- Aid biologists in identifying novel links between pathways
- Link pathways to phenotype through the literature
[Figure: overview of the pathway-driven workflow, repeated: QTL and microarray genes annotated with biological pathways, common pathways selected, statistical analysis, then hypothesis generation and verification through the wet lab and literature]
[Figure: the pathway prioritisation diagram shown earlier (Genes A, B and C; Pathways A, B and C; literature links from pathways to the phenotype), with the literature-based linking of pathways to the phenotype marked DONE MANUALLY]
It can’t be that hard, right?
- PubMed contains ~17,787,763 citations to date
- Manually searching is tedious and frustrating
- It can be hard to find the links
- Computers can help with data gathering and information extraction: that’s their job!
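Gathering candidate papers is the easy part to automate. The sketch assumes NCBI's E-utilities `esearch` service as the programmatic interface to PubMed (`db`, `term`, `retmax` and `retmode` parameters); the query shape, quoting a pathway term and a phenotype term joined with AND, is an illustrative assumption rather than a method from these slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(pathway_term, phenotype_term):
    """Hypothetical query shape: both quoted terms must appear."""
    return f'"{pathway_term}" AND "{phenotype_term}"'


def esearch_url(term, retmax=20):
    """E-utilities esearch URL that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(payload):
    """Extract (total hit count, list of PubMed IDs) from an esearch JSON response."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def search_pubmed(term, retmax=20, timeout=30):
    """Run the search over the network and return (count, PubMed IDs)."""
    with urlopen(esearch_url(term, retmax), timeout=timeout) as response:
        return parse_esearch(response.read().decode("utf-8"))
```

`search_pubmed(pubmed_query("apoptosis", "trypanosomiasis"))` would return the hit count and the first 20 PubMed IDs; it requires network access, and NCBI's usage limits apply.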
Text Mining
A means of assisting the researcher:
- Saves time and effort
- Narrows searches
- Supports hypothesis generation and verification
- Suggests links
- Limited corpus, but it’s specific
NOT A REPLACEMENT FOR DOMAIN EXPERTISE
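One naive way to produce suggested links is co-occurrence: count the abstracts in which a pathway term and a phenotype term both appear. This sketch ignores synonyms, abbreviations and negation, and the term lists and abstracts are hypothetical inputs; its output is a shortlist for a domain expert to read, in keeping with the point above.

```python
import re
from collections import Counter
from itertools import product


def mentions(text, term):
    """Case-insensitive whole-word match for a (possibly multi-word) term."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


def suggest_links(abstracts, pathway_terms, phenotype_terms, min_count=2):
    """Count abstracts in which a pathway term and a phenotype term co-occur.

    Returns {(pathway_term, phenotype_term): count} for pairs seen in at least
    `min_count` abstracts. These are suggestions to check, not conclusions.
    """
    counts = Counter()
    for text in abstracts:
        found_pathways = [t for t in pathway_terms if mentions(text, t)]
        found_phenotypes = [t for t in phenotype_terms if mentions(text, t)]
        counts.update(product(found_pathways, found_phenotypes))
    return {pair: n for pair, n in counts.items() if n >= min_count}
```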
To Sum Up…
- Genotype-phenotype correlations are needed for disease control
- High-throughput data can provide links between genotype and phenotype
- Highlighted issues with manually conducted in silico experiments
- Improved the methods of current microarray and QTL-based investigations through their systematic nature
- Increased the reproducibility of our methods:
- workflows stored in an XML-based schema
- explicit declaration of services, parameters, and methods of data analysis
- Shown that workflows are capable of deriving new, biologically significant results:
- African trypanosomiasis in the mouse
- infection of mice with Trichuris muris
- The workflows require expansion to accommodate new analysis techniques, such as text mining
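To make "explicit declaration of services, parameters, and methods" concrete, the sketch below reads a small XML workflow description and runs its steps in the declared order, recording provenance as it goes. The XML layout is hypothetical and is not Taverna's file format; service names map to ordinary Python callables standing in for remote web services.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow description (NOT Taverna's real format): each step
# names the service it calls, its input step and its parameters explicitly.
WORKFLOW_XML = """
<workflow name="qtl-microarray-pathways">
  <step id="qtl_genes" service="genomic_resource">
    <param name="chromosome" value="1"/>
    <param name="start" value="30000000"/>
    <param name="end" value="40000000"/>
  </step>
  <step id="annotate" service="pathway_resource" input="qtl_genes"/>
</workflow>
"""


def load_workflow(xml_text):
    """Parse the workflow name and its declared steps, in document order."""
    root = ET.fromstring(xml_text.strip())
    steps = []
    for step in root.findall("step"):
        steps.append({
            "id": step.get("id"),
            "service": step.get("service"),
            "input": step.get("input"),  # None for steps with no upstream input
            "params": {p.get("name"): p.get("value") for p in step.findall("param")},
        })
    return root.get("name"), steps


def run_workflow(xml_text, services):
    """Run each step in declared order; `services` maps service names to callables.

    Each callable receives the upstream step's output (or None) plus the declared
    parameters as keyword arguments. Returns (outputs, provenance).
    """
    _, steps = load_workflow(xml_text)
    outputs, provenance = {}, []
    for step in steps:
        upstream = outputs.get(step["input"]) if step["input"] else None
        outputs[step["id"]] = services[step["service"]](upstream, **step["params"])
        provenance.append((step["id"], step["service"], dict(step["params"])))
    return outputs, provenance
```

Because the steps and parameters live in the XML rather than in a trail of followed hyperlinks, re-running the analysis on new data means changing a parameter value, not repeating the clicks.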
Many thanks to everyone involved, including: Joanne Pennock, EPSRC, OMII, myGrid, and lots more people


