Mouse genomic variation and its effect on
   phenotypes and gene regulation


    Thomas Keane
    Vertebrate Resequencing Informatics

    Wellcome Trust Sanger Institute,
    Hinxton,
    Cambridge,
    UK


NUI Maynooth 20th April, 2012
Mouse genomic variation and its effect on phenotypes
 and gene regulation




                     Mouse Genomes Project


                     RNA-Editing




Sequencing Technologies over past 30 years




                                MR Stratton et al. Nature 458, 719-724 (2009)


Sanger total sequence (2007-2009)
Gbp




Sanger total sequence to-date




                                  HiSeq 2000
Gbp




The Laboratory Mouse




Mouse Genome Project (2002)




International Knockout Mouse Consortium




Large Outbred Crosses

 Take a founder set of inbred strains and cross them randomly

      Heterogeneous stock
      Collaborative Cross


 Large numbers of resulting mice
      Comprehensively phenotyped
      Recurrent phenotypes assessed
      Identify QTL regions
             Knowing the origin of haplotype blocks
                                                       Collaborative Cross Consortium (2009) Genetics




 Full sequence variation of founder mice is required to find potential
 causative mutations


Mouse Genomes Project

 Sequencing 18 laboratory mouse strains
      Largest effort to date to sequence genomes of laboratory mouse strains

 Primary goals
      Deep sequencing of each strain (>25x)
      Comprehensive catalog of sequence variation

 What sort of variation?
      SNPs – single base changes (A->G etc.)
      Indels – insertions or deletions of a few bases
      Structural variation – larger structural differences

 Illumina sequencing platform
      Raw data was generated in 2009
      Approx. 1.2Tbp
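
 As a rough sanity check on the figures above, the total yield can be converted into an average depth per strain. This is only a sketch: the mouse genome size of about 2.7 Gbp is an assumption not stated on the slide, and the yield is assumed to be split evenly across the strains.

```python
def mean_depth(total_bp: float, n_strains: int, genome_bp: float) -> float:
    """Average sequencing depth per strain, assuming the yield is split evenly."""
    return total_bp / n_strains / genome_bp


TOTAL_BP = 1.2e12   # approximate raw yield quoted on the slide (1.2 Tbp)
N_STRAINS = 18      # strains sequenced, from the slide
GENOME_BP = 2.7e9   # ASSUMPTION: approximate mouse genome size

depth = mean_depth(TOTAL_BP, N_STRAINS, GENOME_BP)
print(f"about {depth:.1f}x per strain")
```

 With these approximate inputs the result lands close to the 25x target; actual per-strain depth would depend on how the yield was distributed.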


What does the data look like?

 Whole-genome shotgun (WGS)

 Sequence both ends of
 millions of fragments in parallel
     Fragments 300-500bp in size
     Read 100bp of sequence from
      either end
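
 The paired-end layout described above can be sketched as a small simulation. This is an illustration, not the project's software: the function names, the random reference and the fixed seed are all hypothetical, and only the 300-500bp fragment size and 100bp read length come from the slide.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A, C, G and T."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(reference, n_pairs, read_len=100, frag_range=(300, 500), seed=0):
    """Sample fragments from `reference` and read `read_len` bases from each end.

    Returns (start, fragment_length, read1, read2) tuples. read2 is reported
    on the opposite strand, as it would be when sequencing the far end.
    """
    if len(reference) < frag_range[1]:
        raise ValueError("reference is shorter than the largest fragment")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        frag_len = rng.randint(*frag_range)
        start = rng.randint(0, len(reference) - frag_len)
        fragment = reference[start:start + frag_len]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((start, frag_len, read1, read2))
    return pairs


# Hypothetical 5 kb random reference, purely for demonstration.
_rng = random.Random(42)
reference = "".join(_rng.choice("ACGT") for _ in range(5000))
pairs = paired_end_reads(reference, n_pairs=5)
```

 Because the fragment length is known only approximately, the two reads of a pair constrain each other's position when they are aligned back to the reference.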




                                Reference



Variation Catalog




                                Keane et al (2011) Nature

Variation Catalog




Results of the project

 Genomic variation and its effect on phenotypes




                                          Jonathan Flint & Richard Mott
                                          Keane et al (2011) Nature

Results of the project

 Genomic variation and its effect on phenotypes

 Structural variation catalog




                                Binnaz Yalcin, Kim Wong, Thomas Keane, Jonathan Flint
                                Yalcin et al. (2010) Nature




Results of the project

 Genomic variation and its effect on phenotypes

                                              SVMerge
 Structural variation catalog

 Structural variation methods




                                     Wong, Keane, Stalker, Adams (2010) Gen Biol



Results of the project

 Genomic variation and its effect on phenotypes

 Structural variation catalog

 Structural variation methods

 Novel structural variation types




                                Binnaz Yalcin & Kim Wong
                                Yalcin et al. (2012) Gen Biol
Results of the project

 Genomic variation and its effect on phenotypes

 Structural variation catalog

 Structural variation methods

 Novel structural variation types

 Transposable elements




                                      Nellaker, Keane, Wong et al., under review

Results of the project

 Genomic variation and its effect on phenotypes

 Structural variation catalog

 Structural variation methods

 Novel structural variation types

 Transposable elements

 RNA-Editing…




Mouse genomic variation and its effect on phenotypes
 and gene regulation




                     Mouse Genomes Project

                     RNA-Editing




RNA-Editing

 Site-selective post-transcriptional alteration of double-stranded RNA

 Adenosine deaminase acting on RNA (ADAR) family of enzymes
      Deaminate adenosine residues to inosines
      Inosine is read as guanosine, so edits appear as A-to-G mismatches in cDNA


 ADARs
      Bind to double-stranded regions of RNA
      Modify multiple neighbouring adenosines


 Apobec-1 mediated C-to-U RNA editing

 Novel source of protein isoform diversity
      HTR2C gene: five edit sites lead to 28 mRNAs
                                                      Wulff and Nishikura (2009) WIREs RNA
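
 The way an A-to-I edit shows up in sequencing data can be illustrated with a toy example. The sequence, the site positions and the function names below are hypothetical; the sketch writes the transcript in the DNA alphabet for simplicity and relies only on the fact that inosine is read as guanosine when cDNA is made.

```python
def cdna_after_a_to_i(transcript: str, edited_sites: set) -> str:
    """Return the sequence read from cDNA after A-to-I editing at `edited_sites`.

    Positions are 0-based. Each edited adenosine is reported as G, because
    inosine base-pairs like guanosine during reverse transcription.
    """
    out = []
    for i, base in enumerate(transcript):
        if i in edited_sites:
            if base != "A":
                raise ValueError(f"position {i} is {base}, not an adenosine")
            out.append("G")
        else:
            out.append(base)
    return "".join(out)


def mismatches(genomic: str, cdna: str):
    """List (position, genomic_base, cdna_base) wherever the two sequences differ."""
    return [(i, g, c) for i, (g, c) in enumerate(zip(genomic, cdna)) if g != c]


genomic = "CATAGACA"                      # hypothetical genomic sequence
cdna = cdna_after_a_to_i(genomic, {1, 5})  # edit two neighbouring adenosines
print(cdna)
print(mismatches(genomic, cdna))
```

 Every edited position appears as an A-to-G difference between the genome and the cDNA, which is the signal the rest of the talk searches for.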

HTR2C gene




                                Wahlstedt et al (2009) Gen Res

RNA-Seq

 Isolate RNA and reverse transcribe to cDNA
      Fragment cDNA and directly sequence
      No reference bias and huge dynamic range


 Uses
      Gene expression analysis
      Transcript discovery and annotation of new genomes
      Alternative splicing


 RNA-editing                                                  McIntyre et al (2011) BMC Gen

      Align the RNA-seq reads to the reference genome
      If the bases disagree with the genomic sequence data at the
       corresponding position…..
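
 The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the talk: the input dictionaries, the function name and the `min_alt_reads` threshold are hypothetical, and alignment is assumed to have been done already.

```python
from collections import Counter


def candidate_edit_sites(genomic_calls, rna_pileups, min_alt_reads=2):
    """Flag positions where RNA-seq bases disagree with the genomic base.

    genomic_calls: {position: base} from genomic sequencing of the same sample
    rna_pileups:   {position: string of bases seen in aligned RNA-seq reads}
    Returns {position: (genomic_base, rna_base, fraction_of_reads)}.
    """
    candidates = {}
    for pos, bases in rna_pileups.items():
        dna_base = genomic_calls.get(pos)
        if dna_base is None:
            continue  # without a genomic call an edit cannot be told from a SNP
        alt_counts = {b: n for b, n in Counter(bases).items() if b != dna_base}
        if not alt_counts:
            continue
        alt, n = max(alt_counts.items(), key=lambda kv: kv[1])
        if n >= min_alt_reads:  # HYPOTHETICAL threshold against single-read noise
            candidates[pos] = (dna_base, alt, n / len(bases))
    return candidates


# Toy data, invented for illustration only.
genomic_calls = {100: "A", 101: "C", 102: "A"}
rna_pileups = {100: "AAGGGA", 101: "CCCCCC", 102: "AAAAAG", 103: "TTTT"}
candidates = candidate_edit_sites(genomic_calls, rna_pileups)
print(candidates)
```

 Position 103 is skipped because it has no genomic call, which is why matched genomic sequencing matters for this kind of analysis.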




RNA-Editing?


Figure panels: RNA-seq replicate 1, RNA-seq replicate 2, DNA




Human RNA-Editing




                                Li et al. (2011) Science




Human RNA-Editing




                                Li et al. (2011) Science



RNA-Seq is not the same as genomic sequencing

       Alignment of RNA-Seq reads is not trivial
            Most genomic short read aligners are not splice aware


Figure panels: RNA-seq replicate 1, RNA-seq replicate 2, DNA




RNA-Seq is not the same as genomic sequencing

 What about processed pseudogenes?




                                                  Pink et al (2011) RNA


    Figure labels: cDNA fragment; Functioning gene (Exon 1, Exon 2); Pseudogene (Exon 1, Exon 2)


What about in mouse?

 Mouse Genomes Project
      RNA-Seq of 15 mouse strains
      Whole-brain tissue
      2-4 biological replicates per strain
      ~5Gbp per replicate

 Previous catalogs
      Neeman et al.
      Zaranek et al. - several tens of gigabases of human and mouse cDNA
       sequence
      Rosenberg et al. - RNA-seq for C57BL/6J strain

 Hindered by lack of corresponding genomic sequencing

 We generated
      Deep whole genome sequencing
      Corresponding RNA-Seq from whole-brain tissue across 15 strains
             2-4 biological replicates


Our Pipeline


 Inputs
      gDNA SNVs and cDNA SNVs
      304,817 candidate sites

 Splice-aware realignment
      98,061 unambiguous sites

 Filtering (site count as shown beside each filter)
      Minimum Depth 10x: 31,923 sites
      Replicate Consistency: 62,889 sites
      End Distance Bias: 59,775 sites
      Strand Bias: 42,238 sites
      Variant Distance Bias: 36,213 sites
      5,579 filtered sites, estimated FDR 2.9%
      No assumptions about the nature of editing made

 Cluster extension
      Assumed editing by ADARs, which usually occurs in clusters
      7,133 sites

 One-type mismatch clusters added
      7,389 final sites
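
 The five filters named on this slide could be applied along the following lines. This is a sketch under loud assumptions: only the 10x minimum depth comes from the slide. The `Site` fields, every other threshold, and the simple cut-off form of each bias test are hypothetical stand-ins for whatever tests were actually used.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pos: int
    depth: int                # RNA-seq reads covering the site
    replicates_with_alt: int  # replicates in which the mismatch is seen
    n_replicates: int         # replicates sequenced for the strain
    mean_end_distance: float  # mean distance of the mismatch from the read end
    alt_fwd: int              # mismatch-supporting reads on the forward strand
    alt_rev: int              # mismatch-supporting reads on the reverse strand
    nearest_variant_bp: int   # distance to the closest other variant


def failed_filters(site, min_depth=10, min_end_distance=5.0,
                   min_strand_fraction=0.1, min_variant_distance=5):
    """Return the names of all filters a site fails (empty list = site passes).

    All thresholds except min_depth are HYPOTHETICAL illustration values.
    """
    failed = []
    if site.depth < min_depth:
        failed.append("minimum_depth")
    if site.replicates_with_alt < site.n_replicates:
        failed.append("replicate_consistency")
    if site.mean_end_distance < min_end_distance:
        failed.append("end_distance_bias")
    alt_total = site.alt_fwd + site.alt_rev
    if alt_total == 0 or min(site.alt_fwd, site.alt_rev) / alt_total < min_strand_fraction:
        failed.append("strand_bias")
    if site.nearest_variant_bp < min_variant_distance:
        failed.append("variant_distance_bias")
    return failed


# Two invented sites: one clean, one failing every filter.
good = Site(pos=1, depth=40, replicates_with_alt=3, n_replicates=3,
            mean_end_distance=30.0, alt_fwd=8, alt_rev=7, nearest_variant_bp=200)
bad = Site(pos=2, depth=6, replicates_with_alt=1, n_replicates=3,
           mean_end_distance=2.0, alt_fwd=5, alt_rev=0, nearest_variant_bp=3)
print(failed_filters(good))
print(failed_filters(bad))
```

 Recording every failed filter for a site, rather than stopping at the first failure, makes it possible to report a separate count for each filter.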


Effect of Filtering Strategy




Validation

 Sequenom validation
      Random set of 611 calls drawn from both the filtered set of 5,579 RNA editing
       sites and the raw calls
      19 non-A-to-G editing sites from the raw calls -> all confirmed false positives
      Discrepancy rate of 10.5%
             Enriched at positions where editing level is <20%


 T-to-C editing
      Novel form of RNA-editing?
             Uncertainties in strand assignment of transcripts
             Result of calls made in antisense transcripts, mis-annotations


 Assuming all non A-to-G edits are false
      False-discovery rate of our call set is 2.9%
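
 One simple reading of this estimate is the fraction of called sites that are not A-to-G. The sketch below takes that reading as an assumption; the slide does not give the exact calculation. The call list is invented toy data, and calls are assumed to be expressed relative to the transcript strand.

```python
from collections import Counter


def estimated_fdr(calls):
    """Fraction of calls that are not A-to-G.

    calls: iterable of (genomic_base, rna_base) pairs, one per called site.
    ASSUMPTION: every non-A-to-G call is a false positive and every
    A-to-G call is genuine.
    """
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no calls supplied")
    return (total - counts[("A", "G")]) / total


# Toy call set, NOT the project's data: 97 A-to-G, 2 T-to-C, 1 C-to-T.
toy_calls = [("A", "G")] * 97 + [("T", "C")] * 2 + [("C", "T")]
fdr = estimated_fdr(toy_calls)
print(f"estimated FDR: {fdr:.1%}")
```

 Uncertain strand assignment, as noted for the T-to-C calls, would shift genuine A-to-G edits into the non-A-to-G group under this reading.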


Striking Conservation




Editing Levels




Genomic Context




Protein Coding Edits

 23 previously known non-synonymous coding edits

 Extended this by a further 30 sites
      24 were validated by Sanger sequencing of cDNA



 Cacna1d gene
      Encodes the Cav1.3 voltage-gated calcium channel
             Known to undergo extensive
              alternative splicing
      Two novel non-synonymous
       edits
      Capillary sequencing validation
             Observed 3 different transcripts


Cacna1d




Rare C-to-U Edit: Mfn1




Rare C-to-U Edit: Mfn1




Cds2 - UTR
 Figure: alignment at the Cds2 UTR site with columns D and R; 15 rows read a/g, 3 rows read g/g, and the Rat row reads g/g
 RNA-editing appears to revert genomic sequence back to ancestral state

 Mice homozygous for disruptions in this gene display a lethal phenotype

 Several known across-species examples
        RNA-editing maintaining conservation at the protein level despite genomic sequence divergence


Human Follow-up Studies




                   Li et al. (2011) Science
                   Ramaswami et al. (2012) Nat Meth
                   Bahn et al. (2011) Gen Res
                   Peng et al. (2011) Nat Bio



To do

 First phase of the project was cataloging variation

 Full de novo assemblies of the strains
      Generating higher quality sequencing data for the 18 strains
      Long fragment end sequencing – 3, 6, 10, 40kb fragments


 De novo assembly
      Discover novel haplotypes
      Novel gene structures in the divergent strains


 Mouse pan-genome
      Reference bias
      New mouse reference genome graph
             Including novel non-reference haplotypes shared amongst subsets of the
              strains

Acknowledgements and Questions

 Mouse Genomes Project
      Sanger Institute
              David Adams, Petr Danecek, Kim Wong, Guy Slater, Sendu Bala et al.
      Wellcome Trust Centre for Human Genetics
              Jonathan Flint, Binnaz Yalcin, Richard Mott, Leo Goodstadt et al.
      EBI
              Ewan Birney
      University of Oxford
              Chris Ponting, Chris Nellaker, Andres Heger, Grant Belgard
 RNA-Editing
      Petr Danecek, David Adams, Chris Nellaker

 Email: thomas.keane@sanger.ac.uk


WTSI PhD Programme



