Analysis Tools for RNA-seq and
Isoform Characterization
Slides: t.co/nc7siKRm6W
Gunnar Rätsch
Biomedical Data Science Group
Computational Biology Center
Memorial Sloan Kettering Cancer Center
gxr #RNA #MMR #SplAdder #riboDiff #Cancer
Biomedical Data Sciences Group
Facts
Cost of collecting data drops, amounts increase exponentially.
We have more data than accurate algorithms.
Group’s research
Data Science (Algorithms, Models & Tools): Machine Learning, Bioinformatics.
Biology & Medicine (Problem Setting & Goals): RNA processing regulation, Clinical data analysis.
© Gunnar Rätsch (cBio@MSKCC), Tools for RNA-seq and Isoform Characterization, ABRF Annual Meeting 2015, St. Louis
Learning About the Central Dogma
Goal: Learn to predict what these processes accomplish:
Given the DNA, . . . , predict all gene products
f(DNA, . . .) = RNA        g(RNA, . . .) = protein
Estimating f , g amounts to cracking the codes of
transcription, epigenetics, splicing, . . .
RNA-seq based Transcriptome Characterization
oqtans.org
cloud.oqtans.org
Transcript Quantitation and Dependence on Alignments
[Figure: Pearson correlation with true abundance (y-axis, 0.5 to 0.95) against the maximal number of mismatches (x-axis, 0 to 8), for true read origins and for alignments to the genome; values marked in the plot: 92%, 53%, 84%.]
False alignments, multi-mappers etc. lead to weaker results
Simulated human reads from transcripts of known abundance (Flux Simulator [Sammeth, 2009]), 3% error rate, alignment w/
PALMapper [Jean et al., 2010], quantification w/ rQuant [Bohnert et al., 2009], Pearson correlation over considered transcripts.
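The evaluation metric named on this slide, the Pearson correlation between estimated and true transcript abundance, can be sketched as follows. This is a minimal illustration with made-up abundance values; the function name and the numbers are assumptions and are not taken from rQuant or from the talk.

```python
import math


def pearson_correlation(estimated, true):
    """Pearson correlation between estimated and true transcript abundances."""
    if len(estimated) != len(true) or len(estimated) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_t = sum(true) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, true))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in true)
    if var_e == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_e * var_t)


# Hypothetical abundances for four transcripts (illustrative values only).
true_abundance = [10.0, 20.0, 30.0, 40.0]
estimated_abundance = [12.0, 18.0, 33.0, 41.0]
r = pearson_correlation(estimated_abundance, true_abundance)
```

In a simulation study the true abundances are known by construction, so the same function can be applied to the quantification results obtained from each alignment setting.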
bioRxiv dx.doi.org/10.1101/017103
Efficient BAM file postprocessor for RNA- & DNA-seq
100M alignments in 20 minutes (10 threads)
Suitable for large-scale projects
Improved accuracy for transcript quantification and prediction
Open Source bioweb.me/mmr (C++)
Multiple Mapper Resolution
Principle (Iterated over all reads, N times)
Use the change of local coverage around read mapping ...
... and use its smoothness to identify “better” mapping location
[Figure: coverage and average coverage at two candidate locations for one read pair, with a variance measure computed inside an evaluation window at each location and the two measures added. Assignment 1: read not mapped to location 1 but mapped to location 2. Assignment 2: read mapped to location 1 and not mapped to location 2.]
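The principle on this slide can be sketched in a few lines. This is a simplified illustration and not the MMR implementation: it handles a single read, uses the plain population variance of the coverage in a window around each candidate location as the smoothness measure, and adds the measures over all candidates. The function names, the window size `flank`, and the toy coverage are assumptions. The slide's outer loop (all reads, N times) is left out.

```python
from statistics import pvariance


def add_read(coverage, start, end, delta):
    """Add delta to the coverage of every position in [start, end)."""
    for pos in range(start, end):
        coverage[pos] += delta


def smoothness_score(coverage, candidates, flank):
    """Sum of coverage variances in the windows around all candidate locations."""
    score = 0.0
    for start, end in candidates:
        lo = max(0, start - flank)
        hi = min(len(coverage), end + flank)
        score += pvariance(coverage[lo:hi])
    return score


def choose_location(coverage, candidates, current, flank=10):
    """Move one multi-mapped read to the candidate giving the smoothest coverage.

    coverage already contains the read at candidates[current]; it is
    updated in place and the index of the chosen candidate is returned.
    """
    add_read(coverage, *candidates[current], -1)  # take the read out
    scores = []
    for start, end in candidates:
        add_read(coverage, start, end, +1)  # try this assignment
        scores.append(smoothness_score(coverage, candidates, flank))
        add_read(coverage, start, end, -1)  # undo it
    best = current  # ties keep the current assignment
    for i, score in enumerate(scores):
        if score < scores[best]:
            best = i
    add_read(coverage, *candidates[best], +1)
    return best


# Toy example: location 2 lies in a well-covered region with a one-read dip,
# location 1 lies in an otherwise empty region.
coverage = [0] * 100
add_read(coverage, 40, 90, 5)
add_read(coverage, 60, 70, -1)
add_read(coverage, 10, 20, 1)  # the read currently sits at location 1
candidates = [(10, 20), (60, 70)]
best = choose_location(coverage, candidates, current=0)
```

In the toy example, moving the read to the second location fills the dip and leaves both windows flat, so that assignment has the lower summed variance.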
Multiple Mapper Resolution
Results for simulated DNA-seq
Smooths coverage as expected on an artificial dataset
Simulated reads from tiling a part of A. thaliana genome, alignment w/ PALMapper [Jean et al., 2010] (with -a option), visualization with IGV [Robinson et al., 2011].
Multiple Mapper Resolution
Results for simulated RNA-seq
Improves performance of transcript quantification
[Figure: Pearson correlation (y-axis, 0.6 to 1) against the number of mismatches (x-axis: 0, 1, 2, 4, 6, all) for three inputs: original alignments, best alignment only, MMR treated alignment.]
Simulated reads (75nt) from subset of human annotated transcripts with Flux Simulator [Sammeth, 2009], PALMapper alignments
[Jean et al., 2010], rQuant quantitation [Bohnert et al., 2009], Pearson correlation over all considered transcripts.
LARGE-SCALE BIOLOGY ARTICLE
Nonsense-Mediated Decay of Alternative Precursor mRNA
Splicing Variants Is a Major Determinant of the Arabidopsis
Steady State Transcriptome
Gabriele Drechsel,a,1 André Kahles,b,1 Anil K. Kesarwani,a Eva Stauffer,a,2 Jonas Behr,b Philipp Drewe,b
Gunnar Rätsch,b and Andreas Wachtera,3
a Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany
b Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065
ORCID IDs: 0000-0002-3411-0692 (A.K.); 0000-0001-5486-8532 (G.R.); 0000-0002-3132-5161 (A.W.).
The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNA
such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NM
The Plant Cell, Vol. 25: 3726–3742, October 2013, www.plantcell.org © 2013 American Society of Plant Biologists. All rights reserved.
bioRxiv dx.doi.org/10.1101/017095
Analysis of alternative isoforms with RNA-seq data
Analyses known and identifies novel splicing events
Quantifies & visualizes splicing-related data
Suitable for large-scale projects (1000’s of samples)
Improved accuracy for transcript quantification and prediction
Open Source bioweb.me/spladder (python)
SplAdder Ideas
Major Problems in Transcriptome Analysis
1 Gene annotations are incomplete and often inaccurate
2 Whole transcript isoforms are difficult to predict/quantify
Solution
Augment annotation with RNA-Seq evidence
Use single splicing events instead of full transcripts
Annotation + Alignment Data → Augmented Splicing Graph → Detected Splice Events → Quantified Splice Events → Differential Analysis / sQTL Tests
Kahles et al., bioRxiv, 2015
SplAdder Graph Augmentation
Principle
Collapse annotated transcripts into graph representation
Use RNA-Seq evidence to add new nodes and edges
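[Figure: four annotated transcripts (T1 with exons T1E1 to T1E4, T2 with T2E1 to T2E3, T3 with T3E1 to T3E3, T4 with T4E1 and T4E2) collapsed into one splicing graph with exon nodes E1 to E6.]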
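The two steps of the principle can be sketched as follows: first collapse the annotated transcripts into a graph whose nodes are unique exons and whose edges are annotated introns, then add an edge when split alignments support an intron that joins two existing exons. This is a minimal sketch, not SplAdder's code. The coordinates, the half-open `(start, end)` convention, the function names, and the `min_support` threshold are all assumptions made for the example, and adding new nodes is not covered.

```python
def build_splicing_graph(transcripts):
    """Collapse transcripts (name -> list of (start, end) exons) into a graph.

    Returns the sorted list of unique exons (nodes) and the sorted list of
    edges (i, j) connecting consecutive exons of at least one transcript.
    """
    nodes = sorted({exon for exons in transcripts.values() for exon in exons})
    index = {exon: i for i, exon in enumerate(nodes)}
    edges = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        for upstream, downstream in zip(ordered, ordered[1:]):
            edges.add((index[upstream], index[downstream]))
    return nodes, sorted(edges)


def add_intron_edge(nodes, edges, intron_start, intron_end, support, min_support=2):
    """Add an edge for an intron seen in split alignments.

    The intron must start where an existing exon ends and end where an
    existing exon starts; weakly supported introns are ignored.
    """
    if support < min_support:
        return list(edges)
    donors = [i for i, (_, end) in enumerate(nodes) if end == intron_start]
    acceptors = [i for i, (start, _) in enumerate(nodes) if start == intron_end]
    new_edges = set(edges)
    new_edges.update((d, a) for d in donors for a in acceptors)
    return sorted(new_edges)


# Hypothetical annotation with two transcripts sharing their first two exons.
transcripts = {
    "T1": [(100, 200), (300, 400), (500, 600)],
    "T2": [(100, 200), (300, 400), (700, 800)],
}
nodes, edges = build_splicing_graph(transcripts)

# Five split alignments support an unannotated intron from 200 to 500.
augmented = add_intron_edge(nodes, edges, 200, 500, support=5)
```

Here the added edge connects the first exon directly to the third one, which corresponds to a new exon skip in the augmented graph.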
SplAdder Graph Augmentation
Augmentation cases (supporting RNA-seq evidence in parentheses):
A New cassette exon (coverage, split alignments)
B New retained intron (coverage)
C New intron (split alignments)
D Alternative splice sites on both intron ends (split alignments)
E New start-terminal node / New end-terminal node (split alignments)
F Alternative 3’ splice site / New end-terminal node (split alignments)
G Alternative 5’ splice site / New start-terminal node (split alignments)
H New exon skip (split alignments)
SplAdder Event Extraction
[Figure: the splicing graph with nodes E1 to E6, drawn once per extracted event type.]
Exon Skip
Multiple Exon Skip
Alternative 5’ Splice Site
Intron Retention
Alternative 3’ Splice Site
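As an illustration of event extraction, the simplest of the listed event types, the exon skip, can be found as a pattern in the graph: three nodes where the first connects to the second, the second to the third, and the first also connects directly to the third. The sketch below shows only this one pattern and is not SplAdder's implementation; the edge list and the function name are assumptions.

```python
def find_exon_skips(edges):
    """Return all (upstream, skipped, downstream) node triples.

    A triple is reported when the edges upstream->skipped, skipped->downstream
    and upstream->downstream are all present in the graph.
    """
    edge_set = set(edges)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)
    events = []
    for upstream, middle_nodes in successors.items():
        for skipped in middle_nodes:
            for downstream in successors.get(skipped, ()):
                if (upstream, downstream) in edge_set:
                    events.append((upstream, skipped, downstream))
    return sorted(events)


# Hypothetical graph: node 1 can be included (0->1->2) or skipped (0->2).
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
skips = find_exon_skips(example_edges)
```

The other event types would need their own patterns, for example node coordinates to tell alternative 5’ and 3’ splice sites apart, which this sketch does not attempt.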
SplAdder Event Quantification and Visualization
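Exon Skip (read counts a, b, c as labeled in the figure):
$\mathrm{PSI} = \frac{b + c}{2 \cdot a + b + c}$
Alternative 5’ Site (read counts a, b as labeled in the figure):
$\mathrm{PSI} = \frac{b}{a + b}$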
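The two PSI formulas on this slide translate directly into code. The sketch below assumes that a, b and c are read counts taken from the figure labels, with b (and c) supporting the included isoform; that reading follows from the formulas but is not stated on the slide. The function names and the choice to return `None` when no reads are present are assumptions.

```python
def psi_exon_skip(a, b, c):
    """PSI for an exon skip: (b + c) / (2 * a + b + c)."""
    total = 2 * a + b + c
    if total == 0:
        return None  # no supporting reads, PSI is undefined
    return (b + c) / total


def psi_alt_5prime(a, b):
    """PSI for an alternative 5' splice site: b / (a + b)."""
    total = a + b
    if total == 0:
        return None  # no supporting reads, PSI is undefined
    return b / total


# Hypothetical read counts.
psi_skip = psi_exon_skip(a=10, b=20, c=20)  # 40 / 60
psi_5p = psi_alt_5prime(a=30, b=10)         # 10 / 40
```

The factor 2 in the exon skip formula balances the single count a against the two counts b and c, so that PSI is 0.5 when a, b and c are equal.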
SplAdder Event Quantification and Visualization
Summary
SplAdder effectively augments the annotation
Enables quantitative analysis of events instead of transcripts
Splicing Analysis Across Multiple Cancer Types
Goals
1 Identify cancer-specific splicing patterns
2 Identify variants regulating splicing in same gene (cis)
3 Identify variants regulating splicing in other cancer genes (trans)
TCGA provides RNA-seq and matching exome data
RNA-seq: Find & quantify splicing events
Exome: Identify variants in exons & flanking intronic regions
Splicing Variation Across 4,700 Samples
[Figure: panel A "Event Statistics", panel B "Exon Skip Eve..."]
Analysis of a total of 4,700 RNA-seq samples from TCGA normal (tn), TCGA tumors (tc), ENCODE (ec) and Geuvadis (gv). Alignment w/ STAR [Dobin et al., 2013], analysis w/ SplAdder (SplA) and GENCODE annotation (Anno). Figure from [Kahles, 2014].
Uniform analysis of Large-Scale RNA-seq Data
Large-scale Compute
4,700 RNA-seq libraries (≈100 TB)
⇒ STAR ≈ 6 CPU years
⇒ SplAdder ≈ 0.5 CPU years
[Kahles et al.]
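To put the totals above in per-sample terms, the CPU years can be divided by the number of libraries. This is only back-of-the-envelope arithmetic on the rounded figures from the slide; the conversion constant of 365.25 days per year and the function name are assumptions.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumed conversion from CPU years to CPU hours


def cpu_hours_per_library(cpu_years, n_libraries):
    """Average CPU hours spent per library."""
    return cpu_years * HOURS_PER_YEAR / n_libraries


star_hours = cpu_hours_per_library(6, 4700)        # roughly 11 CPU hours
spladder_hours = cpu_hours_per_library(0.5, 4700)  # a little under 1 CPU hour
```

Under these assumptions alignment dominates the cost, at roughly twelve times the SplAdder analysis per library.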
Unified community resources
Docker with ICGC RNA-seq alignment SOP
bioweb.me/ICGC-RNA-SOP
Synchronize with ENCODE, GTEx, TCGA, . . .
[ICGC PCAWG-3 WG]
bioRxiv dx.doi.org/10.1101/017095
Analysis of Ribosome profiling and RNA-seq data
Study translation efficiency
Adjusts for expression differences
Accurate method based on dispersion estimates and GLMs
Open Source bioweb.me/ribodiff (python)
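The quantity being studied can be illustrated with a simple ratio: translation efficiency as ribosome footprint counts relative to RNA-seq counts for the same gene, compared between two conditions. The sketch below is only this ratio and deliberately not riboDiff's statistical model, which the slide describes as based on dispersion estimates and GLMs. The function names, the pseudocount, and the assumption that counts are already normalized for library size are all illustrative.

```python
import math


def translation_efficiency(ribo_count, rna_count, pseudocount=1.0):
    """Ratio of ribosome footprint counts to RNA-seq counts for one gene."""
    return (ribo_count + pseudocount) / (rna_count + pseudocount)


def log2_te_change(ribo_a, rna_a, ribo_b, rna_b, pseudocount=1.0):
    """log2 fold change of translation efficiency from condition A to B."""
    te_a = translation_efficiency(ribo_a, rna_a, pseudocount)
    te_b = translation_efficiency(ribo_b, rna_b, pseudocount)
    return math.log2(te_b / te_a)


# Hypothetical, library-size-normalized counts for one gene.
# Footprints double while mRNA stays constant: TE doubles.
te_up = log2_te_change(ribo_a=99, rna_a=99, ribo_b=199, rna_b=99)
# Footprints and mRNA both double: TE is unchanged.
te_same = log2_te_change(ribo_a=99, rna_a=99, ribo_b=199, rna_b=199)
```

The second case shows why the RNA-seq counts matter: more footprints alone do not indicate a change in translation efficiency when expression rose by the same factor.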
Application to Ribosome Profiling
Found a motif that was strongly enriched in detected genes:
G-quadruplex structures
(Analysis based on related but different strategy [Wolfe et al., 2014].)
Summary
MMR improves alignment choice for multi-mappers
⇒ Helps improve the accuracy of tools like Cufflinks
SplAdder identifies, quantifies & visualizes alternative splicing
⇒ Finds unannotated alternative splicing, tumor/normal splicing
differences; splicing reprogramming; sQTLs
riboDiff accurately detects differential translation efficiency
⇒ Ribosome footprinting revealed RNA G-quadruplex elements in
5’ UTR that interact with compound via eIF4A
Tools (+ six other ones) are open source and available
⇒ ratschlab.org/tools
⇒ (more) Docker images coming soon . . .
Acknowledgements
Rätsch Laboratory
Andre Kahles
Yi Zhong
Philipp Drewe @ MDC Berlin
Theofanis Karaletsos
Kjong Van Lehmann
Jonas Behr @ ETH Basel
Regina Bohnert @ Molecular Health
Geraldine Jean @ University of Nantes
Cancer Biology
Guido Wendel
Kamini Singh, . . .
Cancer Genomics Projects
Angela Brooks, Broad
Alvis Brazma, EBI
Matt Wilkerson, UNC
Niki Schultz, MSKCC
Chris Sander, MSKCC
Funding from MSKCC, Max Planck Society,
European Union, German Research Foundation,
Geoffrey Beene Foundation & NIH
Thank you!
References I
J. Behr, G. Schweikert, J. Cao, F. De Bona, G. Zeller, S. Laubinger, S. Ossowski, K. Schneeberger, D. Weigel, and G. Rätsch. RNA-seq and tiling arrays for improved gene finding. Oral presentation at the CSHL Genome Informatics Meeting, September 2008. URL http://www.fml.tuebingen.mpg.de/raetsch/lectures/RaetschGenomeInformatics08.pdf.
R. Bohnert, J. Behr, and G. Rätsch. Transcript quantification with RNA-Seq data. BMC Bioinformatics, 10(S13):P5, 2009. URL http://www.biomedcentral.com/1471-2105/10/S13/P5.
RM Clark, G Schweikert, C Toomajian, S Ossowski, G Zeller, P Shinn, N Warthmann, TT Hu, G Fu, DA Hinds, H Chen, KA Frazer, DH Huson, B Schölkopf, M Nordborg, G Rätsch, JR Ecker, and D Weigel. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science, 317(5836):338–342, 2007. ISSN 1095-9203 (Electronic). doi: 10.1126/science.1138632.
Alexander Dobin, Carrie A Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R Gingeras. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), 29(1):15–21, January 2013. ISSN 1367-4811. doi: 10.1093/bioinformatics/bts635. URL http://www.ncbi.nlm.nih.gov/pubmed/23104886.
Mitchell Guttman, Manuel Garber, Joshua Z Levin, Julie Donaghey, James Robinson, Xian Adiconis, Lin Fan, Magdalena J Koziol, Andreas Gnirke, Chad Nusbaum, John L Rinn, Eric S Lander, and Aviv Regev. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol, 28(5):503–10, May 2010. doi: 10.1038/nbt.1633.
References II
G Jean, A Kahles, VT Sreedharan, F De Bona, and G Rätsch. RNA-seq read alignments with PALMapper. Curr Protoc Bioinformatics, Unit 11.6, 2010.
Andre Kahles. Novel Methods for the Computational Analysis of RNA-Seq Data with Applications to Alternative Splicing. PhD thesis, University of Tübingen, Tübingen, Germany, September 2014.
G. Rätsch and S. Sonnenburg. Accurate splice site detection for Caenorhabditis elegans. In K. Tsuda, B. Schölkopf, and J.-P. Vert, editors, Kernel Methods in Computational Biology. MIT Press, 2004.
G. Rätsch, S. Sonnenburg, and B. Schölkopf. RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21(Suppl. 1):i369–i377, June 2005.
James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, and Jill P. Mesirov. Integrative genomics viewer. Nature Biotechnology, 29:24–26, 2011.
M. Sammeth. The Flux Simulator. Website, 2009. http://flux.sammeth.net/simulator.html.
Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann, Anja Bohlen, Nina Krüger, Sören Sonnenburg, and Gunnar Rätsch. mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 2009. URL http://genome.cshlp.org/content/early/2009/06/29/gr.090597.108.full.pdf+html. Advance access June 29, 2009.
References III
S. Sonnenburg, G. Rätsch, A. Jagota, and K.-R. Müller. New methods for splice-site recognition. In Proc. International Conference on Artificial Neural Networks, 2002.
Sören Sonnenburg, Alexander Zien, and Gunnar Rätsch. ARTS: Accurate Recognition of Transcription Starts in Human. Bioinformatics, 22(14):e472–480, 2006.
Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotech, advance online publication, May 2010. doi: 10.1038/nbt.1621. URL http://dx.doi.org/10.1038/nbt.1621.
A Wolfe, K Singh, Y Zhong, P Drewe, others, G Rätsch, and HG Wendel. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature, 2014. doi: 10.1038/nature13485.
G Zeller, RM Clark, K Schneeberger, A Bohlen, D Weigel, and G Rätsch. Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays. Genome Res, 18(6):918–929, 2008. ISSN 1088-9051 (Print). doi: 10.1101/gr.070169.107.
A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller. Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites. Bioinformatics, 16(9):799–807, September 2000.

Talk ABRF 2015 (Gunnar Rätsch)

  • 1. Analysis Tools for RNA-seq and Isoform Characterization. Slides: t.co/nc7siKRm6W. Gunnar Rätsch, Biomedical Data Science Group, Computational Biology Center, Memorial Sloan Kettering Cancer Center. gxr #RNA #MMR #SplAdder #riboDiff #Cancer
  • 2. Biomedical Data Sciences Group. Facts: Cost of collecting data drops, amounts increase exponentially. We have more data than accurate algorithms. Group’s research: Data Science (Algorithms, Models & Tools): Machine Learning, Bioinformatics. Biology & Medicine (Problem Setting & Goals): RNA processing regulation, Clinical data analysis. © Gunnar Rätsch (cBio@MSKCC), Tools for RNA-seq and Isoform Characterization, ABRF Annual Meeting 2015, St. Louis. Memorial Sloan-Kettering Cancer Center.
  • 5. Learning About the Central Dogma. Goal: Learn to predict what these processes accomplish: Given the DNA, . . . , predict all gene products: f(DNA, . . .) = RNA, g(RNA, . . .) = protein. Estimating f, g amounts to cracking the codes of transcription, epigenetics, splicing, . . .
  • 11. RNA-seq based Transcriptome Characterization. oqtans.org, cloud.oqtans.org
  • 12. Transcript Quantitation and Dependence on Alignments. [Figure: Pearson correlation with true abundance (y-axis, 0.5 to 0.95) versus maximal number of mismatches (x-axis, 0 to 8), for true read origins and for alignments to genome; annotated values 92%, 53%, 84%.] False alignments, multi-mappers etc. lead to weaker results. Simulated human reads from transcripts of known abundance (Flux Simulator [Sammeth, 2009]), 3% error rate, alignment w/ PALMapper [Jean et al., 2010], quantification w/ rQuant [Bohnert et al., 2009], Pearson correlation over considered transcripts.
  • 13. MMR (bioRxiv dx.doi.org/10.1101/017103). Efficient BAM file postprocessor for RNA- & DNA-seq. 100M alignments in 20 minutes (10 threads). Suitable for large-scale projects. Improved accuracy for transcript quantification and prediction. Open Source: bioweb.me/mmr (C++)
  • 19. Multiple Mapper Resolution: Principle (iterated over all reads, N times). Use the change of local coverage around the read mapping, and use its smoothness to identify the “better” mapping location. [Figure: coverage at location 1 and location 2 for a read pair, with the evaluation window, the average coverage and the variance measure, shown for the two candidate assignments: read not mapped to location 1 but mapped to location 2; read mapped to location 1 and not mapped to location 2.]
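The principle on this slide can be illustrated with a small sketch. It is not the MMR implementation (which is written in C++). It only assumes that "smoothness" is scored as the variance of per-base coverage inside an evaluation window, and that the assignment with the lower summed variance wins. All function names, the `pad` parameter and the toy coverage vectors are hypothetical.

```python
from statistics import pvariance


def window_variance(coverage, start, end):
    """Population variance of per-base coverage in [start, end)."""
    return pvariance(coverage[start:end])


def add_read(coverage, start, length):
    """Return a copy of the coverage with one read added at [start, start + length)."""
    out = list(coverage)
    for i in range(start, min(start + length, len(out))):
        out[i] += 1
    return out


def choose_location(cov1, pos1, cov2, pos2, read_len, pad=5):
    """Return 1 or 2: the candidate location that yields the smoother coverage.

    cov1/cov2 are coverage vectors around each candidate location WITHOUT the
    read; pos1/pos2 are the read start positions inside those vectors.
    """
    def window(cov, pos):
        return max(0, pos - pad), min(len(cov), pos + read_len + pad)

    w1, w2 = window(cov1, pos1), window(cov2, pos2)
    # Option A: read placed at location 1, absent from location 2.
    loss_a = (window_variance(add_read(cov1, pos1, read_len), *w1)
              + window_variance(cov2, *w2))
    # Option B: read placed at location 2, absent from location 1.
    loss_b = (window_variance(cov1, *w1)
              + window_variance(add_read(cov2, pos2, read_len), *w2))
    return 1 if loss_a <= loss_b else 2


# Toy data: location 1 has a dip that the read fills exactly; location 2 is flat.
dipped = [10] * 5 + [9] * 5 + [10] * 10
flat = [3] * 20
best = choose_location(dipped, 5, flat, 5, read_len=5)
```

In this toy case the read is assigned to location 1, because placing it there removes the dip while placing it at location 2 would create a bump. Iterating this choice over all multi-mapping reads several times corresponds to the "iterated over all reads, N times" note on the slide.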
  • 20. Multiple Mapper Resolution: Results for simulated DNA-seq. Smooths coverage as expected on an artificial dataset. Simulated reads from tiling a part of the A. thaliana genome, alignment w/ PALMapper [Jean et al., 2010] (with -a option), visualization with IGV [Robinson et al., 2011].
  • 22. Multiple Mapper Resolution: Results for simulated RNA-seq. Improves performance of transcript quantification. [Figure: correlation (y-axis, 0.6 to 1) versus mismatches (x-axis: 0, 1, 2, 4, 6, all) for original alignments, best alignment only, and MMR treated alignment.] Simulated reads (75nt) from a subset of human annotated transcripts with Flux Simulator [Sammeth, 2009], PALMapper alignments [Jean et al., 2010], rQuant quantitation [Bohnert et al., 2009], Pearson correlation over all considered transcripts.
  • 23. LARGE-SCALE BIOLOGY ARTICLE: Nonsense-Mediated Decay of Alternative Precursor mRNA Splicing Variants Is a Major Determinant of the Arabidopsis Steady State Transcriptome. Gabriele Drechsel (a), André Kahles (b), Anil K. Kesarwani (a), Eva Stauffer (a), Jonas Behr (b), Philipp Drewe (b), Gunnar Rätsch (b), and Andreas Wachter (a). (a) Center for Plant Molecular Biology, University of Tübingen, 72076 Tuebingen, Germany. (b) Computational Biology Center, Sloan-Kettering Institute, New York, New York 10065. ORCID IDs: 0000-0002-3411-0692 (A.K.); 0000-0001-5486-8532 (G.R.); 0000-0002-3132-5161 (A.W.). The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNA such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NM… The Plant Cell, Vol. 25: 3726–3742, October 2013, www.plantcell.org © 2013 American Society of Plant Biologists. All rights reserved.
  • 24. SplAdder (bioRxiv dx.doi.org/10.1101/017095). Analysis of alternative isoforms with RNA-seq data. Analyses known and identifies novel splicing events. Quantifies & visualizes splicing-related data. Suitable for large-scale projects (1000’s of samples). Improved accuracy for transcript quantification and prediction. Open Source: bioweb.me/spladder (python)
  • 31. SplAdder Ideas. Major problems in transcriptome analysis: (1) Gene annotations are incomplete and often inaccurate. (2) Whole transcript isoforms are difficult to predict/quantify. Solution: Augment annotation with RNA-Seq evidence. Use single splicing events instead of full transcripts. Workflow: Annotation + Alignment Data → Augmented Splicing Graph → Detected Splice Events → Quantified Splice Events → Differential Analysis / sQTL Tests. [Kahles et al., bioRxiv, 2015]
  • 33. SplAdder Graph Augmentation: Principle. Collapse annotated transcripts into graph representation. Use RNA-Seq evidence to add new nodes and edges. [Figure: exons of annotated transcripts T1 (T1E1 to T1E4), T2 (T2E1 to T2E3), T3 (T3E1 to T3E3) and T4 (T4E1, T4E2) collapsed into a splicing graph with nodes E1 to E6.]
  • 35. SplAdder Graph Augmentation: Principle. Collapse annotated transcripts into graph representation. Use RNA-Seq evidence to add new nodes and edges. [Figure: coverage and split alignments supporting a new cassette exon; coverage supporting a new retained intron.]
  • 36. SplAdder Graph Augmentation. Augmentation cases and their supporting evidence: (A) New cassette exon (coverage, split alignments). (B) New retained intron (coverage). (C) New intron (split alignments). (D) Alternative splice sites on both intron ends (split alignments). (E) New start-terminal node / new end-terminal node (split alignments). (F) Alternative 3’ splice site / new end-terminal node (split alignments). (G) Alternative 5’ splice site / new start-terminal node (split alignments). (H) New exon skip (split alignments).
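The graph augmentation idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not SplAdder's code: exons are `(start, end)` tuples on the plus strand, a split alignment is summarized as a `(donor, acceptor)` junction with a read count, and only the "new exon skip" case (H) is handled. The function names and the `min_support` threshold are hypothetical.

```python
def build_splicing_graph(transcripts):
    """Collapse transcripts (name -> list of (start, end) exons) into nodes and edges."""
    nodes = set()
    edges = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        nodes.update(ordered)
        # Consecutive exons of one transcript are connected by an intron edge.
        edges.update(zip(ordered, ordered[1:]))
    return nodes, edges


def add_junction_edges(nodes, edges, junction_counts, min_support=2):
    """Add edges for split-alignment junctions that connect two existing exons."""
    augmented = set(edges)
    for (donor, acceptor), support in junction_counts.items():
        if support < min_support:
            continue  # too few split alignments to trust this junction
        upstream = [n for n in nodes if n[1] == donor]
        downstream = [n for n in nodes if n[0] == acceptor]
        augmented.update((u, d) for u in upstream for d in downstream)
    return augmented


# Two annotated transcripts sharing their first two exons.
transcripts = {
    "T1": [(0, 100), (200, 300), (400, 500)],
    "T2": [(0, 100), (200, 300)],
}
nodes, edges = build_splicing_graph(transcripts)

# Junction read counts: one unannotated skip, one annotated intron, one weak junction.
junction_counts = {(100, 400): 5, (300, 400): 9, (100, 200): 1}
augmented = add_junction_edges(nodes, edges, junction_counts)
new_edges = augmented - edges
```

Here the only edge added is the one joining exon `(0, 100)` directly to exon `(400, 500)`, which skips the middle exon. The other cases on the slide (new cassette exons, retained introns, alternative splice sites) would additionally need coverage evidence and the creation of new nodes.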
  • 39. SplAdder Event Extraction. [Figure: splicing graph with nodes E1 to E6, repeated once per event type with the corresponding subgraph highlighted.] Event types: Exon Skip, Multiple Exon Skip, Alternative 5’ Splice Site, Intron Retention, Alternative 3’ Splice Site.
  • 43. SplAdder Event Quantification and Visualization. Exon skip (read counts a, b, c as labeled in the figure): $\mathrm{PSI} = \frac{b + c}{2 \cdot a + b + c}$. Alternative 5’ site (read counts a, b as labeled in the figure): $\mathrm{PSI} = \frac{b}{a + b}$.
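The two PSI formulas on this slide translate directly into code. The sketch below assumes that `a`, `b` and `c` are junction read counts as labeled in the slide figure. The reading that `a` supports the skipping isoform and `b`, `c` the inclusion isoform is inferred from the form of the formula rather than stated on the slide. The function names and the NaN convention for events without reads are hypothetical.

```python
import math


def psi_exon_skip(a, b, c):
    """PSI = (b + c) / (2 * a + b + c) for an exon skip event."""
    total = 2 * a + b + c
    return (b + c) / total if total else math.nan


def psi_alt_5prime(a, b):
    """PSI = b / (a + b) for an alternative 5' splice site event."""
    total = a + b
    return b / total if total else math.nan


# Example: 10 reads on edge a, 20 reads each on edges b and c.
example_psi = psi_exon_skip(10, 20, 20)  # (20 + 20) / (2 * 10 + 20 + 20)
```

The factor 2 in the exon skip denominator balances the single edge `a` against the two edges `b` and `c`, so that equal per-junction coverage gives a PSI of 0.5.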
  • 45. SplAdder Event Quantification and Visualization. Summary: SplAdder effectively augments the annotation. Enables quantitative analysis of events instead of transcripts.
  • 46. Splicing Analysis Across Multiple Cancer Types. Goals: (1) Identify cancer-specific splicing patterns. (2) Identify variants regulating splicing in same gene (cis). (3) Identify variants regulating splicing in other cancer genes (trans). TCGA provides RNA-seq and matching exome data. RNA-seq: Find & quantify splicing events. Exome: Identify variants in exons & flanking intronic regions.
  • 48. Splicing Variation Across 4,700 Samples. [Figure, panels A and B: event statistics and exon skip events.] Analysis of a total of 4,700 RNA-seq samples from TCGA normal (tn), TCGA tumors (tc), Encode (ec) and Geuvadis (gv). Alignment w/ STAR [Dobin et al., 2013], analysis w/ SplAdder (SplA) and Gencode annotation (Anno). Figure from [Kahles, 2014].
  • 49. Uniform Analysis of Large-Scale RNA-seq Data. Large-scale compute: 4,700 RNA-seq libraries (≈100 TB) ⇒ STAR ≈ 6 CPU years ⇒ SplAdder ≈ 0.5 CPU years [Kahles et al.]. Unified community resources: Docker with ICGC RNA-seq alignment SOP (bioweb.me/ICGC-RNA-SOP). Synchronize with ENCODE, GTEx, TCGA, . . . [ICGC PCAWG-3 WG]
  • 51. riboDiff (bioRxiv dx.doi.org/10.1101/017095). Analysis of ribosome profiling and RNA-seq data. Study translation efficiency. Adjusts for expression differences. Accurate method based on dispersion estimates and GLMs. Open Source: bioweb.me/ribodiff (python)
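To make "adjusts for expression differences" concrete, the sketch below computes a plain translation-efficiency ratio: depth-normalized ribosome footprint counts divided by depth-normalized RNA-seq counts, compared between two conditions on a log2 scale. This is a deliberately simplified stand-in, not the riboDiff method, which the slide describes as based on dispersion estimates and GLMs. The function names, the pseudocount and the example counts are hypothetical.

```python
import math


def translation_efficiency(ribo_count, rna_count, ribo_depth, rna_depth, pseudo=0.5):
    """Depth-normalized ribosome footprint signal divided by mRNA signal."""
    ribo = (ribo_count + pseudo) / ribo_depth
    rna = (rna_count + pseudo) / rna_depth
    return ribo / rna


def log2_te_change(treated, control):
    """log2 ratio of translation efficiency, treated versus control."""
    return math.log2(translation_efficiency(**treated)
                     / translation_efficiency(**control))


# Footprints halve while mRNA doubles: efficiency drops four-fold.
control = dict(ribo_count=100, rna_count=100, ribo_depth=1e6, rna_depth=1e6, pseudo=0)
treated = dict(ribo_count=50, rna_count=200, ribo_depth=1e6, rna_depth=1e6, pseudo=0)
change = log2_te_change(treated, control)

# Footprints and mRNA both double: expression changed, efficiency did not.
scaled = dict(ribo_count=200, rna_count=200, ribo_depth=1e6, rna_depth=1e6, pseudo=0)
no_change = log2_te_change(scaled, control)
```

The second example shows why the mRNA level has to be accounted for: a gene whose footprint count doubles only because its transcript level doubled shows no change in translation efficiency. A ratio like this carries no notion of statistical significance, which is the part a count-based GLM with dispersion estimates supplies.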
  • 54. Application to Ribosome Profiling. Found a motif that was strongly enriched in detected genes: G-quadruplex structures. (Analysis based on a related but different strategy [Wolfe et al., 2014].)
  • 55. Summary. MMR improves alignment choice for multi-mappers ⇒ Helps improve the accuracy of tools like Cufflinks. SplAdder identifies, quantifies & visualizes alternative splicing ⇒ Finds unannotated alternative splicing, tumor/normal splicing differences, splicing reprogramming, sQTLs. riboDiff accurately detects differential translation efficiency ⇒ Ribosome footprinting revealed RNA G-quadruplex elements in the 5’ UTR that interact with a compound via eIF4A. Tools (+ six other ones) are open source and available ⇒ ratschlab.org/tools ⇒ (more) Docker images coming soon . . .
  • 59. Acknowledgements. Rätsch Laboratory: Andre Kahles, Yi Zhong, Philipp Drewe (@ MDC Berlin), Theofanis Karaletsos, Kjong Van Lehmann, Jonas Behr (@ ETH Basel), Regina Bohnert (@ Molecular Health), Geraldine Jean (@ University of Nantes). Cancer Biology: Guido Wendel, Kamini Singh, . . . Cancer Genomics Projects: Angela Brooks (Broad), Alvis Brazma (EBI), Matt Wilkerson (UNC), Niki Schultz (MSKCC), Chris Sander (MSKCC). Funding from MSKCC, Max Planck Society, European Union, German Research Foundation, Geoffrey Beene Foundation & NIH. Thank you!
  • 61. References I
      J. Behr, G. Schweikert, J. Cao, F. De Bona, G. Zeller, S. Laubinger, S. Ossowski, K. Schneeberger, D. Weigel, and G. Rätsch. RNA-seq and tiling arrays for improved gene finding. Oral presentation at the CSHL Genome Informatics Meeting, September 2008. URL http://www.fml.tuebingen.mpg.de/raetsch/lectures/RaetschGenomeInformatics08.pdf.
      R. Bohnert, J. Behr, and G. Rätsch. Transcript quantification with RNA-Seq data. BMC Bioinformatics, 10(S13):P5, 2009. URL http://www.biomedcentral.com/1471-2105/10/S13/P5.
      RM Clark, G Schweikert, C Toomajian, S Ossowski, G Zeller, P Shinn, N Warthmann, TT Hu, G Fu, DA Hinds, H Chen, KA Frazer, DH Huson, B Schölkopf, M Nordborg, G Rätsch, JR Ecker, and D Weigel. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science, 317(5836):338–342, 2007. ISSN 1095-9203 (Electronic). doi: 10.1126/science.1138632.
      Alexander Dobin, Carrie A Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R Gingeras. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England), 29(1):15–21, January 2013. ISSN 1367-4811. doi: 10.1093/bioinformatics/bts635. URL http://www.ncbi.nlm.nih.gov/pubmed/23104886.
      Mitchell Guttman, Manuel Garber, Joshua Z Levin, Julie Donaghey, James Robinson, Xian Adiconis, Lin Fan, Magdalena J Koziol, Andreas Gnirke, Chad Nusbaum, John L Rinn, Eric S Lander, and Aviv Regev. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol, 28(5):503–10, May 2010. doi: 10.1038/nbt.1633.
  • 62. References II
      G Jean, A Kahles, VT Sreedharan, F De Bona, and G Rätsch. RNA-seq read alignments with PALMapper. Curr Protoc Bioinformatics, Unit 11.6, 2010.
      Andre Kahles. Novel Methods for the Computational Analysis of RNA-Seq Data with Applications to Alternative Splicing. PhD thesis, University of Tübingen, Tübingen, Germany, September 2014.
      G. Rätsch and S. Sonnenburg. Accurate splice site detection for Caenorhabditis elegans. In K. Tsuda, B. Schölkopf, and J.-P. Vert, editors, Kernel Methods in Computational Biology. MIT Press, 2004.
      G. Rätsch, S. Sonnenburg, and B. Schölkopf. RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21(Suppl. 1):i369–i377, June 2005.
      James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, and Jill P. Mesirov. Integrative genomics viewer. Nature Biotechnology, 29:24–26, 2011.
      M. Sammeth. The Flux Simulator. Website, 2009. http://flux.sammeth.net/simulator.html.
      Gabriele Schweikert, Alexander Zien, Georg Zeller, Jonas Behr, Christoph Dieterich, Cheng Soon Ong, Petra Philips, Fabio De Bona, Lisa Hartmann, Anja Bohlen, Nina Krüger, Sören Sonnenburg, and Gunnar Rätsch. mGene: Accurate SVM-based gene finding with an application to nematode genomes. Genome Research, 2009. URL http://genome.cshlp.org/content/early/2009/06/29/gr.090597.108.full.pdf+html. Advance access June 29, 2009.
  • 63. References III
      S. Sonnenburg, G. Rätsch, A. Jagota, and K.-R. Müller. New methods for splice-site recognition. In Proc. International Conference on Artificial Neural Networks, 2002.
      Sören Sonnenburg, Alexander Zien, and Gunnar Rätsch. ARTS: Accurate Recognition of Transcription Starts in Human. Bioinformatics, 22(14):e472–480, 2006.
      Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, and Lior Pachter. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotech, advance online publication, May 2010. doi: 10.1038/nbt.1621. URL http://dx.doi.org/10.1038/nbt.1621.
      A Wolfe, K Singh, Y Zhong, P Drewe, others, G Rätsch, and HG Wendel. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature, 2014. doi: 10.1038/nature13485.
      G Zeller, RM Clark, K Schneeberger, A Bohlen, D Weigel, and G Rätsch. Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays. Genome Res, 18(6):918–929, 2008. ISSN 1088-9051 (Print). doi: 10.1101/gr.070169.107.
      A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller. Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites. Bioinformatics, 16(9):799–807, September 2000.