RNA-seq: Generating Views of 
the Transcriptome 
Sean Davis, M.D., Ph.D. 
Genetics Branch, Center for Cancer Research 
National Cancer Institute 
National Institutes of Health
Normal 
Karyotype 
Tumor 
Karyotype
The Central Dogma
Patient and 
Population 
phenotype 
Characteristics 
Gene Copy 
Number 
Sequence 
Variation 
Chromatin 
Structure and 
Function 
Gene 
Expression 
Transcriptional 
Regulation 
DNA 
Methylation
RNA-seq Data Analysis Overview
Your Nature Paper
Overview 
• Quality Control 
• Alignment and Assembly 
• Transcript Quantification 
• Visualization 
• Differential Expression 
• Experimental Design 
RNA-seq protocol schematic
Approaches to RNA-seq 
Nature Biotech (2010) 28, 421-423
Quality Control 
• Specialized RNA-seq quality control software 
• Samples should be “similar” 
• No “absolute” cutoffs for good vs. bad samples 
• Visualize data in as many ways as necessary 
(browser, plots, sample similarity plots like 
MDS, etc.) 
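One way to check that samples are "similar" is to compare them pairwise. The sketch below is a minimal illustration, not a replacement for dedicated QC software: it assumes raw per-gene counts are already available, log-transforms them with a pseudocount of 1, and reports the Pearson correlation for every pair of samples. The sample names and counts are made up for illustration.

```python
import math


def log2_counts(counts):
    """Log-transform raw counts with a pseudocount of 1 to tame the dynamic range."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def sample_correlations(samples):
    """Return {(name_a, name_b): r} for every pair of samples."""
    logged = {name: log2_counts(c) for name, c in samples.items()}
    names = sorted(logged)
    return {
        (a, b): pearson(logged[a], logged[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical counts for five genes in three samples (made-up numbers).
samples = {
    "ctrl_1": [10, 200, 35, 0, 820],
    "ctrl_2": [12, 180, 40, 1, 790],
    "odd_1": [300, 5, 2, 90, 15],
}
correlations = sample_correlations(samples)
for pair, r in sorted(correlations.items()):
    print(pair, round(r, 3))
```

A sample that correlates poorly with all others (here "odd_1") is a candidate for closer inspection, but, as the slide says, there is no absolute cutoff.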
Alignment
RNA-seq Alignment
From: https://research.fhcrc.org/mcintosh/en/tools.html
Transcript Quantification
Models for RNA-seq 
• Count-based models 
• Multi-reads (isoform resolution) 
• Paired-end reads (include length resolution 
step) 
• Positional bias along transcript length 
• Sequence bias
Read Counting
L. Pachter (2011) arXiv:1104.3889v
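The simplest count-based model assigns each aligned read to the one gene it overlaps. The sketch below is an illustrative simplification under stated assumptions: reads and genes are half-open intervals `(chrom, start, end)`, strand and splicing are ignored, and a read overlapping more than one gene is set aside as ambiguous rather than assigned. The function names, the `__ambiguous` and `__no_feature` labels, and all coordinates are hypothetical.

```python
from collections import Counter


def overlaps(read, gene):
    """True if two half-open (chrom, start, end) intervals overlap."""
    return read[0] == gene[0] and read[1] < gene[2] and gene[1] < read[2]


def count_reads(reads, genes):
    """Count reads per gene; reads hitting zero or several genes are tallied separately."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = [name for name, gene in genes.items() if overlaps(read, gene)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical annotation: two genes that overlap between positions 450 and 500.
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 450, 900),
}
# Hypothetical aligned reads.
reads = [
    ("chr1", 120, 170),  # geneA only
    ("chr1", 300, 350),  # geneA only
    ("chr1", 460, 510),  # overlaps both genes
    ("chr1", 700, 750),  # geneB only
    ("chr2", 120, 170),  # no annotated gene
]
read_counts = count_reads(reads, genes)
print(dict(read_counts))
```

Discarding ambiguous reads keeps the gene-level counts clean but loses information; the isoform-resolution models listed above instead try to apportion such reads among candidate transcripts.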
An Example of Sequencing Bias 
Hansen (2010), NAR
Sample-specific Sequence Bias
Transcript Quantification Models
Result of Quantification
Clustering and Visualization
Distance Metrics 
• Euclidean distance 
• Manhattan distance 
• Minkowski distance (generalized distance)
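The three metrics are related: the Minkowski distance of order p is the p-th root of the sum of the absolute coordinate differences each raised to the power p, and it reduces to the Manhattan distance at p = 1 and the Euclidean distance at p = 2. A minimal sketch, with made-up two-dimensional points standing in for expression profiles:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def manhattan(x, y):
    """Manhattan distance: Minkowski distance with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean distance: Minkowski distance with p = 2."""
    return minkowski(x, y, 2)


# Made-up example points.
x = [0.0, 0.0]
y = [3.0, 4.0]
print(manhattan(x, y), euclidean(x, y))
```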
Hierarchical Clustering 
Gene 1 
Gene 2 
Gene 3 
Gene 4 
Gene 5 
Gene 6 
Gene 7 
Gene 8
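Agglomerative hierarchical clustering starts with every gene in its own cluster and repeatedly merges the two closest clusters until one remains. The sketch below is a deliberately naive illustration of that loop. It assumes Euclidean distance and average linkage, which are choices made here for the example and not taken from the slides, and the four expression profiles are invented.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(
        euclidean(profiles[a], profiles[b]) for a in cluster_a for b in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clustering(profiles):
    """Return the merges, in order, as (cluster_a, cluster_b, distance) tuples."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        # Exhaustively search for the closest pair of current clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Invented expression profiles (three samples per gene).
profiles = {
    "Gene 1": [1.0, 1.1, 5.0],
    "Gene 2": [1.1, 1.0, 5.2],
    "Gene 3": [6.0, 6.2, 0.5],
    "Gene 4": [6.3, 6.0, 0.4],
}
merges = hierarchical_clustering(profiles)
for left, right, distance in merges:
    print(left, "+", right, round(distance, 3))
```

The merge distances are what a dendrogram draws as branch heights. Both the distance metric and the linkage rule can change the resulting tree, so they are worth reporting alongside any clustering figure.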
Differential Expression
MA Plot
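An MA plot shows, for each gene, the log ratio between two conditions (M) against the average log intensity (A). The sketch below computes those coordinates from two vectors of counts. It assumes a pseudocount of 1 to avoid taking the log of zero; the function name, argument names, and counts are illustrative only.

```python
import math


def ma_values(treated, control, pseudocount=1.0):
    """Return a list of (A, M) pairs, one per gene.

    M = log2(treated) - log2(control)         (log fold change)
    A = (log2(treated) + log2(control)) / 2   (mean log expression)
    """
    points = []
    for t, c in zip(treated, control):
        log_t = math.log2(t + pseudocount)
        log_c = math.log2(c + pseudocount)
        points.append((0.5 * (log_t + log_c), log_t - log_c))
    return points


# Made-up counts for three genes.
treated = [3, 15, 7]
control = [3, 3, 31]
points = ma_values(treated, control)
for a, m in points:
    print(f"A={a:.2f}  M={m:+.2f}")
```

Genes with no change sit on the line M = 0. The spread of M typically widens at low A, which is one reason fold change alone is a poor criterion for differential expression.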
DE Software Runtime
RNA-seq workflow as 
proposed by Anders et al. 
in Nature Protocols
Fusion Gene Detection
Fusion gene schematic
Fusion Detection
Other Applications 
• Alternative splicing 
• Isoform utilization 
• Functional annotation of genomic regions 
• Allele-specific expression 
• eQTL analysis 
• Classification problems (e.g., cancer with 
unknown primary) 
• … 
Experimental Design 
• What are my goals? 
– Differential expression? 
– Transcriptome assembly? 
– Identify rare, novel transcripts? 
• System characteristics? 
– Large, expanded genome? 
– Intron/exon structures complex? 
– No reference genome or transcriptome?
Experimental Design 
• Technical replicates 
– Probably not needed due to low technical variation 
• Biological replicates 
– Not explicitly needed for transcript assembly 
– Essential for differential expression analysis 
– Number of replicates often driven by sample 
availability for human studies 
– More is almost always better
Take Home Messages 
• Defining the experimental question(s) is 
critical 
• No gold-standard analysis workflows exist yet 
• Be aware that experimental biases are present in 
nearly all -omics datasets 
• Biological replicates are almost always 
beneficial (necessary) 
• RPKM/FPKM are for human consumption, not 
computation (generally) 
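For reference, RPKM (reads per kilobase of transcript per million mapped reads) divides each gene's read count by its length in kilobases and by the sequencing depth in millions of reads. The sketch below is a minimal illustration that assumes the total mapped reads equals the sum of the counts supplied; the counts and gene lengths are made up.

```python
def rpkm(counts, lengths_bp):
    """RPKM per gene: count * 1e9 / (gene length in bp * total mapped reads)."""
    total_reads = sum(counts)  # assumption: all mapped reads are in `counts`
    return [
        count * 1e9 / (length * total_reads)
        for count, length in zip(counts, lengths_bp)
    ]


# Made-up counts and gene lengths for three genes in one sample.
counts = [100, 300, 600]
lengths_bp = [1000, 2000, 3000]
values = rpkm(counts, lengths_bp)
print(values)
```

Because the result is a ratio, it no longer records how many reads supported each value, and that information is what count-based statistical models rely on. This is one reason to keep the raw counts for computation and use RPKM/FPKM for display.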
Links of Interest 
• http://bioconductor.org 
• http://biostars.org 
• http://www.rna-seqblog.com/ 
• https://genome.ucsc.edu/ENCODE/ 
• http://www.ncbi.nlm.nih.gov/gds/

Editor's Notes

  • #2: I am going to spend a few minutes illustrating how existing and emerging high-throughput genomic technologies are being used to understand cancer, a mind-numbingly complex and dysregulated biologic process.
  • #3: The first karyotypes were produced in 1956. Shown here is a comparison of a karyotype from a normal female with one from a tumor. By 1960, a karyotype of a cancer genome had revealed the presence of the Philadelphia chromosome. Although it is now known to produce the BCR-ABL fusion protein, it was not until 2001, 41 years later, that a drug targeting the fusion product, Gleevec (imatinib), became available. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will provide a deeper understanding of the biology of cancer, to develop prognostic and diagnostic markers to improve patient-specific treatments, and to find promising targets for directed drug therapy.
  • #5: Since Knudson’s famous hypothesis proposing the two-hit model, our understanding of cancer as a genetic disease has progressed to the realization that cancer is rarely a function of a single gene gone awry, but probably represents a complex interaction of multiple processes in the genome, including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. To produce better patient outcomes, it is vital to understand not only which genes are involved in a certain type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary and is now becoming possible.