Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic Data Analysis
Special thank you to: Dr. Javeed Iqbal, UNMC
Speakers:
Dr. Vladimir Galatenko, Chief Scientist at the Tauber Bioinformatics Research Center. His work focuses on issues related to Big Data analysis and, in particular, on the integration of multi-omics datasets. His special research interest is feature selection, which is vital for the efficient development of clinical test systems.
Julia Panov, a Ph.D. student involved in a number of neuroscience research projects and an experienced bioinformatics user. She relies on the T-BioInfo platform for regular processing and integration of omics data, and collaborates with the TBRC research group on platform development.
Biological Examples and Reference Data sets:
• “Modeling precision treatment of breast cancer”, Daemen et al.
(https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110),
• “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both
tumor and stromal specific biomarkers”, Bradford et al.
(http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=8014&path[]=23533), and
• Processed data from The Cancer Genome Atlas samples (https://cancergenome.nih.gov/).
Processing of NGS data:
1. Next Generation Sequencing data pre-processing:
• Trimming technical sequences
• Removing PCR duplicates
2. RNA-seq based quantification of expression levels:
• Conventional pipelines (looking at known transcripts)
• Identification of novel isoforms
Analysis of Expression Data Using Machine Learning:
3. Unsupervised analysis of expression data:
• Principal Component Analysis
• Clustering
4. Supervised analysis:
• Differential expression analysis
• Classification, gene signature construction
Gene set enrichment analysis
Part 1:
Biological Significance
Cell Line Data Types: Gene Expression and RNA-editing
[Figure panels: RNA-Editing; Gene expression]
Breast cancer and cell line models
Cell Lines as Cancer Models
Gene Expression Table
          Sample 1   Sample 2   Sample 3   Sample 4
gene 1           4          3          3          7
gene 2           6          5          5          8
gene 3           6          6          6          6
gene 4           1          2          1          2
gene 5           9         10          1          5
gene 6          12          4          0          5
gene 7           1          7          9          8
gene 8           4          8          3         10
RNA-editing Table
Chr     Pos start   End         Sample 1   Sample 2   Sample 3   Sample 4
chr1    1312400     1312400     0          0          0          0
chr1    8362100     8362100     0          0          0          0
chr11   842700      842700      0.705023   0          0          1.17938
chr12   753200      753200      0          0          0          0
chr16   521100      521100      0          0          0          0
chr16   1362700     1362700     0          0          0          0
chr16   1446900     1446900     0          0          8.55549    0
chr16   2176500     2176500     0          0          0          0
chr16   2896600     2896600     0          0          0          0
chr16   29972700    29972700    0          0          0          0
chr16   30358600    30358600    0          0          0          0
chr16   30778800    30778800    0          0          0          0
chr17   2042900     2042900     0          0          15.332     0
chr17   4538300     4538300     0          0          0          0
chr17   4891100     4891100     0          0          0          0
chr17   4946300     4946300     0          38.4794    0          0
chr17   5033200     5033200     0          0          0          0
[Figure: data for 49 cell lines. A gene expression table (genes by samples, expression values; Link1) and an RNA-editing table (by samples, abundance values; Link2) are each turned into a matrix of distances between samples: one based on gene expression, the other based on abundance of RNA editing. The HCC202 sample is highlighted.]
Both the gene expression table and the RNA-editing abundance table separate the HCC202 sample in a similar way.
Genes and RNA editing
Genes: RNA Editing:
Olfactory Receptors
miRNA
Rab GTPases
Enhanced Trafficking
Learn more at: T-bio.info
Part 2:
Working with RNA-Seq Data
RNA-seq: overview
[Figure sequence, built up slide by slide: a genome sequence (…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA…); Gene A, Gene B and Gene C marked on the genome; transcripts of Gene A (several copies) and Gene C; reads; reads labeled by their transcript of origin (Transcr. A, Transcr. C).]
RNA-seq: some details
[Figure sequence: the transcripts of Gene A and Gene C pass through the following steps, one per slide.]
• Shattering
• Adapter ligation
• PCR amplification
• “Reading”
RNA-seq: per-sample processing
Preprocessing:
• Adapter removal plus additional trimming
• Removing PCR duplicates
Mapping:
• Mapping on the set of known transcripts
• Mapping on the genome (and potential identification of novel transcripts)
• Combined strategy
Quantification of expression levels
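The two preprocessing steps listed above can be illustrated on toy data. This is only a sketch under stated assumptions: the adapter string is a placeholder, reads are plain Python strings rather than FASTQ records, and duplicates are detected by exact sequence identity, which is a simplification and not how any particular tool on the slides works. The function names are hypothetical.

```python
# Toy illustration of adapter trimming and duplicate removal.
# ADAPTER is a placeholder sequence chosen for this example (assumption).
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Cut a read at a full adapter match, or at a partial adapter prefix
    of at least min_overlap bases hanging off the 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # No full match: look for the longest adapter prefix that ends the read.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def remove_duplicates(reads):
    """Keep the first copy of each identical sequence (naive deduplication)."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def preprocess(reads, min_length=10):
    """Trim adapters, drop reads that became too short, then deduplicate."""
    trimmed = [trim_adapter(r) for r in reads]
    kept = [r for r in trimmed if len(r) >= min_length]
    return remove_duplicates(kept)


toy_reads = [
    "ACGTACGTACGTA" + ADAPTER + "TT",  # full adapter inside the read
    "ACGTACGTACGTA" + ADAPTER[:7],     # partial adapter at the 3' end
    "ACGTACGTACGTA",                   # clean read, identical after trimming
    "ACGT" + ADAPTER,                  # too short once the adapter is removed
]
clean_reads = preprocess(toy_reads)
```

Note that the three surviving reads collapse into one here. With real data, identical sequences can also be natural duplicates, which is why deduplication by sequence alone should be treated with caution.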
RNA-seq: Comments
PCR duplicate removal should be used with caution to avoid removing natural duplicates. Valuable links:
http://www.cureffi.org/2012/12/11/how-pcr-duplicates-arise-in-next-generation-sequencing/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965708/ - DNA-seq and variant calling
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4597324/ - RNA-seq, ChIP-seq data
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3871669/ - trimming
RNA-seq: processing
RNA-seq: expression level quantification
Standard measures
• Read counts (raw, expected)
• FPKM: fragments per kilobase per million mapped reads:
FPKM_g = (number of reads mapped on the gene) /
((total number of mapped reads, in millions) x (gene length, in kilobases))
• TPM: transcripts per million
For one sample, TPM_g = C x FPKM_g, where C is chosen so that the sum of all
TPM_g is one million. The constant C differs from sample to sample.
RNA-seq: expression level quantification
Alternative definition of TPM:
TPM_g = (number of reads mapped on the gene x mean read length x 10^6) /
(gene length x T),
where T is the sum over all genes of
(number of reads mapped on the gene x mean read length) /
gene length
Each term here represents the number of sampled transcripts corresponding to a gene, and T estimates the
total number of sampled transcripts (molecules). Thus, TPM is the estimate of the number of transcripts
corresponding to a gene in every million transcripts.
Details: Wagner G.P., Kin K., Lynch V.J. (Theory Biosci., 2012) https://www.ncbi.nlm.nih.gov/pubmed/22872506
RNA-seq: expression level quantification
Linear scale vs. log scale
Relative differences are biologically more meaningful than absolute ones.
Computations are simplified if log-scaling is performed:
Log-scaled measure = log2 (linear-scale measure + shift)
For relatively large values, a difference of 1 on the log scale is a 2x difference on the linear
scale; a difference of 3 on the log scale is an 8x difference on the linear scale, and so on; a difference
of -1 on the log scale is a 2x difference on the linear scale, but in the opposite direction.
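A short sketch of the log-scaling formula, assuming a shift of 1 (the slide does not fix the value of the shift, so this default is an assumption). It also shows why the "difference of 3 means 8x" rule holds only for relatively large values.

```python
import math


def log_scale(value, shift=1.0):
    """Log-scaled measure = log2(linear-scale measure + shift)."""
    return math.log2(value + shift)


# Large values: an 8x difference on the linear scale is close to 3 on the log scale.
diff_large = log_scale(800.0) - log_scale(100.0)

# Small values: the shift dominates, so the same 8x ratio gives a smaller difference.
diff_small = log_scale(8.0) - log_scale(1.0)

# Without a shift, a 2x ratio is a difference of 1 (only defined for positive values).
diff_no_shift = log_scale(200.0, shift=0.0) - log_scale(100.0, shift=0.0)
```

The shift keeps zero expression values finite after the transform, at the cost of compressing differences between low-expressed genes.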
Comparison: the role of preprocessing
[Figures: three pipeline variants are compared: no preprocessing; no PCR duplicate removal; standard. Further slides show the output of the comparison.]
Extended pipeline
[Figures: extended pipeline]
BREAK
Unsupervised analysis: PCA
[Figures]
Unsupervised analysis: hierarchical clustering
[Figures, ending with a dendrogram]
Unsupervised analysis: PCA (15 genes)
[Figures]
Unsupervised analysis: hierarchical clustering, 15 genes
[Figure: dendrogram with groups labeled N-like, Basal, C-low, Luminal]
Gene annotation: ENSG to Gene Symbols plus GO
[Figure]
Unsupervised analysis: K-means, 15 genes
[Figures]
“The SUM52PE cell line was derived from a pleural effusion and was found to be
negative for ER and PR expression, however the original primary tumor from this
patient was positive for both hormone receptors”.
Chavez KJ, Garimella SV, Lipkowitz S. Triple negative breast cancer cell lines: one tool in the
search for better treatment of triple negative breast cancer. Breast Dis. 2010; 32(1-2):35-48.
Ethier SP, Kokeny KE, Ridings JW, Dilts CA. erbB family receptor expression and growth regulation
in a newly isolated human breast cancer cell line. Cancer Res. 1996; 56(4): 899-907.
BREAK
Supervised analysis:
SVM with a linear kernel as an example
[Figure sequence illustrating a linear SVM; recoverable labels: "d", "d" and "?"]
Supervised analysis: available methods
• Linear Discriminant Analysis (LDA)
• Quadratic Discriminant Analysis (QDA)
• Random Forest
• Support Vector Machine (SVM)
• Naïve Bayes
Supervised analysis: 15 genes
Differential expression analysis
Quantities related to the degree of differential
expression:
• Difference between mean expression levels: fold
change (please pay attention to the scale);
• Statistical significance: p-value, adjusted p-value
(e.g., FDR);
• Expression level magnitude (low-expressed genes
should be treated with caution in the analysis).
Gene set / pathway enrichment analysis
Possible options:
• Use only gene lists (thresholding required): one of the standard
tools here is The Database for Annotation, Visualization and
Integrated Discovery, DAVID
(https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/);
• Take into consideration degrees of differential expression;
• Additionally take into consideration pathway topology.
BREAK
Hands-on: Separation of TCGA and breast cancer PDX samples
Hands-on: Analysis of a subset of breast cancer PDX samples

Editor's Notes

  • #2: Welcome to our first workshop of this kind – we are constantly experimenting, so hopefully this experiment will be successful. Our goal is to share with you several important concepts around Next Generation Sequencing Analysis techniques, specifically how to process, analyze and annotate gene expression data.
  • #3: Before we start, I would like to say a special thank you to Dr. Javeed Iqbal, whom I am sure you all know from University of Nebraska Medical Center. He has been a tremendous help organizing the venue and sharing updates about the workshop with many of you. Also, let me introduce our speakers today – Dr. Vladimir Galatenko, the chief scientist at the Tauber Bioinformatics Research Center. Together with Dr. Galatenko we invited Julia Panov, a Ph.D. student who regularly relies on the T-BioInfo platform in her research
  • #4: In this workshop, we will utilize oncology-related public-domain datasets derived from cell lines, animal models and if we have time, will touch on TCGA data. I also want to mention that these are projects prepared as examples for this workshop, however one of our goals is to identify key topics of interest for future workshops and online courses we are developing. We would be happy to speak with you afterwards about topics of interest, pathologies or other types of data of interest.
  • #5: We will cover important topics about Next Generation sequencing data: pre-processing and quantification of expression levels
  • #94: Conclusion