1
T-BioInfo is designed for processing, analysis and integration of multi-omics data. The platform is used by multiple research groups to extract meaningful insights from large multi-omics datasets. Our current effort extends to education, enabling more people to extract meaningful, data-driven insights from omics datasets with biomedical applications. To learn more about the platform and its research and educational features, follow the links below:
T-bio.info | edu.t-bio.info | server.t-bio.info
2
3
4
5
Modeling Precision Medicine
Machine Learning for Transcriptomics Data: Extracting Meaningful Insights from High-Throughput Biomedical Data
6
Clinical Subtypes Molecular Subtypes
7
Diagnosis, Prognosis, Response to Treatment
8
Survival prediction
Treatment Selection
Oncotype DX, PAM50
Daemen et al., 2013, “Modeling precision treatment of breast cancer”: an analysis of over 70 different breast cancer cell lines and over 90 different therapeutic agents. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110
9
Files we will use in this session
10
BREAK
11
Q&A
Part 1:
RNA-Seq Processing
from raw reads to a table of expression
12
RNA-Seq: overview
[Figure: genome sequence …TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA… with Genes A, B and C and their transcripts]
13
14
[Figure: genome with Genes A, B and C, their transcripts, and sequencing reads]
RNA-Seq: overview
15
[Figure: genome with Genes A, B and C, their transcripts, and sequencing reads]
RNA-Seq: some details
1. Shattering (fragmentation)
2. Adapter ligation
3. PCR amplification
4. “Reading” (sequencing)
Preprocessing:
• Adapter removal plus additional trimming
• Removal of PCR duplicates
16
Quantification of expression levels
Mapping
• Mapping on the set of known transcripts
• Mapping on the genome (and potential identification of novel transcripts)
• Combined strategy
RNA-Seq: overview
17
RNA-Seq: basic pipeline
18
Data Processing Practice
Create a pipeline:
1. Upload same SVL files
2. Pre-processing steps: Trimmomatic, PCRclean
3. Mapping on genome: HiSat2
4. Isoform construction: Cufflinks
5. GTF merging: Cuffmerge
6. Mapping on transcripts: Bowtie2-t
7. Quantification: RSEMExpTable
19
RNA-Seq: extended pipeline
20
Expression Table
Sample Name
Gene ID What is this number?
Standard Measures of RNA Quantification:
• Counts
• FPKM – fragments per kilobase per million mapped reads:
$$\mathrm{FPKM}_g = \frac{\text{number of reads mapped on gene } g}{(\text{total number of mapped reads, in millions}) \times (\text{length of gene } g \text{, in kilobases})}$$
• TPM – transcripts per million. For one sample, $\mathrm{TPM}_g = C \times \mathrm{FPKM}_g$, where $C$ is selected so that the TPM values of all genes sum to one million. The constant $C$ is different for different samples.
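A minimal Python sketch of the two normalizations for a single sample. The helper names `fpkm`/`tpm` and all numbers are hypothetical, and for simplicity the total number of mapped reads is taken to be the sum over the listed genes; real quantifiers (e.g., RSEM) use effective lengths and fractional read assignment, which this sketch ignores.

```python
def fpkm(counts, lengths_bp):
    """FPKM for one sample: reads on gene / ((total mapped reads in millions) x (gene length in kb))."""
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / (total_millions * (lengths_bp[g] / 1e3)) for g in counts}


def tpm(counts, lengths_bp):
    """TPM for one sample: C x FPKM, with C chosen so the values sum to one million."""
    f = fpkm(counts, lengths_bp)
    c = 1e6 / sum(f.values())  # sample-specific constant C
    return {g: c * v for g, v in f.items()}


# Toy sample (hypothetical numbers): mapped read counts and gene lengths in bp
counts = {"geneA": 500_000, "geneB": 1_500_000, "geneC": 3_000_000}
lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 500}

print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

Because $C$ is a single per-sample constant, ratios between genes within a sample are the same under FPKM and TPM; only the TPM values are guaranteed to sum to one million, which makes them easier to compare across samples.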
21
Linear scale vs. log scale
Relative differences are biologically more meaningful than absolute ones, and comparing them is simpler after log-scaling:
$$\text{log-scaled measure} = \log_2(\text{linear-scale measure} + \text{shift})$$
For relatively large values (where the shift is negligible):
• a difference of 1 on the log scale is a 2x difference on the linear scale;
• a difference of 3 on the log scale is an 8x difference on the linear scale, etc.;
• a difference of -1 on the log scale is a 2x difference on the linear scale, but in the opposite direction.
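A small Python check of the rules above. The slide does not fix the shift; a shift of 1 (a common pseudocount) is assumed here, and `log_scale` is a hypothetical helper name.

```python
import math


def log_scale(x, shift=1.0):
    """Log-scaled measure = log2(linear-scale measure + shift); shift=1 is an assumption."""
    return math.log2(x + shift)


# Relatively large linear-scale values: the shift is negligible
low, double, eightfold = 1000.0, 2000.0, 8000.0
print(round(log_scale(double) - log_scale(low), 3))     # about 1: 2x higher
print(round(log_scale(eightfold) - log_scale(low), 3))  # about 3: 8x higher
print(round(log_scale(low) - log_scale(double), 3))     # about -1: 2x lower

# Small values: the shift dominates and the "1 unit = 2x" rule no longer holds
print(round(log_scale(2.0) - log_scale(1.0), 3))        # log2(3/2), about 0.585
```

The last line shows why the rule is stated only for relatively large values: near zero, the shift compresses differences, which also keeps log2 defined for genes with zero expression.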
22
Preprocessing:
• Adapter removal plus additional trimming
• Removal of PCR duplicates
23
Quantification of expression levels
Mapping
• Mapping on the set of known transcripts
• Mapping on the genome (and potential identification of novel transcripts)
• Combined strategy
RNA-Seq: overview
Comparison: the role of preprocessing
24
Expression estimates for highly expressed genes can be affected by pre-processing steps such as PCR-clean and Trimmomatic.
BREAK
25
Q&A
BREAK
26
Q&A
Other sections:
• Error correction – CORAL, ECHO, RACER, eMER
• Different mappers – HiSat, TopHat, STAR, BWA
• Differential expression – CuffDiff, edgeR, DESeq
• Segmentation – BinS
Part 2:
Machine Learning
Data exploration and classification
27
28
Unsupervised Machine Learning
[Figure: three dog images and three cat images]
29
[Figure: samples forming Group 1 and Group 2, plus an outlier]
Unsupervised analysis: PCA
30
Why use Principal Component Analysis?
• Explore data
• Visualize
Considerations:
• Data filtering
• Outliers
• Interpretation
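A minimal sketch of PCA on a samples × genes expression matrix, computed as an SVD of the gene-centered data with NumPy. The toy matrix is random with a built-in two-group signal; the platform's own PCA settings (scaling, filtering) are not specified here, so this is only an illustration of the technique.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples x genes matrix via SVD of the gene-centered data.

    Returns sample scores on the first principal components and the fraction
    of total variance explained by each of those components.
    """
    Xc = X - X.mean(axis=0)                            # center each gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # sample coordinates on the PCs
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Toy data (hypothetical): 20 samples x 500 genes; samples 0-9 are shifted up in genes 0-49
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))
X[:10, :50] += 3.0

scores, explained = pca(X)
print("PC1 scores:", scores[:, 0].round(1))
print("variance explained:", explained.round(3))
```

The sign of each component is arbitrary, so only the separation along PC1 matters. Restricting `X` to a subset of genes (as in the 7,000-gene vs. PAM50 comparison) is simply a column selection before calling `pca`.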
31
Unsupervised analysis: PCA
32
Unsupervised analysis: PCA
[Figure: PCA on 7,000 genes vs. PCA on PAM50 (35) genes; subtypes: Normal-like, Basal, Claudin-low, Luminal]
33
Unsupervised analysis: Hierarchical Clustering
Why use clustering?
• Identify groups
• Associate samples with groups
Considerations:
• Various methods
• Random selection in some methods
• Interpretation
34
Unsupervised analysis: Hierarchical Clustering
Unsupervised analysis: hierarchical clustering
Dendrogram
35
2 clusters
4 clusters
8 clusters
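A sketch of agglomerative hierarchical clustering with SciPy, cutting one dendrogram into 2, 4 or 8 clusters. Euclidean distance and average linkage are assumptions (the H-clust settings used in the practice are not stated), and the toy data are random samples built around four hypothetical group centers.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data (hypothetical): 12 samples x 100 genes, 4 groups of 3 samples each
rng = np.random.default_rng(1)
centers = rng.normal(scale=5.0, size=(4, 100))
X = np.repeat(centers, 3, axis=0) + rng.normal(size=(12, 100))

# Build the dendrogram once: Euclidean distance, average linkage (assumed choices)
Z = linkage(X, method="average", metric="euclidean")

# Cutting the same tree at different heights gives 2, 4 or 8 clusters
for k in (2, 4, 8):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, "clusters:", labels)
```

Changing the linkage method or distance can change the tree, which is one reason the slide lists "various methods" and interpretation as considerations.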
36
Unsupervised Analysis Practice
• Remove sample IDs
• Mark Group Names as ID
• Run H-clust
CellLines_ExprData_marked.txt
BREAK
37
Q&A
38
Supervised Machine Learning
[Figure: training set of labeled images (dogs, cats) and a test set of unlabeled images (?????)]
39
Step-wise Linear Discriminant Analysis (swLDA)
40
Support Vector Machine (SVM) with Linear Kernel
[Figure: linear separation of two classes, with margin distances marked d]
41
Support Vector Machine (SVM) with Linear Kernel
[Figure: new samples of unknown class (?) to be classified]
42
Support Vector Machine (SVM) with Linear Kernel
Considerations for supervised analysis:
• Fit the classifier on the training set and predict classes on the test set
• Is it possible to tune 7,000 coefficients using only 52 samples?
• Some algorithms perform feature selection: swLDA, random forest
• Other algorithms won’t work if the number of features >> the number of samples
• Curse of dimensionality
43
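A sketch of the train/test workflow for a linear-kernel SVM in the "features >> samples" setting, using scikit-learn's `SVC` as a Python analogue of the R tools listed for this workshop. The data are random, with shapes chosen to mirror the slide (52 samples × 7,000 genes) and only 10 hypothetical informative genes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data (hypothetical): 52 samples x 7,000 genes, only genes 0-9 carry signal
rng = np.random.default_rng(2)
X = rng.normal(size=(52, 7000))
y = np.array([0] * 26 + [1] * 26)
X[y == 1, :10] += 1.5

# Fit on the training set only; evaluate on held-out test samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

print("train accuracy:", clf.score(X_train, y_train))  # typically perfect when features >> samples
print("test accuracy: ", clf.score(X_test, y_test))
```

With 36 training samples in 7,000 dimensions, almost any labeling is linearly separable, so training accuracy says little; the held-out test accuracy is the number to watch. This is the curse of dimensionality in practice, and the motivation for feature selection.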
44
Step-wise Linear Discriminant Analysis (swLDA)
• Extracting 15 highly informative genes from the swLDA classifier
• How other supervised learning algorithms can be applied (e.g., SVM)
• Feature selection can also improve the quality of unsupervised analysis
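A sketch of forward stepwise feature selection around an LDA classifier, using scikit-learn's `SequentialFeatureSelector` with `LinearDiscriminantAnalysis`. This is a stand-in for the swLDA used in the workshop, not its exact criterion: here each step adds the gene that most improves cross-validated accuracy. The data are random, with genes 0-2 made informative for three hypothetical groups.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

# Toy data (hypothetical): 52 samples x 100 genes, 3 groups; gene k marks group k
rng = np.random.default_rng(3)
y = np.repeat([0, 1, 2], [18, 17, 17])
X = rng.normal(size=(52, 100))
for k in range(3):
    X[y == k, k] += 3.0

# Forward stepwise selection: at each step, add the gene that most improves
# 3-fold cross-validated LDA accuracy (a stand-in for swLDA)
selector = SequentialFeatureSelector(
    LinearDiscriminantAnalysis(),
    n_features_to_select=3,
    direction="forward",
    cv=3,
)
selector.fit(X, y)
chosen = np.flatnonzero(selector.get_support())
print("selected genes:", chosen)
```

The selected columns can then feed any other classifier (e.g., an SVM) or an unsupervised method such as PCA or clustering, which is the reuse suggested on the slide.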
45
Classification Practice
• Organize the table with 15 genes by sample type
• Color expression (green – low; red – high)
• Which genes stand out?
• Which samples stand out?
• Which groups are hard to detect?
CellLines_15Genes_market.txt
46
Classification Practice: PCA of 15 gene table
47
Hierarchical Clustering of 15-gene table
[Figure: dendrogram with 4 clusters: N-like (Normal-like), Basal, C-low (Claudin-low), Luminal]
BREAK
48
Q&A
Part 3:
Interpretation
Annotating and Interpreting Gene Expression
49
Gene annotation: ENSG to Gene Symbols plus GO
50
51
Annotation Practice
52
http://www.oncotarget.com/index.php?journal=oncotarget&page=article&op=view&path[]=23869&path[]=75083
https://www.nature.com/articles/1208329
BREAK
53
Q&A
Homework:
1. PCA plot using the top 15 genes from differential expression analysis
54
2. New datasets: separation of samples from various sources (TCGA and PDX)
55
56
Part 1: Conventional Machine Learning Approaches for Next Generation Sequencing
Rapid RNA-seq processing for expression quantification, applying logical pipeline construction and pre-processing considerations. Through hands-on exercises, participants will explore the expression data using conventional unsupervised machine learning methods and supervised classifiers, with and without feature extraction. Using the T-BioInfo platform, participants will learn about the logic and considerations of applying such methods and will be prepared for independent downstream analysis and visualization of data using downloaded R scripts produced by the system. The produced/downloaded code will be reviewed and customized in the subsequent session.
T-bio.info | edu.t-bio.info (FREE) | server.t-bio.info (14 days DEMO)
57
58
Required installations:
R >= 3.4
RStudio
gplots
ggfortify
ggplot2
ggpubr
e1071
mda
MASS
klaR
Part 2: Combining custom software with R to streamline analysis workflows and visualize ‘omics data insights.
Differential Gene Expression, Gene Set Enrichment Analysis
R visualization from scratch: use the same dataset for basic data exploration and visualization in R.
This session will strengthen the participants’ ability to transition to script-based workflows in RNA-seq downstream analysis and visualization. Participants will learn about the downstream capabilities of R-based workflows to transform and manipulate tables and visualize findings in a meaningful way.
59
Download and Modify R Scripts
60
Differential expression analysis
Quantities related to the degree of differential expression:
• Difference between mean expression levels – fold change (pay attention to the scale: linear vs. log);
• Statistical significance – p-value, adjusted p-value (e.g., FDR);
• Level of expression (use caution with lowly expressed genes; consider excluding them from the analysis).
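A sketch of the three quantities for a two-group comparison on log2-scaled data: log2 fold change, a per-gene Welch t-test (SciPy's `ttest_ind`) with Benjamini-Hochberg FDR, and a low-expression filter applied before testing. The expression threshold (mean log2 value > 1) and cutoffs (FDR < 0.05, |log2 FC| > 1) are illustrative assumptions, and the data are random; dedicated tools such as edgeR or DESeq model counts differently.

```python
import numpy as np
from scipy.stats import ttest_ind


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (FDR) for a 1-D array of p-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # keep adjusted values monotone
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Toy log2-scaled expression (hypothetical): 200 genes x 6 samples per group
rng = np.random.default_rng(4)
group_a = rng.normal(loc=5.0, size=(200, 6))
group_b = rng.normal(loc=5.0, size=(200, 6))
group_b[:20] += 3.0      # genes 0-19: truly higher in group B
group_a[180:] -= 4.5     # genes 180-199: lowly expressed in both groups
group_b[180:] -= 4.5

mean_a, mean_b = group_a.mean(axis=1), group_b.mean(axis=1)
keep = (mean_a + mean_b) / 2 > 1.0            # filter lowly expressed genes first (threshold assumed)
genes = np.flatnonzero(keep)

log2_fc = mean_b[keep] - mean_a[keep]         # log2-scale data: difference of means = log2 fold change
_, p = ttest_ind(group_b[keep], group_a[keep], axis=1, equal_var=False)  # Welch t-test per gene
fdr = benjamini_hochberg(p)

hits = genes[(fdr < 0.05) & (np.abs(log2_fc) > 1)]
print(len(hits), "genes pass; first few:", hits[:10])
```

Filtering before adjustment matters: every gene tested adds to the multiple-testing burden, so dropping lowly expressed genes first leaves more power for the rest.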
61
• Hard to interpret when the number of groups is greater than two, so we compare two groups: Claudin-low vs. Normal-like.
• Differential expression is a natural and easy-to-interpret feature selection procedure.
• Pathway enrichment analysis can be applied to the resulting table.
62
Differential expression analysis
63
Differential expression analysis
64
Differential expression analysis
Gene set / pathway enrichment analysis
• Methods that use only gene lists (thresholding required): one of the standard tools here is the Database for Annotation, Visualization and Integrated Discovery – DAVID (https://david.ncifcrf.gov/home.jsp, https://david-d.ncifcrf.gov/).
• Methods that take into consideration the level of differential expression, e.g., GAGE.
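A sketch of the simplest list-based test: a one-sided hypergeometric p-value for over-representation of pathway genes in a thresholded DE list. This is the generic version of the idea; DAVID reports a modified Fisher exact p-value (EASE score), and GAGE instead uses per-gene expression statistics. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_universe, n_pathway, n_list, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least n_overlap pathway genes when n_list genes
    are drawn at random from n_universe genes, n_pathway of which belong to
    the pathway.
    """
    upper = min(n_pathway, n_list)
    hits = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / comb(n_universe, n_list)


# Hypothetical numbers: 7,000 measured genes, a 100-gene pathway,
# 150 genes passing the DE threshold, 10 of which are in the pathway
# (about 2 would be expected by chance)
print(enrichment_pvalue(7000, 100, 150, 10))
```

The universe should be the genes actually measured (here the expression table), not the whole genome; a larger universe makes any overlap look more surprising.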
65
66
Gene set / pathway enrichment analysis
67
Gene set / pathway enrichment analysis
68
Gene set / pathway enrichment analysis
Regulation of Actin Cytoskeleton B Cell Receptor Signaling Pathway
69
70
RStudio
71


May 15 workshop
