MICROARRAYS AND
DATA ANALYSIS
FINAL PROJECT
Kiranmayee Bakshy
08/19/2014
Introduction
• Expression data from 46 cultured human ovarian
carcinoma cell lines with and without Cisplatin
treatment
• Array: A-AFFY-141 - Affymetrix GeneChip Human
Gene 1.0 ST Array [HuGene-1_0-st-v1] (GPL6244)
• Technology type: in situ oligonucleotide
• Experiment type: transcription profiling by array
• Samples: 171
• NCBI GEO accession no. GSE47856
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47856)
Reference: Miow QH, Tan TZ, Ye J, Lau JA, Yokomizo T, Thiery JP, Mori S. Epithelial-mesenchymal status renders differential responses to cisplatin in ovarian cancer. Europe PMC 24858042
Background
• Resistance to platinum-based anti-cancer drugs such as cisplatin is a critical problem in the treatment of cancer.
• Epithelial-mesenchymal transition (EMT) has been linked to drug resistance as a contributing mechanism.
• The study was designed to explore the connection between cellular responses to cisplatin and EMT status in ovarian cancer.
• Expression microarrays were used to estimate EMT status as a binary phenotype (epithelial-like or mesenchymal-like).
• Bioassays of cell number, proliferation rate and apoptosis were conducted to quantify phenotypic responses to cisplatin treatment.
Data Analysis pipeline
• Load raw CEL files into R and normalize using RMA
• Outlier analysis (CV vs mean plot, hierarchical clustering dendrogram and average correlation plot)
• Run statistical tests and fold change to select differentially expressed genes
• Dimensionality reduction/clustering (PCA)
• Classification (QDA)
• Report top 5 up-regulated and down-regulated gene names and their functions
Dataset:
Total no. of samples: 171
Total no. of probesets: 33297
Annotation classes:
Epithelial-like: 86 samples
Mesenchymal-like: 85 samples
Outlier analysis
[Figures: outlier analysis plots, each labelled "Outlier-GSM1160845"]
• The outlier GSM1160845 was removed from the data matrix
• 13707 genes with low expression values (mean < 5) were also removed
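The two filtering steps above can be sketched in a few lines. This is a minimal Python illustration, not the R code used for the project: the sample names and expression values are hypothetical toy data, and flagging the sample with the lowest average correlation is only one plausible reading of the "average correlation plot".

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def coefficient_of_variation(values):
    """Standard deviation divided by mean, as used in a CV vs mean plot."""
    return stdev(values) / mean(values)

def average_correlation(samples):
    """Mean correlation of each sample with every other sample."""
    names = list(samples)
    return {
        s: mean(pearson(samples[s], samples[t]) for t in names if t != s)
        for s in names
    }

def drop_low_expression(samples, cutoff=5.0):
    """Keep only probes whose mean across samples is at least `cutoff`."""
    names = list(samples)
    n_probes = len(samples[names[0]])
    keep = [i for i in range(n_probes)
            if mean(samples[s][i] for s in names) >= cutoff]
    return {s: [samples[s][i] for i in keep] for s in names}, keep

# Hypothetical toy data: four samples measured on four probes.
toy = {
    "sample_A": [8.1, 6.0, 3.2, 9.5],
    "sample_B": [8.0, 6.2, 3.0, 9.7],
    "sample_C": [7.9, 5.9, 3.1, 9.4],
    "sample_D": [5.0, 9.0, 3.3, 4.0],  # deliberately discordant profile
}

avg_cor = average_correlation(toy)
outlier = min(avg_cor, key=avg_cor.get)  # sample least correlated with the rest
without_outlier = {s: v for s, v in toy.items() if s != outlier}
filtered, kept = drop_low_expression(without_outlier, cutoff=5.0)
```

In this toy run the discordant sample is dropped first, and the low-expression filter is then applied to the remaining samples, mirroring the order of the two bullets.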
Statistical analysis
Student's t-test and fold change
No. of probes with p-value:
  < 0.05                  9507
  < 0.01                  7074
  < 0.05/no. of probes    2133
Linear fold change:
  Min.  -8.651798
  Max.  33.05966
Threshold for selecting differentially expressed genes:
p-value < 0.05/no. of probes (a Bonferroni-corrected cutoff) and fold change > log2(2), that is, greater than 1 on the log2 scale (a two-fold change)
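The selection rule can be written out as a short Python sketch. It makes several assumptions not stated on the slide: the expression values are on the log2 scale (so a difference of group means is a log2 fold change), the fold-change cutoff is applied to the absolute value, and the probe count used for the correction is the 19590 left after filtering. The p-values are taken as given, since computing them from the t statistic needs a statistics library.

```python
import math
from statistics import mean, variance

def students_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def log2_fold_change(group_a, group_b):
    """Difference of group means; a log2 fold change only if inputs are log2-scale."""
    return mean(group_a) - mean(group_b)

def select_de(p_values, log2_fc, alpha=0.05, fc_cutoff=math.log2(2)):
    """Indices of probes with p < alpha / n_probes and |log2 FC| > fc_cutoff."""
    bonferroni = alpha / len(p_values)
    return [i for i, (p, fc) in enumerate(zip(p_values, log2_fc))
            if p < bonferroni and abs(fc) > fc_cutoff]

# Cutoff implied by the slide if the correction uses the 19590 filtered probes
# (an assumption; the slide only says "no. of probes").
bonferroni_cutoff = 0.05 / 19590
```

Under that assumption the p-value cutoff is roughly 2.55e-6, and `math.log2(2)` evaluates to 1.0, so the fold-change condition is simply a log2 difference larger than 1.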
Visualization of differentially
expressed genes
656 differentially expressed genes were selected from the analysis based
on the threshold
Dimensionality reduction of
differentially expressed genes
Around 50% of the variability in the data is explained by the first two principal components.
Principal component analysis
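A statement like "the first two components explain around 50% of the variability" comes from dividing each component's variance (eigenvalue) by the total. The Python sketch below shows that arithmetic only; the eigenvalues are hypothetical and are not taken from this dataset.

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share reaches `target`."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for k, share in enumerate(explained_variance_ratio(ordered), start=1):
        cumulative += share
        if cumulative >= target:
            return k
    return len(ordered)

# Hypothetical eigenvalues of a covariance matrix, in no particular order.
toy_eigenvalues = [4.0, 1.0, 2.0, 1.0, 2.0]
first_two_share = sum(
    explained_variance_ratio(sorted(toy_eigenvalues, reverse=True))[:2]
)
```

With these toy values the two largest components carry about 60% of the variance, so `components_needed(toy_eigenvalues, 0.5)` returns 2.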
Spectral k-means clustering of 50 random
differentially expressed genes
Spectral k-means clustering is useful in this case because the variability can be summarized well by a few leading eigenvectors.
Classification – Quadratic Discriminant Analysis
Confusion matrix from QDA (rows: actual membership; columns: predicted membership)

                      Epithelial-like   Mesenchymal-like
  Epithelial-like           36                  0
  Mesenchymal-like           0                 34

• Training set: 50 epithelial-like, 50 mesenchymal-like
• Test set: 36 epithelial-like, 34 mesenchymal-like
• QDA was performed on the first three principal components of the training set.
QDA predicted all the samples of the test set correctly
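To make the classification step concrete, here is a deliberately simplified Python sketch of quadratic discriminant analysis on a single score per sample. The project used the first three principal components; the one-dimensional version, the toy scores and all function names below are assumptions chosen to keep the example short and dependency-free. Each class gets its own mean and variance, which is what makes the decision rule quadratic.

```python
import math
from statistics import mean, variance

def fit_qda_1d(scores_by_class):
    """Per-class mean, variance and prior estimated from 1-D training scores."""
    n_total = sum(len(v) for v in scores_by_class.values())
    return {label: (mean(v), variance(v), len(v) / n_total)
            for label, v in scores_by_class.items()}

def predict_qda_1d(model, x):
    """Return the class with the largest quadratic discriminant score at x."""
    def score(params):
        mu, var, prior = params
        return -0.5 * math.log(var) - (x - mu) ** 2 / (2 * var) + math.log(prior)
    return max(model, key=lambda label: score(model[label]))

def confusion_matrix(actual, predicted, labels):
    """Nested dict counts[actual][predicted]."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

# Hypothetical component scores for a toy training and test set.
train = {
    "epithelial-like": [2.1, 2.5, 1.8, 2.3, 2.0],
    "mesenchymal-like": [-1.9, -2.4, -2.2, -1.5, -2.8],
}
test_x = [1.9, 2.6, -2.0, -1.2]
test_y = ["epithelial-like", "epithelial-like",
          "mesenchymal-like", "mesenchymal-like"]

model = fit_qda_1d(train)
pred = [predict_qda_1d(model, x) for x in test_x]
cm = confusion_matrix(test_y, pred, list(train))
accuracy = sum(cm[label][label] for label in cm) / len(test_y)

# Accuracy implied by the confusion matrix reported on the slide.
reported_accuracy = (36 + 34) / (36 + 0 + 0 + 34)
```

The last line reproduces the slide's result: with no off-diagonal counts, all 70 test samples are classified correctly.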
AFFYMETRIX_EXON_GENE_ID | GENE NAME | GENE SYMBOL | FUNCTION
8102792 | protocadherin 18 | PCDH18 | Potential calcium-dependent cell-adhesion protein
7899167 | lin-28 homolog (C. elegans) | LIN28A | Acts as a 'translational enhancer', driving specific mRNAs to polysomes and thus increasing the efficiency of protein synthesis
7906878 | discoidin domain receptor tyrosine kinase 2 | DDR2 | This tyrosine kinase receptor for fibrillar collagen mediates fibroblast migration and proliferation
7906900* | discoidin domain receptor tyrosine kinase 2 | DDR2 | This tyrosine kinase receptor for fibrillar collagen mediates fibroblast migration and proliferation
7926368 | vimentin | VIM | Class-III intermediate filaments found in mesenchymal cells
DAVID functional annotations of top 5 discriminant genes (Negative)
* Unmapped in DAVID; information obtained from NetAffx
No pathways or GO information was suggested by DAVID.
AFFYMETRIX_EXON_GENE_ID | GENE NAME | GENE SYMBOL | FUNCTION
8026490 | urothelial cancer associated 1 | UCA1 | Role in bladder cancer progression and embryonic development
8041853 | epithelial cell adhesion molecule | EPCAM | Carcinoma-associated antigen; EpCAM upregulates c-myc and induces cell proliferation
8098439 | epithelial cell adhesion molecule | EPCAM | Carcinoma-associated antigen; EpCAM upregulates c-myc and induces cell proliferation
8147351 | mal, T-cell differentiation protein 2 | MAL2 | Member of the machinery of polarized transport
8148040 | epithelial splicing regulatory protein 1 | ESRP1 | mRNA splicing factor that regulates the formation of epithelial cell-specific isoforms
DAVID functional annotations of top 5 discriminant genes (Positive)
No pathways or GO information was suggested by DAVID.
Conclusions:
• The outlier observed in this dataset was GSM1160845, a mesenchymal-like ovarian cancer cell line treated with cisplatin.
• 656 of 19590 genes were selected as differentially expressed based on the threshold.
• The QDA classification model, trained on 100 samples, correctly predicted the class of all 70 test-set samples.
• All of the top 5 positively and negatively regulated genes obtained in this analysis are involved in cellular processes such as cell adhesion, migration, proliferation and protein synthesis.
• The authors reported an epithelial gene set consisting of known epithelial cell markers such as DDR1, KRT8, KRT18, CDH1, CDH3, CLDN3, CLDN4 and EPCAM, and a mesenchymal gene set consisting of the known mesenchymal cell markers ZEB1, CDH2, VIM and TWIST1. Of these, EPCAM and VIM also appear among the top discriminant genes found in this analysis.
