A Method to Facilitate Cancer Detection
and Type Classification from Gene
Expression Data using a Deep Autoencoder
and Neural Network
By Xi Chen
March 27, 2019
Gene Expression Data Properties
• Genes are expressed differently depending on factors such as cell type,
environment, and disease condition.
• Gene expression data are widely available because sequencing technology
has become more affordable.
• Gene expression data are multimodal and high-dimensional, with few
observations relative to features (#rows << #columns).
• Gene expression data can be used for disease detection and classification,
and for drug suggestion.
Gene Expression Data With Dimension
Reduction
• Because gene expression data are high-dimensional, dimension reduction
methods such as PCA are used for feature selection.
• Traditional statistical and machine learning methods are then applied to
tasks such as disease detection or classification.
• Problem: the selected features are hard to explain. E.g. each principal
component (PC) is a linear combination of the gene expression features.
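The interpretability problem can be seen in a small sketch. The code below assumes NumPy and a hypothetical random matrix of 20 samples by 500 genes (not the data from this talk); it computes PCA through the SVD and counts how many genes carry weight in the first component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy matrix: 20 samples (rows) x 500 genes (columns),
# so #rows << #columns as on the slide.
X = rng.normal(size=(20, 500))

# Center each gene, then take the SVD; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
components = Vt[:k]          # shape (k, n_genes): one weight per gene per PC
scores = Xc @ components.T   # shape (n_samples, k): reduced representation

# Number of genes with a non-negligible weight in the first PC.
n_genes_in_pc1 = int(np.count_nonzero(np.abs(components[0]) > 1e-12))
print(scores.shape, n_genes_in_pc1)
```

On data like this, essentially every gene contributes to each component, which is why a single PC is hard to map back to a specific gene.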
Proposed Drug Suggestion Scheme
[Figure: 2D gene expression representation; axes: Feature 1, Feature 2; legend (drug sensitivity): Drug A, Drug B, Drug C, Drug D]
Cluster Approaches:
• K-means
• Gaussian Mixture Models
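As an illustration of the K-means option, here is a minimal sketch of Lloyd's algorithm in NumPy. The four synthetic 2D blobs stand in for the hypothetical Drug A to Drug D groups; the `kmeans` helper, the blob centers, and all parameters are assumptions for the example, not part of the original scheme.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and center update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

# Hypothetical 2D representation: four blobs of 25 samples each.
rng = np.random.default_rng(1)
blob_means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
points = np.vstack([m + 0.3 * rng.normal(size=(25, 2)) for m in blob_means])

labels, centers = kmeans(points, k=4)
print(centers)
```

K-means depends on its random start and may settle in a local optimum, so in practice it is usually run several times with different seeds.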
Problem: Current Gene Expression Data Don't
Include Drug Results
• Most gene expression data aren't linked to well-documented
medical records.
• Available records often lack drug information and patient disease
outcomes.
Solve The Harder Classification Problem First,
Then Infer Whether The Cluster Approach Works
• In general, a classification problem is similar to a clustering problem,
e.g. the k-Nearest Neighbors algorithm labels a sample by its closest
neighbors in feature space.
• If gene expression data yield highly accurate classification results,
clustering gene expression data may also work for drug suggestion.
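The link between classification and clustering can be sketched with a tiny k-Nearest Neighbors classifier: a query sample takes the majority label of its closest training samples, so samples that sit together in feature space end up with the same class. The `knn_predict` helper and the toy points are hypothetical, and NumPy is assumed.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query point by majority vote among its k nearest neighbors."""
    preds = []
    for q in query_x:
        dist = np.linalg.norm(train_x - q, axis=1)   # Euclidean distances
        nearest = np.argsort(dist)[:k]               # indices of the k closest
        votes = np.bincount(train_y[nearest])        # count labels among them
        preds.append(int(votes.argmax()))
    return np.array(preds)

# Hypothetical 2D features: class 0 near the origin, class 1 near (5, 5).
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
train_y = np.array([0, 0, 0, 1, 1, 1])
query_x = np.array([[0.05, 0.05], [5.05, 5.0]])

pred = knn_predict(train_x, train_y, query_x, k=3)
print(pred)  # [0 1]
```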
Data Processing
[Figure: data processing; values shown: 60,483 and 14,157]
Computation Platform
Autoencoder For Feature Learning
Minimize 𝑓(𝐼𝑛𝑝𝑢𝑡 − 𝑂𝑢𝑡𝑝𝑢𝑡)
Training Autoencoder
Model Configuration
1st hidden layer: 100
2nd hidden layer: 50
3rd hidden layer: 25
4th hidden layer: 50
5th hidden layer: 100
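A minimal sketch of an autoencoder with the 100-50-25-50-100 hidden layout is given below. It is written in plain NumPy for illustration only: the toy input size (200 features), the tanh activations, the mean squared error as 𝑓, the learning rate, and the random data are all assumptions, and the slides do not say which framework or settings were used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 64 samples x 200 features (the real input is far wider).
n_samples, n_features = 64, 200
X = rng.normal(size=(n_samples, n_features))

# Input -> 100 -> 50 -> 25 -> 50 -> 100 -> reconstruction of the input.
sizes = [n_features, 100, 50, 25, 50, 100, n_features]
weights = [rng.normal(scale=np.sqrt(1.0 / m), size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Return the activations of every layer (tanh hidden layers, linear output)."""
    acts = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = acts[-1] @ W + b
        acts.append(z if i == len(weights) - 1 else np.tanh(z))
    return acts

def mse(x):
    """Reconstruction error: mean of (Output - Input)^2."""
    return float(np.mean((forward(x)[-1] - x) ** 2))

loss_before = mse(X)
lr = 0.1  # assumed learning rate for this toy problem
for _ in range(200):
    acts = forward(X)
    # Gradient of the mean squared error with respect to the linear output.
    delta = 2.0 * (acts[-1] - X) / X.size
    for i in reversed(range(len(weights))):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            # Backpropagate through the tanh of the previous hidden layer.
            delta = (delta @ weights[i].T) * (1.0 - acts[i] ** 2)
        weights[i] -= lr * grad_W
        biases[i] -= lr * grad_b
loss_after = mse(X)

# The 3rd hidden layer (25 units) is the learned feature vector.
code = forward(X)[3]
print(loss_before, loss_after, code.shape)
```

The 25-dimensional `code` is the part that would be passed on to a downstream classifier; the decoder half exists only to train the encoder.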
Learned Feature + Neural Network
Single Type Classification
Lung cancer, abundant and balanced data
Why Not PCA?
• PCA is a descriptive model.
• Each component is a linear
combination of all the
features.
• Hard to explain.
Multiple Type
Classification
Cancer Type Acronym    Full Name
LGG                    Lower Grade Glioma
UVM                    Uveal Melanoma
LUSC                   Lung squamous cell carcinoma
GBM                    Glioblastoma Multiforme
• Misclassifications are due to small
sample size.
• Misclassifications are sparse, which
suggests clustering potential.
Conclusion
• An autoencoder automatically generates feature representations, thus
addressing the very high dimensionality of gene expression data.
• The extracted feature vector captures the non-linearity of the data.
• This approach scales to new data after training, and it can
generalize to multi-class classification of different cancer types.
• We have demonstrated the high accuracy and low FNR/FPR of this
method for the majority of the abundant cancer types, and its
potential for handling sub-classification within certain cancers and
identifying metastatic cancers.
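For reference, the false negative rate (FNR) and false positive rate (FPR) follow from confusion counts as FNR = FN / (FN + TP) and FPR = FP / (FP + TN). The sketch below uses made-up labels, not results from this work, and the `fnr_fpr` helper is hypothetical.

```python
def fnr_fpr(y_true, y_pred, positive=1):
    """Compute false negative rate and false positive rate for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr

# Made-up labels: 1 = cancer, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

fnr, fpr = fnr_fpr(y_true, y_pred)
print(fnr, fpr)  # 1 of 4 positives missed; 1 of 6 negatives flagged
```

For multi-class results, the same function can be applied per cancer type by passing that type as `positive`.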
Other Projects: Deep Learning Behind The
Scenes
• Almost all machine learning applications use
a similar approach: Feature Engineering +
Deep Learning.
• E.g. Self-driving cars = CNN + DNN
• Feature engineering → CNN
• Deep Learning training → DNN
• Deployment
Thank you so much!
Questions?
