Genomics to Phenomics:
The Complex Journey in Big Data Biomedicine
Asoke K Talukder, PhD
InterpretOmics, Bangalore
Indian Society of Human Genetics
41st Annual Meeting,
Sankara Nethralaya, Chennai, 3-5 March, 2016
Acknowledgement
• Organizing Committee, ISHG2016
• Authors & Agencies Making their Research articles,
and Data available in the open domain and Internet
• Authors of Open source Software
• NCBI, NIH, Wikipedia, Google & other Internet sites
that believe in Bhikshu Economy by making their
contents open in the Cloud
2March 3-5, 2016
Hunting the “Dwarfing” Gene?
Palm Oil – Activate the Dwarfing Gene (Genomics)
Teak – Repress the Dwarfing Gene (Genomics)
The Human Genome – Decoding the Book of Life
A Milestone for Humanity – the Human genome
Human Genome Completed, 26 June, 2000
J Craig Venter, Bill Clinton, Francis Collins
Trillion-Dollar Science to Trillion-Dollar Industry
The relationship between the number
of stem cell divisions in the lifetime of
a given tissue and the lifetime risk of
cancer in that tissue
Reference: Cristian Tomasetti and Bert Vogelstein, Science, 2 Jan 2015; 347:78-81
Reference: Norbert Stefan, et al, Divergent associations of height with
cardiometabolic disease and cancer: epidemiology, pathophysiology, and
global implications. The Lancet Diabetes & Endocrinology, 2016; DOI:
Reduction Vs Integration
Genomics (System)
(Genetics)
Talukder AK, Genomics 3.0, Big Data Analytics, Springer, 2015
Evidence Based Science (Biology & Medicine)
Genetics             | Genomics
Confirmatory         | Exploratory
Hypothesis Driven    | Hypothesis Creating
Component            | Holistic
Biology              | Statistical Data Mining
Big Data in Biomedicine
The 7 Vs of Genomic Big Data
• Volume is the physical size of the data that must be kept online: gigabytes (10^9 bytes),
terabytes (10^12), petabytes (10^15), exabytes (10^18) or beyond.
• Velocity is the data-retrieval time, or the time taken to service a request. Velocity is also
measured as the rate of change of the data volume.
• Variety relates to heterogeneous types of data such as text, structured, unstructured, video and
audio.
• Veracity measures data reliability: the ability of an organization to trust the
data and use it confidently to make crucial decisions.
• Vexing covers the effectiveness of the algorithm. The algorithm must be designed so that
processing time is close to linear in the data volume and free of bias, so that the data can be
processed in reasonable time irrespective of volume.
• Variability is the scale of data. Data in biology is multi-scale, ranging from sub-atomic ions at
picometers through macromolecules, cells and tissues to populations spread over thousands of kilometers.
• Value is the final actionable insight or functional knowledge. The same mutation in a gene may
have a different effect depending on the population or environmental factors.
Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
21st Century Biomedicine is a Multi-Scale Challenge
[Figure: biological levels from Genomics to Phenomics]
Levels: Genome, Transcriptome, Proteome, Cellular Structure and Function, Tissue Structure and Function, Organ Structure and Function, Patient
Length scales: nm (molecular scale), nm~μm, μm~mm, mm~cm, 1 m
Time scales: ns, ns-μs, μs-s, s~hour, hour~day, years
Events: molecular events (e.g. ion-channel gating), diffusion and cell signaling, motility, mitosis, protein turnover, human lifetime
‘OMICS’ (High-throughput) Big Data Domains
[Figure] Domains shown: GWAS, Population Genetics, Microarray, Systems Biology, Phenomics, ChIP-Seq, DNA-Seq, RNA-Seq, Exome-Seq, Repli-Seq, Small RNA-Seq, Metabolic Networks, Proteomics, Metagenomics
Multi Omics Big Data
Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
Hypothesis Creating Multi-Omics Big Data Analytics
Genomic Big Data
Statistics
(Exploratory Data Analysis)
Phenomic &
Environmental
Knowledgebase
Systems Biology
Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
Lung Cancer: A Multi-Omics Multi-Scale
Big Data Case Study using iOMICS Pipelines
• We took a lung squamous cell carcinoma study and reanalysed its data using iOMICS pipelines
to uncover new knowledge
• The reanalysis covers an 18-year longitudinal clinical study of lung squamous cell
carcinoma (SCC), published as PMID: 25189482
• The data consist of omics data for 93 tumor patients and 16 healthy individuals:
  • DNA-level genotype data: 64 tumor samples, 373,398 DNA sites
  • RNA-level gene expression data: 109 samples, 20,117 genes
  • Clinical data: general and clinical information (where applicable) for all 109 individuals
  in the study. Survival information was also available as overall and
  disease-recurrence-free survival
Multi-Omics Based Multi-Scale Analytics Framework
• Data from the patient is integrated with existing knowledgebases using a 3-step analysis framework:
  • Top-down Exploratory Data Analysis: analysis of experimental data for molecular information such as DNA mutations and gene expression
  • Multi-scale Integrative Analysis: integration of molecular-scale data such as DNA- and RNA-level results for mechanistic modeling
  • Bottom-up Integrative and Network Analysis: integration of experimental data analysis results with existing knowledgebases for generalizability and better-quality results
• Results from the framework can be used to power clinical decision support systems for treatment strategies and drug design
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics,
Springer, LNCS9498, 2015
Patient Stratification
• Integration of gene expression with recurrence-free survival of patients
• Recurrence-free survival known for 87 samples
• Used Cox regression to model survival time as a function of gene expression
• Stratified patients into 3 response groups: Good, Average and Poor prognosis
Aim: Markers of Patient Survival
Survival Probability Curves for Stratified Prognosis Groups
• Top significant genes separating Poor and Average prognosis: EIF5A, SCEL, ABCA11P, VAV2
• Top significant genes separating Good and Average prognosis: SLC7A11, G6PD, ALDH3A1, NQO1, SOST
• Top significant genes separating Good and Poor prognosis: SCEL, VAV2, PPP1R26, ZNF77, EIF5A
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
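The stratification and survival curves on this slide can be illustrated with a small sketch. It does not reproduce the Cox regression fit. It assumes a per-patient risk score is already available (for example from a fitted Cox model), splits patients into three groups by tertile, and estimates a Kaplan-Meier survival curve. The function names, the tertile cut-offs and the Kaplan-Meier estimator are assumptions made for illustration, not details taken from the study.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if recurrence/death was observed, 0 if the patient was censored
    """
    at_risk = len(times)
    observed = defaultdict(int)   # events observed at each time
    leaving = defaultdict(int)    # patients leaving the risk set at each time
    for t, e in zip(times, events):
        leaving[t] += 1
        observed[t] += e

    survival, curve = 1.0, []
    for t in sorted(leaving):
        if observed[t]:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - observed[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def stratify_by_tertile(risk_scores):
    """Label each patient Good / Average / Poor by risk-score tertile.

    Hypothetical rule: lowest third of risk is 'Good', highest third is 'Poor'.
    """
    ranked = sorted(risk_scores)
    n = len(ranked)
    low_cut, high_cut = ranked[n // 3], ranked[(2 * n) // 3]
    labels = []
    for score in risk_scores:
        if score < low_cut:
            labels.append("Good")
        elif score < high_cut:
            labels.append("Average")
        else:
            labels.append("Poor")
    return labels
```

Calling `kaplan_meier` once per label returned by `stratify_by_tertile` would give one curve per prognosis group, comparable in spirit to the plotted curves.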
Phenotype Based Patient Stratification
© InterpretOmics
Data: Lung Squamous Cell Carcinoma with Basaloid Histology Study (PubMed-ID: 25189482) Clinical Data.
Overall Survivability: The basaloid tumor samples have poor overall survival (OS) compared to the other samples (Fig 1 in the original paper).
Recurrence Free Survivability: Basaloid tumors show distinctly poor recurrence-free survival (RFS) compared to other samples.
Age factor: Patients diagnosed at an age of 53 or less showed better prognosis compared to those diagnosed later.
Adjuvant Radiotherapy Factor: Patients (age ≤ 53) who did not receive adjuvant radiotherapy show better overall survival compared to those who did.
Unique Findings:
- The basaloid subtype showed distinctly poor prognosis compared to the other samples
- Adjuvant radiotherapy is not very effective for improving patient survival in these cases
- For patients diagnosed before 53 years of age, adjuvant radiotherapy is associated with worse long-term overall survival
Aim: Markers of Patient Survival
Differential Gene Expression
• Key differentially expressed protein-coding genes between the 2 cancer subtypes were identified
• 106 differentially expressed genes were identified based on the filtering criteria
• Key differentially expressed genes were: KLHL23, IVL, MPZL2, KCNK6, SPRR3, ELL2, MALL, RPRD1A, ZNF124
Filtering criteria: p-value ≤ 0.0001 and absolute log fold change > 0.6
Aim: Basaloid vs. SCC Molecular Comparison
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
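The filtering criteria on this slide (p-value ≤ 0.0001 and absolute log fold change > 0.6) amount to a simple two-condition filter. The sketch below applies them to a dictionary of per-gene results. The gene labels and numbers in the usage note are placeholders, and the slide does not state the base of the logarithm, so the function takes the log fold change as given.

```python
def filter_de_genes(results, p_max=1e-4, min_abs_lfc=0.6):
    """Keep genes passing both thresholds.

    results : dict mapping gene name -> (p_value, log_fold_change)
    Returns gene names with p_value <= p_max and |log fold change| > min_abs_lfc,
    in the dictionary's insertion order.
    """
    return [
        gene
        for gene, (p_value, log_fc) in results.items()
        if p_value <= p_max and abs(log_fc) > min_abs_lfc
    ]
```

For example, with placeholder input `{"GENE_A": (5e-5, 1.2), "GENE_B": (5e-5, 0.3)}` only `GENE_A` is kept, because `GENE_B` fails the fold-change threshold.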
Mutation Association with Cancer
• Identified DNA sequence sites with different genotypes between the two lung carcinoma subtypes (basaloid and SCC)
• After linkage analysis and filtering, the 373,398 sites were reduced to 735 DNA loci associated with disease type
• These mapped to 558 unique genes
Aim: Basaloid vs. SCC Molecular Comparison
Karyotype Plot for Mutation Locations across Chromosomes
Filtering criteria: p-value ≤ 0.001 and odds ratio ≥ 3
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
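The slide gives the thresholds (p-value ≤ 0.001, odds ratio ≥ 3) but not the test used. As a hedged illustration, the sketch below computes an odds ratio and a two-sided Fisher's exact test for one site from a 2x2 table of carrier counts in the two subtypes. The choice of Fisher's test and all function names are assumptions.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    a, b: carriers / non-carriers in subtype 1; c, d: the same in subtype 2.
    """
    if b * c == 0:
        return float("inf")  # undefined or infinite; treated as infinite in this sketch
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers in subtype 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def passes_site_filter(a, b, c, d, p_max=0.001, min_odds_ratio=3.0):
    """Apply both slide thresholds to one site."""
    return (
        fisher_exact_two_sided(a, b, c, d) <= p_max
        and odds_ratio(a, b, c, d) >= min_odds_ratio
    )
```

A site with a large odds ratio but few samples can still fail the p-value threshold, which is why both conditions are checked together.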
Functional Characterization
Aim: Basaloid vs. SCC Molecular Comparison
We characterized the 558 mutated genes identified from the DNA-level analysis using XomPathways.
The results indicated the key pathways differentiating the tumor subtypes, such as cell signaling and adhesion.
Pathway enrichment criterion: p-value ≤ 0.001
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data
Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
Pathway-Pathway
Network
Gene-Gene
Network
Genes-Pathway
Bipartite Network
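How XomPathways computes enrichment is not described on the slide. A common way to obtain a pathway enrichment p-value is a one-sided hypergeometric test, sketched here under that assumption. The parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_universe : number of genes in the background set
    n_pathway  : background genes annotated to the pathway
    n_selected : genes in the list being tested (e.g. the mutated genes)
    n_overlap  : selected genes that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    favourable = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_universe, n_selected)
```

A pathway would then be reported when the returned value is at or below the 0.001 cut-off. In practice a multiple-testing correction is usually applied across pathways, which this sketch leaves out.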
Multi-Scale Integrative Biology (Expression QTL)
Aim: Basaloid vs. SCC Molecular Comparison
• The DNA-level variants identified in the basaloid histology comparison were compared with gene expression levels to trace the effect of mutations from the DNA to the RNA level
• Expression levels of some of the associated genes were altered with a large fold change
• Interesting genes include: CLCA2, CENPF, SHROOM3, ELL2, ATP10B, CASC15, TIAM2, PROX1, EYA1, C10orf54, HOXC9, SCEL, BCL2, FUT3, YPEL1, PATZ1, CAV2
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
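The slide does not say how variants were compared with expression levels. One standard expression-QTL formulation, used here only as an illustrative assumption, regresses a gene's expression on genotype dosage (0, 1 or 2 copies of the alternate allele) and reads the slope as the per-allele effect.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage.

    dosages    : alternate-allele count per sample (0, 1 or 2)
    expression : expression value of the gene in the same samples
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    return sxy / sxx
```

A positive slope would correspond to up-regulation with each additional alternate allele and a negative slope to down-regulation.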
Multi-Scale Integration
• Mutations associated with expression-level changes were identified
• These were associated with up- or down-regulation of gene expression
Genes-Mutations Integration
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
Functional Characterization
Aim: Basaloid vs. SCC Molecular Comparison
• Functional enrichment highlights the key pathways involved for the top differentially expressed genes between tumor and normal samples
• The pathways and processes highlighted are those involved in epidermal and epithelial cell differentiation
Together, the functional analysis results show that the primary differences between the basaloid and SCC subtypes are associated with tissue structure.
This is consistent with the histology-based distinction between the two subtypes.
Genes-Biological Processes Bipartite Network
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
Metabolic and Biochemical Reactions Integration
Aim: Identification of Potential Drug Targets
• Genomic-level alterations translate into protein and metabolism changes, which finally affect phenotype at the cellular and tissue level
• Using expression data, metabolic network models were constructed for healthy and lung cancer samples
• Recon X was taken as the reference genome-scale model
Genes associated with maximum metabolic alterations can serve as effective targets
Carbohydrate Metabolism Pathways
Image source: Khazaei, T., McGuigan, A., Mahadevan, R.: Ensemble modeling of cancer metabolism. Frontiers in Physiology 3 (2012)
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
Solve Constraint-Based Differential Equations
Aim: Identification of Potential Drug Targets
Three-step process
• Step I: Model initiation using constraint-based modeling
  • Cancer state optimized for maximum growth
  • Healthy state optimized for maximum energy production
• Step II: Identification of highly altered reactions and associated genes
• Step III: Extension of the gene list to include first-degree PPI interactions as potential targets
Figure panels: Step I, Step II, Step III
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big
Data Analytics, Springer, LNCS9498, 2015
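Steps II and III of the process above can be sketched without a solver, assuming Step I has already produced one flux value per reaction for the healthy and the cancer model. The absolute-difference threshold, the reaction-to-gene mapping and every name below are illustrative assumptions, not details from the study.

```python
def altered_reactions(flux_healthy, flux_cancer, min_abs_change):
    """Step II: reactions whose flux differs by at least min_abs_change."""
    return sorted(
        reaction
        for reaction in flux_healthy
        if reaction in flux_cancer
        and abs(flux_cancer[reaction] - flux_healthy[reaction]) >= min_abs_change
    )


def genes_for_reactions(reactions, reaction_genes):
    """Step II: collect the genes annotated to the altered reactions."""
    genes = set()
    for reaction in reactions:
        genes.update(reaction_genes.get(reaction, ()))
    return genes


def extend_with_ppi_neighbours(genes, ppi_edges):
    """Step III: add first-degree protein-protein interaction partners.

    ppi_edges is an iterable of undirected (protein_a, protein_b) pairs.
    """
    extended = set(genes)
    for a, b in ppi_edges:
        if a in genes:
            extended.add(b)
        if b in genes:
            extended.add(a)
    return extended
```

Only direct partners of the original genes are added. Neighbours of neighbours are deliberately excluded, matching the "first-degree" wording on the slide.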
Metabolic and Protein Network Integration
Aim: Identification of Potential Drug Targets
Identified Metabolic Reactions Network
Protein-protein interactions for an identified gene (EIF1B)
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
• Potential targets were identified as genes strongly associated with altered reactions
• A high degree in the human protein interaction network indicates that targeting the gene will affect more pathways and may be toxic to the cell
• Identified potential drug targets include: NME2, GSR, YWHAZ, TGM2, JAM2, STAT3, TIMP2, RHOB, GIT2 and TK1
Systems Biology and the Small Molecule Targets
Aim: Identification of Potential Drug Targets
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
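The degree argument on this slide can be made concrete with a small sketch: count each protein's interaction partners and list candidate targets from lowest to highest degree, so that the candidates least likely to disturb many pathways come first. The ordering rule and the names are assumptions for illustration.

```python
from collections import Counter


def node_degrees(ppi_edges):
    """Number of distinct interaction partners per protein.

    Edges are treated as undirected; duplicates and self-loops are ignored.
    """
    unique_edges = {frozenset(edge) for edge in ppi_edges if edge[0] != edge[1]}
    degrees = Counter()
    for edge in unique_edges:
        for node in edge:
            degrees[node] += 1
    return degrees


def rank_targets_by_degree(candidates, ppi_edges):
    """Order candidates from lowest to highest degree (ties broken by name)."""
    degrees = node_degrees(ppi_edges)
    return sorted(candidates, key=lambda gene: (degrees[gene], gene))
```

Candidates absent from the interaction list get degree 0 and therefore sort first.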
Conclusions from the Cancer/iOMICS Case Study
• Molecular differences between basaloid and SCC lung carcinoma subtypes:
  • Based on DNA- and RNA-level comparisons, we identified genes involved in the differentiation of the two cancer subtypes.
  • We traced mutations in genes such as SHROOM3, PROX1 and CLCA2 to gene expression alterations.
  • The molecular-level differences between the two subtypes were able to predict the cellular- and tissue-level differences seen between the subtypes.
• Molecular states associated with poor patient survival:
  • Identified genes associated with poor patient survival probability, such as VAV2, EIF5A and SCEL.
  • Identified a hidden molecular subtype within the pure basaloid subgroup with particularly poor prognosis.
• Identification of potential drug targets:
  • Based on the translation of gene expression to metabolic fluxes, we identified key altered metabolic pathways, reactions and associated genes that are putative drug targets.
All analysis results were validated using extensive bibliomic data
Omnia Knowledgebase
& Clinical Decision Support System
Patient-specific survival for breast cancer based on the patient's
age, sex, grade and stage. There are 2,613 individuals with breast
cancer in the 45-49 age group, from SEER, within Omnia
• For adjuvant therapeutic intervention A+B, the overall gain is
around 8 QALYs (Quality-Adjusted Life Years) and the cost per QALY is
₹2,00,000, giving a likely disease burden of
~₹16,00,000 for 8 years of life.
• For drug A, the overall gain is around 6 QALYs
and the cost per QALY is ₹80,000, giving a
likely burden of ~₹4,80,000 for 6 years of
life.
• Using this prognostic information, informed
decision can be made by considering the
QALYs and the total cancer burden.
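The burden figures quoted above follow from multiplying QALYs by cost per QALY: 8 × ₹2,00,000 = ₹16,00,000 and 6 × ₹80,000 = ₹4,80,000. A minimal sketch of that arithmetic, with a helper for Indian digit grouping, is shown below. The function names are illustrative and the formatter assumes non-negative whole-rupee amounts.

```python
def total_burden(qalys, cost_per_qaly):
    """Total cost over the gained quality-adjusted life years."""
    return qalys * cost_per_qaly


def format_inr(amount):
    """Format a non-negative whole-rupee amount with Indian digit grouping."""
    digits = str(int(amount))
    if len(digits) <= 3:
        return "₹" + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # After the last three digits, Indian grouping uses pairs of digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return "₹" + ",".join(groups + [tail])
```

Here `format_inr(total_burden(8, 200000))` yields `₹16,00,000` and `format_inr(total_burden(6, 80000))` yields `₹4,80,000`, matching the two options on the slide.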
Drugs with detailed description report for breast cancer type
chr16_g.69373414T>C (NIP7)
Omnia contains curated Multi-Omics data (Variation, Expression, GO,
Pathway, Drug, and Pharmacogenomics) along with subjects’ clinical data
such as Demographics, Environmental, Phenotype and other attributes like
HGNC, OMIM, UMLS, ICD10, SEER, and MeSH terms. Currently, Omnia
contains more than 200,000 Variations, 100 Genomic experiments and 5000
Curated papers for Genotype-Phenotype relationships.
Reference: Adhil M, Talukder AK, Gandham S, Agarwal M, CuraEx: Clinical Expert System Using Big data for Precision Medicine,
Big Data Analytics, Springer, LNCS9498, 2015
iOMICS – the MultiOmics Platform
iOMICS App Store
Enterprises Disrupting Biomedical Industries
InterpretOmics
(http://www.interpretomics.co)
Revolutionizing Genomics through Big data Multi-Scale
Multi-Omics Solutions
Singapore Life Sciences
Transforming Life Sciences and Precision Medicine
Applied Genetics Diagnostics
(http://www.appgendx.com)
The Next Generation Healthsciences company offering
Genetic Diagnostic Services
JNCASR
Some Of Our Collaborators/Customers
iOMICS Accelerate Your Biomedical Research –
Making it Quicker, Reliable, and Affordable
InterpretOmics
Office: Shezan Lavelle, 5th Floor,
#15 Walton Road, Bengaluru 560001
Sequencing Center: #329, 7th Main, HAL 2nd Stage,
Indiranagar, Bengaluru 560008
Phone: +91(80)46623800


Genomics2 Phenomics Complete

  • 1. 18th June 2001017th December 2009 Genomics to Phenomics: The Complex Journey in Big Data Biomedicine Asoke K Talukder, PhD InterpretOmics, Bangalore Indian Society of Human Genetics 41st Annual Meeting, Sankara Nethralaya, Chennai, 3-5 March, 2016
  • 2. Acknowledgement • Organizing Committee, ISHG2016 • Authors & Agencies Making their Research articles, and Data available in the open domain and Internet • Authors of Open source Software • NCBI, NIH, Wikipedia, Google & other Internet sites that believe in Bhikshu Economy by making their contents open in the Cloud 2March 3-5, 2016
  • 3. Hunting the “Dwarfing” Gene? 3March 3-5, 2016 Palm Oil – Activate the Dwarfing Gene (Genomics)Teak – Repress the Dwarfing Gene (Genomics)
  • 4. The Human Genome – Decoding the Book of Life A Milestone for Humanity – the Human genome Human Genome Completed, 26 June, 2000 Francis CollinsBill ClintonJ Craig Ventor Craig Venter Bill Clinton Francis Collins 4March 3-5, 2016
  • 5. Trillion-Dollar Science to Trillion-Dollar Industry 5March 3-5, 2016
  • 6. The relationship between the number of stem cell divisions in the lifetime of a given tissue and the lifetime risk of cancer in that tissue Reference: Cristian Tomasetti, and Bert Vogelstein, Jan 2 Science 2015;347:78-81 6March 3-5, 2016 Reference: Norbert Stefan, et al, Divergent associations of height with cardiometabolic disease and cancer: epidemiology, pathophysiology, and global implications. The Lancet Diabetes & Endocrinology, 2016; DOI:
  • 7. Reduction Vs Integration 7March 3-5, 2016 Genomics (System) (Genetics) Talukder AK, Genomics 3.0, Big Data Analytics, Springer, 2015
  • 8. Evidence Based Science (Biology & Medicine) 8March 3-5, 2016 Genetics Genomics Confirmatory Exploratory Hypothesis Driven Hypothesis Creating Component Holistic Biology Statistical Data Mining
  • 9. Big Data in Biomedicine The 7 Vs of Genomic Big Data • Volume is defined in terms of the physical volume of the data that need to be online, like giga-byte (10^9), tera-byte (10^12), peta-byte (10^15) or exa-byte (10^18) or even beyond. • Velocity is about the data-retrieval time or the time taken to service a request. Velocity is also measured through the rate of change of the data volume. • Variety relates to heterogeneous types of data like text, structured, unstructured, video, audio etcetera. • Veracity is another dimension to measure data reliability - the ability of an organization to trust the data and be able to confidently use it to make crucial decisions. • Vexing covers the effectiveness of the algorithm. The algorithm needs to be designed to ensure that data processing time is close to linear and the algorithm does not have any bias; irrespective of the volume of the data, the algorithm is able to process the data in reasonable time. • Variability is the scale of data. Data in biology is multi-scale, ranging from sub-atomic ions at picometers, macro-molecules, cells, tissues and finally to a population [9] at thousands of kilometers. • Value is the final actionable insight or the functional knowledge. The same mutation in a gene may have a different effect depending on the population or the environmental factors. 9March 3-5, 2016 Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
  • 10. 21st Century Biomedicine is a Multi-Scale Challenge Genome Transcriptome Proteome Cellular Structure and Function Tissue Structure and Function Organ Structure and Function Patient Molecular Scale ηm~μm μm~mm mm~cm 1mηm ηs ηs-μs μs-s s~hour hour~day years Molecular events (Eg: Ion-channel gating Diffusion and cell signaling Motility Mitosis Protein turnover Human lifetime 10March 3-5, 2016 Genomics Phenomics
  • 11. ‘OMICS’ (High-throughput) Big Data Domains GWAS Population GeneticsMicroarray Systems Biology Phenomics ChIp-Seq DNA-Seq RNA-Seq Exome-Seq Repli-Seq Small RNA-Seq Metabolic Networks Proteomics Metagenomics 11March 3-5, 2016
  • 12. Multi Omics Big Data 12March 3-5, 2016 Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
  • 13. Hypothesis Creating Multi-Omics Big Data Analytics 13March 3-5, 2016 Genomic Big Data Statistics (Exploratory Data Analysis) Phenomic & Environmental Knowledgebase Systems Biology Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
  • 14. Lung Cancer: A Multi-Omics Multi-Scale Big Data Case Study using iOMICS Pipelines  We have taken a Lung squamous cell carcinoma study and reanalysed its data using iOMICS pipelines to unleash novel knowledge  The reanalysis is for a Lung Squamous Cell Carcinoma (SCC) 18 years Longitudinal clinical research published in PMID: 25189482.  The data consist of Omics data for 93 tumor patients and 16 healthy individuals.  DNA level genotype data: 64 tumor samples, 373,398 DNA sites  RNA level gene expression data: 109 samples, 20,117 genes  Clinical data: General and clinical information (where applicable) for all 109 individuals in the study. Survival information was also available in the form of overall and disease recurrence free survival 14March 3-5, 2016
  • 15. Multi-Omics Based Multi-Scale Analytics Framework  Data from patient is integrated with existing knowledgebases using a 3 step analysis framework  Top-down Exploratory Data Analysis: Analysis of experimental data for molecular information such as DNA mutations and gene expression  Multi-scale Integrative Analysis: Integration of molecular scale data such as DNA, RNA level results for mechanistic modeling  Bottom-up Integrative and Network Analysis: Integration of experimental data analysis results with existing knowledgebases for generalizability and improved quality results  Results from the framework can be used to power clinical decision support systems for treatment strategies and drug design 15March 3-5, 2016 Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
  • 16. Patient Stratification  Integration of gene expression with recurrence free survival of patients  Recurrence free survival known for 87 samples  Used Cox regression to model survival time as a function of gene expression  Stratified patients into 3 response groups: Good, Average and Poor prognosis Aim: Markers of Patient Survival Survival Probability Curves for Stratified Prognosis Groups  Top Significant Genes separating Poor and Average prognosis are: EIF5A, SCEL, ABCA11P, VAV2  Top Significant Genes separating Good and Average prognosis are: SLC7A11, G6PD, ALDH3A1, NQO1, SOST  Top Significant Genes separating Good and Poor prognosis are: SCEL, VAV2, PPP1R26, ZNF77, EIF5A 16March 3-5, 2016 Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015
  • 17. Phenotype Based Patient Stratification 17© InterpertOmics Data: Lung Squamous Cell Carcinoma with Basaloid Histology Study (PubMed-ID: 25189482) Clinical Data. Overall Survivability: The basaloid tumor samples have poor overall survival (OS) compared to the other samples. Fig 1 in the original paper Recurrence Free Survivability: Basaloid tumors show distinctly poor recurrence- free survival (RFS) compared to other samples Age factor: Patients diagnosed at an age of 53 or less showed better prognosis compared to those diagnosed later Adjuvant Radiotherapy Factor: Patients who did not receive adjuvant radiotherapy (Age ≤ 53) show better overall survival, compared to those who did Unique Findings: - The basaloid subtypes showed distinctly poor prognosis compared to the other samples - Adjuvant radiotherapy is not very effective for improving patient survival in these cases - For patients diagnosed before 53 years of age, administration of adjuvant radiotherapy represents worse long term overall survival Aim: Markers of Patient Survival
  • 18. Differential Gene Expression
    Aim: Basaloid vs. SCC Molecular Comparison
    - Key differentially expressed protein-coding genes between the 2 cancer subtypes were identified
    - 106 differentially expressed genes were identified based on the filtering criteria
    - Key differentially expressed genes were: KLHL23, IVL, MPZL2, KCNK6, SPRR3, ELL2, MALL, RPRD1A, ZNF124
    Filtering criteria: p-value ≤ 0.0001 and absolute log fold change > 0.6
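The filtering criteria on this slide amount to a two-condition threshold per gene. A minimal sketch follows, assuming per-gene p-values and log fold changes have already been computed; the slide does not state the logarithm base or whether the p-values were multiplicity-adjusted, and the gene names and statistics below are hypothetical placeholders.

```python
def filter_de_genes(results, p_cutoff=0.0001, lfc_cutoff=0.6):
    """Keep genes with p-value <= p_cutoff and |log fold change| > lfc_cutoff."""
    return [
        gene
        for gene, (p_value, log_fc) in results.items()
        if p_value <= p_cutoff and abs(log_fc) > lfc_cutoff
    ]


# Hypothetical per-gene statistics: gene -> (p-value, log fold change).
toy_results = {
    "GENE_A": (0.00002, 1.4),   # passes both criteria
    "GENE_B": (0.00005, -0.9),  # passes: down-regulated with a large change
    "GENE_C": (0.00001, 0.3),   # fails: fold change too small
    "GENE_D": (0.02, 2.1),      # fails: p-value too large
}
selected = filter_de_genes(toy_results)
```

Taking the absolute value keeps both up- and down-regulated genes, which matches the "absolute log fold change" wording of the criteria.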
  • 19. Mutation Association with Cancer
    Aim: Basaloid vs. SCC Molecular Comparison
    - Identified DNA sequence sites with different genotypes between the two lung carcinoma subtypes (basaloid and SCC)
    - After linkage analysis and filtering, the 373,398 sites were reduced to 735 disease type associated DNA loci
    - These mapped to 558 unique genes
    [Figure: Karyotype Plot for Mutation Locations across Chromosomes]
    Filtering criteria: p-value ≤ 0.001 and odds ratio ≥ 3
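The p-value and odds ratio thresholds above can be applied to a 2x2 table of variant counts per subtype. The slide does not name the association test that was used; the two-sided Fisher exact test below is an assumption chosen because it needs only the standard library, and the counts are invented for illustration.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a, b -- basaloid samples with / without the variant
    c, d -- SCC samples with / without the variant
    """
    denom = b * c
    if denom == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / denom


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value.

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts for one site (not from the study).
toy_or = odds_ratio(12, 3, 4, 16)
toy_p = fisher_exact_two_sided(12, 3, 4, 16)
site_passes = toy_p <= 0.001 and toy_or >= 3
```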
  • 20. Functional Characterization
    Aim: Basaloid vs. SCC Molecular Comparison
    - We characterized the 558 mutated genes identified from the DNA level analysis using XomPathways
    - Results indicated the key pathways differentiating the tumor subtypes, such as cell signaling and adhesion
    Filtering criterion: p-value ≤ 0.001 for pathway enrichment
    [Figure panels: Pathway-Pathway Network; Gene-Gene Network; Genes-Pathway Bipartite Network]
  • 21. Multi-Scale Integrative Biology (Expression QTL)
    Aim: Basaloid vs. SCC Molecular Comparison
    - The DNA level variants identified for the basaloid histology comparisons were compared with Gene Expression levels to view the effect of mutations from the DNA to RNA level
    - Expression levels of some of the associated genes were altered with a large fold change
    - Interesting genes include: CLCA2, CENPF, SHROOM3, ELL2, ATP10B, CASC15, TIAM2, PROX1, EYA1, C10orf54, HOXC9, SCEL, BCL2, FUT3, YPEL1, PATZ1, CAV2
  • 22. Multi-Scale Integration
    - Mutations associated with expression level changes were identified
    - These were associated with up or down-regulation of gene expression
    [Figure: Genes-Mutations Integration]
  • 23. Functional Characterization
    Aim: Basaloid vs. SCC Molecular Comparison
    - Functional enrichment highlights the key pathways involved for the top differentially expressed genes between tumor and normal samples
    - These include the pathways and processes involved in epidermal and epithelial cell differentiation
    Together, the functional analysis results show that the primary differences between the basaloid and SCC subtypes are associated with tissue structure. This is consistent with the histology-based distinction between the two subtypes.
    [Figure: Genes-Biological Processes Bipartite Network]
  • 24. Metabolic and Biochemical Reactions Integration
    Aim: Identification of Potential Drug Targets
    - Genomic level alterations translate into protein and metabolism changes, which finally affect phenotype at a cellular and tissue level
    - Using expression data, metabolic network models were constructed for healthy and lung cancer samples
    - Recon X was taken as a reference genome scale model
    Genes associated with maximum metabolic alterations can serve as effective targets
    [Figure: Carbohydrate Metabolism Pathways. Image source: Khazaei, T., McGuigan, A., Mahadevan, R.: Ensemble modeling of cancer metabolism. Frontiers in physiology 3 (2012)]
  • 25. Solve Constraint-Based Differential Equations
    Aim: Identification of Potential Drug Targets
    Three-step process:
    - Step I: Model initiation using constraint-based modeling
      - Cancer state optimized for maximum growth
      - Healthy state optimized for maximum energy production
    - Step II: Identification of highly altered reactions and associated genes
    - Step III: Extension of the gene list to include first-degree protein-protein interaction (PPI) partners as potential targets
    [Figure panels: Step I; Step II; Step III]
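Steps II and III above can be sketched on toy data. This is only an illustration under stated assumptions: the flux values stand in for the output of Step I (the constraint-based optimization, which is not shown), the change threshold is arbitrary, and all reaction names, gene names and interaction edges are hypothetical.

```python
def altered_reactions(flux_healthy, flux_cancer, min_abs_change):
    """Step II: reactions whose flux differs by at least min_abs_change,
    ordered from the largest to the smallest change."""
    changes = {
        rxn: abs(flux_cancer[rxn] - flux_healthy[rxn])
        for rxn in flux_healthy
        if rxn in flux_cancer
    }
    hits = [rxn for rxn, delta in changes.items() if delta >= min_abs_change]
    return sorted(hits, key=lambda rxn: -changes[rxn])


def extend_with_ppi(genes, ppi_edges):
    """Step III: add first-degree interaction partners of the seed genes."""
    seeds = set(genes)
    extended = set(seeds)
    for g1, g2 in ppi_edges:
        if g1 in seeds:
            extended.add(g2)
        if g2 in seeds:
            extended.add(g1)
    return extended


# Hypothetical fluxes per reaction for the two optimized states.
flux_healthy = {"R1": 10.0, "R2": 5.0, "R3": 2.0}
flux_cancer = {"R1": 10.5, "R2": 12.0, "R3": 0.0}
# Hypothetical reaction-to-gene mapping and PPI edges.
reaction_genes = {"R1": ["GENE_A"], "R2": ["GENE_B"], "R3": ["GENE_C", "GENE_D"]}
ppi_edges = [("GENE_B", "GENE_E"), ("GENE_A", "GENE_F"), ("GENE_G", "GENE_D")]

top_reactions = altered_reactions(flux_healthy, flux_cancer, min_abs_change=1.5)
seed_genes = {g for rxn in top_reactions for g in reaction_genes[rxn]}
candidate_targets = extend_with_ppi(seed_genes, ppi_edges)
```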
  • 26. Metabolic and Protein Network Integration
    Aim: Identification of Potential Drug Targets
    [Figure panels: Identified Metabolic Reactions Network; Protein-protein interactions for an identified gene (EIF1B)]
  • 27. Systems Biology and the Small Molecule Targets
    Aim: Identification of Potential Drug Targets
    - Potential targets were identified as genes strongly associated with altered reactions
    - A high degree in the human protein interaction network for these genes indicates that targeting them will affect more pathways and may be toxic to the cell
    - Identified potential drug targets include: NME2, GSR, YWHAZ, TGM2, JAM2, STAT3, TIMP2, RHOB, GIT2 and TK1
  • 28. Conclusions from the Cancer/iOMICS Case Study
    - Molecular differences between basaloid and SCC lung carcinoma subtypes:
      - Based on DNA and RNA level comparisons, we were able to identify genes involved in the differentiation of the two cancer subtypes.
      - We tracked the mutations in genes such as SHROOM3, PROX1, CLCA2 etc. to gene expression alterations.
      - The molecular level differences between the two subtypes were able to predict the cellular and tissue level differences seen between the subtypes
    - Molecular states associated with poor patient survival:
      - Identified genes involved in poor patient survival probabilities such as VAV2, EIF5A, SCEL etc.
      - Identified a hidden molecular subtype within the pure basaloid subgroup, having particularly poor prognosis
    - Identification of potential drug targets:
      - Based on the translation of gene expression to metabolic fluxes, we identified key altered metabolic pathways, reactions and associated genes which are putative drug targets
    All analysis results were validated using extensive bibliomic data
  • 29. Omnia Knowledgebase & Clinical Decision Support System
    [Figure: Patient Specific Survival for breast cancer based on the patient age, sex, grade and stage. There are 2,613 individuals with breast cancer of age group 45-49, from SEER within Omnia]
    - For adjuvant therapeutic intervention A+B, the overall QALYs (Quality Adjusted Life Years) are around 8 years and the cost per QALY is ₹2,00,000, with a likely disease burden of ~₹16,00,000 for 8 years of life.
    - For drug A, the overall QALYs are around 6 years and the cost per QALY is ₹80,000, with a likely burden of ~₹4,80,000 for 6 years of life.
    - Using this prognostic information, an informed decision can be made by considering the QALYs and the total cancer burden.
    [Figure: Drugs with detailed description report for breast cancer type chr16_g.69373414T>C (NIP7)]
    Omnia contains curated Multi-Omics data (Variation, Expression, GO, Pathway, Drug, and Pharmacogenomics) along with subjects’ clinical data such as Demographics, Environmental, Phenotype and other attributes like HGNC, OMIM, UMLS, ICD10, SEER, and MeSH terms. Currently, Omnia contains more than 200,000 Variations, 100 Genomic experiments and 5000 Curated papers for Genotype-Phenotype relationships.
    Reference: Adhil M, Talukder AK, Gandham S, Agarwal M, CuraEx: Clinical Expert System Using Big data for Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
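The burden figures on this slide follow from multiplying the QALYs by the cost per QALY. A short sketch of that arithmetic, using the slide's own numbers (in rupees); the function and variable names are illustrative only and are not part of Omnia.

```python
def disease_burden(qalys, cost_per_qaly):
    """Total burden = quality-adjusted life years x cost per QALY."""
    return qalys * cost_per_qaly


# Figures taken from the slide (₹2,00,000 = 200,000; ₹80,000 = 80,000).
options = {
    "A+B": {"qalys": 8, "cost_per_qaly": 200_000},
    "A": {"qalys": 6, "cost_per_qaly": 80_000},
}
burdens = {name: disease_burden(**opt) for name, opt in options.items()}
```

Here `burdens["A+B"]` reproduces the ~₹16,00,000 figure and `burdens["A"]` the ~₹4,80,000 figure.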
  • 30. iOMICS – the MultiOmics Platform
  • 32. Enterprises Disrupting Biomedical Industries
    - InterpretOmics (http://www.interpretomics.co): Revolutionizing Genomics through Big data Multi-Scale Multi-Omics Solutions
    - Singapore Life Sciences Transforming Life Sciences and Precision Medicine
    - Applied Genetics Diagnostics (http://www.appgendx.com): The Next Generation Healthsciences company offering Genetic Diagnostic Services
  • 33. Some Of Our Collaborators/Customers
    - JNCASR
  • 34. iOMICS
    Accelerate Your Biomedical Research – Making it Quicker, Reliable, and Affordable
    InterpretOmics
    Office: Shezan Lavelle, 5th Floor, #15 Walton Road, Bengaluru 560001
    Sequencing Center: #329, 7th Main, HAL 2nd Stage, Indiranagar, Bengaluru 560008
    Phone: +91(80)46623800