Visualizing Patient Cohorts
Integrating Data Types, Relationships, and Time
Nils Gehlenborg, PhD
Department of Biomedical Informatics
Harvard Medical School
http://gehlenborglab.org | nils@hms.harvard.edu | @ngehlenborg
Visualizing Patient Cohorts: Integrating Data Types, Relationships, and Time
Patient Cohorts
Typical Characteristics
Dozens to thousands of patients
One or more samples per patient: tumor &
normal tissue, primary tumor & metastatic
tumor(s), multiple time points, etc.
Many attributes per sample: omics data,
clinical measurements, outcomes, etc.
StratomeX
Discovering Subtypes in Tumor Cohorts
Marc Streit, Alexander Lex, Samuel Gratzl, Christian Partl, Dieter Schmalstieg, Hanspeter Pfister,
Peter Park, Nils Gehlenborg 
Guided Visual Exploration of Genomic Stratifications in Cancer
Nature Methods, 11, 884–885, 2014
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
The Cancer Genome Atlas
10,000+ patients
20+ tumor types
microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
mRNA expression
Tumor Subtypes
[Figure: patients stratified by mRNA expression clustering (C1, C2, C3, C4), copy number of gene X (DEL, NORMAL, AMP), and mutation status of gene Y (WILDTYPE, MUT)]
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods (2014)
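The overlap named in Problem 1 can be tabulated before it is drawn. The following is a minimal sketch, assuming each stratification is a mapping from patient ID to group label. The function name `overlap_counts`, the patient IDs, and the labels are made up for illustration and are not taken from TCGA or from StratomeX.

```python
from collections import Counter

def overlap_counts(strat_a, strat_b):
    """Count patients shared by each pair of groups from two stratifications.

    Each stratification maps patient ID -> group label. Only patients
    present in both stratifications are counted.
    """
    shared = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared)

# Hypothetical toy data: an mRNA clustering and a copy number status.
mrna = {"p1": "C1", "p2": "C1", "p3": "C2", "p4": "C2", "p5": "C3"}
cnv = {"p1": "AMP", "p2": "AMP", "p3": "NORMAL", "p4": "DEL", "p6": "DEL"}

counts = overlap_counts(mrna, cnv)
for (group_a, group_b), n in sorted(counts.items()):
    print(f"{group_a} & {group_b}: {n} patient(s)")
```

In a block-and-ribbon layout, each nonzero count would plausibly correspond to one ribbon between two blocks.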
StratomeX
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
Guided Exploration
Query, Rank, Visualize
Stratifications, Clinical Params, Pathways
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods (2014)
Queries to retrieve stratifications
Individual sets with large overlap: Jaccard Index
Overall similarity of stratifications: Adjusted Rand Index
Survival: Log Rank Score (one vs rest)
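The first two scores can be written down compactly. This is a sketch under stated assumptions, not the StratomeX implementation: patient sets are Python sets of IDs, stratifications are equal-length label sequences over the same patients, and the function names are hypothetical.

```python
from collections import Counter
from math import comb

def jaccard_index(set_a, set_b):
    """|A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index (Hubert & Arabie, 1985) for two labelings
    of the same patients, given as equal-length sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same patients")
    n = len(labels_a)
    # Number of patient pairs that share a group in both, in A, and in B.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = pairs_a * pairs_b / total_pairs if total_pairs else 0.0
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. one group in both labelings
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)

print(jaccard_index({"p1", "p2", "p3"}, {"p2", "p3", "p4"}))
print(adjusted_rand_index(["C1", "C1", "C2", "C2"], ["MUT", "MUT", "WT", "WT"]))
```

The Jaccard Index scores one pair of patient sets, while the Adjusted Rand Index scores two whole stratifications and is unaffected by how the groups are named.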
Queries to retrieve pathways
Gene Set Enrichment Score: original GSEA or Parametric Analysis of Gene Set Enrichment (PAGE) (one vs rest)
L Hubert & P Arabie, Journal of Classification (1985)
A Subramanian et al., PNAS (2005)
S-Y Kim & DJ Volsky, BMC Bioinformatics (2005)
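As an illustration of the parametric score, PAGE (Kim & Volsky, 2005) compares the mean statistic of a gene set with the mean over all genes, scaled by the set size and the overall standard deviation. The sketch below assumes a dictionary of per-gene statistics (for a one-vs-rest query, for example a log fold change of one cluster against the rest) and uses the population standard deviation. The function name, the gene names, and the values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def page_z_score(gene_scores, gene_set):
    """Parametric z-score for one gene set.

    gene_scores maps gene -> per-gene statistic. gene_set is an iterable
    of gene names; genes without a score are ignored.
    """
    all_values = list(gene_scores.values())
    members = [gene_scores[g] for g in set(gene_set) if g in gene_scores]
    if not members:
        raise ValueError("no gene in the set has a score")
    mu = mean(all_values)
    sigma = pstdev(all_values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("scores have zero variance")
    # Z = (mean of set - mean of all genes) * sqrt(set size) / sd of all genes
    return (mean(members) - mu) * sqrt(len(members)) / sigma

# Hypothetical per-gene statistics for one cluster versus the rest.
scores = {"geneA": 2.0, "geneB": 2.0, "geneC": -2.0, "geneD": -2.0}
print(page_z_score(scores, ["geneA", "geneB"]))
```

Ranking pathways by the absolute value of this score is one way such a query could order its results.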
Cluster Refinement
Adjust cluster (i.e. subtype) membership based on within- and between-cluster
metrics in context of other data
M Kern, A Lex, N Gehlenborg, C Johnson, BMC Bioinformatics (2017)
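One simple pair of within- and between-cluster metrics is the distance of a sample to its own cluster centroid and to the nearest other centroid. The sketch below flags samples that sit closer to another cluster. It is an illustration only, not the method of Kern et al.; the function names and the toy points are hypothetical.

```python
from math import dist

def centroids(points, labels):
    """Mean vector of each cluster; points is a list of equal-length tuples."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}

def reassignment_candidates(points, labels):
    """Return (index, current_label, nearest_label) for samples that lie
    closer to another cluster's centroid than to their own."""
    cents = centroids(points, labels)
    flagged = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        within = dist(p, cents[lab])
        other, between = min(
            ((name, dist(p, c)) for name, c in cents.items() if name != lab),
            key=lambda item: item[1],
            default=(None, float("inf")),
        )
        if between < within:
            flagged.append((i, lab, other))
    return flagged

# Hypothetical 2D data: the last sample is labeled "A" but sits next to "B".
points = [(0, 0), (0, 1), (5, 5), (5, 6), (4.5, 5)]
labels = ["A", "A", "B", "B", "A"]
print(reassignment_candidates(points, labels))
```

In an interactive setting, flagged samples would be candidates for an analyst to inspect alongside other data rather than to reassign automatically.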
Samuel Gratzl, Nils Gehlenborg, Alexander Lex, Hanspeter Pfister, Marc Streit 
Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets 
IEEE Transactions on Visualization and Computer Graphics (InfoVis '14), 20(12), 2023-2032, 2014
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
Domino
Extracting, Comparing, and Manipulating Subsets
across Tabular Datasets
Motivation
1. StratomeX is limited to a rigid columnar layout
2. StratomeX only shows connections on a block level, not for individual samples
3. StratomeX only supports exploration along the sample/patient dimension
TCGA Example
S Gratzl, N Gehlenborg, A Lex, H Pfister, M Streit, TVCG (2014)
Blocks: Types
Partitioned Numerical Matrix
S Gratzl, N Gehlenborg, A Lex, H Pfister, M Streit, TVCG (2014)
Blocks: Representations
Partitioned Numerical Matrix
S Gratzl, N Gehlenborg, A Lex, H Pfister, M Streit, TVCG (2014)
Blocks: Relationships
S Gratzl, N Gehlenborg, A Lex, H Pfister, M Streit, TVCG (2014)
Supported Techniques
OncoThreads
Incorporating Longitudinal Information
Theresa Harbig, Sabrina Nusrat, Alex Thomson, Hans Bitter, Tali Mazor, Ethan Cerami, Nils Gehlenborg
Visualization of Longitudinal Cancer Genomics Data
Work in Progress
Sabrina Nusrat
Harvard
Theresa Harbig
Harvard
Motivation
1. Cohorts of patients with longitudinal sample information
2. Information about what happened between samples critical for interpretation
3. Application to longitudinal cancer cohorts or clinical trials
Conclusion
Takeaways
Despite highly heterogeneous data, the “block and ribbon” approaches are able
to integrate a wide range of data types
Integration of auxiliary visualization types (pathways, Kaplan-Meier plots, box
plots, etc.) extends the possibilities
Ability to aggregate data is critical to these approaches
Next Steps
Better integration of specialized visualizations with support for faceting and
aggregation
Scale to hundreds of thousands or millions of individuals (UK Biobank, All of Us, etc.)
Integration with data management systems (e.g. i2b2 TranSMART, cBioPortal)
- Challenge: generally not designed to support visualization, e.g. aggregation
- Opportunity: easier to deploy visualizations in real-world settings
Integration with analytical backends (e.g. Jupyter Notebooks or pipelines)
Broadening Impact
Consider role of academic visualization research in real-world settings
- novel visualization techniques are informing future work (cf. Domino)
- can fill niche not addressed by or not viable for commercial products
Collaborations with industry beginning on day 1 are ideal (cf. OncoThreads)
- generally true for any visualization project and any project partner!
Better education about the strengths and weaknesses of visualization, so that
investment goes to the right places and disappointment and frustration are avoided
