Cancer Moonshot,
Data Sharing,
Genomic Data Commons
October 26th, 2016
Warren Kibbe, PhD
Warren.kibbe@nih.gov
@wakibbe
NCI Mission
To develop the knowledge base that will lessen the burden of
cancer in the United States and around the world.
Cancer Statistics
In 2015 there will be an estimated
1,700,000 new cancer cases
and
600,000 cancer deaths
- American Cancer Society 2015
Cancer remains the second most common cause of death in the U.S.
- Centers for Disease Control and Prevention 2015
Understanding Cancer
• Precision medicine will lead to fundamental
understanding of the complex interplay between
genetics, epigenetics, nutrition, environment and clinical
presentation and direct effective, evidence-based
prevention and treatment.
Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem
• The era of precision oncology is predicated on the
integration of research, care, and molecular medicine
and the availability of data for modeling, risk analysis,
and optimal care.
The Genomic Data Commons is a
completely new way of making
scientific data available to the cancer
research community
Cancer Data Sharing & Data Commons
• Making data available for discovery, validation,
new therapies
• Working toward a National Learning Health
System for Cancer
• Maximizing the impact, reuse, and
reproducibility of cancer research
• We need to change the incentives for data
sharing
Reduce the risk, improve early detection, outcomes and survivorship in cancer
FAIR – Making data Findable, Accessible, Attributable,
Interoperable, Reusable, and providing Recognition
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
Changing the conversation around data sharing
• How do we find data, software, standards?
• How can we make data, annotations, software, metadata accessible?
• How do we reuse data standards?
• How do we make more data machine readable?
NIH Data Commons
NCI Genomic Data Commons
Data commons co-locate data, storage and computing infrastructure, and
commonly used tools for analyzing and sharing data to create an
interoperable resource for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, A Case for Data Commons: Towards Data Science as a
Service, to appear. Source of image: interior of one of Google’s data centers, www.google.com/about/datacenters/.
Cancer Research Data Ecosystem – Cancer Moonshot BRP
[Figure: data ecosystem diagram. Labels: Well characterized research data sets; Cancer cohorts; Patient data; EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support; Learning from every cancer patient; Active research participation; Research information donor; Clinical Research; Observational studies; Proteogenomics; Imaging data; Clinical trials; Discovery; Patient engaged Research; Surveillance; Big Data; Implementation research; SEER; GDC]
Cancer Research Data Commons Ecosystem
[Figure: data commons diagram. Labels: Genomic Data Commons; Imaging Data Commons; Proteomics Data Commons; Clinical Data Commons (Cohorts / Indiv.); SEER (Populations); Data Standards; Validation and Harmonization; Data Contributors and Consumers: Researchers, Patients, Clinicians, Institutions]
(10,000+ patient tumors and increasing)
Courtesy of P. Kuhn (USC)
2006-2015:
A Decade of Illuminating the
Underlying Causes of Primary
Untreated Tumors Omics
Characterization
Digital technology is changing rapidly
[Timeline figure, 1997-2015. Ages in parentheses are as printed on the slide; dates shown alone had no text caption.]
4/21/1997: WinAMP (mp3)
9/16/1999 (~3 yrs old): 802.11b WiFi
10/23/2001 (~5 yrs old): iPod (10GB max)
4/23/2005 (~8 yrs old)
7/15/2006
9/26/2006 (~9 yrs old)
1/9/2007 (~10 yrs old): iPhone (EDGE, 16 GB max)
2/7/2007
7/11/2008 (~11 yrs old): iPhone 3G (16 GB max)
4/3/2010 (~13 yrs old): iPad (EDGE, 64 GB max)
4/24/2012 (~15 yrs old): Google Drive
9/12/2012 (~15 yrs old): iPhone5 (LTE, 128 GB max)
7/14/2014 (~17 yrs old): Google Baseline
3/9/2015 (~18 yrs old)
Cancer therapy is changing
$640M (FY74)
$5.21B (FY16)
Application of Cancer Genomics is changing
Precision Oncology Trials Launched
2014: MPACT, Lung MAP, ALCHEMIST, Exceptional Responders
2015: NCI-MATCH
2017: Pediatric-MATCH
NCI-MATCH: Features
[Molecular Analysis for Therapy Choice]
• Foundational treatment/discovery trial;
assigns therapy based on molecular
abnormalities, not site of tumor origin, for
patients without available standard
therapy
• Regulatory umbrella for phase II
drugs/studies from > 20 companies;
single agents or combinations
• Available nationwide (2,400 sites)
• Accrual began mid-August 2015
• Latest match rate is 25% with 24 open
arms
• Biopsy results in 13 days on average,
June-Sept 2016
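The assignment rule in the first bullet (therapy chosen by molecular abnormality, not by tumor site) can be pictured as a lookup keyed on the alteration alone. The sketch below is illustrative only: the arm labels, the placeholder alterations, the `assign_arm` helper, and the four example patients are all hypothetical and do not represent actual NCI-MATCH arms, eligibility rules, or results.

```python
from __future__ import annotations

# Hypothetical arm table: alteration -> arm label (NOT real NCI-MATCH arms).
ARMS = {
    "GENE_A amplification": "Arm 1",
    "GENE_B mutation": "Arm 2",
    "GENE_C fusion": "Arm 3",
}


def assign_arm(tumor_site: str, alterations: list[str]) -> str | None:
    """Return the first arm whose target alteration is present.

    tumor_site is accepted but deliberately ignored: assignment depends
    only on the molecular findings.
    """
    for alteration in alterations:
        if alteration in ARMS:
            return ARMS[alteration]
    return None  # no open arm matches this patient


# Made-up patients: (tumor site, alterations found in the biopsy).
patients = [
    ("lung", ["GENE_B mutation"]),
    ("colon", ["GENE_B mutation"]),
    ("breast", ["GENE_X deletion"]),
    ("ovary", []),
]

assignments = [assign_arm(site, alts) for site, alts in patients]
match_rate = sum(a is not None for a in assignments) / len(patients)
print(assignments, match_rate)
```

In this toy data the lung and colon patients land in the same arm because they share an alteration, and the match rate is simply the fraction of patients with any assignment.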
Vice President’s
Cancer Moonshot
How do we enable meaningful,
patient-centered and patient-level
data sharing for cancer and
promote access to clinical trials for
all Americans?
Cancer Moonshot Outline
• Genomic Data Commons – June 6, 2016
• Vice President’s Cancer Moonshot Summit – June 29, 2016
• Rethinking Clinical Trial Search
- Development of Application Programming Interface (API) to NCI’s Clinical Trials
Reporting Program, for use by:
- NCI’s Cancer.gov website
- Third party innovators providing clinical trial content to their communities
• Blue Ribbon Panel recommendations – accepted by the National Cancer Advisory
Board on September 7th, 2016
• Cancer Moonshot Task Force and BRP recommendations sent to President on October
17th, 2016 https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/milestones
• https://cancer.gov/brp
https://cancer.gov/brp
Blue Ribbon Panel Recommendations
• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 4D Human Tumor Atlas
• Development of New Enabling Cancer Technologies
Genomic Data Commons
The Cancer Genomic Data Commons
(GDC) is an existing effort to standardize
and simplify submission of genomic data
to NCI and follow the principles of FAIR
– Findable, Accessible, Attributable,
Interoperable, Reusable, and Provide
Recognition.
The GDC is part of the NIH Big Data to
Knowledge (BD2K) initiative and an
example of the NIH Commons.
Microattribution, nanopublications, tracking the use of
data, annotation of data, and use of algorithms support
the data/software/metadata life cycle to provide
credit and analyze the impact of data, software, analytics,
algorithms, curation and knowledge sharing.
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
NCI Genomic Data Commons
• The GDC went live on June 6, 2016 with approximately 4.1 PB
of data.
- This includes 2.6 PB of legacy data
- and 1.5 PB of “harmonized” data.
• 577,878 files about 14,194 cases (patients), in 42 cancer types,
across 29 primary sites.
• 10 major data types, ranging from Raw Sequencing Data, Raw
Microarray Data, to Copy Number Variation, Simple Nucleotide
Variation and Gene Expression.
• Data are derived from 17 different experimental strategies, with
the major ones being RNA-Seq, WXS, WGS, miRNA-Seq,
Genotyping Array and Expression Array.
• Foundation Medicine announced the release of 18,000
genomic profiles to the GDC at the Cancer Moonshot Summit.
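As a quick consistency check on the launch figures above, the short sketch below adds the two storage components and derives an average number of files per case. The input numbers are taken from the slide; the variable names and the decimal petabyte convention (1 PB = 10^15 bytes) are assumptions made for illustration.

```python
# Launch figures as stated on the slide.
legacy_pb = 2.6
harmonized_pb = 1.5
files = 577_878
cases = 14_194

# Legacy plus harmonized data should give the quoted total of about 4.1 PB.
total_pb = legacy_pb + harmonized_pb

# Derived averages (assumption: 1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
files_per_case = files / cases
avg_file_gb = total_pb * 10**15 / files / 10**9

print(f"total: {total_pb:.1f} PB")
print(f"files per case: {files_per_case:.1f}")
print(f"average file size: {avg_file_gb:.1f} GB")
```

The averages are only rough orientation, since raw sequencing files are far larger than derived files such as variant calls.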
Converged IT Summit - NCI Data Sharing
Genomic Data Commons Data Portal
https://gdc.cancer.gov
The NCI Genomic Data Commons User Interface
Home Page
The NCI Genomic Data Commons User Interface
Sample Browser
The NCI Genomic Data Commons User Interface
Data Submission Dashboard
[Dashboard panels: Clinical data; Biospecimen data; Molecular data; Files uploaded]
Development of the NCI Genomic Data Commons (GDC)
To Foster the Molecular Diagnosis and Treatment of Cancer
GDC
Bob Grossman PI
Univ. of Chicago
Ontario Inst. Cancer Res.
Leidos
Institute of Medicine
Towards Precision Medicine
2011
The Mutational Burden of Human Cancer
Mike Lawrence and Gaddy Getz, Broad Institute: Nature 2014
Increasing genomic
complexity
Childhood
cancers
Known Carcinogens
Example publication from TCGA
Discovery of Cancer Drivers With 2% Prevalence
Lung adeno.: + 2,900
Colorectal: + 1,200
Ovarian: + 500
Lawrence et al, Nature 2014
Power Calculation for Cancer Driver Discovery
Need to resequence >100,000 tumors to
identify all cancer drivers at >2% prevalence
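To give a feel for why sample size matters at low prevalence, the sketch below computes a simple binomial probability: the chance that a gene mutated in 2% of tumors is seen in at least some minimum number of sequenced tumors. This is a deliberately simplified stand-in and not the power calculation of Lawrence et al., which also has to separate drivers from background mutation; the `prob_at_least` helper, the threshold of 10 mutated tumors, and the cohort sizes are assumptions chosen for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the complement of P(X < k)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


prevalence = 0.02   # driver present in 2% of tumors (from the slide)
min_mutated = 10    # hypothetical number of mutated tumors needed to notice it

for n_tumors in (100, 500, 2900):
    expected = n_tumors * prevalence
    chance = prob_at_least(min_mutated, n_tumors, prevalence)
    print(f"n={n_tumors}: expect {expected:.0f} mutated tumors, "
          f"P(at least {min_mutated}) = {chance:.3f}")
```

Even before modeling background mutation rates, small cohorts rarely contain enough carriers of a 2% driver, which is the intuition behind the large numbers quoted above.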
GDC Infrastructure and Functionality
[Figure: architecture diagram.
GDC Users: Data Submitters; Open Access Users; Controlled Access Users; eRA Commons & dbGaP
GDC System Components: Data Submission; Harmonization; Metadata+Data Storage; Data Security System; Digital ID System; APIs; Reporting System; Open Access Data; Controlled Access Data]
GDC Data Harmonization
Multiple data types and levels of processing

Data type         1° processing      2° processing                    3° processing
Exome-seq         Genome alignment   Mutations                        Oncogene vs. Tumor suppressor
Whole genome-seq  Genome alignment   Mutations + structural variants  Translocations
RNA-seq           Genome alignment   Digital gene expression          Relative RNA levels; Alternative splicing
Copy number       Data segmentation  Copy number calls                Gene amplification/deletion
GDC Data Harmonization
Open Source, Dockerized Pipelines
[Figure: Mutect2 pipeline]
GDC Data Harmonization
Multiple pipelines needed to recover all variants

Recovery rate (% true positives)
Caller          A0      F0
SomaticSniper   81.1%   76.5%
VarScan         93.9%   84.3%
MuSE            93.1%   87.3%
All Three       96.4%   91.2%

GDC variant calling pipelines: Wash U, Baylor, Broad
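The recovery figures show the combined call set recovering more true variants than any single caller. A minimal sketch of that union effect follows, using synthetic data: the variant IDs, the three call sets, and the `recovery` helper are invented for illustration and are unrelated to the actual SomaticSniper, VarScan, or MuSE outputs.

```python
# Synthetic "truth" set of 20 variant IDs (illustrative only).
truth = set(range(20))

# Synthetic call sets: each caller misses a different part of the truth set.
calls = {
    "caller_a": set(range(0, 15)),
    "caller_b": set(range(4, 18)),
    "caller_c": {0, 1, 2, 3, 16, 17, 18},
}


def recovery(called: set, truth: set) -> float:
    """Fraction of true variants present in a call set."""
    return len(called & truth) / len(truth)


per_caller = {name: recovery(called, truth) for name, called in calls.items()}

# Union of all callers: a variant counts if any caller found it.
union_calls = set().union(*calls.values())
union_recovery = recovery(union_calls, truth)

for name, rate in per_caller.items():
    print(f"{name}: {rate:.0%}")
print(f"all three: {union_recovery:.0%}")
```

The union can never recover fewer true variants than its best member, though in practice merging call sets also accumulates false positives, which this sketch does not model.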
GDC Content
Current
• TCGA 11,353 cases
• TARGET 3,178 cases
Coming soon
• Foundation Medicine 18,000 cases
• Cancer studies in dbGaP ~4,000 cases
Planned (1-3 years)
• NCI-MATCH ~5,000 cases
• Clinical Trial Sequencing Program ~3,000 cases
• Cancer Driver Discovery Program ~5,000 cases
• Human Cancer Model Initiative ~1,000 cases
• APOLLO – VA-DoD ~8,000 cases
Total: ~58,000 cases
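The total can be checked by summing the listed sources. In the sketch below the case counts come from the slide, with the approximate entries taken at face value; the grouping of sources into current, coming soon, and planned is an assumption based on the slide layout.

```python
# Case counts from the slide; "~" entries are used at their stated value.
gdc_cases = {
    "Current": {"TCGA": 11_353, "TARGET": 3_178},
    "Coming soon": {"Foundation Medicine": 18_000, "Cancer studies in dbGaP": 4_000},
    "Planned (1-3 years)": {
        "NCI-MATCH": 5_000,
        "Clinical Trial Sequencing Program": 3_000,
        "Cancer Driver Discovery Program": 5_000,
        "Human Cancer Model Initiative": 1_000,
        "APOLLO - VA-DoD": 8_000,
    },
}

subtotals = {group: sum(sources.values()) for group, sources in gdc_cases.items()}
total = sum(subtotals.values())

for group, count in subtotals.items():
    print(f"{group}: {count:,}")
print(f"Total: {total:,}")
```

The sum comes to 58,531, in line with the slide's figure of roughly 58,000 and still well short of the 100,000+ tumors cited for driver discovery.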
Towards a Cancer Knowledge System
• Continue genomic investigations of cancer
- Need > 100,000 cases analyzed
- Embrace all genomic platforms
- Relationship of relapse and primary biopsies
• Incorporate associated clinical annotations
- Clinical trial data
- Observational, longitudinal standard-of-care data
- N-of-1 clinical data
• Promote and curate biological investigations of cancer genetic
variants
- Driver vs. passenger mutations
- Multiple phenotypic assays
- Alterations in regulatory pathways – proteomics
- Mechanisms of therapeutic resistance
- Functional genomic investigations
• Integrative models for high-dimensional data
[Figure label: Genomic Data Commons (GDC)]
Utility of a Cancer Knowledge System
Identify low-frequency cancer drivers
Define genomic determinants of response to therapy
Compose clinical trial cohorts sharing targeted genetic lesions
Cancer information donors
Genomic Data Commons
The Genomic Data Commons and Cloud Pilots
Support the Precision Medicine Initiative
• Expand data model to include other data (e.g. imaging and proteomics)
• Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows
• Measure usage and impact
• Change incentives for public contributions
PMI – Oncology, the GDC and the Cloud Pilots Goals
• Support precision medicine-focused clinical research
• Enable researchers to deposit well-annotated
(Interoperable) genomic data sets with the GDC
• Provide a single source (and single dbGaP access
request!) to Find and Access these data
• Enable effective analysis and meta-analysis of these data
without requiring local downloads – data Reuse
• Understand Contributions, Assess value through usage,
and give Attribution to all users
PMI – Oncology, the GDC and the Cloud Pilots Goals
• Provide a data integration platform to allow multiple data
types, multi-scalar data, temporal data from cancer models
and patients through open APIs
• Work with the Global Alliance for Genomics and Health
(GA4GH) to define the next generation of secure,
flexible, meaningful, interoperable, lightweight
interfaces – open APIs
• Engage the cancer research community in evaluating
the open APIs for ease of use and effectiveness
Cancer Research Data Ecosystem – Cancer Moonshot BRP
[Figure: data ecosystem diagram. Labels: Well characterized research data sets; Cancer cohorts; Patient data; EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support; Learning from every cancer patient; Active research participation; Research information donor; Clinical Research; Observational studies; Proteogenomics; Imaging data; Clinical trials; Discovery; Patient engaged Research; Surveillance; Big Data; Implementation research; SEER; GDC]
GDC Acknowledgements
Univ. of Chicago: Bob Grossman, Allison Heath, Mike Ford, Zhenyu Zhang
NCI Center for Cancer Genomics: Lou Staudt, Zhining Wang, Martin Ferguson, JC Zenklusen, Daniela Gerhard, Deb Steverson
Ontario Institute for Cancer Research: Vincent Ferretti, Francois Gerthoffert, JunJun Zhang
Leidos Biomedical Research: Mark Jensen, Sharon Gaheen, Himanso Sahni
NCI CBIIT: Tony Kerlavage, Tanja Davidsen
CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D - Seven Bridges - http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D –Project Officer
• Juli Klemm, Ph.D – COR, Broad Institute
• Tanja Davidsen, Ph.D – COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA – COR, Seven Bridges Genomics
GDC Principal Investigator
• Robert Grossman, Ph.D - University of Chicago
• Allison Heath, Ph.D - University of Chicago
• Vincent Ferretti, Ph.D - Ontario Institute for Cancer Research
Cancer Genomics Project Teams
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
Rethinking Clinical Trials Search
Thanks to: PIFs, various NCI teams (CBIIT, CCCT, OCPL)
Improving Cancer Clinical Trials Search
• Engage Presidential Innovation Fellows
• Create an Application Programming Interface (API) for clinical trials
• Engage with third-party innovators to guide the API development
process
• Utilize the API on NCI’s website, cancer.gov
*An API is a set of protocols that provides communication between a software application
and a computer operating system or between applications
Cancer Clinical Trials API
• The API (https://clinicaltrialsapi.cancer.gov/v1/) makes trial registration
information from NCI’s clinical trial database (Clinical Trials
Reporting Program, CTRP) publicly available to third-party innovators
• Enables innovators to build new tools tailored to the clinical trial search needs
of their users
• Expands opportunities for patients and doctors to locate cancer trials
• Cancer.gov now utilizes the API (http://trials.cancer.gov), enhancing
clinical trial searching on the NCI website
• The API source code is publicly available on GitHub
(https://github.com/presidential-innovation-fellows/clinical-trials-search)
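For a third-party innovator, using the API starts with composing a request URL against the base address given on the slide. The sketch below only builds such a URL and sends nothing over the network. The base URL comes from the slide; the `clinical-trials` endpoint name, the `size` parameter, and the `build_search_url` helper are assumptions for illustration and should be checked against the API's own documentation, since this v1 address may no longer be served.

```python
from urllib.parse import urlencode, urljoin

# Base URL as given on the slide.
BASE_URL = "https://clinicaltrialsapi.cancer.gov/v1/"


def build_search_url(endpoint: str, **params) -> str:
    """Compose a GET URL for an endpoint with optional query parameters.

    The endpoint and parameter names passed in are assumptions, not
    confirmed parts of the API.
    """
    url = urljoin(BASE_URL, endpoint)
    query = urlencode(sorted(params.items()))  # sorted for a stable URL
    return f"{url}?{query}" if query else url


# Hypothetical request: ask for five trial records.
url = build_search_url("clinical-trials", size=5)
print(url)
```

Keeping URL construction in one helper makes it easy to swap in the confirmed endpoint and parameter names later without touching calling code.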
Questions?
Warren Kibbe, Ph.D.
Warren.kibbe@nih.gov
@wakibbe
www.cancer.gov www.cancer.gov/espanol

Converged IT Summit - NCI Data Sharing

  • 1. Cancer Moonshot, Data Sharing, Genomic Data Commons October 26th, 2016 Warren Kibbe, PhD Warren.kibbe@nih.gov @wakibbe
  • 2. 2 To develop the knowledge base that will lessen the burden of cancer in the United States and around the world. NCI Mission
  • 3. 3 Cancer Statistics In 2015 there will be an estimated 1,700,000 new cancer cases and 600,000 cancer deaths - American Cancer Society 2015 Cancer remains the second most common cause of death in the U.S. - Centers for Disease Control and Prevention 2015
  • 4. 4 Understanding Cancer  Precision medicine will lead to fundamental understanding of the complex interplay between genetics, epigenetics, nutrition, environment and clinical presentation and direct effective, evidence-based prevention and treatment.
  • 5. Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem  The era of precision oncology is predicated on the integration of research, care, and molecular medicine and the availability of data for modeling, risk analysis, and optimal care The Genomic Data Commons is a completely new way of making scientific data available to the cancer research community
  • 6. 6 Cancer Data Sharing & Data Commons • Making data available for discovery, validation, new therapies • Working toward a National Learning Health System for Cancer • Maximizing the impact, reuse, and reproducibility of cancer research • We need to change the incentives for data sharing Reduce the risk, improve early detection, outcomes and survivorship in cancer
  • 7. 7 FAIR – Making data Findable, Accessible, Attributable, Interoperable, Reusable, and provide Recognition Force11 white paper https://www.force11.org/group/fairgroup/fairprinciples
  • 8. 8 NIH Genomic Data Sharing Policy https://gds.nih.gov/ Went into effect January 25, 2015 NCI guidance: http://www.cancer.gov/grants-training/grants- management/nci-policies/genomic-data Requires public sharing of genomic data sets
  • 9. 9 Changing the conversation around data sharing  How do we find data, software, standards?  How can we make data, annotations, software, metadata accessible?  How do we reuse data standards  How do we make more data machine readable? NIH Data Commons NCI Genomic Data Commons Data commons co-locate data, storage and computing infrastructure, and commonly used tools for analyzing and sharing data to create an interoperable resource for the research community. *Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, A Case for Data Commons Towards Data Science as a Service, to appear. Source of image: Interior of one of Google’s Data Center, www.google.com/about/datacenters/.
  • 10. Cancer Research Data Ecosystem – Cancer Moonshot BRP Well characterized research data sets Cancer cohorts Patient data EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support Learning from every cancer patient Active research participation Research information donor Clinical Research Observational studies Proteogenomics Imaging data Clinical trials Discovery Patient engaged Research Surveillance Big Data Implementation research SEERGDC
  • 11. 11 Cancer Research Data Commons Ecosystem Genomic Data Commons Data Standards Validation and Harmonization Imaging Data Commons Proteomics Data Commons Clinical Data Commons (Cohorts / Indiv.) SEER (Populations) Data Contributors and Consumers Researchers PatientsClinician s Institutions
  • 12. 12 (10,000+ patient tumors and increasing) Courtesy of P. Kuhn (USC) 2006-2015: A Decade of Illuminating the Underlying Causes of Primary Untreated Tumors Omics Characterization
  • 13. 13 1997 20152001 20051998 20142002 20061999 2003 20112000 2004 2007 20122008 20132009 2010 10/23/2001 (~5 yrs old) 4/21/1997 1/9/2007 (~10 yrs old) iPod (10GB max) WinAMP(mp3) iPhone (EDGE, 16 GB max) 9/16/1999 (~3 yrs old) 802.11b WiFi 4/3/2010 (~13 yrs old) iPad (EDGE, 64 GB max) 4/23/2005 (~8 yrs old) 9/26/2006 (~9 yrs old) 7/15/2006 2/7/2007 Google Drive 4/24/2012 (~15 yrs old)7/11/2008 (~11 yrs old) iPhone 3G (16 GB max) 9/12/2012 (~15 yrs old) iPhone5 (LTE, 128 GB max) Google Baseline 7/14/2014 (~17 yrs old) 3/9/2015 (~18 yrs old) Digital technology is changing rapidly
  • 15. 1518 Application of Cancer Genomics is changing
  • 16. 16 Precision Oncology Trials Launched 2014: MPACT Lung MAP ALCHEMIST Exceptional Responders 2015: NCI-MATCH 2017: Pediatric-MATCH NCI-MATCH: Features [Molecular Analysis for Therapy Choice]  Foundational treatment/discovery trial; assigns therapy based on molecular abnormalities, not site of tumor origin for patients without available standard therapy  Regulatory umbrella for phase II drugs/studies from > 20 companies; single agents or combinations  Available nationwide (2400 sites)  Accrual began mid-August 2015  Latest match-rate is 25% with 24 open arms  Biopsy results in 13 days, average from June-Sept 2016
  • 17. 17 Vice President’s Cancer Moonshot How do we enable meaningful, patient-centered and patient-level data sharing for cancer and promote access to clinical trials for all Americans?
  • 18. 18 Cancer Moonshot Outline • Genomic Data Commons June 6, 2016 • Vice President’s Cancer Moonshot Summit – June 29, 2016 • Rethinking Clinical Trial Search - Development of Application Programming Interface (API) to NCI’s Clinical Trials Reporting Program, for use by: - NCI’s Cancer.gov website - Third party innovators providing clinical trial content to their communities • Blue Ribbon Panel recommendations – accepted by the National Cancer Advisory Board on September 7th, 2016 • Cancer Moonshot Task Force and BRP recommendations sent to President on October 17th, 2016 https://www.cancer.gov/research/key-initiatives/moonshot-cancer- initiative/milestones • https://cancer.gov/brp
  • 20. Blue Ribbon Panel Recommendations
      - Network for Direct Patient Engagement
      - Cancer Immunotherapy Translational Science Network
      - Therapeutic Target Identification to Overcome Drug Resistance
      - A National Cancer Data Ecosystem for Sharing and Analysis
      - Fusion Oncoproteins in Childhood Cancers
      - Symptom Management Research
      - Prevention and Early Detection: Implementation of Evidence-based Approaches
      - Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
      - Generation of 4D Human Tumor Atlas
      - Development of New Enabling Cancer Technologies
  • 21. Genomic Data Commons
      - The Cancer Genomic Data Commons (GDC) is an existing effort to standardize and simplify submission of genomic data to NCI and to follow the principles of FAIR: Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.
      - The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons.
      - Microattribution, nanopublications, tracking the use of data, annotation of data, and use of algorithms support the data/software/metadata life cycle, providing credit and allowing analysis of the impact of data, software, analytics, algorithm, curation and knowledge sharing.
      - Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples
  • 22. NCI Genomic Data Commons
      - The GDC went live on June 6, 2016 with approximately 4.1 PB of data: 2.6 PB of legacy data and 1.5 PB of “harmonized” data.
      - 577,878 files about 14,194 cases (patients), in 42 cancer types, across 29 primary sites.
      - 10 major data types, including Raw Sequencing Data, Raw Microarray Data, Copy Number Variation, Simple Nucleotide Variation and Gene Expression.
      - Data are derived from 17 different experimental strategies, the major ones being RNA-Seq, WXS, WGS, miRNA-Seq, Genotyping Array and Expression Array.
      - Foundation Medicine announced the release of 18,000 genomic profiles to the GDC at the Cancer Moonshot Summit.
  • 26. Genomic Data Commons Data Portal: https://gdc.cancer.gov
  • 27. The NCI Genomic Data Commons User Interface: Home Page
  • 28. The NCI Genomic Data Commons User Interface: Sample Browser
  • 29. The NCI Genomic Data Commons User Interface: Data Submission Dashboard [screenshot panels: Clinical data, Biospecimen data, Molecular data, Files uploaded]
  • 30. Development of the NCI Genomic Data Commons (GDC) to Foster the Molecular Diagnosis and Treatment of Cancer
      - GDC team: Bob Grossman (PI), Univ. of Chicago; Ontario Inst. Cancer Res.; Leidos
      - Reference shown: Institute of Medicine, Towards Precision Medicine, 2011
  • 31. The Mutational Burden of Human Cancer (Mike Lawrence and Gaddy Getz, Broad Institute; Nature 2014) [figure labels: Increasing genomic complexity; Childhood cancers; Known Carcinogens]
  • 35. Power Calculation for Cancer Driver Discovery (Lawrence et al, Nature 2014)
      - Discovery of cancer drivers with 2% prevalence: Lung adeno. + 2,900; Colorectal + 1,200; Ovarian + 500
      - Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence
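To give a feel for why cohort size matters for low-prevalence drivers, here is a deliberately simplified sketch. It is not the power calculation from Lawrence et al.; it assumes a plain binomial model in which each tumor independently carries the driver with a fixed prevalence, and it ignores background mutation rates entirely. The function name `detection_power` and the threshold `min_hits` are hypothetical choices made for illustration.

```python
from math import comb


def detection_power(n_tumors: int, prevalence: float, min_hits: int) -> float:
    """Probability that at least `min_hits` of `n_tumors` carry the driver.

    Simplified binomial model (an assumption, not the published method):
    each tumor independently carries the driver with probability `prevalence`.
    """
    # Sum P(X = k) for k below the threshold, then take the complement.
    p_fewer = sum(
        comb(n_tumors, k) * prevalence**k * (1.0 - prevalence) ** (n_tumors - k)
        for k in range(min_hits)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Illustrative cohort sizes only; `min_hits=20` is an arbitrary threshold.
    for n in (200, 500, 1000, 3000):
        print(n, round(detection_power(n, prevalence=0.02, min_hits=20), 3))
```

Even in this toy model the probability of seeing enough mutated samples rises steeply with cohort size, which is the qualitative point of the slide.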
  • 36. GDC Infrastructure and Functionality [architecture diagram]
      - GDC users: Data Submitters, Open Access Users, Controlled Access Users
      - GDC system components: Data Submission, Harmonization, Metadata + Data Storage, Data Security System, Digital ID System, APIs, Reporting System
      - Also shown: eRA Commons & dbGaP, Open Access Data, Controlled Access Data
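The "APIs" component in the diagram can be illustrated with a small sketch that builds a query URL for open-access file metadata without sending any request. The base URL `https://api.gdc.cancer.gov`, the `/files` endpoint, the `filters`/`fields`/`size`/`format` parameters and the field names are assumptions drawn from the public GDC API documentation, not from the slide; `build_files_query` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed base URL; not stated on the slide.
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_files_query(project_id: str, fields: list, size: int = 10) -> str:
    """Return a URL asking for file metadata from one project (no request is sent)."""
    # Assumed filter syntax: an operator plus a field/value pair, encoded as JSON.
    filters = {
        "op": "in",
        "content": {"field": "cases.project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": size,
    }
    return f"{GDC_API_BASE}/files?{urlencode(params)}"


url = build_files_query("TCGA-LUAD", ["file_id", "file_name", "data_type"], size=5)

# Round-trip check: decode the query string back into the filter object.
decoded = parse_qs(urlparse(url).query)
decoded_filters = json.loads(decoded["filters"][0])
print(url)
```

Controlled-access data would additionally need the authorization path shown in the diagram (eRA Commons & dbGaP); that is outside this sketch.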
  • 37. GDC Data Harmonization: Multiple data types and levels of processing
      Data type          1° processing       2° processing                      3° processing
      Exome-seq          Genome alignment    Mutations                          Oncogene vs. Tumor suppressor
      Whole genome-seq   Genome alignment    Mutations + structural variants    Translocations
      RNA-seq            Genome alignment    Digital gene expression            Relative RNA levels; Alternative splicing
      Copy number        Data segmentation   Copy number calls                  Gene amplification/deletion
  • 38. GDC Data Harmonization: Open Source, Dockerized Pipelines [figure: Mutect2 pipeline]
  • 39. GDC Data Harmonization: Multiple pipelines needed to recover all variants
      Recovery rate (% true positives); column labels as extracted ("A0F0", read here as A0 and F0)
      Caller          A0       F0
      SomaticSniper   81.1%    76.5%
      VarScan         93.9%    84.3%
      MuSE            93.1%    87.3%
      All Three       96.4%    91.2%
      GDC variant calling pipelines: Wash U, Baylor, Broad
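The idea that combining callers recovers more true variants than any single caller can be sketched with set unions. Everything below is toy data invented for illustration: the variants, the caller names (`caller_a`, `caller_b`, `caller_c`) and the truth set are hypothetical and are not GDC results.

```python
from typing import Dict, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def recovery_rate(called: Set[Variant], truth: Set[Variant]) -> float:
    """Percentage of true variants that appear in the call set."""
    return 100.0 * len(called & truth) / len(truth)


# Toy truth set and call sets (hypothetical data).
truth: Set[Variant] = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 300, "T", "G"),
    ("chr3", 400, "C", "A"),
    ("chr7", 500, "G", "T"),
}
calls: Dict[str, Set[Variant]] = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 300, "T", "G")},
    "caller_b": {("chr1", 200, "G", "C"), ("chr2", 300, "T", "G"), ("chr3", 400, "C", "A"),
                 ("chr9", 999, "A", "G")},  # last entry is a false positive
    "caller_c": {("chr1", 100, "A", "T"), ("chr3", 400, "C", "A")},
}

# Union of all callers: a variant counts if any caller reports it.
union_calls: Set[Variant] = set().union(*calls.values())

rates = {name: recovery_rate(found, truth) for name, found in calls.items()}
rates["all_three"] = recovery_rate(union_calls, truth)
print(rates)
```

Taking the union raises recovery but also accumulates each caller's false positives, so a real pipeline would pair it with filtering; that step is not shown here.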
  • 40. GDC Content
      - Current: TCGA 11,353 cases; TARGET 3,178 cases
      - Coming soon: Foundation Medicine 18,000 cases; Cancer studies in dbGaP ~4,000 cases
      - Planned (1-3 years): NCI-MATCH ~5,000 cases; Clinical Trial Sequencing Program ~3,000 cases; Cancer Driver Discovery Program ~5,000 cases; Human Cancer Model Initiative ~1,000 cases; APOLLO – VA-DoD ~8,000 cases
      - Total: ~58,000 cases
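The ~58,000 total can be checked by adding up the per-program counts on the slide. The grouping into "current", "coming soon" and "planned" follows the slide's labels as best they can be read from the transcript and should be treated as an assumption; approximate counts (marked with ~ on the slide) are used at face value.

```python
# Case counts as listed on the slide; grouping is inferred from the slide layout.
gdc_cases = {
    "current": {"TCGA": 11353, "TARGET": 3178},
    "coming_soon": {"Foundation Medicine": 18000, "Cancer studies in dbGaP": 4000},
    "planned": {
        "NCI-MATCH": 5000,
        "Clinical Trial Sequencing Program": 3000,
        "Cancer Driver Discovery Program": 5000,
        "Human Cancer Model Initiative": 1000,
        "APOLLO (VA-DoD)": 8000,
    },
}

# Subtotal per group, then the grand total.
subtotals = {group: sum(programs.values()) for group, programs in gdc_cases.items()}
total = sum(subtotals.values())

print(subtotals)
print(total)  # 58531, consistent with the slide's "~58,000 cases"
```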
  • 41. Towards a Cancer Knowledge System [diagram label: Genomic Data Commons]
      - Continue genomic investigations of cancer
          - Need > 100,000 cases analyzed
          - Embrace all genomic platforms
          - Relationship of relapse and primary biopsies
      - Incorporate associated clinical annotations
          - Clinical trial data
          - Observational, longitudinal standard-of-care data
          - N-of-1 clinical data
      - Promote and curate biological investigations of cancer genetic variants
          - Driver vs. passenger mutations
          - Multiple phenotypic assays
          - Alterations in regulatory pathways (proteomics)
          - Mechanisms of therapeutic resistance
          - Functional genomic investigations
      - Integrative models for high-dimensional data
  • 42. Utility of a Cancer Knowledge System [diagram labels: GDC; Cancer information donors; Genomic Data Commons]
      - Identify low-frequency cancer drivers
      - Define genomic determinants of response to therapy
      - Compose clinical trial cohorts sharing targeted genetic lesions
  • 43. The Genomic Data Commons and Cloud Pilots: Support the Precision Medicine Initiative
      - Expand data model to include other data (e.g. imaging and proteomics)
      - Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows
      - Measure usage and impact
      - Change incentives for public contributions
  • 44. PMI – Oncology, the GDC and the Cloud Pilots: Goals
      - Support precision medicine-focused clinical research
      - Enable researchers to deposit well-annotated (Interoperable) genomic data sets with the GDC
      - Provide a single source (and single dbGaP access request!) to Find and Access these data
      - Enable effective analysis and meta-analysis of these data without requiring local downloads (data Reuse)
      - Understand Contributions, Assess value through usage, and give Attribution to all users
  • 45. PMI – Oncology, the GDC and the Cloud Pilots: Goals (continued)
      - Provide a data integration platform to allow multiple data types, multi-scalar data, and temporal data from cancer models and patients through open APIs
      - Work with the Global Alliance for Genomics and Health (GA4GH) to define the next generation of secure, flexible, meaningful, interoperable, lightweight interfaces (open APIs)
      - Engage the cancer research community in evaluating the open APIs for ease of use and effectiveness
  • 46. Cancer Research Data Ecosystem – Cancer Moonshot BRP [diagram labels]
      - Well characterized research data sets; Cancer cohorts
      - Patient data: EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support
      - Learning from every cancer patient; Active research participation; Research information donor
      - Clinical Research; Observational studies; Proteogenomics; Imaging data; Clinical trials
      - Discovery; Patient engaged Research; Surveillance; Big Data; Implementation research
      - SEER; GDC
  • 47. GDC Acknowledgements
      - NCI Center for Cancer Genomics: Lou Staudt, Zhining Wang, Martin Ferguson, JC Zenklusen, Daniela Gerhard, Deb Steverson
      - Univ. of Chicago: Bob Grossman, Allison Heath, Mike Ford, Zhenyu Zhang
      - Ontario Institute for Cancer Research: Vincent Ferretti, Francois Gerthoffert, JunJun Zhang
      - Leidos Biomedical Research: Mark Jensen, Sharon Gaheen, Himanso Sahni
      - NCI CBIIT: Tony Kerlavage, Tanja Davidsen
  • 48. Cancer Genomics Project Teams
      - CGC Pilot Team Principal Investigators
          - Gad Getz, Ph.D., Broad Institute: http://firecloud.org
          - Ilya Shmulevich, Ph.D., ISB: http://cgc.systemsbiology.net/
          - Deniz Kural, Ph.D., Seven Bridges: http://www.cancergenomicscloud.org
      - NCI Project Officer & CORs
          - Anthony Kerlavage, Ph.D., Project Officer
          - Juli Klemm, Ph.D., COR, Broad Institute
          - Tanja Davidsen, Ph.D., COR, Institute for Systems Biology
          - Ishwar Chandramouliswaran, MS, MBA, COR, Seven Bridges Genomics
      - GDC Principal Investigator
          - Robert Grossman, Ph.D., University of Chicago
          - Allison Heath, Ph.D., University of Chicago
          - Vincent Ferretti, Ph.D., Ontario Institute for Cancer Research
      - NCI Leadership Team
          - Doug Lowy, M.D.
          - Lou Staudt, M.D., Ph.D.
          - Stephen Chanock, M.D.
          - George Komatsoulis, Ph.D.
          - Warren Kibbe, Ph.D.
      - Center for Cancer Genomics Partners
          - JC Zenklusen, Ph.D.
          - Daniela Gerhard, Ph.D.
          - Zhining Wang, Ph.D.
          - Liming Yang, Ph.D.
          - Martin Ferguson, Ph.D.
  • 49. Rethinking Clinical Trials Search. Thanks to: PIFs, various NCI teams (CBIIT, CCCT, OCPL)
  • 50. Improving Cancer Clinical Trials Search
      - Engage Presidential Innovation Fellows
      - Create an Application Programming Interface (API)* for clinical trials
      - Engage with third-party innovators to guide the API development process
      - Utilize the API on NCI’s website, cancer.gov
      *An API is a set of protocols that provides communication between a software application and a computer operating system or between applications
  • 51. Cancer Clinical Trials API
      - The API (https://clinicaltrialsapi.cancer.gov/v1/) makes trial registration information from NCI’s clinical trial database (Clinical Trials Reporting Program, CTRP) publicly available to third-party innovators
      - Enables innovators to build new tools tailored to the clinical trial search needs of their users
      - Expands opportunities for patients and doctors to locate cancer trials
      - Cancer.gov now utilizes the API (http://trials.cancer.gov), enhancing clinical trial searching on the NCI website
      - The API source code is publicly available on GitHub (https://github.com/presidential-innovation-fellows/clinical-trials-search)
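A third-party tool would typically start by composing a search URL against the API. The sketch below only builds the URL and sends nothing. The base URL is the one given on the slide; the `clinical-trials` path and the `_fulltext` and `size` query parameters are assumptions based on recollection of the v1 API and may not match the service as currently deployed. `build_trial_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base URL as given on the slide.
CTAPI_BASE = "https://clinicaltrialsapi.cancer.gov/v1/"


def build_trial_search_url(fulltext: str, size: int = 10) -> str:
    """Compose a trial search URL (no request is sent).

    The endpoint path and parameter names are assumptions, not confirmed by the slide.
    """
    params = {"_fulltext": fulltext, "size": size}
    return CTAPI_BASE + "clinical-trials?" + urlencode(params)


search_url = build_trial_search_url("melanoma", size=5)
print(search_url)
```

Keeping URL construction separate from the network call makes it easy to test the query logic offline and to swap in a different API version later.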