Canadian Cancer
Research Conference
November 3-6, 2013

Canadian Bioinformatics Workshops
www.bioinformatics.ca

You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;

This presentation. Provided that:
You attribute the work to its author and
respect the rights and licenses associated
with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at;
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

Module 1: Cancer Genomic Databases

bioinformatics.ca
Module 1
Cancer Genomic Databases
E-mail: francis@oicr.on.ca
@bffo

Schedule for Module 1
Cancer Genomic Databases
•The Databases:
– The International Cancer Genome Consortium (ICGC)
– The Cancer Genome Atlas (TCGA)
– The Catalogue of Somatic Mutations in Cancer (COSMIC)

•Data Access: human genomes and security and
privacy issues, Open vs. Controlled Access data

http://bioinformatics.ca/

Workshops planned for 2014:
http://bioinformatics.ca/workshops

1. Exploratory Analysis of Biological Data using R
2. Bioinformatics for Cancer Genomics
3. Informatics for RNA-sequence Analysis
4. Informatics on High Throughput Sequencing Data
5. Pathway and Network Analysis of -omics Data
6. Flow Cytometry Data Analysis using R
7. Microarray Data Analysis
8. Informatics and Statistics for Metabolomics

http://bioinformatics.ca/workshops/2013

E-mail: course_info@bioinformatics.ca
Web: http://bioinformatics.ca
Workshop announcement mailing list:
http://bioinformatics.ca/mailman/listinfo/announce

Soap-Box time!
• Open Access, Open Data and Open Source are essential for good Science.
• Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.

Open Source
Open Access
Open Data
Opencourseware

Cancer therapy is like
beating the dog with
a stick to get rid of
his fleas.
- Anna Deavere Smith,
Let me down easy

http://goo.gl/Yhbsj

The revolution in cancer
research can be summed up
in a single sentence:
cancer is, in essence,
a genetic disease.
- Bert Vogelstein

Cancer: a Disease of the Genome

Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different

Cancer Genomic Databases

Chin et al., Genes Dev. 2011 March 15; 25(6): 534-555
http://www.ncbi.nlm.nih.gov/pubmed/?term=21406553

TCGA
The Cancer Genome Atlas is a
comprehensive and coordinated
effort to accelerate our
understanding of the molecular
basis of cancer through the
application of genome analysis
technologies, including large-scale genome sequencing.

About the TCGA
• National Cancer Institute (NCI)
• National Human Genome Research Institute (NHGRI)
• Phased Structure:
– Three-year pilot in 2006 with an investment of $50 million from each institute
– TCGA will collect and characterize more than 20 additional tumour types (now at 16)

Where to start with the TCGA?
Wiki: https://wiki.nci.nih.gov/display/TCGA/About+TCGA

Division of Labour
• Biospecimen Core Resource (BCR)
– centre where samples are carefully catalogued, processed, quality-checked and stored along with participant clinical information
• Genome Sequencing Centre (GSC)
– uses high-throughput methods to identify changes to DNA sequences that are associated with specific cancer types
• Genome Characterization Centre (GCC)
– uses high-throughput technologies to analyze genomic changes involved in cancer
• Genome Data Analysis Centre (GDAC)
– provides novel informatics tools to the research community
– provides analysis results using TCGA data.
• Data Coordinating Centre (DCC)
– Central provider of TCGA data.
– Standardizes data formats and validates submitted data.
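
The division of labour can be summarised as a small lookup table. The sketch below is illustrative only: the centre names and abbreviations come from this slide, the list of centres that send data to the DCC is taken from the Data Coordinating Centre slide in this deck, and `CENTRES`, `SUBMITS_TO_DCC`, and `describe_submissions` are hypothetical names, not TCGA software.

```python
# Centre abbreviations and names as listed on the slide.
CENTRES = {
    "BCR": "Biospecimen Core Resource",
    "GSC": "Genome Sequencing Centre",
    "GCC": "Genome Characterization Centre",
    "GDAC": "Genome Data Analysis Centre",
    "DCC": "Data Coordinating Centre",
}

# Assumption: the DCC receives data from these three centre types.
SUBMITS_TO_DCC = ("BCR", "GSC", "GCC")


def describe_submissions():
    """Return one readable line per centre that submits data to the DCC."""
    target = CENTRES["DCC"]
    return [f"{CENTRES[abbr]} ({abbr}) -> {target} (DCC)" for abbr in SUBMITS_TO_DCC]


if __name__ == "__main__":
    for line in describe_submissions():
        print(line)
```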

TCGA Data
• Sequence reads from newer sequencing technologies are available at the Cancer Genome Hub: https://cghub.ucsc.edu/
• Higher level sequence data (variation calls and abundance measures) are available at the TCGA Portal: http://cancergenome.nih.gov/

TCGA data flow

http://goo.gl/b5nojx

Data Coordinating Centre
• Plays a central role
– Receiving data from BCR, GSC and GCC sites
– Providing access to users
– Performing analysis of data

• Responsibilities:
– Protecting participant privacy and confidentiality
– Developing data standards and controlled vocabularies
– Establishing informatics pipelines for data flow
– Developing new analytical and visualization technologies to facilitate data analysis, for all audiences

TCGA DCC Data Portal
• Provides a platform to search, download and
analyze TCGA data sets
• Two data access tiers: Open and Controlled
• Analytic tools include: Cancer Molecular Analysis and Cancer Genome Workbench (NCBIB), Integrative Genomics Viewer (Broad) and Cancer Genomics Analysis (MSKCC).

TCGA Data Browser
https://tcga-data.nci.nih.gov/tcga/
Query TCGA data online using the TCGA Data Browser

The International Cancer Genome Consortium (ICGC)

• http://www.icgc.org/
• “ICGC was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe”

[Diagram labels: ICGC BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data; TCGA BAM/FASTQ]
ICGC Map – November 2013
67 projects launched

ICGC datasets to date
[Figure: ICGC Data Portal Cumulative Donor Count for Member Projects. Y-axis: number of donors (1,000 to 10,000). X-axis: December 2011 to September 2013, with Releases 7 through 14 marked. Credit: Hardeep Nahal]

ICGC dataset version 14
September 2013

Hardeep Nahal

• Cancer types: 41
• Donors: 8,532 (18,056 specimens)
• Simple somatic mutations: 1,995,134
• Copy number mutations: 18,526,593
• Structural rearrangements: 18,614
• Genes affected* by simple somatic mutations: 22,074
• Genes affected* by non-synonymous coding mutations: 19,150
• Genes affected* by copy number mutations: 20,341
• Genes affected* by structural rearrangements: 1,884

*out of 22,259 protein coding genes annotated in Ensembl Human release 69

• Open tier and controlled data currently available
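
The gene counts on this slide are easier to compare as fractions of the 22,259 annotated protein-coding genes. The sketch below does only that arithmetic, using the figures quoted above; the names `GENES_AFFECTED`, `percent_affected`, and `specimens_per_donor` are illustrative and not part of any ICGC tool.

```python
# Figures copied from the release 14 slide; all names here are illustrative.
TOTAL_PROTEIN_CODING_GENES = 22_259  # Ensembl Human release 69, per the slide

GENES_AFFECTED = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}


def percent_affected(count, total=TOTAL_PROTEIN_CODING_GENES):
    """Share of annotated protein-coding genes, expressed in percent."""
    return 100.0 * count / total


def specimens_per_donor(specimens=18_056, donors=8_532):
    """Average number of specimens collected per donor."""
    return specimens / donors


if __name__ == "__main__":
    for category, count in GENES_AFFECTED.items():
        print(f"{category}: {percent_affected(count):.1f}% of genes")
    print(f"specimens per donor: {specimens_per_donor():.2f}")
```

By this calculation roughly 99% of protein-coding genes carry at least one simple somatic mutation somewhere in the dataset, while under 9% are touched by a structural rearrangement.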

Select “Pancreatic cancer – Canada”

… But where is the data?

http://dcc.icgc.org/

The data can also be downloaded in bulk …

[Diagram labels: ERA, TCGA, DACO, ICGC, dbGaP, EGA, BAM, + EGA id]

http://icgc.org/daco

ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
– Region of residence
– Risk factors
– Examination
– Surgery
– Radiation
– Sample
– Slide
– Specific histological features
– Analyte
– Aliquot
– Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files

ICGC Open Access (OA) Datasets
• Cancer Pathology
– Histologic type or subtype
– Histologic nuclear grade
• Patient/Person
– Gender, Age range
– Vital status, Survival time
– Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
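
The split between the two tiers can be written down as a simple lookup. This is a minimal sketch under stated assumptions: the category strings are taken from the two lists above, while `AccessTier`, `TIER_BY_DATA_TYPE`, and `can_download` are hypothetical names and do not reflect how the ICGC portal is implemented.

```python
from enum import Enum


class AccessTier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Top-level categories from the slide, keyed in lower case for lookup.
TIER_BY_DATA_TYPE = {
    "cancer pathology": AccessTier.OPEN,
    "patient/person": AccessTier.OPEN,
    "gene expression (normalized)": AccessTier.OPEN,
    "dna methylation": AccessTier.OPEN,
    "computed copy number and loss of heterozygosity": AccessTier.OPEN,
    "newly discovered somatic variants": AccessTier.OPEN,
    "detailed phenotype and outcome data": AccessTier.CONTROLLED,
    "gene expression (probe-level data)": AccessTier.CONTROLLED,
    "raw genotype calls": AccessTier.CONTROLLED,
    "gene-sample identifier links": AccessTier.CONTROLLED,
    "genome sequence files": AccessTier.CONTROLLED,
}


def can_download(data_type, has_controlled_access_approval):
    """Open data is always available; controlled data needs prior approval."""
    tier = TIER_BY_DATA_TYPE[data_type.lower()]
    return tier is AccessTier.OPEN or has_controlled_access_approval
```

Note that gene expression appears in both tiers: the normalized values are open, while the probe-level data are controlled.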

Identify yourself

Fill out the detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement

All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf

DACO approved projects

DACO/DCC User Data Access Process
• Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository

[Diagram labels: DACO Web Application; application approved by DACO; user accounts activated; DCC User Registry; DCC Data Portal; EBI EGA]
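
The access process reads naturally as a short sequence of states. The sketch below is one way to model it, assuming a strictly linear order of the three steps named on the slide; `ApplicationStatus`, `advance`, and `accessible_repositories` are hypothetical names, and the real DACO and DCC systems may work differently.

```python
from enum import Enum


class ApplicationStatus(Enum):
    SUBMITTED = "submitted via the DACO web application"
    APPROVED = "application approved by DACO"
    ACTIVATED = "user accounts activated"


# Repositories named on the slide as holding controlled access data.
CONTROLLED_REPOSITORIES = ("ICGC Data Portal", "EBI EGA")

# Assumption: each step has exactly one successor, in this order.
_NEXT_STATUS = {
    ApplicationStatus.SUBMITTED: ApplicationStatus.APPROVED,
    ApplicationStatus.APPROVED: ApplicationStatus.ACTIVATED,
}


def advance(status):
    """Move an application to the next step; the final step has no successor."""
    if status not in _NEXT_STATUS:
        raise ValueError(f"{status.name} is the final step")
    return _NEXT_STATUS[status]


def accessible_repositories(status):
    """Controlled access repositories open to the applicant at this step."""
    if status is ApplicationStatus.ACTIVATED:
        return CONTROLLED_REPOSITORIES
    return ()
```

The point the slide makes is captured by `accessible_repositories`: a single approval unlocks both repositories at once, with no second application to the EGA.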

Catalogue of Somatic Mutations in Cancer (COSMIC)
• http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/

• COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.

COSMIC
• Somatic Mutations Only
• Diverse sources
– Literature (Arrays, Next-Gen, PCR...)
– TCGA
– ICGC

• Diverse ways to look at data
– Gene
– Variation
– Tumour type
– Cell line
– Experiment
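
Each of these views amounts to grouping the same mutation records by a different field. The sketch below illustrates the idea with three toy records written by hand for this example; they were not retrieved from COSMIC, and the field names and the `group_by` helper are hypothetical rather than part of any COSMIC interface.

```python
from collections import defaultdict

# Toy records for illustration only; not downloaded from COSMIC.
RECORDS = [
    {"gene": "BRAF", "variation": "p.V600E", "tumour_type": "melanoma"},
    {"gene": "BRAF", "variation": "p.V600E", "tumour_type": "colorectal"},
    {"gene": "KRAS", "variation": "p.G12D", "tumour_type": "pancreas"},
]


def group_by(records, field):
    """Group records into a dict keyed by the value of the chosen field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


if __name__ == "__main__":
    for field in ("gene", "variation", "tumour_type"):
        counts = {key: len(rows) for key, rows in group_by(RECORDS, field).items()}
        print(field, counts)
```

Only three of the five views are shown; cell line and experiment would be further fields on the same records.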

FAQ

Looking up your favorite gene

In closing
• Remember that all these sites have extensive documentation.
• The field is changing quickly, and so are the portals.
• New features are planned as we speak, and so you
need to use the sites, and keep coming back.
• Don’t be afraid to explore
• Interested in learning more after today? Consider
one of the bioinformatics.ca workshops!

Acknowledgements:
the CBW gang
Michael Brudno
Michael Stromberg
Michelle Brazas
Marc Fiume

Introduction to Cancer Genomics Databases


Editor's Notes

  • Slide 19 (Cancer Genomic Databases): Consecutive base pairs
  • Slide 29 (The International Cancer Genome Consortium): A few notes on ICGC
  • Slide 33 (ICGC dataset version 14): Ensembl 61 Hs has 53,515 gene loci annotated, which explains the high numbers of affected genes for SSMs (I’ve double-checked these numbers)
  • Slide 59 (Looking up your favorite gene): Summary page with basic gene description and list of curated pubs. Click on Histogram to view the distribution of mutations.