Open Data is Essential for
Personalized Medicine
BF Francis Ouellette
https://goo.gl/8U1QJa
Open Data is Essential for Genomics
This presentation is on:
https://www.slideshare.net/
@bffo
E-mail: francis@genomequebec.com
Times I’ve been in Italy
• Trieste 1996: Last Yeast Genome Meeting
• Naples 2005: NETTAB “Workflows management:
new abilities for the biological information overflow”
• Rome 2017: Elixir
• Palermo 2017: NETTAB
Outline
• What I do
• Open Data in genomics
• Final thoughts
But first, a little about me …
… an unfinished story!
https://goo.gl/anu933
http://goo.gl/dJIur
http://goo.gl/LwVOZ
http://goo.gl/QI6aL
http://goo.gl/mYHFO
http://goo.gl/Jc5TK
https://goo.gl/3PFr7L
1993-1997
from the National Center for Biotechnology Information
PANIC
https://www.ubc.ca/
1999
2001: Human Genome Project
2003-2007
Toronto
2007-2017
International Cancer Genome Consortium
http://goo.gl/dJIur
2017- …
SABs, EBs & projects I’m on:
So what unifies all of what I’ve done?
Helping scientists do science.
Open Data
https://goo.gl/Z63Wxp
Genomics
https://goo.gl/MX84KA
What am I calling “Genomics”?
All “omics”
– DNA and RNA, +Epigenomics
– Proteomics, +Protein Interactions, +Pathways
– Metabolomics
– Bioinformatics/Computational Biology
– All of the related data and metadata
• Phenotype
• Clinical
• Images
– New technologies …
Biological scope?
• Anything with DNA or RNA or protein
An example of a challenge for all of us:
the integration of genomic data with deep learning and artificial intelligence
AI, Big Data, Deep Computing
• Artificial Intelligence / Deep Learning and the Big Data Hype?
https://goo.gl/WHg36Q
What do we need for that?
https://goo.gl/JWpXj2
What else?
• Data has to be FAIR
– TO BE FINDABLE
– TO BE ACCESSIBLE
– TO BE INTEROPERABLE
– TO BE RE-USABLE
• https://www.force11.org/group/fairgroup/fairprinciples
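The four FAIR letters can be turned into a rough checklist for a dataset's metadata. The sketch below is only an illustration: the `DatasetRecord` class, its field names, and the rule that maps each field to a letter are assumptions made for this example, and the published FAIR principles are considerably more detailed than four yes/no checks.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""      # persistent ID, e.g. a DOI or an accession number
    access_url: str = ""      # where the data, or its access procedure, can be reached
    file_format: str = ""     # community format such as VCF or BAM
    licence: str = ""         # terms under which the data may be reused
    keywords: list[str] = field(default_factory=list)


def fair_gaps(record: DatasetRecord) -> list[str]:
    """Return the FAIR letters that look unmet (a crude heuristic, not a standard)."""
    gaps = []
    if not (record.identifier and record.keywords):
        gaps.append("Findable")
    if not record.access_url:
        gaps.append("Accessible")
    if not record.file_format:
        gaps.append("Interoperable")
    if not record.licence:
        gaps.append("Re-usable")
    return gaps


# Made-up record: it has an ID, a format and keywords, but no access URL or licence.
record = DatasetRecord(identifier="example-0001", file_format="VCF", keywords=["cancer"])
print(fair_gaps(record))  # ['Accessible', 'Re-usable']
```

A check like this only catches missing fields; whether the identifier is really persistent or the format really interoperable still needs human judgment.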
Big data examples
• Genomic sequences
• Imaging
• Population-scale data collected from wearables
Data Center for all in Québec?
• Health Care in Canada is governed province by province.
• Génome Québec is working with various ministries to set up something centralized that makes genomic data usable for research (controlled access).
• Needs to include clinical data
“Building a data centre is like making pancakes, you always need to throw away the 1st one”
Robert Grossman
Frederick H. Rawson Professor and the Director of the Center for Data Intensive Science (CDIS) at the University of Chicago
http://rgrossman.com/
Sharing all data types, including clinical data?
https://goo.gl/ofEPeX
Authors present at the “Toronto meeting”
https://goo.gl/ofEPeX
Open data critical to progress in Science
One example: GenBank
The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.
Open data critical to progress in Science
• Without GenBank and other public sequence databases
– There would be no BLAST
– There would be no diagnostic DNA testing
– There would be no understanding of the human genome (there probably would not have been a human genome to work on in the first place).
Adapted from Niko Beerenwinkel, Chris D. Greenman, Jens Lagergren:
Computational Cancer Biology: An Evolutionary Perspective
• Published: February 4, 2016. https://doi.org/10.1371/journal.pcbi.1004717
Figure annotations: ICGC PCAWG; Docker Testing
Cancer is a Disease of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
Adapted from Tom Hudson
https://www.cancer.gov/research/areas/genomics
Analysis Data Types
• Simple Somatic Mutations (SSM or SNV)
• Copy Number Alterations (CNA or CNV)
• Structural Variants (SV)
• Germline variants (SNPs)
• Gene Expression (micro-arrays and RNASeq)
• miRNA Expression (RNASeq)
• Epigenomics (Arrays and Methylation)
• Splicing Variation (RNASeq)
• Protein Expression (Arrays)
International Cancer Genome Consortium
• Collect ~500 tumour/normal pairs from each of 50 different major
cancer types; 25,000 T/N pairs!
• Comprehensive genome analysis of each T/N pair:
– Genome
– Transcriptome
– Methylome
– Clinical data
• Make the data available to the research community & public.
Identify genome changes:
…GATTATTCCAGGTAT…
…GATTATTGCAGGTAT…
…GATTATTGCAGGTAT…
Adapted from Tom Hudson
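The "identify genome changes" idea on this slide can be sketched as a position-by-position comparison of two aligned fragments. This is a toy illustration, not how real variant callers work: it assumes the two strings are already aligned and of equal length, it ignores insertions and deletions, and treating the first fragment as the normal sample is an assumption, since the slide does not label the fragments. The function name `find_substitutions` is made up for this example.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length (indels are not handled)")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Fragments copied from the slide; which one is "normal" is an assumption.
normal = "GATTATTCCAGGTAT"
tumour = "GATTATTGCAGGTAT"

changes = find_substitutions(normal, tumour)
print(changes)  # [(7, 'C', 'G')]
```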
International Cancer Genome Consortium: http://icgc.org
ICGC needs to deal with different kinds of users!
• Biologists/Clinicians:
– Web interface to processed data, providing:
• Affected gene lists with consequences
• Impact on pathways
• Power users:
– Application Programming Interface (API) to get to data
– Availability and integration with cloud resources
ICGC Data Coordinating Centre: dcc.icgc.org
https://dcc.icgc.org/
https://dcc.icgc.org/icgc-in-the-cloud
http://www.cancercollaboratory.org/
Some challenges:
• So, we have lots of data, but is it all generated the same way?
Every country/group has basically been submitting:
– Simple Somatic Mutations (SSM or SNV)
– Copy Number Alterations (CNA or CNV)
– Structural Variants (SV)
– Germline variants (SNPs)
– Gene Expression (micro-arrays and RNASeq)
– miRNA Expression (RNASeq)
– Epigenomics (Arrays and Methylation)
– Splicing Variation (RNASeq)
– Protein Expression (Arrays)
Are they all using the same pipelines?
• No
Steering Committee of PCAWG
• Peter Campbell, Sanger Inst.
• Gady Getz, Broad
• Jan Korbel, EMBL
• Lincoln Stein, OICR
• Josh Stuart, UCSC
PanCancer Analysis of Whole Genomes (PCAWG)
• Whole genome analysis of > 2,800 T/N pairs with clinical data from 20 tumour types.
• Aligned with one standard pipeline.
• Genomic variants determined with 3 pipelines
• 17 working groups
• > 50 papers are being written now.
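When variants are determined with three pipelines, the three call sets have to be reconciled somehow. The slide does not say how that was done, so the sketch below shows one common and simple option as an assumption: keep a variant only when at least two of the three pipelines report it. The variant tuples and the pipeline names are made-up example data.

```python
from collections import Counter

# (chromosome, position, reference allele, alternate allele)
Variant = tuple[str, int, str, str]


def consensus_calls(call_sets: list[set[Variant]], min_support: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_support` of the given call sets."""
    support = Counter(variant for calls in call_sets for variant in calls)
    return {variant for variant, count in support.items() if count >= min_support}


# Made-up calls from three hypothetical pipelines run on the same T/N pair.
pipeline_a = {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")}
pipeline_b = {("chr1", 100, "C", "G"), ("chr3", 300, "G", "A")}
pipeline_c = {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")}

consensus = consensus_calls([pipeline_a, pipeline_b, pipeline_c])
print(sorted(consensus))  # [('chr1', 100, 'C', 'G'), ('chr2', 200, 'A', 'T')]
```

Raising `min_support` to 3 trades sensitivity for precision: only the call that every pipeline agrees on survives.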
https://www.biorxiv.org/search/pcawg
Deliverables for PCAWG include:
• 1st PANCANCER analysis of > 2,800 cancer tumours from a WGS perspective
• RNA, SSM, CNV, Methylation analysis & germline
• Published (executable) pipelines
– Docker / Dockstore
– Multiple cloud access to data
– Multiple portal access to data
https://dcc.icgc.org/pcawg
Working Groups (1/2)
1. Novel somatic mutation calling methods
2. Analysis of mutations in regulatory regions
3. Integration of transcriptome and genome
4. Integration of epigenome and genome
5. Consequences of somatic mutations on pathway and network activity
6. Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements
7. Mutation signatures and processes
8. Germline cancer genome
Working Groups (2/2)
9. Inferring driver mutations and identifying cancer genes and pathways
10. Translating cancer genomes to the clinic
11. Evolution and heterogeneity
12. Exploratory: portals, visualization and software infrastructure
13. Molecular subtypes and classification
14. Analysis of mutations in non-coding RNA
15. Exploratory: mitochondrial
16. Exploratory: pathogens
17. Technical working group
https://goo.gl/AMxwSU
http://dockstore.org
Docker Testing Group
• Group that ensures all container workflows work as expected.
https://goo.gl/AMxwSU
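One simple way to check that a containerized workflow "works as expected" is to run it on a known input and compare a checksum of its output with a previously recorded value. The sketch below assumes byte-identical outputs are expected, which is often not true in practice (timestamps or unordered records may need normalizing first). The function names and the demo file are hypothetical and are not taken from the PCAWG testing setup.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_matches(produced: Path, expected_sha256: str) -> bool:
    """True when the produced file has exactly the recorded checksum."""
    return sha256_of(produced) == expected_sha256


# Demo with a stand-in output file; a real test would point at the workflow's output.
with tempfile.TemporaryDirectory() as workdir:
    produced = Path(workdir) / "calls.vcf"
    produced.write_bytes(b"##fileformat=VCFv4.2\n")
    recorded = hashlib.sha256(b"##fileformat=VCFv4.2\n").hexdigest()
    print(output_matches(produced, recorded))  # True
```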
Access to Data?
• Human Data
• Patients consented to have their DNA looked at so people could understand cancer
• Need to have a system to maximize people’s gift to science.
Identify yourself
Fill out a detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and have your institution sign on your behalf
https://icgc.org/daco/approved-projects
314 groups
[Diagram of data access routes. Labels: DACO, ICGC, dbGaP, GDC, EGA, TCGA, ERA, BAM, Open, EGA id & password, WGS, Germline]
Challenge:
• Open Data and controlled access data
• Not enough eyeballs on the data
• Eyeballs on the data are needed to make discoveries.
https://goo.gl/ogbWXG
Culture of Sharing Openly
• Public Funding agencies
• Consortiums
• Mentors
• Peers
• New generation (vs my old generation)
• Has to become the norm
Final thoughts …
• Access to data is essential for science
• Getting data that is FAIR is hard work
• It is essential to share the work you do if
you want to be recognized, get tenure, get
a job or a promotion.
• Human data is more complicated, but
don’t let that get in the way!
• There is a lot of material out there, learn
from it (& cite your sources)!
Last message to students and young PDFs and investigators:
Be open so people can see how great you are!
Acknowledgments
ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings
DCC Software Developer: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang
Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu
Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma
ICGC DCC Biocuration: Hardeep Nahal, Marc Perry
Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood
EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes
… and all the patients and their families that are putting their hopes into our work!
http://oicr.on.ca http://icgc.org
Scientific Affairs Team (Équipe des affaires scientifiques), 27 March 2017
B.F. Francis Ouellette, Annina Spilker, Joël Savard, Diana Iglesias, Diane Bouchard, Cristina Ciurli, Micheline Ayoub, Hélène Fournier
Grazie (thank you)

Open data genomics_palermo_2017_ver03

  • 1. Open Data is Essential for Personalized Medicine BF Francis Ouellette https://goo.gl/8U1QJa
  • 2. Open Data is Essential for Genomics This presentation is on: https://www.slideshare.net/
  • 3. 3Module #: Title of Module
  • 4. Open Data is Essential for Genomics
  • 5. Open Data is Essential for Genomics @bffo francis@genomequebec.comE-mail
  • 6. Open Data is Essential for Genomics Times I’ve been in Italy • Trieste 1996: Last Yeast Genome Meeting • Naples 2005: NETTAB “Workflows management: new abilities for the biological information overflow” • Rome 2017: Elixir • Palermo 2017: NETTAB
  • 7. Open Data is Essential for Genomics Outline • What I do • Open Data in genomics • Final thoughts
  • 8. Open Data is Essential for Genomics But first, a little about me … … an unfinished story!
  • 9. Open Data is Essential for Genomics https://goo.gl/anu933
  • 10. Open Data is Essential for Genomics http://goo.gl/dJIur
  • 11. Open Data is Essential for Genomics http://goo.gl/LwVOZ
  • 12. Open Data is Essential for Genomics http://goo.gl/QI6aL
  • 13. Open Data is Essential for Genomics http://goo.gl/mYHFO
  • 14. Open Data is Essential for Genomics http://goo.gl/Jc5TK
  • 15. Open Data is Essential for Genomics https://goo.gl/3PFr7L 1993-1997
  • 16. Open Data is Essential for Genomics from the National Centre for Biotechnology Information
  • 17. Open Data is Essential for Genomics from the National Centre for Biotechnology Information
  • 18. Open Data is Essential for Genomics from the National Centre for Biotechnology Information PANIC
  • 19. Open Data is Essential for Genomics
  • 20. Open Data is Essential for Genomics PANIC
  • 21. Open Data is Essential for Genomics PANIC
  • 22. Open Data is Essential for Genomics
  • 23. Open Data is Essential for Genomics https://www.ubc.ca/
  • 24. Open Data is Essential for Genomics 1999
  • 25. Open Data is Essential for Genomics 2001: Human Genome Project
  • 26. Open Data is Essential for Genomics 2003-2007
  • 27. Open Data is Essential for Genomics
  • 28. Open Data is Essential for Genomics Toronto
  • 29. Open Data is Essential for Genomics 2007-2017
  • 30. Open Data is Essential for Genomics International Cancer Genome Consortium
  • 31. Open Data is Essential for Genomics http://goo.gl/dJIur
  • 32. Open Data is Essential for Genomics 2017- …
  • 33. Open Data is Essential for Genomics
  • 34. Open Data is Essential for Genomics SABs, EBs & projects I’m on:
  • 35. Open Data is Essential for Genomics
  • 36. Open Data is Essential for Genomics So what unifies all of what I’ve done?
  • 37. Open Data is Essential for Genomics So what unifies all of what I’ve done? Helping scientists do science.
  • 38. Open Data is Essential for Genomics Open Data https://goo.gl/Z63Wxp
  • 39. Open Data is Essential for Genomics Genomics https://goo.gl/MX84KA
  • 40. Open Data is Essential for Genomics What am I calling “Genomics”? All “omics” – DNA and RNA, +Epigenomics – Proteomics, +Protein Interactions, +Pathways – Metabolomics – Bioinformatics/Computational Biology – All of the related data and metadata • Phenotype • Clinical • Images – New technologies …
  • 41. Open Data is Essential for Genomics Biological scope? • Anything with DNA or RNA or protein
  • 42. Open Data is Essential for Genomics
  • 43. Open Data is Essential for Genomics Example of one of a challenge for all of us? The integration of genomic data with deep learning and artificial intelligence
  • 44. Open Data is Essential for Genomics AI, Big Data, Deep Computing • Artificial Intelligence / Deep Learning and the Big Data Hype? https://goo.gl/WHg36Q
  • 45. Open Data is Essential for Genomics What do we need for that? https://goo.gl/JWpXj2
  • 46. Open Data is Essential for Genomics What do we need for that? https://goo.gl/JWpXj2
  • 47. Open Data is Essential for Genomics What else? • Data has to be FAIR – TO BE FINDABLE – TO BE ACCESSIBLE – TO BE INTEROPERABLE – TO BE RE-USABLE • https://www.force11.org/group/fairgroup/fairprinciples
  • 48. Open Data is Essential for Genomics Big data examples • Genomic sequences • Imaging • Population scale collected wearable data
  • 49. Open Data is Essential for Genomics Data Center for all in Québec? • Health Care in Canada is governed province by province. • Génome Québec is working with various ministries to set something that could be useful/centralized and make genomic data usable for research (controlled access). • Needs to include clinical data
  • 50. Open Data is Essential for Genomics “Building a data centre is like making pancakes, you always need to throw away the 1st one” Robert Grossman Frederick H. Rawson Professor and the Director of the Center for Data Intensive Science (CDIS) at the University of Chicago http://rgrossman.com/
  • 51. Open Data is Essential for Genomics Sharing all data types, including clinical data? https://goo.gl/ofEPeX
  • 52. Open Data is Essential for Genomics Authors present at the “Toronto meeting” https://goo.gl/ofEPeX
  • 53. Open Data is Essential for Genomics 53 Introduction 1.0 Open data critical to progress in Science
  • 54. Open Data is Essential for Genomics 54 Introduction 1.0 One example: GenBank GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.
  • 55. Open Data is Essential for Genomics 55 Introduction 1.0 Open data critical to progress in Science • Without GenBank and other public sequence databases – There would be no BLAST – There would be no diagnostics DNA testing – There would be no understanding of the human genome (there probably would not have been a human genome to work on in the first place).
  • 56. Open Data is Essential for Genomics Adapted from Niko Beerenwinkel ,Chris D. Greenman ,Jens Lagergren ICGC PCAWG Docker Testing Computational Cancer Biology: An Evolutionary Perspective •Published: February 4, 2016. https://doi.org/10.1371/journal.pcbi.1004717
  • 57. Open Data is Essential for Genomics Cancer is a Disease of the Genome Challenge in Treating Cancer:  Every tumour is different  Every cancer patient is different Adapted from Tom Hudsonhttps://www.cancer.gov/research/areas/genomics
  • 58. Open Data is Essential for Genomics Analysis Data Types • Simple Somatic Mutations (SSM or SNV) • Copy Number Alterations (CAN or CNV) • Structural Variants (SV) • Germline variants (SNPs) • Gene Expression (micro-arrays and RNASeq) • miRNA Expression (RNASeq) • Epigenomics (Arrays and Methylation) • Splicing Variation (RNASeq) • Protein Expression (Arrays)
  • 59. Open Data is Essential for Genomics International Cancer Genome Consortium • Collect ~500 tumour/normal pairs from each of 50 different major cancer types; 25,000 T/N pairs! • Comprehensive genome analysis of each T/N pair: – Genome – Transcriptome – Methylome – Clinical data • Make the data available to the research community & public. Identify genome changes …GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT… Adapted from Tom Hudson
  • 60. ONTARIO INSTITUTE FOR CANCER RESEARC 60
  • 61. Open Data is Essential for Genomics International Cancer Genome Consortium: http:/icgc.org
  • 62. Open Data is Essential for Genomics ICGC needs to deal with different kinds of users! 62 • Biologists/Clinicians: – Web interface to processed data, providing: • Affected gene lists with consequences • Impact on pathways • Power users: – Application Programing Interface (API) to get to data – Availability and Integration with cloud resources
  • 63. Open Data is Essential for Genomics ICGC Data Coordinating Centre: dcc.icgc.org 63
  • 64. Open Data is Essential for Genomics https://dcc.icgc.org/ 64
  • 65. Open Data is Essential for Genomics 65 https://dcc.icgc.org/icgc-in-the-cloud
  • 66. Open Data is Essential for Genomics 66 http://www.cancercollaboratory.org/
  • 67. Open Data is Essential for Genomics Some challenges: 67 • So, we have lots of data, is it generated the same way?
  • 68. Open Data is Essential for Genomics Every country/group has basically been submitting: 68 – Simple Somatic Mutations (SSM or SNV) – Copy Number Alterations (CAN or CNV) – Structural Variants (SV) – Germline variants (SNPs) – Gene Expression (micro-arrays and RNASeq) – miRNA Expression (RNASeq) – Epigenomics (Arrays and Methylation) – Splicing Variation (RNASeq) – Protein Expression (Arrays)
  • 69. Open Data is Essential for Genomics Are they all using the same pipelines? 69 • No
  • 70. Open Data is Essential for Genomics 70
  • 71. Open Data is Essential for Genomics Steering Committee of PCAWG 71 • Peter Campbell, Sanger Inst. • Gady Getz, Broad • Jan Korbel, EMBL • Lincoln Stein, OICR • Josh Stuart, UCSC
  • 72. Open Data is Essential for Genomics PanCancer Analysis of Whole Genomes (PCAWG) • > 2,800 T/N pairs with clinical data from 20 tumour type of whole genome analysis. • Aligned with one standard pipeline. • Genomic Variants determined with 3 pipelines • 17 working groups • > 50 Papers are being written now.
  • 73. Open Data is Essential for Genomics https://www.biorxiv.org/search/pcawg
  • 74. Open Data is Essential for Genomics Deliverable for PCAWG include: 74 • 1st PANCANCER analysis on > 2,800 cancer tumours from a WGS perspective • RNA, SSM, CNV, Methylation analysis & germline • Published (executable) pipelines – Docker / Dockstore – Mutiple cloud access to data – Multiple portal access to data
  • 75. Open Data is Essential for Genomics https://dcc.icgc.org/pcawg 75
  • 76. Open Data is Essential for Genomics Working Groups (1/2) 76 1. Novel somatic mutation calling methods 2. Analysis of mutations in regulatory regions 3. Integration of transcriptome and genome 4. Integration of epigenome and genome 5. Consequences of somatic mutations on pathway and network activity 6. Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements 7. Mutation signatures and processes 8. Germline cancer genome
  • 77. Open Data is Essential for Genomics Working Groups (2/2) 77 9 Inferring driver mutations and identifying cancer genes and pathways 10 Translating cancer genomes to the clinic 11 Evolution and heterogeneity 12 Exploratory: portals, visualization and software infrastructure 13 Molecular subtypes and classification 14 Analysis of mutations in non-coding RNA 15 Exploratory: mitochondrial 16 Exploratory: pathogens 17 Tech Technical working group
  • 78. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 79. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 80. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 81. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 82. Open Data is Essential for Genomics http://dockstore.org
  • 83. Open Data is Essential for Genomics Docker Testing Group • Group that ensures all containerized workflows work as expected. https://goo.gl/AMxwSU
  • 84. Open Data is Essential for Genomics Access to Data? • Human Data • Patients consented to have their DNA looked at so people could understand cancer • Need to have a system to maximize people’s gift to science.
  • 85. Open Data is Essential for Genomics
  • 86. Open Data is Essential for Genomics Identify yourself. Fill out a detailed form, which includes: • Contact and project information • Information technology details and procedures for keeping data secure • Data Access Agreement. All of these documents are put into a PDF file that you print and have your institution sign on your behalf.
  • 87. Open Data is Essential for Genomics
  • 88. Open Data is Essential for Genomics
  • 89. Open Data is Essential for Genomics https://icgc.org/daco/approved-projects • 314 groups
  • 90. Open Data is Essential for Genomics [Diagram of data access routes. Labels: DACO, ICGC, dbGaP, GDC, EGA, TCGA, BAM, Open, ERA, EGA id & password, WGS, Germline]
  • 91. Open Data is Essential for Genomics Challenge: • Open data and controlled-access data • Not enough eyeballs on the data • Eyeballs on the data are needed to make discoveries. https://goo.gl/ogbWXG
  • 92. Open Data is Essential for Genomics Culture of Sharing Openly • Public Funding agencies • Consortiums • Mentors • Peers • New generation (vs my old generation) • Has to become the norm
  • 93. Open Data is Essential for Genomics Final thoughts … • Access to data is essential for science • Getting data that is FAIR is hard work • It is essential to share the work you do if you want to be recognized, get tenure, get a job or a promotion. • Human data is more complicated, but don’t let that get in the way! • There is a lot of material out there, learn from it (& cite your sources)!
  • 94. Open Data is Essential for Genomics Last message to students and young PDFs and investigators:
  • 95. Open Data is Essential for Genomics Last message to students and young PDFs and investigators: Be open so people can see how great you are!
  • 96. ONTARIO INSTITUTE FOR CANCER RESEARCH
  • 97. Open Data is Essential for Genomics Acknowledgments • ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings • DCC Software Developer: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang • Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu • Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma • ICGC DCC Biocuration: Hardeep Nahal, Marc Perry • Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood • EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes • http://oicr.on.ca • http://icgc.org • … and all the patients and their families that are putting their hopes into our work!
  • 98. Scientific Affairs Team (Équipe des affaires scientifiques) • 27 March 2017 • B.F. Francis Ouellette, Annina Spilker, Joël Savard, Diana Iglesias, Diane Bouchard, Cristina Ciurli, Micheline Ayoub, Hélène Fournier
  • 99. Open Data is Essential for Genomics Grazie (Thank you)
  • 100. Open Data is Essential for Genomics 100