© 2013 New York Genome Center. PRIVILEGED & CONFIDENTIAL
Challenges of Building
Informatics Infrastructure
for Clinical Genomics
TOBY BLOOM, PH.D.
OCTOBER 5, 2013
COLLABORATION OF 12 FOUNDING
INSTITUTIONS
ASSOCIATE MEMBERS
More to come
Objective: Extend NYGC value by expanding
collaborative network and enriching scientific community
NEW YORK GENOME CENTER
Fostering collaborative genomics research across
New York area
Central sequencing, informatics infrastructure,
analysis services for member institutions & others
Collaborative research projects
Focus on clinical genomics
Still in “start-up” mode
Bob Darnell – President & Scientific Director
HHMI Prof. of Neuroscience at Rockefeller University
First faculty member – Tuuli Lappalainen
Joint w/ Columbia
FROM THE INFORMATICS PERSPECTIVE
Build an informatics environment that fosters large
collaborative projects and provides the
infrastructure for large-scale clinical genomics
studies
Compute and storage resources
Informatics tools and methods
Privacy enforcement
Computational biology services
Standard large datasets
Reference databases
Submission and retrieval services
CLINICAL INFORMATICS INFRASTRUCTURE:
CURRENT CHALLENGES
Clinical data management
Analysis integrating disparate, longitudinal data
sources
Pipeline performance
turn-around vs throughput
Pipeline quality and validation
SECURITY and PRIVACY
Tracking IRBs, informed consents, DAC approvals
HIPAA, HITECH compliance
21 CFR Part 11
CLIA
STARTING “BACKWARDS”:
CLINICAL DATA
Working with 6 hospitals on a federated,
longitudinal clinical data warehouse
2.5 million patients to start
Could grow to 8 million
Basic, anonymized clinical data stored at
NYGC
Connected to hospital data warehouses for
more complex queries
DATA STORED AT NYGC
At least 5 years of longitudinal records for
each patient
Fully anonymized, HIPAA Safe Harbor data
Available for cohort identification and
retrospective studies
Patient-matching across institutions
Tagged with cohort consents
Tagged by availability of genomic data
PROSPECTIVE FUNCTIONALITY
Study-specific data marts
With informed consent
Connections back to hospital warehouse edge
servers for more complex data
Collection of study-specific clinical research
data
Connections to genomic data
THE CLINICAL REPOSITORY
DATA TYPE INTEGRATION –
COMPLEX AND GROWING
[Diagram labels: Results; Clinical Data (Longitudinal); Genome; RNA-Seq (Longitudinal); Microbiome; ChIP-Seq; Personal-Reported Data (FitBit); Reference; Filters]
DATA INTEGRATION: ONE EXAMPLE
Auto-immune disease project
4 hospitals and NYGC
Cross-disease
At least weekly blood collection
More frequent around flares
At least monthly clinical evaluations
More frequent around flares
Continuous personal device data
Correlate changes in expression w/ changes in
clinical metrics, and changes in personal device
data streams
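One way to read the correlation step above, as a minimal sketch rather than the project's actual method: bin every stream onto a common weekly grid, take week-over-week changes, and correlate the changes. The function names, the weekly grid, the use of Pearson correlation, and all sample values are assumptions made for illustration.

```python
from datetime import date, timedelta
from math import sqrt


def weekly_means(samples, start, n_weeks):
    """Average (date, value) samples into consecutive 7-day bins from `start`."""
    bins = [[] for _ in range(n_weeks)]
    for day, value in samples:
        idx = (day - start).days // 7
        if 0 <= idx < n_weeks:
            bins[idx].append(value)
    # A week with no measurement stays None instead of being invented.
    return [sum(b) / len(b) if b else None for b in bins]


def deltas(series):
    """Week-over-week change; None where either neighbouring week is missing."""
    return [None if a is None or b is None else b - a
            for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation over positions where both series have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: weekly expression values and daily device readings.
start = date(2013, 1, 7)
expression = [(start + timedelta(weeks=i), v)
              for i, v in enumerate([1.0, 2.0, 4.0, 3.0])]
step_targets = [10.0, 8.0, 4.0, 6.0]  # thousands of steps per day, by week
steps = [(start + timedelta(days=d), step_targets[d // 7]) for d in range(28)]

expr_weekly = weekly_means(expression, start, 4)
steps_weekly = weekly_means(steps, start, 4)
r = pearson(deltas(expr_weekly), deltas(steps_weekly))
print(r)
```

The toy data are built so that activity drops whenever expression rises, so the printed correlation is close to -1. Real streams would also need handling for irregular sampling around flares, which this sketch ignores.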
DATA INTEGRATION:
THE COMPLEXITY CONTINUES TO GROW
Genomic
WGS
WES
RNA-Seq
ChIP-Seq
CLIP-Seq
RRBS
Longitudinal?
EHR*
Different dictionaries
Different standards
Doctors' notes
Personal device
Patient-reported
Standard datasets,
references, ….
LONGITUDINAL MULTI-MODAL DATA
Different types have different event
frequencies
Do baselines exist?
How many variables?
ACCURATE ANALYSIS METHODS
Different tools produce different results
That’s one thing in TCGA or 1000G
It’s another for clinical pipelines
How many tools do we need to run in parallel?
How do we know whether what we find is the
cause?
How sure are we that the associated treatment is right?
If that variant has other implications, what do we do?
Yes, we can validate
But how many rounds?
At what cost?
ANOTHER EXAMPLE
Rare pediatric cancer
Research study
Tumor-normal-recurrence
Some of the kids are still alive
RNA-Seq and WGS
3 fusion tools
Before we found the fusion
3 structural variant tools to confirm the deletion
No one tool found all of them
2 split read tools to find the exact breakpoints
Nico Robine & Anne-Katrin Emde
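Because no single tool found every event, the outputs of several tools have to be combined somehow. The sketch below shows one simple way to do that: count how many tools support each call and separate multi-tool calls from single-tool ones. The tool names, gene names, exact-match comparison, and the `min_support` threshold are all hypothetical; real structural variant callers report slightly different breakpoints, so a real merge would need a position tolerance.

```python
from collections import defaultdict


def merge_calls(calls_by_tool, min_support=2):
    """Group calls by how many tools reported them.

    calls_by_tool maps a tool name to a set of hashable call keys.
    Returns (confirmed, single): dicts mapping call -> sorted tool names.
    """
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for call in calls:
            support[call].add(tool)
    confirmed = {c: sorted(t) for c, t in support.items() if len(t) >= min_support}
    single = {c: sorted(t) for c, t in support.items() if len(t) < min_support}
    return confirmed, single


# Hypothetical fusion calls keyed by (5' gene, 3' gene).
fusion_calls = {
    "tool_a": {("GENE1", "GENE2"), ("GENE3", "GENE4")},
    "tool_b": {("GENE1", "GENE2")},
    "tool_c": {("GENE5", "GENE6")},
}
confirmed, single = merge_calls(fusion_calls)
all_calls = set(confirmed) | set(single)
# True when every tool missed at least one call that another tool made.
no_tool_found_all = all(calls < all_calls for calls in fusion_calls.values())
```

Calls left in `single` are not necessarily wrong; they are the ones that would most need the extra validation rounds raised earlier.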
METHOD ACCESS RIGHTS
Academic only?
Non-profit only?
Commercially available
Possibly different licenses or different versions
So multiple different pipelines for different
users
That get different results!!
HIGH-PERFORMANCE:
WHAT’S THE GOAL?
A single patient in a hospital bed is not the
same as 1000 genomes in a research study.
How do we ensure that pipelines run fast
enough?
For large research studies,
Want high-throughput - fastest way to get all the
samples out
For clinical samples,
Want fast turn-around
Can’t get both
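The trade-off can be made concrete with a toy scheduling model. This is a sketch under stated assumptions, not a measurement: each sample is assumed to have a fixed serial portion plus a portion that divides evenly across cores, and the cluster size and hour figures are invented for illustration.

```python
from math import ceil


def schedule(n_samples, cores, cores_per_sample, serial_h, parallel_h):
    """Return (hours until the first sample finishes, hours until all finish).

    Assumes every sample takes serial_h + parallel_h / cores_per_sample hours
    and that samples run in waves of cores // cores_per_sample at a time.
    """
    per_sample = serial_h + parallel_h / cores_per_sample
    concurrent = cores // cores_per_sample
    waves = ceil(n_samples / concurrent)
    return per_sample, waves * per_sample


# Hypothetical workload: 64 samples, 16 cores, 2 h serial + 30 h parallelizable.
throughput_mode = schedule(64, 16, 1, 2.0, 30.0)    # (32.0, 128.0)
turnaround_mode = schedule(64, 16, 16, 2.0, 30.0)   # (3.875, 248.0)
```

In this model, one core per sample finishes the whole batch in 128 hours but no result arrives before hour 32. Giving each sample all 16 cores returns the first result in under 4 hours, but the batch takes 248 hours because the serial portion is paid 64 times with 15 cores idle.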
DATA SECURITY AND PRIVACY
Compliance
HIPAA
HITECH
21 CFR Part 11
FISMA
IRB approvals
Informed Consents
Data Access Committees
Who decides? Who enforces?
HOSPITAL REGULATIONS
No cloud
No public transports – like Globus Online
All data encrypted
Can you really run a WGS pipeline with the reads
encrypted?
BTW – hardware-assisted decryption takes 3
hours/BAM
PIs DON’T UNDERSTAND INFORMED
CONSENT
Who they can give their data to
What data they can use, if they can find it on the
file system
So the system has to monitor
Recent example:
We were asked to host a large dataset
First told it was publicly available data
Then corrected to “broadly consented”
We asked to see the IRB and an informed consent
THEN they realized they needed IRB approval for us to
host the data, and the IRB wanted to see our security
plans.
VERY GRANULAR ACCESS CONTROL
Access control is not per patient
Multiple studies per patient
Different PIs on each
Not per project
Informed consents still have check boxes
So some researchers can get access to some samples but
not others from a project
So need the ability to have same file in multiple
projects
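A minimal sketch of what access control at this granularity might look like, assuming each sample carries the set of consent boxes that were ticked and each project records the approved purpose for every member. The class names, field names, and purpose labels are hypothetical and do not describe NYGC's actual system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    sample_id: str
    patient_id: str
    consented_uses: frozenset  # boxes ticked on the informed consent form


@dataclass
class Project:
    name: str
    # Maps each member to the purpose approved for them on this project.
    members: dict = field(default_factory=dict)
    # The same sample_id may appear in any number of projects.
    sample_ids: set = field(default_factory=set)


def can_access(user, sample, projects):
    """Allow access only via a project that holds the sample, lists the user,
    and whose approved purpose for that user is covered by the consent."""
    return any(
        user in p.members
        and sample.sample_id in p.sample_ids
        and p.members[user] in sample.consented_uses
        for p in projects
    )


# Hypothetical data: one patient, two samples with different boxes ticked.
s1 = Sample("S1", "P001", frozenset({"cancer", "general_research"}))
s2 = Sample("S2", "P001", frozenset({"cancer"}))
study_a = Project("study_a", {"alice": "cancer", "bob": "general_research"}, {"S1", "S2"})
study_b = Project("study_b", {"carol": "general_research"}, {"S1"})
projects = [study_a, study_b]
```

Here `bob` and `alice` are on the same project, yet `bob` can reach only `S1`, while `S1` is shared with a second project without being copied.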
PROVIDING A SECURE ENVIRONMENT
All data sits on shared storage
But that storage is isolated
All access goes through an invisible “security gate” that
allows researchers to see only the data they have rights
to:
Their data
Data for which they have data use authorization
Reference data
Public data
Catalog of all data
Needs to be fast and transparent
They need to work on virtual machines, so no data on
local disk or in memory is accidentally shared.
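The gate's rule set can be sketched as a filter over the catalog: everyone may browse catalog metadata, but a file is readable only if it falls into one of the four categories listed above. The `Entry` fields, the `kind` labels, and the `authorizations` mapping are assumptions for illustration; a real gate would sit in the storage path rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    owner: str
    kind: str  # "study", "reference", or "public"


def readable(user, entry, authorizations):
    """True if the user owns the data, holds a data use authorization for it,
    or the entry is reference or public data."""
    if entry.kind in ("reference", "public"):
        return True
    if entry.owner == user:
        return True
    return entry.path in authorizations.get(user, set())


def visible_files(user, catalog, authorizations):
    """Paths the gate would expose to this user; everything else stays hidden."""
    return [e.path for e in catalog if readable(user, e, authorizations)]


# Hypothetical catalog and authorizations.
catalog = [
    Entry("/data/study_a/s1.bam", "alice", "study"),
    Entry("/data/study_b/s9.bam", "carol", "study"),
    Entry("/ref/genome.fa", "nygc", "reference"),
    Entry("/public/panel.vcf", "nygc", "public"),
]
authorizations = {"bob": {"/data/study_a/s1.bam"}}
```

With this data, `alice` sees her own file plus reference and public data, `bob` sees the one file he is authorized for plus reference and public data, and neither sees `carol`'s file.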
MAINTAINING DATA PRIVACY
[Diagram labels: Storage; Compute; Access Control]
SUMMARY
Integrating with clinical data is not easy
Infrastructure is complex
Need to understand analysis methods better
Need to understand security better
101 AVENUE OF THE AMERICAS
COLLABORATIVE HUB
We’re hiring :)
Computational Biologists
Software Engineers
Project Mgrs
Lab Personnel
Principal Investigators
Faculty Joint Appts
………..
ACKNOWLEDGEMENTS
NYGC
Bob Darnell
Vlada Vacic
Nicolas Robine
Avinash Abhyankar
Uday Evani
Anne-Katrin Emde
James Spencer
Nina Lapchyk
Soren Germer
Dayna Oschwald
Yanlei Diao (UMass)
Deborah Estrin (Cornell Tech)
Rainu Kaushal (Weill Cornell)
George Hripcsak (Columbia)
Tom Check (Healthix)
Tom Campion (Cornell)
Parsa Mirhaji (Einstein)
Dana Orange (Rockefeller)
Nathaniel Novod (Broad)
Seva Kashin (Mass General)
Leslie Greengard (Simons Center)
Alex Lash (Simons Center)
