“Using Supercomputers and Data Analytics to Discover
the Differences in Health and Disease”
Briefing for
Dell Analytics Team
Calit2’s Qualcomm Institute
University of California, San Diego
April 7, 2016
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
We Gathered Raw Illumina Reads on 275 Humans
and Generated a Time Series of My Gut Microbiome
“Healthy” Individuals: 250 Subjects, 1 Point in Time
Inflammatory Bowel Disease (IBD) Patients:
5 Ileal Crohn’s Patients, 3 Points in Time
2 Ulcerative Colitis Patients, 6 Points in Time
Larry Smarr (Colonic Crohn’s): 7 Points in Time
Each Sample Has 100-200 Million Illumina Short Reads (100 bases)
Total of 27 Billion Reads, or 2.7 Trillion Bases
Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
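The slide's totals are internally consistent: at 100 bases per read, 27 billion reads is 2.7 trillion bases, and each sample of 100-200 million reads carries 10-20 billion bases. The short check below only restates the slide's own figures.

```python
# Back-of-envelope check of the sequencing totals quoted on the slide.
TOTAL_READS = 27e9                        # "Total of 27 Billion Reads"
READ_LENGTH = 100                         # Illumina short reads of 100 bases
READS_PER_SAMPLE_RANGE = (100e6, 200e6)   # "100-200 Million Illumina Short Reads" per sample

total_bases = TOTAL_READS * READ_LENGTH
bases_per_sample = tuple(reads * READ_LENGTH for reads in READS_PER_SAMPLE_RANGE)

print(f"Total bases: {total_bases:.1e}")  # 2.7e+12, i.e. 2.7 trillion
print(f"Bases per sample: {bases_per_sample[0]:.0e} to {bases_per_sample[1]:.0e}")
```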
Mapping Out the Dynamics of Autoimmune Microbiome Ecology
Couples Next-Generation Genome Sequencers to Big Data Supercomputers
Our Team Used 25 CPU-Years to Compute Comparative Gut Microbiomes
Starting From 2.7 Trillion DNA Bases of My Samples and Healthy and IBD Controls
Sequencing: Illumina HiSeq 2000 at JCVI
Computing: SDSC Gordon Data Supercomputer
Source: Weizhong Li, UCSD
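To put "25 CPU-years" on a more familiar scale, the sketch below converts it to core-hours and to an average throughput over the 2.7 trillion input bases. It assumes a CPU-year means one core busy for a 365.25-day year; allocation systems may define service units differently, and the throughput lumps every pipeline stage together, so treat it as an order-of-magnitude indicator only.

```python
# Order-of-magnitude conversion of the compute budget quoted on the slide.
# Assumption: 1 CPU-year = one core busy for a 365.25-day year.
HOURS_PER_CPU_YEAR = 365.25 * 24   # 8766 hours
CPU_YEARS = 25                     # "Our Team Used 25 CPU-years"
INPUT_BASES = 2.7e12               # "2.7 Trillion DNA Bases"

core_hours = CPU_YEARS * HOURS_PER_CPU_YEAR
core_seconds = core_hours * 3600
bases_per_core_second = INPUT_BASES / core_seconds  # averaged over the whole pipeline

print(f"{core_hours:,.0f} core-hours")  # 219,150 core-hours
print(f"~{bases_per_core_second:,.0f} bases per core-second")
```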
To Expand the IBD Project, the Knight/Smarr Labs Were Awarded
~1 CPU-Century of Supercomputing Time
• Smarr Gut Microbiome Time Series
– From 7 Samples Over 1.5 Years
– To 50 Samples Over 4 Years
• IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100 Patients
– 50 Carefully Phenotyped Patients Drawn from the Sandborn BioBank
– 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD Patients
• New Software Suite from Knight Lab
– Re-annotation of Reference Genomes, Functional / Taxonomic Variations
– Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner
8x Compute Resources Over Prior Study
Next Step: Programmability, Scalability, and Reproducibility Using bioKepler
www.kepler-project.org
www.biokepler.org
National Resources (Gordon, Comet, Stampede, Lonestar)
Cloud Resources
Optimized Local Cluster Resources
Source: Ilkay Altintas, SDSC
Using HPC and Data Analytics
to Discover Microbial Diagnostics for Disease Dynamics
• Can Data Distinguish Between Health and Disease Subtypes?
• Can Data Track the Time Development of the Disease State?
• Can Data Create Novel Microbial Diagnostics for Identifying Health and Disease States?
• Can Data Discover Functional Microbiome Gene Changes Between Health and Disease?
Can Data Distinguish Between
Health and Disease Subtypes?
Dell Analytics Separates the 4 Patient Types in Our Data
Using Our Microbiome Species Data
Source: Thomas Hill, Ph.D.
Executive Director Analytics
Dell | Information Management Group, Dell Software
Patient Types Shown: Healthy, Ulcerative Colitis, Colonic Crohn’s, Ileal Crohn’s
Can Data Track
the Time Development of the Disease State?
I Built on Dell Analytics to Show the Dynamic Evolution of My Microbiome
Toward and Away from the Healthy State (Colonic Crohn’s)
Plot Labels: Healthy, Ileal Crohn’s, Colonic Crohn’s
Seven Time Samples Over 1.5 Years
Source: Thomas Hill, Ph.D.
Executive Director Analytics
Dell | Information Management Group, Dell Software
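Movement toward or away from the healthy state can be sketched as a distance from each time sample to a healthy reference. The slide does not say which method Dell Analytics used; this sketch assumes Bray-Curtis dissimilarity to the mean healthy profile, a common choice in microbiome ecology, and uses made-up toy profiles.

```python
# Sketch: distance of each time-series sample from a "healthy" reference profile.
# Assumptions (not from the slide): Bray-Curtis dissimilarity, the healthy
# reference is the mean healthy profile, and all data below are toy values.
from typing import List, Sequence

def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def centroid(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Per-taxon mean abundance across profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def distance_trajectory(time_series, healthy_profiles) -> List[float]:
    """Dissimilarity of each time point from the healthy centroid."""
    reference = centroid(healthy_profiles)
    return [bray_curtis(sample, reference) for sample in time_series]

# Toy relative abundances for 3 taxa (each row sums to 1).
healthy = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]
my_series = [[0.1, 0.1, 0.8], [0.3, 0.2, 0.5], [0.55, 0.25, 0.2]]

trajectory = distance_trajectory(my_series, healthy)
print([round(d, 3) for d in trajectory])  # [0.6, 0.3, 0.0]: moving toward healthy
```

A falling distance over successive samples reads as movement toward the healthy state; a rising one as movement away.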
Variation in My Gut Microbiome by 16S Families –
40 Samples Over 3.5 Years
Data from Justine Debelius & Jose Navas, Knight Lab, UCSD; Larry Smarr Analysis, January 2016
Larry Smarr Gut Microbiome Ecology Shifted After Drug Therapy
Between Two Time-Stable Equilibria Correlated to Physical Symptoms
Principal Coordinate Analysis (PCoA) of Microbiome Ecology
Weekly Weight (Red Dots Mark Stool Samples):
1/1/12 to 5/1/12: Antibiotics, Prednisone
5/1/12 to 12/1/13: Frequent IBD Symptoms, Weight Loss (Blue Balls on the PCoA Diagram to the Right)
12/1/13 to 1/1/14: Lialda & Uceris
1/1/14 to 1/1/16: Few IBD Symptoms, Weight Gain (Red Balls on the PCoA Diagram to the Right)
PCoA by Justine Debelius and Jose Navas, Knight Lab, UCSD
Weight Data from Larry Smarr, Calit2, UCSD
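The PCoA plot on this slide can be sketched as classical multidimensional scaling of a sample-by-sample distance matrix. The slide does not name the distance used; this sketch assumes Bray-Curtis on relative abundances (UniFrac, which needs a phylogenetic tree, is also common) and uses hypothetical toy profiles. Real analyses would normally rely on a maintained implementation such as scikit-bio's.

```python
# Sketch of principal coordinate analysis (classical multidimensional scaling).
# Assumptions: Bray-Curtis distances on relative abundances and toy data; the
# Knight Lab analysis may have used a different distance (e.g. UniFrac).
import numpy as np

def bray_curtis_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarities between the rows of X."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / (X[i] + X[j]).sum()
    return D

def pcoa(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Return sample coordinates on the first k principal coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                 # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                         # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)                # B is symmetric
    top = np.argsort(eigvals)[::-1][:k]                 # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))   # drop negative eigenvalues
    return eigvecs[:, top] * scale

# Toy profiles: two samples from each of two hypothetical community states.
X = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
])
coords = pcoa(bray_curtis_matrix(X))
print(np.round(coords, 3))
```

In the toy data the first principal coordinate separates the two community states, which is the kind of split the blue and red groups show on the slide.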
Can Data Create Novel Microbial Diagnostics
for Identifying Health and Disease States?
Dell Analytics Tree Graph Classifies
the 4 Health/Disease States With Just 3 Microbe Species
Source: Thomas Hill, Ph.D.
Executive Director Analytics
Dell | Information Management Group, Dell Software
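Dell's tree-building settings are not given on the slide. As a hedged illustration of the core step such a classifier repeats, the sketch below picks the species and abundance threshold that minimize weighted Gini impurity (a CART-style split). Data and labels are hypothetical.

```python
# Sketch of one CART-style split: pick the species and abundance threshold that
# minimize weighted Gini impurity. Data and labels here are hypothetical.
from collections import Counter
from typing import List, Optional, Sequence, Tuple

def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a set of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X: List[List[float]], y: List[str]) -> Tuple[Optional[int], Optional[float], float]:
    """Return (species_index, threshold, weighted_gini) for the best single split."""
    n = len(y)
    best: Tuple[Optional[int], Optional[float], float] = (None, None, gini(y))
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct abundances.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical relative abundances of three species (columns) in four subjects.
X = [[0.20, 0.01, 0.05],
     [0.25, 0.02, 0.04],
     [0.01, 0.30, 0.05],
     [0.02, 0.25, 0.06]]
y = ["Healthy", "Healthy", "UC", "UC"]
species, threshold, impurity = best_split(X, y)
print(species, round(threshold, 3), impurity)  # 0 0.11 0.0
```

Growing a full tree repeats this split on each branch; capping the depth is one way a tree can end up using only a few species, though the slide does not say how Dell's tree was configured.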
Our Relative Abundance Results Across ~300 People
Show Why the Dell Analytics Tree Classifier Works
Panel Labels: UC 100x Healthy; LS 100x UC; Healthy 100x CD
We Produced Similar Results for ~2500 Microbial Species
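Fold differences such as "UC 100x Healthy" compare group-average relative abundances. A minimal sketch, assuming raw read counts per species per subject are first normalized to relative abundance and then averaged within each group; the counts are hypothetical, and the tiny pseudocount that avoids division by zero is an assumption, not something the slide specifies.

```python
# Sketch: group-mean relative abundance and fold ratio per species.
# Assumptions: inputs are raw read counts per species per subject; the small
# pseudocount guarding against division by zero is our choice, not the slide's.
from typing import List, Sequence

def relative_abundance(counts: Sequence[float]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]

def group_mean(profiles: Sequence[Sequence[float]]) -> List[float]:
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def fold_ratio(group_a, group_b, pseudocount: float = 1e-9) -> List[float]:
    """Per-species ratio of mean relative abundance in group_a to that in group_b."""
    mean_a = group_mean([relative_abundance(c) for c in group_a])
    mean_b = group_mean([relative_abundance(c) for c in group_b])
    return [(a + pseudocount) / (b + pseudocount) for a, b in zip(mean_a, mean_b)]

# Hypothetical counts for two species in two UC and two healthy subjects.
uc_counts = [[500, 500], [400, 600]]
healthy_counts = [[5, 995], [4, 996]]
ratios = fold_ratio(uc_counts, healthy_counts)
print([round(r, 1) for r in ratios])
```

Here the first species is about 100x more abundant in the toy UC group, the pattern the panel labels describe.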
Ayasdi Enables Discovery of Differences Between
Healthy and Disease States Using Microbiome Species
Groups: Healthy, LS, Ileal Crohn’s, Ulcerative Colitis
Using a Multidimensional Scaling Lens with a Correlation Metric
Regions: High in Healthy and LS; High in Healthy and Ulcerative Colitis; High in Both LS and Ileal Crohn’s Disease
Analysis by Mehrdad Yazdani, Calit2
Can Data Discover Functional Microbiome Gene Changes
Between Health and Disease?
We Computed the Relative Abundance of Microbial Gene Families:
~10,000 KEGG Orthologous Genes Across Healthy and IBD Subjects
How Large Is the Microbiome’s Genetic Change
Between Health and Disease States?
In a “Healthy” Gut Microbiome:
Large Taxonomy Variation, Low Protein Family Variation
Source: Nature, 486, 207-212 (2012)
Over 200 People
Test to See How Much Variation There Is Within Healthy Subjects
Ratio of a Random Healthy Subject (HE11529) to the Healthy Average (Ave HE) for Each Nonzero KEGG
Most KEGGs Are Within 10x of the Healthy Average for a Random Healthy Subject
Similar to HMP Healthy Results
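The "within 10x" test can be sketched as a per-KEGG ratio followed by a count of ratios between 1/10 and 10. This assumes profiles are dictionaries of KEGG ID to relative abundance and that "nonzero KEGG" means nonzero in both the subject and the healthy average; IDs and values below are hypothetical.

```python
# Sketch: how far one subject's KEGG profile strays from the healthy average.
# Assumptions: profiles are dicts of KEGG ID -> relative abundance, and
# "nonzero KEGG" means nonzero in both the subject and the average.
import math
from typing import Dict

def ratio_to_reference(sample: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Per-KEGG ratio sample / reference over KEGGs nonzero in both."""
    return {k: sample[k] / reference[k]
            for k in sample
            if sample[k] > 0 and reference.get(k, 0) > 0}

def fraction_within(ratios: Dict[str, float], fold: float = 10.0) -> float:
    """Fraction of KEGGs whose ratio lies between 1/fold and fold."""
    if not ratios:
        return 0.0
    limit = math.log10(fold)
    inside = sum(1 for r in ratios.values() if abs(math.log10(r)) <= limit)
    return inside / len(ratios)

# Hypothetical IDs and abundances.
healthy_avg = {"ko_a": 1e-4, "ko_b": 2e-4, "ko_c": 5e-5, "ko_d": 1e-5}
subject = {"ko_a": 2e-4, "ko_b": 1e-4, "ko_c": 1e-3, "ko_d": 0.0}
ratios = ratio_to_reference(subject, healthy_avg)
print(len(ratios), round(fraction_within(ratios), 3))  # 3 0.667
```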
Our Research Shows Large Changes
in Protein Families Between Health and Disease (Ileal Crohn’s)
Ratio of CD Average to Healthy Average for Each Nonzero KEGG
Over 7000 KEGGs Are Nonzero in Both Health and Disease States
KEGGs Greatly Increased in the Disease State
KEGGs Greatly Decreased in the Disease State
Note the High/Low Symmetry
Similar Results for UC and LS
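Sorting KEGGs into "greatly increased" and "greatly decreased" can be sketched as a log-ratio cutoff applied to group averages. The slide gives no threshold, so the 100x cutoff below is illustrative only, and the IDs and values are hypothetical.

```python
# Sketch: flag KEGGs greatly increased or decreased in disease versus health.
# Assumptions: inputs are group-average relative abundances keyed by KEGG ID,
# and the 100x cutoff is illustrative; the slide does not state a threshold.
import math
from typing import Dict, List, Tuple

def split_extremes(disease_avg: Dict[str, float], healthy_avg: Dict[str, float],
                   fold: float = 100.0) -> Tuple[List[str], List[str]]:
    """Return (increased, decreased) KEGG IDs whose ratio exceeds fold or 1/fold."""
    cutoff = math.log10(fold)
    increased, decreased = [], []
    for k in disease_avg.keys() & healthy_avg.keys():
        d, h = disease_avg[k], healthy_avg[k]
        if d <= 0 or h <= 0:
            continue  # keep only KEGGs nonzero in both states
        log_ratio = math.log10(d / h)
        if log_ratio >= cutoff:
            increased.append(k)
        elif log_ratio <= -cutoff:
            decreased.append(k)
    return sorted(increased), sorted(decreased)

# Hypothetical averages for four KEGG-like IDs.
healthy = {"ko_up": 1e-4, "ko_down": 1e-4, "ko_flat": 1e-4, "ko_absent": 1e-4}
disease = {"ko_up": 2e-2, "ko_down": 5e-7, "ko_flat": 1.5e-4, "ko_absent": 0.0}
up, down = split_extremes(disease, healthy)
print(up, down)  # ['ko_up'] ['ko_down']
```

Comparing the sizes of the two lists across all nonzero KEGGs is one simple way to quantify the high/low symmetry the slide points out.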
We Found a Set of Ayasdi Lenses That Separate Out
the 43 Extreme KEGGs Common to the Disease States
K00108(choline_dehydrogenase)
K00673(arginine_N-succinyltransferase)
K00867(type_I_pantothenate_kinase)
K01169(ribonuclease_I_(enterobacter_ribonuclease))
K01484(succinylarginine_dihydrolase)
K01682(aconitate_hydratase_2)
K01690(phosphogluconate_dehydratase)
K01825(3-hydroxyacyl-CoA_dehydrogenase_/_enoyl-CoA_hydratase_/3-hydroxybutyryl-CoA_epimerase_/_e
K02173(hypothetical_protein)
K02317(DNA_replication_protein_DnaT)
K02466(glucitol_operon_activator_protein)
K02846(N-methyl-L-tryptophan_oxidase)
K03081(3-dehydro-L-gulonate-6-phosphate_decarboxylase)
K03119(taurine_dioxygenase)
K03181(chorismate--pyruvate_lyase)
K03807(AmpE_protein)
K05522(endonuclease_VIII)
K05775(maltose_operon_periplasmic_protein)
K05812(conserved_hypothetical_protein)
K05997(Fe-S_cluster_assembly_protein_SufA)
K06073(vitamin_B12_transport_system_permease_protein)
K06205(MioC_protein)
K06445(acyl-CoA_dehydrogenase)
K06447(succinylglutamic_semialdehyde_dehydrogenase)
K07229(TrkA_domain_protein)
K07232(cation_transport_protein_ChaC)
K07312(putative_dimethyl_sulfoxide_reductase_subunit_YnfH_(DMSO_reductase_anchor_subunit))
K07336(PKHD-type_hydroxylase)
K08989(putative_membrane_protein)
K09018(putative_monooxygenase_RutA)
K09456(putative_acyl-CoA_dehydrogenase)
K09998(arginine_transport_system_permease_protein)
K10748(DNA_replication_terminus_site-binding_protein)
K11209(GST-like_protein)
K11391(ribosomal_RNA_large_subunit_methyltransferase_G)
K11734(aromatic_amino_acid_transport_protein_AroP)
K11735(GABA_permease)
K11925(SgrR_family_transcriptional_regulator)
K12288(pilus_assembly_protein_HofM)
K13255(ferric_iron_reductase_protein_FhuF)
K14588()
K15733()
K15834()
L-Infinity Centrality Lens Using Norm Correlation as Metric (Resolution: 242, Gain: 5.7)
Entropy & Variance Lens Using Angle as Metric (Resolution: 30, Gain: 3.00)
Analysis by Mehrdad Yazdani, Calit2
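Each lens above is tuned by a resolution and a gain. A minimal sketch of how such parameters typically shape a Mapper graph, under stated assumptions rather than Ayasdi's documented internals: resolution is the number of intervals over the lens range, gain is interval length divided by the step between interval starts (so a gain of 3 would mean adjacent intervals overlap by two thirds), and each nonempty interval becomes one node with no per-interval clustering. The L-infinity centrality lens here uses Euclidean distance as a stand-in for the slide's norm-correlation metric.

```python
# Sketch of the Mapper cover step with "resolution" and "gain" parameters.
# Assumptions (not Ayasdi's documented internals): resolution = number of
# intervals over the lens range, gain = interval length / step between starts,
# and each nonempty interval becomes one node (no per-interval clustering).
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

def l_infinity_centrality(points: Sequence[Sequence[float]]) -> List[float]:
    """Lens value per point: its largest distance to any other point (Euclidean here)."""
    return [max(math.dist(p, q) for q in points) for p in points]

def mapper_cover(lens: Sequence[float], resolution: int, gain: float) -> List[Set[int]]:
    """Overlapping intervals over the lens range; returns the point indices in each."""
    lo, hi = min(lens), max(lens)
    step = (hi - lo) / (resolution - 1 + gain)
    bins = []
    for i in range(resolution):
        start = lo + i * step
        end = hi if i == resolution - 1 else start + gain * step  # close the range exactly
        members = {j for j, v in enumerate(lens) if start <= v <= end}
        if members:
            bins.append(members)
    return bins

def mapper_graph(lens, resolution, gain) -> Tuple[List[Set[int]], List[Tuple[int, int]]]:
    """Nodes are nonempty intervals; edges join nodes that share points."""
    nodes = mapper_cover(lens, resolution, gain)
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges

nodes, edges = mapper_graph([0.0, 1.0, 2.0, 3.0, 4.0], resolution=3, gain=2.0)
print(nodes)  # [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```

Raising the resolution splits the lens range into more, narrower intervals (more nodes); raising the gain widens the overlap, so more nodes share points and the graph gains edges.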
Disease Arises from Perturbed Protein Family Networks:
Dynamics of a Prion Perturbed Network in Mice
Source: Lee Hood, ISB
Our Next Goal Is to Create
Such Perturbed Networks in Humans
Calit2’s Qualcomm Institute Has Developed
Interactive Scalable Visualization for Biological Networks
20,000 Samples
60,000 OTUs
18 Million Edges
Runs Natively on 64 Million Pixels
UCSD Microbial Sciences Initiative
Center for Microbiome Innovation
Seminars, Faculty Hiring, Education, Instrument Cores, Seed Grants, Fellowships
Chancellor Khosla Launched the UC San Diego
Microbiome and Microbial Sciences Initiative on October 29, 2015
Thanks to Our Great Team!
Calit2@UCSD
Future Patient Team
Jerry Sheehan
Tom DeFanti
Joe Keefe
John Graham
Kevin Patrick
Mehrdad Yazdani
Jurgen Schulze
Andrew Prudhomme
Philip Weber
Fred Raab
Ernesto Ramirez
JCVI Team
Karen Nelson
Shibu Yooseph
Manolito Torralba
Ayasdi
Devi Ramanan
Pek Lum
UCSD Metagenomics Team
Weizhong Li
Sitao Wu
SDSC Team
Michael Norman
Mahidhar Tatineni
Robert Sinkovits
Ilkay Altintas
UCSD Health Sciences Team
David Brenner
Rob Knight Lab
Justine Debelius
Jose Navas
Bryn Taylor
Gail Ackermann
Greg Humphrey
William J. Sandborn Lab
Elisabeth Evans
John Chang
Brigid Boland
Dell/R Systems
Brian Kucic
John Thompson
Thomas Hill
