“Using Supercomputers and Data Science
to Reveal Your Inner Microbiome”
Invited Data Sciences Lecture
School of Informatics and Computing
Indiana University
April 29, 2016
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
Abstract
The human body is host to 100 trillion microorganisms, ten times the number of cells in the human body, and these microbes
contain 300 times the number of genes that our human DNA does. The microbial component of our “superorganism”
comprises hundreds of species with immense biodiversity. Thanks to the National Institutes of Health’s Human Microbiome
Project, researchers have been discovering the states of the human microbiome in health and disease. To put a more personal
face on the “patient of the future,” I have been collecting massive amounts of data from my own body over the last ten years,
which reveals detailed examples of the episodic evolution of this coupled immune-microbial system. An elaborate software
pipeline, running on high performance computers, reveals the details of the microbial ecology and its genetic components. A
variety of data science techniques are used to pull biomedical insights from this large data set. We can look forward to
revolutionary changes in medical practice over the next decade.
From One to a Trillion Data Points Defining Me in 15 Years:
The Exponential Rise in Body Data
Weight
Blood Biomarker
Time Series
Human Genome
SNPs
Microbial Genome
Time Series
Improving Body
Discovering Disease
Human Genome
As a Model for the Precision Medicine Initiative,
I Have Tracked My Internal Biomarkers To Understand My Body’s Dynamics
My Quarterly
Blood Draw
Calit2 64 Megapixel VROOM
Only One of My Blood Measurements
Was Far Out of Range--Indicating Chronic Inflammation
Normal Range <1 mg/L
27x Upper Limit
C-Reactive Protein (CRP) is a Blood Biomarker
for Detecting Presence of Inflammation
Episodic Peaks in Inflammation
Followed by Spontaneous Drops
Adding Stool Tests Revealed
Oscillatory Behavior in an Immune Variable Which is Antibacterial
Normal Range
<7.3 µg/mL
124x Upper Limit for Healthy
Lactoferrin is a Protein Shed from Neutrophils -
An Antibacterial that Sequesters Iron
Typical
Lactoferrin Value for
Active Inflammatory
Bowel Disease
(IBD)
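The fold values quoted on the two biomarker slides can be turned back into approximate concentrations. The sketch below assumes that "Nx Upper Limit" means N times the upper bound of the normal range; the function and constant names are illustrative and do not come from the talk.

```python
def implied_peak(upper_limit, fold):
    """Concentration implied by a peak quoted as `fold` times the upper limit of normal."""
    if upper_limit <= 0 or fold <= 0:
        raise ValueError("upper_limit and fold must be positive")
    return upper_limit * fold


# Upper limits and fold values are read off the slides.
# Treating "Nx" as a simple multiplier is an assumption.
CRP_UPPER_MG_PER_L = 1.0
LACTOFERRIN_UPPER_UG_PER_ML = 7.3

crp_peak = implied_peak(CRP_UPPER_MG_PER_L, 27)                    # mg/L
lactoferrin_peak = implied_peak(LACTOFERRIN_UPPER_UG_PER_ML, 124)  # ug/mL

print(f"CRP peak is roughly {crp_peak:.0f} mg/L")
print(f"Lactoferrin peak is roughly {lactoferrin_peak:.0f} ug/mL")
```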
To Understand these Excursions of the Immune System
We Must Consider the Human Microbiome
Your Microbiome is
Your “Near-Body” Environment
and its Cells
Contain 300x as Many DNA Genes
As Your Human DNA-Bearing Cells
Your Body Has 10 Times
As Many Microbe Cells As DNA-Bearing
Human Cells
Inclusion of the “Dark Matter” of the Body
Will Radically Alter Medicine
New 2016 Estimates Indicate a Human Body Contains
~30 Trillion Human Cells and ~40 Trillion Microbes
However, Red Blood Cells and Platelets Have No Nuclear DNA.
Therefore, the Ratio of DNA-Bearing Microbial Cells to DNA-Bearing Human Cells is Still >10:1
DNA-Bearing Cells
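The arithmetic behind the ">10:1" statement can be sketched as follows. The two cell counts come from the slide; the fraction of human cells that lack a nucleus is not stated in the talk, so the 90% used here is a loudly labeled assumption chosen only to illustrate the calculation.

```python
HUMAN_CELLS = 30e12      # from the slide: ~30 trillion human cells
MICROBIAL_CELLS = 40e12  # from the slide: ~40 trillion microbes

# ASSUMPTION (not from the talk): about 90% of human cells are red blood
# cells or platelets, which carry no nuclear DNA.
ENUCLEATED_FRACTION = 0.90


def dna_bearing_ratio(microbes, human_cells, enucleated_fraction):
    """Ratio of microbial cells to human cells that carry nuclear DNA."""
    if not 0.0 <= enucleated_fraction < 1.0:
        raise ValueError("enucleated_fraction must be in [0, 1)")
    nucleated_human_cells = human_cells * (1.0 - enucleated_fraction)
    return microbes / nucleated_human_cells


ratio = dna_bearing_ratio(MICROBIAL_CELLS, HUMAN_CELLS, ENUCLEATED_FRACTION)
print(f"Microbial : DNA-bearing human cells is about {ratio:.1f} : 1")
```

Under this assumption the ratio stays above 10:1; a smaller enucleated fraction would lower it.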
The Human Gut
as a Super-Evolutionary Microbial Cauldron
• Enormous Density
– 1000x Ocean Water
• Highly Dynamic Microbial Ecology
– Hundreds to Thousands of Species
• Horizontal Gene Transfer
• Phages
• Adaptive Selection Pressures (Immune System)
– Innate Immune System
– Adaptive Immune System
– Macrophages and Antimicrobial proteins
• Constantly Changing Environmental Pressures
– Diet
– Antibiotics
– Pharmaceuticals
How Can
Data Science
Elucidate This
Dynamical System?
We Gathered Raw Illumina Reads on 275 Humans
and Generated a Time Series of My Gut Microbiome
“Healthy” Individuals: 250 Subjects, 1 Point in Time
Inflammatory Bowel Disease (IBD) Patients:
5 Ileal Crohn’s Patients, 3 Points in Time
2 Ulcerative Colitis Patients, 6 Points in Time
Larry Smarr (Colonic Crohn’s): 7 Points in Time
Each Sample Has 100-200 Million Illumina Short Reads (100 bases)
Total of 27 Billion Reads
Or 2.7 Trillion Bases
Source: Jerry Sheehan, Calit2
Weizhong Li, Sitao Wu, CRBS, UCSD
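The read and base totals on this slide are linked by the read length. A minimal check, using only the numbers quoted on the slide (the function and variable names are illustrative):

```python
def total_bases(n_reads, read_length):
    """Number of sequenced bases for n_reads reads of a fixed length."""
    return n_reads * read_length


TOTAL_READS = 27_000_000_000  # "Total of 27 Billion Reads"
READ_LENGTH = 100             # "Illumina Short Reads (100 bases)"

all_bases = total_bases(TOTAL_READS, READ_LENGTH)

# Per-sample range implied by "100-200 Million" reads per sample.
per_sample_low = total_bases(100_000_000, READ_LENGTH)
per_sample_high = total_bases(200_000_000, READ_LENGTH)

print(f"All samples: {all_bases / 1e12:.1f} trillion bases")
print(f"Per sample: {per_sample_low / 1e9:.0f} to {per_sample_high / 1e9:.0f} billion bases")
```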
To Map Out the Dynamics of Autoimmune Microbiome Ecology
Couples Next Generation Genome Sequencers to Big Data Supercomputers
• Metagenomic Sequencing
– JCVI Produced ~150 Billion DNA Bases From
Seven LS Stool Samples Over 1.5 Years
– We Downloaded ~3 Trillion DNA Bases
From the NIH Human Microbiome Project Database
– 255 Healthy People, 21 with IBD
• Supercomputing (Weizhong Li, JCVI/HLI/UCSD):
– ~20 CPU-Years on SDSC’s Gordon
– ~4 CPU-Years on Dell’s HPC Cloud
• Produced Relative Abundance of
– ~10,000 Bacteria, Archaea, Viruses in ~300 People
– ~3 Million Filled Spreadsheet Cells
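The "relative abundance" produced by the pipeline can be illustrated with a minimal sketch. It assumes that relative abundance means each taxon's share of the reads assigned in one sample; the taxa and counts below are hypothetical, and the real pipeline is far more involved than this normalization step.

```python
def relative_abundance(counts):
    """Convert per-taxon read counts for one sample into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical read counts for a single sample.
sample_counts = {"Bacteroides": 600, "Faecalibacterium": 300, "Escherichia": 100}
abundances = relative_abundance(sample_counts)

# Size of the full table quoted on the slide: ~10,000 taxa by ~300 people.
table_cells = 10_000 * 300

for taxon, fraction in abundances.items():
    print(f"{taxon}: {fraction:.2f}")
print(f"Table size: {table_cells:,} cells")
```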
Illumina HiSeq 2000 at JCVI
SDSC Gordon Data Supercomputer
Example: Inflammatory Bowel Disease (IBD)
Computational NextGen Sequencing Pipeline:
From Sequence to Taxonomy and Function
PI: Weizhong Li, CRBS, UCSD
NIH R01HG005978 (2010-2013, $1.1M)
Scalable Visualization Allows Comparison of
the Relative Abundance of 200 Microbe Species
Calit2 VROOM-FuturePatient Expedition
Comparing 3 LS Time Snapshots (Left)
with Healthy, Crohn’s, Ulcerative Colitis (Right Top to Bottom)
The Carl Woese Tree of Life
Shows That Most Life on Earth is Bacterial
Source: Hug, et al., Nature Microbiology
Source: Carl Woese, et al. (1990)
You Are Here
When We Think About Biological Diversity
We Typically Think of the Wide Range of Animals
But All These Animals Are in One SubPhylum Vertebrata
of the Chordata Phylum
All images from Wikimedia Commons.
Photos are public domain or by Trisha Shears, Richard Bartz, & Matt Clancy
Think of These Phyla of Animals When
You Consider the Biodiversity of Microbes Inside You
Phylum
Annelida
Phylum
Echinodermata
Phylum
Cnidaria
Phylum
Mollusca
Phylum
Arthropoda
Phylum
Chordata
Phylum
Porifera
All images from WikiMedia Commons.
Photos are public domain or by Dan Hershman, Michael Linnenbach, Manuae, B_cool, Nick Hobgood
Results Include Relative Abundance
of Hundreds of Microbial Species
Average Over 250 Healthy People
From NIH Human Microbiome Project
Note Log Scale
Clostridium difficile
200 Most Abundant Species
Colored by Phyla
Using Microbiome Profiles to Survey 155 Subjects
for Unhealthy Candidates
Using HPC and Data Analytics
to Discover Microbial Disease Dynamics
• Can Data Distinguish Between Health and Disease Subtypes?
• Can Data Track the Time Development of the Disease State?
• Can Data Discover Functional Microbiome Gene Changes Between Health and Disease?
Can Data Distinguish Between
Health and Disease Subtypes?
We Found Major State Shifts in Microbial Ecology Phyla
Between Healthy and Three Forms of IBD
Most
Common
Microbial
Phyla
Average HE
Average
Ulcerative Colitis
Average LS
Colonic Crohn’s Disease
Average
Ileal Crohn’s Disease
Dell Analytics Separates The 4 Patient Types in Our Data
Using Our Microbiome Species Data
Source: Thomas Hill, Ph.D.
Executive Director Analytics
Dell | Information Management Group, Dell Software
Healthy
Ulcerative Colitis
Colonic Crohn’s
Ileal Crohn’s
Can Data Track
the Time Development of the Disease State?
The Knight Lab Uses the UniFrac Metric
to Quantitatively Compare Different Microbiome Ecologies
“This method, UniFrac, measures the phylogenetic distance
between sets of taxa in a phylogenetic tree
as the fraction of the branch length of the tree that leads to descendants
from either one environment or the other, but not both.
UniFrac can be used to determine whether communities are significantly different…”
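The definition quoted above can be made concrete with a small sketch of the unweighted form of UniFrac on a rooted tree. The toy tree, its branch lengths, and the function names are hypothetical; this is not the Knight Lab implementation, only an illustration of "branch length leading to one environment or the other, but not both".

```python
# Toy rooted tree: child -> (parent, branch length). The root has no entry.
# Each branch is identified by the name of its child node.
TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "t1": ("n1", 1.0),
    "t2": ("n1", 1.0),
    "t3": ("n2", 1.0),
    "t4": ("n2", 1.0),
}


def branches_to_root(taxon, tree):
    """Set of branches on the path from a tip up to the root."""
    path = set()
    node = taxon
    while node in tree:
        path.add(node)
        node = tree[node][0]
    return path


def unweighted_unifrac(community_a, community_b, tree):
    """Fraction of covered branch length that leads to only one community."""
    covered_a = set().union(*(branches_to_root(t, tree) for t in community_a))
    covered_b = set().union(*(branches_to_root(t, tree) for t in community_b))
    either = covered_a | covered_b
    unique = either - (covered_a & covered_b)
    total_length = sum(tree[b][1] for b in either)
    if total_length == 0:
        raise ValueError("communities cover no branches of the tree")
    return sum(tree[b][1] for b in unique) / total_length


disjoint = unweighted_unifrac({"t1", "t2"}, {"t3", "t4"}, TREE)
overlapping = unweighted_unifrac({"t1", "t2"}, {"t2", "t3"}, TREE)
identical = unweighted_unifrac({"t1", "t3"}, {"t1", "t3"}, TREE)
print(disjoint, overlapping, identical)
```

Communities that share no branches score 1, identical communities score 0, and partial overlap falls in between.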
A Healthy Person’s Microbiome
Is in a Stable Equilibrium Over Time
• Background is Human Microbiome Project Data
• Using UniFrac in Principal Coordinate Analysis
– Map Microbiome Ecologies of Individuals to Points
– Samples From Multiple Body Sites
• Overlay Longitudinal Time Series of Male and Female Subject
– Duration 60 Days
– Time Points Separated by One Day
– Sampled Oral, Skin, Stool Microbiomes
– 16S Sequencing
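The step of mapping each microbiome sample to a point can be sketched with classical Principal Coordinate Analysis, which takes a matrix of pairwise distances (such as UniFrac distances) and returns coordinates whose Euclidean distances approximate it. This is a generic textbook formulation using NumPy, not the Knight Lab's tooling, and the toy distance matrix is hypothetical.

```python
import numpy as np


def pcoa(distance_matrix, n_axes=2):
    """Classical PCoA: embed samples so Euclidean distances approximate the input."""
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the axes with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_axes]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


# Hypothetical distances between three samples.
toy_distances = np.array([
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
])
coords = pcoa(toy_distances, n_axes=2)
print(np.round(coords, 3))
```

Each row of `coords` is one sample; plotting the first two or three columns gives the kind of scatter shown on the slide.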
Mouth
Stool
Vagina
Skin
Source: Knight Lab, UCSD
An Unhealthy Person’s Microbiome
Can Abruptly Shift Between Two States With External Influence
• Example: Clostridium difficile and Fecal Transplant
• Multiple C. diff Patients With a Single Donor
• Dramatic Shift Back to Healthy Microbiome in Days
Fecal Transplants
From Healthy Donor
To C. Diff Patients
Source: Knight Lab, UCSD
In 2016 We Are Extending My Stool Time Series by
Collaborating with the UCSD Knight Lab
Larry’s 40 Stool Samples Over 3.5 Years
to Rob’s lab on April 30, 2015
Variation in My Gut Microbiome by 16S Families –
40 Samples Over 3.5 Years
Data from Justine Debelius & Jose Navas, Knight Lab, UCSD; Larry Smarr Analysis, January 2016
Larry Smarr Gut Microbiome Ecology Shifted After Drug Therapy
Between Two Time-Stable Equilibria Correlated to Physical Symptoms
Lialda & Uceris
12/1/13 to 1/1/14
Frequent IBD Symptoms
Weight Loss
5/1/12 to 12/1/13
Blue Balls on Diagram
to the Right
Few IBD Symptoms
Weight Gain
1/1/14 to 1/1/16
Red Balls on Diagram
to the Right
Principal Coordinate Analysis of
Microbiome Ecology
PCoA by Justine Debelius and Jose Navas,
Knight Lab, UCSD
Weight Data from Larry Smarr, Calit2, UCSD
Antibiotics
Prednisone
1/1/12 to 5/1/12
5/1/12
Weekly Weight (Red Dots Stool Sample)
Can Data Discover Functional Microbiome Gene Changes
Between Health and Disease?
We Computed the Relative Abundance of Microbial Gene Families -
~10,000 KEGG Orthologous Genes, Across Healthy and IBD Subjects
How Large is the Microbiome’s Genetic Change
Between Health and Disease States?
In a “Healthy” Gut Microbiome:
Large Taxonomy Variation, Low Protein Family Variation
Source: Nature, 486, 207-212 (2012)
Over 200 People
Ratio of HE11529 to Ave HE
Test to see How Much Variation There is Within Healthy
Most KEGGs Are Within 10x
Of Healthy for a Random HE
Ratio of Random HE11529 to Healthy Average for Each Nonzero KEGG
Similar to HMP Healthy Results
Our Research Shows Large Changes
in Protein Families Between Health and Disease – Ileal Crohn’s
KEGGs Greatly Increased
In the Disease State
KEGGs Greatly Decreased
In the Disease State
Over 7000 KEGGs Which Are Nonzero
in Health and Disease States
Ratio of CD Average to Healthy Average for Each Nonzero KEGG
Note Hi/Low
Symmetry
Similar Results for UC and LS
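The per-KEGG ratio plots described above can be sketched as a log fold-change calculation restricted to gene families that are nonzero in both groups. The KEGG labels and abundance values below are hypothetical placeholders, and the 10x threshold is borrowed from the "Within 10x" remark on the healthy-variation slide as an illustrative cutoff.

```python
import math


def log10_fold_changes(disease_avg, healthy_avg):
    """log10(disease / healthy) for each gene family that is nonzero in both groups."""
    shared = disease_avg.keys() & healthy_avg.keys()
    return {
        k: math.log10(disease_avg[k] / healthy_avg[k])
        for k in shared
        if disease_avg[k] > 0 and healthy_avg[k] > 0
    }


# Hypothetical average relative abundances per gene family.
healthy = {"KO_A": 1e-4, "KO_B": 2e-4, "KO_C": 5e-5, "KO_D": 0.0}
disease = {"KO_A": 1e-2, "KO_B": 2e-6, "KO_C": 6e-5, "KO_D": 1e-4}

fold_changes = log10_fold_changes(disease, healthy)

# More than 10x up or down corresponds to |log10 ratio| > 1.
increased = sorted(k for k, v in fold_changes.items() if v > 1.0)
decreased = sorted(k for k, v in fold_changes.items() if v < -1.0)

print("Greatly increased:", increased)
print("Greatly decreased:", decreased)
```

Working in log space makes increases and decreases symmetric around zero, which is what makes a high/low symmetry visible in such a plot.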
Using Ayasdi Topological Data Analysis
to Discover Hidden Patterns in Our Data
Using Ayasdi Interactively to Explore
Protein Families in Healthy and Disease States
Source: Pek Lum,
Formerly Chief Data Scientist, Ayasdi
Dataset from Larry Smarr Team
With 60 Subjects (HE, CD, UC, LS)
Each with 10,000 KEGGs -
600,000 Cells
CD is Missing a Population of Bacteria
That Exists in High Quantities in HE (Circled with Arrow)
Low in CD and LS
Source: Pek Lum,
Formerly Chief Data Scientist, Ayasdi
Disease Arises from Perturbed Protein Family Networks:
Dynamics of a Prion Perturbed Network in Mice
Source: Lee Hood, ISB
Our Next Goal is to Create
Such Perturbed Networks in Humans
Genetic and protein
interaction networks
Transcriptional networks
Metabolic networks
mRNA & protein
expression
UCSD’s Cytoscape Integrates and Visualizes
Molecular Networks and Molecular Profiles
Source: Trey Ideker, UCSD
Calit2’s Qualcomm Institute Has Developed
Interactive Scalable Visualization for Biological Networks
20,000 Samples
60,000 OTUs
18 Million Edges
Runs Native on 64 Million Pixels
Next Steps in
Knight/Smarr Lab Collaboration
• Smarr Gut Microbiome Time Series
– From 7 to 50 Times Over Four Years
• Healthy Human Microbiome
– Use 255+ Raw Reads from NIH Human Microbiome Project
• IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100
– 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank
– 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD Patients
• Illumina Reagent Grant Key
– Enables Deep Metagenomic (and 16S) Sequencing at IGM of Smarr + Sandborn Samples
• New Software Suite from Knight Lab
– Major Re-annotation of Reference Genomes, Functional and Taxonomic Variations
– Novel Assembly Algorithms from Pavel Pevzner: Very Computationally Intensive
• Supercomputer Grant On SDSC Comet (Awarded from XSEDE)
– From 25 Gordon to 100 Comet Core-Years
– Each Comet Core 40 GF Peak = 2x Gordon Core: 8x Increase in Compute
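The 8x figure follows from combining the larger allocation with the faster cores. A minimal sketch of that arithmetic, using only the numbers on the slide (the function name is illustrative):

```python
def effective_increase(old_core_years, new_core_years, per_core_speedup):
    """Growth in available compute when both the allocation and per-core speed change."""
    if old_core_years <= 0:
        raise ValueError("old_core_years must be positive")
    return new_core_years * per_core_speedup / old_core_years


GORDON_CORE_YEARS = 25   # previous allocation
COMET_CORE_YEARS = 100   # new allocation
COMET_VS_GORDON = 2.0    # each Comet core has about 2x the peak of a Gordon core

increase = effective_increase(GORDON_CORE_YEARS, COMET_CORE_YEARS, COMET_VS_GORDON)
print(f"Compute increase: {increase:.0f}x")
```

The allocation grows 4x in core-years and each core is about 2x faster at peak, giving 8x overall.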
Center for
Microbiome
Innovation
Seminars
Faculty
Hiring
Education
UCSD Microbial Sciences Initiative
Instrument
Cores
Seed Grants
Fellowships
Chancellor Khosla Launched the UC San Diego
Microbiome and Microbial Sciences Initiative October 29, 2015
Building a UC San Diego Cyberinfrastructure
to Support Integrative Omics
FIONA
12 Cores/GPU
128 GB RAM
3.5 TB SSD
48TB Disk
10Gbps NIC
Knight Lab
10Gbps
Gordon
Prism@UCSD
Data Oasis
7.5PB,
200GB/s
Knight 1024 Cluster
In SDSC Co-Lo
CHERuB
100Gbps
Emperor & Other Vis Tools
64Mpixel Data Analysis Wall
120Gbps
40Gbps
1.3Tbps
PRP/
The Pacific Wave Platform
Creates a Regional Science-Driven “Big Data Freeway System”
Source:
John Hess, CENIC
Funded by NSF $5M Oct 2015-2020
Flash Disk to Flash Disk File Transfer Rate
PI: Larry Smarr, UC San Diego Calit2
Co-PIs:
• Camille Crittenden, UC Berkeley CITRIS,
• Tom DeFanti, UC San Diego Calit2,
• Philip Papadopoulos, UC San Diego SDSC,
• Frank Wuerthwein, UC San Diego Physics
and SDSC
Thanks to Our Great Team!
Calit2@UCSD
Future Patient Team
Jerry Sheehan
Tom DeFanti
Joe Keefe
John Graham
Kevin Patrick
Mehrdad Yazdani
Jurgen Schulze
Andrew Prudhomme
Philip Weber
Fred Raab
Ernesto Ramirez
JCVI Team
Karen Nelson
Shibu Yooseph
Manolito Torralba
Ayasdi
Devi Ramanan
Pek Lum
UCSD Metagenomics Team
Weizhong Li
Sitao Wu
SDSC Team
Michael Norman
Mahidhar Tatineni
Robert Sinkovits
Ilkay Altintas
UCSD Health Sciences Team
David Brenner
Rob Knight Lab
Justine Debelius
Jose Navas
Bryn Taylor
Gail Ackermann
Greg Humphrey
William J. Sandborn Lab
Elisabeth Evans
John Chang
Brigid Boland
Dell/R Systems
Brian Kucic
John Thompson
Thomas Hill

Using Supercomputers and Data Science to Reveal Your Inner Microbiome

  • 1. “Using Supercomputers and Data Science to Reveal Your Inner Microbiome” Invited Data Sciences Lecture School of Informatics and Computing Indiana University April 29, 2016 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net 1
  • 2. Abstract The human body is host to 100 trillion microorganisms, ten times the number of cells in the human body and these microbes contain 300 times the number of DNA genes that our human DNA does. The microbial component of our “superorganism” is comprised of hundreds of species with immense biodiversity. Thanks to the National Institutes of Health’s Human Microbiome Program researchers have been discovering the states of the human microbiome in health and disease. To put a more personal face on the “patient of the future,” I have been collecting massive amounts of data from my own body over the last ten years, which reveals detailed examples of the episodic evolution of this coupled immune-microbial system. An elaborate software pipeline, running on high performance computers, reveals the details of the microbial ecology and its genetic components. A variety of data science techniques are used to pull biomedical insights from this large data set. We can look forward to revolutionary changes in medical practice over the next decade.
  • 3. From One to a Trillion Data Points Defining Me in 15 Years: The Exponential Rise in Body Data Weight Blood Biomarker Time Series Human Genome SNPs Microbial Genome Time Series Improving Body Discovering Disease Human Genome
  • 4. As a Model for the Precision Medicine Initiative, I Have Tracked My Internal Biomarkers To Understand My Body’s Dynamics My Quarterly Blood Draw Calit2 64 Megapixel VROOM
  • 5. Only One of My Blood Measurements Was Far Out of Range--Indicating Chronic Inflammation Normal Range <1 mg/L 27x Upper Limit Complex Reactive Protein (CRP) is a Blood Biomarker for Detecting Presence of Inflammation Episodic Peaks in Inflammation Followed by Spontaneous Drops
  • 6. Adding Stool Tests Revealed Oscillatory Behavior in an Immune Variable Which is Antibacterial Normal Range <7.3 µg/mL 124x Upper Limit for Healthy Lactoferrin is a Protein Shed from Neutrophils - An Antibacterial that Sequesters Iron Typical Lactoferrin Value for Active Inflammatory Bowel Disease (IBD)
  • 7. To Understand these Excursions of the Immune System We Must Consider the Human Microbiome Your Microbiome is Your “Near-Body” Environment and its Cells Contain 300x as Many DNA Genes As Your Human DNA-Bearing Cells Your Body Has 10 Times As Many Microbe Cells As DNA-Bearing Human Cells Inclusion of the “Dark Matter” of the Body Will Radically Alter Medicine
  • 8. New Estimates In 2016 Estimate a Human Body Contains ~30 Trillion Human Cells and ~40 Trillion Microbes However, Red Blood Cells and Platelets Have No Nuclear DNA. Therefore, Ratio of DNA-Bearing Cells for Human vs. Microbiome is Still >10:1 DNA-Bearing Cells
  • 9. The Human Gut as a Super-Evolutionary Microbial Cauldron • Enormous Density – 1000x Ocean Water • Highly Dynamic Microbial Ecology – Hundreds to Thousands of Species • Horizontal Gene Transfer • Phages • Adaptive Selection Pressures (Immune System) – Innate Immune System – Adaptive Immune System – Macrophages and Antimicrobial proteins • Constantly Changing Environmental Pressures – Diet – Antibiotics – Pharmaceuticals How Can Data Science Elucidate This Dynamical System?
  • 10. We Gathered Raw Illumina Reads on 275 Humans and Generated a Time Series of My Gut Microbiome 5 Ileal Crohn’s Patients, 3 Points in Time 2 Ulcerative Colitis Patients, 6 Points in Time “Healthy” Individuals Source: Jerry Sheehan, Calit2 Weizhong Li, Sitao Wu, CRBS, UCSD Total of 27 Billion Reads Or 2.7 Trillion Bases Inflammatory Bowel Disease (IBD) Patients 250 Subjects 1 Point in Time 7 Points in Time Each Sample Has 100-200 Million Illumina Short Reads (100 bases) Larry Smarr (Colonic Crohn’s)
  • 11. To Map Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers • Metagenomic Sequencing – JCVI Produced – ~150 Billion DNA Bases From Seven of LS Stool Samples Over 1.5 Years – We Downloaded ~3 Trillion DNA Bases From NIH Human Microbiome Program Data Base – 255 Healthy People, 21 with IBD • Supercomputing (Weizhong Li, JCVI/HLI/UCSD): – ~20 CPU-Years on SDSC’s Gordon – ~4 CPU-Years on Dell’s HPC Cloud • Produced Relative Abundance of – ~10,000 Bacteria, Archaea, Viruses in ~300 People – ~3 Million Filled Spreadsheet Cells Illumina HiSeq 2000 at JCVI SDSC Gordon Data Supercomputer Example: Inflammatory Bowel Disease (IBD)
  • 12. Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function PI: (Weizhong Li, CRBS, UCSD): NIH R01HG005978 (2010-2013, $1.1M)
  • 13. Using Scalable Visualization Allows Comparison of the Relative Abundance of 200 Microbe Species Calit2 VROOM-FuturePatient Expedition Comparing 3 LS Time Snapshots (Left) with Healthy, Crohn’s, Ulcerative Colitis (Right Top to Bottom)
  • 14. The Carl Woese Tree of Life Shows The Most Life on Earth is Bacterial Nature Microbiology Hug, et al. Source: Carl Woese, et al (1990) You Are Here
  • 15. When We Think About Biological Diversity We Typically Think of the Wide Range of Animals, But All These Animals Are in One Subphylum, Vertebrata, of the Chordata Phylum • All images from Wikimedia Commons. Photos are public domain or by Trisha Shears, Richard Bartz, & Matt Clancy
  • 16. Think of These Phyla of Animals When You Consider the Biodiversity of Microbes Inside You • Phylum Annelida • Phylum Echinodermata • Phylum Cnidaria • Phylum Mollusca • Phylum Arthropoda • Phylum Chordata • Phylum Porifera • All images from Wikimedia Commons. Photos are public domain or by Dan Hershman, Michael Linnenbach, Manuae, B_cool, Nick Hobgood
  • 17. Results Include Relative Abundance of Hundreds of Microbial Species • Average Over 250 Healthy People From NIH Human Microbiome Project • 200 Most Abundant Species, Colored by Phyla • Note Log Scale • Labeled Species: Clostridium difficile
  • 18. Using Microbiome Profiles to Survey 155 Subjects for Unhealthy Candidates
  • 19. Using HPC and Data Analytics to Discover Microbial Disease Dynamics • Can Data Distinguish Between Health and Disease Subtypes? • Can Data Track the Time Development of the Disease State? • Can Data Discover Functional Microbiome Gene Changes Between Health and Disease?
  • 20. Can Data Distinguish Between Health and Disease Subtypes?
  • 21. We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Three Forms of IBD • Most Common Microbial Phyla • Panels: Average HE; Average Ulcerative Colitis; Average LS Colonic Crohn’s Disease; Average Ileal Crohn’s Disease
  • 22. Dell Analytics Separates the 4 Patient Types in Our Data Using Our Microbiome Species Data • Patient Types: Healthy, Ulcerative Colitis, Colonic Crohn’s, Ileal Crohn’s • Source: Thomas Hill, Ph.D., Executive Director Analytics, Dell | Information Management Group, Dell Software
  • 23. Can Data Track the Time Development of the Disease State?
  • 24. The Knight Lab Uses the UniFrac Metric to Quantitatively Compare Different Microbiome Ecologies • “This method, UniFrac, measures the phylogenetic distance between sets of taxa in a phylogenetic tree as the fraction of the branch length of the tree that leads to descendants from either one environment or the other, but not both. UniFrac can be used to determine whether communities are significantly different…”
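The quoted definition can be made concrete with a small sketch of the unweighted variant: sum the branch lengths that lead to taxa from only one community, and divide by the branch lengths that lead to taxa from either. The tree encoding (a child-to-parent dictionary), the function name, and the toy tree are assumptions made for illustration; this is not the Knight Lab's implementation.

```python
def unweighted_unifrac(parent, length, taxa_a, taxa_b):
    """Fraction of branch length leading to taxa from only one of two communities.

    parent: maps each non-root node to its parent node.
    length: maps each non-root node to the length of the branch above it.
    taxa_a, taxa_b: sets of leaf names observed in each community.
    """
    def branches_to_root(taxa):
        covered = set()
        for node in taxa:
            # Walk up the tree; the root has no entry in `parent`.
            while node in parent:
                covered.add(node)
                node = parent[node]
        return covered

    a = branches_to_root(taxa_a)
    b = branches_to_root(taxa_b)
    unique = sum(length[n] for n in a ^ b)  # branches seen by exactly one community
    total = sum(length[n] for n in a | b)   # branches seen by either community
    return unique / total if total else 0.0


# Hypothetical four-leaf tree with unit branch lengths.
toy_parent = {"t1": "x", "t2": "x", "t3": "y", "t4": "y", "x": "root", "y": "root"}
toy_length = {node: 1.0 for node in toy_parent}

d_partial = unweighted_unifrac(toy_parent, toy_length, {"t1", "t2"}, {"t2", "t3"})
d_same = unweighted_unifrac(toy_parent, toy_length, {"t1", "t2"}, {"t1", "t2"})
d_disjoint = unweighted_unifrac(toy_parent, toy_length, {"t1"}, {"t3"})
```

Identical communities give 0 and communities that share no branches give 1; partially overlapping communities fall in between.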
  • 25. A Healthy Person’s Microbiome Is in a Stable Equilibrium Over Time • Background is Human Microbiome Project Data • Using UniFrac in Principal Coordinate Analysis – Map Microbiome Ecologies of Individuals to Points – Samples From Multiple Body Sites • Overlay Longitudinal Time Series of Male and Female Subjects – Duration 60 Days – Time Points Separated by One Day – Sampled Oral, Skin, Stool Microbiomes – 16S Sequencing
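Principal coordinate analysis maps a matrix of pairwise distances (here, UniFrac distances) to points in a low-dimensional space. A minimal, dependency-free sketch of the classical procedure follows: double-center the squared distances, then take the leading eigenvector scaled by the square root of its eigenvalue. Only the first axis is computed, using power iteration, which assumes the centered matrix has no negative eigenvalue of larger magnitude; in practice a library eigen-solver would be used. Function names and the toy distances are hypothetical.

```python
import math


def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # D is symmetric, so column means equal row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def first_pcoa_axis(dist, iterations=200):
    """Coordinates of each sample on the first principal coordinate."""
    b = double_center(dist)
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]


# Toy example: three samples whose distances come from positions on a line.
positions = [0.0, 3.0, 7.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
axis1 = first_pcoa_axis(toy_dist)
```

For this toy input the first axis alone reproduces the original pairwise distances, because the samples truly lie on a line; real microbiome data needs several axes and the fit is approximate.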
  • 29. An Unhealthy Person’s Microbiome Can Abruptly Shift Between Two States With External Influence • Example: Clostridium difficile and Fecal Transplant • Multiple C. diff Patients With a Single Donor • Dramatic Shift Back to Healthy Microbiome in Days
  • 31. Fecal Transplants From Healthy Donor to C. diff Patients • Source: Knight Lab, UCSD
  • 32. In 2016 We Are Extending My Stool Time Series by Collaborating with the UCSD Knight Lab • Photo Caption: Larry’s 40 Stool Samples Over 3.5 Years to Rob’s Lab on April 30, 2015
  • 33. Variation in My Gut Microbiome by 16S Families – 40 Samples Over 3.5 Years Data from Justine Debelius & Jose Navas, Knight Lab, UCSD; Larry Smarr Analysis, January 2016
  • 34. Larry Smarr Gut Microbiome Ecology Shifted After Drug Therapy Between Two Time-Stable Equilibriums Correlated to Physical Symptoms • Antibiotics, Prednisone: 1/1/12 to 5/1/12 • Frequent IBD Symptoms, Weight Loss: 5/1/12 to 12/1/14 (Blue Balls on Diagram to the Right) • Lialda & Uceris: 12/1/13 to 1/1/14 • Few IBD Symptoms, Weight Gain: 1/1/14 to 1/1/16 (Red Balls on Diagram to the Right) • Weekly Weight (Red Dots: Stool Samples) • Principal Coordinate Analysis of Microbiome Ecology: PCoA by Justine Debelius and Jose Navas, Knight Lab, UCSD • Weight Data from Larry Smarr, Calit2, UCSD
  • 35. Can Data Discover Functional Microbiome Gene Changes Between Health and Disease?
  • 36. We Computed the Relative Abundance of Microbial Gene Families (~10,000 KEGG Orthologous Genes) Across Healthy and IBD Subjects • How Large is the Microbiome’s Genetic Change Between Health and Disease States?
  • 37. In a “Healthy” Gut Microbiome: Large Taxonomy Variation, Low Protein Family Variation Source: Nature, 486, 207-212 (2012) Over 200 People
  • 38. Ratio of HE11529 to Ave HE: Test to See How Much Variation There is Within Healthy • Most KEGGs Are Within 10x of Healthy for a Random HE • Plot: Ratio of Random HE11529 to Healthy Average for Each Nonzero KEGG • Similar to HMP Healthy Results
  • 39. Our Research Shows Large Changes in Protein Families Between Health and Disease – Ileal Crohn’s • KEGGs Greatly Increased in the Disease State • KEGGs Greatly Decreased in the Disease State • Over 7000 KEGGs Which Are Nonzero in Health and Disease States • Plot: Ratio of CD Average to Healthy Average for Each Nonzero KEGG • Note Hi/Low Symmetry • Similar Results for UC and LS
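The ratio plots described on the two slides above compare, gene family by gene family, a subject's or group's average relative abundance against the healthy average, keeping only KEGGs that are nonzero in both. A minimal sketch of that comparison follows, using a log10 ratio so that increases and decreases appear symmetric. The function names, gene-family labels, and abundance values are all hypothetical placeholders, not values from the study.

```python
import math


def log10_ratios(group_avg, healthy_avg):
    """log10(group / healthy) for gene families that are nonzero in both profiles."""
    ratios = {}
    for family, healthy in healthy_avg.items():
        group = group_avg.get(family, 0.0)
        if healthy > 0 and group > 0:
            ratios[family] = math.log10(group / healthy)
    return ratios


def count_beyond_fold(log_ratios, fold=10.0):
    """Number of gene families changed by more than `fold` in either direction."""
    threshold = math.log10(fold)
    return sum(1 for value in log_ratios.values() if abs(value) > threshold)


# Hypothetical average relative abundances (illustrative only).
healthy = {"family_a": 1e-4, "family_b": 2e-4, "family_c": 0.0, "family_d": 5e-4}
disease = {"family_a": 1e-2, "family_b": 2e-4, "family_c": 1e-3, "family_d": 1e-5}

ratios = log10_ratios(disease, healthy)
n_large_changes = count_beyond_fold(ratios, fold=10.0)
```

Applying the same function to one healthy subject against the healthy average gives the within-healthy baseline that the first of the two slides uses as a control.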
  • 40. Using Ayasdi Topological Data Analysis to Discover Hidden Patterns in Our Data
  • 41. Using Ayasdi Interactively to Explore Protein Families in Healthy and Disease States Source: Pek Lum, Formerly Chief Data Scientist, Ayasdi Dataset from Larry Smarr Team With 60 Subjects (HE, CD, UC, LS) Each with 10,000 KEGGs - 600,000 Cells
  • 42. CD is Missing a Population of Bacteria That Exists in High Quantities in HE (Circled with Arrow) • Low in CD and LS • Source: Pek Lum, Formerly Chief Data Scientist, Ayasdi
  • 43. Disease Arises from Perturbed Protein Family Networks: Dynamics of a Prion Perturbed Network in Mice • Source: Lee Hood, ISB • Our Next Goal is to Create Such Perturbed Networks in Humans
  • 44. UCSD’s Cytoscape Integrates and Visualizes Molecular Networks and Molecular Profiles • Genetic and Protein Interaction Networks • Transcriptional Networks • Metabolic Networks • mRNA & Protein Expression • Source: Trey Ideker, UCSD
  • 45. Calit2’s Qualcomm Institute Has Developed Interactive Scalable Visualization for Biological Networks • 20,000 Samples • 60,000 OTUs • 18 Million Edges • Runs Native on 64 Million Pixels
  • 46. Next Steps in Knight/Smarr Lab Collaboration • Smarr Gut Microbiome Time Series – From 7 to 50 Times Over Four Years • Healthy Human Microbiome – Use 255+ Raw Reads from NIH Human Microbiome Project • IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100 – 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank – 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD Patients • Illumina Reagent Grant Key – Enables Deep Metagenomic (and 16S) Sequencing at IGM of Smarr + Sandborn Samples • New Software Suite from Knight Lab – Major Re-annotation of Reference Genomes, Functional and Taxonomic Variations – Novel Assembly Algorithms from Pavel Pevzner (Very Computationally Intensive) • Supercomputer Grant on SDSC Comet (Awarded from XSEDE) – From 25 Gordon to 100 Comet Core-Years – Each Comet Core 40 GF Peak = 2x Gordon Core: 8x Increase in Compute
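The 8x figure on the slide above follows from two factors: four times as many core-years, on cores that the slide rates at twice the peak speed. The short calculation below spells that out. Variable names are illustrative, and the per-core Gordon figure is only what the slide's 2x claim implies, not a separately sourced specification.

```python
gordon_core_years = 25       # prior allocation quoted on the slide
comet_core_years = 100       # new allocation quoted on the slide
comet_core_peak_gf = 40.0    # peak GFLOPS per Comet core, per the slide
comet_vs_gordon_core = 2.0   # slide: each Comet core = 2x a Gordon core

allocation_ratio = comet_core_years / gordon_core_years     # 4x more core-years
compute_increase = allocation_ratio * comet_vs_gordon_core  # 8x more peak compute

# Per-core Gordon peak implied by the slide's 2x statement (an inference).
implied_gordon_core_peak_gf = comet_core_peak_gf / comet_vs_gordon_core
```

This compares peak rates only; sustained throughput for an assembly or annotation workload would depend on memory and I/O behavior that the slide does not address.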
  • 47. Chancellor Khosla Launched the UC San Diego Microbiome and Microbial Sciences Initiative October 29, 2015 • Diagram Labels: UCSD Microbial Sciences Initiative; Center for Microbiome Innovation; Seminars; Faculty Hiring; Education; Instrument Cores; Seed Grants; Fellowships
  • 48. Building a UC San Diego Cyberinfrastructure to Support Integrative Omics • Diagram Labels: FIONA (12 Cores/GPU, 128 GB RAM, 3.5 TB SSD, 48 TB Disk, 10 Gbps NIC); Knight Lab 10 Gbps; Gordon; Prism@UCSD; Data Oasis 7.5 PB, 200 GB/s; Knight 1024 Cluster in SDSC Co-Lo; CHERuB 100 Gbps; Emperor & Other Vis Tools; 64 Mpixel Data Analysis Wall; 120 Gbps; 40 Gbps; 1.3 Tbps; PRP/
  • 49. The Pacific Wave Platform Creates a Regional Science-Driven “Big Data Freeway System” • Funded by NSF $5M Oct 2015-2020 • PI: Larry Smarr, UC San Diego Calit2 • Co-PIs: Camille Crittenden, UC Berkeley CITRIS; Tom DeFanti, UC San Diego Calit2; Philip Papadopoulos, UC San Diego SDSC; Frank Wuerthwein, UC San Diego Physics and SDSC • Figure: Flash Disk to Flash Disk File Transfer Rate • Source: John Hess, CENIC
  • 50. Thanks to Our Great Team! • Calit2@UCSD Future Patient Team: Jerry Sheehan, Tom DeFanti, Joe Keefe, John Graham, Kevin Patrick, Mehrdad Yazdani, Jurgen Schulze, Andrew Prudhomme, Philip Weber, Fred Raab, Ernesto Ramirez • JCVI Team: Karen Nelson, Shibu Yooseph, Manolito Torralba • Ayasdi: Devi Ramanan, Pek Lum • UCSD Metagenomics Team: Weizhong Li, Sitao Wu • SDSC Team: Michael Norman, Mahidhar Tatineni, Robert Sinkovits, Ilkay Altintas • UCSD Health Sciences Team: David Brenner • Rob Knight Lab: Justine Debelius, Jose Navas, Bryn Taylor, Gail Ackermann, Greg Humphrey • William J. Sandborn Lab: Elisabeth Evans, John Chang, Brigid Boland • Dell/R Systems: Brian Kucic, John Thompson, Thomas Hill