“Using Data Analytics to Discover 
the 100 Trillion Bacteria Living Within Each of Us” 
Invited Talk 
Ayasdi 
Menlo Park, CA 
December 5, 2014 
Dr. Larry Smarr 
Director, California Institute for Telecommunications and Information Technology 
Harry E. Gruber Professor, 
Dept. of Computer Science and Engineering 
Jacobs School of Engineering, UCSD 
http://lsmarr.calit2.net
From One to a Billion Data Points Defining Me: 
The Exponential Rise in Body Data in Just One Decade 
Billion: My Full DNA, 
MRI/CT Images 
Million: My DNA SNPs, 
Zeo, FitBit 
Hundred: My Blood Variables
One: My Weight
Blood 
Variables 
SNPs 
Microbial Genome 
Improving Body 
Discovering Disease
How Will Detailed Knowledge of Microbiome Ecology 
Radically Change Medicine and Wellness? 
Your Body Has 10 Times 
As Many Microbe Cells As Human Cells 
99% of Your 
DNA Genes 
Are in Microbe Cells 
Not Human Cells 
Challenge: 
Map Out Microbial Ecology and Function 
in Health and Disease States
Intense Scientific Research is Underway 
on Understanding the Human Microbiome 
June 8, 2012 June 14, 2012 
August 18, 2012
Mapping Out the Dynamics of Autoimmune Microbiome Ecology
Couples Next Generation Genome Sequencers to Big Data Supercomputers 
• Metagenomic Sequencing 
– JCVI Produced ~150 Billion DNA Bases From
Seven of LS Stool Samples Over 1.5 Years
– We Downloaded ~3 Trillion DNA Bases 
From NIH Human Microbiome Program Data Base 
– 255 Healthy People, 21 with IBD 
• Supercomputing (Weizhong Li, JCVI/HLI/UCSD): 
– ~20 CPU-Years on SDSC’s Gordon 
– ~4 CPU-Years on Dell’s HPC Cloud 
• Produced Relative Abundance of 
– ~10,000 Bacteria, Archaea, Viruses in ~300 People 
– ~3 Million Filled Spreadsheet Cells
Illumina HiSeq 2000 at JCVI 
SDSC Gordon Data Supercomputer 
Example: Inflammatory Bowel Disease (IBD)
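
The table size quoted above follows from the counts: roughly 10,000 taxa times roughly 300 people gives about 3 million cells. Below is a minimal sketch of how one column of such a relative-abundance table might be computed from per-taxon read counts. The function name and the toy counts are illustrative assumptions; this is not the JCVI/UCSD pipeline.

```python
def relative_abundance(counts):
    """Convert raw per-taxon read counts for one sample into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Toy sample with made-up read counts (illustration only).
sample = {"Bacteroidetes": 600, "Firmicutes": 300, "Proteobacteria": 100}
profile = relative_abundance(sample)

# Size of the full table described on the slide: taxa x people.
n_taxa, n_people = 10_000, 300
n_cells = n_taxa * n_people
```

Normalizing each sample to fractions is what makes profiles from samples with different sequencing depths comparable.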
Computational NextGen Sequencing Pipeline: 
From Sequence to Taxonomy and Function 
PI: Weizhong Li, CRBS, UCSD
NIH R01HG005978 (2010-2013, $1.1M)
Next Step 
Programmability, Scalability and Reproducibility using bioKepler 
www.kepler-project.org 
www.biokepler.org 
National Resources: (Gordon) (Comet) (Lonestar) (Stampede)
Optimized Cloud Resources
Local Cluster Resources
Source: Ilkay Altintas, SDSC
How Best to Analyze The Microbiome Datasets 
to Discover Patterns in Health and Disease? 
Can We Find New Noninvasive Diagnostics 
In Microbiome Ecologies?
We Found Major State Shifts in Microbial Ecology Phyla 
Between Healthy and Two Forms of IBD 
Most Common Microbial Phyla
Average HE
Average Ulcerative Colitis Average LS Average Crohn’s Disease
Collapse of Bacteroidetes
Explosion of Actinobacteria
Explosion of Proteobacteria
Hybrid of UC and CD 
High Level of Archaea
Using Scalable Visualization Allows Comparison of 
the Relative Abundance of 200 Microbe Species 
Comparing 3 LS Time Snapshots (Left) 
with Healthy, Crohn’s, Ulcerative Colitis (Right Top to Bottom) 
Calit2 VROOM-FuturePatient Expedition
Using Dell HPC Cloud and Dell Analytics 
to Discover Microbial Diagnostics for Disease Dynamics 
• Can We Distinguish Noninvasively Between Health and Disease States? 
• Are There Subsets of Health or Disease States? 
• Can We Track Time Development of the Disease State? 
• Can Novel Microbial Diagnostics Differentiate Health and Disease States?
Using Microbiome Profiles to Survey 155 Subjects 
for Unhealthy Candidates
Dell Analytics Separates The 4 Patient Types in Our Data 
Using Our Microbiome Species Data 
Ulcerative Colitis 
Source: Thomas Hill, Ph.D. 
Executive Director Analytics 
Dell | Information Management Group, Dell Software 
Healthy 
Colonic Crohn’s 
Ileal Crohn’s
I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome 
Toward and Away from Healthy State – Colonic Crohn’s 
Source: Thomas Hill, Ph.D. 
Executive Director Analytics 
Dell | Information Management Group, Dell Software
I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome 
Toward and Away from Healthy State – Colonic Crohn’s 
Healthy 
Ileal Crohn’s 
Seven Time Samples Over 1.5 Years 
Colonic Crohn’s
Dell Analytics Tree Graphs Classify
the 4 Health/Disease States With Just 3 Microbe Species 
Source: Thomas Hill, Ph.D. 
Executive Director Analytics 
Dell | Information Management Group, Dell Software
Our Relative Abundance Results Across ~300 People 
Show Why Dell Analytics Tree Classifier Works 
UC 100x Healthy 
Healthy 100x CD 
LS 100x UC 
We Produced Similar Results for ~2500 Microbial Species
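
One way to read the 100x contrasts above: when a species' abundance differs by two orders of magnitude between two groups, a single cut placed midway on a log scale separates them, which is the kind of split a tree classifier can use. The sketch below illustrates that idea only. The abundances, labels, and function names are hypothetical, and this is not the Dell Analytics implementation.

```python
import math


def log_midpoint_threshold(mean_a, mean_b):
    """Cut halfway between two group means on a log scale (their geometric mean)."""
    return math.sqrt(mean_a * mean_b)


def classify(abundance, threshold, high_label, low_label):
    """Single-split rule: above the cut is one group, at or below is the other."""
    return high_label if abundance > threshold else low_label


# Hypothetical relative abundances of one species: Healthy is ~100x CD.
healthy_mean, cd_mean = 1e-2, 1e-4
cut = log_midpoint_threshold(healthy_mean, cd_mean)  # about 1e-3

label_high = classify(3e-3, cut, "Healthy", "CD")
label_low = classify(2e-4, cut, "Healthy", "CD")
```

With a 100x gap, each group mean sits a full factor of 10 away from the cut, so the rule tolerates considerable within-group spread.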
Using Ayasdi’s Advanced Topological Data Analysis 
to Separate Healthy from Disease States 
All Healthy 
All Healthy 
All Ileal Crohn’s 
Using Ayasdi Categorical Data Lens 
Healthy, Ulcerative 
Colitis, and LS 
All Healthy 
Analysis by Mehrdad Yazdani, Calit2 
Talk to Ayasdi in the Intel Booth at SC14
Ayasdi Enables Discovery of Differences Between 
Healthy and Disease States Using Microbiome Species 
Healthy LS 
Ileal Crohn’s Ulcerative Colitis 
High in Healthy and LS 
High in Healthy and 
Ulcerative Colitis 
High in Both LS and 
Ileal Crohn’s Disease 
Using Multidimensional Scaling Lens with Correlation Metric
Analysis by Mehrdad Yazdani, Calit2
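
A correlation metric treats two subjects as close when their species profiles rise and fall together, regardless of overall scale. A common definition is one minus Pearson's r; whether Ayasdi's metric uses exactly this form is an assumption here. The profiles below are made-up numbers.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(vx * vy)


def correlation_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1.0 - pearson_r(x, y)


# Hypothetical abundance profiles over four species.
subject_a = [1.0, 2.0, 3.0, 4.0]
subject_b = [2.0, 4.0, 6.0, 8.0]   # same shape as a, twice the scale
subject_c = [4.0, 3.0, 2.0, 1.0]   # reversed shape

d_ab = correlation_distance(subject_a, subject_b)
d_ac = correlation_distance(subject_a, subject_c)
```

A pairwise distance matrix built this way is the usual input to multidimensional scaling.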
From Taxonomy to Function: 
Analysis of LS Clusters of Orthologous Groups (COGs) 
Analysis: Weizhong Li & Sitao Wu, UCSD
In a “Healthy” Gut Microbiome: 
Large Taxonomy Variation, Low Protein Family Variation 
Over 200 People 
Source: Nature, 486, 207-212 (2012)
Ratio of HE11529 to Ave HE 
Test to see How Much Variation There is Within Healthy 
Ratio of Random HE11529 to Healthy Average for Each Nonzero KEGG 
Most KEGGs Are Within 10x 
Of Healthy for a Random HE
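
The within-healthy test above can be sketched as a per-KEGG ratio followed by a count of how many ratios fall inside a 10x band. The KEGG labels and abundance values below are placeholders, and the function names are illustrative assumptions.

```python
def ratios_to_reference(sample, reference):
    """Per-KEGG ratio sample/reference, keeping only KEGGs that are nonzero in both."""
    return {
        k: sample[k] / reference[k]
        for k in sample
        if sample[k] > 0 and reference.get(k, 0) > 0
    }


def fraction_within(ratios, fold=10.0):
    """Share of ratios lying between 1/fold and fold."""
    if not ratios:
        raise ValueError("no nonzero KEGGs to compare")
    inside = sum(1 for r in ratios.values() if 1.0 / fold <= r <= fold)
    return inside / len(ratios)


# Placeholder abundances for one subject and for the healthy average.
one_subject = {"KEGG_A": 2.0, "KEGG_B": 0.5, "KEGG_C": 300.0, "KEGG_D": 0.0}
healthy_avg = {"KEGG_A": 1.0, "KEGG_B": 1.0, "KEGG_C": 1.0, "KEGG_D": 1.0}

ratios = ratios_to_reference(one_subject, healthy_avg)
share_within_10x = fraction_within(ratios, fold=10.0)
```

Dropping zero entries first matches the "Each Nonzero KEGG" wording on the slide and avoids division by zero.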
However, Our Research Shows Large Changes 
in Protein Families Between Health and Disease 
Ratio of CD Average to Healthy Average for Each Nonzero KEGG 
KEGGs Greatly Increased 
In the Disease State 
Most KEGGs Are Within 10x 
In Healthy and Ileal Crohn’s Disease 
KEGGs Greatly Decreased 
In the Disease State 
Over 7000 KEGGs Which Are Nonzero 
in Health and Disease States 
Note Hi/Low 
Symmetry
Note UC Has Many Few KEGGs that are Much Smaller than HE; 
Also Fewer KEGGs That are Nonzero; Note Asymmetry Between High & Low 
Ratio of UC Average to Healthy Average for Each Nonzero KEGG 
Most KEGGs Are Within 10x 
In Healthy and Ulcerative Colitis 
KEGGs Greatly Increased 
In the Disease State 
KEGGs Greatly Decreased 
In the Disease State
Note LS001 Has Many Few KEGGs that are Much Smaller than HE; 
~Same # KEGGs That are Nonzero; Note Asymmetry Between High & Low 
Ratio of LS001 Average to Healthy Average for Each Nonzero KEGG 
Most KEGGs Are Within 10x 
In Healthy and LS001 
KEGGs Greatly Increased 
In the Disease State 
KEGGs Greatly Decreased 
In the Disease State
We Can Define a Subgroup of the 10,000 KEGGs 
Which Are Extreme in the Disease State 
• Look for KEGGs That Have the Properties: 
– Are 100x in All Four Disease States 
– LS001/Ave HE 
– Ave CD/ Ave HE 
– Ave UC/Ave HE 
– Sick HE Person/Ave HE 
• There are 48 of These Extreme KEGGs 
• A New Way to Define What is Wrong with the Microbiome in Disease? 
• Can We Devise an Ayasdi Lens That Can Separate These Extreme KEGGs?
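
The selection rule listed above can be sketched as a filter over four ratio tables. The sketch assumes "100x" means a ratio of at least 100 relative to the healthy average in every comparison; the slide does not spell out the direction, and the KEGG labels and values are placeholders.

```python
def extreme_keggs(ratio_tables, fold=100.0):
    """KEGG IDs whose ratio to the healthy average is >= fold in every table."""
    tables = list(ratio_tables.values())
    common = set.intersection(*(set(t) for t in tables))
    return sorted(k for k in common if all(t[k] >= fold for t in tables))


# Placeholder ratios for the four comparisons named on the slide.
ratio_tables = {
    "LS001/Ave HE": {"KEGG_A": 250.0, "KEGG_B": 120.0, "KEGG_C": 3.0},
    "Ave CD/Ave HE": {"KEGG_A": 400.0, "KEGG_B": 150.0, "KEGG_C": 0.5},
    "Ave UC/Ave HE": {"KEGG_A": 130.0, "KEGG_B": 8.0, "KEGG_C": 1.2},
    "Sick HE Person/Ave HE": {"KEGG_A": 101.0, "KEGG_B": 900.0, "KEGG_C": 2.0},
}

extreme = extreme_keggs(ratio_tables)
```

Requiring the threshold in all four tables is what makes the subgroup small: KEGG_B is extreme in three comparisons but fails the fourth.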
Using Ayasdi Interactively to Explore 
Protein Families in Healthy and Disease States 
Dataset from Larry Smarr Team 
With 60 Subjects (HE, CD, UC, LS) 
Each with 10,000 KEGGs - 
600,000 Cells 
Source: Pek Lum, 
Formerly Chief Data Scientist, Ayasdi
CD is Missing a Population of Bacteria 
That Exists in High Quantities in HE (Circled with Arrow)
• Problem is That These 
KEGGs Have Moderate 
Values of Ave CD/ Ave HE 
• How Can We Change the 
Ayasdi Lenses So That We 
Pick Out The Very High 
Values of Ratios to Ave 
HE? 
Low in CD and LS 
Source: Pek Lum, 
Formerly Chief Data Scientist, Ayasdi
This Ayasdi Lens Does Identify 
KEGGs In Which Ave CD and LS001 Are Less Than Ave HE 
• Problem is That These KEGGs 
Have Moderately Low Values
of Ave CD/ Ave HE 
• How Can We Change the Ayasdi 
Lenses So That We Pick Out The Very 
High Values of Ratios to Ave HE?
We Found a Set of Lenses That
More Clearly Find the 43 Extreme KEGGs
K00108(choline_dehydrogenase) 
K00673(arginine_N-succinyltransferase) 
K00867(type_I_pantothenate_kinase) 
K01169(ribonuclease_I_(enterobacter_ribonuclease)) 
K01484(succinylarginine_dihydrolase) 
K01682(aconitate_hydratase_2) 
K01690(phosphogluconate_dehydratase) 
K01825(3-hydroxyacyl-CoA_dehydrogenase_/_enoyl-CoA_hydratase_/3-hydroxybutyryl-CoA_epimerase_/_enoyl 
K02173(hypothetical_protein) 
K02317(DNA_replication_protein_DnaT) 
K02466(glucitol_operon_activator_protein) 
K02846(N-methyl-L-tryptophan_oxidase) 
K03081(3-dehydro-L-gulonate-6-phosphate_decarboxylase) 
K03119(taurine_dioxygenase) 
K03181(chorismate--pyruvate_lyase) 
K03807(AmpE_protein) 
K05522(endonuclease_VIII) 
K05775(maltose_operon_periplasmic_protein) 
K05812(conserved_hypothetical_protein) 
K05997(Fe-S_cluster_assembly_protein_SufA) 
K06073(vitamin_B12_transport_system_permease_protein) 
K06205(MioC_protein) 
K06445(acyl-CoA_dehydrogenase) 
K06447(succinylglutamic_semialdehyde_dehydrogenase) 
K07229(TrkA_domain_protein) 
K07232(cation_transport_protein_ChaC) 
K07312(putative_dimethyl_sulfoxide_reductase_subunit_YnfH_(DMSO_reductaseanchor_subunit)) 
K07336(PKHD-type_hydroxylase) 
K08989(putative_membrane_protein) 
K09018(putative_monooxygenase_RutA) 
K09456(putative_acyl-CoA_dehydrogenase) 
K09998(arginine_transport_system_permease_protein) 
K10748(DNA_replication_terminus_site-binding_protein) 
K11209(GST-like_protein) 
K11391(ribosomal_RNA_large_subunit_methyltransferase_G) 
K11734(aromatic_amino_acid_transport_protein_AroP) 
K11735(GABA_permease) 
K11925(SgrR_family_transcriptional_regulator) 
K12288(pilus_assembly_protein_HofM) 
K13255(ferric_iron_reductase_protein_FhuF) 
K14588() 
K15733() 
K15834() 
L-Infinity Centrality Lens Using Norm Correlation as Metric (Resolution: 242, Gain: 5.7)
Entropy & Variance Lens Using Angle as Metric (Resolution: 30, Gain: 3.00)
Analysis by Mehrdad Yazdani, Calit2
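
For readers unfamiliar with the lens and metric names above, here is a minimal sketch of two of them under common textbook definitions: an angle metric (the arccosine of cosine similarity) and L-infinity centrality (each point's largest distance to any other point). Pairing these two in one example is for brevity only; the slide pairs L-infinity centrality with norm correlation, and Ayasdi's own implementations are not shown in the source.

```python
import math


def angle_distance(u, v):
    """Angle in radians between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def linf_centrality(points, dist):
    """For each point, the maximum distance to any other point."""
    return [
        max(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


# Three toy vectors (illustration only).
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
centrality = linf_centrality(points, angle_distance)
```

Points with a small L-infinity centrality sit near the middle of the data set, while large values flag points far from at least one other point, which is why such a lens can pull outlying rows apart from the bulk.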
Disease Arises from Perturbed Protein Family Networks: 
Dynamics of a Prion Perturbed Network in Mice 
Source: Lee Hood, ISB
Our Next Goal is to Create 
Such Perturbed Networks in Humans
Visualizing Time Series of 
150 LS Blood and Stool Variables, Each Over 5-10 Years 
Calit2 64 megapixel VROOM 
One Blood Draw 
For Me
Only One of My Blood Measurements 
Was Far Out of Range--Indicating Chronic Inflammation 
Episodic Peaks in Inflammation 
Followed by Spontaneous Drops 
Normal Range 
<1 mg/L 
27x Upper Limit 
Normal 
C-Reactive Protein (CRP) is a Blood Biomarker
for Detecting Presence of Inflammation
Adding Stool Tests Revealed 
Oscillatory Behavior in an Immune Variable 
Typical Lactoferrin Value for Active IBD
Normal Range 
<7.3 μg/mL 
124x Upper Limit 
Hypothesis: Lactoferrin Oscillations 
Coupled to Relative Abundance 
of Microbes that Require Iron 
Antibiotics 
Antibiotics 
Lactoferrin is a Protein Shed from Neutrophils - 
An Antibacterial that Sequesters Iron
Fine Time-Resolution Sampling Enables Analysis of 
Dynamical Innate and Adaptive Immune Dysfunction 
Normal 
Innate Immune System 
Normal 
Adaptive Immune System
By Overlaying a Number of Immune/Inflammation Variables,
It Appears There May be Phase Correlations
CRP, SED, Lact, Lyzo, SigA, Calp
Data Analytics by Benjamin Smarr, UC Berkeley
One Can Use Sine Fitting with Least Squares 
To Try and Approximate the Time Series Dynamics 
5 Sines 
Data Analytics by Benjamin Smarr, UC Berkeley
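
A minimal sketch of sine fitting by least squares, assuming a single sinusoid with a known angular frequency: the model y = c + a·sin(ωt) + b·cos(ωt) is linear in (c, a, b), so the normal equations can be solved directly, and the amplitude and phase follow from a and b. The frequency, the synthetic series, and the function names are illustrative assumptions, not the analysis behind the slide. Fitting several sines would add one sine/cosine column pair per frequency.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_sine(times, values, omega):
    """Least-squares fit of c + A*sin(omega*t + phase); returns (c, A, phase)."""
    rows = [[1.0, math.sin(omega * t), math.cos(omega * t)] for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    c, a, b = solve_linear(xtx, xty)
    # a = A*cos(phase), b = A*sin(phase)
    return c, math.hypot(a, b), math.atan2(b, a)


# Synthetic, noise-free series with known parameters (illustration only).
times = [0.1 * k for k in range(100)]
values = [5.0 + 3.0 * math.sin(2.0 * t + 0.7) for t in times]
offset, amplitude, phase = fit_sine(times, values, omega=2.0)
```

Comparing the fitted `phase` across variables fitted at the same frequency is one simple way to look for the phase relationships mentioned on these slides.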
With Low Resolution Sine Fitting, 
There Is Indication of Phase Correlation 
Data Analytics by Benjamin Smarr, UC Berkeley 
2 Sines
Are There Ayasdi Tools to More Deeply Analyze Such Time Series?
UC San Diego Will Be Carrying Out 
a Major Clinical Study of IBD Using These Techniques 
Announced Last Friday! 
Inflammatory Bowel Disease Biobank 
For Healthy and Disease Patients 
Already 120 Enrolled, 
Goal is 1500 
Drs. William J. Sandborn, John Chang, & Brigid Boland 
UCSD School of Medicine, Division of Gastroenterology
Inexpensive Consumer Time Series of Microbiome 
Now Possible Through Ubiome 
Data source: LS (Stool Samples); 
Sequencing and Analysis Ubiome
By Crowdsourcing, Ubiome Can Show 
I Have a Major Disruption of My Gut Microbiome 
(-) 
(+) 
LS Sample on September 24, 2014 
Visit Ubiome in the Exponential Medicine 
Healthcare Innovation Lab
Where I Believe We are Headed: 
Predictive, Personalized, Preventive, & Participatory Medicine 
Will Grow to 1000, Then 10,000, 
Then 100,000 
www.newsweek.com/2009/06/26/a-doctor-s-vision-of-the-future-of-medicine.html
Genetic Sequencing of Humans and Their Microbes 
Is a Huge Growth Area and the Future Foundation of Medicine 
Source: @EricTopol 
Twitter 9/27/2014
Thanks to Our Great Team! 
UCSD Metagenomics Team 
Weizhong Li 
Sitao Wu 
Calit2@UCSD 
Future Patient Team 
Jerry Sheehan 
Tom DeFanti 
Kevin Patrick 
Jurgen Schulze 
Andrew Prudhomme 
Philip Weber 
Fred Raab 
Joe Keefe 
Ernesto Ramirez 
Ayasdi 
Devi Ramanan 
Pek Lum 
JCVI Team 
Karen Nelson 
Shibu Yooseph 
Manolito Torralba 
SDSC Team 
Michael Norman 
Mahidhar Tatineni 
Robert Sinkovits 
Dell/R Systems 
Brian Kucic 
John Thompson 
UCSD Health Sciences Team 
William J. Sandborn 
Elisabeth Evans 
John Chang 
Brigid Boland 
David Brenner

Using Data Analytics to Discover the 100 Trillion Bacteria Living Within Each of Us

  • 1. “Using Data Analytics to Discover the 100 Trillion Bacteria Living Within Each of Us” Invited Talk Ayasdi Menlo Park, CA December 5, 2014 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technology Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net 1
  • 2. From One to a Billion Data Points Defining Me: The Exponential Rise in Body Data in Just One Decade Billion: My Full DNA, MRI/CT Images Million: My DNA SNPs, Zeo, FitBit One: Hundred: My Blood Variables WeigMhyt Weight Blood Variables SNPs Microbial Genome Improving Body Discovering Disease
  • 3. How Will Detailed Knowledge of Microbiome Ecology Radically Change Medicine and Wellness? Your Body Has 10 Times As Many Microbe Cells As Human Cells 99% of Your DNA Genes Are in Microbe Cells Not Human Cells Challenge: Map Out Microbial Ecology and Function in Health and Disease States
  • 4. Intense Scientific Research is Underway on Understanding the Human Microbiome June 8, 2012 June 14, 2012 August 18, 2012
  • 5. To Map Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers • Metagenomic Sequencing – JCVI Produced – ~150 Billion DNA Bases From Seven of LS Stool Samples Over 1.5 Years – We Downloaded ~3 Trillion DNA Bases From NIH Human Microbiome Program Data Base – 255 Healthy People, 21 with IBD • Supercomputing (Weizhong Li, JCVI/HLI/UCSD): – ~20 CPU-Years on SDSC’s Gordon – ~4 CPU-Years on Dell’s HPC Cloud • Produced Relative Abundance of – ~10,000 Bacteria, Archaea, Viruses in ~300 People – ~3Million Filled Spreadsheet Cells Illumina HiSeq 2000 at JCVI SDSC Gordon Data Supercomputer Example: Inflammatory Bowel Disease (IBD)
  • 6. Computational NextGen Sequencing Pipeline: From Sequence to Taxonomy and Function PI: (Weizhong Li, CRBS, UCSD): NIH R01HG005978 (2010-2013, $1.1M)
  • 7. Next Step Programmability, Scalability and Reproducibility using bioKepler www.kepler-project.org www.biokepler.org National Resources (Gordon) (Comet) (Lonestar) (Stampede) Optimized Cloud Resources Local Cluster Resources Source: Ilkay Altintas, SDSC
  • 8. How Best to Analyze The Microbiome Datasets to Discover Patterns in Health and Disease? Can We Find New Noninvasive Diagnostics In Microbiome Ecologies?
  • 9. We Found Major State Shifts in Microbial Ecology Phyla Between Healthy and Two Forms of IBD Most Common Microbial Phyla Average HE Average Ulcerative Colitis Average LS Average Crohn’s Disease Collapse of Bacteroidetes Explosion of Actinobacteria Explosion of Proteobacteria Hybrid of UC and CD High Level of Archaea
  • 10. Using Scalable Visualization Allows Comparison of the Relative Abundance of 200 Microbe Species Comparing 3 LS Time Snapshots (Left) with Healthy, Crohn’s, Ulcerative Colitis (Right Top to Bottom) Calit2 VROOM-FuturePatient Expedition
  • 11. Using Dell HPC Cloud and Dell Analytics to Discover Microbial Diagnostics for Disease Dynamics • Can We Distinguish Noninvasively Between Health and Disease States? • Are There Subsets of Health or Disease States? • Can We Track Time Development of the Disease State? • Can Novel Microbial Diagnostics Differentiate Health and Disease States?
  • 12. Using Microbiome Profiles to Survey 155 Subjects for Unhealthy Candidates
  • 13. Dell Analytics Separates The 4 Patient Types in Our Data Using Our Microbiome Species Data Ulcerative Colitis Source: Thomas Hill, Ph.D. Executive Director Analytics Dell | Information Management Group, Dell Software Healthy Colonic Crohn’s Ileal Crohn’s
  • 14. I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome Toward and Away from Healthy State – Colonic Crohn’s Source: Thomas Hill, Ph.D. Executive Director Analytics Dell | Information Management Group, Dell Software
  • 15. I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome Toward and Away from Healthy State – Colonic Crohn’s Healthy Ileal Crohn’s Seven Time Samples Over 1.5 Years Colonic Crohn’s
  • 16. Dell Analytics Tree Graphs Classifies the 4 Health/Disease States With Just 3 Microbe Species Source: Thomas Hill, Ph.D. Executive Director Analytics Dell | Information Management Group, Dell Software
  • 17. Our Relative Abundance Results Across ~300 People Show Why Dell Analytics Tree Classifier Works UC 100x Healthy Healthy 100x CD LS 100x UC We Produced Similar Results for ~2500 Microbial Species
  • 18. Using Ayasdi’s Advanced Topological Data Analysis to Separate Healthy from Disease States All Healthy All Healthy All Ileal Crohn’s Using Ayasdi Categorical Data Lens Healthy, Ulcerative Colitis, and LS All Healthy Analysis by Mehrdad Yazdani, Calit2 Talk to Ayasdi in the Intel Booth at SC14
  • 19. Ayasdi Enables Discovery of Differences Between Healthy and Disease States Using Microbiome Species Healthy LS Ileal Crohn’s Ulcerative Colitis High in Healthy and LS High in Healthy and Ulcerative Colitis High in Both LS and Ileal Crohn’s Disease Using Multidimensional Scaling Lens with Correlation Metric Analysis by Mehrdad Yazdani, Calit2
  • 20. From Taxonomy to Function: Analysis of LS Clusters of Orthologous Groups (COGs) Analysis: Weizhong Li & Sitao Wu, UCSD
  • 21. In a “Healthy” Gut Microbiome: Large Taxonomy Variation, Low Protein Family Variation Over 200 People Source: Nature, 486, 207-212 (2012)
  • 22. Ratio of HE11529 to Ave HE Test to see How Much Variation There is Within Healthy Ratio of Random HE11529 to Healthy Average for Each Nonzero KEGG Most KEGGs Are Within 10x Of Healthy for a Random HE
  • 23. However, Our Research Shows Large Changes in Protein Families Between Health and Disease Ratio of CD Average to Healthy Average for Each Nonzero KEGG KEGGs Greatly Increased In the Disease State Most KEGGs Are Within 10x In Healthy and Ileal Crohn’s Disease KEGGs Greatly Decreased In the Disease State Over 7000 KEGGs Which Are Nonzero in Health and Disease States Note Hi/Low Symmetry
  • 24. Note UC Has Many Few KEGGs that are Much Smaller than HE; Also Fewer KEGGs That are Nonzero; Note Asymmetry Between High & Low Ratio of UC Average to Healthy Average for Each Nonzero KEGG Most KEGGs Are Within 10x In Healthy and Ulcerative Colitis KEGGs Greatly Increased In the Disease State KEGGs Greatly Decreased In the Disease State
  • 25. Note LS001 Has Many Few KEGGs that are Much Smaller than HE; ~Same # KEGGs That are Nonzero; Note Asymmetry Between High & Low Ratio of LS001 Average to Healthy Average for Each Nonzero KEGG Most KEGGs Are Within 10x In Healthy and LS001 KEGGs Greatly Increased In the Disease State KEGGs Greatly Decreased In the Disease State
  • 26. We Can Define a Subgroup of the 10,000 KEGGs Which Are Extreme in the Disease State • Look for KEGGs That Have the Properties: – Are 100x in All Four Disease States – LS001/Ave HE – Ave CD/ Ave HE – Ave UC/Ave HE – Sick HE Person/Ave HE • There are 48 of These Extreme KEGGs • A New Way to Define What is Wrong with the Microbiome in Disease? • Can We Devise an Ayasdi Lens That Can Separates These Extreme KEGGs?
  • 27. Using Ayasdi Interactively to Explore Protein Families in Healthy and Disease States Dataset from Larry Smarr Team With 60 Subjects (HE, CD, UC, LS) Each with 10,000 KEGGs - 600,000 Cells Source: Pek Lum, Formerly Chief Data Scientist, Ayasdi
  • 28. CD is Missing a Population of Bacteria That Exists in High Quantities in HE (Circled with Arrow) • Problem is That These KEGGs Have Moderate Values of Ave CD/ Ave HE • How Can We Change the Ayasdi Lenses So That We Pick Out The Very High Values of Ratios to Ave HE? Low in CD and LS Source: Pek Lum, Formerly Chief Data Scientist, Ayasdi
  • 29. This Ayasdi Lens Does Identify KEGGs In Which Ave CD and LS001 Are Less Than Ave HE • Problem is That These KEGGs Have Moderately Low Values of Ave CD/ Ave HE • How Can We Change the Ayasdi Lenses So That We Pick Out The Very High Values of Ratios to Ave HE?
  • 30. We Found a Set of Lenses That More Clearly Find the 43 Extreme KEGGs K00108(choline_dehydrogenase) K00673(arginine_N-succinyltransferase) K00867(type_I_pantothenate_kinase) K01169(ribonuclease_I_(enterobacter_ribonuclease)) K01484(succinylarginine_dihydrolase) K01682(aconitate_hydratase_2) K01690(phosphogluconate_dehydratase) K01825(3-hydroxyacyl-CoA_dehydrogenase_/_enoyl-CoA_hydratase_/3-hydroxybutyryl-CoA_epimerase_/_enoyl K02173(hypothetical_protein) K02317(DNA_replication_protein_DnaT) K02466(glucitol_operon_activator_protein) K02846(N-methyl-L-tryptophan_oxidase) K03081(3-dehydro-L-gulonate-6-phosphate_decarboxylase) K03119(taurine_dioxygenase) K03181(chorismate--pyruvate_lyase) K03807(AmpE_protein) K05522(endonuclease_VIII) K05775(maltose_operon_periplasmic_protein) K05812(conserved_hypothetical_protein) K05997(Fe-S_cluster_assembly_protein_SufA) K06073(vitamin_B12_transport_system_permease_protein) K06205(MioC_protein) K06445(acyl-CoA_dehydrogenase) K06447(succinylglutamic_semialdehyde_dehydrogenase) K07229(TrkA_domain_protein) K07232(cation_transport_protein_ChaC) K07312(putative_dimethyl_sulfoxide_reductase_subunit_YnfH_(DMSO_reductase_anchor_subunit)) K07336(PKHD-type_hydroxylase) K08989(putative_membrane_protein) K09018(putative_monooxygenase_RutA) K09456(putative_acyl-CoA_dehydrogenase) K09998(arginine_transport_system_permease_protein) K10748(DNA_replication_terminus_site-binding_protein) K11209(GST-like_protein) K11391(ribosomal_RNA_large_subunit_methyltransferase_G) K11734(aromatic_amino_acid_transport_protein_AroP) K11735(GABA_permease) K11925(SgrR_family_transcriptional_regulator) K12288(pilus_assembly_protein_HofM) K13255(ferric_iron_reductase_protein_FhuF) K14588() K15733() K15834() L-Infinity Centrality Lens Using Norm Correlation as Metric (Resolution: 242, Gain: 5.7) Entropy & Variance Lens Using Angle as Metric (Resolution: 30, Gain 3.00) Analysis by Mehrdad Yazdani, Calit2
  • 31. Disease Arises from Perturbed Protein Family Networks: Dynamics of a Prion Perturbed Network in Mice Source: Lee Hood, ISB Our Next Goal is to Create Such Perturbed Networks in Humans
  • 32. Visualizing Time Series of 150 LS Blood and Stool Variables, Each Over 5-10 Years Calit2 64 megapixel VROOM One Blood Draw For Me
  • 33. Only One of My Blood Measurements Was Far Out of Range--Indicating Chronic Inflammation Episodic Peaks in Inflammation Followed by Spontaneous Drops Normal Range <1 mg/L 27x Upper Limit Normal C-Reactive Protein (CRP) is a Blood Biomarker for Detecting Presence of Inflammation
  • 34. Adding Stool Tests Revealed Oscillatory Behavior in an Immune Variable Typical Lactoferrin Value for Active IBD Normal Range <7.3 μg/mL 124x Upper Limit Hypothesis: Lactoferrin Oscillations Coupled to Relative Abundance of Microbes that Require Iron Antibiotics Antibiotics Lactoferrin is a Protein Shed from Neutrophils - An Antibacterial that Sequesters Iron
  • 35. Fine Time-Resolution Sampling Enables Analysis of Dynamical Innate and Adaptive Immune Dysfunction Normal Innate Immune System Normal Adaptive Immune System
  • 36. By Overlaying a Number of Immune/Inflammation Variables, CRP SED Lact Lyzo SigA Calp It Appears There May be Phase Correlations Data Analytics by Benjamin Smarr, UC Berkeley
  • 37. One Can Use Sine Fitting with Least Squares To Try and Approximate the Time Series Dynamics 5 Sines Data Analytics by Benjamin Smarr, UC Berkeley
  • 38. With Low Resolution Sine Fitting, There Is Indication of Phase Correlation Data Analytics by Benjamin Smarr, UC Berkeley 2 Sines
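Slides 37 and 38 describe approximating each time series with a small number of sines fitted by least squares. A minimal sketch of one way to do that is below. It assumes the periods are fixed in advance, which turns the fit into a linear least-squares problem in the sine and cosine coefficients. The slides do not say how the frequencies were chosen, so the periods, the helper names, and the synthetic series here are all illustrative assumptions rather than the analysis actually used.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def design_row(t, periods):
    """Basis at time t: a constant, then sin and cos for each period."""
    row = [1.0]
    for p in periods:
        w = 2 * math.pi * t / p
        row += [math.sin(w), math.cos(w)]
    return row

def fit_sines(times, values, periods):
    """Least-squares fit of offset + sum of A_k * sin(2*pi*t/P_k + phase_k)."""
    rows = [design_row(t, periods) for t in times]
    n = len(rows[0])
    # Normal equations (A^T A) c = A^T y.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(n)]
    coef = solve(AtA, Aty)
    components = []
    for k, p in enumerate(periods):
        a, b = coef[1 + 2 * k], coef[2 + 2 * k]  # a*sin + b*cos
        components.append(
            {"period": p, "amplitude": math.hypot(a, b), "phase": math.atan2(b, a)}
        )
    return coef[0], components

# Synthetic series with two known components (not data from the talk).
times = list(range(40))
values = [
    5.0
    + 3.0 * math.sin(2 * math.pi * t / 10 + 0.5)
    + 1.5 * math.sin(2 * math.pi * t / 4 - 1.0)
    for t in times
]
offset, components = fit_sines(times, values, periods=[10, 4])
```

Fitting two variables with the same periods and comparing their `phase` values is one simple way to look for the kind of phase relationship the slides mention. Solving the normal equations directly is adequate for a handful of sines but becomes ill-conditioned when periods are nearly equal.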
  • 39. Are There Ayasdi Tools to More Deeply Analyze Such Time Series?
  • 40. UC San Diego Will Be Carrying Out a Major Clinical Study of IBD Using These Techniques Announced Last Friday! Inflammatory Bowel Disease Biobank For Healthy and Disease Patients Already 120 Enrolled, Goal is 1500 Drs. William J. Sandborn, John Chang, & Brigid Boland UCSD School of Medicine, Division of Gastroenterology
  • 41. Inexpensive Consumer Time Series of Microbiome Now Possible Through Ubiome Data source: LS (Stool Samples); Sequencing and Analysis Ubiome
  • 42. By Crowdsourcing, Ubiome Can Show I Have a Major Disruption of My Gut Microbiome (-) (+) LS Sample on September 24, 2014 Visit Ubiome in the Exponential Medicine Healthcare Innovation Lab
  • 43. Where I Believe We are Headed: Predictive, Personalized, Preventive, & Participatory Medicine Will Grow to 1000, Then 10,000, Then 100,000 www.newsweek.com/2009/06/26/a-doctor-s-vision-of-the-future-of-medicine.html
  • 44. Genetic Sequencing of Humans and Their Microbes Is a Huge Growth Area and the Future Foundation of Medicine Source: @EricTopol Twitter 9/27/2014
  • 45. Thanks to Our Great Team! UCSD Metagenomics Team Weizhong Li Sitao Wu Calit2@UCSD Future Patient Team Jerry Sheehan Tom DeFanti Kevin Patrick Jurgen Schulze Andrew Prudhomme Philip Weber Fred Raab Joe Keefe Ernesto Ramirez Ayasdi Devi Ramanan Pek Lum JCVI Team Karen Nelson Shibu Yooseph Manolito Torralba SDSC Team Michael Norman Mahidhar Tatineni Robert Sinkovits Dell/R Systems Brian Kucic John Thompson UCSD Health Sciences Team William J. Sandborn Elisabeth Evans John Chang Brigid Boland David Brenner