“Determining the Human Gut Microbiome
using Genome Sequencing and Dell’s Cloud Computing”
Dell Webinar
April 29, 2014
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
The Human Microbiome Ecology is Critical
to Health and Disease
Inclusion of the Microbiome
Will Radically Change Medicine
99% of Your
DNA Genes
Are in Microbe Cells
Not Human Cells
Your Body Has 10 Times
As Many Microbe Cells As Human Cells
To Map Out the Dynamics of My Microbiome Ecology
I Partnered with the J. Craig Venter Institute
• JCVI Did Metagenomic Sequencing on Seven of My Stool Samples Over 1.5 Years
• Sequencing on Illumina HiSeq 2000
– Generates 100bp Reads
• JCVI Lab Manager, Genomic Medicine
– Manolito Torralba
• IRB PI Karen Nelson
– President JCVI
Photos: Illumina HiSeq 2000 at JCVI; Manolito Torralba, JCVI; Karen Nelson, JCVI
We Downloaded Additional Phenotypes from NIH’s
Human Microbiome Project For Comparative Analysis
“Healthy” Individuals
• 250 Subjects, 1 Point in Time
“Disease” Patients (Inflammatory Bowel Disease)
• 5 Ileal Crohn’s Patients, 3 Points in Time
• 2 Ulcerative Colitis Patients, 6 Points in Time
Larry Smarr
• 7 Points in Time Over 1.5 Years
Download Raw Reads: ~100M Per Person
Total of ~28 Billion Reads, Or 2.8 Trillion DNA Bases
Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD
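The totals on this slide can be cross-checked with simple arithmetic. The sketch below is only a rough check under stated assumptions: that "~100M Per Person" means roughly 100 million reads per sample (one sample per subject per time point), that the 250 single-time-point subjects are the "Healthy" group, and that the 100bp read length quoted for the HiSeq 2000 applies to every sample. The variable names are illustrative and do not come from the talk.

```python
# Rough cross-check of the read and base totals quoted on the slide.
# Assumption: ~100M reads per sample, one sample per subject per time point.
samples = {
    "healthy": 250 * 1,            # 250 subjects, 1 point in time
    "ileal_crohns": 5 * 3,         # 5 patients, 3 points in time
    "ulcerative_colitis": 2 * 6,   # 2 patients, 6 points in time
    "larry_smarr": 7,              # 7 points in time over 1.5 years
}
READS_PER_SAMPLE = 100_000_000     # "~100M", treated as exact here
READ_LENGTH_BP = 100               # 100bp reads from the HiSeq 2000

total_samples = sum(samples.values())
total_reads = total_samples * READS_PER_SAMPLE
total_bases = total_reads * READ_LENGTH_BP

print(f"samples: {total_samples}")
print(f"reads:   {total_reads / 1e9:.1f} billion")
print(f"bases:   {total_bases / 1e12:.2f} trillion")
```

Under these assumptions the estimate lands close to the quoted ~28 billion reads and 2.8 trillion bases.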
We Created a Reference Database
Of Known Gut Genomes
• NCBI April 2013
– 2471 Complete + 5543 Draft Bacteria & Archaea Genomes
– 2399 Complete Virus Genomes
– 26 Complete Fungi Genomes
– 309 HMP Eukaryote Reference Genomes
• Total 10,741 genomes, ~30 GB of sequences
Now to Align Our 28 Billion Reads
Against the Reference Database
Source: Weizhong Li, Sitao Wu, CRBS, UCSD
Computational NextGen Sequencing Pipeline:
From Sequence to Taxonomy and Function
PI: Weizhong Li, CRBS, UCSD
NIH R01HG005978 (2010-2013, $1.1M)
We Used Dell’s Cloud (Sanger) to Analyze
All of Our Human Gut Microbiomes
• Dell’s Sanger Cluster
– 32 Nodes, 512 Cores
– 48GB RAM per Node
– 50GB SSD Local Drive, 390TB Lustre File System
• We Processed the Taxonomic Relative Abundance
– Used ~35,000 Core-Hours on Dell’s Sanger
– With 30 TB of Data
• Full Processing to Function (COGs, KEGGs)
– Would Require ~1-2 Million Core-Hours
Source: Weizhong Li, UCSD
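To put the core-hour figures in perspective, they can be converted to an idealized wall-clock time on the 512-core cluster. This is a back-of-the-envelope sketch, assuming the whole cluster is dedicated to the job and that the work scales perfectly across cores. Neither assumption is stated in the talk, and the helper name `wall_clock_days` is hypothetical.

```python
CORES = 512  # from the slide: 32 nodes, 512 cores


def wall_clock_days(core_hours, cores=CORES, utilization=1.0):
    """Idealized wall-clock days needed to spend `core_hours` on `cores` cores.

    `utilization` is an assumed fraction of the cluster that is kept busy.
    """
    hours = core_hours / (cores * utilization)
    return hours / 24


taxonomy_days = wall_clock_days(35_000)          # taxonomic relative abundance
function_days_low = wall_clock_days(1_000_000)   # full functional processing, low end
function_days_high = wall_clock_days(2_000_000)  # full functional processing, high end

print(f"taxonomy: about {taxonomy_days:.1f} days")
print(f"function: about {function_days_low:.0f} to {function_days_high:.0f} days")
```

Under these idealized assumptions the taxonomy run takes about three days of the full cluster, while full functional processing would tie it up for roughly three to five months.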
Dell Cloud Results Are Leading
Toward Microbiome Disease Diagnosis
UC 100x Healthy
CD 100x Healthy
We Produced Similar Results for ~2500 Microbial Species
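One way to read the "100x Healthy" annotations is as a fold change in a species' relative abundance between a disease sample and the healthy average. The sketch below illustrates that calculation only. The species names and read counts are invented for illustration, and the actual pipeline may normalize differently (for example by genome length), which the slides do not specify.

```python
def relative_abundance(read_counts):
    """Convert per-species read counts into fractions of all mapped reads."""
    total = sum(read_counts.values())
    return {species: n / total for species, n in read_counts.items()}


def fold_change(sample, reference, species, floor=1e-9):
    """Ratio of a species' relative abundance in `sample` to `reference`.

    `floor` avoids division by zero when the species is absent from the reference.
    """
    return sample.get(species, 0.0) / max(reference.get(species, 0.0), floor)


# Hypothetical mapped-read counts per species (not data from the talk).
healthy_counts = {"species_A": 5, "species_B": 995}
uc_counts = {"species_A": 500, "species_B": 500}

healthy = relative_abundance(healthy_counts)
uc = relative_abundance(uc_counts)

fc = fold_change(uc, healthy, "species_A")
print(f"species_A in UC vs healthy: {fc:.0f}x")
```

Repeating the same comparison for each species in the reference database would give one fold-change value per species and per disease group.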
