Robustness, Reproducibility
& Ecological Consistency
in the Demarcation of Operational Taxonomic Units
Sebastian Schmidt
Institute for Molecular Life Sciences
University of Zürich
sebastian.schmidt@imls.uzh.ch
A general workflow in (targeted) metagenomics 
ISME15, Seoul, 2014/08/29 sebastian.schmidt@imls.uzh.ch
Sampling & Sequencing
“Making OTUs”
Understanding your data (hopefully)
[Images: Lake Zürich; Jean Tinguely, “Heureka”]
Concepts
replicability
robustness
reproducibility
ecological consistency
42: Life, the Universe and Everything?
42: Life, Microbial Ecology and Everything?
The Human Skin Microbiome (HSM) dataset:
~115,000 full-length 16S sequences
sampled from 21 distinct body sites
Grice et al., Science, 2009
clustered to 97% sequence identity
[Figure: OTU assignments on the HSM dataset compared across UPARSE, UCLUST, CD-HIT, single linkage (SL), complete linkage (CL), and average linkage (AL), with the number of OTUs per method; UCLUST: 3,282 OTUs]
OTU A (5,423 seq.): all methods agree (almost) perfectly
OTU B (8,465 seq.): lumping by SL
OTU C: splitting by CL
OTU D (2,692 seq.): splitting by UCLUST
Small OTUs (by seq. per OTU): methods provide different numbers of “small” OTUs
Schmidt et al., Environ Microbiol, in press
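Single linkage (SL) tends to lump and complete linkage (CL) tends to split because the two rules differ in when they allow clusters to merge. The following is a minimal sketch in Python, not the pipeline used in the talk: the distance matrix is a made-up toy example, and `cluster_at_threshold` is a hypothetical helper name. It shows SL chaining four sequences into one OTU at a 3% distance cutoff (97% identity), while CL and average linkage (AL) keep two OTUs.

```python
from itertools import combinations

def cluster_at_threshold(dist, threshold, linkage="single"):
    """Agglomerative clustering of items 0..n-1.

    Repeatedly merges the closest pair of clusters and stops as soon as
    the closest pair is farther apart than `threshold`.
    """
    reducers = {
        "single": min,                                    # closest members
        "complete": max,                                  # farthest members
        "average": lambda values: sum(values) / len(values),
    }
    reduce_fn = reducers[linkage]
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pair_dists = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            d = reduce_fn(pair_dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a always, so index a stays valid
    return sorted(sorted(c) for c in clusters)

# Hypothetical chain of four sequences: neighbours differ by 2%,
# the two ends differ by 6%.
toy_dist = [
    [0.00, 0.02, 0.04, 0.06],
    [0.02, 0.00, 0.02, 0.04],
    [0.04, 0.02, 0.00, 0.02],
    [0.06, 0.04, 0.02, 0.00],
]

single = cluster_at_threshold(toy_dist, 0.03, "single")
complete = cluster_at_threshold(toy_dist, 0.03, "complete")
average = cluster_at_threshold(toy_dist, 0.03, "average")
print(single, complete, average)
```

On this toy input the nominal threshold is identical for all three runs, yet the partitions differ, which is the effect the OTU examples on this slide illustrate at scale.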
[Figure: pairwise comparison of diversity estimates between clustering methods (UPARSE, UCLUST, CD-HIT, SL, CL, AL). Indices: Chao1, inverse Simpson, Shannon, Sørensen, Jaccard, Morisita-Horn. Each cell gives the Pearson correlation and the relative shift (log2); colour marks the significance of the mean shift (red: shift towards higher values; blue: shift towards lower values).]
Schmidt et al., Environ Microbiol, in press
(data from Grice et al., Science, 2009)
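Diversity indices such as Shannon, inverse Simpson, Chao1 and Morisita-Horn are computed from per-sample OTU counts, so lumping or splitting OTUs changes the estimates. The sketch below uses the standard textbook formulas (bias-corrected Chao1, Morisita-Horn as a similarity); the count vectors are invented for illustration, and nothing here reproduces the values in the figure.

```python
import math

def shannon(counts):
    """Shannon entropy (natural log) of an OTU count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

def morisita_horn(x, y):
    """Morisita-Horn similarity between two samples over the same OTUs."""
    total_x, total_y = sum(x), sum(y)
    cross = sum(a * b for a, b in zip(x, y))
    dx = sum(a * a for a in x) / total_x ** 2
    dy = sum(b * b for b in y) / total_y ** 2
    return 2 * cross / ((dx + dy) * total_x * total_y)

# Invented example: the same 40 reads, once with one OTU of 30 reads
# ("lumped") and once with that OTU split into two OTUs of 15 reads.
lumped = [30, 10]
split = [15, 15, 10]

print(shannon(lumped), shannon(split))
print(inverse_simpson(lumped), inverse_simpson(split))
print(chao1([5, 2, 1, 1, 1, 0]))
print(morisita_horn([10, 10, 10, 10], [40, 0, 0, 0]))
```

Splitting one OTU into two raises both Shannon and inverse Simpson for the same reads, which is one way a clustering method can shift diversity estimates without any change in the underlying data.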
[Figure: heat map of Adjusted Mutual Information (colour scale 0.5 to 1.0) between OTU sets from UPARSE, UCLUST, CD-HIT, single linkage, complete linkage, and average linkage, each clustered at thresholds from 90% to 100%]
A ‘global’ 16S dataset
~1.1M full-length sequences
≥30k samples, diverse environments

Adjusted Mutual Information (AMI), a measure of partition similarity

high replicability
…when clustering twice to the exact same threshold

differential robustness
…to slight threshold changes
Schmidt et al., Environ Microbiol, in press

differential reproducibility
pairwise similarity maxima between methods off-diagonal
comparability of results across studies?
“Greengenes 97” vs. “SILVA 99”: AMI ~ 0.65
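Adjusted Mutual Information compares two partitions of the same sequences and corrects the raw mutual information for the agreement expected by chance. A minimal pure-Python sketch follows. It assumes the hypergeometric chance model of Vinh et al. and normalisation by the arithmetic mean of the two entropies; the talk does not state which normalisation was used, and the function name and toy labels are illustrative only.

```python
import math
from collections import Counter

def _entropy(sizes, n):
    """Entropy (natural log) of a partition given its cluster sizes."""
    return -sum((s / n) * math.log(s / n) for s in sizes)

def _log_factorial(k):
    return math.lgamma(k + 1)

def adjusted_mutual_information(labels_u, labels_v):
    """AMI between two labelings of the same items (mean-entropy normalised)."""
    n = len(labels_u)
    sizes_u = Counter(labels_u)
    sizes_v = Counter(labels_v)
    joint = Counter(zip(labels_u, labels_v))

    # Observed mutual information from the contingency table.
    mi = sum(
        (nij / n) * math.log(n * nij / (sizes_u[u] * sizes_v[v]))
        for (u, v), nij in joint.items()
    )

    # Expected mutual information when cluster sizes are fixed and
    # overlaps follow a hypergeometric distribution.
    emi = 0.0
    for a in sizes_u.values():
        for b in sizes_v.values():
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                log_prob = (
                    _log_factorial(a) + _log_factorial(b)
                    + _log_factorial(n - a) + _log_factorial(n - b)
                    - _log_factorial(n) - _log_factorial(nij)
                    - _log_factorial(a - nij) - _log_factorial(b - nij)
                    - _log_factorial(n - a - b + nij)
                )
                emi += (nij / n) * math.log(n * nij / (a * b)) * math.exp(log_prob)

    h_u = _entropy(sizes_u.values(), n)
    h_v = _entropy(sizes_v.values(), n)
    denominator = (h_u + h_v) / 2 - emi
    if abs(denominator) < 1e-12:
        return 1.0  # both partitions are trivial and identical
    return (mi - emi) / denominator

# Toy labelings of six sequences (hypothetical OTU assignments).
method_a = [0, 0, 1, 1, 2, 2]
method_a_renamed = ["x", "x", "y", "y", "z", "z"]
method_b = [0, 0, 0, 1, 1, 1]

same = adjusted_mutual_information(method_a, method_a_renamed)
partial = adjusted_mutual_information(method_b, method_a)
print(same, partial)
```

AMI depends only on which sequences are grouped together, not on OTU names, so two runs that produce the same partition score 1 even if their OTU identifiers differ. The triple loop is fine for a toy example but would need a vectorised or library implementation for a dataset with ~1.1M sequences.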
But which method makes the ‘best’ OTUs?
‘Good’ OTUs should correspond to ‘true’ bacterial lineages (‘species’)
they should comply with evolutionary theory of bacterial speciation
BUT: no unifying / commonly accepted bacterial species concept

Two main criteria for theory-compliant OTUs
phylogenetic consistency (represent monophyletic lineages)
ecological consistency (represent ecologically homogeneous groups of organisms)

Gevers et al., Nat Rev Microbiol, 2005
Cohan, Philos T R Soc B, 2006
Koeppel et al., PNAS, 2008
Hunt et al., Science, 2008
Fraser et al., Science, 2009
Vos, Trends Microbiol, 2011
Koeppel & Wu, NAR, 2013
Preheim et al., Appl Env Microbiol, 2013
[and many more…]
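Phylogenetic consistency can be made concrete as a monophyly check: an OTU is monophyletic if some subtree of a rooted tree contains exactly its members and nothing else. The sketch below assumes a rooted tree encoded as nested tuples with made-up sequence names; it illustrates the criterion only and is not a method from the talk.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, tuple):
        result = set()
        for child in tree:
            result |= leaves(child)
        return result
    return {tree}

def is_monophyletic(tree, otu_members):
    """True if some subtree's leaf set equals `otu_members` exactly."""
    members = set(otu_members)
    if leaves(tree) == members:
        return True
    if isinstance(tree, tuple):
        return any(is_monophyletic(child, members) for child in tree)
    return False

# Hypothetical rooted tree: ((s1, s2), s3) and (s4, s5) are sister clades.
toy_tree = ((("s1", "s2"), "s3"), ("s4", "s5"))

print(is_monophyletic(toy_tree, {"s1", "s2", "s3"}))  # a clade
print(is_monophyletic(toy_tree, {"s2", "s3"}))        # leaves out s1
```

An OTU such as {s2, s3} fails the check because the smallest clade containing both sequences also contains s1.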
[Figure: word cloud of ecological terms, e.g. soil, water, sediment, marine, gut, rhizosphere, sludge, lake, hydrothermal, skin]
[Figure: Ecological Consistency Score (ECS) versus number of OTUs (log scale) for average linkage, single linkage, complete linkage, UCLUST, and CD-HIT, with 97% nominal similarity marked. Panels: A (main panel); B: Archaea, ecological terms; C: Eukarya, ecological terms; D: Bacteria, sampling sites; E: Bacteria, host taxonomy; F: Bacteria, ENVO terms.]
Schmidt et al., PLOS Comp Biol, 2014
Conclusions
replicability
clustering was generally replicable

robustness
AL, CL & CD-HIT were highly robust to (slightly) changing thresholds; UCLUST, UPARSE & SL were more sensitive
similar trends for robustness to clustering context and choice of subregion (not shown)

reproducibility
surprisingly discordant partitions by different methods
similarity maxima generally off-diagonal
AL and CD-HIT most similar pair
implications for reference-based OTU-binning: choice of reference clustering determines quality!

ecological consistency
CL provided most consistent OTU sets
implications for taxonomy and species definitions?
