16S rRNA analysis using
Mothur pipeline
Eman Abdelrazik
Bioinformatics Research Assistant, Center of Informatics Science, Nile University
H3ABioNet Teaching Assistant
Before we start!
● Slides reproduced from Galaxy tutorials & H3ABioNet tutorials
● For Questions:
https://bit.ly/2N4mlv2
● Make sure you have a Galaxy account:
https://usegalaxy.org.au/
https://usegalaxy.org
Our Journey ^^
1) Theoretical:
a) Introduction
b) Analysis pipelines
2) Practical
a) File formats
b) Introduction to Galaxy
c) Mothur workflow
d) Let’s do it together
e) Do it by yourself
Theoretical
Part I: Introduction
What is the difference?
● Microbiome: the entire set of microorganisms at a given site and time
● Metagenome: the entire genetic information of the microorganisms at a specific site and time
● Meta-Transcriptome
● Meta-Proteome
Why study the microbiome?
1) Health Care research
● Humans are full of microorganisms
● Skin, gut, oral cavity, nasal cavity, eyes, ...
● Affect health, drug efficacy, etc.
● Referred to as your "second genome"
● Often cited as ~10 times more cells than human cells (more recent estimates put the ratio closer to 1:1)
● ~100 times more genes than the human genome
● ~1000s of different species
Cesarean Vs. Vaginal Delivery
2) Environmental Research
● Microbes in the soil affect plants and animals
● Understanding them can help improve agriculture
3) Marine Microbiome Research
Global Ocean Sampling Expedition
Ocean Exploration Genome Project (Pacific Ocean)
Shotgun vs. Amplicon!
Amplicon:
● Sequences only a specific gene
● No functional information
● Less complex to analyse
● Cheaper
Shotgun:
● Sequences all DNA
● More information
● Higher complexity
● Higher cost
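Amplicon (16S rRNA gene)
● Targeted approach (e.g. 16S/18S rRNA gene)
● 16S primers amplify bacterial DNA rather than DNA from the host or from environmental fungi and plants
● rRNA genes are present in all cellular organisms (16S in bacteria and archaea, 18S in eukaryotes); viruses do not carry them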
16S rRNA Secondary Structure
Amplicon
Variable regions make it possible to distinguish between genera
● Pros
○ Well-established
○ Inexpensive
● Cons
○ V-region choice can bias results
○ Is based on a very well conserved gene, making it
hard to resolve species and strains
Shotgun metagenomics
Aims to sequence the "whole" metagenome
● Pros:
○ Not biased by amplicon primer set
○ Not limited by conservation of the amplicon
○ Can also provide functional information
● Cons:
○ Environmental contamination, including host
○ More expensive
○ Complex data analysis
○ Requires high performance computing, high memory.
What sequencing technologies offer for
metagenomics!
Bioinformatics
Theoretical
Part II: Analysis Pipelines
Analysis pipelines
1. Pre-processing
● There are many ways to filter and trim your data
● Trade-off between quality and amount of information retained
Quality Control: Phred Score
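A Phred score Q encodes the estimated probability P that a base call is wrong, through Q = -10 log10(P). The sketch below is a minimal illustration, assuming the Phred+33 ASCII encoding used in current Illumina FASTQ files. The function names are hypothetical and are not part of mothur or Galaxy.

```python
def decode_quality(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33) unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_prob(q):
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities: the expected number of wrong bases in the read."""
    return sum(phred_to_error_prob(q) for q in decode_quality(quality_string, offset))


scores = decode_quality("II5+!")
print(scores)  # [40, 40, 20, 10, 0]
print(expected_errors("II5+!"))
```

Q20 corresponds to a 1 in 100 chance of error and Q30 to 1 in 1000, which is why quality filters are often expressed as a minimum Q value.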
2. Chimera Removal
● During PCR, fragments of two or more different template sequences can combine to form a hybrid (chimeric) sequence
● Chimeras must be removed from your data for better results
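To make the idea of a hybrid sequence concrete, here is a toy reference-based check. It flags a query as chimeric when a "left part from one parent, right part from another" model explains it with fewer mismatches than any single reference. This is only a sketch under strong assumptions (all sequences pre-aligned to the same length, exactly two parents); it is not the algorithm used by mothur's chimera tools, and all names are hypothetical.

```python
def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_chimera(query, references):
    """Return (left_parent, right_parent, breakpoint) if a two-parent model
    fits the query strictly better than the best single reference, else None.

    `references` maps a name to a sequence of the same length as `query`.
    """
    best_single = min(mismatches(query, ref) for ref in references.values())
    best = None  # (score, left_name, right_name, breakpoint)
    for bp in range(1, len(query)):
        for left_name, left in references.items():
            for right_name, right in references.items():
                if left_name == right_name:
                    continue
                score = (mismatches(query[:bp], left[:bp])
                         + mismatches(query[bp:], right[bp:]))
                if score < best_single and (best is None or score < best[0]):
                    best = (score, left_name, right_name, bp)
    return None if best is None else best[1:]


refs = {"A": "AAAAAAAAAA", "B": "CCCCCCCCCC"}
print(find_chimera("AAAAACCCCC", refs))  # ('A', 'B', 5)
print(find_chimera("AAAAAAAAAA", refs))  # None
```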
3. OTU Clustering
● Operational Taxonomic Unit (OTU): a cluster of similar sequences, represented by a single consensus sequence, roughly corresponding to one species.
● OTUs are conventionally defined by a 97% identity threshold on 16S gene sequences. This cutoff is a traditional proxy for species but in practice often resolves only to genus level; 98% or 99% identity is suggested for species separation.
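The identity threshold can be illustrated with a toy greedy clustering: each sequence joins the first OTU whose seed it matches at or above the threshold, otherwise it starts a new OTU. This is a simplified sketch, assuming equal-length, pre-aligned sequences. It is not mothur's clustering method, which works from pairwise distances, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Group sequences into OTUs around seed sequences.

    Returns a list of (seed, members) tuples; the first sequence placed in an
    OTU acts as its seed.
    """
    otus = []
    for seq in sequences:
        for seed, members in otus:
            if identity(seq, seed) >= threshold:
                members.append(seq)
                break
        else:
            # No existing seed was similar enough: start a new OTU.
            otus.append((seq, [seq]))
    return otus


seed = "A" * 100
close_variant = "A" * 98 + "CC"        # 98% identical to the seed
distant_variant = "A" * 90 + "C" * 10  # 90% identical to the seed

otus = greedy_otu_clustering([seed, close_variant, distant_variant])
print(len(otus))                             # 2
print([len(members) for _, members in otus])  # [2, 1]
```

Raising the threshold to 0.99 would split the 98% variant into its own OTU, which is the trade-off behind choosing 97% versus 98% or 99%.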
Search marker database and taxonomy assignment
Alignment
Results: Visualizations
1. Krona
● interactive exploration of sample taxonomy / per-sample
groups
● Illustrate abundance
Results: Visualizations
2. Phinch
● explore the community structure
● BIOM file input
● various visualizations
● multi-sample data
Diversity
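The slide does not name specific diversity measures, so the two shown here (the Shannon index and the inverse Simpson index) are assumptions chosen for illustration. Both summarize within-sample (alpha) diversity from a list of per-OTU read counts. The function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2); equals the OTU count when all OTUs are equally abundant."""
    total = sum(counts)
    return 1 / sum((c / total) ** 2 for c in counts if c > 0)


even_sample = [10, 10, 10, 10]  # four equally abundant OTUs
skewed_sample = [37, 1, 1, 1]   # one dominant OTU

print(shannon_index(even_sample), inverse_simpson(even_sample))
print(shannon_index(skewed_sample), inverse_simpson(skewed_sample))
```

Both samples contain four OTUs, but the skewed sample scores lower on both indices because they account for evenness as well as richness.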
Pipelines: 1. QIIME
Quantitative Insights Into Microbial Ecology
QIIME 2 plugins
Pipelines: 2. DADA2
DADA2 stands for Divisive Amplicon Denoising Algorithm 2
Other available pipelines
● UPARSE: http://www.drive5.com/uparse
● IM Tornado:
https://github.com/pjeraldo/imtornado2
● FROGS:
https://github.com/geraldinepascal/FROGS
● VSEARCH: https://github.com/torognes/vsearch
Summary
Practical
Part I: File Formats
Important terms
Output file formats
FASTA Format
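A FASTA file is plain text: each record starts with a header line beginning with ">" and is followed by one or more lines of sequence. The parser below is a minimal sketch for illustration (the function name and the example records are made up); for real work a library such as Biopython is the usual choice.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence.

    Sequence lines that wrap over several lines are joined together.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 example read
ACGTACGT
ACGT
>seq2
TTGGCCAA
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence), sequence)
```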
Sequence Alignment/Map Format (SAM)
Header
Records
CIGAR (Compact Idiosyncratic Gapped Alignment Report) strings
The CIGAR string summarizes a sequence alignment: it encodes the order and length of matches/mismatches, insertions, and deletions (gaps) relative to the reference sequence.
CIGAR strings, together with the allele sequences, are used to generate a visualization of the loci alignment.
https://samtools.github.io/hts-specs/SAMv1.pdf
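As a small illustration of how a CIGAR string is read, the sketch below splits it into (length, operation) pairs and counts how many reference and query bases the alignment covers. The operation letters follow the SAM specification linked above; the function names and the example string are hypothetical.

```python
import re

# One CIGAR element is a run length followed by a single operation letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]


def reference_span(cigar):
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Read bases described: M, I, S, = and X consume the query."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MIS=X")


cigar = "8M2I4M1D3M"
print(parse_cigar(cigar))     # [(8, 'M'), (2, 'I'), (4, 'M'), (1, 'D'), (3, 'M')]
print(reference_span(cigar))  # 16
print(query_length(cigar))    # 17
```

Here the read has 2 inserted bases that the reference lacks and skips 1 reference base, so it is 17 bases long but spans 16 reference positions.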
SAM vs. BAM
● BAM is the compressed binary version of SAM
● Unaligned BAM (uBAM) can be a better way to store reads than FASTQ, especially for data from different samples, because it can attach extra annotation to each read (e.g. where it comes from)
BIOM format
● The Biological Observation Matrix (BIOM) format
● A general-use format for representing biological sample-by-observation contingency tables
● The biom-format project also provides a command-line interface (CLI) for working with BIOM files, including converting between file formats, adding metadata to BIOM files, and summarizing BIOM files.
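A sample-by-observation contingency table is simply a matrix of counts, with observations (e.g. OTUs) as rows and samples as columns. Because most entries are zero, such tables are often stored sparsely as (row, column, value) triples. The sketch below shows that conversion. The keys `matrix_type`, `shape` and `data` mirror my reading of the BIOM 1.0 JSON layout and should be checked against the specification; the result is abridged and not a complete, valid BIOM file. The OTU and sample names are made up.

```python
import json


def to_sparse(table):
    """Convert a dense count matrix (list of rows) into [row, col, value] triples, skipping zeros."""
    return [
        [i, j, value]
        for i, row in enumerate(table)
        for j, value in enumerate(row)
        if value != 0
    ]


observations = ["OTU_1", "OTU_2", "OTU_3"]  # rows
samples = ["Sample_A", "Sample_B"]          # columns
counts = [
    [5, 0],
    [0, 3],
    [2, 0],
]

abridged_table = {
    "matrix_type": "sparse",  # assumed field name, see lead-in
    "shape": [len(observations), len(samples)],
    "data": to_sparse(counts),
}
print(json.dumps(abridged_table))
```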
BIOM format
http://biom-format.org/
Practical
Part II: Introduction to Galaxy
Before we start!
What is Galaxy?
● Web-based platform for biological data analysis.
● Command-line tools >> wrapped >> Galaxy
● Retain histories of analysis: re-run and share.
Galaxy Servers
Courses in Higher Education that use Galaxy
Make your account!
https://usegalaxy.org/
Galaxy Interface
Tools Panel
Center Panel
History Panel
Dataset status
job is completed
job is executing
job is queued
job is paused
job has failed
1. Refresh history
2. View file
3. History settings
How to get Data?
● The maximum file size is 50 GB (uncompressed).
● Most individual file compression formats are supported, but multi-file archives (.tar, .zip) are not.
ENA ID: PRJEP5480
Workflows: extract workflow
Your workflow
Edit workflow
Import workflow
Practical
Part III: Mothur workflow
https://mothur.org/
Mothur
● A collection of tools combined in a single package.
● The mothur project was initiated by Dr. Patrick Schloss at the University of Michigan.
● The most cited bioinformatics tool for analyzing 16S rRNA gene sequences.
● Processes data generated by Sanger, PacBio, IonTorrent, 454, and Illumina (MiSeq/HiSeq).
Mothur wiki
Manual
File types
Download: Latest version “1.43.0”
https://github.com/mothur/mothur/releases/tag/v.1.43.0
https://www.mothur.org/wiki/Installation
Main steps
Let’s start!
1. Get Data
Functional Analysis
https://bpa-csiro-workshops.github.io/btp-manuals-md/modules/metagenomics-module-fda/fda/
http://motherbox.chemeng.ntua.gr/anastasia_dev/u/makis/w/copy-of-starting-from-reads-1
Wrap up!
Resources
● Soil Tutorial:
https://galaxyproject.github.io/training-material/topics/metagenomics/tutorials/general-tutorial/tutorial.html
● Gut Tutorial:
https://galaxyproject.github.io/training-material/topics/metagenomics/tutorials/mothur-miseq-sop-short/tutorial.html
● https://galaxyproject.github.io/training-material/topics/metagenomics/
● https://galaxyproject.github.io/training-material/topics/introduction/
● https://moin.galaxyproject.org/Support#Dataset_status_and_how_jobs_execute
● https://docs.qiime2.org/2019.7/tutorials/
● https://benjjneb.github.io/dada2/tutorial.html
● https://www.coursera.org/learn/galaxy-project
