Spring 2016
Kurt Wollenberg, PhD
Phylogenetics Specialist
Bioinformatics and Computational Biosciences Branch
Office of Cyber Infrastructure and Computational Biology
Molecular Evolutionary Analysis
Using BEAST
Part 1: Introduction to Bayesian
phylogenetics and BEAST
Course Organization
• Introduction to Bayesian phylogenetics
• Introduction to BEAST
• Building a Bayesian phylogeny
• Incorporating sample time in the phylogeny
• Estimating demographic parameters
• Estimating species trees from gene trees
• Estimating ancestral trait states (esp. geography)
Lecture Organization
• Why Bayesian phylogenetics is well-suited
to the analysis of pathogen molecular
evolution
• A short tour of Bayesian MCMC analysis
• What is BEAST? An overview of the
BEAST package
• BEAST Analysis Demo
What’s so special about pathogens?
• Short generation time
• Rapid evolution
• Genotypes - easy, phenotypes - hard
• Large populations
• Structured populations
• Rigorous temporal sampling of genotypes
Why use Bayesian methods on pathogens?
• Coalescent approach is more appropriate
• Can incorporate temporal data
• Can incorporate geographical data
• Can incorporate host data
What is Bayesian analysis?
• Calculation of the probability of parameters
(tree, substitution model) given the data
(sequence alignment)
• p(θ|D) = (Likelihood x prior)/probability of
the data
• p(θ|D) = p(D|θ)p(θ)/p(D)
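As a rough illustration of the formula above, the sketch below normalizes likelihood × prior over three candidate topologies. The log-likelihood values and the flat prior are invented for the example (they do not come from any data set in these slides), and the function name is hypothetical. The calculation is done in log space because real phylogenetic likelihoods are far too small to exponentiate directly.

```python
import math

def posterior_from_logs(log_likelihoods, log_priors):
    """Return p(theta|D) for a finite set of hypotheses, given log p(D|theta) and log p(theta)."""
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    # log p(D) = log of the sum of the joint probabilities (log-sum-exp for numerical stability)
    largest = max(log_joint)
    log_evidence = largest + math.log(sum(math.exp(x - largest) for x in log_joint))
    return [math.exp(x - log_evidence) for x in log_joint]

# Hypothetical log-likelihoods for three topologies, with a flat prior over them
log_likelihoods = [-1002.0, -1001.0, -1001.5]
log_priors = [math.log(1.0 / 3.0)] * 3

posterior = posterior_from_logs(log_likelihoods, log_priors)
for name, p in zip("ABC", posterior):
    print(f"Topology {name}: {p:.3f}")
```

With a real tree space the sum in p(D) runs over every possible tree and parameter value, which is why the posterior is sampled by MCMC rather than computed this way.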
Exploring the posterior probability distribution
Posterior probabilities of trees and
parameters are approximated using Markov
Chain Monte Carlo (MCMC) sampling
Markov Chain: A statement of the probability
of moving from one state to another
Bayesian Analysis
What is MCMC?
Markov Chain Monte Carlo
[Figure: one link in the chain; choosing a link]
What is MCMC?
Markov Chain Monte Carlo: accept or reject?
Metropolis-Hastings algorithm
[Figure: posterior probability of Topology A, Topology B, and Topology C, labeled 20%, 48%, and 32%; one proposed move is marked “Accept!” and another “Maybe”]
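The accept-or-reject step can be sketched as a toy Metropolis-Hastings sampler over three topologies. Assigning 20%, 48%, and 32% to Topologies A, B, and C is an assumption made for the example, as are the function and variable names. The proposal picks one of the other two topologies uniformly, so it is symmetric and the Hastings ratio is 1.

```python
import random
from collections import Counter

def metropolis_hastings(target, n_steps, seed=1):
    """Sample states in proportion to `target` using a symmetric proposal."""
    rng = random.Random(seed)
    states = list(target)
    current = states[0]
    visits = Counter()
    for _ in range(n_steps):
        # Propose one of the other states uniformly (symmetric, so no Hastings correction)
        proposed = rng.choice([s for s in states if s != current])
        ratio = target[proposed] / target[current]
        # Moves to a more probable state are always accepted; otherwise accept with probability `ratio`
        if rng.random() < min(1.0, ratio):
            current = proposed
        visits[current] += 1
    return {s: visits[s] / n_steps for s in states}

# Assumed mapping of the figure's percentages onto the three topologies
target = {"A": 0.20, "B": 0.48, "C": 0.32}
frequencies = metropolis_hastings(target, n_steps=100_000)
print(frequencies)
```

Only the ratio of target values is used, so the sampler never needs the normalizing constant p(D).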
What is BEAST?
• Bayesian Evolutionary Analysis Sampling Trees
• A collection of programs for performing Bayesian
MCMC analysis of molecular sequences
• Can incorporate sample time information
• Can perform a broad range of other evolutionary
analyses using sequence data.
What is BEAST?
The Programs:
• BEAUti - Creating XML input files
• BEAST - MCMC analysis of molecular
sequences
• Tracer - Viewing MCMC output
• LogCombiner - Combining output files
• TreeAnnotator - Generating the consensus tree
• FigTree - Drawing a tree
Different types of BEAST analyses
• Calculating a Bayesian coalescent phylogeny
• Calculating a time-stamped Bayesian coalescent phylogeny
• Estimating population dynamics (Bayesian skyline/skyride/skygrid)
• Combined gene and species phylogeny estimate (*BEAST)
• Phylogeographic analysis (time and location data)
Defining your analysis
• Prior knowledge of tree?
• Calibrating nodes?
• Substitution model?
• Effective population sizes?
• What priors to use?
Setting up the analysis: BEAUti
• Import data – NEXUS or FASTA format
• Incorporate known structure – taxon sets
• Substitution model parameters
• Strict or relaxed clock?
• Tree prior
• Substitution model priors
• Adjustments from previous runs (operators)
• Setting the chain (length and sampling frequency)
Setting up the analysis: BEAUti
Import data: Nexus format
#NEXUS
[These are comments.
They are ignored by the program.]
Begin data;
dimensions ntax=5 nchar=15;
format datatype=DNA gap=- missing=?;
matrix
Bug1 ACCTGATTACGGGCA
Bug2 ACCCGAATACGGACA
Bug3 ACCTATTTACGCCCA
BugF ACTATATTACCGGCA
BugBX4W ACCAAA---CGGGCA
;
End;
Setting up the analysis: BEAUti
Import data: Fasta format
>Bug1
ACCTGATTACGGGCA
>Bug2
ACCCGAATACGGACA
>Bug3
ACCTATTTACGCCCA
>BugF
ACTATATTACCGGCA
>BugBX4W
ACCAAA---CGGGCA
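The two import formats carry the same alignment, so one can be derived from the other. The sketch below parses the FASTA example and writes a NEXUS data block like the one shown earlier. The helper names `parse_fasta` and `to_nexus` are hypothetical, and the code only handles a simple, already-aligned DNA file.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def to_nexus(records):
    """Write aligned DNA records as a minimal NEXUS data block."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records)
    lines = ["#NEXUS", "Begin data;",
             f"dimensions ntax={len(records)} nchar={nchar};",
             "format datatype=DNA gap=- missing=?;",
             "matrix"]
    lines += [f"{name.ljust(width)} {seq}" for name, seq in records]
    lines += [";", "End;"]
    return "\n".join(lines)

fasta_text = """>Bug1
ACCTGATTACGGGCA
>Bug2
ACCCGAATACGGACA
>Bug3
ACCTATTTACGCCCA
>BugF
ACTATATTACCGGCA
>BugBX4W
ACCAAA---CGGGCA
"""

records = parse_fasta(fasta_text)
nexus_text = to_nexus(records)
print(nexus_text)
```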
Setting up the analysis: Models
Substitution Models
• HKY - Unequal base frequencies and transition/transversion rate ratio
  • Must specify prior and initial estimates for transition/transversion rate ratio
• GTR - Unequal base frequencies and each substitution has its own rate parameter
  • Must specify prior and initial estimates for each substitution rate (relative to C-T rate)
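To make the HKY description concrete, the following sketch builds an HKY instantaneous rate matrix from a transition/transversion rate ratio (kappa) and four base frequencies. The kappa value of 2.0 and the base frequencies are illustrative assumptions, and scaling the matrix to one expected substitution per unit time is a common convention rather than something stated in these slides.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T) changes
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def hky_rate_matrix(kappa, freqs):
    """Return the HKY rate matrix as a dict keyed by (from_base, to_base)."""
    q = {}
    for i in BASES:
        for j in BASES:
            if i != j:
                relative_rate = kappa if (i, j) in TRANSITIONS else 1.0
                q[i, j] = relative_rate * freqs[j]
    # Diagonal entries make each row sum to zero
    for i in BASES:
        q[i, i] = -sum(q[i, j] for j in BASES if j != i)
    # Rescale so the expected substitution rate at equilibrium is 1
    scale = -sum(freqs[i] * q[i, i] for i in BASES)
    return {key: value / scale for key, value in q.items()}

# Illustrative inputs (assumed for this example)
freqs = {"A": 0.2259, "C": 0.3199, "G": 0.2405, "T": 0.2137}
kappa = 2.0
q = hky_rate_matrix(kappa, freqs)

for i in BASES:
    print(i, [round(q[i, j], 4) for j in BASES])
```

GTR generalizes this by replacing the single kappa with a separate relative rate for each of the six base pairs.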
Site Models
• Site heterogeneity models
  • Gamma
    • Modeling rate of change using a discrete gamma distribution
  • Invariant
    • Percent of non-variable sites in the data
Setting up the analysis: Models
Estimating best-fit models and initial parameters:
jModelTest
Model selected: TVM+I+G
-lnL = 1676.8109
K = 9
AIC = 3371.6218
Base frequencies:
freqA = 0.2259
freqC = 0.3199
freqG = 0.2405
freqT = 0.2137
Substitution model:
Rate matrix
R(a) [A-C] = 0.2494
R(b) [A-G] = 4.8655
R(c) [A-T] = 0.7435
R(d) [C-G] = 0.3907
R(e) [C-T] = 4.8655
R(f) [G-T] = 1.0000
Among-site rate variation
Proportion of invariable sites (I) = 0.6508
Variable sites (G)
Gamma distribution shape parameter = 0.5913
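The AIC reported above can be checked from the other two numbers, since AIC = 2K + 2(-lnL). The short sketch below redoes that arithmetic; the function name is just for illustration.

```python
def aic(neg_log_likelihood, n_params):
    """Akaike information criterion from -lnL and the number of free parameters K."""
    return 2 * n_params + 2 * neg_log_likelihood

# Values taken from the jModelTest output: -lnL = 1676.8109, K = 9
aic_value = aic(1676.8109, 9)
print(round(aic_value, 4))  # 3371.6218
```

Lower AIC is better, so the extra parameters of a richer model are only justified when they reduce -lnL by more than the penalty of 2 per parameter.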
Setting up the analysis: Models
Site heterogeneity models:
The Gamma Distribution
Mean = kθ
Shape parameter = θ (k is then the scale parameter)
Coefficient of Variation = 1/√θ
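A quick simulation shows what the shape parameter does to among-site rates. The sketch assumes the usual phylogenetic convention that the rate distribution has mean 1 (scale = 1/shape) and uses the shape value 0.5913 from the jModelTest output. It draws from a continuous gamma, whereas BEAST approximates the distribution with a small number of discrete rate categories.

```python
import math
import random

def simulate_site_rates(shape, n_sites, seed=1):
    """Draw per-site relative rates from a gamma distribution with mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1/shape gives mean 1
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_sites)]

shape = 0.5913
rates = simulate_site_rates(shape, 100_000)

mean_rate = sum(rates) / len(rates)
sd_rate = math.sqrt(sum((r - mean_rate) ** 2 for r in rates) / len(rates))
observed_cv = sd_rate / mean_rate
expected_cv = 1.0 / math.sqrt(shape)

print(f"mean rate   = {mean_rate:.3f}")
print(f"observed CV = {observed_cv:.3f}")
print(f"expected CV = {expected_cv:.3f}")
```

A shape value below 1 gives a coefficient of variation above 1: most sites change slowly while a few change very quickly.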
Setting up the analysis: Models
Clock Models
• Strict clock – same rate for all branches
• Relaxed clock – independent rates among branches
  • Exponential or Lognormal distribution of rates
• For contemporaneous data, setting a fixed mean substitution rate of 1.0 (uncheck “Estimate”) gives node ages in units of substitutions per site (comparable to MrBayes branch lengths)
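The difference between the strict and the lognormal relaxed clock can be sketched by drawing one rate per branch. The parameterization here (mean given in real space, standard deviation given in log space) is an assumption made for the example and should be checked against the settings in your own BEAST XML. The function name is hypothetical.

```python
import math
import random

def draw_branch_rates(n_branches, mean_rate, stdev_log, seed=1):
    """Draw independent lognormal branch rates with the given real-space mean."""
    rng = random.Random(seed)
    # For a lognormal, mean = exp(mu + sigma^2 / 2), so solve for mu
    mu = math.log(mean_rate) - 0.5 * stdev_log ** 2
    return [rng.lognormvariate(mu, stdev_log) for _ in range(n_branches)]

strict_like = draw_branch_rates(8, mean_rate=1.0, stdev_log=0.0)
relaxed = draw_branch_rates(100_000, mean_rate=1.0, stdev_log=0.5)
relaxed_mean = sum(relaxed) / len(relaxed)

print("stdev 0.0:", [round(r, 3) for r in strict_like])
print("stdev 0.5:", [round(r, 3) for r in relaxed[:8]])
```

With a standard deviation of 0 every branch gets the same rate, which is the strict clock; larger values spread the branch rates further apart.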
Setting up the analysis: Models
Tree Prior
• Coalescent
  • constant size
  • exponential growth
  • GMRF Bayesian Skygrid
• Speciation
  • Yule process
  • Birth-Death
• Epidemiology
Setting up the analysis: Models
Testing Models and Priors
Path Sampling/Stepping Stone analysis
• Estimates the marginal likelihood of each model or prior setting being compared.
• Invoked on the MCMC tab in BEAUti.
• A separate run is necessary for each changed parameter.
• Each run performs a complete MCMC analysis, followed by X PS/SS iterations.
Setting up the analysis: Models
Testing Models and Priors
Path Sampling/Stepping Stone analysis
log marginal likelihoods
Model                Path Sampling    Stepping Stone
HKY/strict clock          -4725.85          -4728.68
HKY+gi/strict             -4515.99          -4518.05
HKY+gi/LN relaxed         -4436.10          -4438.75
GTR/strict clock          -4746.62          -4749.14
GTR+gi/strict             -4526.87          -4529.05
GTR+gi/LN relaxed         -4548.39          -4551.22
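One way to read these numbers is to rank the models by log marginal likelihood and take differences, since the log Bayes factor between two models is the difference of their log marginal likelihoods. The sketch below does this for the values in the table; the function and variable names are illustrative.

```python
# (path sampling, stepping stone) log marginal likelihoods from the table
log_marginal_likelihoods = {
    "HKY/strict clock":  (-4725.85, -4728.68),
    "HKY+gi/strict":     (-4515.99, -4518.05),
    "HKY+gi/LN relaxed": (-4436.10, -4438.75),
    "GTR/strict clock":  (-4746.62, -4749.14),
    "GTR+gi/strict":     (-4526.87, -4529.05),
    "GTR+gi/LN relaxed": (-4548.39, -4551.22),
}

def rank_models(log_ml, column):
    """Return the best model and each model's log Bayes factor against it."""
    best = max(log_ml, key=lambda model: log_ml[model][column])
    log_bf = {model: log_ml[best][column] - log_ml[model][column] for model in log_ml}
    return best, log_bf

best_ps, log_bf_ps = rank_models(log_marginal_likelihoods, column=0)
best_ss, log_bf_ss = rank_models(log_marginal_likelihoods, column=1)

print("Best model (path sampling):", best_ps)
for model, value in sorted(log_bf_ps.items(), key=lambda item: item[1]):
    print(f"{model:20s} log BF in favor of best = {value:7.2f}")
```

Both estimators rank the models in the same order here, which is a useful sanity check on the PS/SS runs.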
Setting up the analysis: Models
Testing Models and Priors
Does the relaxed clock fit the data?
[Figure: three histograms of ucld.stdev (x-axis: ucld.stdev, y-axis: Frequency); the second is labeled 2014_GN.SL_SRDpart1.ucld.stdev]
Running the analysis: BEAST
• Load your input file
• That’s it
Evaluating the analysis: Tracer
• Check for convergence
• Evaluating ESS values
• Viewing behavior of parameter estimates
• Examining traces
• Extracting parameter estimates and statistics
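To give a feel for what an ESS value measures, the sketch below estimates effective sample size from the autocorrelation of a trace. It is a simple textbook-style estimator that truncates the sum at the first non-positive autocorrelation; it is not claimed to be the calculation Tracer performs. The two traces are simulated, not taken from a BEAST run.

```python
import random

def effective_sample_size(samples):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)
    autocorrelation_time = 1.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:  # stop once the autocorrelation has decayed to noise
            break
        autocorrelation_time += 2.0 * rho
    return n / autocorrelation_time

def ar1_trace(n, phi, seed=1):
    """Simulate a trace whose successive samples are correlated with strength phi."""
    rng = random.Random(seed)
    value, trace = 0.0, []
    for _ in range(n):
        value = phi * value + rng.gauss(0.0, 1.0)
        trace.append(value)
    return trace

well_mixed = ar1_trace(2000, phi=0.0)
sticky = ar1_trace(2000, phi=0.95)

ess_well_mixed = effective_sample_size(well_mixed)
ess_sticky = effective_sample_size(sticky)
print(f"ESS, independent samples:    {ess_well_mixed:.0f} of 2000")
print(f"ESS, strongly autocorrelated: {ess_sticky:.0f} of 2000")
```

A low ESS means the chain has produced few effectively independent draws, which is a reason to run longer, sample less often, or adjust operators.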
Evaluating the analysis: Tracer
• What if my analysis didn’t converge?
• Can I make multiple simultaneous runs?
• Swarm on Biowulf
Running BEAST: swarm on Biowulf
• Requires a .swarm file
• A text file containing one command per line:
beast beastJob_1.xml > beastJob_1out.txt
sleep 2; beast beastJob_2.xml > beastJob_2out.txt
sleep 4; beast beastJob_3.xml > beastJob_3out.txt
sleep 6; beast beastJob_4.xml > beastJob_4out.txt
sleep 8; beast beastJob_5.xml > beastJob_5out.txt
• Run in command line:
[username]$ swarm -f beastInput.swarm --module BEAST
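For more than a handful of replicate runs, the .swarm file can be generated rather than typed. The sketch below reproduces the five lines shown on the slide, including the staggered `sleep` delays; the function name and its defaults are illustrative, and writing the result to `beastInput.swarm` is left to the caller.

```python
def swarm_lines(n_jobs, prefix="beastJob", stagger_seconds=2):
    """Build one BEAST command per replicate, delaying each later job a little more."""
    lines = []
    for job in range(1, n_jobs + 1):
        command = f"beast {prefix}_{job}.xml > {prefix}_{job}out.txt"
        delay = stagger_seconds * (job - 1)
        lines.append(command if delay == 0 else f"sleep {delay}; {command}")
    return lines

swarm_file_lines = swarm_lines(5)
print("\n".join(swarm_file_lines))

# To save the file for `swarm -f beastInput.swarm --module BEAST`:
# with open("beastInput.swarm", "w") as handle:
#     handle.write("\n".join(swarm_file_lines) + "\n")
```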
Merging output files: LogCombiner
• Log files vs Tree files
• Selecting files
• Specifying burn-in (number of steps or trees)
• Specifying subsampling
• Specifying output file
Merging output files: LogCombiner
• Burn in?
Calculating the tree: TreeAnnotator
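• Burn-in: specified as a number of trees or a number of steps
• Target tree type: maximum clade credibility (MCC), maximum sum of clade credibilities, or a user-supplied target tree
• Node heights: keep target heights, mean, or median
• Specify input and output files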
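The idea behind the MCC option can be sketched in a few lines: score each sampled tree by the product of the posterior frequencies of its clades and keep the highest-scoring one. In this toy version a tree is represented only as a set of clades (each clade a set of taxon names); the representation, the sample of five trees, and the function names are assumptions made for the example and do not describe TreeAnnotator's internals.

```python
import math
from collections import Counter

def clade_credibilities(trees):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: count / len(trees) for clade, count in counts.items()}

def maximum_clade_credibility_tree(trees):
    """Return the sampled tree with the highest product of clade credibilities."""
    credibility = clade_credibilities(trees)

    def log_score(tree):
        # Sum of logs avoids underflow when trees have many clades
        return sum(math.log(credibility[clade]) for clade in tree)

    return max(trees, key=log_score)

def make_tree(*clades):
    return frozenset(frozenset(clade) for clade in clades)

# Three rooted topologies on taxa A-D, described by their non-trivial clades
tree_1 = make_tree("AB", "ABC")
tree_2 = make_tree("AB", "CD")
tree_3 = make_tree("AC", "ABC")

# A hypothetical post-burn-in sample of five trees
sampled_trees = [tree_1, tree_1, tree_1, tree_2, tree_3]

credibility = clade_credibilities(sampled_trees)
mcc = maximum_clade_credibility_tree(sampled_trees)
print(sorted("".join(sorted(clade)) for clade in mcc))
```

The clade credibilities computed along the way are the posterior probabilities that are later displayed on the nodes of the summary tree.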
Drawing trees: FigTree
• Specifying additional values (esp. posterior
probabilities)
• Tree appearance
• Ordering branches
• Re-rooting
• Exporting graphics
Running BEAST
DEMO
Setting up the analysis: BEAUti
Running the analysis: BEAST
Evaluating the analysis: Tracer
Merging output files: LogCombiner
Calculating the tree: TreeAnnotator
Drawing trees: FigTree
BEAST2
• Still … Bayesian Evolutionary Analysis Sampling
Trees
• Modular rewrite of the BEAST software
• Various evolutionary analyses performed through
a system of independent software packages.
• Access software, documentation, etc., through
the website beast2.org
• Still a few bugs in the system…
Seminar Follow-Up Site
• For access to past recordings, handouts, and slides, visit this site from the NIH network:
http://collab.niaid.nih.gov/sites/research/SIG/Bioinformatics/
1. Select a Subject Matter
2. Select a Topic
View:
• Seminar Details
• Handout and Reference Docs
• Relevant Links
• Seminar Recording Links
Recommended Browsers:
• IE for Windows
• Safari for Mac (Firefox on a Mac is incompatible with NIH Authentication technology)
Login
• If prompted to log in, use “NIH” in front of your username
Retrieving Slides/Handouts
[Screenshots of the follow-up site with callouts: “This lecture series”, “Last lecture”, “Those slides”]
Questions?
Next?
• Time-structured phylogenies
• Estimating demographic parameters
• GMRF skyride analysis

Editor's Notes

  • #10: The Metropolis-Hastings acceptance ratio is the prior ratio of the proposed to the current estimate, times the likelihood ratio of the proposed to the current estimate, times the Hastings ratio (proposal ratio), which corrects for asymmetric operators
  • #24: GMRF = Gaussian Markov random field