Phylogenetic models and MCMC methods for the reconstruction of language history
Robin J. Ryder
CEREMADE – Paris Dauphine / CREST – INSEE
Joint work with Geoff K. Nicholls at the Department of Statistics, University of Oxford
www.slideshare.net/robinryder
Carles li reis, nostre emper[er]e magnes
Set anz tuz pleins ad estet en Espaigne :
Tresqu’en la mer cunquist la tere altaigne.
N’i ad castel ki devant lui remaigne ;
Mur ne citet n’i est remes a fraindre,
Fors Sarraguce, ki est en une muntaigne.
Chanson de Roland, 1r (11th century)
La plus commune façon d'amollir les coeurs de ceux qu'on a offensez, lors qu'ayant la vengeance en main, ils nous tiennent à leur mercy, c'est de les esmouvoir par submission à commiseration et à pitié.
Montaigne, Essais, I, 1 (1580)
Tes yeux sont si profonds qu'en me penchant pour boire
J'ai vu tous les soleils y venir se mirer
S'y jeter à mourir tous les désespérés
Tes yeux sont si profonds que j'y perds la mémoire
Aragon, Les Yeux d'Elsa (1942)
Et la piaule swingue au son du ghetto, on tape à la porte
Chill c'est trop fort ! baisse le son merde ! j'connais
A chaque fois c'est pareil tant pis il faut qu'ça pète
Et profite en traître des nouveaux albums qu'Rod m'achète
Akhénaton, Juste une pression (2005)
What to expect
Description of the data
Model of language diversification
MCMC for phylogenetic trees
Synthetic studies
Analysis of two data sets
Indo-European languages
Language diversification
Languages change in a way comparable to biological species
Similarities between languages indicate that they may be cousins
Most common model: phylogenetic tree
 
Questions
Topology
Internal ages
Age of the root: 6000-6500 BP or 8000-9500 BP?
(BP=Before Present)
Core vocabulary
100 or 200 meanings, present in almost all languages: bird, hand, to eat, red...
Borrowing is possible (non-tree-like change), but:
“Easy” to detect
Uncommon
Does not introduce systematic bias
Data coding
Old English: stierfþ
Old High German: stirbit, touwit
Avestan: miriiete
Old Church Slavonic: umĭretŭ
Latin: moritur
Oscan: ?
Cognacy classes:
1. {stierfþ, stirbit}
2. {touwit}
3. {miriiete, umĭretŭ, moritur}
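The cognacy classes above can be turned into the 0/1 data used later in the talk. The sketch below assumes the usual presence/absence coding (one binary trait per cognacy class, with "?" for a language whose form is unknown); the name `code_binary` and the dictionary layout are illustrative and not taken from the slides.

```python
# Cognacy class membership for the forms listed on the slide.
# None marks a language with no attested form (Oscan).
cognate_classes = {
    "Old English": {1},
    "Old High German": {1, 2},
    "Avestan": {3},
    "Old Church Slavonic": {3},
    "Latin": {3},
    "Oscan": None,
}

def code_binary(classes_by_language):
    """Return (class ids, rows) with 1 = present, 0 = absent, '?' = missing."""
    known = (c for c in classes_by_language.values() if c is not None)
    ids = sorted(set().union(*known))
    rows = {}
    for language, classes in classes_by_language.items():
        if classes is None:
            rows[language] = ["?"] * len(ids)   # whole row is missing
        else:
            rows[language] = [1 if i in classes else 0 for i in ids]
    return ids, rows

ids, rows = code_binary(cognate_classes)
for language, row in rows.items():
    print(f"{language:20s}", *row)
```

Old High German gets two 1s because it has forms in two classes; each column is then treated as one trait evolving on the tree.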
Constraints
Constraints on parts of the topology
Constraints on some internal ages
We use these constraints to infer rates and other ages
 
Description of the model (1)
Traits are born at rate λ
Trait instances die at rate μ
λ and μ are constants
Description of the model (2)
Catastrophes occur at rate ρ
At a catastrophe, each trait dies with probability κ and Poiss(ν) traits are born.
λ/μ=ν/κ: the number of traits is constant on average.
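The birth, death and catastrophe rules can be illustrated by simulating the number of traits along a single branch. This is a minimal sketch, not the authors' implementation: the function names `poisson` and `simulate_trait_count` and all numerical values are assumptions chosen for illustration. It sets ν = κλ/μ so that the constraint λ/μ = ν/κ holds, and the average count should then stay near λ/μ.

```python
import random

def poisson(rng, mean):
    """Poisson draw by counting unit-rate exponential arrivals before `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count

def simulate_trait_count(n0, duration, lam, mu, rho, kappa, rng):
    """Gillespie-style simulation of the number of traits along one branch."""
    nu = kappa * lam / mu              # enforces lambda/mu = nu/kappa
    n, t = n0, 0.0
    while True:
        total_rate = lam + n * mu + rho
        t += rng.expovariate(total_rate)   # waiting time to the next event
        if t > duration:
            return n
        u = rng.random() * total_rate
        if u < lam:
            n += 1                         # a new trait is born
        elif u < lam + n * mu:
            n -= 1                         # one trait instance dies
        else:
            # catastrophe: each trait dies w.p. kappa, Poisson(nu) traits are born
            survivors = sum(rng.random() >= kappa for _ in range(n))
            n = survivors + poisson(rng, nu)

rng = random.Random(1)
counts = [simulate_trait_count(10, 50.0, 0.5, 0.05, 0.01, 0.2, rng)
          for _ in range(2000)]
print("mean number of traits:", sum(counts) / len(counts))  # lambda/mu = 10 here
```

Starting at n0 = λ/μ, the expected count is unchanged by both the birth-death dynamics and the catastrophes, which is what "constant on average" means here.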
Description of the model (3)
Observation model: each data point (0s and 1s) is missing with probability ξ
Some traits are not observed and are therefore deleted from the data
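A rough sketch of this observation step is given below. The slides do not spell out the registration rule at this point, so the code assumes one plausible version: a trait is kept only if at least one language still shows a 1 after masking. The function name `observe` and the toy matrix are hypothetical.

```python
import random

def observe(matrix, xi, rng):
    """Mask each 0/1 entry independently with probability xi, then drop
    traits (columns) for which no language shows an observed 1.
    The column-dropping rule is an assumption, not taken from the slides."""
    masked = [["?" if rng.random() < xi else x for x in row] for row in matrix]
    n_traits = len(matrix[0]) if matrix else 0
    keep = [j for j in range(n_traits)
            if any(row[j] == 1 for row in masked)]
    return [[row[j] for j in keep] for row in masked]

# Rows are languages, columns are traits; the middle trait is absent everywhere.
full = [[1, 0, 0],
        [0, 0, 1]]
print(observe(full, xi=0.0, rng=random.Random(0)))  # middle column is dropped
print(observe(full, xi=0.3, rng=random.Random(0)))  # some entries become '?'
```

Because whole columns disappear from the data, the likelihood has to condition on a trait having been registered at all.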
Registration process
Posterior distribution
Likelihood calculations
Prior distribution on trees
Our main focus is on the root age
We would like the marginal prior on the root age to be (approximately) uniform over (say) 5000-15000BP
MCMC moves
Random walk on the parameters
Various moves on the tree (Drummond et al., 2002)
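The random-walk update on a scalar parameter can be sketched as a generic random-walk Metropolis step. This is only an illustration: the toy log-target (a standard normal) stands in for the posterior of a model parameter, the tree moves of Drummond et al. are not shown, and the name `random_walk_metropolis` and the step size are assumptions.

```python
import math
import random

def random_walk_metropolis(log_target, x0, step, n_iter, rng):
    """Random-walk Metropolis with Gaussian proposals of standard deviation `step`."""
    x, lp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)      # symmetric proposal
        lp_prop = log_target(proposal)
        # accept with probability min(1, target(proposal) / target(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter

# Toy target: log density of a standard normal, up to an additive constant.
samples, acc_rate = random_walk_metropolis(
    lambda x: -0.5 * x * x, x0=0.0, step=1.0, n_iter=20000, rng=random.Random(0))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance rate {acc_rate:.2f}, mean {mean:.2f}, variance {var:.2f}")
```

The step size trades acceptance rate against move size; in practice it is tuned per parameter.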
Checking mixing and convergence
Auto-correlations
Need statistics on the tree
Length of the tree
Root age
Presence/Absence of a few subtrees
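Autocorrelations are computed on scalar summaries of the sampled trees such as those listed above. The sketch below computes the sample autocorrelation of one such trace; the AR(1) series is a synthetic stand-in for, say, the sampled root ages, and the function name `autocorrelation` is illustrative.

```python
import random

def autocorrelation(trace, max_lag):
    """Sample autocorrelation of a scalar trace at lags 0..max_lag."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    return [sum(centred[i] * centred[i + k] for i in range(n - k)) / (n * var)
            for k in range(max_lag + 1)]

# Synthetic, strongly correlated trace (AR(1) with coefficient 0.9),
# standing in for a tree statistic recorded at each MCMC iteration.
rng = random.Random(0)
trace = [0.0]
for _ in range(4999):
    trace.append(0.9 * trace[-1] + rng.gauss(0.0, 1.0))

acf = autocorrelation(trace, max_lag=50)
print("lag 1:", round(acf[1], 2), " lag 10:", round(acf[10], 2))
```

A slowly decaying autocorrelation for any of these statistics suggests the chain needs to run longer or be thinned more heavily.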