A unified framework to combine disparate data types in species distribution modelling
Slides on Slideshare: http://www.slideshare.net/oharar/gf-o2014talk
Bob O’Hara (1), Petr Keil (2), Walter Jetz (2)
(1) BiK-F, Biodiversity and Climate Change Research Centre, Frankfurt am Main, Germany. Twitter: @bobohara
(2) Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA

A "Real" Curve
[Figure: a smooth curve; x-axis from 0 to 100, y-axis from 0 to 80; legend: Curve]

Approximated with a Discretised Curve
[Figure: the curve with a discretised approximation; x-axis from 0 to 100, y-axis from 0 to 80; legend: Curve, Discrete]

Better: linear interpolation
[Figure: the curve with discretised and linearly interpolated approximations; x-axis from 0 to 100, y-axis from 0 to 80; legend: Curve, Discrete, Interpolated]

With more points, the approximations improve
[Figure: the curve with both approximations built from more points; x-axis from 0 to 100, y-axis from 0 to 80; legend: Curve, Discrete, Interpolated]
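The curve slides can be reproduced numerically. The sketch below is only an illustration: the function `f` is a made-up smooth curve (not the one plotted on the slides), and `approx_error` is a hypothetical helper. It compares the worst-case error of a discretised (piecewise-constant) approximation with that of linear interpolation as the number of cells grows.

```r
# Hypothetical smooth "real" curve on [0, 100] (not the curve from the slides)
f <- function(x) 40 + 30 * sin(x / 15)

# Maximum absolute error of two approximations built from n equal cells
approx_error <- function(n, grid = seq(0, 100, length.out = 2001)) {
  breaks <- seq(0, 100, length.out = n + 1)        # cell edges
  mids   <- (breaks[-1] + breaks[-(n + 1)]) / 2    # cell midpoints

  # Discretised: every x takes the value at the midpoint of its cell
  cell <- findInterval(grid, breaks, rightmost.closed = TRUE)
  step <- f(mids)[cell]

  # Linear interpolation between the values at the cell edges
  lin <- approx(breaks, f(breaks), xout = grid)$y

  c(discrete     = max(abs(step - f(grid))),
    interpolated = max(abs(lin - f(grid))))
}

n_cells <- c(5, 10, 20, 40)
errors <- sapply(n_cells, approx_error)
colnames(errors) <- n_cells
print(round(errors, 3))
```

For this curve both rows shrink as the number of cells increases, and the interpolated error is smaller than the discretised one at every resolution.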

What does this have to do with distribution models?
This is how SDMs see the world:
[Figure: map. Source: http://bit.ly/1l8sG7M. Map produced by Peter Blancher, Science and Technology Branch, Environment Canada, based on data from the North American Breeding Bird Survey]
Problems: scale, within-grid heterogeneity

Let’s sidestep the whole problem
Work in continuous space instead
The maths will let us work on different scales, e.g. Renner & Warton (2013), doi: 10.1111/j.1541-0420.2012.01824.x
Lets us deal with points & irregular shapes
Makes it straightforward to include different sorts of data

Motivation
Map Of Life: www.mol.org/
Different data sources:
  GBIF
  expert range maps
  eBird and similar citizen science efforts
  organised surveys (BBS, BMSs)
  regional checklists

A Unified Model
There is a single state: the density of the species
[Diagram: the Actual State, with arrows to three kinds of data: Presence/Absence, Presence Only, Expert Range Maps]

Point Processes: Model
Each point in space, $\xi$, has an intensity, $\rho(\xi)$:
$$\log \rho(\xi) = \eta(\xi) = \beta X(\xi) + \nu(\xi)$$
The number of individuals in an area $A$ follows a Poisson distribution with mean
$$\lambda(A) = \int_A \rho(\xi)\, d\xi$$
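To make the Poisson claim concrete, here is a minimal simulation sketch. Everything specific in it is an assumption for illustration: the unit square as the area, a log-intensity that depends only on the x-coordinate, no random field ν, and the helper name `simulate_points`. Points are generated by thinning a homogeneous Poisson process, and the counts are compared with the exact integral of the intensity.

```r
set.seed(1)

# Hypothetical log-intensity on the unit square: a single covariate (the
# x-coordinate) and no spatial random field nu
beta0 <- 4
beta1 <- 1.5
rho <- function(x, y) exp(beta0 + beta1 * x)

# Exact mean count for the whole square: integral of rho over [0, 1]^2
lambda_A <- exp(beta0) * (exp(beta1) - 1) / beta1

# One realisation of the point process, by thinning a homogeneous process
simulate_points <- function() {
  rho_max <- rho(1, 0)                      # upper bound on the intensity
  n <- rpois(1, rho_max)                    # candidate points (area is 1)
  x <- runif(n)
  y <- runif(n)
  keep <- runif(n) < rho(x, y) / rho_max    # retain with prob rho / rho_max
  data.frame(x = x[keep], y = y[keep])
}

counts <- replicate(2000, nrow(simulate_points()))
print(c(expected = lambda_A, simulated_mean = mean(counts),
        simulated_var = var(counts)))
```

The simulated mean and variance of the counts should both sit close to λ(A), as expected for a Poisson distribution.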

Point Processes: Reality
Approximate $\lambda(A)$ numerically: select some integration points, and sum over those
$$\lambda(A) \approx \sum_{s=1}^{N} |A(s)|\, e^{\eta(s)}$$
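A small sketch of this sum, under stated assumptions: the area is the unit square, the integration points are the midpoints of a regular m-by-m grid (a real analysis would more likely use mesh nodes), and `eta` is a hypothetical log-intensity chosen so that the exact integral is known.

```r
# Hypothetical log-intensity as a function of location
eta <- function(x, y) 4 + 1.5 * x

# Approximate lambda(A) over the unit square with an m-by-m grid of
# integration points, each standing for a cell of area |A(s)| = 1 / m^2
lambda_approx <- function(m) {
  mids <- (seq_len(m) - 0.5) / m
  pts  <- expand.grid(x = mids, y = mids)
  area <- rep(1 / m^2, nrow(pts))
  sum(area * exp(eta(pts$x, pts$y)))
}

# Exact value of the integral for this eta
lambda_exact <- exp(4) * (exp(1.5) - 1) / 1.5

m <- c(2, 5, 20, 100)
result <- data.frame(m = m, approx = sapply(m, lambda_approx),
                     exact = lambda_exact)
print(result)
```

As with the curve at the start of the talk, the approximation improves as more integration points are used.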

Observation Models
Presence only points: thinned point process
Abundance: Poisson
Presence/Absence: binomial, cloglog
with $\mu_A(A, t) = \eta(A) + \log(|A|) + \log(t) + \log(p)$
(large) areas: $\Pr(n(A) > 0) = 1 - \exp\left(-\int_A e^{\eta(\xi)}\, d\xi\right)$
Expert range: use distance to range as a covariate
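The link between the Poisson count model and the cloglog presence/absence model can be checked in a few lines. This is a sketch with made-up numbers; reading t as sampling time and p as a detection (thinning) probability is an assumption, since the slide does not define them.

```r
# Complementary log-log link and its inverse
cloglog     <- function(pr) log(-log(1 - pr))
inv_cloglog <- function(mu) 1 - exp(-exp(mu))

# Hypothetical values (assumed meanings: log-intensity, area, sampling
# time, detection probability)
eta      <- -2
area     <- 3
samp_t   <- 2
p_detect <- 0.5

# Linear predictor as written on the slide
mu <- eta + log(area) + log(samp_t) + log(p_detect)

# Poisson view: probability that at least one individual is recorded
lambda     <- exp(eta) * area * samp_t * p_detect
pr_present <- 1 - exp(-lambda)

print(all.equal(inv_cloglog(mu), pr_present))
print(all.equal(cloglog(pr_present), mu))
```

Both comparisons hold because 1 − exp(−λ) is exactly the inverse cloglog of log λ, so area, time and detection enter the presence/absence model as offsets on the same scale as in the Poisson model.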

Put these together
Data likelihoods: $P(X_i \mid \lambda)$ for data $X_i$. Total likelihood is
$$P(X) = P(\lambda) \prod_i P(X_i \mid \lambda)$$
where $P(\lambda)$ is the actual distribution model, and will depend on environmental and other covariates
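A toy version of combining likelihoods, as a sketch only: two simulated data types (counts and presence/absence) share one log-intensity, and the parameters are estimated by maximising the summed log-likelihoods with `optim`. This is plain maximum likelihood with no prior and no spatial field, not the Bayesian INLA fit used in the talk; the covariate, cell areas and parameter values are all invented.

```r
set.seed(42)

# Hypothetical truth: log-intensity eta = b0 + b1 * x for a covariate x
b_true <- c(-1, 1)

# Data type 1: counts in 200 surveyed cells of area 2 (Poisson)
x1 <- rnorm(200)
a1 <- 2
y1 <- rpois(200, a1 * exp(b_true[1] + b_true[2] * x1))

# Data type 2: presence/absence in 200 other cells of area 1 (binomial, cloglog)
x2 <- rnorm(200)
a2 <- 1
z2 <- rbinom(200, 1, 1 - exp(-a2 * exp(b_true[1] + b_true[2] * x2)))

# Joint negative log-likelihood: both data types share the same b
negloglik <- function(b) {
  lam1 <- a1 * exp(b[1] + b[2] * x1)
  pr2  <- 1 - exp(-a2 * exp(b[1] + b[2] * x2))
  -(sum(dpois(y1, lam1, log = TRUE)) +
    sum(dbinom(z2, 1, pr2, log = TRUE)))
}

fit <- optim(c(0, 0), negloglik)   # default Nelder-Mead search
print(rbind(truth = b_true, estimate = fit$par))
```

Because the log-likelihoods simply add, a further data type only needs its own term in `negloglik`, provided it is written in terms of the same intensity.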

In practice
Be Bayesian. Could use MCMC, but this is quicker in INLA
    library(INLA)
    # SolTim.formula and stk.all (an inla.stack) are built elsewhere
    # Joint fit: Poisson (log link) and binomial (cloglog link) likelihoods
    SolTim.res <- inla(SolTim.formula,
                       family = c('poisson', 'binomial'),
                       data = inla.stack.data(stk.all),
                       control.family = list(list(link = "log"),
                                             list(link = "cloglog")),
                       control.predictor = list(A = inla.stack.A(stk.all)),
                       Ntrials = 1, E = inla.stack.data(stk.all)$e, verbose = FALSE)
The Solitary Tinamou
Photo credit: Francesco Veronesi on Flickr (https://www.flickr.com/photos/francesco_veronesi/12797666343)

Data
[Figure: map of the data; legend: Whole Region, Expert range, Park (absent), Park (present), eBird, GBIF]
expert range
2 point processes (49 points)
28 parks

A Fitted Model
              mean    sd
Intercept    -0.03  0.02
b.eBird       1.54  0.39
b.GBIF        1.54  0.24
Forest        0.00  0.01
NPP          -0.01  0.01
Altitude     -0.01  0.01
DistToRange  -0.01  0.00

Predicted Distribution
[Figure: two maps. Posterior Mean, colour scale from −0.10 to −0.02; Posterior Standard Deviation, colour scale from 0.01 to 0.06]

Individual Data Types
[Figure: four panels: eBird, GBIF, Parks, Expert Range]

Join the bandwagon!
Using continuous space makes life easier
In practice, use INLA (but I need to tidy up the code)
Not the final answer...
http://www.gocomics.com/nonsequitur/2014/06/24
