GIS in Public Health Research:
Understanding Spatial Analysis &
Interpreting Outcomes
Kristin Osiecki PhD
GIS in Public Health Research: Understanding Spatial Analysis and Interpreting Outcomes 1-31-14
Houston Aerosol Characterization &
Health Experiment (HACHE)
• UT Health Science Center School of
Biomedical Informatics
• University of Houston Department
of Earth and Atmospheric Sciences
• Rice University Department of
Sociology and Department of Civil &
Environmental Engineering
Applications in Public Health Research
• Space matters
– communities, census tracts, counties, states

• Multidisciplinary and Interdisciplinary
• Collaborative
• Simple and Complex Models
What research questions are we
trying to answer?
• Do we need visualizations or maps? OR
• Are we interested in investigating possible
spatial relationships within the data?
ArcGIS Toolbox
Handyman’s Dream
or
Do-it-yourself nightmare?
Objectives
• Traditional Statistics & Spatial Analysis
• Permutations
• Spatial Weights
• EDA & ESDA
"Spatial Statistics" does not mean
applying traditional (non-spatial)
statistical methods to data that just
happens to be spatial (has X and Y
coordinates).
Source: ESRI
http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_tools/how_generate_spatial_weights_matrix_spatial_statistics_works.htm
Spatial Analysis
[Diagram: Traditional Statistical Methodology and Spatial Methodology]
Global & Local
[Diagram: Global Model, EDA, ESDA, Global autocorrelation, Local autocorrelation, Local Model]
The most crucial step in the process
Exploring the Data: EDA & ESDA
Scatter Plot Matrix
[Figure: scatter plot matrix of pct_pov, p_blck, and p_FHH; axes run from 0 to 1]
Exploratory Spatial Data Analysis
• Interactively visualize and explore data
where space matters
• Detect patterns
• Generate hypotheses
• Spatial modeling is needed to test
hypotheses
• Works on point features and polygon
features (e.g., census, epidemiology,
demographic layers)
What is Spatial Randomness?
• Observed spatial pattern of values is equally as
likely as any other spatial pattern
• Value at one location does not depend on
values at neighboring locations
• Under spatial randomness, the location of values may be
altered without affecting the information
content of the data
• Random permutation or reshuffling of values
Dr. Luc Anselin 2012
Spatial Randomness
• Spatial Randomness Null Hypothesis
– Spatial randomness is the absence of any pattern
– If rejected, evidence of spatial structure

Dr. Luc Anselin 2012
ArcGIS Spatial Autocorrelation
• The Randomization Null Hypothesis: Where appropriate, the tools in the
Spatial Statistics toolbox use the randomization null hypothesis as the
basis for statistical significance testing. The randomization null hypothesis
postulates that the observed spatial pattern of your data represents one
of many (n!) possible spatial arrangements. If you could pick up your data
values and throw them down onto the features in your study area, you
would have one possible spatial arrangement of those values. (Note that
picking up your data values and throwing them down arbitrarily is an
example of a random spatial process). The randomization null hypothesis
states that if you could do this exercise (pick them up, throw them down)
infinite times, most of the time you would produce a pattern that would
not be markedly different from the observed pattern (your real data).
Once in a while you might accidentally throw all the highest values into
the same corner of your study area, but the probability of doing that is
small. The randomization null hypothesis states that your data is one of
many, many, many possible versions of complete spatial randomness. The
data values are fixed; only their spatial arrangement could vary.
http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000006000000
Permutations
• A numerical approach to testing for statistical
significance (in contrast to analytical
approaches)
• It is data-driven and makes no assumptions
(such as normality) about the data
Permutations in GeoDa
• Permutation inference shuffles the observed values
across locations and re-computes the statistic each time,
using a different random reshuffling, to construct a
reference distribution.
• Permutations are used to determine how likely it
would be to observe the actual Moran’s I value
under conditions of spatial
randomness.
• P-values depend on the number of
permutations, so they are “pseudo p-values”
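The reshuffling described above can be sketched in a few lines of Python. This is a minimal illustration and not GeoDa's implementation: the function names, the dense list-of-lists weights matrix, and the toy chain of eight locations are all assumptions made for the example.

```python
import random

def morans_i(values, weights):
    """Global Moran's I for values and a dense weights matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

def permutation_test(values, weights, permutations=999, seed=0):
    """Return the observed I and a one-sided pseudo p-value."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # the values stay fixed; only locations change
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)

# Hypothetical data: eight locations in a row, each neighboring the next.
n = 8
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = [1, 1, 1, 1, 9, 9, 9, 9]  # low values at one end, high at the other
observed_i, pseudo_p = permutation_test(values, chain)
print(observed_i, pseudo_p)
```

Raising `permutations` from 99 to 999 lowers the smallest pseudo p-value that can be reported, which is why results should always state the number of permutations used.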
Permutations
Spatial Weights
The first step in the analysis of spatial
autocorrelation is to construct a spatial weights
file that contains information on the
“neighborhood” structure for each location
(Luc Anselin)
Generation of Spatial Weights ESRI
• For binary strategies (fixed distance, K nearest
neighbors, or contiguity) a feature is either a
neighbor (1) or it is not (0).
• For weighted strategies (inverse distance or
zone of indifference) neighboring features
have a varying amount of impact (or
influence) and weights are computed to
reflect that variation.
Row Standardization
• Adjusts the weights in a spatial weights matrix
• Each weight is divided by its row sum
• The row sum is the sum of weights for a
feature’s neighbors.
• A weights matrix is row-standardized when
the values of each of its rows sum to one.
Binary vs. row-standardized
• A binary weights matrix looks like:
0    1    0    0
0    0    1    1
1    1    0    0
0    1    1    1

• The same matrix, row-standardized, looks like:
0    1    0    0
0    0    .5   .5
.5   .5   0    0
0    .33  .33  .33
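Row standardization of the matrix above can be reproduced with a short Python sketch. The function name and the list-of-lists representation are assumptions for illustration; GIS packages store weights in their own file formats.

```python
def row_standardize(weights):
    """Divide each weight by its row sum so that every row sums to one."""
    standardized = []
    for row in weights:
        total = sum(row)
        # A feature with no neighbors keeps an all-zero row.
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized

# The binary matrix from the slide.
binary = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
standardized = row_standardize(binary)
for row in standardized:
    print([round(w, 2) for w in row])
```

After standardization, a feature with many neighbors gives each of them a smaller weight, so no single feature dominates simply because it has more neighbors.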
Spatial Weights

• Formal expression of locational similarity
Distance Models
• Inverse distance – all features influence all
other features, but the closer something is,
the more influence it has
• Distance band – features outside a specified
distance do not influence the features within
the area
• Zone of indifference – combines inverse
distance and distance band
Inverse Distance (impedance) (ArcGIS)
• features impact/influence all other features
– farther away something is, the smaller the impact

• specify a Distance Band/Threshold Distance value
to reduce the number of required computations
– especially with large datasets.
– If not specified, a default threshold
value is computed for you

• Choosing an appropriate distance is important
– Some spatial statistics require each feature to have at
least one neighbor for the analysis to be reliable.
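A minimal sketch of inverse distance weighting with an optional threshold follows. It is not the ArcGIS algorithm: the function name, the `power` parameter, the handling of coincident points, and the three sample points are assumptions chosen to show how a threshold that is too small can leave a feature with no neighbors.

```python
import math

def inverse_distance_weights(points, threshold=None, power=1):
    """Weight w_ij = 1 / d_ij**power; pairs beyond the threshold get weight 0."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if d == 0:
                continue  # assumption: coincident points are skipped
            if threshold is None or d <= threshold:
                weights[i][j] = 1.0 / d ** power
    return weights

# Hypothetical point locations along a line.
points = [(0, 0), (1, 0), (4, 0)]
unbounded = inverse_distance_weights(points)
thresholded = inverse_distance_weights(points, threshold=2)

print(unbounded[0])      # nearer features receive larger weights
print(thresholded[2])    # the third point has no neighbor within 2 units
```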
Distance band (sphere of influence)
• Imposes a sphere of influence, or moving window
conceptual model of spatial interactions, onto the data
• Neighbors within the specified distance are weighted
equally. Features outside have no influence (weight = 0)
• Evaluates the statistical properties of your data at a
particular (fixed) spatial scale
• Every feature should have at least one neighbor, or results will not be valid
• If the input data is skewed, make sure that your distance
band is neither too small (only one or two neighbors) nor
too large (includes all other features as neighbors)
– otherwise the resultant z-scores are less reliable.
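The distance band idea, and the check that every feature has at least one neighbor, can be sketched as follows. The function names and sample points are hypothetical, and treating a pair at exactly the band distance as neighbors is an assumption.

```python
import math

def distance_band_weights(points, band):
    """Binary weights: 1 if two distinct features lie within the band, else 0."""
    n = len(points)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                weights[i][j] = 1
    return weights

def neighbor_counts(weights):
    """Number of neighbors for each feature (row sums of a binary matrix)."""
    return [sum(row) for row in weights]

# Hypothetical points: three close together and one far away.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
small_band = neighbor_counts(distance_band_weights(points, band=1.5))
large_band = neighbor_counts(distance_band_weights(points, band=20))

print(small_band)  # the distant point has no neighbors
print(large_band)  # every feature is a neighbor of every other feature
```

Inspecting the neighbor counts before running a statistic is a quick way to spot a band that is too small (isolated features) or too large (everything neighbors everything).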
Adjacency Models
• K Nearest Neighbors – a specified number of
neighboring features are included in
calculations
• Polygon Contiguity – polygons that share an
edge or node influence each other
K-nearest neighbors

• Each feature is assessed in the spatial context of a
specified number of its closest neighbors. If K is
8, then the eight closest neighbors to the target
feature will be included.
• If feature density is high, the spatial context of the analysis will be smaller.
• If feature density is sparse, the spatial context for
the analysis will be larger.
• This method is available using the Generate Spatial
Weights Matrix tool
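A small sketch of K-nearest-neighbor weights is given below. It does not reproduce the Generate Spatial Weights Matrix tool: the function name, the sample points, and the rule for breaking distance ties (lower index first) are assumptions.

```python
import math

def knn_weights(points, k):
    """Binary weights in which each feature's k closest features are neighbors."""
    n = len(points)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other features by distance; ties fall back to index order.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(points[i], points[j]), j),
        )
        for j in others[:k]:
            weights[i][j] = 1
    return weights

# Hypothetical points with uneven spacing.
points = [(0, 0), (1, 0), (3, 0), (10, 0)]
weights = knn_weights(points, k=1)
for row in weights:
    print(row)
```

Every row has exactly k neighbors, but the relationship need not be mutual: the isolated point at (10, 0) lists (3, 0) as its neighbor, while (3, 0) lists (1, 0).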
Polygon contiguity (first order)
• polygons that share an edge (that have
coincident boundaries) are included in
computations for the target polygon
• Use when modeling some type of contagious process or
when dealing with continuous data represented
as polygons.
Binary Contiguity Weights
• contiguity = common border
• if i and j share a border, then wij = 1
• if i and j are not neighbors, then wij = 0
• weights are 0 or 1, hence binary
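Binary contiguity weights can be illustrated on a regular grid, where shared borders are easy to determine. Real census polygons require a GIS to find shared boundaries, so the grid, the function name, and the `queen` flag (shared edges only when False, shared edges or corners when True) are assumptions for this sketch.

```python
def grid_contiguity(rows, cols, queen=False):
    """Binary contiguity weights for cells of a rows x cols grid."""
    n = rows * cols
    weights = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if not queen and dr != 0 and dc != 0:
                        continue  # diagonal cells share only a corner
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        weights[i][rr * cols + cc] = 1
    return weights

edge_only = grid_contiguity(3, 3)
edge_or_corner = grid_contiguity(3, 3, queen=True)

center = 4  # middle cell of the 3 x 3 grid
print(sum(edge_only[center]), sum(edge_or_corner[center]))
```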
Distance-Based Weights
• distance between points
• distance between polygon
centroids or central points
• distance-band weights:
wij nonzero for dij < d,
where d is a critical distance
• k-nearest neighbor weights:
same number of neighbors for all
observations;
potential problems with ties
Global vs. Local Statistics
• Global statistics (Clustering) – identify and
measure the pattern of the entire study area
– Do not indicate where specific patterns occur

• Local Statistics (Clusters) – identify variation
across the study area, focusing on individual
features and their relationships to nearby
features (i.e. specific areas of clustering)
Spatial Autocorrelation (Moran’s I)
• Global statistic
• Measures whether the pattern of feature values is clustered,
dispersed, or random.
• Compares the difference between the value of the target
feature and the mean for all features to the difference
between the value for each neighbor and the mean for all
features.
[Diagram labels: Mean of Target Feature, Mean of each neighbor, Mean of all features]
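The comparison described above corresponds to the standard formula for global Moran's I. The notation is not from the slides and is introduced here as an assumption: x_i is the value of feature i, x̄ is the mean of all features, w_ij is the spatial weight between features i and j, n is the number of features, and S_0 is the sum of all weights.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij},
\qquad
E[I] = -\frac{1}{n-1}
```

Values of I above the expected value E[I] suggest clustering of similar values, and values below it suggest dispersion.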
Z-Score & P-value (ArcGIS)
• Very high or very low (negative) z-scores,
associated with very small p-values, are found in
the tails of the normal distribution
• In that case, it is unlikely that the observed spatial pattern
reflects the theoretical random pattern
represented by your null hypothesis (CSR)
• The null hypothesis for the pattern analysis tools
is Complete Spatial Randomness (CSR), either of
the features themselves or of the values
associated with those features.
http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000006000000
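The link between a z-score and its p-value can be sketched with the standard normal distribution. The function name is hypothetical, and the sketch assumes a two-sided test, which should be checked against the documentation of the tool actually used.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (0.5, 1.96, 2.58, -3.2):
    print(z, round(two_sided_p_value(z), 4))
```

The further a z-score lies in either tail, the smaller the p-value, and the sign of the z-score does not change the p-value in a two-sided test.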
Pseudo P-Value
• significance levels are dependent on the
number of permutations
• One-sided significance test
• For instance, if an observed Moran's I value is
higher than any of the randomly generated
Moran's I values, the pseudo p-value would be
1/100=0.01 for 99 permutations or
1/1,000=0.001 for 999 permutations
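The arithmetic in the example above follows from counting the observed statistic together with the permuted ones. A minimal sketch, with a hypothetical function name and argument names:

```python
def pseudo_p_value(count_at_least_as_extreme, permutations):
    """One-sided pseudo p-value: (M + 1) / (R + 1).

    M is the number of permuted statistics at least as large as the observed
    one, and R is the number of permutations.
    """
    return (count_at_least_as_extreme + 1) / (permutations + 1)

print(pseudo_p_value(0, 99))    # observed value beats all 99 permutations
print(pseudo_p_value(0, 999))   # observed value beats all 999 permutations
print(pseudo_p_value(4, 99))    # four permuted values match or exceed it
```

Because the numerator is never smaller than one, the pseudo p-value cannot drop below 1 / (R + 1), whatever the data look like.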
Spatial Autocorrelation (Moran’s I)
Polygon Contiguity (first order)
Spatial Autocorrelation (Moran’s I)
Polygon Contiguity (first order)
Percent Black Population, Cook County, IL
Generate Spatial Weights Matrix
K-Nearest Neighbor
Spatial Autocorrelation (Moran’s I)
K-Nearest Neighbor
Percent Black Population, Cook County, IL
Spatial Autocorrelation (Moran’s I)
K-Nearest Neighbor
Percent Black Population, Cook County, IL
Spatial Autocorrelation (Getis-Ord General G High/Low Clustering)
Polygon Contiguity
Percent Black Population, Cook County, IL

If the z-score value is positive, the observed General G index is larger than the expected
General G index, indicating high values for the attribute are clustered in the study area
GeoDa Spatial Autocorrelation (Moran’s I)
Percent Black Population, Cook County, IL
GeoDa Spatial Autocorrelation (Moran’s I)
Queen Contiguity Weight (1st order)
Percent Black Population, Cook County, IL
GeoDa Spatial Autocorrelation (Moran’s I)
K-Nearest Neighbor (eight)
Percent Black Population, Cook County, IL
GeoDa Spatial Autocorrelation (Moran’s I)
K-Nearest Neighbor (four)
Percent Black Population, Cook County, IL
Anselin Local Moran’s I
• Local statistic
• Measures the strength of patterns for
each specific feature.
• Compares the value of each feature in a
pair to the mean value for all features in
the study area.
Anselin Local Moran’s I
• Positive I value:
– Feature is surrounded by features with similar values, either high or low.
– Feature is part of a cluster.
– Statistically significant clusters can consist of high values (HH) or low
values (LL)

• Negative I value:
– Feature is surrounded by features with dissimilar values.
– Feature is an outlier.
– Statistically significant outliers can be a feature with a high value
surrounded by features with low values (HL) or a feature with a low
value surrounded by features with high values (LH).
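The sign of the local statistic and the HH, LL, HL, and LH labels can be sketched as below. This uses one common form of the local Moran's I and is not the exact ArcGIS or GeoDa computation; the function name, the row-standardized chain of seven locations, and the sample values are assumptions. The labels here only describe the quadrant and say nothing about statistical significance, which still requires z-scores or permutations.

```python
def local_morans_i(values, weights):
    """Return a (local I, quadrant label) pair for each feature."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # variance of the values
    results = []
    for i in range(n):
        # Spatial lag: weighted sum of the neighbors' deviations from the mean.
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        local_i = (dev[i] / m2) * lag
        if dev[i] > 0 and lag > 0:
            label = "HH"
        elif dev[i] < 0 and lag < 0:
            label = "LL"
        elif dev[i] > 0 and lag < 0:
            label = "HL"
        elif dev[i] < 0 and lag > 0:
            label = "LH"
        else:
            label = "none"
        results.append((local_i, label))
    return results

# Hypothetical row-standardized weights for seven locations in a row.
n = 7
chain = [[0.0] * n for _ in range(n)]
for i in range(n):
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for j in neighbors:
        chain[i][j] = 1 / len(neighbors)

values = [1, 1, 9, 1, 1, 9, 9]  # one high value sits among low values
results = local_morans_i(values, chain)
for value, (local_i, label) in zip(values, results):
    print(value, round(local_i, 2), label)
```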
Anselin Local Moran’s I

• The z-scores and p-values are measures of statistical
significance which tell you whether or not to reject the
null hypothesis, feature by feature.
• They indicate whether the apparent similarity (or
dissimilarity) in values for a feature and its neighbors is
greater than one would expect in a random distribution.
http://resources.esri.com/help/9.3/arcgisengine/java/gp_toolref/spatial_statistics_tools/cluster_and_outlier_analysis_colon_anselin_local_moran_s_i_spatial_statistics_.htm
[Figure: Anselin’s Local Moran’s I (index, z-score, p-value), Polygon Contiguity Weight, Percent Black Population, Cook County, IL; legend: HH, LH]
GeoDa Univariate LISA
Queen Contiguity Weight
Percent Black Population, Cook County, IL
p-values 499 Permutations

p-values 999 Permutations
GeoDa Univariate LISA
Queen Contiguity Weight
Percent Black Population, Cook County, IL
HH HL 999 Permutations
Comparison ArcGIS & GeoDa Results
Queen Contiguity Weight
Percent Black Population, Cook County, IL
p-values
Comparison ArcGIS & GeoDa Univariate LISA
Queen Contiguity Weight
Percent Black Population, Cook County, IL
HH HL

HH HL 999 Permutations
Bivariate LISA Scatterplot
[Figure: scatterplot of Non-point Source Cancer Risk and Percent Poverty, with quadrants High-High, Low-Low, High-Low, and Low-High]

Chow test for selected/unselected regression subsets: distribution F(2,1339), ratio=214.6, p-value=0

                     INTERCEPT                                   SLOPE
# of Obs.  R^2       Constant  Std Error  t-statistic  p-value   Slope  Std Error  t-statistic  p-value
1343       0.209     0.00442   0.0176     0.251        0.802     0.332  0.0176     18.8         0
80         0.1116    1.58      0.0797     19.8         0         0.045  0.0475     0.957        0.342
1263       0.118     -0.0794   0.0161     -4.92        0         0.223  0.0172     13           0
[Diagram: Global Model, EDA, ESDA, Local Model]