Briggs Henan University 2010
1
The Standard Regression Model
and its Spatial Alternatives.
Relationships Between Variables
and
Building Predictive Models
Briggs Henan University 2010
2
Spatial Statistics
Descriptive Spatial Statistics: Centrographic Statistics
– single, summary measures of a spatial distribution
– Spatial equivalents of mean, standard deviation, etc.
Inferential Spatial Statistics: Point Pattern Analysis
Analysis of point location only--no quantity or magnitude (no attribute variable)
--Quadrat Analysis
--Nearest Neighbor Analysis, Ripley’s K function
Spatial Autocorrelation
– One attribute variable with different magnitudes at each location
The Weights Matrix
Global Measures of Spatial Autocorrelation
(Moran’s I, Geary’s C, Getis/Ord Global G)
Local Measures of Spatial Autocorrelation (LISA and others)
Prediction with Correlation and Regression
–Two or more attribute variables
Standard statistical models
Spatial statistical models



Bivariate and Multivariate
• All measures so far have focused
on one variable at a time
– univariate
• Often, we are interested in the
association or relationship
between two variables
– bivariate.
• Or more than two variables
– multivariate
Briggs Henan University 2010 3
[Diagrams: bivariate — education → income; multivariate — education and gender* → income. *Gender = male or female]
Correlation and Regression
The most commonly used techniques in science.
Review standard (non-spatial) approaches
Correlation
Regression
Spatial Regression
Why it is necessary.
How to do it.
Briggs Henan University 2010
4
Correlation and Regression
What is the difference?
• Mathematically, they are identical.
• Conceptually, very different.
Correlation
• Co-variation
• Relationship or association
• No direction or causation is implied
• Y ↔ X, X1 ↔ X2 (association only)
Regression
– Prediction of Y from X
– Implies, but does not prove, causation
– X (independent variable) → Y (dependent variable)
Briggs Henan University 2010 5
• The most common statistic in all of science
• measures the strength of the relationship (or “association”) between two
variables e.g. income and education
• Varies on a scale from –1 through 0 to +1
+1 implies a perfect positive association
• As values go up (↑) on one, they also go up (↑) on the other
• income and education
0 implies no association
–1 implies perfect negative association
• As values go up (↑) on one, they go down (↓) on the other
• price and quantity purchased
• Full name is the Pearson Product Moment correlation coefficient.
6
Correlation Coefficient (r)
Briggs Henan University 2010
[Scale: –1 … 0 … +1]
Examples of Scatter Diagrams and the Correlation Coefficient
[Scatter diagrams:
r = 1 — perfect positive
r = 0.72 — strong positive (Education vs Income)
r = 0.26 — weak positive
r = −0.71 — strong negative (Price vs Quantity)
r = −1 — perfect negative]
7
Briggs Henan University 2010
Correlation Coefficient: example
Briggs Henan University 2010 8
China: 29 provinces (excludes Xizang/Tibet, Macao, Hong Kong, Hainan, Taiwan, P'eng-hu)
Correlation coefficient
= 0.9458
(see later for calculation)
Briggs Henan University 2010 9
Pearson Product Moment Correlation Coefficient (r)
Where Sx and Sy are the standard deviations of X and Y, and X̄ and Ȳ are the means.
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n\, S_x S_y}$$

where the summed terms are the moments about the mean, and

$$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}} \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n}}$$
“product” is the result of
a multiplication
X * Y = P
Briggs UT-Dallas GISC 6382 Spring 2007 10
Calculation Formulae for
Correlation Coefficient (r)
Before the days of computers, these formulae were easier to do "by hand."
See next slide for an example.
Calculating r for urban v. rural income
11
Province UrbanIncome (x−x̄) (x−x̄)² RuralIncome (y−ȳ) (y−ȳ)² (x−x̄)(y−ȳ)
Anhui 14086 -2376 5646195 4504 -1202 1444970 2856323
Zhejiang 24611 8149 66403391 10008 4302 18506611 35055694
Jiangxi 14022 -2440 5954441 5075 -631 398248 1539917
Jiangsu 20552 4090 16726690 8004 2298 5280487 9398142
Jilin 14006 -2456 6032783 5450 -256 65571 628950
Qinghai 12692 -3770 14214200 3346 -2360 5569926 8897867
Fujian 19557 3095 9577958 6880 1174 1378114 3633114
Heilongjiang 12566 -3896 15180159 5207 -499 249070 1944459
Henan 14372 -2090 4368821 4807 -899 808325 1879209
Hebei 14718 -1744 3042137 5150 -556 309213 969880
Hunan 15084 -1378 1899359 4910 -796 633726 1097120
Hubei 14367 -2095 4389747 5035 -671 450334 1406005
Xinjiang 12258 -4204 17675066 4005 -1701 2893636 7151587
Gansu 11930 -4532 20540587 2980 -2726 7431452 12355015
Guangxi 15451 -1011 1022470 3980 -1726 2979314 1745353
Guizhou 12863 -3599 12954042 3005 -2701 7295774 9721613
Liaoning 15800 -662 438472 6000 294 86395 -194633
Nei Mongol 15849 -613 375980 4938 -768 589930 470959
Ningxia 14025 -2437 5939809 4048 -1658 2749193 4041000
Beijing 26738 10276 105592633 11986 6280 39437534 64531489
Shanghai 28838 12376 153161108 12324 6618 43797011 81902373
Shanxi 13997 -2465 6077075 4244 -1462 2137646 3604252
Shandong 17811 1349 1819336 6119 413 170512 556973
Shaanxi 14129 -2333 5443694 3438 -2268 5144137 5291796
Sichuan 13904 -2558 6544246 4462 -1244 1547708 3182543
Tianjin 21430 4968 24679311 10675 4969 24690276 24684793
Yunnan 14424 -2038 4154147 3369 -2337 5461891 4763349
Guangdong 21574 5112 26130781 6906 1200 1439834 6133841
Chongqing 15749 -713 508615 4621 -1085 1177375 773841
SUM 477403 0.00 546493254 165476 0.00 184124210 300022824
AVERAGE 16462 0.00 18844595 5706 0.00 6349111 10345615
Sx = √18844595 = 4341    Sy = √6349111 = 2520    r = 10345615 / (4341 × 2520) = 0.9458
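The table's arithmetic can be checked directly; a minimal Python sketch (numpy assumed available) using the income values from the table above:

```python
import numpy as np

# Urban and rural income per province, copied from the table above.
urban = np.array([14086, 24611, 14022, 20552, 14006, 12692, 19557, 12566,
                  14372, 14718, 15084, 14367, 12258, 11930, 15451, 12863,
                  15800, 15849, 14025, 26738, 28838, 13997, 17811, 14129,
                  13904, 21430, 14424, 21574, 15749])
rural = np.array([4504, 10008, 5075, 8004, 5450, 3346, 6880, 5207, 4807,
                  5150, 4910, 5035, 4005, 2980, 3980, 3005, 6000, 4938,
                  4048, 11986, 12324, 4244, 6119, 3438, 4462, 10675, 3369,
                  6906, 4621])

# Pearson r as on the slide: mean cross-product of deviations divided by
# the product of the (population) standard deviations.
sx, sy = urban.std(), rural.std()   # np.std divides by n, matching the slide
r = ((urban - urban.mean()) * (rural - rural.mean())).mean() / (sx * sy)
print(round(r, 4))                  # 0.9458
```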
Briggs Henan University 2010 12
Correlation Coefficient
example using
“calculation formulae”
Scatter Diagram
Source: Lee and Wong
Regression
• Simple regression
– Between two variables
• One dependent variable (Y)
• One independent variable (X)
• Multiple Regression
– Between three or more variables
• One dependent variable (Y)
• Two or more independent variables (X1, X2, …)
Briggs Henan University 2010 13
[Diagrams: simple — X → Y; multiple — X1 (education) and X2 (gender*) → Y (income)]
Briggs Henan University 2010 14
Simple Linear Regression
• Concerned with “predicting” one variable (Y - the dependent
variable) from another variable (X - the independent variable)
Y = a + bX + ε
ε = residual = error = Yi − Ŷi = Actual (Yi) − Predicted (Ŷi)
[Figure: scatter of points with fitted regression line Ŷ = a + bX]
a is the intercept —the value of Y when X = 0
b is the regression coefficient or slope of the line —the change in Y for a one unit change in X
Ordinary Least Squares (OLS)
--the standard criteria for obtaining the
regression line
Briggs Henan University 2010 15
The regression line minimizes the sum of the squared deviations between actual Yi and predicted Ŷi:

$$\text{Min} \sum_i (Y_i - \hat{Y}_i)^2$$
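In simple regression this minimization has a closed-form solution; a minimal numpy sketch (the data values are hypothetical):

```python
import numpy as np

# Hypothetical data: X = independent variable, Y = dependent variable.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# OLS closed form: b = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2),
# a = Ybar - b * Xbar.  These values minimize sum((Yi - Yhat_i)^2).
b = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
a = Y.mean() - b * X.mean()

Y_hat = a + b * X          # predicted values on the regression line
residuals = Y - Y_hat      # actual minus predicted
print(a, b, (residuals ** 2).sum())   # intercept, slope, minimized SS
```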
Coefficient of Determination (r²)
• The coefficient of determination (r²) measures the proportion of the variance in Y (the dependent variable) which can be predicted or "explained by" X (the independent variable). Varies from 0 to 1.
• It equals the correlation coefficient (r) squared.
16
$$r^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2} = \frac{\text{SS Regression (Explained Sum of Squares)}}{\text{SS Total (Total Sum of Squares)}}$$

Note:

$$\underbrace{\sum_i (Y_i - \bar{Y})^2}_{\text{SS Total}} = \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{\text{SS Regression (Explained)}} + \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{\text{SS Residual (Error)}}$$
Partitioning the Variance on Y
Briggs Henan University 2010 17
$$\underbrace{\sum_i (Y_i - \bar{Y})^2}_{\text{SS Total}} = \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{\text{SS Regression (Explained)}} + \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{\text{SS Residual (Error)}}$$

$$r^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2}$$

[Figure: each observation's total deviation from Ȳ partitioned into an explained part (Ŷi − Ȳ) and a residual part (Yi − Ŷi)]
Standard Error of the Estimate (se)
Briggs Henan University 2010 18
Measures predictive accuracy: the bigger the standard error, the
greater the spread of the observations about the regression line,
thus the predictions are less accurate
se² = error mean square, or average squared residual
= variance of the estimate, variance about regression (called sigma-square in GeoDA)

$$s_e = \sqrt{\frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - k}}$$

The numerator is the sum of squared residuals; the denominator is the number of observations minus the degrees of freedom (for simple regression, degrees of freedom k = 2).
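As a quick sketch, the formula translates directly into code (y and y_hat are assumed to come from a fitted regression such as the one sketched above):

```python
import numpy as np

def standard_error_of_estimate(y, y_hat, k=2):
    """s_e = sqrt( sum((Yi - Yhat_i)^2) / (n - k) ), where k is the number
    of regression coefficients (k = 2 for simple regression: a and b)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    return np.sqrt(((y - y_hat) ** 2).sum() / (n - k))
```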
Coefficient of determination (r²), correlation coefficient (r), regression coefficient (b), and standard error (Se)
(Values are hypothetical and for illustration of relative change only; in the original figure the regression line is shown in blue.)

perfect positive: r² = r = 1, Se = 0.0 (Sy = 2), b = 2
very strong: r² = 0.94, r = 0.97, Se = 0.3
strong: r² = 0.51, r = 0.71, Se = 1.1, b = 1.1
moderate: r² = 0.26, r = 0.51, Se = 1.3, b = 0.8
weak: r² = 0.07, Se = 1.8, b = 0.1
none: r² = r = 0.00, Se = Sy = 2, b = 0

As the coefficient of determination gets smaller, the slope of the regression line (b) gets closer to zero.
As the coefficient of determination gets smaller, the standard error gets larger, approaching the standard deviation of the dependent variable Y (Sy = 2).
Sample Statistics, Population Parameters and
Statistical Significance tests
Yi = a + bXi + εi    a and b are sample statistics
which are estimates of
population parameters α and β
β (and b) measure the change in Y for a one unit change in X. If β = 0 then X has no effect on Y, therefore
Null Hypothesis (H0): in the population β = 0
Alternative Hypothesis (H1): in the population β ≠ 0
Thus, we test if our sample regression coefficient, b, is sufficiently different from zero to reject the Null Hypothesis and conclude that X has a statistically significant effect on Y.
Briggs Henan University 2010 20
$$Y_i = \alpha + \beta X_i + \varepsilon_i$$
Test Statistics in Simple Regression
Test statistic for b is distributed according to the Student’s t Distribution
(similar to normal):
where se² is the variance of the estimate,
with degrees of freedom = n – 2
A test can also be conducted on the coefficient of determination (r²) to test if it is
significantly greater than zero, using the F frequency distribution.
It is mathematically identical to the t test.
Briggs Henan University 2010 21
$$t = \frac{b}{SE(b)} = \frac{b}{\sqrt{s_e^2 \,/\, \sum_i (X_i - \bar{X})^2}}$$

$$F = \frac{\text{Regression S.S./d.f.}}{\text{Residual S.S./d.f.}} = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2 \,/\, 1}{\sum_i (Y_i - \hat{Y}_i)^2 \,/\, (n - 2)}$$
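A sketch combining both tests for simple regression, following the formulas above (scipy is assumed to be available for the p-values):

```python
import numpy as np
from scipy import stats

def simple_regression_tests(x, y):
    """t test on b, and the mathematically identical F test on r^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = ((x - x.mean()) ** 2).sum()
    b = ((x - x.mean()) * (y - y.mean())).sum() / sxx
    a = y.mean() - b * x.mean()
    y_hat = a + b * x
    s2_e = ((y - y_hat) ** 2).sum() / (n - 2)   # variance of the estimate
    t = b / np.sqrt(s2_e / sxx)                 # t = b / SE(b)
    F = ((y_hat - y.mean()) ** 2).sum() / s2_e  # regression SS/1 over residual SS/(n-2)
    p_t = 2 * stats.t.sf(abs(t), n - 2)         # two-sided p-value for t
    p_F = stats.f.sf(F, 1, n - 2)               # equals p_t, since F = t^2
    return t, p_t, F, p_F
```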
Multiple regression
Briggs Henan University 2010 22
We can rewrite simple regression Y = α + βX + ε as:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m + \varepsilon$$
Multiple regression: Y is predicted from 2 or more independent variables
β0 is the intercept —the value of Y when values of all Xj = 0
β1… β m are partial regression coefficients which give the
change in Y for a one unit change in Xj, all other X variables held
constant
m is the number of independent variables
[Diagram: Y (income) predicted from X1 (education) and X2 (gender*)]
Multiple regression: least squares criteria
23
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m + \varepsilon
\qquad \text{or} \qquad
Y_i = \sum_{j=0}^{m} X_{ij}\,\beta_j + \varepsilon_i \quad (X_{i0} = 1)$$

Predicted values for Y (the regression hyperplane):

$$\hat{Y}_i = \sum_{j=0}^{m} X_{ij}\,b_j$$

Residuals (Actual − Predicted):

$$e_i = Y_i - \hat{Y}_i = Y_i - \sum_{j=0}^{m} X_{ij}\,b_j$$

Least squares criterion:

$$\text{Min} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
As in simple regression, the “least squares”
criteria is used. Regression coefficients bj are
chosen to minimize the sum of the squared
residuals (the deviations between actual Yi and
predicted Ŷi)
The difference is that Ŷi is predicted from 2 or
more independent variables, not one.
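A minimal numpy sketch of this criterion with two independent variables (the data are hypothetical); np.linalg.lstsq chooses the b_j that minimize the sum of squared residuals:

```python
import numpy as np

# Hypothetical data: predict income from education and a gender dummy.
X1 = np.array([8.0, 12.0, 16.0, 10.0, 14.0, 18.0])   # education (years)
X2 = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])        # gender (0/1)
Y = np.array([20.0, 35.0, 50.0, 28.0, 44.0, 55.0])   # income

# Design matrix with a leading column of 1s so b[0] is the intercept b0.
X = np.column_stack([np.ones_like(X1), X1, X2])

# lstsq minimizes sum((Y - X b)^2), the least squares criterion.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ b                      # predictions on the regression hyperplane
print(b, ((Y - Y_hat) ** 2).sum())
```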
Coefficient of Multiple Determination (R²)
• Similar to simple regression, the coefficient of multiple determination (R²) measures the proportion of the variance in Y (the dependent variable) which can be predicted or "explained by" all of the X variables in combination. Varies from 0 to 1.
24
$$R^2 = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2}{\sum_i (Y_i - \bar{Y})^2} = \frac{\text{SS Regression (Explained Sum of Squares)}}{\text{SS Total (Total Sum of Squares)}}$$

As with simple regression:

$$\underbrace{\sum_i (Y_i - \bar{Y})^2}_{\text{SS Total}} = \underbrace{\sum_i (\hat{Y}_i - \bar{Y})^2}_{\text{SS Regression (Explained)}} + \underbrace{\sum_i (Y_i - \hat{Y}_i)^2}_{\text{SS Residual (Error)}}$$
Formulae identical to simple regression
Reduced or Adjusted R²
• R² will always increase each time another independent variable is included
– an additional dimension is available for fitting the regression hyperplane (the multiple regression equivalent of the regression line)
• Adjusted R² is normally used instead of R² in multiple regression
Briggs Henan University 2010 25
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
k is the number of coefficients
in the regression equation,
normally equal to the number of
independent variables plus 1
for the intercept.
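The adjustment is a one-line formula; a small sketch (the example numbers are hypothetical):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k), where k is the number
    of coefficients (independent variables plus 1 for the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Example: R^2 = 0.60, n = 30 observations, 3 independent variables (k = 4)
print(adjusted_r2(0.60, 30, 4))  # about 0.554
```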
Interpreting partial regression coefficients
• The regression coefficients (bj) tell us the change in Y for a 1 unit
change in Xj, all other X variables “held constant”
• Can we compare these bj values to tell us the relative importance of
the independent variables in affecting the dependent variable?
– If b1 = 2 and b2 = 4, is the effect of X2 twice as big as the effect of X1?
• No, no, no in general!!!!
• The size of bj depends on the measurement scale used for each
independent variable
– if X1 is income, then a 1 unit change is $1
– but if X2 is in RMB or Euro (€) or even cents (₵), 1 unit is not the same!
– And if X2 is % population urban, 1 unit is very different
• Regression coefficients are only directly comparable if the units are all
the same: all $ for example
26
Standardized partial regression coefficients
Comparing the Importance of Independent Variables
• How do we compare the relative importance of independent variables?
• We know we cannot use partial regression coefficients to directly compare
independent variables unless they are all measured on the same scale
• However, we can use standardized partial regression coefficients (also
called beta weights, beta coefficients, or path coefficients).
• They tell us the number of standard deviation (SD) unit changes in Y for a one SD change in Xj
• They are the partial regression coefficients if we had measured every
variable in standardized form
27
$$\beta_{X_j Y} = b_j \left(\frac{s_{X_j}}{s_Y}\right)$$
Note the confusing use of β for both standardized partial regression
coefficients and for the population parameter they estimate.
Standardized form:

$$z_i = \frac{x_i - \bar{x}}{s_X}$$
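A sketch of the conversion, assuming b, X, and y come from a multiple regression that has already been fitted (b excludes the intercept):

```python
import numpy as np

def beta_weights(b, X, y):
    """Standardized partial regression coefficients:
    beta_j = b_j * (s_Xj / s_Y).
    b: partial regression coefficients (no intercept);
    X: matrix whose columns are the independent variables; y: dependent."""
    b = np.asarray(b, float)
    s_x = np.asarray(X, float).std(axis=0)   # SD of each independent variable
    s_y = np.asarray(y, float).std()         # SD of the dependent variable
    return b * s_x / s_y
```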
Test Statistics in Multiple Regression:
testing each independent variable
A test can be conducted for each partial regression coefficient bj
to test if the associated independent variable influences the
dependent variable. It is distributed according to the Student’s t
Distribution (similar to the normal frequency distribution):
Null Hypothesis (H0): bj = 0
Briggs Henan University 2010 28
$$t = \frac{b_j}{SE(b_j)}$$
with degrees of freedom = n – k, where k is the
number of coefficients in the regression equation,
normally equal to the number of independent
variables plus 1 for the intercept (m+1).
The formula for calculating the standard error (SE) of bj is more
complex than for simple regression , so it is not shown here.
Test Statistics in Multiple Regression:
testing the overall model
• We test the coefficient of multiple determination (R²) to see if it is significantly greater
than zero, using the F frequency distribution.
• It is an overall test to see if at least one independent variable, or two or more in
combination, affect the dependent variable.
• Does not test if each and every independent variable has an effect
• Similar to the F test in simple regression.
– But unlike simple regression, it is not identical to the t tests.
• It is possible (but unusual) for the F test to be significant but all t tests not significant.
29
$$F = \frac{\text{Regression S.S./d.f.}}{\text{Residual S.S./d.f.}} = \frac{\sum_i (\hat{Y}_i - \bar{Y})^2 \,/\, (k - 1)}{\sum_i (Y_i - \hat{Y}_i)^2 \,/\, (n - k)}$$
Again, k is the number of coefficients in the regression equation,
normally equal to the number of variables (m) plus 1.
Briggs Henan University 2010
30
Anscombe, Francis J. (1973). "Graphs in statistical analysis". The American
Statistician 27: 17–21.
Always look at your data
Don’t just rely on the statistics!
Anscombe's quartet
Summary statistics are the same for all four data sets:
mean of Y (7.5),
variance of Y (4.12),
correlation (0.816),
and regression line (y = 3 + 0.5x).
Briggs Henan University 2010
31
Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA. This chart suggests there are generally two "types" of eruptions: short-wait-short-duration and long-wait-long-duration.
Source: Wikipedia
Real data is almost
always more complex
than the simple,
straight line
relationship assumed in
regression.
Spurious relationships
Briggs Henan University 2010 32
Eating ice cream
inhibits swimming
ability.
--eat too much, you
cannot swim
Omitted variable
problem
--both are related to a
third variable not
included in the analysis
Summer temperatures:
--more people swim
(and some drown)
--more ice cream is sold
Help!
Regression does not prove direction or cause!
Income and Illiteracy
• Provinces with higher incomes can afford
to spend more on education, so illiteracy is
lower
– Higher Income>>>>Less Illiteracy
• The higher the level of literacy (and thus
the lower the level of illiteracy) the more
high income jobs.
– Less Illiteracy>>>>Higher Income
• Regression will not decide!
Briggs Henan University 2010 33
[Diagrams: Income → Illiteracy and Illiteracy → Income]
Spatial Regression
It doesn’t solve any of the problems just discussed!
You always must examine your data!
Briggs Henan University 2010
34
Spatial Autocorrelation & Correlation
Standard Correlation shows the association or relationship between two different variables.
Spatial Autocorrelation shows the association or relationship between the same variable in "near-by" areas.
[Scatterplots: correlation plots income against education; spatial autocorrelation plots education against education "next door" (in a neighboring or near-by area). Each point is a geographic location.]
35
Briggs Henan University 2010
36
If Spatial Autocorrelation exists:
• correlation coefficients and coefficients of
determination appear bigger than they really are
• biased upward
• You think the relationship is
stronger than it really is
• the variables in nearby areas
affect each other
• Standard errors appear smaller than they really are
• exaggerated precision
• You think your predictions are better than they really are
since standard errors measure predictive accuracy
• More likely to conclude
relationship is statistically significant.
Briggs Henan University 2010
(We discussed this in detail in the lecture on Spatial Autocorrelation concepts.)
$$t = \frac{b}{SE(b)}$$
How do I know if I have a problem?
For correlation, calculate Moran’s I for each variable and test its statistical
significance
– If Moran’s I is significant, you may have a problem!
For regression, calculate the residuals
Yi-Ŷi =Actual (Yi ) – Predicted (Ŷi )
Then:
(1) Map the residuals: do you see any spatial patterns?
--if yes, you may have a problem
(2) Calculate Moran's I for the residuals: is it statistically significant?
--if yes, you have a problem
Briggs Henan University 2010 37
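Step (2) translates directly from Moran's I formula; a minimal sketch, assuming the spatial weights matrix W has already been built (see the earlier lecture on the weights matrix):

```python
import numpy as np

def morans_i(residuals, W):
    """Moran's I = (n / sum of all weights) * (z' W z) / (z' z),
    where z are the residuals expressed as deviations from their mean
    and W is the (already constructed) spatial weights matrix."""
    z = np.asarray(residuals, float)
    z = z - z.mean()
    W = np.asarray(W, float)
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)
```

Significance is usually judged against the expected value E[I] = −1/(n − 1), typically with a permutation test; packages such as PySAL's esda automate this.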
What do I do if SA exists?
• Acknowledge in your paper that SA exists
and that the calculated correlation
coefficients may be larger than their true
value, and may not be statistically
significant
• Try to fix the problem!
Briggs Henan University 2010 38
How do I fix SA?
Step 1:
Try to identify omitted variables and include them in a
multiple regression.
• Missing (omitted) variables may cause spatial autocorrelation
• Regression assumes all relevant variables influencing the
dependent variable are included
– If relevant variables are missing, model is misspecified
Step 2:
If additional variables cannot be identified, or SA still exists,
use a spatial regression model
Briggs Henan University 2010 39
Spatial Regression: 4 Options
1. Spatial Autoregressive Models
1. Lag model
2. Error model
2. Spatial Filtering
--based on eigenfunctions (Griffith)
3. Spatial Filtering
--based on Ripley's K and Getis-Ord G (Getis)
4. Others
We will consider the first option only.
– simpler and the more commonly used
– Getis and Griffith 2002 compare the first three
40
Getis, A. and Daniel Griffith (2002) Comparative Spatial Filtering in
Regression Analysis Geographical Analysis 34 (2) 130-140
41
Spatial Lag and Spatial Error Models: mathematical comparison
• Spatial lag model: values of the dependent variable in neighboring locations (WY) are included as an extra explanatory variable; these are the "spatial lag" of Y
Y = β0 + λWY + Xβ + ε
• Spatial error model: values of the residuals in neighboring locations (Wε) are included as an extra term in the equation; these are the "spatial error"
Y = β0 + Xβ + ρWε + ξ, where ξ is "white noise"
W is the spatial weights matrix
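To see what the lag model implies, note that solving Y = β0 + λWY + Xβ + ε for Y gives the reduced form Y = (I − λW)⁻¹(β0 + Xβ + ε). A hypothetical numpy sketch simulating data from a lag process (the ring-contiguity W and all parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical weights matrix: a ring where each area has two
# equally weighted neighbors (row-standardized).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

beta0, beta, lam = 2.0, 1.5, 0.6        # illustrative parameters
X = rng.normal(size=n)
eps = rng.normal(size=n)

# Reduced form of the lag model: Y = (I - lambda W)^-1 (beta0 + X beta + eps)
Y = np.linalg.solve(np.eye(n) - lam * W, beta0 + beta * X + eps)
```

Fitting such models in practice is done by maximum likelihood in software such as GeoDA or PySAL rather than by OLS.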
Spatial Lag and Spatial Error Models:
conceptual comparison
Briggs Henan University 2010 42
OLS SPATIAL LAG SPATIAL ERROR
Baller, R., L. Anselin, S. Messner, G. Deane and D. Hawkins. 2001. Structural covariates of US County homicide rates: incorporating spatial effects. Criminology, 39, 561-590.
Ordinary Least Squares
No influence from
neighbors
Dependent variable
influenced by
neighbors
Residuals influenced
by neighbors
Lag or Error Model: Which to use?
• Lag model primarily controls spatial
autocorrelation in the dependent variable
• Error model controls spatial autocorrelation
in the residuals, thus it controls
autocorrelation in both the dependent and
the independent variables
• Conclusion: the error model is more robust
and generally the better choice.
• Statistical tests called the LM Robust test
can also be used to select
– Will not discuss these
Briggs Henan University 2010 43
Comparing our models
• Which model best predicts the dependent variable?
• Neither R² nor Adjusted R² can be used to compare different spatial regression models
• Instead, we use Akaike Information Criteria (AIC)
– the smaller the AIC value the better the model
Note: can only be used to compare models with the same dependent variable
Briggs Henan University 2010 44
$$AIC = 2k + n\,\ln(\text{Residual Sum of Squares})$$
k is the number of coefficients in the regression equation, normally equal
to the number of independent variables plus 1 for the intercept term.
Akaike, Hirotuga (1974) A new look at statistical model identification
IEEE Transactions on Automatic Control 19 (6) 716-723
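A small sketch implementing the slide's formula (the residuals are assumed to come from a fitted model, and k counts the coefficients):

```python
import numpy as np

def aic(residuals, k):
    """AIC = 2k + n * ln(residual sum of squares), as given on the slide;
    k = number of coefficients (independent variables plus 1)."""
    residuals = np.asarray(residuals, float)
    n = len(residuals)
    rss = (residuals ** 2).sum()
    return 2 * k + n * np.log(rss)

# Smaller AIC = better model (for the same dependent variable).
```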
Briggs Henan University 2010 45
Geographically Weighted Regression
• The idea of Local Indicators can also be applied to regression
• It's called geographically weighted regression
• It calculates a separate regression
for each polygon and its neighbors,
– then maps the parameters from the model, such as the regression
coefficient (b) and/or its significance value
• Mathematically, this is done by applying the spatial weights
matrix (Wij) to the standard formulae for regression
See Fotheringham, Brunsdon and Charlton Geographically Weighted
Regression Wiley, 2002
Problems with Geographically Weighted Regression
• Each regression is based on few observations
– the estimates of the regression
parameters (b) are unreliable
• Need to use more observations than just those with
shared border, but
– how far out do we go?
– How far out is the “local effect”?
• Need strong theory to explain why the regression
parameters are different at different places
• Serious questions about validity of statistical
inference tests since observations not independent
Briggs Henan University 2010 46
What have we learned today?
• Correlation and regression are very good tools
for science.
• Spatial data can cause problems with standard
correlation and regression
• The problems are caused by spatial
autocorrelation
• We need to use Spatial Regression Models
• Geographers and GIS specialists are experts on
spatial data
– They need to understand these issues!
Briggs Henan University 2010 47