Week 4.1: Model Comparison
• Lab: Interactions Practice
• Model Comparison
• Nested Models
• Hypothesis Testing
• REML vs ML
• Non-Nested Models
• Shrinkage
• The Problem
• Solutions
Interpreting Interactions
• Numerical interaction term tells us how the
interaction works:
• Strengthens individual effects with the same sign
as the interaction
• Weakens individual effects with a different sign from the interaction
• Or, again, just look at the graph ☺
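The sign rule above can be checked numerically. A minimal sketch in R, assuming a model of the form y = b0 + b1*x1 + b2*x2 + b3*x1*x2, where the effect (simple slope) of x1 at a given value of x2 is b1 + b3*x2. The function name and all coefficient values are hypothetical, chosen only for illustration.

```r
# Simple slope of x1 at a given level of x2 in: y = b0 + b1*x1 + b2*x2 + b3*x1*x2
# All coefficient values below are made up for illustration.
simple_slope <- function(b1, b3, x2) {
  b1 + b3 * x2
}

x2_levels <- c(low = 0, high = 1)

# Positive effect of x1, positive interaction: the effect of x1 grows as x2 increases
same_sign <- simple_slope(b1 = 2, b3 = 1.5, x2 = x2_levels)

# Positive effect of x1, negative interaction: the effect of x1 shrinks as x2 increases
diff_sign <- simple_slope(b1 = 2, b3 = -1.5, x2 = x2_levels)

print(same_sign)
print(diff_sign)
```

With these values the slope of x1 goes from 2 to 3.5 in the same-sign case and from 2 to 0.5 in the different-sign case.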
Interpreting Interactions Practice
• Dependent variable: Classroom learning
• Independent variable 1: Intrinsic motivation
• Learning because you want to learn (intrinsic) vs.
to get a good grade (extrinsic)
• Intrinsic motivation has a + effect on learning
• Independent variable 2: Autonomy language
• “You can…” (vs. “You must…”)
• Also has a + effect on learning
• Motivation x autonomy interaction is +
• Interpretation: Combining intrinsic motivation and autonomy language especially benefits learning
• “Synergistic” interaction
(Vansteenkiste et al., 2004, JPSP)
Interpreting Interactions Practice
• Dependent variable: Satisfaction with a
consumer purchase
• Number of choices: - effect on
satisfaction
• “Maximizing” strategy: - effect on satisfaction
• Trying to find the best option vs. “good enough”
• Choices x maximizing strategy is -
• Interpretation: Having lots
of choices when you’re a
maximizer especially
reduces satisfaction
• Also a synergistic
interaction
(Carrillat, Ladik, & Legoux, 2011; Marketing Letters)
Model Formulae Practice
• Write the R formula for each model:
• 1) We’re interested in the effects of FamilySES, PriorNightSleep, and Nutrition on MathTestPerformance, but we don’t expect them to interact
• MathTestPerformance ~ 1 + FamilySES + PriorNightSleep + Nutrition
• 2) We factorially manipulated SentenceType (active or passive) and Plausibility (low or high) in a test of TextComprehensionAccuracy
• TextComprehensionAccuracy ~ 1 + SentenceType + Plausibility + SentenceType:Plausibility
or, equivalently,
TextComprehensionAccuracy ~ 1 + SentenceType*Plausibility
Interpreting Interactions Practice
• Second language proficiency: + effect on
translation accuracy
• Word frequency: + effect on accuracy
• Frequency x proficiency interaction is -
• Interpretation: Proficiency matters less when translating
high frequency words
• Or: Difference between high & low frequency words gets smaller if you have high proficiency
• “Antagonistic” interaction. Combining the effects reduces or reverses the individual effects.
(e.g., Diependaele, Lemhöfer, & Brysbaert, 2012, QJEP)
Interpreting Interactions Practice
• Retrieval practice: + effect on long-term
learning
• Working memory span: + effect on learning
• Retrieval practice x WM span interaction is -
(Agarwal et al., 2016)
• Interpretation: Retrieval practice is especially
beneficial for people with low working memory.
• Or: Low WM confers less of a disadvantage if you
do retrieval practice
Interpreting Interactions Practice
• Affectionate touch: + effect on feeling of
relationship security
• Avoidant attachment style: - effect on security
• Touch x avoidant attachment interaction is -
• Interpretation: Affectionate touch enhances
relationship security less for people with
an avoidant attachment style
(Jakubiak & Feeney, SPPS, 2016)
Interpreting Interactions Practice
• Age: - effect on picture memory
• Older adults have poorer memory
• Emotional valence: - effect on accuracy
• Positive pictures are not remembered as well
compared to negative pictures
• Age x Valence interaction is +
• Interpretation: Age declines are smaller for positive pictures
• Or: Disadvantage of positive pictures is not as strong for
older adults
(e.g., Mather & Carstensen, 2005, TiCS)
Model Comparison
• Sometimes, we may have more than 1 model
that we could consider applying to the data
• 2 or more competing theoretical models
• e.g., critical period in language acquisition
• No critical period (Vanhove, 2013): 1 + AgeOfAcquisition
• Critical period hypothesis (Hartshorne et al., 2020): 1 + AgeOfAcquisition*CriticalPeriod
• Exploratory analysis where we don’t yet know
which model would be appropriate
Dataset
• Social support & health (e.g., Cohen & Wills, 1985)
• lifeexpectancy.csv:
• Longitudinal study of 1000 subjects; some are siblings from the same family, so 517 total families
• Perceived social support (z-scored)
• Lifespan
• And several control variables
Nested Models
• Three possible models of life expectancy:
• Amount of weekly exercise
• Amount of weekly exercise & perceived social support
• Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name
• These are nested models: each one can be formed by subtracting variables from the one below it (it is “nested inside” the one below)
• Which set of information would give us the most accurate fitted() values?
• The “biggest” nested model will always fit the data it was estimated on at least as well as the smaller models
• Adding a predictor can only increase (or leave unchanged) the variance explained in this sample
• Might not be much better (“number of vowels” effect zero or close to zero) but can’t be worse
Figure note: The slope of the regression line relating last-name vowels to life expectancy is near 0, but that merely fails to improve the predictions; it doesn’t hurt them.
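The claim that a bigger nested model can't fit worse can be seen in a small simulation. This is a sketch under stated assumptions: the data are simulated (not lifeexpectancy.csv), the variable names are hypothetical, and plain lm() is used instead of lmer() to keep the example self-contained.

```r
# Simulated data: lifespan depends on exercise; "vowels" is pure noise.
set.seed(42)
n <- 200
exercise <- rnorm(n, mean = 5, sd = 2)
vowels   <- rpois(n, lambda = 3)
lifespan <- 70 + 1.2 * exercise + rnorm(n, sd = 5)

small <- lm(lifespan ~ 1 + exercise)           # nested inside "big"
big   <- lm(lifespan ~ 1 + exercise + vowels)  # adds an irrelevant predictor

# In-sample fit: R-squared and log likelihood
r2_small <- summary(small)$r.squared
r2_big   <- summary(big)$r.squared
ll_small <- as.numeric(logLik(small))
ll_big   <- as.numeric(logLik(big))

print(c(r2_small = r2_small, r2_big = r2_big))
print(c(ll_small = ll_small, ll_big = ll_big))
```

The bigger model's in-sample R-squared and log likelihood are never lower than the smaller model's, even though the added predictor is noise. This guarantee applies only to the data used for fitting, which is why the shrinkage issue discussed later matters.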
Hypothesis Testing
• Let’s think about our first two models:
• model1: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\mathrm{HrsExercise} + \gamma_{20}\,\mathrm{SocSupport}$
• model2: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\mathrm{HrsExercise}$
• Comparing these two statistical models closely relates to our research question: Which theoretical model best explains the data?
• The theoretical model where social support does affect life expectancy (model1)
• The model where social support doesn’t affect life expectancy (model2)
• What are some possible values of $\gamma_{20}$ (the SocSupport effect) in model1?
• 3.83
• -1.04
• 0: there is no social support effect
• What happens when $\gamma_{20}$ is equal to 0?
• Anything multiplied by 0 is 0, so SocSupport just drops out of the equation
• model1 becomes the same thing as model2:
$E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\mathrm{HrsExercise} + 0 \cdot \mathrm{SocSupport} = \gamma_{00} + \gamma_{10}\,\mathrm{HrsExercise}$
• model2 is just a special case of model1
• The version of model1 where $\gamma_{20}$ happens to be 0
• One of many possible versions of model1
• This is why we say model2 is “nested” in model1
• This also helps show why model1 always fits as well as model2 or better
• model1 can account for the case where $\gamma_{20} = 0$
• But it can also account for many other cases, too
Likelihood Ratio Test
• We can compare nested models (only) using the likelihood-ratio test
• Remember that likelihood is what we maximize when fitting an individual model (we find the parameter values with the highest likelihood)
• First, fit each of the models to be compared (lmer() comes from the lme4 package, so load it with library(lme4) first):
• model1 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1|Family), data=lifeexpectancy)
• model2 <- lmer(Lifespan ~ 1 + HrsExercise + (1|Family), data=lifeexpectancy)
Likelihood Ratio Test
• Then, compare them with anova():
• anova(model1, model2)
• Order doesn’t matter
• Twice the difference in log likelihoods is (approximately) distributed as a chi-square
• d.f. = # of parameters added or removed
• Here, χ2(1) = 8.67, p = .003
• Log likelihood will always be at least somewhat higher (better) for the more complex model … but is it SIGNIFICANTLY better?
[Annotation on the anova() output: We’ll discuss what this means in a moment (don’t worry; it’s what we want)]
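The likelihood-ratio test can also be reproduced by hand to see where the chi-square comes from. A minimal sketch, assuming the lme4 package is installed. The data are simulated stand-ins for lifeexpectancy.csv, which is not available here, so the numbers will not match the slide's χ2(1) = 8.67.

```r
library(lme4)

# Simulated stand-in data: 200 families, 2 siblings each (all values hypothetical)
set.seed(1)
n_fam <- 200
fam <- factor(rep(seq_len(n_fam), each = 2))
fam_effect <- rnorm(n_fam, sd = 3)[as.integer(fam)]
d <- data.frame(Family = fam,
                HrsExercise = rnorm(400, mean = 5, sd = 2),
                SocSupport = rnorm(400))
d$Lifespan <- 75 + 0.5 * d$HrsExercise + 1 * d$SocSupport +
  fam_effect + rnorm(400, sd = 4)

model1 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1 | Family), data = d)
model2 <- lmer(Lifespan ~ 1 + HrsExercise + (1 | Family), data = d)

# anova() refits both models with ML and runs the likelihood-ratio test
lrt <- anova(model1, model2)
print(lrt)

# The same test by hand: twice the difference in ML log likelihoods
m1_ml <- refitML(model1)
m2_ml <- refitML(model2)
chisq_by_hand <- as.numeric(2 * (logLik(m1_ml) - logLik(m2_ml)))
p_by_hand <- pchisq(chisq_by_hand, df = 1, lower.tail = FALSE)
print(c(chisq = chisq_by_hand, p = p_by_hand))
```

The degrees of freedom are 1 because the two models differ by a single parameter, the SocSupport effect.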
Likelihood Ratio Test
• t-test and LR test are very similar!
• t-test: Tests whether an effect differs from 0,
based on this model
• Likelihood ratio: Compare to a model where the
effect actually IS constrained to be 0
• With an infinitely large sample, these two tests would produce identical conclusions
• With a small sample, the t-test is less likely to detect spurious differences (Luke, 2017)
• But large differences between the two tests are uncommon
• In our example, the two p-values are nearly identical:
• p-value from likelihood ratio test: .0032
• p-value from lmerTest t-test: .0033
• Guidance:
• LR test is useful for testing groups of variables
• model1 <- lmer(Lifespan ~ 1 + HrsExercise …)
• model3 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + YrsEducation + Conscientiousness …)
• If testing just one variable at a time, use the t-test; it is slightly less likely to produce a Type I error
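Testing a group of variables at once might look like the following sketch, assuming lme4 is installed. The data are simulated and the effect sizes are made up; only the variable names follow the slides. Both models are fit with ML up front, since they differ in their fixed effects.

```r
library(lme4)

# Simulated stand-in data (hypothetical values): 150 families, 2 siblings each
set.seed(2)
n_fam <- 150
d <- data.frame(Family = factor(rep(seq_len(n_fam), each = 2)),
                HrsExercise = rnorm(300, mean = 5, sd = 2),
                SocSupport = rnorm(300),
                YrsEducation = rnorm(300, mean = 14, sd = 2),
                Conscientiousness = rnorm(300))
fam_effect <- rnorm(n_fam, sd = 3)[as.integer(d$Family)]
d$Lifespan <- 70 + 0.5 * d$HrsExercise + 1.0 * d$SocSupport +
  0.4 * d$YrsEducation + fam_effect + rnorm(300, sd = 4)

small <- lmer(Lifespan ~ 1 + HrsExercise + (1 | Family),
              data = d, REML = FALSE)
large <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + YrsEducation +
                Conscientiousness + (1 | Family),
              data = d, REML = FALSE)

# One test for the three added variables together
group_test <- anova(small, large)
print(group_test)

# Number of parameters added = degrees of freedom of the test
df_added <- attr(logLik(large), "df") - attr(logLik(small), "df")
print(df_added)
```

A significant result here says the three added variables jointly improve the fit; it does not say which of them is responsible.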
REML vs ML
• Technically, two different estimation methods that R can use “behind the scenes” to get the estimates
• REML: Restricted Maximum Likelihood
• Assumes the fixed effects structure is correct
• Bad for comparing models that differ in fixed effects
• ML: Maximum Likelihood
• OK for comparing models
• But, may underestimate variance of random effects
• Ideal: ML for model comparison, REML for final results
• lme4 does this automatically for you!
• lmer() defaults to REML, but anova() automatically refits the models with ML when you do a likelihood ratio test.
REML vs ML
• The one time you might want to mess with this:
• If you are going to be doing a lot of model comparisons, you can fit the model with ML to begin with
• model1 <- lmer(DV ~ 1 + Predictors + (1|Family), data=lifeexpectancy, REML=FALSE)
• Saves refitting for each comparison
• Remember to refit the model with REML=TRUE for your final results
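A sketch of that workflow, assuming lme4 is installed and using simulated data in place of lifeexpectancy.csv (all values hypothetical). isREML() reports how a model was fit, and refit = FALSE tells anova() not to refit models that are already ML fits.

```r
library(lme4)

# Simulated stand-in data: 100 families, 2 siblings each
set.seed(3)
n_fam <- 100
d <- data.frame(Family = factor(rep(seq_len(n_fam), each = 2)),
                HrsExercise = rnorm(200, mean = 5, sd = 2),
                SocSupport = rnorm(200))
fam_effect <- rnorm(n_fam, sd = 3)[as.integer(d$Family)]
d$Lifespan <- 75 + 0.5 * d$HrsExercise + 1 * d$SocSupport +
  fam_effect + rnorm(200, sd = 4)

# Fit with ML from the start when many comparisons are planned
m_ml_small <- lmer(Lifespan ~ 1 + HrsExercise + (1 | Family),
                   data = d, REML = FALSE)
m_ml_big   <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1 | Family),
                   data = d, REML = FALSE)

# Already ML fits, so skip the automatic refit
cmp <- anova(m_ml_small, m_ml_big, refit = FALSE)
print(cmp)

# Refit the chosen model with REML (the lmer default) for the final results
m_final <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1 | Family),
                data = d, REML = TRUE)
print(c(ml_fit_is_REML = isREML(m_ml_big), final_is_REML = isREML(m_final)))
```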
Non-Nested Models
• Which of these pairs is not a case of nested
models?
• A
• Accuracy ~ SentenceType + Aphasia +
SentenceType:Aphasia
• Accuracy ~ SentenceType + Aphasia
• B
• MathAchievement ~ SocioeconomicStatus
• MathAchievement ~ TeacherRating + ClassSize
• C
• Recall ~ StudyTime
• Recall ~ StudyTime + StudyStrategy
• Answer: B
• MathAchievement ~ SocioeconomicStatus
• MathAchievement ~ TeacherRating + ClassSize
• Each of these models has something that the other doesn’t have.
Non-Nested Models
• Models that aren’t nested can’t be tested the
same way
• A non-nested comparison:
• 1st model: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{20}\,\mathrm{IncomeThousands}$ (i.e., $\gamma_{10} = 0$)
• 2nd model: $E(Y_{i(j)}) = \gamma_{00} + \gamma_{10}\,\mathrm{YrsEducation}$ (i.e., $\gamma_{20} = 0$)
• What would support 1st model over 2nd?
• $\gamma_{20}$ is significantly greater than 0, but also $\gamma_{10}$ is 0
• But remember we can’t test that something is 0 with frequentist statistics … can’t prove the H0 is true
• So the likelihood ratio test doesn’t apply here
Non-Nested Models: Comparison
• Can be compared with information criteria
• Remember our fitted values from last week?
• fitted(model2)
• What if we replaced all of our observations with
just the fitted (predicted) values?
• We’d be losing some information
• However, if the model predicted the data well, we
would not be losing that much
• Information criteria measure how much information is
lost with the fitted values (so, lower is better)
Non-Nested Models: Comparison
• AIC: An Information Criterion or Akaike’s Information Criterion (Akaike, 1974)
• AIC = -2(log likelihood) + 2k
• k = number of estimated parameters (fixed effects plus random-effect variance parameters) in a particular model
• A model with a lower AIC is better
• Doesn’t assume any of the models is correct
• Appropriate for correlational / non-experimental data
• BIC: Bayesian Information Criterion
• BIC = -2(log likelihood) + log(n)k
• k = number of estimated parameters (fixed effects plus random-effect variance parameters), n = number of observations
• A model with a lower BIC is better
• Typically prefers simpler models than AIC
• Assumes that there’s a “true” underlying model in the set of variables being considered
• Appropriate for experimental data
(Yang, 2005; Oehlert, 2012)
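Both formulas can be computed by hand and checked against R's built-in AIC() and BIC(). A minimal sketch on simulated data with hypothetical variable names, using lm() rather than lmer() so that it runs without extra packages. For lm(), k also counts the residual variance. The helper info_criteria() is illustrative, not part of any package.

```r
# Simulated data for two non-nested models of math achievement
set.seed(7)
n <- 300
ses <- rnorm(n)
rating <- rnorm(n)
class_size <- rnorm(n, mean = 25, sd = 5)
math <- 50 + 3 * ses + 2 * rating - 0.2 * class_size + rnorm(n, sd = 8)

model_a <- lm(math ~ ses)
model_b <- lm(math ~ rating + class_size)

# AIC = -2*logLik + 2k ; BIC = -2*logLik + log(n)*k
info_criteria <- function(model) {
  ll <- logLik(model)
  k <- attr(ll, "df")     # number of estimated parameters
  n_obs <- nobs(model)    # number of observations
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(n_obs) * k)
}

by_hand_a <- info_criteria(model_a)
by_hand_b <- info_criteria(model_b)
print(rbind(model_a = by_hand_a, model_b = by_hand_b))

# Built-in versions for comparison
print(c(AIC(model_a), BIC(model_a)))
```

Because log(n) exceeds 2 once n is 8 or more, BIC charges more per parameter than AIC in any realistically sized dataset, which is why it tends to prefer simpler models.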
Non-Nested Models: Comparison
• Can also get these from anova(model1, model2)
• Just ignore the chi-square if non-nested models
• AIC and BIC do not have a significance test
associated with them
• The model with the lower AIC/BIC is preferred, but
we don’t know how reliable this preference is
Shrinkage
• The “Madden curse”…
• Each year, a top NFL football player is picked to
appear on the cover of the Madden NFL video
game
• That player often doesn’t
play as well in the following
season
• Is the cover “cursed”?
Shrinkage
• What’s needed to be one of the top NFL players
in a season?
• You have to be a good player
• Genuine predictor (signal)
• And, luck on your side
• Random chance or error
• Top-performing player probably
very good and very lucky
• The next season…
• Your skill may persist
• Random chance probably won’t
• Regression to the mean
• Madden video game cover imperfectly predicts next season’s performance because the selection was partly based on random error
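Regression to the mean is easy to simulate. A sketch under simple assumptions: each player's season performance is a stable skill plus independent luck, with equal variance for both (the 1000 players and the top-10 cutoff are arbitrary choices for illustration).

```r
# Performance = persistent skill + season-specific luck
set.seed(11)
n_players <- 1000
skill   <- rnorm(n_players)
season1 <- skill + rnorm(n_players)  # luck in season 1
season2 <- skill + rnorm(n_players)  # fresh, independent luck in season 2

# "Cover athletes": the 10 best performers in season 1
top <- order(season1, decreasing = TRUE)[1:10]

mean_top_s1 <- mean(season1[top])
mean_top_s2 <- mean(season2[top])
print(c(season1 = mean_top_s1, season2 = mean_top_s2,
        league_average = mean(season2)))
```

The top players from season 1 are still above average in season 2 (their skill persists) but well below their own season 1 level (their luck does not), with no curse required.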
Shrinkage
• Our estimates (& any choice of variables
based on them) always partially reflect random
chance in the dataset we used to obtain them
• Won’t fit any later data set quite
as well … shrinkage
• Problem when we’re using the
data to decide the model
• “If you use a sample to construct a model, or to
choose a hypothesis to test, you cannot make a
rigorous scientific test of the model or the hypothesis
using that same sample data.”
(Babyak, 2004, p. 414)
Shrinkage—Examples
• Relations that we observe between a predictor
variable and a dependent variable might simply
be capitalizing on random chance
• U.S. government puts out 45,000 economic
statistics each year (Silver, 2012)
• Can we use these to predict whether US economy
will go into recession?
• With 45,000 predictors, we are very likely to find a
spurious relation by chance
• Especially w/ only 15
recessions since
the end of WW II
• Significance tests try to address this … but with 45,000 predictors, we are likely to find significant effects by chance (5% Type I error rate at α=.05, so about 2,250 of 45,000 null effects would come out significant)
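The same point in a scaled-down simulation. This is a sketch, not the recession analysis itself: it uses 2,000 pure-noise predictors and a continuous noise outcome over 60 hypothetical time points, so every "significant" result is a false positive by construction.

```r
# Outcome and predictors are all independent noise
set.seed(5)
n_obs <- 60
n_predictors <- 2000
outcome <- rnorm(n_obs)
predictors <- matrix(rnorm(n_obs * n_predictors), nrow = n_obs)

# One correlation test per predictor
p_values <- apply(predictors, 2, function(x) cor.test(x, outcome)$p.value)

n_significant <- sum(p_values < .05)
strongest_r <- max(abs(cor(predictors, outcome)))
print(c(n_significant = n_significant,
        expected_by_chance = 0.05 * n_predictors,
        strongest_abs_r = strongest_r))
```

Roughly 5% of the noise predictors pass the significance test, and the strongest of them can look like a sizeable correlation.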
Shrinkage—Examples
• Adak Island, Alaska
• Daily temperature here predicts
stock market activity!
• r = -.87 correlation with the price
of a specific group of stocks!
• Completely true—I’m not making this up!
• Problem with this:
• With thousands of weather stations & stocks, easy to find a
strong correlation somewhere, even if it’s just sampling error
• Problem is that this factoid doesn’t reveal all of the other (non-
significant) weather stations & stocks we searched through
• Would only be impressive if this hypothesis continued to be
true on a new set of weather data & stock prices
Vul et al., 2009
Shrinkage—Examples
• “Puzzlingly high correlations” in some fMRI work
• Correlate each voxel in a brain scan with a behavioral
measure (e.g., personality survey)
• Restrict the analysis to voxels where
the correlation is above some threshold
• Compute final correlation in this region
with behavioral measure—very high!
• Problem: Voxels were already chosen based on
those high correlations
• Includes sampling error favoring the correlation but
excludes error that doesn’t
Vul et al., 2009
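The selection problem can be reproduced with pure noise. A sketch under stated assumptions: 20 simulated subjects, 5,000 simulated "voxels" unrelated to the behavioral measure, and an arbitrary selection threshold of r > .5; none of these numbers come from the studies discussed.

```r
# No voxel is truly related to behavior in this simulation
set.seed(9)
n_subj <- 20
n_vox <- 5000
behavior <- rnorm(n_subj)
voxels <- matrix(rnorm(n_subj * n_vox), nrow = n_subj)

# Step 1: correlate every voxel with the behavioral measure
r_all <- as.vector(cor(voxels, behavior))

# Step 2: keep only voxels above the threshold
selected <- which(r_all > 0.5)

# Step 3 (circular): summarize the correlation in the selected voxels
mean_selected <- mean(r_all[selected])

# Honest check: same voxels, independent new sample
behavior_new <- rnorm(n_subj)
voxels_new <- matrix(rnorm(n_subj * n_vox), nrow = n_subj)
r_new <- as.vector(cor(voxels_new[, selected], behavior_new))
mean_new <- mean(r_new)

print(c(n_selected = length(selected),
        same_data = mean_selected, new_data = mean_new))
```

The correlation in the selected voxels is guaranteed to exceed the threshold in the data used for selection, and it collapses toward zero in the independent sample.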
Shrinkage—Solutions
• One solution: Select model(s) in advance
(perhaps even pre-registered)
• A theory is valuable for this
• Adak Island example is implausible in part because there’s
no causal reason why an island in Alaska would relate to
stock prices
“Just as you do not need to know exactly how a car engine
works in order to drive safely, you do not need to
understand all the intricacies of the economy to accurately
read those gauges.” – Economic forecasting firm ECRI
(quoted in Silver, 2012)
• Not driven purely by the data or by chance if we have an a
priori reason to favor this variable
“There is really nothing so practical as a good theory.”
-- Social psychologist Kurt Lewin (Lewin’s Maxim)
• Based on some other measure (e.g., another brain
scan)
• Based on research design
• For factorial experiments, typical to include all
experimental variables and interactions
• Research design implies you were interested in all of these
Shrinkage—Solutions
• For more exploratory analyses: Show that the
finding replicates
• On a second dataset
• Test whether a model obtained from one subset of the data applies to another subset (cross-validation)
• e.g., training and test sets
• A better version: Do this with
many randomly chosen subsets
• Monte Carlo methods
• Reading on Canvas for some
general ways to do this in R
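One way the repeated random-subset idea might be sketched in base R (this is an illustration, not the method from the Canvas reading). The data are simulated, lm() stands in for a mixed model, and cv_rmse() is a hypothetical helper written for this example.

```r
# Simulated data with two genuine predictors (hypothetical values)
set.seed(21)
n <- 200
d <- data.frame(exercise = rnorm(n, mean = 5, sd = 2),
                support = rnorm(n))
d$lifespan <- 70 + 1.0 * d$exercise + 2.0 * d$support + rnorm(n, sd = 5)

# Repeatedly split into training and test sets; fit on training data,
# then measure prediction error (RMSE) on the held-out test data
cv_rmse <- function(formula, data, n_splits = 200, train_frac = 0.8) {
  outcome <- all.vars(formula)[1]
  rmse <- numeric(n_splits)
  for (i in seq_len(n_splits)) {
    train_rows <- sample(nrow(data), size = floor(train_frac * nrow(data)))
    fit <- lm(formula, data = data[train_rows, ])
    test <- data[-train_rows, ]
    predicted <- predict(fit, newdata = test)
    rmse[i] <- sqrt(mean((test[[outcome]] - predicted)^2))
  }
  mean(rmse)
}

rmse_null <- cv_rmse(lifespan ~ 1, d)
rmse_full <- cv_rmse(lifespan ~ exercise + support, d)
print(c(intercept_only = rmse_null, with_predictors = rmse_full))
```

Because error is always measured on data the model did not see, a model that merely capitalizes on chance in the training set gets no credit for it.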

More from Scott Fraundorf (7)

PDF
Mixed Effects Models - Signal Detection Theory
PDF
Mixed Effects Models - Power
PDF
Mixed Effects Models - Empirical Logit
PDF
Mixed Effects Models - Logit Models
PDF
Mixed Effects Models - Post-Hoc Comparisons
PDF
Mixed Effects Models - Level-2 Variables
PDF
Scott_Fraundorf_Resume
Mixed Effects Models - Signal Detection Theory
Mixed Effects Models - Power
Mixed Effects Models - Empirical Logit
Mixed Effects Models - Logit Models
Mixed Effects Models - Post-Hoc Comparisons
Mixed Effects Models - Level-2 Variables
Scott_Fraundorf_Resume

Recently uploaded (20)

PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Pharma ospi slides which help in ospi learning
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
master seminar digital applications in india
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
RMMM.pdf make it easy to upload and study
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
Presentation on HIE in infants and its manifestations
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PPTX
Cell Structure & Organelles in detailed.
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Pharma ospi slides which help in ospi learning
O7-L3 Supply Chain Operations - ICLT Program
O5-L3 Freight Transport Ops (International) V1.pdf
master seminar digital applications in india
FourierSeries-QuestionsWithAnswers(Part-A).pdf
RMMM.pdf make it easy to upload and study
VCE English Exam - Section C Student Revision Booklet
Supply Chain Operations Speaking Notes -ICLT Program
Chinmaya Tiranga quiz Grand Finale.pdf
Anesthesia in Laparoscopic Surgery in India
Presentation on HIE in infants and its manifestations
Abdominal Access Techniques with Prof. Dr. R K Mishra
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
2.FourierTransform-ShortQuestionswithAnswers.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Cell Structure & Organelles in detailed.
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx

Mixed Effects Models - Model Comparison

  • 1. Week 4.1: Model Comparison ! Lab: Interactions Practice ! Model Comparison ! Nested Models ! Hypothesis Testing ! REML vs ML ! Non-Nested Models ! Shrinkage ! The Problem ! Solutions
  • 2. Interpreting Interactions • Numerical interaction term tells us how the interaction works: • Strengthens individual effects with the same sign as the interaction • Weakens individual effects with a different sign as the interaction • Or, again, just look at the graph ☺
  • 3. Interpreting Interactions Practice • Dependent variable: Classroom learning • Independent variable 1: Intrinsic motivation • Learning because you want to learn (intrinsic) vs. to get a good grade (extrinsic) • Intrinsic motivation has a + effect on learning • Independent variable 2: Autonomy language • “You can…” (vs. “You must…”) • Also has a + effect on learning • Motivation x autonomy interaction is + • Interpretation: Combining intrinsic motivation and autonomy language especially benefits learning • “Synergistic” interaction Vansteenkiste et al., 2004, JPSP
  • 4. Interpreting Interactions Practice • Dependent variable: Satisfaction with a consumer purchase • Number of choices: - effect on satisfaction • “Maximizing” strategy: - effect on satisfaction • Trying to find the best option vs. “good enough” • Choices x maximizing strategy is - • Interpretation: Having lots of choices when you’re a maximizer especially reduces satisfaction • Also a synergistic interaction (Carrillat, Ladik, & Legoux, 2011; Marketing Letters)
  • 5. Week 4.1: Model Comparison ! Lab: Interactions Practice ! Model Comparison ! Nested Models ! Hypothesis Testing ! REML vs ML ! Non-Nested Models ! Shrinkage ! The Problem ! Solutions
  • 6. Model Formulae Practice • Write the R formula for each model: • 1) We’re interested in the effects of FamilySES, PriorNightSleep, and Nutrition on MathTest Performance, but we don’t expect them to interact • 2) We factorially manipulated SentenceType (active or passive) and Plausibility (low or high) in a test of TextComprehensionAccuracy
  • 7. Model Formulae Practice • Write the R formula for each model: • 1) We’re interested in the effects of FamilySES, PriorNightSleep, and Nutrition on MathTest Performance, but we don’t expect them to interact • MathPerformance ~ 1 + SES + Sleep + Nutrition • 2) We factorially manipulated SentenceType (active or passive) and Plausibility (low or high) in a test of TextComprehensionAccuracy • ComprehensionAccuracy ~ 1 + SentenceType + Plausibility + SentenceType:Plausibility or ComprehensionAccuracy ~ 1 + SentenceType*Plausibility
  • 8. Interpreting Interactions Practice • Second language proficiency: + effect on translation accuracy • Word frequency: + effect on accuracy • Frequency x proficiency interaction is - • Interpretation: Proficiency matters less when translating high frequency words • Or: Difference between high & low proficiency words gets smaller if you have high proficiency • “Antagonistic” interaction. Combining the effects reduces or reverses the individual effects. (e.g., Diependaele, Lemhöfer, Brysbaert, 2012, QJEP)
  • 9. Interpreting Interactions Practice • Retrieval practice: + effect on long-term learning • Working memory span: + effect on learning • Retrieval practice x WM span interaction is - (Agarwal et al., 2016) • Interpretation: Retrieval practice is especially beneficial for people with low working memory. • Or: Low WM confers less of a disadvantage if you do retrieval practice
  • 10. Interpreting Interactions Practice • Affectionate touch: + effect on feeling of relationship security • Avoidant attachment style: - effect on security • Touch x avoidant attachment interaction is - • Interpretation: Affectionate touch enhances relationship security less for people with an avoidant attachment style (Jakubiak & Feeney, SPPS, 2016)
  • 11. Interpreting Interactions Practice • Age: - effect on picture memory • Older adults have poorer memory • Emotional valence: - effect on accuracy • Positive pictures are not remembered as well compared to negative pictures • Age x Valence interaction is + • Interpretation: Age declines are smaller for positive pictures • Or: Disadvantage of positive pictures is not as strong for older adults (e.g., Mather & Carstensen, 2005, TiCS)
  • 12. Week 4.1: Model Comparison ! Lab: Interactions Practice ! Model Comparison ! Nested Models ! Hypothesis Testing ! REML vs ML ! Non-Nested Models ! Shrinkage ! The Problem ! Solutions
  • 13. Model Comparison • Sometimes, we may have more than 1 model that we could consider applying to the data • 2 or more competing theoretical models • e.g., critical period in language acquisition • No critical period (Vanhove, 2013): 1 + AgeOfAcquisition • Critical period hypothesis (Hartshorne et al., 2020): 1 + AgeOfAcquisition*CriticalPeriod
  • 14. Model Comparison • Sometimes, we may have more than 1 model that we could consider applying to the data • 2 or more competing theoretical models • Exploratory analysis where we don’t yet know which model would be appropriate
  • 15. Dataset ! Social support & health (e.g., Cohen & Wills, 1985) ! lifeexpectancy.csv: ! Longitudinal study of 1000 subjects – some siblings from same family, so 517 total families ! Perceived social support (z-scored) ! Lifespan ! And several control variables
  • 16. Nested Models ! Three possible models of life expectancy: ! Amount of weekly exercise ! Amount of weekly exercise & perceived social support ! Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name ! These are nested models—each one can be formed by subtracting variables from the one below it (“nested inside it”)
  • 18. Nested Models ! Three possible models of life expectancy: ! Amount of weekly exercise ! Amount of weekly exercise & perceived social support ! Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name ! Which set of information would give us the most accurate fitted() values?
  • 19. Nested Models ! Three possible models of life expectancy: ! Amount of weekly exercise ! Amount of weekly exercise & perceived social support ! Amount of weekly exercise, perceived social support, years of education, conscientiousness, yearly income, and number of vowels in your last name • The “biggest” nested model will always provide predictions that are at least as good • Adding info can only explain more of the variance
  • 20. Nested Models • The “biggest” nested model will always provide predictions that are at least as good • Adding info can only explain more of the variance • Might not be much better (“number of vowels” effect zero or close to zero) but can’t be worse Slope of regression line relating last name vowels to life expectancy is near 0 But that merely fails to improve predictions; doesn’t hurt them
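The "at least as good" claim is about fit to the data the model was estimated on. A minimal sketch with simulated data, where all variable names and effect sizes are assumptions and plain `lm()` stands in for the mixed models used in the lecture:

```r
set.seed(1)
n <- 200
exercise <- rnorm(n)
support  <- rnorm(n)
vowels   <- rpois(n, 2)          # pure noise: unrelated to lifespan
lifespan <- 75 + 2 * exercise + 1 * support + rnorm(n, sd = 5)

# Three nested models, from smallest to biggest
small <- lm(lifespan ~ exercise)
mid   <- lm(lifespan ~ exercise + support)
big   <- lm(lifespan ~ exercise + support + vowels)

# In-sample variance explained by each model
r2 <- c(small = summary(small)$r.squared,
        mid   = summary(mid)$r.squared,
        big   = summary(big)$r.squared)
print(round(r2, 4))
```

In-sample R-squared cannot go down as predictors are added, even when the added predictor is noise. Whether the improvement is meaningful is a separate question.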
  • 22. Hypothesis Testing ! Let’s think about our first two models: ! model1: E(Yi(j)) = γ00 + γ10HrsExercise + γ20SocSupport ! model2: E(Yi(j)) = γ00 + γ10HrsExercise ! Comparing these two statistical models closely relates to our research question: Which theoretical model best explains the data? ! The theoretical model where social support does affect life expectancy ! The model where social support doesn’t affect life expectancy
  • 23. Hypothesis Testing ! Let’s think about our first two models: ! model1: E(Yi(j)) = γ00 + γ10HrsExercise + γ20SocSupport ! model2: E(Yi(j)) = γ00 + γ10HrsExercise ! What are some possible values of γ20 (the SocSupport effect) in model1? ! 3.83 ! -1.04 ! 0 – there is no social support effect
  • 24. Hypothesis Testing ! Let’s think about our first two models: ! model1: E(Yi(j)) = γ00 + γ10HrsExercise + γ20SocSupport ! model2: E(Yi(j)) = γ00 + γ10HrsExercise ! What happens when γ20 is equal to 0? ! model1 with γ20 = 0: E(Yi(j)) = γ00 + γ10HrsExercise + 0 × SocSupport ! Anything multiplied by 0 is 0, so SocSupport just drops out of the equation ! Becomes the same thing as model2
  • 25. Hypothesis Testing ! Let’s think about our first two models: ! model1: E(Yi(j)) = γ00 + γ10HrsExercise + γ20SocSupport ! model2: E(Yi(j)) = γ00 + γ10HrsExercise ! model1 with γ20 = 0: E(Yi(j)) = γ00 + γ10HrsExercise + 0 × SocSupport ! model2 is just a special case of model1 ! The version of model1 where γ20 happens to be 0 ! One of many possible versions of model1 ! Why we say model2 is “nested” in model1
  • 26. Hypothesis Testing ! Let’s think about our first two models: ! model1: E(Yi(j)) = γ00 + γ10HrsExercise + γ20SocSupport ! model2: E(Yi(j)) = γ00 + γ10HrsExercise ! model1 with γ20 = 0: E(Yi(j)) = γ00 + γ10HrsExercise + 0 × SocSupport ! This also helps show why model1 always fits as well as model2 or better ! model1 can account for the case where γ20 = 0 ! But it can also account for many other cases, too
  • 27. Likelihood Ratio Test ! We can compare nested models (only) using the likelihood-ratio test ! Remember that likelihood is what we search for in fitting an individual model (find the values with the highest likelihood) ! First, fit each of the models to be compared ! model1 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + (1|Family), data=lifeexpectancy) ! model2 <- lmer(Lifespan ~ 1 + HrsExercise + (1|Family), data=lifeexpectancy)
  • 28. Likelihood Ratio Test • Then, compare them with anova(): • anova(model1, model2) • Order doesn’t matter • Twice the difference in log likelihoods is (approximately) distributed as a chi-square • d.f. = # of parameters added or removed (here, 1 variable) • Here, χ2(1) = 8.67, p = .003 • Log likelihood will also be somewhat higher (better) for the complex model … but is it SIGNIFICANTLY better? • We’ll discuss what this means in a moment (don’t worry; it’s what we want)
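The arithmetic behind the likelihood-ratio test can be written out by hand. This is a sketch on simulated data with assumed names and effect sizes, not lifeexpectancy.csv. It uses `lm()` so that it runs in base R; with `lmer()` models, `anova(model1, model2)` reports the same kind of statistic.

```r
set.seed(2)
n <- 300
exercise <- rnorm(n)
support  <- rnorm(n)
lifespan <- 75 + 2 * exercise + 1 * support + rnorm(n, sd = 5)

model1 <- lm(lifespan ~ exercise + support)  # complex model
model2 <- lm(lifespan ~ exercise)            # nested model (support effect = 0)

# Likelihood-ratio statistic: twice the difference in log likelihoods
lr_stat <- as.numeric(2 * (logLik(model1) - logLik(model2)))
# d.f. = difference in the number of estimated parameters
lr_df <- attr(logLik(model1), "df") - attr(logLik(model2), "df")
# Upper-tail probability from the chi-square distribution
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(chisq = lr_stat, df = lr_df, p = p_value))
```

The statistic can never be negative, because the complex model's log likelihood is at least as high as the nested model's.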
  • 29. Likelihood Ratio Test • t-test and LR test are very similar! • t-test: Tests whether an effect differs from 0, based on this model • Likelihood ratio: Compare to a model where the effect actually IS constrained to be 0 • With an infinitely large sample, these two tests would produce identical conclusions • With small sample, t-test is less likely to detect spurious differences (Luke, 2017) • But, large differences uncommon
  • 30. Likelihood Ratio Test • t-test and LR test are very similar! • t-test: Tests whether an effect differs from 0, based on this model • Likelihood ratio: Compare to a model where the effect actually IS constrained to be 0 • p-value from likelihood ratio test: .0032 • p-value from lmerTest t-test: .0033
  • 31. Likelihood Ratio Test • t-test and LR test are very similar! • t-test: Tests whether an effect differs from 0, based on this model • Likelihood ratio: Compare to a model where the effect actually IS constrained to be 0 • Guidance: • LR test is useful for testing groups of variables • model1 <- lmer(Lifespan ~ 1 + HrsExercise …) • model3 <- lmer(Lifespan ~ 1 + HrsExercise + SocSupport + YrsEducation + Conscientiousness …) • If testing just one variable at a time, use the t-test, which is slightly less likely to produce a Type I error
  • 33. REML vs ML • Technically, two different algorithms that R can use “behind the scenes” to get the estimates • REML: Restricted Maximum Likelihood • Assumes the fixed effects structure is correct • Bad for comparing models that differ in fixed effects • ML: Maximum Likelihood • OK for comparing models • But, may underestimate variance of random effects • Ideal: ML for model comparison, REML for final results • lme4 does this automatically for you! • Defaults to REML, but automatically refits models with ML when you do a likelihood ratio test with anova().
  • 34. REML vs ML • The one time you might want to mess with this: • If you are going to be doing a lot of model comparisons, can fit the model with ML to begin with • model1 <- lmer(DV ~ 1 + Predictors + (1|Family), data=lifeexpectancy, REML=FALSE) • Saves refitting for each comparison • Remember to refit the model with REML=TRUE for your final results
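A short sketch of the REML/ML workflow. It assumes the lme4 package is installed and uses the `sleepstudy` data that ships with lme4 as a stand-in for lifeexpectancy.csv, so the variables differ from the lecture's:

```r
library(lme4)  # assumes the lme4 package is installed

# sleepstudy ships with lme4; it stands in for lifeexpectancy.csv here.
m_reml <- lmer(Reaction ~ 1 + Days + (1 | Subject), data = sleepstudy)
m_ml   <- lmer(Reaction ~ 1 + Days + (1 | Subject), data = sleepstudy,
               REML = FALSE)
m0_ml  <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy,
               REML = FALSE)

print(isREML(m_reml))  # default fit uses REML
print(isREML(m_ml))    # fit with ML from the start

# Both models are already ML fits, so anova() has nothing to refit.
print(anova(m_ml, m0_ml))

# Compare the by-subject intercept variance under the two estimators.
print(c(REML = VarCorr(m_reml)$Subject[1, 1],
        ML   = VarCorr(m_ml)$Subject[1, 1]))
```

For the final write-up, the REML fit (`m_reml`) is the one to report.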
  • 36. Non-Nested Models • Which of these pairs is not a case of nested models? • A • Accuracy ~ SentenceType + Aphasia + SentenceType:Aphasia • Accuracy ~ SentenceType + Aphasia • B • MathAchievement ~ SocioeconomicStatus • MathAchievement ~ TeacherRating + ClassSize • C • Recall ~ StudyTime • Recall ~ StudyTime + StudyStrategy
  • 37. Non-Nested Models • Which of these pairs is not a case of nested models? • A • Accuracy ~ SentenceType + Aphasia + SentenceType:Aphasia • Accuracy ~ SentenceType + Aphasia • B • MathAchievement ~ SocioeconomicStatus • MathAchievement ~ TeacherRating + ClassSize • Each of these models has something that the other doesn’t have.
  • 38. Non-Nested Models • Models that aren’t nested can’t be tested the same way • A non-nested comparison: • 1st model: E(Yi(j)) = γ00 + γ20IncomeThousands (γ10, the YrsEducation effect, set to 0) • 2nd model: E(Yi(j)) = γ00 + γ10YrsEducation (γ20, the IncomeThousands effect, set to 0) • What would support 1st model over 2nd? • γ20 is significantly greater than 0, but also γ10 is 0 • But remember we can’t test that something is 0 with frequentist statistics … can’t prove the H0 is true • Parametric statistics don’t apply here
  • 39. Non-Nested Models: Comparison • Can be compared with information criteria • Remember our fitted values from last week? • fitted(model2) • What if we replaced all of our observations with just the fitted (predicted) values? • We’d be losing some information • However, if the model predicted the data well, we would not be losing that much • Information criteria measure how much information is lost with the fitted values (so, lower is better)
  • 40. Non-Nested Models: Comparison • AIC: An Information Criterion or Akaike’s Information Criterion • -2(log likelihood) + 2k • k = # of fixed and random effects in a particular model • A model with a lower AIC is better Akaike, 1974
  • 41. Non-Nested Models: Comparison • AIC: An Information Criterion or Akaike’s Information Criterion • -2(log likelihood) + 2k • k = # of fixed and random effects in a particular model • A model with a lower AIC is better • Doesn’t assume any of the models is correct • Appropriate for correlational / non-experimental data • BIC: Bayesian Information Criterion • -2(log likelihood) + log(n)k • k = # of fixed & random effects, n = num. observations • A model with a lower BIC is better • Typically prefers simpler models than AIC • Assumes that there’s a “true” underlying model in the set of variables being considered • Appropriate for experimental data Yang, 2005; Oehlert, 2012
  • 42. Non-Nested Models: Comparison • Can also get these from anova(model1, model2) • Just ignore the chi-square if non-nested models • AIC and BIC do not have a significance test associated with them • The model with the lower AIC/BIC is preferred, but we don’t know how reliable this preference is
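The AIC and BIC formulas above can be computed by hand and checked against R's built-in `AIC()` and `BIC()`. This is a sketch on simulated data with assumed variable names, using `lm()` so it runs in base R. Note that R counts every estimated parameter in k, including the residual variance.

```r
set.seed(3)
n <- 250
education <- rnorm(n)
income    <- 0.5 * education + rnorm(n)
lifespan  <- 75 + 1.5 * income + rnorm(n, sd = 5)

# Two non-nested models: each has a predictor the other lacks.
m_edu <- lm(lifespan ~ education)
m_inc <- lm(lifespan ~ income)

# AIC = -2 logLik + 2k;  BIC = -2 logLik + log(n) k
ic_by_hand <- function(model) {
  ll    <- logLik(model)
  k     <- attr(ll, "df")   # number of estimated parameters
  n_obs <- nobs(model)      # number of observations
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(n_obs) * k)
}

print(rbind(education = ic_by_hand(m_edu), income = ic_by_hand(m_inc)))
```

The model with the lower value in each column is preferred. As the slide notes, there is no significance test attached to that preference.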
  • 44. Shrinkage • The “Madden curse”… • Each year, a top NFL football player is picked to appear on the cover of the Madden NFL video game • That player often doesn’t play as well in the following season • Is the cover “cursed”?
  • 46. Shrinkage • What’s needed to be one of the top NFL players in a season? • You have to be a good player • Genuine predictor (signal) • And, luck on your side • Random chance or error • Top-performing player probably very good and very lucky • The next season… • Your skill may persist • Random chance probably won’t • Regression to the mean • Madden video game cover imperfectly predicts next season’s performance because it was partly based on random error
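Regression to the mean is easy to simulate. In this sketch, performance is assumed to be skill plus independent luck each season. The numbers are illustrative assumptions, not NFL data:

```r
set.seed(4)
n_players <- 10000
skill   <- rnorm(n_players)            # persists across seasons
season1 <- skill + rnorm(n_players)    # skill + this season's luck
season2 <- skill + rnorm(n_players)    # same skill, fresh luck

# "Cover athletes": the top 1% of performers in season 1
top <- season1 >= quantile(season1, 0.99)

print(c(season1 = mean(season1[top]), season2 = mean(season2[top])))
```

The top group still beats the average player in season 2, because skill persists, but it scores lower than it did in season 1, because the luck does not carry over. No curse is needed.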
  • 47. Shrinkage • Our estimates (& any choice of variables based on them) always partially reflect random chance in the dataset we used to obtain them • Won’t fit any later data set quite as well … shrinkage • Problem when we’re using the data to decide the model
  • 48. Shrinkage • Our estimates (& any choice of variables based on them) always partially reflect random chance in the dataset we used to obtain them • Won’t fit any later data set quite as well … shrinkage • “If you use a sample to construct a model, or to choose a hypothesis to test, you cannot make a rigorous scientific test of the model or the hypothesis using that same sample data.” (Babyak, 2004, p. 414)
  • 49. Shrinkage—Examples • Relations that we observe between a predictor variable and a dependent variable might simply be capitalizing on random chance • U.S. government puts out 45,000 economic statistics each year (Silver, 2012) • Can we use these to predict whether US economy will go into recession? • With 45,000 predictors, we are very likely to find a spurious relation by chance • Especially w/ only 15 recessions since the end of WW II
  • 50. Shrinkage—Examples • Relations that we observe between a predictor variable and a dependent variable might simply be capitalizing on random chance • U.S. government puts out 45,000 economic statistics each year (Silver, 2012) • Can we use these to predict whether US economy will go into recession? • With 45,000 predictors, we are very likely to find a spurious relation by chance • Significance tests try to address this … but with 45,000 predictors, we are likely to find significant effects by chance (5% Type I error rate at α=.05)
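The many-predictors problem can be illustrated with pure noise. In this sketch both the outcome and every predictor are random, so any "significant" relation is spurious by construction. The sizes (15 observations, 1,000 predictors) are assumptions chosen to keep the run fast:

```r
set.seed(5)
n_obs <- 15           # e.g., a small number of recessions
n_predictors <- 1000  # far fewer than 45,000, to keep this fast

outcome    <- rnorm(n_obs)
predictors <- matrix(rnorm(n_obs * n_predictors), nrow = n_obs)  # all noise

# p-value and correlation of each noise predictor with the outcome
p_values <- apply(predictors, 2, function(x) cor.test(x, outcome)$p.value)
r_values <- as.vector(cor(predictors, outcome))

prop_significant <- mean(p_values < .05)
print(prop_significant)     # proportion "significant" by chance alone
print(max(abs(r_values)))   # the "best" noise predictor
```

Roughly 5% of the noise predictors should come out significant, and the strongest one can show a large correlation even though nothing real is going on.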
  • 51. Shrinkage—Examples • Adak Island, Alaska • Daily temperature here predicts stock market activity! • r = -.87 correlation with the price of a specific group of stocks! • Completely true—I’m not making this up! • Problem with this: • With thousands of weather stations & stocks, easy to find a strong correlation somewhere, even if it’s just sampling error • Problem is that this factoid doesn’t reveal all of the other (non-significant) weather stations & stocks we searched through • Would only be impressive if this hypothesis continued to be true on a new set of weather data & stock prices (Vul et al., 2009)
  • 52. Shrinkage—Examples • “Puzzlingly high correlations” in some fMRI work • Correlate each voxel in a brain scan with a behavioral measure (e.g., personality survey) • Restrict the analysis to voxels where the correlation is above some threshold • Compute final correlation in this region with behavioral measure—very high! • Problem: Voxels were already chosen based on those high correlations • Includes sampling error favoring the correlation but excludes error that doesn’t Vul et al., 2009
  • 54. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Adak Island example is implausible in part because there’s no causal reason why an island in Alaska would relate to stock prices “Just as you do not need to know exactly how a car engine works in order to drive safely, you do not need to understand all the intricacies of the economy to accurately read those gauges.” – Economic forecasting firm ECRI (quoted in Silver, 2012)
  • 55. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Not driven purely by the data or by chance if we have an a priori reason to favor this variable “There is really nothing so practical as a good theory.” -- Social psychologist Kurt Lewin (Lewin’s Maxim)
  • 56. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Not driven purely by the data or by chance if we have an a priori reason to favor this variable • Based on some other measure (e.g., another brain scan)
  • 57. Shrinkage—Solutions • One solution: Select model(s) in advance (perhaps even pre-registered) • A theory is valuable for this • Not driven purely by the data or by chance if we have an a priori reason to favor this variable • Based on some other measure (e.g., another brain scan) • Based on research design • For factorial experiments, typical to include all experimental variables and interactions • Research design implies you were interested in all of these
  • 58. Shrinkage—Solutions • For more exploratory analyses: Show that the finding replicates • On a second dataset • Test whether a model obtained from one subset of the data applies to another subset (cross-validation) • e.g., training and test sets • A better version: Do this with many randomly chosen subsets • Monte Carlo methods • Reading on Canvas for some general ways to do this in R
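A minimal sketch of repeated random training/test splits (Monte Carlo cross-validation). The data are simulated with assumed names and effect sizes, and `lm()` stands in for the mixed models used elsewhere in the lecture. This is one simple way to do it, not necessarily the approach in the course reading.

```r
set.seed(6)
n <- 120
exercise <- rnorm(n)
support  <- rnorm(n)
noise    <- matrix(rnorm(n * 10), nrow = n)   # 10 irrelevant predictors
colnames(noise) <- paste0("noise", 1:10)
dat <- data.frame(lifespan = 75 + 2 * exercise + support + rnorm(n, sd = 5),
                  exercise, support, noise)

f_small <- lifespan ~ exercise + support
f_big   <- lifespan ~ .                       # everything, including noise

# Mean squared prediction error of a fitted model on a given data set
mse <- function(model, data) mean((data$lifespan - predict(model, data))^2)

# Fit on a random 70% training set, evaluate on the held-out 30%
one_split <- function() {
  train_rows <- sample(n, size = round(0.7 * n))
  train <- dat[train_rows, ]
  test  <- dat[-train_rows, ]
  fits  <- list(small = lm(f_small, train), big = lm(f_big, train))
  c(train = sapply(fits, mse, data = train),
    test  = sapply(fits, mse, data = test))
}

# Repeat over many random splits and average
results <- t(replicate(200, one_split()))
print(round(colMeans(results), 2))
```

On the training data the bigger model always has the lower error. The held-out error is the fairer comparison, because it is not inflated by chance patterns in the data used to fit the model.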