IS 4800 Empirical Research Methods
for Information Science
Class Notes March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH
hafner@ccs.neu.edu Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
Outline
• Sampling and statistics (cont.)
• T test for paired samples
• T test for independent means
• Analysis of Variance
• Two way analysis of Variance
Relationship Between Population
and Samples When a Treatment
Had No Effect
[Figure: a single population with mean μ; Sample 1 (mean M1) and Sample 2 (mean M2) are both drawn from it.]
Relationship Between Population
and Samples When a Treatment
Had An Effect
[Figure: the control group sample (mean Mc) is drawn from the control group population (mean μc); the treatment group sample (mean Mt) is drawn from a separate treatment group population (mean μt).]
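Population: Mean $\mu$? Variance $\sigma^2$?
Sampling: Sample of size N
Sample mean: $M = \frac{\sum X}{N}$
Sample variance: $SD^2 = \frac{\sum (X - M)^2}{N}$
Mean values from all possible samples of size N, aka “distribution of means”:
$\mu_M = \mu$
$\sigma_M^2 = \frac{\sigma^2}{N}$
$Z_M = \frac{M - \mu}{\sigma_M}$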
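The two properties of the distribution of means can be checked by brute force on a tiny population. The sketch below is only an illustration: the four scores are made up, and "all possible samples" is taken to mean every ordered sample drawn with replacement, which is an assumption not stated on the slide.

```python
from itertools import product
from statistics import fmean, pvariance


def distribution_of_means(population, n):
    """Means of every possible ordered sample of size n, drawn with replacement."""
    return [fmean(sample) for sample in product(population, repeat=n)]


# Hypothetical population of four scores (an assumption for illustration).
population = [2.0, 4.0, 6.0, 8.0]
n = 2

mu = fmean(population)            # population mean
sigma2 = pvariance(population)    # population variance (divides by N)

means = distribution_of_means(population, n)
mu_m = fmean(means)               # mean of the distribution of means
sigma2_m = pvariance(means)       # variance of the distribution of means

print(mu, mu_m)                   # the two means agree
print(sigma2 / n, sigma2_m)       # the variance shrinks by a factor of N
```

With a population this small the enumeration is exact, so no random sampling is needed to see that the mean is preserved and the variance is divided by N.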

Z tests and t-tests
t is like Z:
$Z = \frac{M - \mu}{\sigma_M}$
$t = \frac{M - \mu}{S_M}$ (μ = 0 for paired samples)
We use a stricter criterion (t) instead of Z because $S_M$ is
based on an estimate of the population variance while $\sigma_M$ is
based on a known population variance.
$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$
$S_M^2 = \frac{S^2}{N}$
T-test with paired samples
• Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means
• Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance
• Under the null hypothesis, $\mu = 0$
• $S^2$ (estimate of $\sigma^2$ from the sample) $= SS/df$, with $df = N - 1$
• $S_M^2 = S^2/N$
• $t = M / S_M$
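The paired-samples steps can be sketched in a few lines of Python. The before/after scores and the function name `paired_t` are hypothetical; the formulas are the ones on this slide (S² = SS/df, S²M = S²/N, t = M/SM with μ = 0).

```python
from math import sqrt
from statistics import fmean


def paired_t(before, after):
    """t statistic and df for paired samples, computed on the change scores."""
    changes = [a - b for b, a in zip(before, after)]
    n = len(changes)
    m = fmean(changes)                        # mean change score
    ss = sum((x - m) ** 2 for x in changes)   # sum of squared deviations
    df = n - 1
    s2 = ss / df                              # estimated population variance
    s2_m = s2 / n                             # variance of the distribution of means
    return m / sqrt(s2_m), df                 # mu = 0 under the null hypothesis


# Hypothetical scores for five participants (illustration only).
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.3f}")
```

The resulting t is then compared with the cutoff from a t table for the chosen significance level and number of tails.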
t test for independent samples
• Given two samples
• Estimate population variances (assume same)
• Estimate variances of distributions of means
• Estimate variance of differences between means (mean = 0)
• This is now your comparison distribution
Estimating the Population Variance
$S^2$ is an estimate of $\sigma^2$
$S^2 = SS/(N-1)$ for one sample (take sq root for S)
For two independent samples, the “pooled estimate”:
$S^2_{Pooled} = \frac{df_1}{df_{Total}} S_1^2 + \frac{df_2}{df_{Total}} S_2^2$
$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this calculate the variance of sample means, $S_M^2 = S^2/N$, needed to compute the t statistic:
$S^2_{Difference} = \frac{S^2_{Pooled}}{N_1} + \frac{S^2_{Pooled}}{N_2}$
t test for independent samples, continued
• The distribution of differences between means is your comparison distribution
• It is NOT normal; it is a ‘t’ distribution
• Its shape changes depending on df: $df = (N_1 - 1) + (N_2 - 1)$
• Compute $t = (M_1 - M_2)/S_{Difference}$
• Determine if t is beyond the cutoff score for the test parameters (df, sig, tails) from a lookup table.
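The independent-samples procedure can be written out as below. This is a minimal sketch: the two samples and the function name `independent_t` are made up for illustration, and equal population variances are assumed, as on the slides.

```python
from math import sqrt
from statistics import fmean, variance


def independent_t(sample1, sample2):
    """t statistic and df for two independent samples, using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    # statistics.variance divides by N - 1, i.e. it returns SS / df.
    s2_pooled = (df1 / df_total) * variance(sample1) + (df2 / df_total) * variance(sample2)
    s2_difference = s2_pooled / n1 + s2_pooled / n2
    t = (fmean(sample1) - fmean(sample2)) / sqrt(s2_difference)
    return t, df_total


# Hypothetical scores (illustration only).
group1 = [6, 8, 10]
group2 = [1, 3, 5, 7]
t, df = independent_t(group1, group2)
print(f"t({df}) = {t:.3f}")
```

Weighting each sample variance by its share of the total df means the larger sample contributes more to the pooled estimate.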
ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
– There are more than 2 levels of IV, so:
– (M1 – M2) / Sm won’t work
ANOVA Assumptions
• Populations are normal
• Populations have equal variances
• More or less..
Basic Logic of ANOVA
• Null hypothesis
– Means of all groups are equal.
• Test: do the means differ more than expected
given the null hypothesis?
• Terminology
– Group = Condition = Cell
Accompanying Statistics
• Experimental
– Between-subjects
• Single factor, N-level (for N>2)
– One-way Analysis of Variance (ANOVA)
• Two factor, two-level (or more!)
– Factorial Analysis of Variance
– AKA N-way Analysis of Variance (for N IVs)
– AKA N-factor ANOVA
– Within-subjects
• Repeated-measures ANOVA (not discussed)
– AKA within-subjects ANOVA
ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more
than two groups in an experiment
– The F-ratio is the statistic computed in an Analysis of
Variance and is compared to critical values of F
– The analysis of variance may be used with unequal sample
sizes (weighted or unweighted means analysis)
– When there are just 2 groups, ANOVA is equivalent to the t
test for independent means
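The equivalence in the two-group case can be checked numerically: the F ratio equals the square of the pooled-variance t. The data and both function names below are hypothetical, and the sketch only illustrates the relationship rather than proving it.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(g1, g2):
    """Independent-samples t with the pooled variance estimate."""
    df1, df2 = len(g1) - 1, len(g2) - 1
    s2_pooled = (df1 * variance(g1) + df2 * variance(g2)) / (df1 + df2)
    s2_difference = s2_pooled / len(g1) + s2_pooled / len(g2)
    return (fmean(g1) - fmean(g2)) / sqrt(s2_difference)


def one_way_f(groups):
    """F ratio: between-group variance estimate over within-group estimate."""
    scores = [x for g in groups for x in g]
    grand_mean = fmean(scores)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for two groups (illustration only).
g1 = [6, 8, 10]
g2 = [1, 3, 5, 7]
t = pooled_t(g1, g2)
f = one_way_f([g1, g2])
print(t ** 2, f)  # the two values agree
```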
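One-Way ANOVA – Assuming
Null Hypothesis is True…
Within-Group Estimate Of Population Variance: the per-group estimates $\sigma^2_{est\,1}$, $\sigma^2_{est\,2}$, $\sigma^2_{est\,3}$ give $\sigma^2_{est\,within}$
Between-Group Estimate Of Population Variance: the group means $M_1$, $M_2$, $M_3$ give $\sigma^2_{est\,between}$
$F = \frac{\sigma^2_{est\,between}}{\sigma^2_{est\,within}}$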
Justification for F statistic
Calculating F
Example
Example
Using the F Statistic
• Use a table for F(BDF, WDF)
– And also α
BDF = between-groups degrees of freedom =
number of groups -1
WDF = within-groups degrees of freedom =
Σ df for all groups = N – number of groups
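A one-way ANOVA computed by hand shows where BDF and WDF come from. The three groups and the function name `one_way_anova` are hypothetical; the cutoff itself would still come from an F table for (BDF, WDF) and the chosen α.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares, and F."""
    scores = [x for g in groups for x in g]
    grand_mean = fmean(scores)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    bdf = len(groups) - 1                 # number of groups - 1
    wdf = len(scores) - len(groups)       # N - number of groups
    ms_between = ss_between / bdf
    ms_within = ss_within / wdf
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "bdf": bdf, "wdf": wdf,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical scores for three groups (illustration only).
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({result['bdf']},{result['wdf']}) = {result['f']:.2f}")
```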
One-way ANOVA in SPSS
Data
[Figure: chart of mean Performance (y-axis, 0 to 6) for the 1 Day, 2 Day, and 3 Day groups.]
Analyze/Compare Means/One Way ANOVA…
SPSS Results…
ANOVA
Performance
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   24.813           2    12.406        9.442   .001
Within Groups    27.594           21   1.314
Total            52.406           23
F(2,21)=9.442, p<.05
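The entries in the SPSS table are related by simple arithmetic: each mean square is a sum of squares divided by its df, and F is the ratio of the two mean squares. The sketch below redoes that arithmetic from the rounded sums of squares reported above, so the last digit can differ slightly from SPSS, which works from unrounded values.

```python
# Sums of squares and df copied from the SPSS output.
ss_between, df_between = 24.813, 2
ss_within, df_within = 27.594, 21

ms_between = ss_between / df_between   # about 12.41
ms_within = ss_within / df_within      # about 1.314
f = ms_between / ms_within             # about 9.44

print(f"F({df_between},{df_within}) = {f:.3f}")
```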
Factorial Designs
• Two or more nominal independent variables,
each with two or more levels, and a numeric
dependent variable.
• Factorial ANOVA teases apart the contribution
of each variable separately.
• For N IVs, aka “N-way” ANOVA
Factorial Designs
• Adding a second independent variable to a single-
factor design results in a FACTORIAL DESIGN
• Two components can be assessed
– The MAIN EFFECT of each independent variable
• The separate effect of each independent variable
• Analogous to separate experiments involving those variables
– The INTERACTION between independent variables
• When the effect of one independent variable changes over levels of a
second
• Or– when the effect of one variable depends on the level of the other
variable.
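Main effects and the interaction can be read off a 2x2 table of cell means. In the sketch below the four means are invented, and the factor names borrow the sign and gender labels from the example that follows; the point is only how the three quantities are formed.

```python
from statistics import fmean

# Hypothetical mean satisfaction per cell: (sign condition, gender) -> mean.
cell_means = {
    ("No Sign", "F"): 4.0,
    ("Sign", "F"): 8.0,
    ("No Sign", "M"): 6.0,
    ("Sign", "M"): 6.0,
}


def level_mean(cells, position, level):
    """Average of the cell means at one level of one factor."""
    return fmean([v for key, v in cells.items() if key[position] == level])


# Main effect of each factor: difference between its two level means.
main_sign = level_mean(cell_means, 0, "Sign") - level_mean(cell_means, 0, "No Sign")
main_gender = level_mean(cell_means, 1, "M") - level_mean(cell_means, 1, "F")

# Interaction: does the effect of the sign differ between the two genders?
sign_effect_f = cell_means[("Sign", "F")] - cell_means[("No Sign", "F")]
sign_effect_m = cell_means[("Sign", "M")] - cell_means[("No Sign", "M")]
interaction = sign_effect_f - sign_effect_m

print(main_sign, main_gender, interaction)
```

In these made-up numbers the sign helps one gender and not the other, so the interaction contrast is nonzero even though gender has no main effect. Whether any of these differences is significant is what the factorial ANOVA tests.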
Example
Wait Time Sign in Student Center vs. No Sign
Satisfaction
Example of An Interaction - Student Center Sign – 2 Genders x 2 Sign Conditions
[Figure: interaction plot; x-axis: Level of Independent Variable A (Level 1, Level 2); y-axis: Value of the Dependent Variable (0 to 12); lines labeled F and M; conditions labeled No Sign and Sign.]
Two-way ANOVA in SPSS
Analyze/General Linear Model/Univariate
Results
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507(a)                 5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23
a. R Squared = .506 (Adjusted R Squared = .369)
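The F values and R Squared figures in the two-way output can be re-derived from the sums of squares and df. The sketch below uses the rounded table values, so results match SPSS only to about three decimals. The adjusted R Squared line uses the standard formula 1 - (1 - R²)(N - 1)/df_error, which is not given on the slides.

```python
# Values copied from the "Tests of Between-Subjects Effects" table.
ss_model, ss_error, ss_corrected_total = 26.507, 25.899, 52.406
df_error, df_corrected_total = 18, 23

effects = {
    "TrainingDays": (20.728, 2),
    "Trainer": (0.002, 1),
    "TrainingDays * Trainer": (1.680, 2),
}

ms_error = ss_error / df_error
# F for each effect = (SS / df) / MS_error.
f_values = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

r_squared = ss_model / ss_corrected_total
adjusted_r_squared = 1 - (1 - r_squared) * df_corrected_total / df_error

for name, f in f_values.items():
    print(f"{name}: F = {f:.3f}")
print(f"R Squared = {r_squared:.3f}, Adjusted = {adjusted_r_squared:.3f}")
```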
Results
Degrees of Freedom
• df for between-group variance estimates for main
effects
– Number of levels – 1
• df for between-group variance estimates for
interaction effect
– Total num cells – df for both main effects – 1
– e.g. 2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
– Sum of df for each cell = N – num cells
• Report: “F(bet-group, within-group)=F, Sig.”
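The df rules above can be collected into one small function. The name `two_way_df` and its argument names are hypothetical; the two calls reproduce the 2x2 example from this slide and the 3x2 TrainingDays by Trainer design with N = 24.

```python
def two_way_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects ANOVA."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                        # number of levels - 1
    df_b = levels_b - 1
    df_interaction = cells - (df_a + df_b) - 1
    df_within = n_total - cells                # N - number of cells
    return {"A": df_a, "B": df_b, "A*B": df_interaction, "within": df_within}


print(two_way_df(2, 2, 40))   # 2x2 design; N = 40 is a made-up sample size
print(two_way_df(3, 2, 24))   # 3 TrainingDays levels x 2 Trainers, N = 24
```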
Publication format
N=24, 2x3=6 cells => df TrainingDays=2,
df within-group variance=24-6=18
=> F(2,18)=7.20, p<.05
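A small helper can produce the report string in the format shown here. The function name and the choice to print "n.s." for non-significant results are assumptions for illustration, not a prescribed publication style.

```python
def format_f_report(df_between, df_within, f_value, p_value, alpha=0.05):
    """Format an F test as 'F(bet-group,within-group)=F, p<alpha' or 'n.s.'."""
    alpha_text = f"{alpha:.2f}".lstrip("0")          # 0.05 -> ".05"
    outcome = f"p<{alpha_text}" if p_value < alpha else "n.s."
    return f"F({df_between},{df_within})={f_value:.2f}, {outcome}"


# TrainingDays row from the SPSS table: df = 2, error df = 18, F = 7.203, Sig. = .005
print(format_f_report(2, 18, 7.203, 0.005))
```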
Reporting rule
• IF you have a significant interaction
• THEN
– If 2x2 study: do not report main effects, even if
significant
– Else: must look at patterns of means in cells to
determine whether to report main effects or not.
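The reporting rule can be expressed as a short decision function. This is a sketch under stated assumptions: the function name, the generic labels "A", "B", and "A*B", and the second return value (main effects that still need a look at the cell means) are all illustrative choices.

```python
def effects_to_report(sig_a, sig_b, sig_interaction, is_2x2, alpha=0.05):
    """Return (effects to report, main effects that need a look at the cell means)."""
    significant_main = [name for name, p in (("A", sig_a), ("B", sig_b)) if p < alpha]
    if sig_interaction >= alpha:
        # No significant interaction: report whichever main effects are significant.
        return significant_main, []
    if is_2x2:
        # 2x2 study: report the interaction only, even if main effects are significant.
        return ["A*B"], []
    # Larger design: the pattern of cell means decides whether main effects are reported.
    return ["A*B"], significant_main


print(effects_to_report(0.34, 0.12, 0.41, is_2x2=False))  # nothing significant
print(effects_to_report(0.34, 0.12, 0.02, is_2x2=True))   # interaction only
print(effects_to_report(0.04, 0.02, 0.41, is_2x2=False))  # both main effects
```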
Results?
Source                   Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.41
n.s.
Results?
Source                   Sig.
TrainingDays             0.34
Trainer                  0.12
TrainingDays * Trainer   0.02
Significant interaction between TrainingDays
and Trainer, F(2,18)=.584, p<.05
Results?
Source                   Sig.
TrainingDays             0.34
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effect of Trainer, F(1,18)=.001, p<.05
Results?
Source                   Sig.
TrainingDays             0.04
Trainer                  0.12
TrainingDays * Trainer   0.01
Significant interaction between TrainingDays
and Trainer, F(2,18)=.584, p<.05
Do not report TrainingDays as significant
Results?
Source                   Sig.
TrainingDays             0.04
Trainer                  0.02
TrainingDays * Trainer   0.41
Main effects for both TrainingDays,
F(2,18)=7.20, p<.05, and Trainer,
F(1,18)=.001, p<.05
“Factorial Design”
• Not all cells in your design need to be tested
– But if they are, it is a “full factorial design”, and you
do a “full factorial ANOVA”
        Real-Time   Retrospective
Agent   ✓           ✓
Text    ✓           X
Higher-Order Factorial Designs
• More than two independent variables are included in a
higher-order factorial design
– As factors are added, the complexity of the experimental
design increases
• The number of possible main effects and interactions increases
• The number of subjects required increases
• The volume of materials and amount of time needed to complete the
experiment increases
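How quickly the design grows can be counted directly. In the sketch below the function names are hypothetical; the per-cell sample size of 4 is inferred from the course example (N = 24 in 6 cells) and assumes equal cell sizes in a between-subjects design.

```python
from math import comb, prod


def count_effects(num_factors):
    """Number of main effects, and number of interactions of each order."""
    interactions = {order: comb(num_factors, order)
                    for order in range(2, num_factors + 1)}
    return num_factors, interactions


def cells_in_design(levels):
    """Number of cells in a full factorial design with the given levels per factor."""
    return prod(levels)


for k in (2, 3, 4):
    main, interactions = count_effects(k)
    print(k, "factors:", main, "main effects,", sum(interactions.values()), "interactions")

per_cell = 4  # assumed number of subjects per cell
print(cells_in_design([3, 2]) * per_cell)      # two factors
print(cells_in_design([3, 2, 2]) * per_cell)   # adding a third two-level factor
```

With k factors there are 2^k - 1 testable effects in total, so each added factor roughly doubles the number of effects to interpret.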
