Statistical Tests in SPSS
Statistics:
 The science of collecting, organizing, analyzing, interpreting and presenting data.
 Numbers used to describe data or relationships.
 A field of study that develops critical thinking and analytical skills.
Statistical Analysis
Types of Statistical Analysis:
Descriptive Statistics:
 Organizing and summarizing data using numbers and graphs.
 Data summary: bar graphs, histograms, pie charts, shape of the distribution and skewness.
 Measures of Central Tendency: Mean, Median, Mode
 Measures of variability: Range, Variance and Standard Deviation
 Ex: What is the average age of the sample?
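The descriptive summaries listed above can be sketched in a few lines of Python; the ages below are hypothetical data invented for illustration, not taken from the slides.

```python
# Descriptive statistics with Python's standard library (hypothetical data).
import statistics

ages = [29, 34, 31, 40, 38, 27, 45, 33, 36, 30]  # hypothetical sample

mean_age = statistics.mean(ages)            # measure of central tendency
median_age = statistics.median(ages)
mode_value = statistics.mode([1, 2, 2, 3])  # mode needs repeated values
age_range = max(ages) - min(ages)           # measure of variability
variance = statistics.variance(ages)        # sample variance
stdev = statistics.stdev(ages)              # sample standard deviation
print(mean_age, median_age, age_range)
```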
Inferential Statistics:
 Using sample data to make inferences or draw conclusions about the population.
 Uses probability to determine how confident we can be that our conclusions are correct (confidence intervals and margins of error).
 t-Test, Chi-square, and ANOVA (analysis of variance)
 Ex: Is the average age of population different from 35?
Level of Measurement
Types of Level of Measurement:
Qualitative or Categorical:
 Nominal: Two or more categories that lack any order. Ex: gender (1 = male, 2 = female), football jersey numbers.
 Ordinal: Two or more categories that have an order, but the differences between them are not meaningful. Ex: small, medium and large farmers; place finished in a race (1st, 2nd, 3rd, and so on).
 Analyzed with non-parametric statistical methods.
Quantitative or Continuous:
 Interval: Quantitative variables measured on a scale with equal intervals but no true zero point, so ratios are not meaningful. Ex: temperature in Celsius.
 Ratio: Quantitative variables with a true zero point, where ratios are meaningful. Ex: yield in quintals, germination, etc.
 Analyzed with parametric statistical methods.
Variable Analysis
Types of Variable Analysis:
Univariate Analysis:
 Uses a single variable to describe the data.
 Its purpose is descriptive rather than explanatory.
 Frequencies, Histograms, Means, Medians, Modes, Ranges, Percentiles, Scatterplots, Confidence Intervals.
 Ex: Gender
Bivariate Analysis:
 It examines the relationship between, or the effects of, two variables on each other.
 Crosstabs, Chi-square, ANOVAs, t-Tests, Correlations.
 Ex: Gender and GPA value
Multivariate Analysis:
 It tests three or more variables together to examine their joint effects.
 OLS regression, binary logistic regression, and other types of regression.
 Ex: Gender, GPA value and prejudice.
Statistical Tests
Types of Statistical test:
Parametric test:
 Specific assumptions are made about the population parameters.
 More powerful than non-parametric tests when its assumptions hold.
 The test statistic is based on an assumed distribution.
 Central measure: mean.
 Allows more conclusions to be drawn.
 One-way ANOVA, independent-samples t-test, paired-samples t-test, Pearson correlation, etc.
 Ex: Length and weight of the human body.
 Conditions to be satisfied:
Data must be interval/ratio.
Subjects should be randomly selected.
Data should be normally distributed.
Variation in the results should be roughly the same (homogeneity of variances).
Statistical Tests
Types of Statistical test:
Non-Parametric test:
 Distribution-free tests: they do not assume that the data follow a specific distribution.
 Less powerful than parametric tests.
 The test statistic is often based on ranks rather than a theoretical distribution.
 Central measure: median.
 Simple to apply and not unduly affected by outliers.
 Chi-square, Friedman test, Kruskal-Wallis test, sign test, etc.
 Ex: Did fungal growth take place or not?
 Conditions to be satisfied:
Data must be nominal/ordinal.
Non-normal distribution of data.
Statistical Tests
Parametric vs. Non-Parametric Tests:
Study type | Parametric test | Non-parametric test
Compare means between two distinct/independent groups | Two-sample t-test | Mann-Whitney U test
Compare two quantitative measurements taken from the same individual | Paired t-test | Wilcoxon signed-rank test
Compare means between three or more distinct/independent groups | Analysis of variance (ANOVA) | Kruskal-Wallis test
Repeated measures, more than two conditions | One-way repeated-measures ANOVA | Friedman's test
Estimate the degree of association between two quantitative variables | Pearson correlation coefficient | Spearman's rank correlation
Choosing the Right Statistical Test
Types of Statistical Test:
Regression tests (test cause-and-effect relationships):
Test | Predictor variable | Outcome variable | Research question example
Simple linear regression | Continuous; one predictor | Continuous; one outcome | What is the effect of income on longevity?
Multiple linear regression | Continuous; two or more predictors | Continuous; one outcome | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | Continuous | Binary | What is the effect of drug dosage on the survival of test subjects?
Choosing the Right Statistical Test
Types of Statistical Test:
Comparison tests (differences among group means):
Test | Predictor variable | Outcome variable | Research question example
Paired t-test | Categorical; one predictor | Quantitative; groups come from the same population | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | Categorical; one predictor | Quantitative; groups come from different populations | What is the difference in average exam scores for students from two different schools?
ANOVA | Categorical; one or more predictors | Quantitative; one outcome | What is the difference in average pain levels among post-surgical patients given three different painkillers?
Choosing the Right Statistical Test
Types of Statistical Test:
Correlation tests (check whether two variables are related):
Test | Predictor variable | Outcome variable | Research question example
Pearson | Continuous | Continuous | How are latitude and temperature related?
Chi-square | Categorical | Categorical | How is membership in a sports team related to membership in the drama club among high school students?
Choosing the Right Statistical Test
Types of Statistical Test:
Non-parametric tests:
Test | Predictor variable | Outcome variable | Use in place of
Spearman | Ordinal | Ordinal | Regression and correlation tests
Sign test | Categorical | Quantitative | t-test
Kruskal-Wallis | Categorical; 3 or more groups | Quantitative | ANOVA
Wilcoxon rank-sum test | Categorical; 2 groups | Quantitative; groups come from different populations | Independent t-test
Wilcoxon signed-rank test | Categorical; 2 groups | Quantitative; groups come from the same population | Paired t-test
One sample t-test
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 The data are independent.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed.
 Samples are typically small, often fewer than 30 observations.
Example:
 A researcher is planning a psychological intervention study to test depression index, where anyone who achieves a
score of 4.0 is deemed to have 'normal' levels of depression.
 Lower scores indicate less depression and higher scores indicate greater depression.
 He has recruited 40 participants to take part in the study.
 Depression scores are recorded in the variable dep_score.
 He wants to know whether his sample is representative of the normal population.
One sample t-test
Procedure:
1. Click Analyze > Compare Means > One-Sample T Test... on the main menu.
2. You will be presented with the One-Sample T Test dialogue box.
One sample t-test
3. Transfer the dependent variable, dep_score, into the Test Variable(s): box. Enter the population mean you are comparing the sample against in the Test Value: box by changing the current value of "0" to "4".
4. Click on the Options button. You will be presented with the One-Sample T Test: Options dialogue box, as shown below:
5. Click on the Continue button. You will be returned to the One-Sample T Test dialogue box.
6. Click on the OK button to generate the output.
One sample t-test
Result and Interpretation:
 In this example, p = .022 (< .05); therefore, it can be concluded that the sample mean is statistically significantly different from the test value.
 If p > .05, the difference between the sample-estimated population mean and the comparison population mean would not be statistically significant.
 The depression score was statistically significantly lower than the normal population depression score, t(39) = -2.381, p = .022.
 Since there was a statistically significant difference between the means (p < .05), we reject the null hypothesis and accept the alternative hypothesis.
Independent sample t-test
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variable should consist of two categorical, independent groups.
 The data are independent.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed.
 There needs to be homogeneity of variances.
Example:
 A researcher decided to investigate whether an exercise or weight loss intervention is more effective in lowering
cholesterol levels.
 To this end, the researcher recruited a random sample of inactive males that were classified as overweight.
 This sample was then randomly split into two groups:
 Group 1 underwent a calorie-controlled diet and Group 2 undertook the exercise-training programme.
 In order to determine which treatment programme was more effective, the mean cholesterol concentrations were
compared between the two groups at the end of the treatment programmes.
Independent sample t-test
Procedure:
1. Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, as shown below:
2. You will be presented with the Independent-Samples T Test dialogue box, as shown below:
Independent sample t-test
3. Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer the independent variable, Treatment,
into the Grouping Variable: box.
4. You then need to define the groups (treatments). Click on the Define Groups button. You will be presented with the Define Groups dialogue box.
5. Enter "1" into the Group 1: box and "2" into the Group 2: box. Remember that we labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2.
6. Click the Continue button.
7. Click the OK button.
Independent sample t-test
Result and Interpretation:
 You can see that the group means are statistically significantly different because the value in the "Sig. (2-tailed)" row is less
than 0.05.
 Looking at the Group Statistics table, we can see that those people who undertook the exercise trial had lower cholesterol
levels at the end of the programme than those who underwent a calorie-controlled diet.
 This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol
concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training programme compared to after a calorie-controlled diet
(6.15 ± 0.52 mmol/L), t(38)=2.428, p=0.020.
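A minimal sketch of this comparison with `scipy.stats.ttest_ind`; the cholesterol values (mmol/L) below are invented stand-ins for the study's data, and Levene's test is included to check the homogeneity-of-variances assumption.

```python
# Independent-samples t-test: diet vs exercise (hypothetical data, mmol/L).
from scipy import stats

diet     = [6.4, 5.9, 6.6, 6.1, 5.8, 6.5, 6.2, 6.0, 6.7, 5.9]
exercise = [5.6, 5.9, 5.5, 6.1, 5.7, 5.4, 6.0, 5.8, 5.6, 5.9]

# Levene's test checks the homogeneity-of-variances assumption;
# equal_var=True mirrors SPSS's "equal variances assumed" row.
lev_stat, lev_p = stats.levene(diet, exercise)
t_stat, p_value = stats.ttest_ind(diet, exercise, equal_var=True)
print(round(t_stat, 3), round(p_value, 3))
```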
Paired t-test
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variable should consist of two categorical, related groups or matched pairs.
 There should be no significant outliers.
 The distribution of the differences in the dependent variable between the two related groups should be approximately
normally distributed.
Example:
 A group of Sports Science students (n = 20) are selected from the population to investigate whether a 12-week
plyometric-training programme improves their standing long jump performance.
 In order to test whether this training improves performance, the students are tested for their long jump
performance before they undertake a plyometric-training programme and then again at the end of the programme
(i.e., the dependent variable is "standing long jump performance", and the two related groups are the standing long
jump values "before" and "after" the 12-week plyometric-training programme).
Paired t-test
Procedure:
1. Click Analyze > Compare Means > Paired Samples T Test... on the top menu, as shown below:
2. You will be presented with the Paired-Samples T Test dialogue box, as shown below:
Paired t-test
3. Transfer the variables JUMP1 and JUMP2 into the Paired Variables: box.
4. Click the Continue Button.
5. Click the Ok Button.
Paired t-test
Result and Interpretation:
 We can conclude that there was a statistically significant improvement in jump distance following the plyometric-
training programme from 2.48 ± 0.16 m to 2.52 ± 0.16 m (p < 0.0005); an improvement of 0.03 ± 0.03 m.
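The before/after design can be sketched with `scipy.stats.ttest_rel`. The jump distances below are simulated to resemble the example (n = 20, mean around 2.48 m before training, a small mean improvement after); they are assumptions, not the study's data.

```python
# Paired-samples t-test on simulated before/after jump distances (m).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
jump1 = rng.normal(2.48, 0.16, size=20)          # before training
jump2 = jump1 + rng.normal(0.04, 0.03, size=20)  # after training

t_stat, p_value = stats.ttest_rel(jump2, jump1)
print(round(t_stat, 3), round(p_value, 4))
```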
One way ANOVA
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variable should consist of two or more categorical, independent groups.
 Independence of observations.
 There needs to be homogeneity of variances.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed for each category of the independent variable.
Example:
 A manager employs an external agency which provides training in the spreadsheet program for his employees.
 They offer 3 courses: a beginner, intermediate and advanced course.
 He is unsure which course is needed for the type of work they do at his company, so he sends 10 employees on the
beginner course, 10 on the intermediate and 10 on the advanced course.
 When they all return from the training, he gives them a problem to solve using the spreadsheet program, and times
how long it takes them to complete the problem.
 He then compares the three courses (beginner, intermediate, advanced) to see if there are any differences in the
average time it took to complete the problem.
One way ANOVA
Procedure:
1. Click Analyze > Compare Means > One-Way ANOVA... on the top menu.
2. You will be presented with the One-Way ANOVA dialogue box.
One way ANOVA
3. Transfer the dependent variable, Time, into the Dependent List: box, and transfer the independent variable, Course, into the Factor: box.
4. Click on the Post Hoc button. Tick the Tukey checkbox.
5. Click the Continue button.
6. Click on the Options button. Tick the Descriptive checkbox in the –Statistics– area.
7. Click the Continue button.
8. Click the OK button.
One way ANOVA
Result and Interpretation:
 We can see from the table that there is a statistically significant difference in time to complete the problem between
the group that took the beginner course and the intermediate course (p = 0.046), as well as between the beginner
course and advanced course (p = 0.034). However, there were no differences between the groups that took the
intermediate and advanced course (p = 0.989).
 There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p =
.021). A Tukey post hoc test revealed that the time to complete the problem was statistically significantly lower after
taking the intermediate (23.6 ± 3.3 min, p = .046) and advanced (23.4 ± 3.2 min, p = .034) course compared to the
beginners course (27.2 ± 3.0 min). There was no statistically significant difference between the intermediate and
advanced groups (p = .989).
 We can see that the significance value is 0.021 (i.e., p = .021), which is below 0.05; therefore, there is a statistically significant difference in the mean time taken to complete the spreadsheet problem between the different courses.
Two way ANOVA
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variables should each consist of two or more categorical, independent groups.
 Independence of observations.
 There needs to be homogeneity of variances.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed for each combination of the two independent variables.
Example:
 A researcher was interested in whether an individual's interest in politics was influenced by their level of education
and gender.
 They recruited a random sample of participants to their study and asked them about their interest in politics, which
they scored from 0 to 100, with higher scores indicating a greater interest in politics.
 The researcher then divided the participants by gender (Male/Female) and then again by level of education
(School/College/University).
 Therefore, the dependent variable was "interest in politics", and the two independent variables were "gender" and
"education".
Two way ANOVA
Procedure:
1. For Gender, code "males" as 1 and "females" as 2, and for Educational Level, code "school" as 1, "college" as 2 and
"university" as 3.
2. Click Analyze > General Linear Model > Univariate on the top menu, as shown below:
3. You will be presented with the Univariate dialogue box, as shown below:
Two way ANOVA
4. Transfer the dependent variable, Int_Politics, into the Dependent Variable: box, and transfer both independent variables,
Gender and Edu_Level, into the Fixed Factor(s): box
5. Click on the Plots button. You will be presented with the Univariate: Profile Plots dialogue box.
6. Transfer the independent variable, Edu_Level, from the Factors: box into the Horizontal Axis: box, and transfer the other independent variable, Gender, into the Separate Lines: box.
Two way ANOVA
7. Click on the Add Button. You will see that "Edu_Level*Gender" has been added to the Plots: box, as shown below:
8. Click on the Continue button. This will return you to the Univariate dialogue box.
9. Click on the Post Hoc button. You will be presented with the Univariate: Post Hoc Multiple Comparisons for Observed Means dialogue box. Transfer Edu_Level from the Factor(s): box to the Post Hoc Tests for: box and select Tukey.
10. Click on the Continue button to return to the Univariate dialogue box.
11. Click on the Options button. This will present you with the Univariate: Options dialogue box. Transfer Gender, Edu_Level and Gender*Edu_Level from the Factor(s) and Factor Interactions: box into the Display Means for: box. In the –Display– area, tick the Descriptive Statistics option. Then click Continue and OK.
Two way ANOVA
Result and Interpretation:
 You can see from this graph that the lines do not appear to be parallel (with the lines actually crossing). You might
expect there to be a statistically significant interaction.
 We have a statistically significant interaction at the p = .014 level.
 We can see from the table above that there was no statistically significant difference in mean interest in politics
between males and females (p = .207), but there were statistically significant differences between educational levels
(p < .0005).
Two way ANOVA
Result and Interpretation:
 We can see that there is a statistically significant difference between all three different educational levels (p < .0005).
 A two-way ANOVA was conducted that examined the effect of gender and education level on interest in politics. There
was a statistically significant interaction between the effects of gender and education level on interest in politics, F (2, 54)
= 4.643, p = .014.
 Simple main effects analysis showed that males were significantly more interested in politics than females when
educated to university level (p = .002), but there were no differences between gender when educated to school (p =
.465) or college level (p = .793).
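For a balanced design, the two-way ANOVA sums of squares can be computed by hand, which makes the F-test for the interaction transparent. The sketch below uses invented interest-in-politics scores (5 per gender-by-education cell); only the interaction test is shown.

```python
# Balanced two-way ANOVA by hand: gender (2) x education (3), n = 5 per cell.
import numpy as np
from scipy import stats

# y[i, j, k]: gender i, education j, replicate k (hypothetical scores)
y = np.array([
    [[38, 42, 35, 40, 37], [55, 52, 58, 50, 54], [72, 75, 70, 78, 74]],  # male
    [[40, 36, 41, 39, 38], [53, 57, 51, 55, 56], [60, 58, 63, 59, 61]],  # female
], dtype=float)
a, b, n = y.shape

grand = y.mean()
row = y.mean(axis=(1, 2))   # gender means
col = y.mean(axis=(0, 2))   # education means
cell = y.mean(axis=2)       # cell means

ss_a = b * n * ((row - grand) ** 2).sum()
ss_b = a * n * ((col - grand) ** 2).sum()
ss_ab = n * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
ss_e = ((y - cell[:, :, None]) ** 2).sum()

df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)
f_ab = (ss_ab / df_ab) / (ss_e / df_e)
p_ab = stats.f.sf(f_ab, df_ab, df_e)
print(f"interaction: F({df_ab}, {df_e}) = {f_ab:.3f}, p = {p_ab:.4f}")
```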
Pearson’s Correlation
Assumptions:
 Two variables should be measured at the interval or ratio level (continuous)
 There is a linear relationship between your two variables.
 There should be no significant outliers.
 Variables should be approximately normally distributed.
Example:
 A researcher wants to know whether a person's height is related to how well they perform in a long jump.
 The researcher recruited untrained individuals from the general population, measured their height and had them
perform a long jump.
 The researcher then investigated whether there was an association between height and long jump performance by
running a Pearson's correlation.
Range of correlation coefficient ‘r’:
 -1.0 to -0.7: strong negative association
 -0.7 to -0.3: moderate negative association
 -0.3 to +0.3: little or no association
 +0.3 to +0.7: moderate positive association
 +0.7 to +1.0: strong positive association
Pearson’s Correlation
Procedure:
1. Click Analyze > Correlate > Bivariate on the main menu
2. You will be presented with the Bivariate Correlations dialogue box.
Pearson’s Correlation
3. Transfer the variables Height and Jump_Dist into the Variables: box.
4. Make sure that the Pearson checkbox is selected under the –Correlation Coefficients– area
5. Click on the Options button and you will be presented with the Bivariate Correlations: Options dialogue box. If you wish to generate descriptive statistics, you can do so here by ticking the relevant checkbox in the –Statistics– area.
6. Click on Continue and then the OK button to generate the output.
Pearson’s Correlation
Result and Interpretation:
 We can see that the Pearson correlation coefficient, r, is 0.706, and that it is statistically significant (p = 0.005).
 A Pearson product-moment correlation was run to determine the relationship between height and distance jumped in
a long jump. There was a strong, positive correlation between height and distance jumped, which was statistically
significant (r = .706, n = 14, p = .005).
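The same correlation can be sketched with `scipy.stats.pearsonr`; the 14 height/jump pairs below are hypothetical values constructed to show a strong positive association, not the study's measurements.

```python
# Pearson correlation on hypothetical height (cm) / jump distance (m) pairs.
from scipy import stats

height = [165, 170, 172, 168, 180, 175, 178, 169, 183, 171, 174, 177, 166, 181]
jump   = [1.9, 2.1, 2.2, 2.0, 2.6, 2.3, 2.4, 2.0, 2.7, 2.2, 2.2, 2.5, 1.8, 2.6]

r, p_value = stats.pearsonr(height, jump)
print(f"r = {r:.3f}, n = {len(height)}, p = {p_value:.3f}")
```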
Linear Regression
Assumptions:
 Two variables should be measured at the interval or ratio level (continuous)
 There is a linear relationship between your two variables.
 There should be independence of observations
 Data needs to show homoscedasticity
 Check that the residuals (errors) of the regression line are approximately normally distributed
Example:
 A salesperson for a large car brand wants to determine whether there is a relationship between an individual's
income and the price they pay for a car.
 As such, the individual's "income" is the independent variable and the "price" they pay for a car is the dependent
variable.
 The salesperson wants to use this information to determine which cars to offer potential customers in new areas
where average income is known.
Linear Regression
Procedure:
1. Click Analyze > Regression > Linear on the main menu
2. You will be presented with the Linear Regression dialogue box.
3. Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the
Dependent: box.
4. Click on the OK button. This will generate the results.
Linear Regression
Result and Interpretation:
 This table indicates that the regression model predicts the dependent variable significantly well.
 The R value represents the simple correlation and is 0.873, which indicates a high degree of correlation.
 This table presents the regression equation as:
Price = 8287 + 0.564(Income)
 The R2 value indicates how much of the total variation in the dependent variable, Price, can be explained by the
independent variable, Income. In this case, 76.2% can be explained, which is very large.
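A simple regression like this can be sketched with `scipy.stats.linregress`. The income/price pairs below are invented and roughly linear; the fitted slope and R2 will therefore differ from the slide's values.

```python
# Simple linear regression on hypothetical income/price pairs.
from scipy import stats

income = [20000, 30000, 40000, 50000, 60000, 75000, 90000, 110000]
price  = [19000, 25500, 31000, 36000, 42500, 50000, 59000, 70500]

res = stats.linregress(income, price)
print(f"price = {res.intercept:.0f} + {res.slope:.3f} * income")
print(f"R^2 = {res.rvalue ** 2:.3f}")
```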
Multiple Regression
Assumptions:
 Dependent variable should be measured at the interval or ratio level (continuous).
 Two or more independent variables, which can be either continuous or categorical.
 There is a linear relationship between the dependent variable and each of the independent variables.
 There should be independence of observations
 Data needs to show homoscedasticity
 Data must not show multicollinearity
 There should be no significant outliers, high leverage points or highly influential points
 Check that the residuals (errors) of the regression line are approximately normally distributed
Example:
 A health researcher wants to be able to predict "VO2max", an indicator of fitness and health
 The researcher's goal is to be able to predict VO2max based on four attributes: age, weight, heart rate and gender.
Multiple Regression
Procedure:
1. Click Analyze > Regression > Linear on the main menu
2. You will be presented with the Linear Regression dialogue box. Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart_rate and gender, into the Independent(s): box.
3. Click on the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box. Select Confidence intervals in the –Regression Coefficients– area, leaving the Level(%): option at "95".
4. Click on Continue and then OK button. This will generate the results.
Multiple Regression
Result and Interpretation:
 This table indicates that the regression model predicts the dependent variable significantly well
 The R value of 0.760 indicates a good level of prediction.
 The R2 value of 0.577 indicates that our independent variables explain 57.7% of the variability of our dependent variable, VO2max.
Multiple Regression
Result and Interpretation:
 This table presents the regression equation as:
VO2max = 87.83 – (0.165 x age) – (0.385 x weight) – (0.118 x heart_rate) + (13.208 x gender)
 A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These variables statistically
significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R2 = .577. All four variables added statistically significantly to
the prediction, p < .05.
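Multiple regression is ordinary least squares with several predictors, which can be sketched directly with NumPy. The data below are simulated (coefficients chosen to resemble the slide's equation, plus noise), so the fitted values are stand-ins, not the study's results.

```python
# OLS by hand with NumPy: VO2max on age, weight, heart rate and gender.
import numpy as np

rng = np.random.default_rng(42)
n = 100
age    = rng.uniform(20, 60, n)
weight = rng.uniform(55, 95, n)
hr     = rng.uniform(55, 85, n)                 # heart rate
gender = rng.integers(0, 2, n).astype(float)    # 1 = male (assumption)
vo2max = (87.8 - 0.17 * age - 0.38 * weight - 0.12 * hr
          + 13.2 * gender + rng.normal(0, 3, n))  # simulated outcome

X = np.column_stack([np.ones(n), age, weight, hr, gender])
coef, *_ = np.linalg.lstsq(X, vo2max, rcond=None)
pred = X @ coef
r2 = 1 - ((vo2max - pred) ** 2).sum() / ((vo2max - vo2max.mean()) ** 2).sum()
print(coef.round(3), round(r2, 3))
```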
Binary logistic Regression
Assumptions:
 Dependent variable should be measured on a dichotomous scale.
 One or more independent variables, which can be either continuous or categorical.
 There is a linear relationship between the continuous independent variables and the logit transformation of the dependent variable.
 There should be independence of observations
Example:
 A health researcher wants to be able to predict heart disease.
 The researcher's goal is to be able to predict heart disease from four attributes: age, weight, VO2max and gender.
 A case number variable, caseno, is included as an additional variable to help identify outliers.
Binary Logistic Regression
Procedure:
1. Click Analyze > Regression > Binary Logistic on the main menu
2. You will be presented with the Logistic Regression dialogue box. Transfer the dependent variable, heart_disease, into the Dependent: box and the independent variables, age, weight, VO2max and gender, into the Covariates: box.
3. Click on the Categorical button. You will be presented with the Logistic Regression: Define Categorical Variables dialogue box. Transfer the categorical independent variable, gender, from the Covariates: box to the Categorical Covariates: box.
Binary Logistic Regression
Procedure:
4. In the –Change Contrast– area, change the Reference Category: from the Last option to the First option. Then, click on the
Change button
5. Click on the Continue button. You will be returned to the Logistic Regression dialogue box.
6. Click on the Options button. You will be presented with the Logistic Regression: Options dialogue box.
In the –Statistics and Plots– area, click the Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals
and CI for exp(B): options, and in the –Display– area, click the At last step option
7. Click on Continue and then OK button. This will generate the results.
Binary Logistic Regression
Result and Interpretation:
 The model explains 33% of the variation in the dependent variable (Nagelkerke R2).
 The cut value is .500. This means that if the probability of a case being classified into the "yes" category is greater than
.500, then that particular case is classified into the "yes" category. Otherwise, the case is classified as in the "no" category.
Binary Logistic Regression
Result and Interpretation:
 You can see that age (p = .003), gender (p = .021) and VO2max (p = .039) added significantly to the model/prediction, but
weight (p = .799) did not add significantly to the model.
 Table shows that the odds of having heart disease ("yes" category) is 7.026 times greater for males as opposed to
females.
 A logistic regression was performed to ascertain the effects of age, weight, gender and VO2max on the likelihood that
participants have heart disease. The logistic regression model was statistically significant, χ2(4) = 27.402, p < .0005. The
model explained 33.0% (Nagelkerke R2) of the variance in heart disease and correctly classified 71.0% of cases. Males
were 7.02 times more likely to exhibit heart disease than females. Increasing age was associated with an increased
likelihood of exhibiting heart disease, but increasing VO2max was associated with a reduction in the likelihood of
exhibiting heart disease.
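Under the hood, logistic regression fits coefficients by maximum likelihood; the sketch below does this directly with SciPy on simulated heart-disease data (the true coefficients and the "1 = male" coding are assumptions, not the researcher's data).

```python
# Logistic regression by maximum likelihood on simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
age    = rng.uniform(30, 70, n)
vo2max = rng.uniform(20, 55, n)
gender = rng.integers(0, 2, n).astype(float)  # 1 = male (assumption)
X = np.column_stack([np.ones(n), age, vo2max, gender])

# Simulate outcomes from assumed true coefficients
true_beta = np.array([-2.0, 0.08, -0.10, 1.9])
p_true = 1 / (1 + np.exp(-X @ true_beta))
disease = (rng.random(n) < p_true).astype(float)

def neg_log_lik(beta):
    z = X @ beta
    # negative log-likelihood of the logistic model, written stably
    return np.sum(np.logaddexp(0, z) - disease * z)

fit = minimize(neg_log_lik, np.zeros(X.shape[1]), method="BFGS")
beta = fit.x
print("odds ratio, male vs female:", round(float(np.exp(beta[3])), 2))
```

Exponentiating a coefficient gives the odds ratio reported by SPSS in the Exp(B) column.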
Chi-Square Test
Assumptions:
 Two variables should be measured at an ordinal or nominal level (categorical data)
 The two variables should each consist of two or more categorical, independent groups.
Example:
 Educators are looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree
course.
 They would like to know whether gender (male/female) is associated with the preferred type of learning medium
(online vs. books).
 We have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).
Chi-Square Test
Procedure:
1. Click Analyze > Descriptives Statistics > Crosstabs... on the top menu
2. You will be presented with the Crosstabs dialogue box. Transfer one of the variables into the Row(s): box and the other variable into the Column(s): box.
3. If you want to display clustered bar charts (recommended), make sure that Display clustered bar charts checkbox is ticked.
Chi-Square Test
4. Click on the Statistics button. You will be presented with the Crosstabs: Statistics dialogue box. Select the Chi-square and Phi and Cramer's V options. Then click Continue.
5. Click on the Cells button. You will be presented with the Crosstabs: Cell Display dialogue box. Select Observed in the –Counts– area, and Row, Column and Total in the –Percentages– area. Then click Continue.
6. Click on the Format button if you wish to change the order in which the table rows are displayed.
7. Once you have made your choices, click Continue and then OK to generate the output.
Chi-Square Test
Result and Interpretation:
 Both males and females prefer to learn using online materials versus books.
 We can see here that χ2(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, Males and Females equally prefer online learning versus books.
 We can see that the strength of association between the variables is very weak (0.078).
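The crosstab analysis can be sketched with `scipy.stats.chi2_contingency`; the 2x2 counts below are hypothetical, and phi is computed from the chi-square statistic as the strength-of-association measure for a 2x2 table.

```python
# Chi-square test of association on a hypothetical 2x2 crosstab of counts.
import numpy as np
from scipy import stats

#                 online  books
table = np.array([[30, 20],   # male
                  [32, 18]])  # female

chi2, p_value, dof, expected = stats.chi2_contingency(table)
phi = np.sqrt(chi2 / table.sum())  # strength of association (2x2 table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p_value:.3f}, phi = {phi:.3f}")
```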
Mann-Whitney U Test
Assumptions:
 Dependent variable should be measured at the ordinal or continuous level.
 Independent variable should consist of two categorical, independent groups.
 It should have independence of observations.
 The Mann-Whitney U test can be used when the dependent variable is not normally distributed.
Example:
 A researcher decided to investigate whether an exercise or weight loss intervention was more effective in lowering
cholesterol levels.
 The researcher recruited a random sample of inactive males that were classified as overweight.
 This sample was then randomly split into two groups: Group 1 underwent a calorie-controlled diet (i.e., the 'diet'
group) and Group 2 undertook an exercise-training programme (i.e., the 'exercise' group).
 In order to determine which treatment programme was more effective, cholesterol concentrations were compared
between the two groups at the end of the treatment programmes.
Mann-Whitney U Test
Procedure:
1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples... on the top menu.
2. You will be presented with the Two-Independent-Samples Tests dialogue box.
Transfer the dependent variable, Cholesterol, into the Test Variable List: box and the independent variable, Group, into the
Grouping Variable: box
Mann-Whitney U Test
3. Click on the Define groups button. The button will not be clickable if you have not highlighted the Grouping Variable.
Enter 1 into the Group 1: box and enter 2 into the Group 2: box. Remember that we labelled the Diet group as 1 and the Exercise
group as 2. Then click continue.
4. If you wish to use this procedure to generate some descriptive statistics, click on the Options button and then tick Descriptive and
Quartiles within the –Statistics– area.
5. Click on the Continue button, which will bring you back to the main dialogue box with the Grouping Variable: box now completed.
6. Click on the OK button. This will generate the output for the Mann-Whitney U test.
Mann-Whitney U Test
Result and Interpretation:
 The descriptive statistics produced for the Mann-Whitney U test are not very useful here, since the data are not normally
distributed.
 The rank table is very useful because it indicates which group can be considered as having the higher cholesterol
concentrations, overall; namely, the group with the highest mean rank. In this case, the diet group had the highest
cholesterol concentrations.
 It can be concluded that cholesterol concentration in the diet group was statistically significantly higher than the exercise
group (U = 110, p = .014).
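The U statistic itself comes from rank sums. A minimal Python sketch, using hypothetical cholesterol values and assuming no tied observations (SPSS uses midranks when ties occur):

```python
def mann_whitney_u(group1, group2):
    """Mann-Whitney U from rank sums (assumes no tied values, for simplicity)."""
    combined = sorted(group1 + group2)
    rank = {v: i + 1 for i, v in enumerate(combined)}  # rank 1 = smallest value
    r1 = sum(rank[v] for v in group1)                  # rank sum of group 1
    n1, n2 = len(group1), len(group2)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)  # SPSS reports the smaller of the two U values

# Hypothetical cholesterol concentrations: diet group vs exercise group
u = mann_whitney_u([6.2, 6.5, 7.1, 6.8], [5.4, 5.9, 6.0, 5.1])
```

A small U relative to n1 × n2 indicates strong separation between the two groups' ranks.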
Sign Test
Assumptions:
 Dependent variable should be measured at the ordinal or continuous level.
 Independent variable should consist of two categorical, "related groups" or "matched pairs".
 The paired observations for each participant need to be independent.
 The difference scores (i.e., differences between the paired observations) are from a continuous distribution.
Example:
 A researcher wants to test a new formula for a sports drink that improves running performance.
 He recruited 20 participants who each performed two trials in which they had to run as far as possible in two hours on a
treadmill.
 In one of the trials they drank the carbohydrate-only drink and in the other trial they drank the carbohydrate-protein
drink.
 The order of the trials was counterbalanced and the distance they ran in both trials was recorded.
Sign Test
Procedure:
1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples... on the main menu.
2. You will be presented with the Two-Related-Samples Tests dialogue box.
3. Transfer the variables carb and carb_protein into the Test Pairs: box by highlighting both variables.
4. Deselect Wilcoxon and select Sign in the –Test Type– area.
5. Click on the OK button to generate the output.
Sign Test
Result and Interpretation:
 You can see how many participants decreased (the "Negative Differences" row), improved (the "Positive Differences"
row) or witnessed no change (the "Ties" row) in their performance in the carbohydrate-protein trial (i.e., carb_protein)
compared to the carbohydrate-only trial (i.e., carb) in Frequencies table.
 The statistical significance (i.e., p-value) of the sign test is found in the "Exact Sig. (2-tailed)" row of the Test Statistics
table. However, if you had more than a total of 25 positive and negative differences, an "Asymp. Sig. (2-sided test)" row would be
displayed instead.
 Twenty participants were recruited to understand the performance benefits of a carbohydrate-protein versus
carbohydrate-only drink on running performance as measured by the distance run in two hours on a treadmill. An exact
sign test was used to compare the differences in distance run in the two trials. The carbohydrate-protein drink elicited a
statistically significant median increase in distance run (0.113 km) compared to the carbohydrate-only drink, p = .004.
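The exact sign test is simply a two-sided binomial test with p = 0.5 applied to the positive and negative differences (ties are dropped). A sketch in Python, with hypothetical counts:

```python
from math import comb

def exact_sign_test(pos, neg):
    """Two-sided exact sign test p-value; ties are excluded beforehand."""
    n = pos + neg
    k = min(pos, neg)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical: 15 participants improved, 3 declined, ties already excluded
p = exact_sign_test(15, 3)
```

The more lopsided the split between improvements and declines, the smaller the p-value.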
Kruskal-Wallis Test
Assumptions:
 Dependent variable should be measured at the ordinal or continuous level.
 Independent variable should consist of two or more categorical, independent groups.
 It should have independence of observations.
Example:
 The researcher identifies 3 well-known anti-depressive drugs which might have the positive side effect of reducing back
pain, and labels them Drug A, Drug B and Drug C.
 The researcher then recruits a group of 60 individuals with a similar level of back pain and randomly assigns them to one
of three groups – Drug A, Drug B or Drug C treatment groups – and prescribes the relevant drug for a 4 week period.
 At the end of the 4 week period, the researcher asks the participants to rate their back pain on a scale of 1 to 10, with
10 indicating the greatest level of pain.
 The researcher wants to compare the levels of pain experienced by the different groups at the end of the drug
treatment period.
Kruskal-Wallis Test
Procedure:
1. Click Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples... on the top menu.
2. You will be presented with the Tests for Several Independent Samples dialogue box.
Transfer the dependent variable, Pain_Score, into the Test Variable List: box and the independent variable,
Drug_Treatment_Group, into the Grouping Variable: box.
Kruskal-Wallis Test
3. Click on the Define range button. You will be presented with the "Several Independent Samples: Define Range" dialogue box.
Enter "1" into the Minimum: box and "3" into the Maximum box. These values represent the range of codes you gave the groups
of the independent variable, Drug_Treatment_Group (i.e., Drug A was coded "1" through to Drug C which was coded "3").
4. Click on the Continue button and you will be returned to the "Tests for Several Independent Samples" dialogue box, but now with
a completed Grouping Variable: box
5. Click on the OK button to generate the output.
Kruskal-Wallis Test
Result and Interpretation:
 The mean rank of the Pain_Score for each drug treatment group can be used to compare the effect of the different drug
treatments. Whether these drug treatment groups have different pain scores can be assessed using the Test Statistics table
which presents the result of the Kruskal-Wallis H test.
 A Kruskal-Wallis H test showed that there was a statistically significant difference in pain score between the different drug
treatments, χ²(2) = 8.520, p = 0.014, with a mean rank pain score of 35.33 for Drug A, 34.83 for Drug B and 21.35 for Drug C.
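The H statistic is computed from the rank sums of the groups. A minimal Python sketch with hypothetical pain scores, assuming no tied values (SPSS additionally applies a tie correction):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from rank sums (assumes no tied values)."""
    combined = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(combined)}
    n_total = len(combined)
    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    return h

# Hypothetical pain scores for Drug A, Drug B and Drug C
h = kruskal_wallis_h([7, 8, 9], [6, 5, 10], [1, 2, 3])
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom, which is why SPSS reports it as χ² with 2 df here.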
Descriptives & Frequencies
A. Descriptives
Procedure:
1. Click Analyze > Descriptive Statistics > Descriptives on the top menu.
2. You will be presented with the Descriptives dialogue box. Transfer the dependent variable, Income, into the Variable(s): box.
3. Click on the Options button. Tick every measure to be displayed. Click Continue and then OK.
Descriptives & Frequencies
Result and Interpretation:
 The Descriptive Statistics table displays all the measures selected.
 Descriptives displays results for continuous variables only.
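The same central-tendency and variability measures can be reproduced with Python's statistics module; the income values below are made up for illustration:

```python
import statistics as st

# Hypothetical income values, standing in for the Income variable
income = [1200, 1500, 1500, 1800, 2100, 2400, 3000]

summary = {
    "mean": st.mean(income),
    "median": st.median(income),
    "mode": st.mode(income),
    "range": max(income) - min(income),
    "variance": st.variance(income),   # sample variance, as SPSS reports
    "std_dev": st.stdev(income),
}
```

Note that `variance` and `stdev` use the n − 1 (sample) denominator, matching the Descriptives output.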
Descriptives & Frequencies
B. Frequencies
Procedure:
1. Click Analyze > Descriptive Statistics > Frequencies on the top menu.
2. You will be presented with the Frequencies dialogue box. Transfer both the independent and dependent variables (e.g., Income)
into the Variable(s): box.
3. Click on the Statistics button. Tick every measure to be displayed and click Continue.
4. Click on the Charts button. Select any chart you want. In the –Chart Values– area, click either Frequencies or Percentages.
5. Click Continue and then OK to generate the output.
Descriptives & Frequencies
Result and Interpretation:
 The Frequencies output displays all the selected measures, along with the chosen chart.
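A frequency table with percentages, similar to the Frequencies output, can be sketched with a Counter; the responses below are hypothetical:

```python
from collections import Counter

# Hypothetical categorical responses, standing in for one SPSS variable
data = ["online", "books", "online", "online", "books", "online"]

counts = Counter(data)
# Map each category to (frequency, percent of total), like the Frequencies table
table = {k: (v, round(100 * v / len(data), 1)) for k, v in counts.items()}
```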
Test of Normality and Outliers
Normality and Outliers:
 Skewness and Kurtosis:
The skewness and kurtosis measures should be as close to zero as possible. Dividing each measure by its standard error
gives a z-score, which should lie between -1.96 and +1.96.
 Kolmogorov-Smirnov Test & Shapiro-Wilk Test:
If the Sig. value of the Shapiro-Wilk test is > 0.05, the data are normal. If it is below 0.05, the data deviate significantly
from a normal distribution.
 Histograms and Box plots:
Look at the graphical figures.
Histogram should have the approximate shape of a normal curve.
In Q-Q Plot, the dots should be approximately distributed along the line.
Box plot should be approximately symmetrical.
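The skewness z-score rule above can be sketched in Python. The data are hypothetical, and the standard-error formula used is the large-sample expression commonly quoted alongside SPSS's adjusted (Fisher-Pearson) skewness:

```python
import math

def skewness_z(data):
    """Skewness z-score: adjusted sample skewness divided by its Std. Error."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    # Adjusted Fisher-Pearson skewness, as reported by SPSS
    g1 = sum(((x - mean) / s) ** 3 for x in data) * n / ((n - 1) * (n - 2))
    # Approximate standard error of skewness
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1 / se

# Hypothetical sample with one large value pulling the tail to the right
z = skewness_z([2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 6.5])
```

|z| ≤ 1.96 is consistent with normality at the 5% level; a perfectly symmetric sample gives z = 0.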
Process Involved:
In statistics, normality tests are used to determine whether a data set is well-modeled by a normal distribution and to
compute how likely it is for a random variable underlying the data set to be normally distributed. It is essential to run a
normality test before proceeding to analysis.
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to
variability in the measurement or it may indicate experimental error.
Test of Normality and Outliers
Procedure:
1. Click Analyze > Descriptive Statistics > Explore on the top menu.
2. You will be presented with the Explore dialogue box. Transfer the dependent variable, Course, into the Dependent List: box.
Test of Normality and Outliers
3. Click on the Statistics button. Select Descriptives and Outliers and click Continue.
4. Click on the Plots button. Select Stem-and-leaf, Histogram and Normality plots with tests. Then click Continue and OK.
Test of Normality and Outliers
Result and Interpretation:
 The z-values, calculated by dividing the skewness and kurtosis measures by their respective Std. Errors, do not lie
between -1.96 and +1.96. Thus the data are not normally distributed.
 The p-value of .004 in the Shapiro-Wilk test is less than 0.05, which also indicates that the data are not normally distributed.
Test of Normality and Outliers
Result and Interpretation:
 Both the box plot and the Q-Q plot show that the data are not normally distributed.
 In the box plot, a circle marks a mild outlier whereas a star marks an extreme outlier, and the number next to each
marker is the case number. Ex: case 9 has a course time of 2.00 hours (consult Fig 1 in the procedure).
Dealing with Outliers
1. Leave it in if it is a legitimate outlier, and use a non-parametric test on the skewed data.
2. Correct data entry errors.
3. Winsorize it to match the highest value in the rest of the data.
4. Throw it out only if necessary; this is better for multivariate analysis.
5. Transform it, for example by taking logarithms. Ex: 2, 3, 4, 5, 6, 10, 100 → 0.30, 0.48, 0.60, 0.70, 0.78, 1, 2
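The log transformation in point 5 can be checked directly (base-10 logs of the example values above):

```python
import math

# Log transform compresses large outliers toward the rest of the data
values = [2, 3, 4, 5, 6, 10, 100]
logged = [round(math.log10(v), 2) for v in values]
```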
Basic way to winsorize data:
Consider the data set consisting of:
{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 101.5)
The data below the 5th percentile lie between −40 and −5, while the data above the 95th percentile lie between 101
and 1053. A 90% winsorization would then result in the following:
{92, 19, 101, 58, 101, 91, 26, 78, 10, 13, −5, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 55.65)
[For large data you can use Frequencies test in SPSS to winsorize data]
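The 90% winsorization above can be reproduced with a small helper. This is a sketch of one common convention: replace the k most extreme values on each tail with the nearest remaining value:

```python
def winsorize(data, fraction=0.05):
    """Replace the k most extreme values on each tail with the nearest
    remaining value, where k = int(len(data) * fraction)."""
    k = int(len(data) * fraction)
    ordered = sorted(data)
    low, high = ordered[k], ordered[-k - 1]  # clamping boundaries
    return [min(max(x, low), high) for x in data]

# The data set from the example above (N = 20, mean = 101.5)
data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40,
        101, 86, 85, 15, 89, 89, 28, -5, 41]
result = winsorize(data)  # 90% winsorization: k = 1 value clamped per tail
```

Here 1053 is clamped to 101 and −40 to −5, reproducing the winsorized mean of 55.65 given above.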
  • 2. Statistics Statistics:  Science dealing with data collection, organization, analysis, interpretation and presentation.  Numbers that are used to describe data or relationship.  Study which develop critical thinking and analytic skills.
  • 5. Statistical Analysis Types of Statistical Analysis: Descriptive Statistics:  Organizing and summarizing data using number and graphs.  Data Summary: Bar graphs, Histograms, Pie-charts, Shape of graph and skewness  Measures of Central Tendency: Mean, Median, Mode  Measures of variability: Range, Variance and Standard Deviation  Ex: What is the average age of the sample? Inferential Statistics:  Using sample data to make an inference or draw a conclusion of the population.  Use probability to determine how confident we can be that the conclusions we make are correct. (Confidence Intervals and Margins of Error)  t-Test, Chi-square, and ANOVA (analysis of variance)  Ex: Is the average age of population different from 35?
  • 8. Level of Measurement Types of Level of Measurement: Qualitative or Categorical:  Nominal: Two or more categories and lack order. Ex: Gender; 1= male, 2= female, Football jersey number.  Ordinal: Two or more categories and have order but differences are not meaningful. Ex: Small, Medium and Large farmers, Place finished in a race: 1st, 2nd, 3rd , and so on.  Non-Parametric Statistical methods. Quantitative or Continuous:  Interval: Quantitative variables having own scale. It lacks ‘0’ and divisions are not meaningful. Ex: temperature in Celsius.  Ratio: Quantitative variables with true zero point and divisions are meaningful. Ex: Yield in quintal, Germination, etc.  Parametric Statistical methods.
  • 11. Variable Analysis Types of Variable Analysis: Univariate Analysis:  It uses only one variable to simply describe what is in a result.  Its purpose is more toward descriptive rather than explanatory.  Frequencies, Histograms, Means, Medians, Modes, Ranges, Percentiles, Scatterplots, Confidence Intervals.  Ex: Gender Bivariate Analysis:  It compares or contrasts the effect of two variables on each other.  Crosstabs, Chi-square, ANOVAs, t-Tests, Correlations.  Ex: Gender and GPA value Multivariate Analysis:  It tests three or more variables together to check for all kinds of effects that occur together.  OLS regression, Binary logistic regression, and other types of regressions.  Gender, GPA value and prejudice
  • 14. Statistical Tests Types of Statistical test: Parametric test:  Specific assumptions are made about the population parameter.  It is powerful if existed.  Test statistics is based on distribution.  Central measure- mean.  Can draw more conclusions.  One way ANOVA, Independent sample t-test, Paired sample t-test, correlation etc.  Ex: Length and weight of human body.  Conditions to be satisfied: Data must be interval/ratio. Subjects should be randomly selected. Data should be normally distributed. Variation in the results should be roughly same.
  • 15. Statistical Tests Types of Statistical test: Non-Parametric test:  Distribution free test: Does not assume that our data follow a specific distribution.  Not powerful like parametric test.  Test statistics is arbitrary.  Central measure: Median.  Simplicity, not affected by outliers.  Chi-square, Friedman test, Kruskal wallis test, Sign test etc.  Ex: Did fungus growth take place or not?  Conditions to be satisfied: Data must be nominal/ordinal. Non normal distribution of data.
  • 16. Statistical Tests Parametric Vs. Non-Parametric test: Study Type Parametric test Non parametric test Compare means between two distinct/independent groups Two sample t-test Mann-whitney test Compare two quantitative measurements taken from the same individual Paired t-test Wilcoxon signed-ranked test Compare means between three or more distinct/independent groups Analysis of variance (ANOVA) Kruskal wallis test Repeated measures, >two conditions One way, repeated measures ANOVA Friedman’s test Estimate the degree of association between two quantitative variables Pearson Coefficient of correlation Spearman’s rank correlation
  • 19. Choosing right statistical test Types of Statistical test: Regression test (test cause and effect relationship): Predictor variable Outcome variable Research question example Simple linear regression • Continuous • One predictor • Continuous • One outcome What is the effect of income on longevity? Multiple linear regression • Continuous • Two or more predictors • Continuous • One outcome What is the effect of income and minutes of exercise per day on longevity? Logistic regression • Continuous • Binary What is the effect of drug dosages on the survival of test subjects?
  • 20. Choosing right statistical test Types of Statistical test: Comparison test (differences among group means): Predictor variable Outcome variable Research question example Paired t-test • Categorical • One predictor • Quantitative • Groups come from the same population What is the effect of two different test prep programs on the average exam scores for students from the same class? Independent t-test • Categorical • One predictor • Quantitative • Groups come from the different population What is the difference in average exam scores for students from two different schools? ANOVA • Categorical • One or more predictor • Quantitative • One outcome What is the difference in average pain levels among post-surgical patients given three different painkillers?
  • 21. Choosing right statistical test Types of Statistical test: Correlation test (Check whether two variables are related ): Predictor variable Outcome variable Research question example Pearson • Continuous • Continuous How are latitude and temperature related? Chi-square • Categorical • Categorical How is membership in a sports team related to membership in drama club among high school students?
  • 22. Choosing right statistical test Types of Statistical test: Non-Parametric test: Predictor variable Outcome variable Use in place of Spearman •Ordinal •Ordinal Regression and correlation tests Sign test •Categorical •Quantitative T-test Kruskal–Wallis •Categorical •3 or more groups •Quantitative ANOVA Wilcoxon Rank- Sum test •Categorical •2 groups •Quantitative •Groups come from different populations Independent t-test Wilcoxon Signed- rank test •Categorical •2 groups •Quantitative •Groups come from the same population Paired t-test
  • 24. One sample t-test Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous).  The data are independent.  There should be no significant outliers.  Dependent variable should be approximately normally distributed.  Sample are small, mostly lower than 30. Example:  A researcher is planning a psychological intervention study to test depression index, where anyone who achieves a score of 4.0 is deemed to have 'normal' levels of depression.  Lower scores indicate less depression and higher scores indicate greater depression.  He has recruited 40 participants to take part in the study.  Depression scores are recorded in the variable dep_score.  He wants to know whether his sample is representative of the normal population. Onesamplet-test
  • 25. One sample t-test Procedure: 1. Click Analyze > Compare Means > One-Sample T Test... on the main menu. 2. It will be presented with the One-Sample T Test dialogue box, as shown below: 1. 2.
  • 26. One sample t-test 3. Transfer the dependent variable, dep_score, into the Test Variable(s). Enter the population mean you are comparing the sample against in the Test Value: box, by changing the current value of "0" to "4". You will end up with the screen. 4. Click on the Options button. You will be presented with the One-Sample T Test: Options dialogue box, as shown below: 5. Click on the Continue button. You will be returned to the One-Sample T Test dialogue box. 6. Click on the OK button to generate the output. 3. 4.
  • 27. One sample t-test Result and Interpretation:  In this example, p < .05 (0.022) therefore, it can be concluded that the population means are statistically significantly different.  If p > .05, the difference between the sample-estimated population mean and the comparison population mean would not be statistically significantly different.  Depression score was statistically significantly lower than the population normal depression score, t(39) = -2.381, p = .022.  There was a statistically significant difference between means (p < .05). Therefore, we can reject the null hypothesis and accept the alternative hypothesis.
  • 28. Independent sample t-test Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous).  Independent variable should consist of two categorical, independent groups.  The data are independent.  There should be no significant outliers.  Dependent variable should be approximately normally distributed.  There needs to be homogeneity of variances. Example:  A researcher decided to investigate whether an exercise or weight loss intervention is more effective in lowering cholesterol levels.  To this end, the researcher recruited a random sample of inactive males that were classified as overweight.  This sample was then randomly split into two groups:  Group 1 underwent a calorie-controlled diet and Group 2 undertook the exercise-training programme.  In order to determine which treatment programme was more effective, the mean cholesterol concentrations were compared between the two groups at the end of the treatment programmes. Independent samplet-test
  • 29. Independent sample t-test Procedure: 1. Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, as shown below: 2. You will be presented with the Independent-Samples T Test dialogue box, as shown below: 1. 2.
  • 30. Independent sample t-test 3. Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer the independent variable, Treatment, into the Grouping Variable: box. 3. 4. 4. You then need to define the groups (treatments). Click on the Define Options Button. You will be presented with the Define Groups dialogue box 5. Enter "1" into the Group 1: box and enter "2" into the Group 2: box. Remember that we labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2. 6. Click the Continue Button. 7. Click the Ok Button. 5.
  • 31. Independent sample t-test Result and Interpretation:  You can see that the group means are statistically significantly different because the value in the "Sig. (2-tailed)" row is less than 0.05.  Looking at the Group Statistics table, we can see that those people who undertook the exercise trial had lower cholesterol levels at the end of the programme than those who underwent a calorie-controlled diet.  This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training programme compared to after a calorie-controlled diet (6.15 ± 0.52 mmol/L), t(38)=2.428, p=0.020.
  • 32. Paired t-test Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous).  Independent variable should consist of two categorical, related groups or matched pairs.  There should be no significant outliers.  The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed. Example:  A group of Sports Science students (n = 20) are selected from the population to investigate whether a 12-week plyometric-training programme improves their standing long jump performance.  In order to test whether this training improves performance, the students are tested for their long jump performance before they undertake a plyometric-training programme and then again at the end of the programme (i.e., the dependent variable is "standing long jump performance", and the two related groups are the standing long jump values "before" and "after" the 12-week plyometric-training programme). Paired t-test
  • 33. Paired t-test Procedure: 1. Click Analyze > Compare Means > Paired Samples T Test... on the top menu, as shown below: 2. You will be presented with the Paired-Samples T Test dialogue box, as shown below: 1. 2.
  • 34. Paired t-test 3. Transfer the variables JUMP1 and JUMP2 into the Paired Variables: box. 3. 4. Click the Continue Button. 5. Click the Ok Button.
  • 35. Paired t-test Result and Interpretation:  We can conclude that there was a statistically significant improvement in jump distance following the plyometric- training programme from 2.48 ± 0.16 m to 2.52 ± 0.16 m (p < 0.0005); an improvement of 0.03 ± 0.03 m.
  • 36. One way ANOVA Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous)  Independent variable should consist of two or more categorical, independent groups  Independence of observations  There need to be homogeneity of variances  There should be no significant outlier  Dependent variable should be approximately normally distributed for each category of the independent variable. Example:  A manager employs an external agency which provides training in the spreadsheet program for his employees.  They offer 3 courses: a beginner, intermediate and advanced course.  He is unsure which course is needed for the type of work they do at his company, so he sends 10 employees on the beginner course, 10 on the intermediate and 10 on the advanced course.  When they all return from the training, he gives them a problem to solve using the spreadsheet program, and times how long it takes them to complete the problem.  He then compares the three courses (beginner, intermediate, advanced) to see if there are any differences in the average time it took to complete the problem. OnewayANOVA
  • 37. One way ANOVA Procedure: 1. Click Analyze > Compare Means > One way ANOVA on the top menu, as shown below: 2. You will be presented with the One way ANOVA dialogue box, as shown below: 1. 2.
  • 38. One way ANOVA 3. Transfer the dependent variable, Time, into the Test Variable(s): box, and transfer the independent variable, course into the Factor box. 3. 4. 4. Click on the Post hoc button. Tick the Tukey checkbox as shown below: 5. Click the Continue Button. 8. Click the Ok Button. 6. 6. Click on the Options button. Tick the Descriptive checkbox in the –Statistics– area, as shown below: 7. Click the Continue Button.
  • 39. One way ANOVA Result and Interpretation:
  • 40. One way ANOVA Result and Interpretation:  We can see from the table that there is a statistically significant difference in time to complete the problem between the group that took the beginner course and the intermediate course (p = 0.046), as well as between the beginner course and advanced course (p = 0.034). However, there were no differences between the groups that took the intermediate and advanced course (p = 0.989).  There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p = .021). A Tukey post hoc test revealed that the time to complete the problem was statistically significantly lower after taking the intermediate (23.6 ± 3.3 min, p = .046) and advanced (23.4 ± 3.2 min, p = .034) course compared to the beginners course (27.2 ± 3.0 min). There was no statistically significant difference between the intermediate and advanced groups (p = .989).  We can see that the significance value is 0.021 (i.e., p = .021), which is below 0.05. and, therefore, there is a statistically significant difference in the mean length of time to complete the spreadsheet problem between the different courses taken
  • 41. Two way ANOVA Assumptions:  Dependent variable should be measured at the interval or ratio level (continuous).  Independent variables should each consist of two or more categorical, independent groups.  Independence of observations.  There needs to be homogeneity of variances.  There should be no significant outliers.  Dependent variable should be approximately normally distributed for each combination of the two independent variables. Example:  A researcher was interested in whether an individual's interest in politics was influenced by their level of education and gender.  They recruited a random sample of participants to their study and asked them about their interest in politics, which they scored from 0 to 100, with higher scores indicating a greater interest in politics.  The researcher then divided the participants by gender (Male/Female) and then again by level of education (School/College/University).  Therefore, the dependent variable was "interest in politics", and the two independent variables were "gender" and "education".
  • 42. Two way ANOVA Procedure: 1. For Gender, code "males" as 1 and "females" as 2, and for Educational Level, code "school" as 1, "college" as 2 and "university" as 3. 2. Click Analyze > General Linear Model > Univariate on the top menu, as shown below: 3. You will be presented with the Univariate dialogue box, as shown below:
  • 43. Two way ANOVA 4. Transfer the dependent variable, Int_Politics, into the Dependent Variable: box, and transfer both independent variables, Gender and Edu_Level, into the Fixed Factor(s): box. 5. Click on the Plots button. You will be presented with the Univariate: Profile Plots dialogue box, as shown below: 6. Transfer the independent variable, Edu_Level, from the Factors: box into the Horizontal Axis: box, and transfer the other independent variable, Gender, into the Separate Lines: box.
  • 44. Two way ANOVA 7. Click on the Add button. You will see that "Edu_Level*Gender" has been added to the Plots: box. 8. Click on the Continue button. This will return you to the Univariate dialogue box. 9. Click on the Post Hoc button. You will be presented with the Univariate: Post Hoc Multiple Comparisons for Observed Means dialogue box. Transfer Edu_Level from the Factor(s): box to the Post Hoc Tests for: box and select Tukey. 10. Click on the Continue button to return to the Univariate dialogue box. 11. Click on the Options button. This will present you with the Univariate: Options dialogue box. Transfer Gender, Edu_Level and Gender*Edu_Level from the Factor(s) and Factor Interactions: box into the Display Means for: box. In the –Display– area, tick the Descriptive Statistics option. Then click Continue and OK.
  • 45. Two way ANOVA Result and Interpretation:  You can see from this graph that the lines do not appear to be parallel (with the lines actually crossing). You might expect there to be a statistically significant interaction.  We have a statistically significant interaction at the p = .014 level, as seen in the figure.  We can see from the table above that there was no statistically significant difference in mean interest in politics between males and females (p = .207), but there were statistically significant differences between educational levels (p < .0005).
  • 46. Two way ANOVA Result and Interpretation:  We can see that there is a statistically significant difference between all three different educational levels (p < .0005).  A two-way ANOVA was conducted that examined the effect of gender and education level on interest in politics. There was a statistically significant interaction between the effects of gender and education level on interest in politics, F (2, 54) = 4.643, p = .014.  Simple main effects analysis showed that males were significantly more interested in politics than females when educated to university level (p = .002), but there were no differences between gender when educated to school (p = .465) or college level (p = .793).
  • 47. Pearson’s Correlation Assumptions:  Two variables should be measured at the interval or ratio level (continuous)  There is a linear relationship between your two variables.  There should be no significant outliers.  Variables should be approximately normally distributed. Example:  A researcher wants to know whether a person's height is related to how well they perform in a long jump.  The researcher recruited untrained individuals from the general population, measured their height and had them perform a long jump.  The researcher then investigated whether there was an association between height and long jump performance by running a Pearson's correlation. Range of correlation coefficient ‘r’:  -1.0 to -0.7 Strong negative association  -0.7 to -0.3 Weak negative association  -0.3 to +0.3 little (or no) association  +0.3 to +0.7 Weak positive association  +0.7 to +1 Strong positive association
  • 48. Pearson’s Correlation Procedure: 1. Click Analyze > Correlate > Bivariate on the main menu. 2. You will be presented with the Bivariate Correlations dialogue box.
  • 49. Pearson’s Correlation 3. Transfer the variables Height and Jump_Dist into the Variables: box. You will end up with a screen similar to the one below: 4. Make sure that the Pearson checkbox is selected in the –Correlation Coefficients– area. 5. Click on the Options button and you will be presented with the Bivariate Correlations: Options dialogue box. If you wish to generate some descriptive statistics, you can do so here by clicking the relevant checkbox in the –Statistics– area. 6. Click on Continue and then the OK button to generate the output.
  • 50. Pearson’s Correlation Result and Interpretation:  We can see that the Pearson correlation coefficient, r, is 0.706, and that it is statistically significant (p = 0.005).  A Pearson product-moment correlation was run to determine the relationship between height and distance jumped in a long jump. There was a strong, positive correlation between height and distance jumped, which was statistically significant (r = .706, n = 14, p = .005).
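As a cross-check outside SPSS, the Pearson coefficient and its p-value can be computed with scipy. The heights and jump distances below are hypothetical (the slides' raw data are not shown), chosen only to produce a strong positive correlation like the one reported:

```python
from scipy import stats

# Hypothetical heights (cm) and long-jump distances (m) for 14 people
height = [160, 165, 170, 172, 168, 175, 180, 178, 182, 185, 163, 177, 171, 188]
jump   = [4.1, 4.3, 4.5, 4.6, 4.2, 4.8, 5.0, 4.9, 5.1, 5.3, 4.0, 4.7, 4.4, 5.4]

# Pearson product-moment correlation and two-sided p-value
r, p = stats.pearsonr(height, jump)
print(f"r = {r:.3f}, p = {p:.4f}")
```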
  • 51. Linear Regression Assumptions:  Two variables should be measured at the interval or ratio level (continuous)  There is a linear relationship between your two variables.  There should be independence of observations  Data needs to show homoscedasticity  Check that the residuals (errors) of the regression line are approximately normally distributed Example:  A salesperson for a large car brand wants to determine whether there is a relationship between an individual's income and the price they pay for a car.  As such, the individual's "income" is the independent variable and the "price" they pay for a car is the dependent variable.  The salesperson wants to use this information to determine which cars to offer potential customers in new areas where average income is known.
  • 52. Linear Regression Procedure: 1. Click Analyze > Regression > Linear on the main menu. 2. You will be presented with the Linear Regression dialogue box. 3. Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the Dependent: box. 4. Click on the OK button. This will generate the results.
  • 53. Linear Regression Result and Interpretation:  This table indicates that the regression model predicts the dependent variable significantly well  The R value represents the simple correlation and is 0.873, which indicates a high degree of correlation.  This table presents the regression equation as: Price = 8287 + 0.564(Income)  The R2 value indicates how much of the total variation in the dependent variable, Price, can be explained by the independent variable, Income. In this case, 76.2% can be explained, which is very large.
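The same simple regression can be sketched in Python with `scipy.stats.linregress`. The income/price pairs below are hypothetical (not the slides' dataset), constructed so the fitted slope lands near the reported 0.564:

```python
from scipy import stats

# Hypothetical income and car-price data (not the slides' dataset)
income = [20000, 30000, 40000, 50000, 60000, 70000, 80000]
price  = [18000, 26000, 30000, 38000, 40000, 48000, 52000]

# Ordinary least squares fit of price on income
res = stats.linregress(income, price)
print(f"Price = {res.intercept:.0f} + {res.slope:.3f} * Income")
print(f"R^2 = {res.rvalue**2:.3f}")
```

Squaring `res.rvalue` gives the R2 ("variance explained") figure that SPSS reports in its Model Summary table.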
  • 54. Multiple Regression Assumptions:  Dependent variable should be measured at the interval or ratio level (continuous)  Two or more independent variables, which can be either continuous or categorical.  There is a linear relationship between the dependent variable and each of the independent variables.  There should be independence of observations  Data needs to show homoscedasticity  Data must not show multicollinearity  There should be no significant outliers, high leverage points or highly influential points  Check that the residuals (errors) of the regression line are approximately normally distributed Example:  A health researcher wants to be able to predict "VO2max", an indicator of fitness and health  The researcher's goal is to be able to predict VO2max based on four attributes: age, weight, heart rate and gender.
  • 55. Multiple Regression Procedure: 1. Click Analyze > Regression > Linear on the main menu. 2. You will be presented with the Linear Regression dialogue box. Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart_rate and gender into the Independent(s): box. 3. Click on the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box. Select Confidence intervals in the –Regression Coefficients– area, leaving the Level(%): option at "95". 4. Click on Continue and then the OK button. This will generate the results.
  • 56. Multiple Regression Result and Interpretation:  This table indicates that the regression model predicts the dependent variable significantly well  The R value of 0.760 indicates a good level of prediction/correlation.  The R2 value of 0.577 indicates that our independent variables explain 57.7% of the variability of our dependent variable, VO2max.
  • 57. Multiple Regression Result and Interpretation:  This table presents the regression equation as: VO2max = 87.83 – (0.165 x age) – (0.385 x weight) – (0.118 x heart_rate) + (13.208 x gender)  A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R2 = .577. All four variables added statistically significantly to the prediction, p < .05.
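Multiple regression is ordinary least squares with a multi-column design matrix. The sketch below simulates data around the slides' fitted equation (purely illustrative: the predictor ranges and noise level are assumptions, not the study's data) and recovers the coefficients with numpy:

```python
import numpy as np

# Simulated data built around the slides' fitted equation (illustrative only)
rng = np.random.default_rng(0)
n = 100
age        = rng.uniform(20, 60, n)
weight     = rng.uniform(60, 100, n)
heart_rate = rng.uniform(120, 180, n)
gender     = rng.integers(0, 2, n).astype(float)
vo2max = (87.83 - 0.165 * age - 0.385 * weight - 0.118 * heart_rate
          + 13.208 * gender + rng.normal(0, 2, n))

# OLS fit: design matrix with an intercept column, solved by least squares
X = np.column_stack([np.ones(n), age, weight, heart_rate, gender])
coef, *_ = np.linalg.lstsq(X, vo2max, rcond=None)
print("intercept, age, weight, heart_rate, gender:", np.round(coef, 3))
```

With 100 simulated cases the estimated coefficients land close to the generating values, which is the sense in which SPSS's Coefficients table "recovers" the underlying relationship.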
  • 58. Binary logistic Regression Assumptions:  Dependent variable should be measured on a dichotomous scale.  One or more independent variables, which can be either continuous or categorical.  There is a linear relationship between any continuous independent variables and the logit of the dependent variable.  There should be independence of observations Example:  A health researcher wants to be able to predict heart disease.  The researcher's goal is to be able to predict heart disease from four attributes: age, weight, VO2max and gender.  A case number variable, caseno, is also included, which helps to identify outliers.
  • 59. Binary Logistic Regression Procedure: 1. Click Analyze > Regression > Binary Logistic on the main menu. 2. You will be presented with the Logistic Regression dialogue box. Transfer the dependent variable, heart disease, into the Dependent: box and the independent variables, age, weight, VO2max and gender into the Covariates: box. 3. Click on the Categorical button. You will be presented with the Logistic Regression: Define Categorical Variables dialogue box. Transfer the categorical independent variable, gender, from the Covariates: box to the Categorical Covariates: box.
  • 60. Binary Logistic Regression Procedure: 4. In the –Change Contrast– area, change the Reference Category: from the Last option to the First option. Then click on the Change button. 5. Click on the Continue button. You will be returned to the Logistic Regression dialogue box. 6. Click on the Options button. You will be presented with the Logistic Regression: Options dialogue box. In the –Statistics and Plots– area, tick the Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals and CI for exp(B) options, and in the –Display– area, select the At last step option. 7. Click on Continue and then the OK button. This will generate the results.
  • 61. Binary Logistic Regression Result and Interpretation:  33% of the variation in the dependent variable is explained by the independent variables.  The cut value is .500. This means that if the probability of a case being classified into the "yes" category is greater than .500, then that particular case is classified into the "yes" category; otherwise, it is classified into the "no" category.
  • 62. Binary Logistic Regression Result and Interpretation:  You can see that age (p = .003), gender (p = .021) and VO2max (p = .039) added significantly to the model/prediction, but weight (p = .799) did not add significantly to the model.  The table shows that the odds of having heart disease ("yes" category) are 7.026 times greater for males than for females.  A logistic regression was performed to ascertain the effects of age, weight, gender and VO2max on the likelihood that participants have heart disease. The logistic regression model was statistically significant, χ2(4) = 27.402, p < .0005. The model explained 33.0% (Nagelkerke R2) of the variance in heart disease and correctly classified 71.0% of cases. Males were 7.02 times more likely to exhibit heart disease than females. Increasing age was associated with an increased likelihood of exhibiting heart disease, but increasing VO2max was associated with a reduction in the likelihood of exhibiting heart disease.
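The exp(B) column SPSS reports is simply an odds ratio. As a sanity check on the interpretation, an odds ratio near 7 arises from a 2x2 table like the hypothetical one below (these counts are made up for illustration, not the study's data):

```python
# Hypothetical 2x2 table: heart disease (yes/no) by gender
male_yes, male_no     = 30, 20
female_yes, female_no = 9, 42

# Odds ratio: odds of disease for males divided by odds for females
odds_ratio = (male_yes / male_no) / (female_yes / female_no)
print(f"odds ratio = {odds_ratio:.2f}")  # 7.00
```

In the full logistic model the exp(B) for gender is this same quantity, adjusted for the other covariates.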
  • 63. Chi-Square Test Assumptions:  Two variables should be measured at an ordinal or nominal level (categorical data)  Both variables should consist of two or more categorical, independent groups. Example:  Educators are looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree course.  They would like to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books).  We have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).
  • 64. Chi-Square Test Procedure: 1. Click Analyze > Descriptive Statistics > Crosstabs... on the top menu. 2. You will be presented with the Crosstabs dialogue box. Transfer one of the variables into the Row(s): box and the other variable into the Column(s): box. 3. If you want to display clustered bar charts (recommended), make sure that the Display clustered bar charts checkbox is ticked.
  • 65. Chi-Square Test 4. Click on the Statistics button. You will be presented with the Crosstabs: Statistics dialogue box. Select the Chi-square and Phi and Cramer's V options. Then click Continue. 5. Click on the Cells button. You will be presented with the Crosstabs: Cell Display dialogue box. Select Observed in the –Counts– area, and Row, Column and Total in the –Percentages– area. Then click Continue. 6. Click on the Format button if you wish to adjust the row order. 7. Once you have made your choices, click Continue and then OK to generate the output.
  • 66. Chi-Square Test Result and Interpretation:  Both males and females prefer to learn using online materials versus books.  We can see here that χ2(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, both Males and Females equally prefer online learning versus books.  We can see that the strength of association between the variables is very weak (0.078).
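The same test of independence can be run on a contingency table with scipy. The counts below are hypothetical (the slides' crosstab is not reproduced). Note that for 2x2 tables `chi2_contingency` applies the Yates continuity correction by default; pass `correction=False` if you want the plain Pearson chi-square that SPSS lists in its "Pearson Chi-Square" row:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical Gender x Preferred Learning Medium counts (not the slides' data)
#                 online  books
table = np.array([[20,    10],    # male
                  [25,    15]])   # female

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")
```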
  • 67. Mann-Whitney U Test Assumptions:  Dependent variable should be measured at the ordinal or continuous level.  Independent variable should consist of two categorical, independent groups.  It should have independence of observations.  The Mann-Whitney U test can be used when the dependent variable is not normally distributed. Example:  A researcher decided to investigate whether an exercise or weight loss intervention was more effective in lowering cholesterol levels.  The researcher recruited a random sample of inactive males that were classified as overweight.  This sample was then randomly split into two groups: Group 1 underwent a calorie-controlled diet (i.e., the 'diet' group) and Group 2 undertook an exercise-training programme (i.e., the 'exercise' group).  In order to determine which treatment programme was more effective, cholesterol concentrations were compared between the two groups at the end of the treatment programmes.
  • 68. Mann-Whitney U Test Procedure: 1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples... on the top menu. 2. You will be presented with the Two-Independent-Samples Tests dialogue box. Transfer the dependent variable, Cholesterol, into the Test Variable List: box and the independent variable, Group, into the Grouping Variable: box.
  • 69. Mann-Whitney U Test 3. Click on the Define Groups button. (The button will not be clickable if you have not highlighted the Grouping Variable.) Enter 1 into the Group 1: box and 2 into the Group 2: box. Remember that we labelled the Diet group as 1 and the Exercise group as 2. Then click Continue. 4. If you wish to use this procedure to generate some descriptive statistics, click on the Options button and then tick Descriptive and Quartiles in the –Statistics– area. 5. Click on the Continue button, which will bring you back to the main dialogue box with the Grouping Variable: box now completed. 6. Click on the OK button. This will generate the output for the Mann-Whitney U test.
  • 70. Mann-Whitney U Test Result and Interpretation:  Although SPSS produces descriptive statistics for the Mann-Whitney U test, they are not actually very useful here since the data are not normally distributed.  The rank table is very useful because it indicates which group can be considered as having the higher cholesterol concentrations, overall; namely, the group with the highest mean rank. In this case, the diet group had the highest cholesterol concentrations.  It can be concluded that cholesterol concentration in the diet group was statistically significantly higher than in the exercise group (U = 110, p = .014).
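A scipy equivalent of this rank-based comparison is shown below, using hypothetical cholesterol concentrations (not the slides' dataset) in which the diet group sits higher, as in the reported result:

```python
from scipy.stats import mannwhitneyu

# Hypothetical cholesterol concentrations (mmol/L), not the slides' dataset
diet     = [6.2, 6.8, 5.9, 7.1, 6.5, 7.0, 6.9, 6.4, 7.2, 6.6]
exercise = [5.4, 5.9, 5.1, 6.0, 5.6, 5.3, 5.8, 5.5, 6.1, 5.2]

# Mann-Whitney U test: compares the rank distributions of the two groups
u, p = mannwhitneyu(diet, exercise, alternative='two-sided')
print(f"U = {u}, p = {p:.4f}")
```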
  • 71. Sign Test Assumptions:  Dependent variable should be measured at the ordinal or continuous level.  Independent variable should consist of two categorical, "related groups" or "matched pairs".  The paired observations for each participant need to be independent.  The difference scores (i.e., differences between the paired observations) are from a continuous distribution. Example:  A researcher wants to test a new formula for a sports drink that improves running performance.  He recruited 20 participants who each performed two trials in which they had to run as far as possible in two hours on a treadmill.  In one of the trials they drank the carbohydrate-only drink and in the other trial they drank the carbohydrate-protein drink.  The order of the trials was counterbalanced and the distance they ran in both trials was recorded.
  • 72. Sign Test Procedure: 1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples... on the main menu. 2. You will be presented with the Two-Related-Samples Tests dialogue box. Transfer the variables carb and carb_protein into the Test Pairs: box by highlighting both variables. Deselect Wilcoxon and select Sign in the –Test Type– area. Click on the OK button to generate the output.
  • 73. Sign Test Result and Interpretation:  You can see how many participants decreased (the "Negative Differences" row), improved (the "Positive Differences" row) or witnessed no change (the "Ties" row) in their performance in the carbohydrate-protein trial (i.e., carb_protein) compared to the carbohydrate-only trial (i.e., carb) in the Frequencies table.  The statistical significance (i.e., p-value) of the sign test is found in the "Exact Sig. (2-tailed)" row of the table above. However, if you had more than a total of 25 positive and negative differences, an "Asymp. Sig. (2-sided test)" row would be displayed instead.  Twenty participants were recruited to understand the performance benefits of a carbohydrate-protein versus carbohydrate-only drink on running performance as measured by the distance run in two hours on a treadmill. An exact sign test was used to compare the differences in distance run in the two trials. The carbohydrate-protein drink elicited a statistically significant median increase in distance run (0.113 km) compared to the carbohydrate-only drink, p = .004.
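The exact sign test is just a binomial test on the signs of the paired differences, with ties dropped. A sketch with hypothetical paired differences (not the study's data) using scipy's `binomtest`:

```python
from scipy.stats import binomtest

# Hypothetical paired differences (carb_protein minus carb, in km);
# zero differences are ties and are dropped, as in SPSS's sign test
diffs = [0.11, 0.09, 0.15, -0.02, 0.12, 0.08, 0.10, 0.00, 0.13, 0.07,
         0.14, -0.01, 0.09, 0.11, 0.12, 0.06, 0.10, 0.08, 0.13, 0.00]

pos = sum(d > 0 for d in diffs)
neg = sum(d < 0 for d in diffs)

# Exact two-sided binomial test of H0: P(positive) = 0.5
result = binomtest(pos, pos + neg, 0.5)
print(f"positives = {pos}, negatives = {neg}, p = {result.pvalue:.4f}")
```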
  • 74. Kruskal-Wallis Test Assumptions:  Dependent variable should be measured at the ordinal or continuous level.  Independent variable should consist of two or more categorical, independent groups.  It should have independence of observations. Example:  A researcher wants to know whether anti-depressive drugs have the positive side effect of reducing back pain. The researcher identifies 3 well-known anti-depressive drugs which might have this positive side effect, and labels them Drug A, Drug B and Drug C.  The researcher then recruits a group of 60 individuals with a similar level of back pain and randomly assigns them to one of three groups – Drug A, Drug B or Drug C treatment groups – and prescribes the relevant drug for a 4 week period.  At the end of the 4 week period, the researcher asks the participants to rate their back pain on a scale of 1 to 10, with 10 indicating the greatest level of pain.  The researcher wants to compare the levels of pain experienced by the different groups at the end of the drug treatment period.
  • 75. Kruskal-Wallis Test Procedure: 1. Click Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples... on the top menu. 2. You will be presented with the Tests for Several Independent Samples dialogue box. Transfer the dependent variable, Pain_Score, into the Test Variable List: box and the independent variable, Drug_Treatment_Group, into the Grouping Variable: box.
  • 76. Kruskal-Wallis Test 3. Click on the Define Range button. You will be presented with the "Several Independent Samples: Define Range" dialogue box. Enter "1" into the Minimum: box and "3" into the Maximum: box. These values represent the range of codes you gave the groups of the independent variable, Drug_Treatment_Group (i.e., Drug A was coded "1" through to Drug C, which was coded "3"). 4. Click on the Continue button and you will be returned to the "Tests for Several Independent Samples" dialogue box, but now with a completed Grouping Variable: box. 5. Click on the OK button to generate the output.
  • 77. Kruskal-Wallis Test Result and Interpretation:  The mean rank of the Pain_Score for each drug treatment group can be used to compare the effect of the different drug treatments. Whether these drug treatment groups have different pain scores can be assessed using the Test Statistics table which presents the result of the Kruskal-Wallis H test.  A Kruskal-Wallis H test showed that there was a statistically significant difference in pain score between the different drug treatments, χ2(2) = 8.520, p = 0.014, with a mean rank pain score of 35.33 for Drug A, 34.83 for Drug B and 21.35 for Drug C.
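The Kruskal-Wallis H test is available directly in scipy. The pain scores below are hypothetical (not the study's 60 cases), constructed so that Drug C scores lower, mirroring the pattern of mean ranks reported above:

```python
from scipy.stats import kruskal

# Hypothetical pain scores (1-10) for the three drug groups
drug_a = [7, 6, 8, 7, 5, 6, 7, 8, 6, 7]
drug_b = [7, 8, 6, 7, 7, 6, 8, 7, 6, 7]
drug_c = [4, 5, 3, 4, 5, 4, 3, 5, 4, 4]

# Kruskal-Wallis H test: rank-based comparison of three or more groups
h, p = kruskal(drug_a, drug_b, drug_c)
print(f"H = {h:.3f}, p = {p:.4f}")
```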
  • 78. Descriptives & Frequencies Procedure: A. Descriptives 1. Click Analyze > Descriptive Statistics > Descriptives on the top menu. 2. You will be presented with the Descriptives dialogue box. Transfer the dependent variable, Income, into the Variable(s): box. 3. Click on the Options button. Tick every measure to be displayed. Click Continue and then OK.
  • 79. Descriptives & Frequencies Result and Interpretation:  The Descriptives output displays all the requested measures.  Note that Descriptives displays results for continuous variables only.
  • 80. Descriptives & Frequencies Procedure: B. Frequencies 1. Click Analyze > Descriptive Statistics > Frequencies on the top menu. 2. You will be presented with the Frequencies dialogue box. Transfer both the independent and dependent variables (e.g., Income) into the Variable(s): box.
  • 81. Descriptives & Frequencies 3. Click on the Statistics button. Tick every measure to be displayed and click Continue. 4. Click on the Charts button. Select any chart you want. In the –Chart Values– area, click either Frequencies or Percentages. 5. Click Continue and then OK to generate the output.
  • 82. Descriptives & Frequencies Result and Interpretation:  The Frequencies output displays all the requested measures along with the chosen chart.
  • 83. Test of Normality and Outliers Normality and Outliers:  Skewness and Kurtosis: The skewness and kurtosis measures should be as close to zero as possible. Dividing each measure by its standard error gives a z-score, which should lie between -1.96 and +1.96.  Kolmogorov-Smirnov Test & Shapiro-Wilk Test: If the sig. value of the Shapiro-Wilk test is > 0.05, the data are normal. If it is below 0.05, the data deviate significantly from a normal distribution.  Histograms and Box plots: Look at the graphical figures. The histogram should have the approximate shape of a normal curve. In a Q-Q plot, the dots should fall approximately along the line. A box plot should be approximately symmetrical. Process Involved: In statistics, normality tests are used to determine whether a data set is well modelled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed. It is essential to test for normality before proceeding to analysis. An outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.
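Both checks described above can be sketched in a few lines of Python on a small, deliberately skewed hypothetical sample (the data below are made up to fail the tests):

```python
import numpy as np
from scipy import stats

# A small, deliberately right-skewed hypothetical sample
data = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 25], dtype=float)

# Shapiro-Wilk: p < 0.05 suggests the data deviate from normality
w, p = stats.shapiro(data)

# Skewness z-score using the rough standard error sqrt(6/n);
# |z| > 1.96 also flags non-normality
z_skew = stats.skew(data) / np.sqrt(6 / len(data))
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.4f}, skewness z = {z_skew:.2f}")
```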
  • 84. Test of Normality and Outliers Procedure: 1. Click Analyze > Descriptive Statistics > Explore on the top menu. 2. You will be presented with the Explore dialogue box. Transfer the dependent variable, Course, into the Dependent List: box.
  • 85. Test of Normality and Outliers 3. Click on the Statistics button. Select Descriptives and Outliers and click Continue. 4. Click on the Plots button. Select Stem-and-leaf, Histogram and Normality plots with tests. Then click Continue and OK.
  • 86. Test of Normality and Outliers Result and Interpretation:  The z-values, calculated by dividing the skewness and kurtosis measures by their respective Std. Errors, do not lie between -1.96 and +1.96. Thus the data are not normally distributed.  The p-value of .004 in the Shapiro-Wilk test is less than 0.05, which also indicates that the data are not normally distributed.
  • 87. Test of Normality and Outliers Result and Interpretation:  Both the box plot and the Q-Q plot show that the data are not normally distributed.  In the box plot, a circle marks an ordinary outlier whereas a star marks an extreme outlier, and the number next to the marker identifies the case. Ex: case 9 has a course time of 2.00 hours (consult Fig 1 in the procedure).
  • 88. Dealing with Outliers 1. Leave it in if it is a legitimate outlier, and use a non-parametric test on skewed data. 2. Correct data-entry errors. 3. Winsorize it to match the highest value in the rest of the data. 4. Throw it out only if necessary; better for multivariate analysis. 5. Transform it, e.g. take the log. Ex: 2, 3, 4, 5, 6, 10, 100  0.30, 0.48, 0.60, 0.70, 0.78, 1, 2 Basic way to winsorize data: Consider the data set consisting of: {92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 101.5) The data below the 5th percentile lies between −40 and −5, while the data above the 95th percentile lies between 101 and 1053. (Values shown in bold.) Then a 90% winsorization would result in the following: {92, 19, 101, 58, 101, 91, 26, 78, 10, 13, −5, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 55.65) [For large data you can use the Frequencies procedure in SPSS to winsorize data]
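The worked winsorization example above can be reproduced with scipy's `winsorize`, which clamps the extreme 5% at each end (here, one value per end) to the nearest remaining value:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# The data set from the worked example above
data = np.array([92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
                 -40, 101, 86, 85, 15, 89, 89, 28, -5, 41])

# 90% winsorization: -40 -> -5 and 1053 -> 101
w = winsorize(data, limits=[0.05, 0.05])
print(w.mean())  # 55.65, matching the worked example
```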