Statistical Tests in SPSS
Statistics:
 The science of collecting, organizing, analyzing, interpreting and presenting data.
 Numbers used to describe data or relationships.
 A field of study that develops critical thinking and analytical skills.
Statistical Analysis
Types of Statistical Analysis:
Descriptive Statistics:
 Organizing and summarizing data using numbers and graphs.
 Data summary: bar graphs, histograms, pie charts, shape of the distribution and skewness.
 Measures of Central Tendency: Mean, Median, Mode
 Measures of variability: Range, Variance and Standard Deviation
 Ex: What is the average age of the sample?
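The descriptive summaries listed above can be sketched in a few lines of Python; the ages below are hypothetical data invented for illustration, not taken from the slides.

```python
# Descriptive statistics with Python's standard library (hypothetical data).
import statistics

ages = [29, 34, 31, 40, 38, 27, 45, 33, 36, 30]  # hypothetical sample

mean_age = statistics.mean(ages)            # measure of central tendency
median_age = statistics.median(ages)
mode_value = statistics.mode([1, 2, 2, 3])  # mode needs repeated values
age_range = max(ages) - min(ages)           # measure of variability
variance = statistics.variance(ages)        # sample variance
stdev = statistics.stdev(ages)              # sample standard deviation
print(mean_age, median_age, age_range)
```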
Inferential Statistics:
 Using sample data to make inferences or draw conclusions about the population.
 Uses probability to determine how confident we can be that our conclusions are correct (confidence intervals and margins of error).
 t-Test, Chi-square, and ANOVA (analysis of variance)
 Ex: Is the average age of population different from 35?
Level of Measurement
Types of Level of Measurement:
Qualitative or Categorical:
 Nominal: Two or more categories that lack any order. Ex: gender (1 = male, 2 = female), football jersey numbers.
 Ordinal: Two or more categories that have an order, but the differences between them are not meaningful. Ex: small, medium and large farmers; place finished in a race (1st, 2nd, 3rd, and so on).
 Analyzed with non-parametric statistical methods.
Quantitative or Continuous:
 Interval: Quantitative variables measured on a scale with equal intervals but no true zero point, so ratios are not meaningful. Ex: temperature in Celsius.
 Ratio: Quantitative variables with a true zero point, where ratios are meaningful. Ex: yield in quintals, germination, etc.
 Analyzed with parametric statistical methods.
Variable Analysis
Types of Variable Analysis:
Univariate Analysis:
 Uses a single variable to describe the data.
 Its purpose is descriptive rather than explanatory.
 Frequencies, Histograms, Means, Medians, Modes, Ranges, Percentiles, Scatterplots, Confidence Intervals.
 Ex: Gender
Bivariate Analysis:
 It examines the relationship between, or the effects of, two variables on each other.
 Crosstabs, Chi-square, ANOVAs, t-Tests, Correlations.
 Ex: Gender and GPA value
Multivariate Analysis:
 It tests three or more variables together to examine their joint effects.
 OLS regression, binary logistic regression, and other types of regression.
 Ex: Gender, GPA value and prejudice.
Statistical Tests
Types of Statistical test:
Parametric test:
 Specific assumptions are made about the population parameters.
 More powerful than non-parametric tests when its assumptions hold.
 The test statistic is based on an assumed distribution.
 Central measure: mean.
 Allows more conclusions to be drawn.
 One-way ANOVA, independent-samples t-test, paired-samples t-test, Pearson correlation, etc.
 Ex: Length and weight of the human body.
 Conditions to be satisfied:
Data must be interval/ratio.
Subjects should be randomly selected.
Data should be normally distributed.
Variation in the results should be roughly the same (homogeneity of variances).
Statistical Tests
Types of Statistical test:
Non-Parametric test:
 Distribution-free tests: they do not assume that the data follow a specific distribution.
 Less powerful than parametric tests.
 The test statistic is often based on ranks rather than a theoretical distribution.
 Central measure: median.
 Simple to apply and not unduly affected by outliers.
 Chi-square, Friedman test, Kruskal-Wallis test, sign test, etc.
 Ex: Did fungal growth take place or not?
 Conditions to be satisfied:
Data must be nominal/ordinal.
Non-normal distribution of data.
Statistical Tests
Parametric vs. Non-Parametric Tests:
Study type | Parametric test | Non-parametric test
Compare means between two distinct/independent groups | Two-sample t-test | Mann-Whitney U test
Compare two quantitative measurements taken from the same individual | Paired t-test | Wilcoxon signed-rank test
Compare means between three or more distinct/independent groups | Analysis of variance (ANOVA) | Kruskal-Wallis test
Repeated measures, more than two conditions | One-way repeated-measures ANOVA | Friedman's test
Estimate the degree of association between two quantitative variables | Pearson correlation coefficient | Spearman's rank correlation
Choosing the Right Statistical Test
Types of Statistical Test:
Regression tests (test cause-and-effect relationships):
Test | Predictor variable | Outcome variable | Research question example
Simple linear regression | Continuous; one predictor | Continuous; one outcome | What is the effect of income on longevity?
Multiple linear regression | Continuous; two or more predictors | Continuous; one outcome | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | Continuous | Binary | What is the effect of drug dosage on the survival of test subjects?
Choosing the Right Statistical Test
Types of Statistical Test:
Comparison tests (differences among group means):
Test | Predictor variable | Outcome variable | Research question example
Paired t-test | Categorical; one predictor | Quantitative; groups come from the same population | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | Categorical; one predictor | Quantitative; groups come from different populations | What is the difference in average exam scores for students from two different schools?
ANOVA | Categorical; one or more predictors | Quantitative; one outcome | What is the difference in average pain levels among post-surgical patients given three different painkillers?
Choosing the Right Statistical Test
Types of Statistical Test:
Correlation tests (check whether two variables are related):
Test | Predictor variable | Outcome variable | Research question example
Pearson | Continuous | Continuous | How are latitude and temperature related?
Chi-square | Categorical | Categorical | How is membership in a sports team related to membership in the drama club among high school students?
Choosing the Right Statistical Test
Types of Statistical Test:
Non-parametric tests:
Test | Predictor variable | Outcome variable | Use in place of
Spearman | Ordinal | Ordinal | Regression and correlation tests
Sign test | Categorical | Quantitative | t-test
Kruskal-Wallis | Categorical; 3 or more groups | Quantitative | ANOVA
Wilcoxon rank-sum test | Categorical; 2 groups | Quantitative; groups come from different populations | Independent t-test
Wilcoxon signed-rank test | Categorical; 2 groups | Quantitative; groups come from the same population | Paired t-test
One sample t-test
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 The data are independent.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed.
 Samples are typically small, often fewer than 30 observations.
Example:
 A researcher is planning a psychological intervention study to test depression index, where anyone who achieves a
score of 4.0 is deemed to have 'normal' levels of depression.
 Lower scores indicate less depression and higher scores indicate greater depression.
 He has recruited 40 participants to take part in the study.
 Depression scores are recorded in the variable dep_score.
 He wants to know whether his sample is representative of the normal population.
One sample t-test
Procedure:
1. Click Analyze > Compare Means > One-Sample T Test... on the main menu.
2. You will be presented with the One-Sample T Test dialogue box.
One sample t-test
3. Transfer the dependent variable, dep_score, into the Test Variable(s): box. Enter the population mean you are comparing the sample against in the Test Value: box by changing the current value of "0" to "4".
4. Click on the Options button. You will be presented with the One-Sample T Test: Options dialogue box, as shown below:
5. Click on the Continue button. You will be returned to the One-Sample T Test dialogue box.
6. Click on the OK button to generate the output.
One sample t-test
Result and Interpretation:
 In this example, p = .022 (< .05); therefore, it can be concluded that the sample mean is statistically significantly different from the test value.
 If p > .05, the difference between the sample-estimated population mean and the comparison population mean would not be statistically significant.
 The depression score was statistically significantly lower than the normal population depression score, t(39) = -2.381, p = .022.
 Since there was a statistically significant difference between the means (p < .05), we reject the null hypothesis and accept the alternative hypothesis.
Independent sample t-test
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variable should consist of two categorical, independent groups.
 The data are independent.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed.
 There needs to be homogeneity of variances.
Example:
 A researcher decided to investigate whether an exercise or weight loss intervention is more effective in lowering
cholesterol levels.
 To this end, the researcher recruited a random sample of inactive males that were classified as overweight.
 This sample was then randomly split into two groups:
 Group 1 underwent a calorie-controlled diet and Group 2 undertook the exercise-training programme.
 In order to determine which treatment programme was more effective, the mean cholesterol concentrations were
compared between the two groups at the end of the treatment programmes.
Independent sample t-test
Procedure:
1. Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, as shown below:
2. You will be presented with the Independent-Samples T Test dialogue box, as shown below:
Independent sample t-test
3. Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer the independent variable, Treatment,
into the Grouping Variable: box.
4. You then need to define the groups (treatments). Click on the Define Groups button. You will be presented with the Define Groups dialogue box.
5. Enter "1" into the Group 1: box and "2" into the Group 2: box. Remember that we labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2.
6. Click the Continue button.
7. Click the OK button.
Independent sample t-test
Result and Interpretation:
 You can see that the group means are statistically significantly different because the value in the "Sig. (2-tailed)" row is less
than 0.05.
 Looking at the Group Statistics table, we can see that those people who undertook the exercise trial had lower cholesterol
levels at the end of the programme than those who underwent a calorie-controlled diet.
 This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol
concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training programme compared to after a calorie-controlled diet
(6.15 ± 0.52 mmol/L), t(38)=2.428, p=0.020.
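A minimal sketch of this comparison with `scipy.stats.ttest_ind`; the cholesterol values (mmol/L) below are invented stand-ins for the study's data, and Levene's test is included to check the homogeneity-of-variances assumption.

```python
# Independent-samples t-test: diet vs exercise (hypothetical data, mmol/L).
from scipy import stats

diet     = [6.4, 5.9, 6.6, 6.1, 5.8, 6.5, 6.2, 6.0, 6.7, 5.9]
exercise = [5.6, 5.9, 5.5, 6.1, 5.7, 5.4, 6.0, 5.8, 5.6, 5.9]

# Levene's test checks the homogeneity-of-variances assumption;
# equal_var=True mirrors SPSS's "equal variances assumed" row.
lev_stat, lev_p = stats.levene(diet, exercise)
t_stat, p_value = stats.ttest_ind(diet, exercise, equal_var=True)
print(round(t_stat, 3), round(p_value, 3))
```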
Paired t-test
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variable should consist of two categorical, related groups or matched pairs.
 There should be no significant outliers.
 The distribution of the differences in the dependent variable between the two related groups should be approximately
normally distributed.
Example:
 A group of Sports Science students (n = 20) are selected from the population to investigate whether a 12-week
plyometric-training programme improves their standing long jump performance.
 In order to test whether this training improves performance, the students are tested for their long jump
performance before they undertake a plyometric-training programme and then again at the end of the programme
(i.e., the dependent variable is "standing long jump performance", and the two related groups are the standing long
jump values "before" and "after" the 12-week plyometric-training programme).
Paired t-test
Procedure:
1. Click Analyze > Compare Means > Paired Samples T Test... on the top menu, as shown below:
2. You will be presented with the Paired-Samples T Test dialogue box, as shown below:
Paired t-test
3. Transfer the variables JUMP1 and JUMP2 into the Paired Variables: box.
4. Click the Continue Button.
5. Click the Ok Button.
Paired t-test
Result and Interpretation:
 We can conclude that there was a statistically significant improvement in jump distance following the plyometric-
training programme from 2.48 ± 0.16 m to 2.52 ± 0.16 m (p < 0.0005); an improvement of 0.03 ± 0.03 m.
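The before/after design can be sketched with `scipy.stats.ttest_rel`. The jump distances below are simulated to resemble the example (n = 20, mean around 2.48 m before training, a small mean improvement after); they are assumptions, not the study's data.

```python
# Paired-samples t-test on simulated before/after jump distances (m).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
jump1 = rng.normal(2.48, 0.16, size=20)          # before training
jump2 = jump1 + rng.normal(0.04, 0.03, size=20)  # after training

t_stat, p_value = stats.ttest_rel(jump2, jump1)
print(round(t_stat, 3), round(p_value, 4))
```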
One way ANOVA
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variable should consist of two or more categorical, independent groups.
 Independence of observations.
 There needs to be homogeneity of variances.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed for each category of the independent variable.
Example:
 A manager employs an external agency which provides training in the spreadsheet program for his employees.
 They offer 3 courses: a beginner, intermediate and advanced course.
 He is unsure which course is needed for the type of work they do at his company, so he sends 10 employees on the
beginner course, 10 on the intermediate and 10 on the advanced course.
 When they all return from the training, he gives them a problem to solve using the spreadsheet program, and times
how long it takes them to complete the problem.
 He then compares the three courses (beginner, intermediate, advanced) to see if there are any differences in the
average time it took to complete the problem.
One way ANOVA
Procedure:
1. Click Analyze > Compare Means > One-Way ANOVA... on the top menu.
2. You will be presented with the One-Way ANOVA dialogue box.
One way ANOVA
3. Transfer the dependent variable, Time, into the Dependent List: box, and transfer the independent variable, Course, into the Factor: box.
4. Click on the Post Hoc button. Tick the Tukey checkbox.
5. Click the Continue button.
6. Click on the Options button. Tick the Descriptive checkbox in the –Statistics– area.
7. Click the Continue button.
8. Click the OK button.
One way ANOVA
Result and Interpretation:
 We can see from the table that there is a statistically significant difference in time to complete the problem between
the group that took the beginner course and the intermediate course (p = 0.046), as well as between the beginner
course and advanced course (p = 0.034). However, there were no differences between the groups that took the
intermediate and advanced course (p = 0.989).
 There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p =
.021). A Tukey post hoc test revealed that the time to complete the problem was statistically significantly lower after
taking the intermediate (23.6 ± 3.3 min, p = .046) and advanced (23.4 ± 3.2 min, p = .034) course compared to the
beginners course (27.2 ± 3.0 min). There was no statistically significant difference between the intermediate and
advanced groups (p = .989).
 We can see that the significance value is 0.021 (i.e., p = .021), which is below 0.05; therefore, there is a statistically significant difference in the mean time taken to complete the spreadsheet problem between the different courses.
Two way ANOVA
Assumptions:
 Dependent variable should be measured on an interval or ratio scale (continuous).
 Independent variables should each consist of two or more categorical, independent groups.
 Independence of observations.
 There needs to be homogeneity of variances.
 There should be no significant outliers.
 Dependent variable should be approximately normally distributed for each combination of the two independent variables.
Example:
 A researcher was interested in whether an individual's interest in politics was influenced by their level of education
and gender.
 They recruited a random sample of participants to their study and asked them about their interest in politics, which
they scored from 0 to 100, with higher scores indicating a greater interest in politics.
 The researcher then divided the participants by gender (Male/Female) and then again by level of education
(School/College/University).
 Therefore, the dependent variable was "interest in politics", and the two independent variables were "gender" and
"education".
Two way ANOVA
Procedure:
1. For Gender, code "males" as 1 and "females" as 2, and for Educational Level, code "school" as 1, "college" as 2 and
"university" as 3.
2. Click Analyze > General Linear Model > Univariate on the top menu, as shown below:
3. You will be presented with the Univariate dialogue box, as shown below:
Two way ANOVA
4. Transfer the dependent variable, Int_Politics, into the Dependent Variable: box, and transfer both independent variables,
Gender and Edu_Level, into the Fixed Factor(s): box
5. Click on the Plots button. You will be presented with the Univariate: Profile Plots dialogue box.
6. Transfer the independent variable, Edu_Level, from the Factors: box into the Horizontal Axis: box, and transfer the other independent variable, Gender, into the Separate Lines: box.
Two way ANOVA
7. Click on the Add Button. You will see that "Edu_Level*Gender" has been added to the Plots: box, as shown below:
8. Click on the Continue button. This will return you to the Univariate dialogue box.
9. Click on the Post Hoc button. You will be presented with the Univariate: Post Hoc Multiple Comparisons for Observed Means dialogue box. Transfer Edu_Level from the Factor(s): box to the Post Hoc Tests for: box and select Tukey.
10. Click on the Continue button to return to the Univariate dialogue box.
11. Click on the Options button. This will present you with the Univariate: Options dialogue box. Transfer Gender, Edu_Level and Gender*Edu_Level from the Factor(s) and Factor Interactions: box into the Display Means for: box. In the –Display– area, tick the Descriptive Statistics option. Then click Continue and OK.
Two way ANOVA
Result and Interpretation:
 You can see from this graph that the lines do not appear to be parallel (with the lines actually crossing). You might
expect there to be a statistically significant interaction.
 We have a statistically significant interaction at the p = .014 level.
 We can see from the table above that there was no statistically significant difference in mean interest in politics
between males and females (p = .207), but there were statistically significant differences between educational levels
(p < .0005).
Two way ANOVA
Result and Interpretation:
 We can see that there is a statistically significant difference between all three different educational levels (p < .0005).
 A two-way ANOVA was conducted that examined the effect of gender and education level on interest in politics. There
was a statistically significant interaction between the effects of gender and education level on interest in politics, F (2, 54)
= 4.643, p = .014.
 Simple main effects analysis showed that males were significantly more interested in politics than females when
educated to university level (p = .002), but there were no differences between gender when educated to school (p =
.465) or college level (p = .793).
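For a balanced design, the two-way ANOVA sums of squares can be computed by hand, which makes the F-test for the interaction transparent. The sketch below uses invented interest-in-politics scores (5 per gender-by-education cell); only the interaction test is shown.

```python
# Balanced two-way ANOVA by hand: gender (2) x education (3), n = 5 per cell.
import numpy as np
from scipy import stats

# y[i, j, k]: gender i, education j, replicate k (hypothetical scores)
y = np.array([
    [[38, 42, 35, 40, 37], [55, 52, 58, 50, 54], [72, 75, 70, 78, 74]],  # male
    [[40, 36, 41, 39, 38], [53, 57, 51, 55, 56], [60, 58, 63, 59, 61]],  # female
], dtype=float)
a, b, n = y.shape

grand = y.mean()
row = y.mean(axis=(1, 2))   # gender means
col = y.mean(axis=(0, 2))   # education means
cell = y.mean(axis=2)       # cell means

ss_a = b * n * ((row - grand) ** 2).sum()
ss_b = a * n * ((col - grand) ** 2).sum()
ss_ab = n * ((cell - row[:, None] - col[None, :] + grand) ** 2).sum()
ss_e = ((y - cell[:, :, None]) ** 2).sum()

df_ab, df_e = (a - 1) * (b - 1), a * b * (n - 1)
f_ab = (ss_ab / df_ab) / (ss_e / df_e)
p_ab = stats.f.sf(f_ab, df_ab, df_e)
print(f"interaction: F({df_ab}, {df_e}) = {f_ab:.3f}, p = {p_ab:.4f}")
```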
Pearson’s Correlation
Assumptions:
 Two variables should be measured at the interval or ratio level (continuous)
 There is a linear relationship between your two variables.
 There should be no significant outliers.
 Variables should be approximately normally distributed.
Example:
 A researcher wants to know whether a person's height is related to how well they perform in a long jump.
 The researcher recruited untrained individuals from the general population, measured their height and had them
perform a long jump.
 The researcher then investigated whether there was an association between height and long jump performance by
running a Pearson's correlation.
Range of correlation coefficient ‘r’:
 -1.0 to -0.7: strong negative association
 -0.7 to -0.3: moderate negative association
 -0.3 to +0.3: little or no association
 +0.3 to +0.7: moderate positive association
 +0.7 to +1.0: strong positive association
Pearson’s Correlation
Procedure:
1. Click Analyze > Correlate > Bivariate on the main menu
2. You will be presented with the Bivariate Correlations dialogue box.
Pearson’s Correlation
3. Transfer the variables Height and Jump_Dist into the Variables: box.
4. Make sure that the Pearson checkbox is selected under the –Correlation Coefficients– area
5. Click on the Options button and you will be presented with the Bivariate Correlations: Options dialogue box. If you wish to generate descriptive statistics, you can do so here by ticking the relevant checkbox in the –Statistics– area.
6. Click on Continue and then the OK button to generate the output.
Pearson’s Correlation
Result and Interpretation:
 We can see that the Pearson correlation coefficient, r, is 0.706, and that it is statistically significant (p = 0.005).
 A Pearson product-moment correlation was run to determine the relationship between height and distance jumped in
a long jump. There was a strong, positive correlation between height and distance jumped, which was statistically
significant (r = .706, n = 14, p = .005).
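The same correlation can be sketched with `scipy.stats.pearsonr`; the 14 height/jump pairs below are hypothetical values constructed to show a strong positive association, not the study's measurements.

```python
# Pearson correlation on hypothetical height (cm) / jump distance (m) pairs.
from scipy import stats

height = [165, 170, 172, 168, 180, 175, 178, 169, 183, 171, 174, 177, 166, 181]
jump   = [1.9, 2.1, 2.2, 2.0, 2.6, 2.3, 2.4, 2.0, 2.7, 2.2, 2.2, 2.5, 1.8, 2.6]

r, p_value = stats.pearsonr(height, jump)
print(f"r = {r:.3f}, n = {len(height)}, p = {p_value:.3f}")
```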
Linear Regression
Assumptions:
 Two variables should be measured at the interval or ratio level (continuous)
 There is a linear relationship between your two variables.
 There should be independence of observations
 Data needs to show homoscedasticity
 Check that the residuals (errors) of the regression line are approximately normally distributed
Example:
 A salesperson for a large car brand wants to determine whether there is a relationship between an individual's
income and the price they pay for a car.
 As such, the individual's "income" is the independent variable and the "price" they pay for a car is the dependent
variable.
 The salesperson wants to use this information to determine which cars to offer potential customers in new areas
where average income is known.
Linear Regression
Procedure:
1. Click Analyze > Regression > Linear on the main menu
2. You will be presented with the Linear Regression dialogue box.
3. Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the
Dependent: box.
4. Click on the OK button. This will generate the results.
Linear Regression
Result and Interpretation:
 This table indicates that the regression model predicts the dependent variable significantly well.
 The R value represents the simple correlation and is 0.873, which indicates a high degree of correlation.
 This table presents the regression equation as:
Price = 8287 + 0.564(Income)
 The R2 value indicates how much of the total variation in the dependent variable, Price, can be explained by the
independent variable, Income. In this case, 76.2% can be explained, which is very large.
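A simple regression like this can be sketched with `scipy.stats.linregress`. The income/price pairs below are invented and roughly linear; the fitted slope and R2 will therefore differ from the slide's values.

```python
# Simple linear regression on hypothetical income/price pairs.
from scipy import stats

income = [20000, 30000, 40000, 50000, 60000, 75000, 90000, 110000]
price  = [19000, 25500, 31000, 36000, 42500, 50000, 59000, 70500]

res = stats.linregress(income, price)
print(f"price = {res.intercept:.0f} + {res.slope:.3f} * income")
print(f"R^2 = {res.rvalue ** 2:.3f}")
```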
Multiple Regression
Assumptions:
 Dependent variable should be measured at the interval or ratio level (continuous).
 Two or more independent variables, which can be either continuous or categorical.
 There is a linear relationship between the dependent variable and each of the independent variables.
 There should be independence of observations
 Data needs to show homoscedasticity
 Data must not show multicollinearity
 There should be no significant outliers, high leverage points or highly influential points
 Check that the residuals (errors) of the regression line are approximately normally distributed
Example:
 A health researcher wants to be able to predict "VO2max", an indicator of fitness and health
 The researcher's goal is to be able to predict VO2max based on four attributes: age, weight, heart rate and gender.
Multiple Regression
Procedure:
1. Click Analyze > Regression > Linear on the main menu
2. You will be presented with the Linear Regression dialogue box. Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart_rate and gender, into the Independent(s): box.
3. Click on the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box. Select Confidence intervals in the –Regression Coefficients– area, leaving the Level(%): option at "95".
4. Click on Continue and then OK button. This will generate the results.
Multiple Regression
Result and Interpretation:
 This table indicates that the regression model predicts the dependent variable significantly well
 The R value of 0.760 indicates a good level of prediction.
 The R2 value of 0.577 indicates that our independent variables explain 57.7% of the variability of our dependent variable, VO2max.
Multiple Regression
Result and Interpretation:
 This table presents the regression equation as:
VO2max = 87.83 – (0.165 x age) – (0.385 x weight) – (0.118 x heart_rate) + (13.208 x gender)
 A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These variables statistically
significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R2 = .577. All four variables added statistically significantly to
the prediction, p < .05.
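Multiple regression is ordinary least squares with several predictors, which can be sketched directly with NumPy. The data below are simulated (coefficients chosen to resemble the slide's equation, plus noise), so the fitted values are stand-ins, not the study's results.

```python
# OLS by hand with NumPy: VO2max on age, weight, heart rate and gender.
import numpy as np

rng = np.random.default_rng(42)
n = 100
age    = rng.uniform(20, 60, n)
weight = rng.uniform(55, 95, n)
hr     = rng.uniform(55, 85, n)                 # heart rate
gender = rng.integers(0, 2, n).astype(float)    # 1 = male (assumption)
vo2max = (87.8 - 0.17 * age - 0.38 * weight - 0.12 * hr
          + 13.2 * gender + rng.normal(0, 3, n))  # simulated outcome

X = np.column_stack([np.ones(n), age, weight, hr, gender])
coef, *_ = np.linalg.lstsq(X, vo2max, rcond=None)
pred = X @ coef
r2 = 1 - ((vo2max - pred) ** 2).sum() / ((vo2max - vo2max.mean()) ** 2).sum()
print(coef.round(3), round(r2, 3))
```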
Binary logistic Regression
Assumptions:
 Dependent variable should be measured on a dichotomous scale.
 One or more independent variables, which can be either continuous or categorical.
 There is a linear relationship between the continuous independent variables and the logit transformation of the dependent variable.
 There should be independence of observations
Example:
 A health researcher wants to be able to predict heart disease.
 The researcher's goal is to be able to predict heart disease from four attributes: age, weight, VO2max and gender.
 A case number variable, caseno, is included as an additional variable to help identify outliers.
Binary Logistic Regression
Procedure:
1. Click Analyze > Regression > Binary Logistic on the main menu
2. You will be presented with the Logistic Regression dialogue box. Transfer the dependent variable, heart_disease, into the Dependent: box and the independent variables, age, weight, VO2max and gender, into the Covariates: box.
3. Click on the Categorical button. You will be presented with the Logistic Regression: Define Categorical Variables dialogue box. Transfer the categorical independent variable, gender, from the Covariates: box to the Categorical Covariates: box.
Binary Logistic Regression
Procedure:
4. In the –Change Contrast– area, change the Reference Category: from the Last option to the First option. Then, click on the
Change button
5. Click on the Continue button. You will be returned to the Logistic Regression dialogue box.
6. Click on the Options button. You will be presented with the Logistic Regression: Options dialogue box.
In the –Statistics and Plots– area, click the Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals
and CI for exp(B): options, and in the –Display– area, click the At last step option
7. Click on Continue and then OK button. This will generate the results.
Binary Logistic Regression
Result and Interpretation:
 The model explains 33% of the variation in the dependent variable (Nagelkerke R2).
 The cut value is .500. This means that if the probability of a case being classified into the "yes" category is greater than
.500, then that particular case is classified into the "yes" category. Otherwise, the case is classified as in the "no" category.
Binary Logistic Regression
Result and Interpretation:
 You can see that age (p = .003), gender (p = .021) and VO2max (p = .039) added significantly to the model/prediction, but
weight (p = .799) did not add significantly to the model.
 Table shows that the odds of having heart disease ("yes" category) is 7.026 times greater for males as opposed to
females.
 A logistic regression was performed to ascertain the effects of age, weight, gender and VO2max on the likelihood that
participants have heart disease. The logistic regression model was statistically significant, χ2(4) = 27.402, p < .0005. The
model explained 33.0% (Nagelkerke R2) of the variance in heart disease and correctly classified 71.0% of cases. Males
were 7.02 times more likely to exhibit heart disease than females. Increasing age was associated with an increased
likelihood of exhibiting heart disease, but increasing VO2max was associated with a reduction in the likelihood of
exhibiting heart disease.
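Under the hood, logistic regression fits coefficients by maximum likelihood; the sketch below does this directly with SciPy on simulated heart-disease data (the true coefficients and the "1 = male" coding are assumptions, not the researcher's data).

```python
# Logistic regression by maximum likelihood on simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
age    = rng.uniform(30, 70, n)
vo2max = rng.uniform(20, 55, n)
gender = rng.integers(0, 2, n).astype(float)  # 1 = male (assumption)
X = np.column_stack([np.ones(n), age, vo2max, gender])

# Simulate outcomes from assumed true coefficients
true_beta = np.array([-2.0, 0.08, -0.10, 1.9])
p_true = 1 / (1 + np.exp(-X @ true_beta))
disease = (rng.random(n) < p_true).astype(float)

def neg_log_lik(beta):
    z = X @ beta
    # negative log-likelihood of the logistic model, written stably
    return np.sum(np.logaddexp(0, z) - disease * z)

fit = minimize(neg_log_lik, np.zeros(X.shape[1]), method="BFGS")
beta = fit.x
print("odds ratio, male vs female:", round(float(np.exp(beta[3])), 2))
```

Exponentiating a coefficient gives the odds ratio reported by SPSS in the Exp(B) column.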
Chi-Square Test
Assumptions:
 Two variables should be measured at an ordinal or nominal level (categorical data)
 The two variables should each consist of two or more categorical, independent groups.
Example:
 Educators are looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree
course.
 They would like to know whether gender (male/female) is associated with the preferred type of learning medium
(online vs. books).
 We have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).
Chi-Square Test
Procedure:
1. Click Analyze > Descriptives Statistics > Crosstabs... on the top menu
2. You will be presented with the Crosstabs dialogue box. Transfer one of the variables into the Row(s): box and the other variable into the Column(s): box.
3. If you want to display clustered bar charts (recommended), make sure that Display clustered bar charts checkbox is ticked.
Chi-Square Test
4. Click on the Statistics button. You will be presented with the Crosstabs: Statistics dialogue box. Select the Chi-square and Phi and Cramer's V options. Then click Continue.
5. Click on the Cells button. You will be presented with the Crosstabs: Cell Display dialogue box. Select Observed in the –Counts– area, and Row, Column and Total in the –Percentages– area. Then click Continue.
6. Click on the Format button if you wish to change the order in which the table rows are displayed.
7. Once you have made your choices, click Continue and then OK to generate the output.
Chi-Square Test
Result and Interpretation:
 Both males and females prefer to learn using online materials versus books.
 We can see here that χ2(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, Males and Females equally prefer online learning versus books.
 We can see that the strength of association between the variables is very weak (0.078).
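The crosstab analysis can be sketched with `scipy.stats.chi2_contingency`; the 2x2 counts below are hypothetical, and phi is computed from the chi-square statistic as the strength-of-association measure for a 2x2 table.

```python
# Chi-square test of association on a hypothetical 2x2 crosstab of counts.
import numpy as np
from scipy import stats

#                 online  books
table = np.array([[30, 20],   # male
                  [32, 18]])  # female

chi2, p_value, dof, expected = stats.chi2_contingency(table)
phi = np.sqrt(chi2 / table.sum())  # strength of association (2x2 table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p_value:.3f}, phi = {phi:.3f}")
```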
Mann-Whitney U Test
Assumptions:
 Dependent variable should be measured at the ordinal or continuous level.
 Independent variable should consist of two categorical, independent groups.
 It should have independence of observations.
 The Mann-Whitney U test can be used when the dependent variable is not normally distributed.
Example:
 A researcher decided to investigate whether an exercise or weight loss intervention was more effective in lowering
cholesterol levels.
 The researcher recruited a random sample of inactive males that were classified as overweight.
 This sample was then randomly split into two groups: Group 1 underwent a calorie-controlled diet (i.e., the 'diet'
group) and Group 2 undertook an exercise-training programme (i.e., the 'exercise' group).
 In order to determine which treatment programme was more effective, cholesterol concentrations were compared
between the two groups at the end of the treatment programmes.
Mann-Whitney U Test
Procedure:
1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples... on the top menu.
2. You will be presented with the Two-Independent-Samples Tests dialogue box.
Transfer the dependent variable, Cholesterol, into the Test Variable List: box and the independent variable, Group, into the
Grouping Variable: box
Mann-Whitney U Test
3. Click on the Define groups button. The button will not be clickable if you have not highlighted the Grouping Variable.
Enter 1 into the Group 1: box and enter 2 into the Group 2: box. Remember that we labelled the Diet group as 1 and the Exercise
group as 2. Then click continue.
4. If you wish to use this procedure to generate some descriptive statistics, click on the Options button and then tick Descriptive and
Quartiles within the –Statistics– area.
5. Click on the Continue button, which will bring you back to the main dialogue box with the Grouping Variable: box now completed.
6. Click on the OK button. This will generate the output for the Mann-Whitney U test.
Mann-Whitney U Test
Result and Interpretation:
 The descriptive statistics produced for the Mann-Whitney U test are not very useful here, since the data are not normally
distributed.
 The rank table is very useful because it indicates which group can be considered as having the higher cholesterol
concentrations, overall; namely, the group with the highest mean rank. In this case, the diet group had the highest
cholesterol concentrations.
 It can be concluded that cholesterol concentration in the diet group was statistically significantly higher than the exercise
group (U = 110, p = .014).
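The U statistic itself comes from rank sums. A minimal Python sketch, using hypothetical cholesterol values and assuming no tied observations (SPSS uses midranks when ties occur):

```python
def mann_whitney_u(group1, group2):
    """Mann-Whitney U from rank sums (assumes no tied values, for simplicity)."""
    combined = sorted(group1 + group2)
    rank = {v: i + 1 for i, v in enumerate(combined)}  # rank 1 = smallest value
    r1 = sum(rank[v] for v in group1)                  # rank sum of group 1
    n1, n2 = len(group1), len(group2)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)  # SPSS reports the smaller of the two U values

# Hypothetical cholesterol concentrations: diet group vs exercise group
u = mann_whitney_u([6.2, 6.5, 7.1, 6.8], [5.4, 5.9, 6.0, 5.1])
```

A small U relative to n1 × n2 indicates strong separation between the two groups' ranks.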
Sign Test
Assumptions:
 Dependent variable should be measured at the ordinal or continuous level.
 Independent variable should consist of two categorical, "related groups" or "matched pairs".
 The paired observations for each participant need to be independent.
 The difference scores (i.e., differences between the paired observations) are from a continuous distribution.
Example:
 A researcher wants to test a new formula for a sports drink that improves running performance.
 He recruited 20 participants who each performed two trials in which they had to run as far as possible in two hours on a
treadmill.
 In one of the trials they drank the carbohydrate-only drink and in the other trial they drank the carbohydrate-protein
drink.
 The order of the trials was counterbalanced and the distance they ran in both trials was recorded.
Sign Test
Procedure:
1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples... on the main menu.
2. You will be presented with the Two-Related-Samples Tests dialogue box.
3. Transfer the variables carb and carb_protein into the Test Pairs: box by highlighting both variables.
4. Deselect Wilcoxon and select Sign in the –Test Type– area.
5. Click on the OK button to generate the output.
Sign Test
Result and Interpretation:
 You can see how many participants decreased (the "Negative Differences" row), improved (the "Positive Differences"
row) or witnessed no change (the "Ties" row) in their performance in the carbohydrate-protein trial (i.e., carb_protein)
compared to the carbohydrate-only trial (i.e., carb) in Frequencies table.
 The statistical significance (i.e., p-value) of the sign test is found in the "Exact Sig. (2-tailed)" row of the Test Statistics
table. However, if you had more than a total of 25 positive and negative differences, an "Asymp. Sig. (2-sided test)" row would be
displayed instead.
 Twenty participants were recruited to understand the performance benefits of a carbohydrate-protein versus
carbohydrate-only drink on running performance as measured by the distance run in two hours on a treadmill. An exact
sign test was used to compare the differences in distance run in the two trials. The carbohydrate-protein drink elicited a
statistically significant median increase in distance run (0.113 km) compared to the carbohydrate-only drink, p = .004.
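The exact sign test is simply a two-sided binomial test with p = 0.5 applied to the positive and negative differences (ties are dropped). A sketch in Python, with hypothetical counts:

```python
from math import comb

def exact_sign_test(pos, neg):
    """Two-sided exact sign test p-value; ties are excluded beforehand."""
    n = pos + neg
    k = min(pos, neg)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical: 15 participants improved, 3 declined, ties already excluded
p = exact_sign_test(15, 3)
```

The more lopsided the split between improvements and declines, the smaller the p-value.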
Kruskal-Wallis Test
Assumptions:
 Dependent variable should be measured at the ordinal or continuous level.
 Independent variable should consist of two or more categorical, independent groups.
 It should have independence of observations.
Example:
 The researcher identifies 3 well-known anti-depressive drugs which might have the positive side effect of reducing back
pain, and labels them Drug A, Drug B and Drug C.
 The researcher then recruits a group of 60 individuals with a similar level of back pain and randomly assigns them to one
of three groups – Drug A, Drug B or Drug C treatment groups – and prescribes the relevant drug for a 4 week period.
 At the end of the 4 week period, the researcher asks the participants to rate their back pain on a scale of 1 to 10, with
10 indicating the greatest level of pain.
 The researcher wants to compare the levels of pain experienced by the different groups at the end of the drug
treatment period.
Kruskal-Wallis Test
Procedure:
1. Click Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples... on the top menu.
2. You will be presented with the Tests for Several Independent Samples dialogue box.
Transfer the dependent variable, Pain_Score, into the Test Variable List: box and the independent variable,
Drug_Treatment_Group, into the Grouping Variable: box.
Kruskal-Wallis Test
3. Click on the Define range button. You will be presented with the "Several Independent Samples: Define Range" dialogue box.
Enter "1" into the Minimum: box and "3" into the Maximum box. These values represent the range of codes you gave the groups
of the independent variable, Drug_Treatment_Group (i.e., Drug A was coded "1" through to Drug C which was coded "3").
4. Click on the Continue button and you will be returned to the "Tests for Several Independent Samples" dialogue box, but now with
a completed Grouping Variable: box
5. Click on the OK button to generate the output.
Kruskal-Wallis Test
Result and Interpretation:
 The mean rank of the Pain_Score for each drug treatment group can be used to compare the effect of the different drug
treatments. Whether these drug treatment groups have different pain scores can be assessed using the Test Statistics table
which presents the result of the Kruskal-Wallis H test.
 A Kruskal-Wallis H test showed that there was a statistically significant difference in pain score between the different drug
treatments, χ²(2) = 8.520, p = 0.014, with a mean rank pain score of 35.33 for Drug A, 34.83 for Drug B and 21.35 for Drug C.
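The H statistic is computed from the rank sums of the groups. A minimal Python sketch with hypothetical pain scores, assuming no tied values (SPSS additionally applies a tie correction):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from rank sums (assumes no tied values)."""
    combined = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(combined)}
    n_total = len(combined)
    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    return h

# Hypothetical pain scores for Drug A, Drug B and Drug C
h = kruskal_wallis_h([7, 8, 9], [6, 5, 10], [1, 2, 3])
```

Under the null hypothesis, H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom, which is why SPSS reports it as χ² with 2 df here.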
Descriptives & Frequencies
A. Descriptives
Procedure:
1. Click Analyze > Descriptive Statistics > Descriptives on the top menu.
2. You will be presented with the Descriptives dialogue box. Transfer the dependent variable, Income, into the Variable(s): box.
3. Click on the Options button. Tick every measure to be displayed. Click Continue and then OK.
Descriptives & Frequencies
Result and Interpretation:
 The Descriptive Statistics table displays all the measures selected.
 Descriptives displays results for continuous variables only.
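The same central-tendency and variability measures can be reproduced with Python's statistics module; the income values below are made up for illustration:

```python
import statistics as st

# Hypothetical income values, standing in for the Income variable
income = [1200, 1500, 1500, 1800, 2100, 2400, 3000]

summary = {
    "mean": st.mean(income),
    "median": st.median(income),
    "mode": st.mode(income),
    "range": max(income) - min(income),
    "variance": st.variance(income),   # sample variance, as SPSS reports
    "std_dev": st.stdev(income),
}
```

Note that `variance` and `stdev` use the n − 1 (sample) denominator, matching the Descriptives output.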
Descriptives & Frequencies
B. Frequencies
Procedure:
1. Click Analyze > Descriptive Statistics > Frequencies on the top menu.
2. You will be presented with the Frequencies dialogue box. Transfer both the independent and dependent variables (e.g., Income)
into the Variable(s): box.
3. Click on the Statistics button. Tick every measure to be displayed and click Continue.
4. Click on the Charts button. Select any chart you want. In the –Chart Values– area, click either Frequencies or Percentages.
5. Click Continue and then OK to generate the output.
Descriptives & Frequencies
Result and Interpretation:
 The Frequencies output displays all the selected measures, along with the chosen chart.
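A frequency table with percentages, similar to the Frequencies output, can be sketched with a Counter; the responses below are hypothetical:

```python
from collections import Counter

# Hypothetical categorical responses, standing in for one SPSS variable
data = ["online", "books", "online", "online", "books", "online"]

counts = Counter(data)
# Map each category to (frequency, percent of total), like the Frequencies table
table = {k: (v, round(100 * v / len(data), 1)) for k, v in counts.items()}
```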
Test of Normality and Outliers
Normality and Outliers:
 Skewness and Kurtosis:
The skewness and kurtosis measures should be as close to zero as possible. Dividing each measure by its standard error
gives a z-score, which should lie between -1.96 and +1.96.
 Kolmogorov-Smirnov Test & Shapiro-Wilk Test:
If the Sig. value of the Shapiro-Wilk test is > 0.05, the data are normal. If it is below 0.05, the data deviate significantly
from a normal distribution.
 Histograms and Box plots:
Look at the graphical figures.
Histogram should have the approximate shape of a normal curve.
In Q-Q Plot, the dots should be approximately distributed along the line.
Box plot should be approximately symmetrical.
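The skewness z-score rule above can be sketched in Python. The data are hypothetical, and the standard-error formula used is the large-sample expression commonly quoted alongside SPSS's adjusted (Fisher-Pearson) skewness:

```python
import math

def skewness_z(data):
    """Skewness z-score: adjusted sample skewness divided by its Std. Error."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    # Adjusted Fisher-Pearson skewness, as reported by SPSS
    g1 = sum(((x - mean) / s) ** 3 for x in data) * n / ((n - 1) * (n - 2))
    # Approximate standard error of skewness
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1 / se

# Hypothetical sample with one large value pulling the tail to the right
z = skewness_z([2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 6.5])
```

|z| ≤ 1.96 is consistent with normality at the 5% level; a perfectly symmetric sample gives z = 0.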
Process Involved:
In statistics, normality tests are used to determine whether a data set is well-modeled by a normal distribution and to
compute how likely it is for a random variable underlying the data set to be normally distributed. It is essential to run a
normality test before proceeding to analysis.
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to
variability in the measurement or it may indicate experimental error.
Test of Normality and Outliers
Procedure:
1. Click Analyze > Descriptive Statistics > Explore on the top menu.
2. You will be presented with the Explore dialogue box. Transfer the dependent variable, Course, into the Dependent List: box.
Test of Normality and Outliers
3. Click on the Statistics button. Select Descriptives and Outliers and click Continue.
4. Click on the Plots button. Select Stem-and-leaf, Histogram and Normality plots with tests. Then click Continue and OK.
Test of Normality and Outliers
Result and Interpretation:
 The z-values, calculated by dividing the skewness and kurtosis measures by their respective Std. Errors, do not lie
between -1.96 and +1.96. Thus the data are not normally distributed.
 The p-value of .004 in the Shapiro-Wilk test is less than 0.05, which also indicates that the data are not normally distributed.
Test of Normality and Outliers
Result and Interpretation:
 Both the box plot and the Q-Q plot show that the data are not normally distributed.
 In the box plot, a circle marks a mild outlier whereas a star marks an extreme outlier, and the number next to each
marker is the case number. Ex: case 9 has a course time of 2.00 hours (consult Fig 1 in the procedure).
Dealing with Outliers
1. Leave it in if it is a legitimate outlier, and use a non-parametric test on the skewed data.
2. Correct data entry errors.
3. Winsorize it to match the highest value in the rest of the data.
4. Throw it out only if necessary; this is better for multivariate analysis.
5. Transform it, for example by taking logarithms. Ex: 2, 3, 4, 5, 6, 10, 100 → 0.30, 0.48, 0.60, 0.70, 0.78, 1, 2
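The log transformation in point 5 can be checked directly (base-10 logs of the example values above):

```python
import math

# Log transform compresses large outliers toward the rest of the data
values = [2, 3, 4, 5, 6, 10, 100]
logged = [round(math.log10(v), 2) for v in values]
```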
Basic way to winsorize data:
Consider the data set consisting of:
{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 101.5)
The data below the 5th percentile lie between −40 and −5, while the data above the 95th percentile lie between 101
and 1053. A 90% winsorization would then result in the following:
{92, 19, 101, 58, 101, 91, 26, 78, 10, 13, −5, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 55.65)
[For large data you can use Frequencies test in SPSS to winsorize data]
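The 90% winsorization above can be reproduced with a small helper. This is a sketch of one common convention: replace the k most extreme values on each tail with the nearest remaining value:

```python
def winsorize(data, fraction=0.05):
    """Replace the k most extreme values on each tail with the nearest
    remaining value, where k = int(len(data) * fraction)."""
    k = int(len(data) * fraction)
    ordered = sorted(data)
    low, high = ordered[k], ordered[-k - 1]  # clamping boundaries
    return [min(max(x, low), high) for x in data]

# The data set from the example above (N = 20, mean = 101.5)
data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40,
        101, 86, 85, 15, 89, 89, 28, -5, 41]
result = winsorize(data)  # 90% winsorization: k = 1 value clamped per tail
```

Here 1053 is clamped to 101 and −40 to −5, reproducing the winsorized mean of 55.65 given above.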
  • 2. Statistics Statistics:  Science dealing with data collection, organization, analysis, interpretation and presentation.  Numbers that are used to describe data or relationship.  Study which develop critical thinking and analytic skills.
  • 5. Statistical Analysis Types of Statistical Analysis: Descriptive Statistics:  Organizing and summarizing data using number and graphs.  Data Summary: Bar graphs, Histograms, Pie-charts, Shape of graph and skewness  Measures of Central Tendency: Mean, Median, Mode  Measures of variability: Range, Variance and Standard Deviation  Ex: What is the average age of the sample? Inferential Statistics:  Using sample data to make an inference or draw a conclusion of the population.  Use probability to determine how confident we can be that the conclusions we make are correct. (Confidence Intervals and Margins of Error)  t-Test, Chi-square, and ANOVA (analysis of variance)  Ex: Is the average age of population different from 35?
  • 8. Level of Measurement Types of Level of Measurement: Qualitative or Categorical:  Nominal: Two or more categories and lack order. Ex: Gender; 1= male, 2= female, Football jersey number.  Ordinal: Two or more categories and have order but differences are not meaningful. Ex: Small, Medium and Large farmers, Place finished in a race: 1st, 2nd, 3rd , and so on.  Non-Parametric Statistical methods. Quantitative or Continuous:  Interval: Quantitative variables having own scale. It lacks ‘0’ and divisions are not meaningful. Ex: temperature in Celsius.  Ratio: Quantitative variables with true zero point and divisions are meaningful. Ex: Yield in quintal, Germination, etc.  Parametric Statistical methods.
  • 11. Variable Analysis Types of Variable Analysis: Univariate Analysis:  It uses only one variable to simply describe what is in a result.  Its purpose is more toward descriptive rather than explanatory.  Frequencies, Histograms, Means, Medians, Modes, Ranges, Percentiles, Scatterplots, Confidence Intervals.  Ex: Gender Bivariate Analysis:  It compares or contrasts the effect of two variables on each other.  Crosstabs, Chi-square, ANOVAs, t-Tests, Correlations.  Ex: Gender and GPA value Multivariate Analysis:  It tests three or more variables together to check for all kinds of effects that occur together.  OLS regression, Binary logistic regression, and other types of regressions.  Gender, GPA value and prejudice
  • 14. Statistical Tests Types of Statistical test: Parametric test:  Specific assumptions are made about the population parameter.  It is powerful if existed.  Test statistics is based on distribution.  Central measure- mean.  Can draw more conclusions.  One way ANOVA, Independent sample t-test, Paired sample t-test, correlation etc.  Ex: Length and weight of human body.  Conditions to be satisfied: Data must be interval/ratio. Subjects should be randomly selected. Data should be normally distributed. Variation in the results should be roughly same.
  • 15. Statistical Tests Types of Statistical test: Non-Parametric test:  Distribution free test: Does not assume that our data follow a specific distribution.  Not powerful like parametric test.  Test statistics is arbitrary.  Central measure: Median.  Simplicity, not affected by outliers.  Chi-square, Friedman test, Kruskal wallis test, Sign test etc.  Ex: Did fungus growth take place or not?  Conditions to be satisfied: Data must be nominal/ordinal. Non normal distribution of data.
  • 16. Statistical Tests Parametric Vs. Non-Parametric test: Study Type Parametric test Non parametric test Compare means between two distinct/independent groups Two sample t-test Mann-whitney test Compare two quantitative measurements taken from the same individual Paired t-test Wilcoxon signed-ranked test Compare means between three or more distinct/independent groups Analysis of variance (ANOVA) Kruskal wallis test Repeated measures, >two conditions One way, repeated measures ANOVA Friedman’s test Estimate the degree of association between two quantitative variables Pearson Coefficient of correlation Spearman’s rank correlation
  • 19. Choosing right statistical test Types of Statistical test: Regression test (test cause and effect relationship): Predictor variable Outcome variable Research question example Simple linear regression • Continuous • One predictor • Continuous • One outcome What is the effect of income on longevity? Multiple linear regression • Continuous • Two or more predictors • Continuous • One outcome What is the effect of income and minutes of exercise per day on longevity? Logistic regression • Continuous • Binary What is the effect of drug dosages on the survival of test subjects?
  • 20. Choosing right statistical test Types of Statistical test: Comparison test (differences among group means): Predictor variable Outcome variable Research question example Paired t-test • Categorical • One predictor • Quantitative • Groups come from the same population What is the effect of two different test prep programs on the average exam scores for students from the same class? Independent t-test • Categorical • One predictor • Quantitative • Groups come from the different population What is the difference in average exam scores for students from two different schools? ANOVA • Categorical • One or more predictor • Quantitative • One outcome What is the difference in average pain levels among post-surgical patients given three different painkillers?
  • 21. Choosing right statistical test Types of Statistical test: Correlation test (Check whether two variables are related ): Predictor variable Outcome variable Research question example Pearson • Continuous • Continuous How are latitude and temperature related? Chi-square • Categorical • Categorical How is membership in a sports team related to membership in drama club among high school students?
  • 22. Choosing right statistical test Types of Statistical test: Non-Parametric test: Predictor variable Outcome variable Use in place of Spearman •Ordinal •Ordinal Regression and correlation tests Sign test •Categorical •Quantitative T-test Kruskal–Wallis •Categorical •3 or more groups •Quantitative ANOVA Wilcoxon Rank- Sum test •Categorical •2 groups •Quantitative •Groups come from different populations Independent t-test Wilcoxon Signed- rank test •Categorical •2 groups •Quantitative •Groups come from the same population Paired t-test
  • 24. One sample t-test Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous).  The data are independent.  There should be no significant outliers.  Dependent variable should be approximately normally distributed.  Sample are small, mostly lower than 30. Example:  A researcher is planning a psychological intervention study to test depression index, where anyone who achieves a score of 4.0 is deemed to have 'normal' levels of depression.  Lower scores indicate less depression and higher scores indicate greater depression.  He has recruited 40 participants to take part in the study.  Depression scores are recorded in the variable dep_score.  He wants to know whether his sample is representative of the normal population. Onesamplet-test
  • 25. One sample t-test Procedure: 1. Click Analyze > Compare Means > One-Sample T Test... on the main menu. 2. It will be presented with the One-Sample T Test dialogue box, as shown below: 1. 2.
  • 26. One sample t-test 3. Transfer the dependent variable, dep_score, into the Test Variable(s). Enter the population mean you are comparing the sample against in the Test Value: box, by changing the current value of "0" to "4". You will end up with the screen. 4. Click on the Options button. You will be presented with the One-Sample T Test: Options dialogue box, as shown below: 5. Click on the Continue button. You will be returned to the One-Sample T Test dialogue box. 6. Click on the OK button to generate the output. 3. 4.
  • 27. One sample t-test Result and Interpretation:  In this example, p < .05 (0.022) therefore, it can be concluded that the population means are statistically significantly different.  If p > .05, the difference between the sample-estimated population mean and the comparison population mean would not be statistically significantly different.  Depression score was statistically significantly lower than the population normal depression score, t(39) = -2.381, p = .022.  There was a statistically significant difference between means (p < .05). Therefore, we can reject the null hypothesis and accept the alternative hypothesis.
  • 28. Independent sample t-test Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous).  Independent variable should consist of two categorical, independent groups.  The data are independent.  There should be no significant outliers.  Dependent variable should be approximately normally distributed.  There needs to be homogeneity of variances. Example:  A researcher decided to investigate whether an exercise or weight loss intervention is more effective in lowering cholesterol levels.  To this end, the researcher recruited a random sample of inactive males that were classified as overweight.  This sample was then randomly split into two groups:  Group 1 underwent a calorie-controlled diet and Group 2 undertook the exercise-training programme.  In order to determine which treatment programme was more effective, the mean cholesterol concentrations were compared between the two groups at the end of the treatment programmes. Independent samplet-test
  • 29. Independent sample t-test Procedure: 1. Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, as shown below: 2. You will be presented with the Independent-Samples T Test dialogue box, as shown below: 1. 2.
  • 30. Independent sample t-test 3. Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer the independent variable, Treatment, into the Grouping Variable: box. 3. 4. 4. You then need to define the groups (treatments). Click on the Define Options Button. You will be presented with the Define Groups dialogue box 5. Enter "1" into the Group 1: box and enter "2" into the Group 2: box. Remember that we labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2. 6. Click the Continue Button. 7. Click the Ok Button. 5.
  • 31. Independent sample t-test Result and Interpretation:  You can see that the group means are statistically significantly different because the value in the "Sig. (2-tailed)" row is less than 0.05.  Looking at the Group Statistics table, we can see that those people who undertook the exercise trial had lower cholesterol levels at the end of the programme than those who underwent a calorie-controlled diet.  This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training programme compared to after a calorie-controlled diet (6.15 ± 0.52 mmol/L), t(38)=2.428, p=0.020.
  • 32. Paired t-test Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous).  Independent variable should consist of two categorical, related groups or matched pairs.  There should be no significant outliers.  The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed. Example:  A group of Sports Science students (n = 20) are selected from the population to investigate whether a 12-week plyometric-training programme improves their standing long jump performance.  In order to test whether this training improves performance, the students are tested for their long jump performance before they undertake a plyometric-training programme and then again at the end of the programme (i.e., the dependent variable is "standing long jump performance", and the two related groups are the standing long jump values "before" and "after" the 12-week plyometric-training programme). Paired t-test
  • 33. Paired t-test Procedure: 1. Click Analyze > Compare Means > Paired Samples T Test... on the top menu, as shown below: 2. You will be presented with the Paired-Samples T Test dialogue box, as shown below: 1. 2.
  • 34. Paired t-test 3. Transfer the variables JUMP1 and JUMP2 into the Paired Variables: box. 3. 4. Click the Continue Button. 5. Click the Ok Button.
  • 35. Paired t-test Result and Interpretation:  We can conclude that there was a statistically significant improvement in jump distance following the plyometric- training programme from 2.48 ± 0.16 m to 2.52 ± 0.16 m (p < 0.0005); an improvement of 0.03 ± 0.03 m.
  • 36. One way ANOVA Assumptions:  Dependent variable should be measured at interval or scale ratio (continuous)  Independent variable should consist of two or more categorical, independent groups  Independence of observations  There need to be homogeneity of variances  There should be no significant outlier  Dependent variable should be approximately normally distributed for each category of the independent variable. Example:  A manager employs an external agency which provides training in the spreadsheet program for his employees.  They offer 3 courses: a beginner, intermediate and advanced course.  He is unsure which course is needed for the type of work they do at his company, so he sends 10 employees on the beginner course, 10 on the intermediate and 10 on the advanced course.  When they all return from the training, he gives them a problem to solve using the spreadsheet program, and times how long it takes them to complete the problem.  He then compares the three courses (beginner, intermediate, advanced) to see if there are any differences in the average time it took to complete the problem. OnewayANOVA
  • 37. One way ANOVA Procedure: 1. Click Analyze > Compare Means > One way ANOVA on the top menu, as shown below: 2. You will be presented with the One way ANOVA dialogue box, as shown below: 1. 2.
  • 38. One way ANOVA 3. Transfer the dependent variable, Time, into the Test Variable(s): box, and transfer the independent variable, course into the Factor box. 3. 4. 4. Click on the Post hoc button. Tick the Tukey checkbox as shown below: 5. Click the Continue Button. 8. Click the Ok Button. 6. 6. Click on the Options button. Tick the Descriptive checkbox in the –Statistics– area, as shown below: 7. Click the Continue Button.
  • 39. One way ANOVA Result and Interpretation:
  • 40. One way ANOVA Result and Interpretation:  We can see from the table that there is a statistically significant difference in time to complete the problem between the group that took the beginner course and the intermediate course (p = 0.046), as well as between the beginner course and advanced course (p = 0.034). However, there were no differences between the groups that took the intermediate and advanced course (p = 0.989).  There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p = .021). A Tukey post hoc test revealed that the time to complete the problem was statistically significantly lower after taking the intermediate (23.6 ± 3.3 min, p = .046) and advanced (23.4 ± 3.2 min, p = .034) course compared to the beginners course (27.2 ± 3.0 min). There was no statistically significant difference between the intermediate and advanced groups (p = .989).  We can see that the significance value is 0.021 (i.e., p = .021), which is below 0.05. and, therefore, there is a statistically significant difference in the mean length of time to complete the spreadsheet problem between the different courses taken
  • 41. Two way ANOVA Assumptions:  Dependent variable should be measured at the interval or ratio level (continuous).  Independent variables should each consist of two or more categorical, independent groups.  Independence of observations.  There needs to be homogeneity of variances.  There should be no significant outliers.  Dependent variable should be approximately normally distributed for each combination of the two independent variables. Example:  A researcher was interested in whether an individual's interest in politics was influenced by their level of education and gender.  They recruited a random sample of participants to their study and asked them about their interest in politics, which they scored from 0 to 100, with higher scores indicating a greater interest in politics.  The researcher then divided the participants by gender (Male/Female) and then again by level of education (School/College/University).  Therefore, the dependent variable was "interest in politics", and the two independent variables were "gender" and "education".
  • 42. Two way ANOVA Procedure: 1. For Gender, code "males" as 1 and "females" as 2, and for Educational Level, code "school" as 1, "college" as 2 and "university" as 3. 2. Click Analyze > General Linear Model > Univariate on the top menu, as shown below: 3. You will be presented with the Univariate dialogue box, as shown below:
  • 43. Two way ANOVA 4. Transfer the dependent variable, Int_Politics, into the Dependent Variable: box, and transfer both independent variables, Gender and Edu_Level, into the Fixed Factor(s): box. 5. Click on the Plots button. You will be presented with the Univariate: Profile Plots dialogue box, as shown below: 6. Transfer the independent variable, Edu_Level, from the Factors: box into the Horizontal Axis: box, and transfer the other independent variable, Gender, into the Separate Lines: box.
  • 44. Two way ANOVA 7. Click on the Add button. You will see that "Edu_Level*Gender" has been added to the Plots: box. 8. Click on the Continue button. This will return you to the Univariate dialogue box. 9. Click on the Post Hoc button. You will be presented with the Univariate: Post Hoc Multiple Comparisons for Observed Means dialogue box. Transfer Edu_Level from the Factor(s): box to the Post Hoc Tests for: box and select Tukey. 10. Click on the Continue button to return to the Univariate dialogue box. 11. Click on the Options button. This will present you with the Univariate: Options dialogue box. Transfer Gender, Edu_Level and Gender*Edu_Level from the Factor(s) and Factor Interactions: box into the Display Means for: box. In the –Display– area, tick the Descriptive Statistics option. Then click Continue and OK.
  • 45. Two way ANOVA Result and Interpretation:  You can see from this graph that the lines do not appear to be parallel (with the lines actually crossing). You might expect there to be a statistically significant interaction.  We have a statistically significant interaction at the p = .014 level, as seen in the figure.  We can see from the table above that there was no statistically significant difference in mean interest in politics between males and females (p = .207), but there were statistically significant differences between educational levels (p < .0005).
  • 46. Two way ANOVA Result and Interpretation:  We can see that there is a statistically significant difference between all three different educational levels (p < .0005).  A two-way ANOVA was conducted that examined the effect of gender and education level on interest in politics. There was a statistically significant interaction between the effects of gender and education level on interest in politics, F (2, 54) = 4.643, p = .014.  Simple main effects analysis showed that males were significantly more interested in politics than females when educated to university level (p = .002), but there were no differences between gender when educated to school (p = .465) or college level (p = .793).
  • 47. Pearson’s Correlation Assumptions:  Two variables should be measured at the interval or ratio level (continuous)  There is a linear relationship between your two variables.  There should be no significant outliers.  Variables should be approximately normally distributed. Example:  A researcher wants to know whether a person's height is related to how well they perform in a long jump.  The researcher recruited untrained individuals from the general population, measured their height and had them perform a long jump.  The researcher then investigated whether there was an association between height and long jump performance by running a Pearson's correlation. Range of correlation coefficient ‘r’:  -1.0 to -0.7 Strong negative association  -0.7 to -0.3 Weak negative association  -0.3 to +0.3 little (or no) association  +0.3 to +0.7 Weak positive association  +0.7 to +1 Strong positive association
  • 48. Pearson’s Correlation Procedure: 1. Click Analyze > Correlate > Bivariate on the main menu. 2. You will be presented with the Bivariate Correlations dialogue box.
  • 49. Pearson’s Correlation 3. Transfer the variables Height and Jump_Dist into the Variables: box. You will end up with a screen similar to the one below: 4. Make sure that the Pearson checkbox is selected in the –Correlation Coefficients– area. 5. Click on the Options button and you will be presented with the Bivariate Correlations: Options dialogue box. If you wish to generate some descriptive statistics, you can do so here by clicking the relevant checkbox in the –Statistics– area. 6. Click on Continue and then the OK button to generate the output.
  • 50. Pearson’s Correlation Result and Interpretation:  We can see that the Pearson correlation coefficient, r, is 0.706, and that it is statistically significant (p = 0.005).  A Pearson product-moment correlation was run to determine the relationship between height and distance jumped in a long jump. There was a strong, positive correlation between height and distance jumped, which was statistically significant (r = .706, n = 14, p = .005).
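As a cross-check outside SPSS, the Pearson coefficient and its p-value can be computed with scipy. The heights and jump distances below are hypothetical (the slides' raw data are not shown), chosen only to produce a strong positive correlation like the one reported:

```python
from scipy import stats

# Hypothetical heights (cm) and long-jump distances (m) for 14 people
height = [160, 165, 170, 172, 168, 175, 180, 178, 182, 185, 163, 177, 171, 188]
jump   = [4.1, 4.3, 4.5, 4.6, 4.2, 4.8, 5.0, 4.9, 5.1, 5.3, 4.0, 4.7, 4.4, 5.4]

# Pearson product-moment correlation and two-sided p-value
r, p = stats.pearsonr(height, jump)
print(f"r = {r:.3f}, p = {p:.4f}")
```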
  • 51. Linear Regression Assumptions:  Two variables should be measured at the interval or ratio level (continuous)  There is a linear relationship between your two variables.  There should be independence of observations  Data needs to show homoscedasticity  Check that the residuals (errors) of the regression line are approximately normally distributed Example:  A salesperson for a large car brand wants to determine whether there is a relationship between an individual's income and the price they pay for a car.  As such, the individual's "income" is the independent variable and the "price" they pay for a car is the dependent variable.  The salesperson wants to use this information to determine which cars to offer potential customers in new areas where average income is known.
  • 52. Linear Regression Procedure: 1. Click Analyze > Regression > Linear on the main menu. 2. You will be presented with the Linear Regression dialogue box. 3. Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the Dependent: box. 4. Click on the OK button. This will generate the results.
  • 53. Linear Regression Result and Interpretation:  This table indicates that the regression model predicts the dependent variable significantly well  The R value represents the simple correlation and is 0.873, which indicates a high degree of correlation.  This table presents the regression equation as: Price = 8287 + 0.564(Income)  The R2 value indicates how much of the total variation in the dependent variable, Price, can be explained by the independent variable, Income. In this case, 76.2% can be explained, which is very large.
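The same simple regression can be sketched in Python with `scipy.stats.linregress`. The income/price pairs below are hypothetical (not the slides' dataset), constructed so the fitted slope lands near the reported 0.564:

```python
from scipy import stats

# Hypothetical income and car-price data (not the slides' dataset)
income = [20000, 30000, 40000, 50000, 60000, 70000, 80000]
price  = [18000, 26000, 30000, 38000, 40000, 48000, 52000]

# Ordinary least squares fit of price on income
res = stats.linregress(income, price)
print(f"Price = {res.intercept:.0f} + {res.slope:.3f} * Income")
print(f"R^2 = {res.rvalue**2:.3f}")
```

Squaring `res.rvalue` gives the R2 ("variance explained") figure that SPSS reports in its Model Summary table.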
  • 54. Multiple Regression Assumptions:  Dependent variable should be measured at the interval or ratio level (continuous)  Two or more independent variables, which can be either continuous or categorical.  There is a linear relationship between the dependent variable and each of the independent variables.  There should be independence of observations  Data needs to show homoscedasticity  Data must not show multicollinearity  There should be no significant outliers, high leverage points or highly influential points  Check that the residuals (errors) of the regression line are approximately normally distributed Example:  A health researcher wants to be able to predict "VO2max", an indicator of fitness and health  The researcher's goal is to be able to predict VO2max based on four attributes: age, weight, heart rate and gender.
  • 55. Multiple Regression Procedure: 1. Click Analyze > Regression > Linear on the main menu. 2. You will be presented with the Linear Regression dialogue box. Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart_rate and gender into the Independent(s): box. 3. Click on the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box. Select Confidence intervals in the –Regression Coefficients– area, leaving the Level(%): option at "95". 4. Click on Continue and then the OK button. This will generate the results.
  • 56. Multiple Regression Result and Interpretation:  This table indicates that the regression model predicts the dependent variable significantly well  The R value of 0.760 indicates a good level of prediction/correlation.  The R2 value of 0.577 indicates that our independent variables explain 57.7% of the variability of our dependent variable, VO2max.
  • 57. Multiple Regression Result and Interpretation:  This table presents the regression equation as: VO2max = 87.83 – (0.165 x age) – (0.385 x weight) – (0.118 x heart_rate) + (13.208 x gender)  A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R2 = .577. All four variables added statistically significantly to the prediction, p < .05.
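Multiple regression is ordinary least squares with a multi-column design matrix. The sketch below simulates data around the slides' fitted equation (purely illustrative: the predictor ranges and noise level are assumptions, not the study's data) and recovers the coefficients with numpy:

```python
import numpy as np

# Simulated data built around the slides' fitted equation (illustrative only)
rng = np.random.default_rng(0)
n = 100
age        = rng.uniform(20, 60, n)
weight     = rng.uniform(60, 100, n)
heart_rate = rng.uniform(120, 180, n)
gender     = rng.integers(0, 2, n).astype(float)
vo2max = (87.83 - 0.165 * age - 0.385 * weight - 0.118 * heart_rate
          + 13.208 * gender + rng.normal(0, 2, n))

# OLS fit: design matrix with an intercept column, solved by least squares
X = np.column_stack([np.ones(n), age, weight, heart_rate, gender])
coef, *_ = np.linalg.lstsq(X, vo2max, rcond=None)
print("intercept, age, weight, heart_rate, gender:", np.round(coef, 3))
```

With 100 simulated cases the estimated coefficients land close to the generating values, which is the sense in which SPSS's Coefficients table "recovers" the underlying relationship.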
  • 58. Binary logistic Regression Assumptions:  Dependent variable should be measured on a dichotomous scale.  One or more independent variables, which can be either continuous or categorical.  There is a linear relationship between any continuous independent variables and the logit of the dependent variable.  There should be independence of observations Example:  A health researcher wants to be able to predict heart disease.  The researcher's goal is to be able to predict heart disease from four attributes: age, weight, VO2max and gender.  A case number variable, caseno, is also included, which helps to identify outliers.
  • 59. Binary Logistic Regression Procedure: 1. Click Analyze > Regression > Binary Logistic on the main menu. 2. You will be presented with the Logistic Regression dialogue box. Transfer the dependent variable, heart disease, into the Dependent: box and the independent variables, age, weight, VO2max and gender into the Covariates: box. 3. Click on the Categorical button. You will be presented with the Logistic Regression: Define Categorical Variables dialogue box. Transfer the categorical independent variable, gender, from the Covariates: box to the Categorical Covariates: box.
  • 60. Binary Logistic Regression Procedure: 4. In the –Change Contrast– area, change the Reference Category: from the Last option to the First option. Then click on the Change button. 5. Click on the Continue button. You will be returned to the Logistic Regression dialogue box. 6. Click on the Options button. You will be presented with the Logistic Regression: Options dialogue box. In the –Statistics and Plots– area, tick the Classification plots, Hosmer-Lemeshow goodness-of-fit, Casewise listing of residuals and CI for exp(B) options, and in the –Display– area, select the At last step option. 7. Click on Continue and then the OK button. This will generate the results.
  • 61. Binary Logistic Regression Result and Interpretation:  33% of the variation in the dependent variable is explained by the independent variables.  The cut value is .500. This means that if the probability of a case being classified into the "yes" category is greater than .500, then that particular case is classified into the "yes" category; otherwise, it is classified into the "no" category.
  • 62. Binary Logistic Regression Result and Interpretation:  You can see that age (p = .003), gender (p = .021) and VO2max (p = .039) added significantly to the model/prediction, but weight (p = .799) did not add significantly to the model.  The table shows that the odds of having heart disease ("yes" category) are 7.026 times greater for males than for females.  A logistic regression was performed to ascertain the effects of age, weight, gender and VO2max on the likelihood that participants have heart disease. The logistic regression model was statistically significant, χ2(4) = 27.402, p < .0005. The model explained 33.0% (Nagelkerke R2) of the variance in heart disease and correctly classified 71.0% of cases. Males were 7.02 times more likely to exhibit heart disease than females. Increasing age was associated with an increased likelihood of exhibiting heart disease, but increasing VO2max was associated with a reduction in the likelihood of exhibiting heart disease.
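The exp(B) column SPSS reports is simply an odds ratio. As a sanity check on the interpretation, an odds ratio near 7 arises from a 2x2 table like the hypothetical one below (these counts are made up for illustration, not the study's data):

```python
# Hypothetical 2x2 table: heart disease (yes/no) by gender
male_yes, male_no     = 30, 20
female_yes, female_no = 9, 42

# Odds ratio: odds of disease for males divided by odds for females
odds_ratio = (male_yes / male_no) / (female_yes / female_no)
print(f"odds ratio = {odds_ratio:.2f}")  # 7.00
```

In the full logistic model the exp(B) for gender is this same quantity, adjusted for the other covariates.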
  • 63. Chi-Square Test Assumptions:  Two variables should be measured at an ordinal or nominal level (categorical data)  Both variables should consist of two or more categorical, independent groups. Example:  Educators are looking for novel ways in which to teach statistics to undergraduates as part of a non-statistics degree course.  They would like to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books).  We have two nominal variables: Gender (male/female) and Preferred Learning Medium (online/books).
  • 64. Chi-Square Test Procedure: 1. Click Analyze > Descriptive Statistics > Crosstabs... on the top menu. 2. You will be presented with the Crosstabs dialogue box. Transfer one of the variables into the Row(s): box and the other variable into the Column(s): box. 3. If you want to display clustered bar charts (recommended), make sure that the Display clustered bar charts checkbox is ticked.
  • 65. Chi-Square Test 4. Click on the Statistics button. You will be presented with the Crosstabs: Statistics dialogue box. Select the Chi-square and Phi and Cramer's V options. Then click Continue. 5. Click on the Cells button. You will be presented with the Crosstabs: Cell Display dialogue box. Select Observed in the –Counts– area, and Row, Column and Total in the –Percentages– area. Then click Continue. 6. Click on the Format button if you wish to adjust the row order. 7. Once you have made your choices, click Continue and then OK to generate the output.
  • 66. Chi-Square Test Result and Interpretation:  Both males and females prefer to learn using online materials versus books.  We can see here that χ2(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, both Males and Females equally prefer online learning versus books.  We can see that the strength of association between the variables is very weak (0.078).
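The same test of independence can be run on a contingency table with scipy. The counts below are hypothetical (the slides' crosstab is not reproduced). Note that for 2x2 tables `chi2_contingency` applies the Yates continuity correction by default; pass `correction=False` if you want the plain Pearson chi-square that SPSS lists in its "Pearson Chi-Square" row:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical Gender x Preferred Learning Medium counts (not the slides' data)
#                 online  books
table = np.array([[20,    10],    # male
                  [25,    15]])   # female

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")
```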
  • 67. Mann-Whitney U Test Assumptions:  Dependent variable should be measured at the ordinal or continuous level.  Independent variable should consist of two categorical, independent groups.  It should have independence of observations.  The Mann-Whitney U test can be used when the dependent variable is not normally distributed. Example:  A researcher decided to investigate whether an exercise or weight loss intervention was more effective in lowering cholesterol levels.  The researcher recruited a random sample of inactive males that were classified as overweight.  This sample was then randomly split into two groups: Group 1 underwent a calorie-controlled diet (i.e., the 'diet' group) and Group 2 undertook an exercise-training programme (i.e., the 'exercise' group).  In order to determine which treatment programme was more effective, cholesterol concentrations were compared between the two groups at the end of the treatment programmes.
  • 68. Mann-Whitney U Test Procedure: 1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples... on the top menu. 2. You will be presented with the Two-Independent-Samples Tests dialogue box. Transfer the dependent variable, Cholesterol, into the Test Variable List: box and the independent variable, Group, into the Grouping Variable: box.
  • 69. Mann-Whitney U Test 3. Click on the Define Groups button. (The button will not be clickable if you have not highlighted the Grouping Variable.) Enter 1 into the Group 1: box and 2 into the Group 2: box. Remember that we labelled the Diet group as 1 and the Exercise group as 2. Then click Continue. 4. If you wish to use this procedure to generate some descriptive statistics, click on the Options button and then tick Descriptive and Quartiles in the –Statistics– area. 5. Click on the Continue button, which will bring you back to the main dialogue box with the Grouping Variable: box now completed. 6. Click on the OK button. This will generate the output for the Mann-Whitney U test.
  • 70. Mann-Whitney U Test Result and Interpretation:  Although SPSS produces descriptive statistics for the Mann-Whitney U test, they are not actually very useful here since the data are not normally distributed.  The rank table is very useful because it indicates which group can be considered as having the higher cholesterol concentrations, overall; namely, the group with the highest mean rank. In this case, the diet group had the highest cholesterol concentrations.  It can be concluded that cholesterol concentration in the diet group was statistically significantly higher than in the exercise group (U = 110, p = .014).
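A scipy equivalent of this rank-based comparison is shown below, using hypothetical cholesterol concentrations (not the slides' dataset) in which the diet group sits higher, as in the reported result:

```python
from scipy.stats import mannwhitneyu

# Hypothetical cholesterol concentrations (mmol/L), not the slides' dataset
diet     = [6.2, 6.8, 5.9, 7.1, 6.5, 7.0, 6.9, 6.4, 7.2, 6.6]
exercise = [5.4, 5.9, 5.1, 6.0, 5.6, 5.3, 5.8, 5.5, 6.1, 5.2]

# Mann-Whitney U test: compares the rank distributions of the two groups
u, p = mannwhitneyu(diet, exercise, alternative='two-sided')
print(f"U = {u}, p = {p:.4f}")
```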
  • 71. Sign Test Assumptions:  Dependent variable should be measured at the ordinal or continuous level.  Independent variable should consist of two categorical, "related groups" or "matched pairs".  The paired observations for each participant need to be independent.  The difference scores (i.e., differences between the paired observations) are from a continuous distribution. Example:  A researcher wants to test a new formula for a sports drink that improves running performance.  He recruited 20 participants who each performed two trials in which they had to run as far as possible in two hours on a treadmill.  In one of the trials they drank the carbohydrate-only drink and in the other trial they drank the carbohydrate-protein drink.  The order of the trials was counterbalanced and the distance they ran in both trials was recorded.
  • 72. Sign Test Procedure: 1. Click Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples... on the main menu. 2. You will be presented with the Two-Related-Samples Tests dialogue box. Transfer the variables carb and carb_protein into the Test Pairs: box by highlighting both variables. Deselect Wilcoxon and select Sign in the –Test Type– area. Click on the OK button to generate the output.
  • 73. Sign Test Result and Interpretation:  You can see how many participants decreased (the "Negative Differences" row), improved (the "Positive Differences" row) or witnessed no change (the "Ties" row) in their performance in the carbohydrate-protein trial (i.e., carb_protein) compared to the carbohydrate-only trial (i.e., carb) in the Frequencies table.  The statistical significance (i.e., p-value) of the sign test is found in the "Exact Sig. (2-tailed)" row of the table above. However, if you had more than a total of 25 positive and negative differences, an "Asymp. Sig. (2-sided test)" row would be displayed instead.  Twenty participants were recruited to understand the performance benefits of a carbohydrate-protein versus carbohydrate-only drink on running performance as measured by the distance run in two hours on a treadmill. An exact sign test was used to compare the differences in distance run in the two trials. The carbohydrate-protein drink elicited a statistically significant median increase in distance run (0.113 km) compared to the carbohydrate-only drink, p = .004.
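The exact sign test is just a binomial test on the signs of the paired differences, with ties dropped. A sketch with hypothetical paired differences (not the study's data) using scipy's `binomtest`:

```python
from scipy.stats import binomtest

# Hypothetical paired differences (carb_protein minus carb, in km);
# zero differences are ties and are dropped, as in SPSS's sign test
diffs = [0.11, 0.09, 0.15, -0.02, 0.12, 0.08, 0.10, 0.00, 0.13, 0.07,
         0.14, -0.01, 0.09, 0.11, 0.12, 0.06, 0.10, 0.08, 0.13, 0.00]

pos = sum(d > 0 for d in diffs)
neg = sum(d < 0 for d in diffs)

# Exact two-sided binomial test of H0: P(positive) = 0.5
result = binomtest(pos, pos + neg, 0.5)
print(f"positives = {pos}, negatives = {neg}, p = {result.pvalue:.4f}")
```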
  • 74. Kruskal-Wallis Test Assumptions:  Dependent variable should be measured at the ordinal or continuous level.  Independent variable should consist of two or more categorical, independent groups.  It should have independence of observations. Example:  A researcher wants to know whether anti-depressive drugs have the positive side effect of reducing back pain. The researcher identifies 3 well-known anti-depressive drugs which might have this positive side effect, and labels them Drug A, Drug B and Drug C.  The researcher then recruits a group of 60 individuals with a similar level of back pain and randomly assigns them to one of three groups – Drug A, Drug B or Drug C treatment groups – and prescribes the relevant drug for a 4 week period.  At the end of the 4 week period, the researcher asks the participants to rate their back pain on a scale of 1 to 10, with 10 indicating the greatest level of pain.  The researcher wants to compare the levels of pain experienced by the different groups at the end of the drug treatment period.
  • 75. Kruskal-Wallis Test Procedure: 1. Click Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples... on the top menu. 2. You will be presented with the Tests for Several Independent Samples dialogue box. Transfer the dependent variable, Pain_Score, into the Test Variable List: box and the independent variable, Drug_Treatment_Group, into the Grouping Variable: box.
  • 76. Kruskal-Wallis Test 3. Click on the Define Range button. You will be presented with the "Several Independent Samples: Define Range" dialogue box. Enter "1" into the Minimum: box and "3" into the Maximum: box. These values represent the range of codes you gave the groups of the independent variable, Drug_Treatment_Group (i.e., Drug A was coded "1" through to Drug C, which was coded "3"). 4. Click on the Continue button and you will be returned to the "Tests for Several Independent Samples" dialogue box, but now with a completed Grouping Variable: box. 5. Click on the OK button to generate the output.
  • 77. Kruskal-Wallis Test Result and Interpretation:  The mean rank of the Pain_Score for each drug treatment group can be used to compare the effect of the different drug treatments. Whether these drug treatment groups have different pain scores can be assessed using the Test Statistics table which presents the result of the Kruskal-Wallis H test.  A Kruskal-Wallis H test showed that there was a statistically significant difference in pain score between the different drug treatments, χ2(2) = 8.520, p = 0.014, with a mean rank pain score of 35.33 for Drug A, 34.83 for Drug B and 21.35 for Drug C.
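The Kruskal-Wallis H test is available directly in scipy. The pain scores below are hypothetical (not the study's 60 cases), constructed so that Drug C scores lower, mirroring the pattern of mean ranks reported above:

```python
from scipy.stats import kruskal

# Hypothetical pain scores (1-10) for the three drug groups
drug_a = [7, 6, 8, 7, 5, 6, 7, 8, 6, 7]
drug_b = [7, 8, 6, 7, 7, 6, 8, 7, 6, 7]
drug_c = [4, 5, 3, 4, 5, 4, 3, 5, 4, 4]

# Kruskal-Wallis H test: rank-based comparison of three or more groups
h, p = kruskal(drug_a, drug_b, drug_c)
print(f"H = {h:.3f}, p = {p:.4f}")
```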
  • 78. Descriptives & Frequencies Procedure: A. Descriptives 1. Click Analyze > Descriptive Statistics > Descriptives on the top menu. 2. You will be presented with the Descriptives dialogue box. Transfer the dependent variable, Income, into the Variable(s): box. 3. Click on the Options button. Tick every measure to be displayed. Click Continue and then OK.
  • 79. Descriptives & Frequencies Result and Interpretation:  The Descriptives output displays all the requested measures.  Note that Descriptives displays results for continuous variables only.
  • 80. Descriptives & Frequencies Procedure: B. Frequencies 1. Click Analyze > Descriptive Statistics > Frequencies on the top menu. 2. You will be presented with the Frequencies dialogue box. Transfer both the independent and dependent variables (e.g., Income) into the Variable(s): box.
  • 81. Descriptives & Frequencies 3. Click on the Statistics button. Tick every measure to be displayed and click Continue. 4. Click on the Charts button. Select any chart you want. In the –Chart Values– area, click either Frequencies or Percentages. 5. Click Continue and then OK to generate the output.
  • 82. Descriptives & Frequencies Result and Interpretation:  The Frequencies output displays all the requested measures along with the chosen chart.
  • 83. Test of Normality and Outliers Normality and Outliers:  Skewness and Kurtosis: The skewness and kurtosis measures should be as close to zero as possible. Dividing each measure by its standard error gives a z-score, which should lie between -1.96 and +1.96.  Kolmogorov-Smirnov Test & Shapiro-Wilk Test: If the sig. value of the Shapiro-Wilk test is > 0.05, the data are normal. If it is below 0.05, the data deviate significantly from a normal distribution.  Histograms and Box plots: Look at the graphical figures. The histogram should have the approximate shape of a normal curve. In a Q-Q plot, the dots should fall approximately along the line. A box plot should be approximately symmetrical. Process Involved: In statistics, normality tests are used to determine whether a data set is well modelled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed. It is essential to test for normality before proceeding to analysis. An outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.
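Both checks described above can be sketched in a few lines of Python on a small, deliberately skewed hypothetical sample (the data below are made up to fail the tests):

```python
import numpy as np
from scipy import stats

# A small, deliberately right-skewed hypothetical sample
data = np.array([1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 25], dtype=float)

# Shapiro-Wilk: p < 0.05 suggests the data deviate from normality
w, p = stats.shapiro(data)

# Skewness z-score using the rough standard error sqrt(6/n);
# |z| > 1.96 also flags non-normality
z_skew = stats.skew(data) / np.sqrt(6 / len(data))
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.4f}, skewness z = {z_skew:.2f}")
```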
  • 84. Test of Normality and Outliers Procedure: 1. Click Analyze > Descriptive Statistics > Explore on the top menu. 2. You will be presented with the Explore dialogue box. Transfer the dependent variable, Course, into the Dependent List: box.
  • 85. Test of Normality and Outliers 3. Click on the Statistics button. Select Descriptives and Outliers and click Continue. 4. Click on the Plots button. Select Stem-and-leaf, Histogram and Normality plots with tests. Then click Continue and OK.
  • 86. Test of Normality and Outliers Result and Interpretation:  The z-values, calculated by dividing the skewness and kurtosis measures by their respective Std. Errors, do not lie between -1.96 and +1.96. Thus the data are not normally distributed.  The p-value of .004 in the Shapiro-Wilk test is less than 0.05, which also indicates that the data are not normally distributed.
  • 87. Test of Normality and Outliers Result and Interpretation:  Both the box plot and the Q-Q plot show that the data are not normally distributed.  In the box plot, a circle marks an ordinary outlier whereas a star marks an extreme outlier, and the number next to the marker identifies the case. Ex: case 9 has a course time of 2.00 hours (consult Fig 1 in the procedure).
  • 88. Dealing with Outliers 1. Leave it in if it is a legitimate outlier, and use a non-parametric test on skewed data. 2. Correct data-entry errors. 3. Winsorize it to match the highest value in the rest of the data. 4. Throw it out only if necessary; better for multivariate analysis. 5. Transform it, e.g. take the log. Ex: 2, 3, 4, 5, 6, 10, 100  0.30, 0.48, 0.60, 0.70, 0.78, 1, 2 Basic way to winsorize data: Consider the data set consisting of: {92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 101.5) The data below the 5th percentile lies between −40 and −5, while the data above the 95th percentile lies between 101 and 1053. (Values shown in bold.) Then a 90% winsorization would result in the following: {92, 19, 101, 58, 101, 91, 26, 78, 10, 13, −5, 101, 86, 85, 15, 89, 89, 28, −5, 41} (N = 20, mean = 55.65) [For large data you can use the Frequencies procedure in SPSS to winsorize data]
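The worked winsorization example above can be reproduced with scipy's `winsorize`, which clamps the extreme 5% at each end (here, one value per end) to the nearest remaining value:

```python
import numpy as np
from scipy.stats.mstats import winsorize

# The data set from the worked example above
data = np.array([92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
                 -40, 101, 86, 85, 15, 89, 89, 28, -5, 41])

# 90% winsorization: -40 -> -5 and 1053 -> 101
w = winsorize(data, limits=[0.05, 0.05])
print(w.mean())  # 55.65, matching the worked example
```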