Statatistic in the Philippines of the current

CENTRAL LUZON STATE UNIVERSITY
Non-Parametric Methods
AEVANN PIERRE N. ESPIRITU
STAT 1200 – Management Science
2nd Semester, 2024-2025

DEPARTMENT of STATISTICS
Learning Outcomes
After completing this chapter, the students must be able to:
• Determine when to perform nonparametric tests.
• Apply the different nonparametric tests to a particular data set.

Parametric vs Nonparametric Statistical Analysis
Parametric tests assume underlying statistical distributions in the data.
Therefore, several conditions of validity must be met so that the result of a
parametric test is reliable. For example, Student’s t-test for two
independent samples is reliable only if each sample follows a normal
distribution and if sample variances are homogeneous.
Nonparametric tests do not assume that the data follow a particular
distribution. They can thus be applied even if parametric conditions of
validity are not met.

Advantages of Nonparametric Methods
1. They can be used to test population parameters when the variable is not
normally distributed.
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population
parameters.
4. In some cases, the computations are easier than those for the
parametric counterparts.
5. They are easy to understand.
6. There are fewer assumptions that have to be met, and the assumptions
are easier to verify.

Disadvantages of Nonparametric Methods
1. They are less sensitive than their parametric counterparts when the
assumptions of the parametric methods are met. Therefore, larger differences
are needed before the null hypothesis can be rejected.
2. They tend to use less information than the parametric tests. For example, the
sign test requires the researcher to determine only whether the data values are
above or below the median, not how much above or below the median each
value is.
3. They are less efficient than their parametric counterparts when the
assumptions of the parametric methods are met. That is, larger sample sizes are
needed to overcome the loss of information. For example, the nonparametric
sign test is about 60% as efficient as its parametric counterpart, the z test. Thus,
a sample size of 100 is needed for use of the sign test, compared with a sample
size of 60 for use of the z test to obtain the same results.
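The sample sizes quoted in item 3 can be checked with one line of arithmetic, assuming "60% as efficient" is read as the ratio of sample sizes the two tests need for the same results. The symbols $n_z$ and $n_{\text{sign}}$ are introduced here only for illustration.

```latex
\frac{n_z}{n_{\text{sign}}} \approx 0.60
\quad\Longrightarrow\quad
n_{\text{sign}} \approx \frac{n_z}{0.60} = \frac{60}{0.60} = 100
```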

Assumptions for Nonparametric Statistics
1. The sample or samples are randomly selected.
2. If two or more samples are used, they must be independent of each
other unless otherwise stated.
Remarks:
• If the parametric assumptions can be met, the parametric methods are
preferred.
• When parametric assumptions cannot be met, the nonparametric
methods are a valuable tool for analyzing the data.

Selection of statistical tools

| Conditions/Purposes | Parametric Test (Normal Distribution) | Nonparametric Test (Not-normal Distribution) |
|---|---|---|
| Compare a mean with a standard value | One sample t-test if n<30, and Z-test if n>30 | Wilcoxon test |
| Compare two means of unpaired data sets | t-test if n<30, and Z-test if n>30 | Mann-Whitney test |
| Compare two means of paired data sets | Paired-sample t-test | Wilcoxon test |
| Compare >2 means of unmatched data sets | One-way ANOVA | Kruskal-Wallis test |

Selection of statistical tools (continued)

| Conditions/Purposes | Parametric Test (Normal Distribution) | Nonparametric Test (Not-normal Distribution) |
|---|---|---|
| Compare >2 means of matched data sets | Multi-factor ANOVA | Friedman test |
| Find the relationship between two variables | Pearson’s correlation | Spearman’s correlation |
| Predict the values of one variable from another | Simple linear or nonlinear regression | Spearman’s correlation |
| Find the relationship among several variables | Multiple regression (linear/nonlinear) | Kendall’s coefficient of concordance |

Assessing normality using different statistical graphs/plots
• A normal quantile plot (or normal probability plot) is a graph of points (x, y)
where each x value is from the original set of sample data, and each y value is
the corresponding z score that is a quantile value expected from the standard
normal distribution.
Procedure for determining whether it is reasonable to assume that sample data
are from a normally distributed population:
1. Histogram: Construct a histogram. Reject normality if the histogram departs
dramatically from a bell shape.
2. Outliers: Identify outliers. Reject normality if there is more than one outlier
present. (Just one outlier could be an error or the result of chance variation, but
be careful, because even a single outlier can have a dramatic effect on results.)

3. Normal quantile plot: If the histogram is basically symmetric and there is at
most one outlier, use technology to generate a normal quantile plot. Use the
following criteria to determine whether or not the distribution is normal.
(These criteria can be used loosely for small samples, but they should be used
more strictly for large samples.)
Normal Distribution: The population distribution is normal if the pattern of the points is
reasonably close to a straight line and the points do not show some systematic pattern that
is not a straight-line pattern.
Not a Normal Distribution: The population distribution is not normal if either or both of
these two conditions apply:
• The points do not lie reasonably close to a straight line.
• The points show some systematic pattern that is not a straight-line pattern.
Later in this section we will describe the actual process of constructing a normal quantile
plot, but for now we focus on interpreting such a plot.
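As a rough illustration of where the points of a normal quantile plot come from, the sketch below pairs each sorted sample value x with the z score expected at that position under the standard normal distribution. The function name `normal_quantile_points`, the sample values, and the plotting position (i - 0.5)/n are assumptions made for this example; statistical software may use slightly different plotting positions.

```python
from statistics import NormalDist

def normal_quantile_points(sample):
    """Return (x, y) pairs for a normal quantile plot.

    x is a sorted sample value; y is the z score expected at that
    position under the standard normal distribution.
    """
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(sorted(sample), start=1):
        # Assumed plotting position: (i - 0.5) / n. Other conventions exist.
        p = (i - 0.5) / n
        points.append((x, std_normal.inv_cdf(p)))
    return points

# Hypothetical sample values, for illustration only.
sample = [96, 101, 104, 99, 110, 93, 102]
for x, z in normal_quantile_points(sample):
    print(f"{x:>5}  {z:+.3f}")
```

If the printed pairs fall close to a straight line when plotted, the sample is consistent with a normal population under the criteria above.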

Example: (Normal) The first case shows a histogram of IQ scores that is
close to being bell-shaped, so the histogram suggests that the IQ scores are
from a normal distribution. The corresponding normal quantile plot shows
points that are reasonably close to a straight-line pattern, and the points do
not show any other systematic pattern that is not a straight line. It is safe to
assume that these IQ scores are from a normally distributed population.

Example: (Uniform) The second case shows a histogram of data having a
uniform distribution. The corresponding normal quantile plot suggests that
the points are not normally distributed because the points show a
systematic pattern that is not a straight-line pattern. These sample values
are not from a population having a normal distribution.

9.1 Spearman Rank Correlation
The Spearman Rank Correlation Coefficient
• The Spearman rank correlation coefficient, $r_s$, is a nonparametric statistic that
uses ranks to determine if there is a relationship between two variables.
• The computations for the rank correlation coefficient are simpler than
those for the Pearson coefficient and involve ranking each set of data.
• The difference in ranks is found, and $r_s$ is computed by using these
differences.
• If both sets of data have the same ranks, $r_s$ will be +1.
• If the sets of data are ranked in exactly the opposite way, $r_s$ will be -1.
• If there is no relationship between the rankings, $r_s$ will be near 0.

Assumptions for Spearman’s Rank Correlation Coefficient
1. The sample is a random sample.
2. The data consist of two measurements or observations taken on the
same individual.
Formula for Computing the Spearman Rank Correlation Coefficient
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
Where: $d$ = difference in ranks
$n$ = number of data pairs
Decision Rule: Reject Ho if $|r_s| >$ critical value.

Steps in Performing Spearman’s Rank Correlation Coefficient
Step 1: State the hypotheses.
Step 2: Find the critical value.
Step 3: Find the test value.
a. Rank the values in each data set.
b. Subtract the rankings for each pair of data values.
c. Square the differences.
d. Find the sum of the squares.
e. Substitute in the formula for $r_s$.
Step 4: Make the decision.
Step 5: Summarize the results.
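The sub-steps of Step 3 can be sketched in Python as follows. This is a minimal illustration, assuming no tied values and the standard formula $r_s = 1 - 6\sum d^2 / (n(n^2 - 1))$; the helper names `ranks` and `spearman_rs` and the toy data are hypothetical.

```python
def ranks(values):
    """Rank values from smallest (rank 1) to largest. Assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman rank correlation via r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)                 # a. rank each data set
    d = [a - b for a, b in zip(rx, ry)]         # b. subtract the rankings
    d_squared = [diff ** 2 for diff in d]       # c. square the differences
    total = sum(d_squared)                      # d. sum the squares
    return 1 - 6 * total / (n * (n ** 2 - 1))   # e. substitute in the formula

# Hypothetical data, for illustration only.
print(spearman_rs([10, 20, 30, 40], [1, 2, 3, 4]))   # same ranks: 1.0
print(spearman_rs([10, 20, 30, 40], [4, 3, 2, 1]))   # opposite ranks: -1.0
```

The two calls reproduce the boundary cases of +1 and -1 for identical and exactly opposite rankings. Data with ties would need average ranks, which this sketch does not handle.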

Example 1:
Find the Spearman rank correlation coefficient for the following data, which
represent the number of hospitals and nursing homes in each of seven
randomly selected states. At the 0.05 level of significance, is there enough
evidence to conclude that there is a correlation between the two?

| Hospitals | Nursing Homes |
|---|---|
| 107 | 230 |
| 61 | 134 |
| 202 | 704 |
| 133 | 376 |
| 145 | 431 |
| 117 | 538 |
| 108 | 373 |

Solution:
Ho: $\rho = 0$ (There is no correlation between the number
of hospitals and nursing homes)
Ha: $\rho \neq 0$ (There is a correlation between the number of
hospitals and nursing homes)
The values are $n = 7$ and $\alpha = 0.05$.
From the table of Critical Values for the Rank Correlation Coefficient,
the critical value is 0.786.
Decision Rule: Reject Ho if $|r_s| > 0.786$.

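a.) Rank each data set as shown in the table. Let $R_1$ be the rank of the hospitals
and $R_2$ be the rank of the nursing homes.

| Hospitals | Rank $R_1$ | Nursing Homes | Rank $R_2$ | $d = R_1 - R_2$ | $d^2$ |
|---|---|---|---|---|---|
| 107 | 2 | 230 | 2 | 0 | 0 |
| 61 | 1 | 134 | 1 | 0 | 0 |
| 202 | 7 | 704 | 7 | 0 | 0 |
| 133 | 5 | 376 | 4 | 1 | 1 |
| 145 | 6 | 431 | 5 | 1 | 1 |
| 117 | 4 | 538 | 6 | -2 | 4 |
| 108 | 3 | 373 | 3 | 0 | 0 |
| | | | | | $\sum d^2 = 6$ |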

b.) Substitute in the formula for $r_s$.
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{7(7^2 - 1)} = 1 - \frac{36}{336} \approx 0.893$$
Step 4: Make a decision.
Since $r_s = 0.893$ is greater than the critical value of 0.786, the decision is to reject the
null hypothesis.
There is enough evidence to say that there is a correlation between the
number of hospitals and nursing homes.
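The arithmetic of Example 1 can be double-checked with a short script. This is only a sketch: the `ranks` helper is a hypothetical convenience that assumes no ties, and the critical value is copied from the solution rather than computed.

```python
hospitals = [107, 61, 202, 133, 145, 117, 108]
nursing_homes = [230, 134, 704, 376, 431, 538, 373]

def ranks(values):
    """Rank from smallest (1) to largest; the data here have no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

n = len(hospitals)
d = [r1 - r2 for r1, r2 in zip(ranks(hospitals), ranks(nursing_homes))]
sum_d2 = sum(diff ** 2 for diff in d)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

critical_value = 0.786  # from the solution: n = 7, alpha = 0.05
print(d)                 # [0, 0, 0, 1, 1, -2, 0]
print(sum_d2)            # 6
print(round(r_s, 3))     # 0.893
print("reject Ho" if abs(r_s) > critical_value else "do not reject Ho")
```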

Example 2:
The following data show the final term exam scores in English and Math of 10
students. At the 0.01 level of significance, is there enough evidence to
conclude that there is a correlation between the two variables?

| English | Math |
|---|---|
| 56 | 66 |
| 75 | 70 |
| 45 | 40 |
| 71 | 60 |
| 62 | 65 |
| 64 | 56 |
| 58 | 59 |
| 80 | 77 |
| 76 | 67 |
| 61 | 63 |

Solution:
Ho: $\rho = 0$ (There is no correlation between the English and
Math scores of students.)
Ha: $\rho \neq 0$ (There is a correlation between the English and
Math scores of students.)
The values are $n = 10$ and $\alpha = 0.01$.
From the table of Critical Values for the Rank Correlation Coefficient,
the critical value is 0.794.
Decision Rule: Reject Ho if $|r_s| > 0.794$.

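a.) Rank each data set as shown in the table. Let $X$ be the English scores and $Y$ be
the Math scores. Here rank 1 is assigned to the highest score in each set.

| English ($X$) | Rank of $X$ | Math ($Y$) | Rank of $Y$ | $d$ | $d^2$ |
|---|---|---|---|---|---|
| 56 | 9 | 66 | 4 | 5 | 25 |
| 75 | 3 | 70 | 2 | 1 | 1 |
| 45 | 10 | 40 | 10 | 0 | 0 |
| 71 | 4 | 60 | 7 | -3 | 9 |
| 62 | 6 | 65 | 5 | 1 | 1 |
| 64 | 5 | 56 | 9 | -4 | 16 |
| 58 | 8 | 59 | 8 | 0 | 0 |
| 80 | 1 | 77 | 1 | 0 | 0 |
| 76 | 2 | 67 | 3 | -1 | 1 |
| 61 | 7 | 63 | 6 | 1 | 1 |
| | | | | | $\sum d^2 = 54$ |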

b.) Substitute in the formula for $r_s$.
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(54)}{10(10^2 - 1)} = 1 - \frac{324}{990} \approx 0.673$$
Since $r_s = 0.673$ is not greater than the critical value of 0.794, the decision is to fail to
reject the null hypothesis.
There is not enough evidence to say that there is a correlation between the
English and Math scores of students.
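Example 2 ranks the scores from highest to lowest, whereas Example 1 ranked from lowest to highest. The sketch below suggests that either direction gives the same coefficient, as long as both variables are ranked the same way. The function names and the `descending` flag are hypothetical, and the code assumes no ties.

```python
english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
math_scores = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

def ranks(values, descending=False):
    """Rank the values (1 = first in sort order). Assumes no ties."""
    order = sorted(values, reverse=descending)
    return [order.index(v) + 1 for v in values]

def spearman_rs(x, y, descending=False):
    """Spearman r_s with both variables ranked in the same direction."""
    n = len(x)
    rx, ry = ranks(x, descending), ranks(y, descending)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rs_desc = spearman_rs(english, math_scores, descending=True)  # rank 1 = highest
rs_asc = spearman_rs(english, math_scores)                    # rank 1 = lowest
critical_value = 0.794  # from the solution: n = 10, alpha = 0.01

print(round(rs_desc, 3), round(rs_asc, 3))   # 0.673 0.673
print("reject Ho" if abs(rs_desc) > critical_value else "do not reject Ho")
```

Reversing the ranking direction flips the sign of every difference d but leaves each d² unchanged, which is why the two results agree.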

9.2 Chi-Squared Test of Independence
The chi-squared test of independence tests the null hypothesis that the row
variable and column variable in a contingency table are not related.
Ho: The row variable and column variable are not related
Ha: The row variable and column variable are related
Assumptions:
• The sample data are randomly selected.
• For every cell in the contingency table, the expected frequency is at
least 5.

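Test Statistic Value:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ is the observed frequency of a cell and $E$ is its expected frequency,
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
Decision rule: Reject Ho if $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$, where r is the number of rows and c is the
number of columns in a contingency table.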

Steps in Performing Chi-Squared Test of Independence
Step 1: State the hypotheses.
Step 2: Find the critical value.
Step 3: Find the test value.
a. First, find the expected values for each cell of the contingency table.
b. Find the test value using the formula of $\chi^2$.
Step 4: Make the decision.
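The computation of the expected values and the test value can be sketched as a small Python function. This is an illustrative sketch, not a library routine: the name `chi_square_independence` and the toy table are hypothetical, and the critical value still has to be looked up in a chi-squared table using the returned degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (chi_square, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # a. Expected frequency: (row total)(column total) / (grand total).
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # b. Test value: sum of (O - E)^2 / E over every cell.
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected

# Hypothetical 2 x 2 table, for illustration only.
chi_square, df, expected = chi_square_independence([[20, 30], [30, 20]])
print(round(chi_square, 2), df, expected)  # 4.0 1 [[25.0, 25.0], [25.0, 25.0]]
```

Returning the expected table alongside the statistic makes it easy to check the assumption that every expected frequency is at least 5.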

Example 1:
Based on the table below, is there evidence to suggest that sex is related to
whether a person is left-handed or right-handed? Test at the 0.05 level of
significance.

| Sex | Hand Preference: Left | Hand Preference: Right | Total |
|---|---|---|---|
| Female | 12 | 108 | 120 |
| Male | 24 | 156 | 180 |
| Total | 36 | 264 | 300 |

Solution:
Ho: Sex and hand preference are not related.
Ha: Sex and hand preference are related.
The values are $\alpha = 0.05$ and
$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$.
The critical value is 3.841.

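First, find the expected values for each cell of the contingency table.
Expected values are shown in parentheses.

| Sex | Hand Preference: Left | Hand Preference: Right | Total |
|---|---|---|---|
| Female | 12 (14.4) | 108 (105.6) | 120 |
| Male | 24 (21.6) | 156 (158.4) | 180 |
| Total | 36 | 264 | 300 |

Find the test value using the formula of $\chi^2$.
$$\chi^2 = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} \approx 0.758$$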

Since 0.758 is not greater than the critical value 3.841, the decision is to fail
to reject the null hypothesis.
At the 5% level of significance, there is not enough evidence to conclude that
sex and hand preference are related.
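The test value of 0.758 can be reproduced from the observed table with a few lines of Python. This is a sketch for checking the arithmetic; the variable names are illustrative, and the critical value is copied from the solution rather than computed.

```python
observed = [[12, 108],   # Female: left, right
            [24, 156]]   # Male: left, right

row_totals = [sum(row) for row in observed]          # [120, 180]
col_totals = [sum(col) for col in zip(*observed)]    # [36, 264]
grand_total = sum(row_totals)                        # 300

# Expected frequency of each cell: (row total)(column total) / (grand total).
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

critical_value = 3.841  # from the solution: alpha = 0.05, df = 1
print([[round(e, 1) for e in row] for row in expected])  # [[14.4, 105.6], [21.6, 158.4]]
print(round(chi_square, 3))                              # 0.758
print("reject Ho" if chi_square > critical_value else "do not reject Ho")
```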

Example 2:
A researcher wishes to see if there is a relationship between the hospital
and the number of patient infections. A random sample of 3 hospitals
was selected, and the number of infections for a specific year has been
reported. At the 0.05 level of significance, can it be concluded that the
number of infections is related to the hospital where they occurred?

| Hospital | Surgical site infections | Pneumonia infections | Bloodstream infections | Total |
|---|---|---|---|---|
| A | 41 | 27 | 51 | 119 |
| B | 36 | 3 | 40 | 79 |
| C | 169 | 106 | 109 | 384 |
| Total | 246 | 136 | 200 | 582 |

Solution:
Ho: The number of infections is independent of the hospital.
Ha: The number of infections is dependent on the hospital.
The values are $\alpha = 0.05$ and
$df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
The critical value is 9.488.

First, find the expected values for each cell of the contingency table.
$E_{11} = \frac{(119)(246)}{582} = 50.30$
$E_{12} = \frac{(119)(136)}{582} = 27.81$
$E_{13} = \frac{(119)(200)}{582} = 40.89$
$E_{21} = \frac{(79)(246)}{582} = 33.39$
$E_{22} = \frac{(79)(136)}{582} = 18.46$
$E_{23} = \frac{(79)(200)}{582} = 27.15$
$E_{31} = \frac{(384)(246)}{582} = 162.31$
$E_{32} = \frac{(384)(136)}{582} = 89.73$
$E_{33} = \frac{(384)(200)}{582} = 131.96$

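Find the test value using the formula of $\chi^2$.
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(41 - 50.30)^2}{50.30} + \frac{(27 - 27.81)^2}{27.81} + \cdots + \frac{(109 - 131.96)^2}{131.96} \approx 30.70$$
Since $\chi^2 \approx 30.70$ is greater than the critical value 9.488, the decision is to reject the
null hypothesis.
At the 5% level of significance, we can conclude that the number of infections
is related to the hospital where they occurred.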
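For a 3 x 3 table the sum has nine terms, so a short script is a convenient check. The sketch below recomputes the test value from the observed counts and also reports which cell contributes most to it; the `contributions` dictionary is a hypothetical helper, and the critical value is copied from the solution.

```python
hospitals = ["A", "B", "C"]
infections = ["Surgical site", "Pneumonia", "Bloodstream"]
observed = [[41, 27, 51],
            [36, 3, 40],
            [169, 106, 109]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell to the test value: (O - E)^2 / E.
contributions = {}
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions[(hospitals[i], infections[j])] = (o - e) ** 2 / e

chi_square = sum(contributions.values())
df = (len(hospitals) - 1) * (len(infections) - 1)
critical_value = 9.488  # from the solution: alpha = 0.05, df = 4

print(round(chi_square, 2), df)                    # 30.7 4
print(max(contributions, key=contributions.get))   # ('B', 'Pneumonia')
print("reject Ho" if chi_square > critical_value else "do not reject Ho")
```

The largest contribution comes from pneumonia infections at Hospital B, where 3 cases were observed against about 18.46 expected.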
