2. Introduction
Chi-square is one of the most commonly used non-parametric tests
Introduced by Karl Pearson as a test of significance in 1900
Denoted by the Greek symbol χ²
It is a useful measure for comparing experimentally observed results
with theoretically expected results, i.e., those predicted by a hypothesis
The chi-squared distribution with k degrees of freedom is the
distribution of the sum of the squares of k independent standard
normal random variables
Its shape is determined by the degrees of freedom
It can be applied to categorical or qualitative data arranged in a
contingency table
Used to evaluate unpaired/unrelated samples and proportions
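The definition of the chi-squared distribution as a sum of squared standard normals can be checked by simulation. The sketch below is illustrative only: the helper name `simulate_chi_square`, the number of draws and the seed are arbitrary choices. It draws many such sums and compares their sample mean and variance with the known values k and 2k.

```python
import random
import statistics

def simulate_chi_square(k, n_draws=100_000, seed=0):
    """Draw n_draws values, each the sum of squares of k independent standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_draws)]

draws = simulate_chi_square(k=3)
# A chi-squared variable with k degrees of freedom has mean k and variance 2k,
# so these should be close to 3 and 6.
print(round(statistics.fmean(draws), 2), round(statistics.variance(draws), 2))
```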
3. Chi-Square
It is a statistic that measures the discrepancy between the
experimentally observed results (O) and the theoretically expected
results (E) under a certain hypothesis
It uses data in the form of frequencies (i.e., the number of
occurrences of an event)
Calculated by squaring the deviation between the observed and expected
frequencies in each category, dividing it by the expected frequency,
and summing over all categories
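4. Formula: Chi Square
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E the expected frequency in each category, summed over all categories
If there is no difference between the observed and expected frequencies, the value of chi-square is zero
If there is a difference, the value of the statistic will be greater than zero
Differences may be due to sampling fluctuations; the test judges whether they are too large to be explained by chance alone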
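A minimal sketch of the formula in Python, assuming the observed and expected frequencies are supplied as two equal-length lists in the same category order (`chi_square_statistic` is a hypothetical helper name, not a library function):

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(chi_square_statistic([10, 20, 30], [20, 20, 20]))  # 10.0
print(chi_square_statistic([20, 20, 20], [20, 20, 20]))  # 0.0, no difference at all
```

The second call shows the zero case: identical observed and expected frequencies give χ² = 0, and any difference makes the sum positive.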
5. Contingency Table
A type of table in a matrix format that displays the multivariate
frequency distribution of the variables
It provides a basic picture of the interrelation between two variables
The number of cells depends on the number of classes (categories) of each variable
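To illustrate how a contingency table is built from raw categorical data, the sketch below cross-tabulates hypothetical (group, preference) pairs with `collections.Counter`. The data and the helper name `contingency_table` are invented for the example.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into a table of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical responses: (group, preferred drink)
responses = [("A", "tea"), ("B", "coffee"), ("A", "coffee"),
             ("A", "tea"), ("B", "coffee"), ("B", "tea")]
rows, cols, table = contingency_table(responses)
print(rows, cols)  # ['A', 'B'] ['coffee', 'tea']
print(table)       # [[1, 2], [2, 1]]
```

With 2 groups and 2 drinks the table has 2 × 2 = 4 cells; adding a third drink would give 6.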
6. Degrees of Freedom
It is the number of independent pieces of information that go into
the estimate of a parameter, i.e., the number of values that are free to vary
In a contingency table, the degrees of freedom are calculated as
df = (R - 1)(C - 1)
where:
R = number of rows in the table
C = number of columns in the table
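The reasoning behind the rule: once the row and column totals of a table are fixed, only (R - 1)(C - 1) cells can be filled in freely, and every remaining cell is then determined by the totals. A small helper (hypothetical name `contingency_df`) makes the rule explicit:

```python
def contingency_df(n_rows, n_cols):
    """Degrees of freedom for an R x C contingency table: (R - 1)(C - 1)."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)

print(contingency_df(2, 2))  # 1
print(contingency_df(3, 4))  # 6
```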
7. Chi-Square Distribution
The sampling distribution of the chi-square statistic is not a normal distribution.
It is a right-skewed distribution that takes only non-negative values because χ² can
never be negative
When the expected counts are all at least 5, the sampling distribution of the χ²
statistic in a goodness-of-fit test is close to a chi-square distribution with df equal
to the number of categories minus 1.
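The claim about the sampling distribution can be checked by simulation. The sketch below assumes a hypothetical H0 of six equally likely categories and samples of n = 60 (so every expected count is 10). It repeatedly draws samples under H0 and computes χ². Under multinomial sampling the statistic has expected value exactly k - 1, and the simulated values should be non-negative and right-skewed (mean above median). The helper name `simulate_gof_statistic` and the seed are illustrative.

```python
import random
import statistics

def simulate_gof_statistic(probs, n, reps=5_000, seed=1):
    """Simulate Pearson's chi-square for samples of size n drawn under H0 with category probabilities probs."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    results = []
    for _ in range(reps):
        sample = rng.choices(range(k), weights=probs, k=n)
        observed = [sample.count(i) for i in range(k)]
        results.append(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))
    return results

# Hypothetical H0: six equally likely categories, n = 60
sims = simulate_gof_statistic([1 / 6] * 6, n=60)
print(round(statistics.fmean(sims), 2))                  # should be close to df = 6 - 1 = 5
print(statistics.median(sims) < statistics.fmean(sims))  # right skew: median below mean
```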
8. Characteristics of Chi Square
It is based on frequencies and not on parameters like the
mean and standard deviation
Used for testing the difference between the entire set of
expected and observed frequencies
Used for testing hypotheses; it is not useful for
estimation
It is an important non-parametric test because no rigid
assumptions are needed about the type of population,
no parameter values are required, and relatively
little mathematical detail is involved
9. Assumptions for the Validity of the Chi-Square Test
All observations should be independent; no individual
item should be included twice.
The total number of observations should be large. The
chi-square test should not be used if n < 50.
For comparison purposes, the data must be in original
units (frequencies, not percentages or proportions)
If any theoretical (expected) frequency is less than 5, pool it with
either the preceding or the succeeding frequency so that the
resulting sum is at least 5
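A minimal sketch of the pooling rule, assuming the categories are ordered (for example, classes of a frequency distribution) so that "preceding" and "succeeding" make sense. This version (hypothetical helper `pool_small_expected`) merges each small class forward into the next one and folds any leftover at the end into the preceding group. Other pooling choices are equally valid, and the observed frequencies must be pooled in exactly the same way as the expected ones.

```python
def pool_small_expected(observed, expected, minimum=5):
    """Merge adjacent categories until every pooled expected frequency is at least `minimum`."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0, 0
    for o, e in zip(observed, expected):
        # Keep adding classes to the current group until its expected frequency is large enough
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0
    # Trailing classes that never reached the minimum are merged into the preceding group
    if acc_obs > 0 or acc_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp

# Hypothetical frequencies for five ordered classes
print(pool_small_expected([1, 4, 9, 6, 7], [2, 3, 10, 4, 8]))  # ([5, 9, 13], [5, 10, 12])
```

Pooling also reduces the number of categories, so the degrees of freedom must be computed from the pooled table.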
10. Limitations
It does not give much information about the
strength of the relationship. It only conveys the
existence or non-existence of a relationship between
the variables
It is sensitive to sample size
It is also sensitive to small expected frequencies.
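To illustrate the sensitivity to sample size: multiplying every count by 10 leaves the proportions unchanged but multiplies χ² by 10. With the hypothetical two-category counts below (df = 1, table value 3.841 at α = 0.05), the same 60/40 versus 50/50 split is not significant with n = 50 but is significant with n = 500.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

small = chi_square_statistic([30, 20], [25, 25])      # n = 50
large = chi_square_statistic([300, 200], [250, 250])  # n = 500, same proportions
print(small, large)  # 2.0 20.0
```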
11. STEPS
1. Identify the problem
2. Make a contingency table and record the observed frequency (O)
for each class of one event row-wise (horizontally), and for each
group of the other event column-wise (vertically)
3. Set up H0: according to the null hypothesis, no association
exists between the attributes. Also set up the alternative hypothesis Ha
4. Calculate the expected frequency (E) of each cell as
(row total × column total) / grand total
5. Find the difference between the observed and expected frequency
in each cell (O - E)
6. Calculate the chi-square value by applying the formula. The
value ranges from zero to infinity.
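The steps above can be sketched for a test of independence as follows. This is a minimal illustration on a hypothetical 2 × 2 table of observed counts; the helper name `independence_test` is invented for the example.

```python
def independence_test(table):
    """Return expected frequencies, Pearson's chi-square, and df for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Step 4: under H0 (no association), E = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Steps 5 and 6: (O - E)^2 / E in each cell, summed over all cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi2, df

observed = [[10, 20],
            [30, 40]]  # hypothetical counts
expected, chi2, df = independence_test(observed)
print(expected)            # [[12.0, 18.0], [28.0, 42.0]]
print(round(chi2, 4), df)  # 0.7937 1
```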
12. Uses of Chi Square Test
1. Goodness of fit - Measures how much the observed (actual)
frequencies differ from the expected (predicted) frequencies
2. Test of homogeneity - Used to determine whether frequency counts
are distributed identically across different samples
3. Test of independence - Used to determine whether two categorical
variables are associated with each other
13. Types
Chi-Square Test of Goodness-of-Fit
Number of Variables: One
Purpose of Test: Determines whether sample data match a
hypothesized population distribution
Degrees of Freedom: k - 1 (k = number of categories)
Chi-Square Test of Independence
Number of Variables: Two
Purpose of Test: Compares two sets of data to see whether there
is a relationship
Degrees of Freedom: (r - 1)(c - 1)
14. Chi Square Goodness of Fit Test
1. A null hypothesis (H0) and an alternative hypothesis (Ha) are established, and a significance level is selected for
rejection of H0.
2. A random sample of observations is drawn from a relevant statistical
population.
3. A set of expected frequencies is derived under the assumption that
H0 is true.
4. The observed frequencies are compared with the expected frequencies.
5. The calculated value of the chi-square goodness-of-fit statistic is compared
with the table value. If the calculated value is greater than the table value, we reject the null hypothesis and
conclude that there is a significant difference between the observed
and the expected frequencies.
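Step 5 can be sketched as a decision rule. The critical values below are the standard upper 5% points of the chi-square distribution for df = 1 to 6. The helper name `reject_h0` and the calculated statistics in the usage lines are hypothetical.

```python
# Upper-tail critical values at alpha = 0.05 from a standard chi-square table
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def reject_h0(chi2_calculated, n_categories):
    """Goodness-of-fit decision at alpha = 0.05: reject H0 if the statistic exceeds the table value."""
    df = n_categories - 1
    if df not in CRITICAL_05:
        raise ValueError(f"no table value stored for df = {df}")
    return chi2_calculated > CRITICAL_05[df]

# Hypothetical calculated statistics for a six-category test (df = 5)
print(reject_h0(12.4, n_categories=6))  # True: 12.4 > 11.070
print(reject_h0(4.2, n_categories=6))   # False: 4.2 <= 11.070
```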
16. Conditions for Performing a Chi-Square for Goodness of Fit
Random: The data come from a well-designed random sample or from a randomized
experiment
10%: When sampling without replacement, check that n ≤ (1/10)N, where N is the population size.
Large Counts: All expected counts are at least 5
Before we start using the chi-square goodness-of-fit test, we have two
important cautions to offer.
• The chi-square test statistic compares observed and expected counts. Don’t try to
perform calculations with the observed and expected proportions in each
category.
• When checking the Large Counts condition, be sure to examine the
expected counts, not the observed counts.
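A small sketch of the Large Counts check, illustrating the second caution. In this hypothetical three-category example, n = 30 and H0 assigns a proportion of 1/3 to each category, so every expected count is 10. One observed count is only 3, but that does not matter because the condition concerns the expected counts. The helper name `large_counts_ok` is illustrative.

```python
def large_counts_ok(expected, minimum=5):
    """Large Counts condition: every expected count must be at least `minimum`."""
    return all(e >= minimum for e in expected)

observed = [3, 12, 15]   # hypothetical observed counts, n = 30
expected = [10, 10, 10]  # n * p = 30 * (1/3) for each category under H0
print(large_counts_ok(expected))  # True: the condition is met
print(large_counts_ok(observed))  # False, but this is the wrong thing to check
```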
17. Sample Problem
Mars, Incorporated makes milk chocolate candies. Here’s what the company’s
Consumer Affairs Department says about the color distribution of its M&M’S® Milk
Chocolate Candies: On average, the new mix of colors of M&M’S ® Milk Chocolate
Candies will contain 13 percent of each of browns and reds, 14 percent yellows, 16
percent greens, 20 percent oranges and 24 percent blues.
The one-way table summarizes the data from a sample bag of M&M’S ® Milk Chocolate
Candies. In general, one-way tables display the distribution of a categorical variable for
the individuals in a sample.
18. Stating the Hypothesis
H0: The company’s stated color distribution for M&M’S ® Milk Chocolate
Candies is correct.
Ha: The company’s stated color distribution for M&M’S ® Milk Chocolate
Candies is not correct.
We can also write the hypotheses in symbols as:
H0: $p_{\text{blue}} = 0.24,\ p_{\text{orange}} = 0.20,\ p_{\text{green}} = 0.16,\ p_{\text{yellow}} = 0.14,\ p_{\text{red}} = 0.13,\ p_{\text{brown}} = 0.13$
Ha: At least one of the $p_i$'s is incorrect
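The hypotheses translate directly into expected counts, since each expected count is n times the proportion stated in H0. The sketch below checks that the stated proportions sum to 1, computes df = 6 - 1 = 5, and derives expected counts for a bag of n = 60 candies. The bag size is hypothetical and is not the sample from this problem, whose counts are not reproduced here.

```python
import math

# H0 proportions from the company's stated color distribution
H0_PROPORTIONS = {"blue": 0.24, "orange": 0.20, "green": 0.16,
                  "yellow": 0.14, "red": 0.13, "brown": 0.13}

# A valid H0 distribution must account for all candies
assert math.isclose(sum(H0_PROPORTIONS.values()), 1.0)

def expected_counts(n, proportions):
    """Expected count for each category under H0: n * p."""
    return {color: n * p for color, p in proportions.items()}

n = 60  # hypothetical bag size
expected = expected_counts(n, H0_PROPORTIONS)
df = len(H0_PROPORTIONS) - 1
print(expected)
print(df, min(expected.values()) >= 5)  # 5 True
```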
21. The Chi-Square Distributions and P-values
Since our P-value is between 0.05 and 0.10, it is greater than α = 0.05. Therefore, we fail to reject H0. We
don’t have sufficient evidence to conclude that the company’s claimed color distribution is incorrect.
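The P-value is the area to the right of the calculated statistic under the chi-square curve with the appropriate df. The stdlib-only sketch below (hypothetical helper `chi2_sf`) uses the closed-form series for integer degrees of freedom. In practice a statistics library would normally be used instead. With six colors, df = 5, and a P-value between 0.05 and 0.10 corresponds to a statistic between the table values 9.236 and 11.070.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{i=1}^{(df-1)/2} x^(i - 1/2) / (1*3*...*(2i-1))
    root = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    term, total = root, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(root / math.sqrt(2)) + 2 * phi * total

print(round(chi2_sf(9.236, 5), 3))   # about 0.10
print(round(chi2_sf(11.070, 5), 3))  # about 0.05
```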