Test of Significance for Large Samples: Maths Mini Project
1. TEST OF SIGNIFICANCE FOR LARGE SAMPLES
1. Brajesh Bhaskar Kodam ARMIET/CS23/KB015
2. Manish Choudhary ARMIET/AIML23/SA004
3. Rameshwar Jadhav ARMIET/AIML23/VA016
4. Vishal Yadav ARMIET/AIML23/AT011
(Branch): Comp
Semester-IV
2. INTRODUCTION
In statistics, we often deal with data that varies, such as people’s
weight, height, or other measurable values. To make sense of this
data, we usually take a small part of it, called a sample, and
use it to draw conclusions about the entire group, or
population. This helps us:
• Estimate important values about the whole group
(population).
• Study key features of the population based on a sample.
When the sample size is more than 30, it’s usually considered a
large sample. The techniques we use for testing significance in
large samples are a bit different from those used for small
samples. These methods help us draw more accurate and
trustworthy conclusions.
3. THEORY
Significance testing is a statistical method that helps us figure out
whether the results we see in a sample are just due to random
chance, or if they actually reflect something true about the whole
population. It plays a key role in making smart, data-based
decisions by testing different assumptions (called hypotheses).
In general, when a sample has more than 30 observations (n >
30), it is considered a large sample. Large sample tests are very
common in research and analysis because:
• They provide more reliable results than small sample
tests, since the standard error shrinks as n grows.
• The normal approximation applies (by the central limit
theorem), making statistical calculations easier.
• They help in making decisions based on data, such as
comparing group means or proportions.
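The normal-approximation point above can be illustrated with a small simulation. This is a minimal sketch and not part of the original project: the exponential population, the sample size of 50, the 2,000 repetitions, and the seed are all assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)   # assumed seed, only so the run is repeatable

n = 50           # assumed sample size (a large sample: n > 30)
reps = 2000      # assumed number of repeated samples

# Assumed population: exponential with rate 1 (skewed; mean 1, SD 1).
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

theoretical_se = 1 / sqrt(n)          # sigma / sqrt(n) with sigma = 1
observed_se = stdev(sample_means)     # spread of the simulated sample means

# Share of sample means within 1.96 standard errors of the true mean;
# a normal sampling distribution would put about 95% of them there.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means) / reps

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"SE observed vs sigma/sqrt(n): {observed_se:.3f} vs {theoretical_se:.3f}")
print(f"share within 1.96 SE: {within:.3f}")
```

Even though the assumed population is strongly skewed, the sample means should cluster around the population mean with spread close to σ/√n, which is the behaviour the Z-tests in the later problems rely on.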
4. COMMON TESTS FOR LARGE SAMPLES
1. Z-Test (For Large Samples)
• One-Sample Z-Test: Compares a sample mean with a known
population mean.
• Two-Sample Z-Test: Compares the means of two independent
samples.
• Proportion Z-Test: Tests a sample proportion against a
claimed proportion.
2. Chi-Square Test (For Categorical Data)
• Independence Test: Checks whether two categorical
variables are related or independent. (It answers: "Is there
a relationship between two factors?")
• Goodness-of-Fit Test: Checks whether observed data
match the counts expected under a specific theoretical
distribution. (It answers: "Does this data follow the
pattern we expected?")
5. STEPS IN SIGNIFICANCE TESTING FOR LARGE SAMPLES
1. State the Null (H₀) and Alternative (Ha) Hypotheses
2. Compute the Test Statistic
Use the formula that matches the test type.
3. Choose a Significance Level (α)
Common values: 0.05 (5%) or 0.01 (1%).
4. Compare with the Critical Value or P-Value
If the absolute value of the test statistic exceeds the critical
value, or if the p-value < α, reject H₀.
5. Draw a Conclusion
If H₀ is rejected, there is a significant effect or
difference.
If H₀ is not rejected, the observed difference could plausibly
be due to chance.
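The decision rule in steps 3 to 5 can be sketched in a few lines of Python for a two-tailed Z-test. The function name `two_tailed_z_decision` and the example statistic 2.5 are hypothetical, introduced here only for illustration; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def two_tailed_z_decision(z_stat, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a two-tailed Z-test."""
    std_normal = NormalDist()                         # mean 0, SD 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # upper alpha/2 point
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))   # area in both tails
    reject = abs(z_stat) > z_crit
    return z_crit, p_value, reject


# Hypothetical test statistic, used only to exercise the function.
z_crit, p_value, reject = two_tailed_z_decision(2.5, alpha=0.05)
print(f"critical value = {z_crit:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing |Z| with the critical value and comparing the p-value with α always lead to the same decision; they are two views of the same tail area.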
6. PROBLEM STATEMENT: (One Sample Z test)
Q1. A tyre company claims that the lives of tyres have a mean of 42,000 km
with a standard deviation of 4,000 km. A new product is tested on a sample
of 81 tyres, and the sample mean is found to be 42,500 km. Test at 5% level
of significance whether the new product is significantly better than the old
one.
Solution:
Given: n = 81 tyres, x̄ = 42500, µ = 42000, σ = 4000
Step 1: Null Hypothesis (H₀): The mean tyre life of the new product is not
significantly different from the old product. (µ = 42000)
Alternative Hypothesis (Ha): The mean tyre life of the new product
is significantly different from the old product. (µ ≠ 42000)
Step 2: Test Statistic:
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{42500 - 42000}{4000/\sqrt{81}} = \dfrac{500}{444.44} = 1.125$
7. PROBLEM STATEMENT: (One Sample Z test)
Step 3: Level of Significance (α)
α = 5%, i.e. 0.05
Step 4: Critical Value (Zα)
1. To find this, refer to the Z-table.
2. Since it is a two-tailed test, each tail has area 0.05/2 = 0.025.
3. The cumulative area up to the critical value is 1 − 0.025 = 0.9750.
4. 0.9750 = 0.5000 + 0.4750, and the table area 0.4750 corresponds to Z = 1.96.
Hence, Zα = 1.96
Step 5: |Z| = 1.125 < Zα = 1.96
Hence, we fail to reject H₀.
This means the new tyre is not significantly
better than the old one at the 5% significance
level.
(Figures: two-tailed test; Z table)
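The arithmetic in Q1 can be checked with a short script. This is a sketch using only the figures stated in the problem; the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from the Z-table.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Q1.
n, x_bar, mu0, sigma = 81, 42500, 42000, 4000
alpha = 0.05

standard_error = sigma / sqrt(n)                 # 4000 / 9
z = (x_bar - mu0) / standard_error               # one-sample Z statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-tailed critical value
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.3f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

If the claim is read as one-sided ("better"), the upper 5% critical value is about 1.645 instead of 1.96, and Z = 1.125 still falls short of it, so the conclusion does not change.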
8. PROBLEM STATEMENT: (Two Sample Z test)
Q2. A researcher wants to compare test scores between two schools.
•School A: n = 50, mean = 78, SD = 10
•School B: n = 60, mean = 75, SD = 12
At a 5% significance level, is there a significant difference in the average
scores?
Solution:
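Step 1: Null Hypothesis (H₀): There is no difference in average scores.
(μ₁ = μ₂)
Alternative Hypothesis (Ha): There is a significant difference
in average scores. (μ₁ ≠ μ₂)
Step 2: Test Statistic:
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{78 - 75}{\sqrt{\dfrac{10^2}{50} + \dfrac{12^2}{60}}} = \dfrac{3}{\sqrt{4.4}} = 1.43$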
9. PROBLEM STATEMENT: (Two Sample Z test)
Step 3: Level of Significance (α)
α = 5%, i.e. 0.05
Step 4: Critical Value (Zα)
1. To find this, refer to the Z-table.
2. Since it is a two-tailed test, each tail has area 0.05/2 = 0.025.
3. The cumulative area up to the critical value is 1 − 0.025 = 0.9750.
4. 0.9750 = 0.5000 + 0.4750, and the table area 0.4750 corresponds to Z = 1.96.
Hence, Zα = 1.96
Step 5: |Z| = 1.43 < Zα = 1.96
Hence, we fail to reject H₀.
There is not enough evidence to conclude that the average test
scores of the two schools are significantly different.
(Figure: Z table)
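The two-sample calculation in Q2 can be reproduced as follows. This is a sketch with illustrative variable names; it treats the sample standard deviations as estimates of the population values, which is the usual large-sample assumption.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Q2.
n1, mean1, sd1 = 50, 78, 10   # School A
n2, mean2, sd2 = 60, 75, 12   # School B
alpha = 0.05

# Standard error of the difference between two independent sample means.
std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)      # sqrt(2 + 2.4)
z = (mean1 - mean2) / std_error

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - std_normal.cdf(abs(z)))       # two-tailed p-value
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The p-value comes out at roughly 0.15, well above 0.05, which agrees with the critical-value comparison in Step 5.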
10. PROBLEM STATEMENT: (Proportion-Z Test)
Q3. A mobile company claims that 60% of customers are satisfied
with their service.
A survey of 200 customers finds that 120 are satisfied.
Test the company’s claim at a 5% significance level.
Solution:
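Step 1: Null Hypothesis (H₀): The true proportion of satisfied
customers is 60%. (p = 0.60)
Alternative Hypothesis (Ha): The true proportion is not
60%. (p ≠ 0.60)
Step 2: Test Statistic:
$Z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \dfrac{0.60 - 0.60}{\sqrt{(0.60)(0.40)/200}} = 0$, where $\hat{p} = \dfrac{120}{200} = 0.60$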
11. PROBLEM STATEMENT: (Proportion-Z Test)
Step 3: Level of Significance (α)
α = 5%, i.e. 0.05
Step 4: Critical Value (Zα)
1. To find this, refer to the Z-table.
2. Since it is a two-tailed test, each tail has area 0.05/2 = 0.025.
3. The cumulative area up to the critical value is 1 − 0.025 = 0.9750.
4. 0.9750 = 0.5000 + 0.4750, and the table area 0.4750 corresponds to Z = 1.96.
Hence, Zα = 1.96
Step 5: |Z| = 0 < Zα = 1.96
Hence, we fail to reject H₀.
There is not enough evidence to reject the company’s claim.
The proportion of satisfied customers is consistent with 60%.
(Figure: Z table)
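The proportion test in Q3 can be checked the same way. This is a sketch with illustrative variable names; the standard error uses the claimed proportion p₀ from the null hypothesis, as is standard for a one-sample proportion Z-test.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Q3.
n, satisfied, p0 = 200, 120, 0.60
alpha = 0.05

p_hat = satisfied / n                        # sample proportion
std_error = sqrt(p0 * (1 - p0) / n)          # standard error under H0
z = (p_hat - p0) / std_error

z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_crit

print(f"p_hat = {p_hat:.2f}, Z = {z:.2f}, reject H0: {reject_h0}")
```

Because the sample proportion equals the claimed proportion exactly, the statistic is zero and no significance level would lead to rejection.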
12. PROBLEM STATEMENT: (Chi-square test)
Q4. A principal wants to determine if student absences are equally
distributed across the weekdays. Each teacher in a sample of 100 reports
the weekday with the highest absences, with the observed and expected
counts given below. At a 5% significance level, do absences occur equally
across all days? Use the Chi-Square Goodness of Fit Test to determine the
answer.
Days       Monday  Tuesday  Wednesday  Thursday  Friday
Observed   23      16       14         19        28
Expected   20      20       20         20        20
Solution:
Step 1: Null Hypothesis (H₀): Student absences are equally distributed across
all weekdays.
Alternative Hypothesis (Ha): Student absences are not equally distributed
across weekdays.
14. PROBLEM STATEMENT: (Chi-square test)
Step 4: Critical value:
df = 5 − 1 = 4
For df = 4 and α = 0.05, χ² = 9.488 (from the Chi-Square table).
Step 5: Calculated χ² = Σ(O − E)²/E = 6.3 < 9.488
Hence, we fail to reject the null hypothesis (H₀).
There is no significant evidence that student absences are unequally
distributed across weekdays. The data are consistent with absences
occurring uniformly across the week.
(Figure: Chi-square table)
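The value 6.3 used in Step 5 can be reproduced from the observed and expected counts in Q4. The sketch below is illustrative: both function names are hypothetical, the critical value 9.488 is the table value quoted above, and the p-value helper uses a closed form for the chi-square upper tail that holds only when the degrees of freedom are even.

```python
from math import exp, factorial


def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_even_df(stat, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = stat / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Counts from Q4 (Monday to Friday).
observed = [23, 16, 14, 19, 28]
expected = [20, 20, 20, 20, 20]

chi_square = chi_square_stat(observed, expected)
df = len(observed) - 1
p_value = chi_square_p_value_even_df(chi_square, df)
chi_crit = 9.488                      # table value for df = 4, alpha = 0.05
reject_h0 = chi_square > chi_crit

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.3f}")
print(f"reject H0: {reject_h0}")
```

The p-value of roughly 0.18 is above 0.05, matching the decision reached by comparing 6.3 with 9.488.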
15. PROBLEM STATEMENT: (Chi-square test)
Q5. A marketing analyst wants to test if people use five different
social media platforms equally. Out of 150 people surveyed, the
observed usage distribution is as follows. Test at α = 0.05 to
determine if the usage is evenly distributed.
Platforms  Instagram  Facebook  Twitter  Snapchat  LinkedIn
Observed   30         25        35       40        20
Expected   30         30        30       30        30
Solution:
Step 1: Null Hypothesis (H₀): Social media usage is evenly distributed across
platforms.
Alternative Hypothesis (Ha): Social media usage is not evenly
distributed.
17. PROBLEM STATEMENT: (Chi-square test)
Step 4: Critical value:
df = 5 − 1 = 4
For df = 4 and α = 0.05, χ² = 9.488 (from the Chi-Square table).
Step 5: Calculated χ² = Σ(O − E)²/E = 8.332 < 9.488
Hence, we fail to reject the null hypothesis (H₀).
There is no significant evidence that social media usage is unevenly
distributed. The data are consistent with roughly equal usage across
platforms.
(Figure: Chi-square table)
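For Q5, the expected counts and the contribution of each platform to the statistic can be worked out as below. This is a sketch with illustrative variable names; the critical value 9.488 is the table value quoted above. Without intermediate rounding the statistic is 250/30, about 8.333, which differs from the 8.332 in Step 5 only because of rounding each term before summing.

```python
# Counts from Q5.
platforms = ["Instagram", "Facebook", "Twitter", "Snapchat", "LinkedIn"]
observed = [30, 25, 35, 40, 20]

total = sum(observed)                 # 150 people surveyed
expected = total / len(observed)      # equal usage under H0: 30 per platform

# Contribution of each platform to the chi-square statistic.
contributions = {
    name: (o - expected) ** 2 / expected
    for name, o in zip(platforms, observed)
}
chi_square = sum(contributions.values())

df = len(observed) - 1
chi_crit = 9.488                      # table value for df = 4, alpha = 0.05
reject_h0 = chi_square > chi_crit

for name, value in contributions.items():
    print(f"{name:<10} {value:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

Snapchat and LinkedIn each contribute about 3.33, so they account for most of the statistic, yet the total still falls below the critical value.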
18. PROBLEM STATEMENT: (Chi-square test)
Q6. A health researcher wants to determine whether a person’s diet type is
related to the presence of health issues. To investigate this, a survey was
conducted on 100 individuals, and the number of people with and without
health issues was recorded for each diet category. At a 5% level of
significance, can we conclude that diet type and health issues are
associated?
Diet type        Has Health Issues   No Health Issues   Total
Vegetarian       10                  30                 40
Non-Vegetarian   25                  15                 40
Vegan            5                   15                 20
Total            40                  60                 100
Solution:
Step 1: Null Hypothesis (H₀): Diet type and health issues are independent.
Alternative Hypothesis (Ha): Diet type and health issues are not
independent (they are associated).
19. PROBLEM STATEMENT: (Chi-square test)
Step 2: Expected frequencies
$E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
Diet Type        Has Issues (E)       No Issues (E)
Vegetarian       (40×40)/100 = 16     (60×40)/100 = 24
Non-Vegetarian   (40×40)/100 = 16     (60×40)/100 = 24
Vegan            (40×20)/100 = 8      (60×20)/100 = 12
Step 3: Test Statistic:
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where $O$ = observed value, $E$ = expected value
Category                 Observed (O)  Expected (E)  O − E  (O − E)²  (O − E)²/E
Vegetarian, Has Issues   10            16            −6     36        2.25
Vegetarian, No Issues    30            24            6      36        1.5
Non-Veg, Has Issues      25            16            9      81        5.06
Non-Veg, No Issues       15            24            −9     81        3.38
Vegan, Has Issues        5             8             −3     9         1.13
Vegan, No Issues         15            12            3      9         0.75
Total                                                                 14.07
20. PROBLEM STATEMENT: (Chi-square test)
Step 4: Level of Significance (α)
α = 5%, i.e. 0.05
Step 5: Critical value:
df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2
For df = 2 and α = 0.05, χ² = 5.991 (from the Chi-Square table).
Step 6: χ² = 14.07 > 5.991
Hence, we reject the null hypothesis (H₀).
There is a significant relationship between diet type and health
issues.
(Figure: Chi-square table)
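The expected frequencies and the test statistic for Q6 can be recomputed directly from the contingency table. This is a sketch with illustrative variable names; the critical value 5.991 is the table value quoted above, and the p-value line uses the fact that for df = 2 the chi-square upper tail is exp(−χ²/2). Without rounding each term the statistic is 14.0625, which the slides report as 14.07.

```python
from math import exp

# Observed counts from Q6: (has health issues, no health issues).
observed = {
    "Vegetarian": (10, 30),
    "Non-Vegetarian": (25, 15),
    "Vegan": (5, 15),
}

row_totals = {diet: sum(counts) for diet, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

chi_square = 0.0
for diet, counts in observed.items():
    for j, o in enumerate(counts):
        # Expected count under independence: row total x column total / grand total.
        e = row_totals[diet] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)   # (3 - 1)(2 - 1) = 2
p_value = exp(-chi_square / 2)   # exact upper-tail area only when df == 2
chi_crit = 5.991                 # table value for df = 2, alpha = 0.05
reject_h0 = chi_square > chi_crit

print(f"chi-square = {chi_square:.4f}, df = {df}, p-value = {p_value:.5f}")
print(f"reject H0: {reject_h0}")
```

The Non-Vegetarian row supplies well over half of the statistic (81/16 + 81/24), so that group drives the departure from independence.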
21. CONCLUSION
• Significance testing helps us judge whether a result seen in a
sample reflects the whole group (population) or is just due to
chance. When the sample size is large (more than 30), tests based
on the normal approximation give reliable results.
It helps us:
  • Estimate important values about the whole group.
  • Understand patterns and differences in data.
  • Make better decisions based on numbers, not guesses.
• In short, significance testing in large samples helps us study
data correctly and draw reliable conclusions.
22. APPLICATION
Significance testing for large samples is widely used in various fields, including:
1. Medical Research & Drug Testing:
Used in clinical trials to determine if a new drug is more effective than an
existing treatment.
2. Quality Control in Manufacturing:
Ensures products meet required standards by testing sample batches.
3. Market Research & Consumer Behavior:
Companies test whether a new product’s average sales differ significantly
from a competitor’s.
4. Education & Psychological Studies:
Used to compare student performance before and after an educational
intervention.