Hypothesis testing basics in the field of statistics

HYPOTHESIS TESTING
• It is a method for testing a claim or hypothesis about a parameter in
a population, using data measured in a sample. In this method, we
test some hypothesis by determining the likelihood that a sample
statistic could have been selected if the hypothesis regarding the
population parameter were true.

Normality test
• Normality tests are used to determine whether a data set is well modeled
by a normal distribution, that is, how plausible it is that the random
variable underlying the data set is normally distributed.
• If the variable is normally distributed, we use parametric tests, which
are based on this assumption.
• If a variable fails a normality test, we use non-parametric tests instead.

PARAMETRIC TEST
• Parametric tests assume a normal distribution of values, or a
“bell-shaped curve.”
• Parametric tests are in general more powerful (require a smaller
sample size) than nonparametric tests.
• Example:- height
• If you graph the heights of a group of people, you will typically
see a bell-shaped curve. Height is roughly normally distributed.

PARAMETRIC TEST CONT.
• The most widely used tests are the
• t-test (paired or unpaired),
• ANOVA (one-way, two-way)

WHAT IS A PAIRED T-TEST
• Paired t-test (also known as a dependent or correlated t-test) is a statistical test that
compares the means of two related groups, using the differences within each pair, to
determine if there is a significant difference between the two groups.
• A significant difference occurs when the differences between groups are unlikely to be
due to sampling error or chance.
• The groups can be related by being the same group of people, the same item, or being
subjected to the same conditions.
• Paired t-tests are considered more powerful than unpaired t-tests because using the
same participants or item eliminates variation between the samples that could be
caused by anything other than what’s being tested.

WHAT ARE THE HYPOTHESES OF A PAIRED T-TEST?
• There are two possible hypotheses in a paired t-test.
• The null hypothesis states that the true mean difference between
the paired measurements is zero.
• The alternative hypothesis states that the true mean difference is
not zero, that is, the observed difference reflects more than
sampling error or chance.

WHAT ARE THE ASSUMPTIONS OF A PAIRED T-TEST?
• The differences between the paired values are approximately normally
distributed.
• The pairs are sampled independently of one another.
• The dependent variable is measured on a continuous scale, such
as a ratio or interval scale.
• The independent variable must consist of two related groups or
matched pairs.

WHEN TO USE A PAIRED T-TEST?
• Paired t-tests are used when the same item or group is tested twice; this
design is known as a repeated-measures t-test.
• Examples:-
• The before and after effect of a pharmaceutical treatment on the same group of
people.
• Body temperature using two different thermometers on the same group of
participants.
• Standardized test results of a group of students before and after a study prep
course.
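
The paired t-test described above can be sketched with the Python standard library alone. The scores below are hypothetical values invented for illustration, and `paired_t_statistic` is an illustrative helper name, not a library function. The sketch returns only the t statistic and its degrees of freedom; the result would still be compared against a t table (for 4 degrees of freedom the two-tailed 5% critical value is 2.776).

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # The test works on the within-pair differences, not on the raw scores.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical test scores for five students before and after a prep course.
before = [70, 65, 80, 75, 60]
after = [74, 70, 83, 80, 68]

t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Because the statistic is built from the differences, any variation between students that is unrelated to the course cancels out, which is the source of the extra power mentioned earlier.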

WHAT IS AN UNPAIRED T-TEST?
• An unpaired t-test (also known as an independent t-test) is a
statistical procedure that compares the averages/means of two
independent or unrelated groups to determine if there is a
significant difference between the two.

WHAT ARE THE HYPOTHESES OF AN UNPAIRED T-TEST?
• The hypotheses of an unpaired t-test take the same form as those for a
paired t-test, but refer to the means of two independent populations:
• The null hypothesis states that the two population means are equal.
• The alternative hypothesis states that the two population means
differ, that is, the observed difference reflects more than
sampling error or chance.

WHEN TO USE AN UNPAIRED T-TEST?
• An unpaired t-test is used to compare the means of two independent
groups. The standard (Student's) version assumes that the two groups have
equal variances; when that is doubtful, Welch's t-test is the usual alternative.
• Examples
• Research during which there are two independent groups, such as women and
men, that examines whether the average bone density is significantly different
between the two groups.
• Comparing the average commuting distance traveled by residents of two states
using 1,000 randomly selected participants from each state.
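
As a rough sketch of the equal-variance unpaired t-test, the statistic can be computed from the pooled variance of the two groups. The measurements below are hypothetical, and `pooled_t_statistic` is an illustrative name rather than a library function.

```python
import math
import statistics

def pooled_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for Student's unpaired t-test."""
    n1, n2 = len(x), len(y)
    var1, var2 = statistics.variance(x), statistics.variance(y)
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical measurements from two unrelated groups.
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 11, 8]

t, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")
```

With 8 degrees of freedom the two-tailed 5% critical value is 2.306, so a t statistic of about 3 would lead to rejecting the null hypothesis for these made-up numbers.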

PAIRED VS UNPAIRED T-TEST
• The key differences between a paired and unpaired t-test are
summarized below.
1. A paired t-test is designed to compare the means of the same
group or item under two separate scenarios. An unpaired t-test
compares the means of two independent or unrelated groups.
2. The standard unpaired t-test assumes that the two groups have
equal variances. A paired t-test works on the within-pair differences,
so no equal-variance assumption is needed.

ANOVA (ANALYSIS OF VARIANCE )
• It is a statistical technique that is used to check if the means of
two or more groups are significantly different from each other.
• ANOVA checks the impact of one or more factors by comparing
the means of different samples.
• Example:-
• We can use ANOVA to test whether all the medication
treatments were equally effective.

ONE-WAY ANOVA
• A one-way ANOVA evaluates the impact of a sole factor on a sole
response variable.
• It tests whether all the group means are equal.
• The one-way ANOVA is used to determine whether there are any
statistically significant differences between the means of three or
more independent (unrelated) groups.

EXAMPLES OF WHEN TO USE A ONE WAY ANOVA
• Situation 1: You have a group of individuals randomly split into smaller groups
and completing different tasks. For example, you might be studying the effects
of tea on weight loss and form three groups: green tea, black tea, and no tea.
• Situation 2: Similar to situation 1, but in this case the individuals are split into
groups based on an attribute they possess. For example, you might be
studying leg strength of people according to weight. You could split
participants into weight categories (obese, overweight and normal) and
measure their leg strength on a weight machine.
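
For a design like Situation 1, the one-way ANOVA F statistic is the ratio of between-group to within-group variability. The sketch below uses only the standard library; the weight-loss figures are hypothetical, and `one_way_anova_f` is an illustrative helper name.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - statistics.mean(g)) ** 2 for g in groups for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical weight loss (kg) for the three tea groups.
green_tea = [3, 4, 5]
black_tea = [2, 3, 4]
no_tea = [1, 2, 3]

f, df_b, df_w = one_way_anova_f([green_tea, black_tea, no_tea])
print(f"F = {f:.3f}, df = ({df_b}, {df_w})")
```

The F value is then compared with the critical value of the F distribution for those degrees of freedom; a large F suggests that at least one group mean differs from the others.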

LIMITATIONS OF THE ONE WAY ANOVA
• A one way ANOVA will tell you that at least two groups were
different from each other. But it won’t tell you which groups
were different.

TWO-WAY ANOVA
• A two-way ANOVA is an extension of the one-way ANOVA.
• With a one-way, you have one independent variable affecting a dependent
variable.
• With a two-way ANOVA, there are two independent variables.
• Example:-
• A two-way ANOVA allows a company to compare worker productivity based on
two independent variables, such as salary and skill set. It tests the effect of
both factors at the same time and shows whether the two factors interact.

ASSUMPTIONS FOR TWO WAY ANOVA
• The population must be close to a normal distribution.
• Samples must be independent.
• Population variances must be equal.
• Groups must have equal sample sizes.

NON-PARAMETRIC TESTS
• They are used when continuous data are not normally distributed or
when dealing with discrete variables.
• Non-parametric tests are designed for real data: skewed, lumpy,
with outliers and gaps scattered around.

WILCOXON SIGNED RANK TEST
• The Wilcoxon signed rank test should be used if the differences
between pairs of data are not normally distributed.
• It is used to test whether two dependent samples were selected from
populations having the same distribution.
• It is the non-parametric counterpart of the paired t-test.

CHI-SQUARE TEST:
• The chi-square test is used to examine the association between two
or more variables measured on categorical scales.
• Chi-square is used most frequently to test the statistical
significance of results reported in bivariate tables, and interpreting
bivariate tables is integral to interpreting the results of a
chi-square test.
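
A minimal sketch of the chi-square test of association on a bivariate table follows. The counts are hypothetical, and both function names are illustrative. The statistic is computed for a table of any size, but the p-value helper is valid only for one degree of freedom (a 2×2 table), where the chi-square upper tail can be obtained from the standard normal distribution. No continuity correction is applied.

```python
import math
from statistics import NormalDist

def chi_square_independence(table):
    """Return (chi2, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def chi2_p_value_df1(chi2):
    """Upper-tail p-value, valid for 1 degree of freedom only."""
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))

# Hypothetical 2x2 table: rows are two groups, columns are yes/no outcomes.
table = [[30, 10], [20, 40]]

chi2, df = chi_square_independence(table)
p = chi2_p_value_df1(chi2)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.5f}")
```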

FISHER’S EXACT TEST:
• It is an alternative to the chi-square test for small or imbalanced
data sets.
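
For a small 2×2 table, Fisher's exact p-value can be computed directly from the hypergeometric distribution. The sketch below is one common way to define the two-sided p-value (summing all tables with the same margins that are no more probable than the observed one); the counts are hypothetical and `fisher_exact_two_sided` is an illustrative name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with the row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is at most as probable as the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Hypothetical small table with only eight observations.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p:.4f}")
```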

P-Value
• The p-value is the level of marginal significance within a statistical
hypothesis test. It is a probability computed under the assumption that
the null hypothesis is true.
• A p-value is the probability of obtaining a sample outcome at least as
extreme as the one observed, given that the value stated in the null
hypothesis is true. The p-value is compared to the level of significance.
• The p-value is used as an alternative to rejection points: it gives the
smallest level of significance at which the null hypothesis would be
rejected.

P VALUE CONT.
• A p-value is used in hypothesis testing to help you decide whether to reject the null
hypothesis.
• The p-value measures the strength of the evidence against the null hypothesis.
• The smaller the p-value, the stronger the evidence that you should reject the null
hypothesis.
• On the other hand, a large p-value means that results like yours would occur often by
chance alone if the null hypothesis were true, so they give little evidence of a real effect.
• A small p-value makes the results statistically significant, which is not the same
as practically important.

ONE-TAILED TEST
• A one-tailed test is a statistical test in which the critical area of a
distribution is one-sided so that it is either greater than or less than a
certain value, but not both.
• If the test statistic falls into the one-sided critical area, the
null hypothesis is rejected in favor of the alternative hypothesis.
• Alpha levels (sometimes just called “significance levels”) are used
in hypothesis tests; a common choice is 5%.
• A one-tailed test has the entire 5% of the alpha level in one tail (in
either the left, or the right tail).

TWO TAILED TEST
• A two-tailed test allots half of your alpha to testing the statistical
significance in one direction and half of your alpha to testing
statistical significance in the other direction. With an alpha of .05, this
means that .025 is in each tail of the distribution of your test statistic.
• Two-tailed hypothesis tests are also known as nondirectional and
two-sided tests because you can test for effects in both directions.
• When you perform a two-tailed test, you split the significance
level percentage between both tails of the distribution.
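
The difference between one-tailed and two-tailed tests can be illustrated for a test statistic that follows the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the z value of 1.8 is a hypothetical test statistic, and `p_value` is an illustrative helper.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

# One-tailed: all of alpha sits in one tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split, with alpha / 2 in each tail.
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)

def p_value(z, tail="two"):
    """p-value for a standard normal test statistic z."""
    if tail == "upper":
        return 1 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    return 2 * (1 - std_normal.cdf(abs(z)))

z = 1.8  # hypothetical test statistic
print(f"one-tailed critical value: {one_tailed_critical:.3f}")
print(f"two-tailed critical value: {two_tailed_critical:.3f}")
print(f"upper-tail p = {p_value(z, 'upper'):.4f}")
print(f"two-tailed p = {p_value(z):.4f}")
```

For this z value the upper-tail p-value falls below 0.05 while the two-tailed p-value does not, which shows why the choice of tails should be fixed before looking at the data.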

Z-TEST
• A z-test is a statistical test used to determine whether two population means are
different when the variances are known and the sample size is large.
• The test statistic is assumed to have a normal distribution, and nuisance
parameters such as standard deviation should be known in order for an accurate z-
test to be performed.
• A z-statistic, or z-score, is a number representing how many standard deviations
above or below the population mean a score derived from a z-test is.
• Z-tests are closely related to t-tests, but t-tests are best performed when an experiment
has a small sample size.
• Also, t-tests assume the standard deviation is unknown, while z-tests assume it is known.

Z TEST CONT.
• For each significance level, the z-test
has a single critical value (for example, 1.96 for 5% two-tailed),
which makes it more convenient than Student's t-test, whose
critical values depend on the sample size (through the
corresponding degrees of freedom).

ONE-SAMPLE Z TEST
• We perform the One-Sample Z test when we want to
compare a sample mean with the population mean.
• Example :- Let’s say we need to determine if girls on
average score higher than 600 in the exam. We have
the information that the standard deviation for girls’
scores is 100. So, we collect a random sample of 20
girls and record their marks. Finally, we set our α
value (significance level) to 0.05.

• Since the p-value is less than 0.05, we can reject the null
hypothesis and conclude, based on our result, that girls on average scored
higher than 600.
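
The calculation behind a one-sample z-test like this one can be sketched as follows. The hypothesized mean (600), standard deviation (100), sample size (20) and significance level (0.05) come from the example; the sample mean of 641 is a hypothetical value chosen for illustration, and `one_sample_z_test` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Upper-tailed one-sample z-test: H0 is mean = mu0, H1 is mean > mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p = 1 - NormalDist().cdf(z)  # probability of a z at least this large
    return z, p

alpha = 0.05
sample_mean = 641  # hypothetical: not taken from the example

z, p = one_sample_z_test(sample_mean, mu0=600, sigma=100, n=20)
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The test is one-tailed because the question is whether girls score higher than 600, not merely whether their average differs from 600.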

TWO SAMPLE Z TEST
• We perform a Two Sample Z test when we want to
compare the means of two samples.
• Example
• Here, let’s say we want to know if girls on average score
10 marks more than the boys. We have the information
that the standard deviation for girls’ scores is 100 and for
boys’ scores is 90. Then we collect random samples of 20
girls and 20 boys and record their marks. Finally, we set
our α value (significance level) to 0.05.

• Thus, we can conclude based on the p-value that we fail to reject the null
hypothesis. We don’t have enough evidence to conclude that girls on average score
10 marks more than the boys.
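
A two-sample z-test of this kind can be sketched in the same way. The standard deviations (100 and 90), sample sizes (20 each), hypothesized difference (10) and significance level (0.05) come from the example; the two sample means are hypothetical values chosen for illustration, and `two_sample_z_test` is an illustrative helper name.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, diff0=0.0):
    """Upper-tailed test: H0 is (mu1 - mu2) = diff0, H1 is (mu1 - mu2) > diff0."""
    # Standard error of the difference between two independent sample means.
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = ((mean1 - mean2) - diff0) / standard_error
    p = 1 - NormalDist().cdf(z)
    return z, p

alpha = 0.05
girls_mean, boys_mean = 641, 613  # hypothetical: not taken from the example

z, p = two_sample_z_test(
    girls_mean, boys_mean, sigma1=100, sigma2=90, n1=20, n2=20, diff0=10
)
print(f"z = {z:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

With these made-up means the observed gap is 28 marks, yet the standard error is about 30 marks, so the data cannot distinguish a 28-mark gap from a 10-mark gap.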

DECIDING BETWEEN Z TEST AND T-TEST
• If the sample size is large enough, then the z-test and t-test will
lead to the same conclusions.
• For a large sample size, the sample variance is a better
estimate of the population variance, so even if the population variance is
unknown, we can use the z-test with the sample variance.
• Similarly, a large sample gives many degrees of freedom.
Since the t-distribution approaches the normal distribution as the degrees
of freedom grow, the difference between the z score and t score becomes negligible.
