standar error of proporton hypthesis testing standard error of diff t test.pptx
1. Standard Error Of Proportion,
Difference Of Mean And
Difference Of Proportion, t test,
chi square test, anova
Dr. Syed Razi Haider Zaidi
Associate Professor Community Medicine,
SIMS, Lahore.
2. Standard error of proportion
• The standard error of a proportion is a statistic indicating how much
a particular sample proportion is likely to differ from the population
proportion, p. Let p̂ represent the proportion observed in a sample.
• SE(p̂) = sqrt( p̂ q̂ / n )
• q̂ = 1 − p̂, and n represents the sample size.
3. • For example, if 47 of the 300 residents in the sample supported the
use of the COVID vaccine, the sample proportion p̂ would be calculated as
47 / 300 = 0.157.
• This means our best estimate for the proportion of residents in the
population who support the vaccine would be 0.157.
• However, there’s no guarantee that this estimate will exactly match
the true population proportion so we typically calculate the standard
error of the proportion as well.
4. This is calculated as:
Standard Error of the Proportion Formula:
Standard Error = sqrt( p̂(1 − p̂) / n )
For example, if p̂ = 0.157 and n = 300, then we would calculate
the standard error of the proportion as:
Standard error of the proportion = sqrt( 0.157(1 − 0.157) / 300 ) = 0.021
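The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name standard_error_of_proportion is an illustrative choice, and the inputs are the 47 supporters out of 300 residents from the example.

```python
import math

def standard_error_of_proportion(successes: int, n: int) -> float:
    """Return sqrt(p_hat * (1 - p_hat) / n) for a sample proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n          # sample proportion
    q_hat = 1 - p_hat              # complement of the sample proportion
    return math.sqrt(p_hat * q_hat / n)

# 47 of 300 residents supported the vaccine (numbers from the example).
se = standard_error_of_proportion(47, 300)
print(round(se, 3))  # 0.021
```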
We then typically use this standard error to calculate a confidence
interval for the true proportion of residents who
support the covid vaccine.
5. This is calculated as:
Confidence Interval for a Population Proportion Formula:
Confidence Interval = p̂ +/- z * sqrt( p̂(1 − p̂) / n )
Looking at this formula, it’s easy to see that
the larger the standard error of the proportion,
the wider the confidence interval.
6. Note that the z in the formula is the z-value that corresponds to
the chosen confidence level. Popular choices are 90% (z = 1.645),
95% (z = 1.96) and 99% (z = 2.576).
8. • For example, here’s how to calculate a 95% confidence interval for the
true proportion of residents in the city who support the new vaccines:
• 95% C.I. = p̂ +/- z * sqrt( p̂(1 − p̂) / n )
• 95% C.I. = 0.157 +/- 1.96 * sqrt( 0.157(1 − 0.157) / 300 )
• 95% C.I. = 0.157 +/- 1.96 * (0.021)
• 95% C.I. = [0.11584, 0.19816]
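As a rough check of the interval, the same steps can be sketched in Python. The function name proportion_confidence_interval and its default z = 1.96 are illustrative assumptions. The code keeps the unrounded standard error, so its limits can differ in the later decimals from a hand calculation that uses 0.021.

```python
import math

def proportion_confidence_interval(p_hat: float, n: int, z: float = 1.96):
    """Return (lower, upper) limits of p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    margin = z * se                          # half-width of the interval
    return p_hat - margin, p_hat + margin

# p_hat = 0.157 and n = 300 are the values used in the example.
low, high = proportion_confidence_interval(0.157, 300)
print(round(low, 3), round(high, 3))  # 0.116 0.198
```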
9. • 2) The proportion of blood group A among Indians is 30%. In a batch of 100 individuals it is
observed as 25%. What is your conclusion about the group?
• Ans: Given values: n = 100; p = proportion of blood group A in the sample = 25%; q = 100 − p = 75%;
P = proportion of blood group A in the Indian population = 30%; Q = 100 − P = 70%.
• H0: The sample is drawn from the Indian population with population proportion of blood group A, P =
30%
• H1: The sample is not drawn from the Indian population with population proportion of blood group A, P
≠ 30%
• Z-test for proportion: Z = (p − P) / SE(P)
• SE(P) = sqrt( P * Q / n ) = sqrt( 30 * 70 / 100 ) = sqrt(21) = 4.58
• Z = (25 − 30) / 4.58 = −1.09, so |Z| = 1.09
• 1.09 < 1.96
• Here calculated |Z| < 1.96, hence we accept (fail to reject) the null hypothesis: the data are consistent
with the sample being drawn from the Indian population.
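The blood group example can be reproduced with a short Python sketch. The function name one_proportion_z is hypothetical, and proportions are written as fractions (0.25, 0.30) rather than percentages. The sign of Z is kept, so the comparison with 1.96 uses the absolute value.

```python
import math

def one_proportion_z(p_sample: float, p_population: float, n: int) -> float:
    """Z = (p - P) / SE(P), with SE(P) = sqrt(P * Q / n) computed under H0."""
    se = math.sqrt(p_population * (1 - p_population) / n)
    return (p_sample - p_population) / se

# Sample proportion 25%, population proportion 30%, n = 100 (from the example).
z = one_proportion_z(0.25, 0.30, 100)
reject_h0 = abs(z) > 1.96  # two-sided test at the 5% level
print(round(z, 2), reject_h0)  # -1.09 False
```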
10. Hypothesis
• In statistics, a hypothesis is a formal statement about the relationship
between two or more variables in a specified population. It helps the
researcher translate the given problem into a clear, testable prediction of
the expected outcome of the study.
• Null Hypothesis
• The null hypothesis states that there is no difference between the populations
specified in the experiment. The null hypothesis is denoted by H0.
• Alternative Hypothesis
• The alternative hypothesis states that there is a difference between the
populations specified. It is denoted by Ha or H1.
11. Hypothesis testing
• Hypothesis testing is used to assess the plausibility of a hypothesis by
using sample data.
• The test provides evidence concerning the plausibility of the
hypothesis, given the data.
• Statistical analysts test a hypothesis by measuring and examining a
random sample of the population being analyzed.
• The four steps of hypothesis testing include stating the hypotheses,
formulating an analysis plan, analyzing the sample data, and analyzing
the result.
12. • All analysts use a random population sample to test two different
hypotheses: the null hypothesis and the alternative hypothesis.
• The null hypothesis is usually a hypothesis of equality between
population parameters; e.g., a null hypothesis may state that the
population mean return is equal to zero. The alternative hypothesis is
effectively the opposite of a null hypothesis. Thus, they are
mutually exclusive, and only one can be true. However, one of the
two hypotheses will always be true.
• The null hypothesis is a statement about a population parameter,
such as the population mean, that is assumed to be true until the
sample data provide evidence against it.
13. Type I and Type II Errors
• When a statistical hypothesis is tested, there are 4 possible results:
• (1) The hypothesis is true and our test accepts it.
• (2) The hypothesis is false and our test rejects it.
• (3) The hypothesis is true but our test rejects it.
• (4) The hypothesis is false but our test accepts it.
• The last 2 possibilities lead to errors. Rejecting a null
hypothesis when it is true is called a Type I error. Accepting a null
hypothesis when it is false is called a Type II error.
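A small simulation can make the Type I error concrete. The sketch below is an illustration only: the population proportion 0.30, sample size 100, number of trials and random seed are all assumed values chosen for the demonstration. Every sample is drawn from a population in which the null hypothesis is true, so each rejection is a Type I error. Because counts are discrete, the observed rate is only approximately 5%.

```python
import math
import random

def type_one_error_rate(p0=0.30, n=100, trials=2000, z_crit=1.96, seed=1):
    """Estimate how often a true null hypothesis (P = p0) gets rejected."""
    rng = random.Random(seed)              # fixed seed for repeatability
    se = math.sqrt(p0 * (1 - p0) / n)      # standard error under H0
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 is TRUE.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / se
        if abs(z) > z_crit:
            rejections += 1                # rejecting a true null: Type I error
    return rejections / trials

rate = type_one_error_rate()
print(rate)  # a value in the neighbourhood of 0.05
```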
14. Test of significance
• We need to run a test of significance to obtain the p-value
• Z test
• Chi square
• t test
• ANOVA
• etc.
15. What is Statistical Significance?
• In statistics, “significant” means “unlikely to be due to chance alone”. When a
statistician declares that a result is “highly significant”, it means the result
would be very unlikely to arise by chance if the null hypothesis were true. It does
not mean that the result is highly important, only that chance is an improbable
explanation for it.
• Level of Significance Definition
• The level of significance is the fixed probability of wrongly rejecting the
null hypothesis when it is in fact true. In other words, it is the probability of
a type I error (rejecting the null when it is true), and it is preset by the
researcher according to the consequences of making that error. The level of
significance is the threshold against which statistical significance is judged:
it determines whether the null hypothesis is retained or rejected.
16. • Level of Significance Symbol
• The level of significance is denoted by the Greek symbol α (alpha). Therefore, the level of
significance is defined as follows:
• Significance Level = P(type I error) = α
• Values or observations become less likely the farther they lie from the mean. The results
are written as “significant at x%”.
• Example: A value significant at 5% means the p-value is less than 0.05 (p < 0.05). Similarly,
significant at 1% means that the p-value is less than 0.01.
• The level of significance is usually taken as 0.05 or 5%. A low p-value means that the
observed values differ significantly from the population value that was hypothesised in
the beginning. The lower the p-value, the stronger the evidence against the null hypothesis, so a
very small p-value indicates a highly significant result. Most commonly, p-values smaller than
0.05 are called significant, since a p-value below 0.05 occurs less than 5% of the time when the
null hypothesis is true.
17. • How to Find the Level of Significance?
• To measure the statistical significance of a result, the investigator first
needs to calculate the p-value. The p-value is the probability of observing an
effect at least as extreme as the one found, given that the null hypothesis is
true. When the p-value is less than the level of significance (α), the null
hypothesis is rejected. If the observed p-value is not less than α, the null
hypothesis is not rejected (it is "accepted"). The level of significance is
generally kept at 0.05.
• If p > 0.01 and p ≤ 0.05, there is strong evidence against the null
hypothesis.
• If p ≤ 0.01, there is very strong evidence against the null hypothesis.
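The decision rule can be sketched in Python for a z statistic. The function names two_sided_p_value and decide are illustrative, and the two-sided p-value is obtained from the standard normal distribution through math.erfc. The z values 1.09 and 4.59 are the ones obtained in the worked proportion examples in these slides.

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the preset level of significance alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_small_z = two_sided_p_value(1.09)   # z from the blood group example
p_large_z = two_sided_p_value(4.59)   # z from the malnutrition example
print(round(p_small_z, 2), decide(p_small_z))  # 0.28 fail to reject H0
print(decide(p_large_z))                       # reject H0
```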
19. Hypothesis testing pearl of wisdom
• There are 5 main steps in hypothesis testing:
• State your research hypothesis as a null hypothesis (H0) and an
alternative hypothesis (Ha or H1).
• Collect data in a way designed to test the hypothesis.
• Perform an appropriate statistical test (e.g. z test, t test, chi square,
ANOVA).
• Decide whether to reject or fail to reject your null hypothesis.
• Present the findings in your results and discussion section.
21. • Based on the outcome of your statistical test, you will have to decide
whether to reject or fail to reject your null hypothesis.
• In most cases you will use the p-value generated by your statistical
test to guide your decision. And in most cases, your predetermined
level of significance for rejecting the null hypothesis will be 0.05 –
that is, when there is a less than 5% chance that you would see these
results if the null hypothesis were true.
22. Standard error difference between two proportions (Z test)
• H0: There is no significant difference between the two population
proportions, P1 = P2
• H1: There is a significant difference between the two population
proportions, P1 ≠ P2
• Z = observed difference between proportions / SE(p1 − p2)
• Z = |p1 − p2| / SE(p1 − p2)
• SE(p1 − p2) = sqrt( p1 q1 / n1 + p2 q2 / n2 )
• If Z < 1.96 then accept H0, otherwise reject H0.
23. • A survey of 400 children in age group 0-5 years showed the prevalence rate of protein calorie malnutrition
to be 15%. Another study showed a prevalence of 5% in a sample of 300 of the same age group. Can we say
that the difference between the two prevalence rates is statistically significant?
• Ans: Given values n1 = 400, p1 = 15%, q1 = 100 − 15 = 85%
• n2 = 300, p2 = 5%, q2 = 100 − 5 = 95%
• Z-test for difference between two proportions
• H0: There is no significant difference between the two population (prevalence) proportions, P1 = P2
• H1: There is a significant difference between the two population proportions, P1 ≠ P2
• Z = |p1 − p2| / SE(p1 − p2)
• SE(p1 − p2) = sqrt( p1 q1 / n1 + p2 q2 / n2 ) = sqrt( 15 * 85 / 400 + 5 * 95 / 300 ) = 2.18
• Z = |p1 − p2| / SE(p1 − p2) = |15 − 5| / 2.18 = 4.59 > 1.96
• Here calculated Z > 1.96, hence reject H0: there is a
significant difference in the prevalence of protein calorie malnutrition.
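The malnutrition example can be checked with the sketch below. The function name two_proportion_z is hypothetical. It uses the same unpooled standard error as the slide formula, with proportions written as fractions. Keeping the unrounded standard error gives Z ≈ 4.58, which differs in the last digit from dividing by the rounded 2.18.

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int):
    """Return (Z, SE) with SE = sqrt(p1*q1/n1 + p2*q2/n2) and Z = |p1 - p2| / SE."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) / se, se

# 15% of 400 children versus 5% of 300 children (from the example).
z, se = two_proportion_z(0.15, 400, 0.05, 300)
print(round(se * 100, 2), round(z, 2), z > 1.96)  # 2.18 4.58 True
```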
24. Standard error of difference of means
• Is the difference between two sample means significant or not? That is,
does the difference also exist in the actual populations (is it significant),
meaning the samples represent two different universes, or not?
25. Drug trial to see effect on kidney weight
Group            Number (n)    Mean    SD
Control          12            318     10.2
Experimental     12            370     24.1
26. • S.E.(d) between means = sqrt( σ1² / n1 + σ2² / n2 )
• = sqrt( 10.2 * 10.2 / 12 + 24.1 * 24.1 / 12 )
• = sqrt( 8.67 + 48.40 )
• = 7.55
• The S.E.(d) between the two means is 7.55. The actual difference between the means is
(370 − 318) = 52, which is more than twice the S.E.(d) between means
and therefore significant. We conclude that the treatment affects the
kidney weight.
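The kidney weight calculation can be verified with a short sketch. The function name se_difference_of_means is an illustrative choice. The inputs are the group sizes, means and standard deviations from the drug trial table.

```python
import math

def se_difference_of_means(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard error of the difference between two means: sqrt(sd1^2/n1 + sd2^2/n2)."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Control: n = 12, mean 318, SD 10.2. Experimental: n = 12, mean 370, SD 24.1.
se_d = se_difference_of_means(10.2, 12, 24.1, 12)
difference = 370 - 318
# Rule of thumb from the slide: significant if the difference exceeds twice the SE.
print(round(se_d, 2), difference, difference > 2 * se_d)  # 7.55 52 True
```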
27. t test
• A t-test (also known as Student's t-test) is a tool for evaluating the
means of one or two populations using hypothesis testing. A t-test
may be used to evaluate whether a single group differs from a known
value (a one-sample t-test), whether two groups differ from each
other (an independent two-sample t-test), or whether there is a
significant difference in paired measurements (a paired, or dependent
samples t-test).
• The t-test was introduced in 1908 by William Sealy Gosset.
• Gosset published his mathematical work under the pseudonym “Student”.
28. Assumptions of t-Test
• Dependent variables are interval or ratio.
• The population from which samples are drawn is normally
distributed.
• Samples are randomly selected.
• The groups have equal variance (homogeneity of variance).
29. Applications of t test
• To test whether a sample mean is different from a hypothesized
value.
• To compare the means of two samples.
• To compare two sample means by group.
• To calculate a confidence interval for a sample mean.
30. Types of “t” test
• Single sample t test: we have only 1 group and want to test it against a
hypothetical mean.
• Independent samples t test: we have 2 means from 2 groups, with no
relation between the groups. E.g. when we want to compare the mean of the
treatment group with that of the placebo group.
• Paired t test: it uses samples of matched pairs of similar units,
or one group of units tested twice. E.g. the difference in means before and
after a drug intervention.
31. One Sample t-test
• It is used to measure whether a sample mean differs significantly
from a hypothesized value.
• For example, a research scholar might hypothesize that on average
it takes 3 minutes for people to drink a standard cup of coffee.
• He conducts an experiment and measures how long it takes his subjects to
drink a standard cup of coffee.
• The one sample t-test measures whether the mean amount of time it
took the experimental group to complete the task varies significantly
from the hypothesized value of 3 minutes.
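A one sample t statistic for the coffee example might be computed as sketched below. The drinking times are invented purely for illustration and do not come from any study, and the function name one_sample_t is hypothetical. The statistic is t = (sample mean − hypothesized mean) / (s / sqrt(n)), with n − 1 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean: float):
    """Return (t, degrees of freedom) for a one sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - hypothesized_mean) / se, n - 1

# HYPOTHETICAL drinking times in minutes, made up for this illustration.
times = [3.2, 2.9, 3.5, 3.1, 3.4, 2.8, 3.6, 3.3]
t, df = one_sample_t(times, 3.0)
print(round(t, 2), df)  # 2.26 7
```

The value of t is then compared with the tabulated t-distribution for 7 degrees of freedom. The two-sided 5% critical value there is 2.365, so for these made-up data the null hypothesis of a 3 minute mean would not be rejected.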
35. • The independent samples t-test compares the mean values of a continuous
(interval or ratio) variable in normally distributed data.
• The independent samples t-test compares two means.
• The independent samples t-test is also called the unpaired t-test or two sample t-test.
• It is the t-test to use when two separate, independent and identically
distributed groups are measured.
• E.g. 1. Comparison of the improvement in quality of life for patients who took the drug
Valproate as opposed to patients who took the drug Levetiracetam in myoclonic
seizures.
• 2. Comparison of mean cholesterol levels in the treatment group with the placebo
group after administration of the test drug.
36. Assumptions
• A random sample of each population is used.
• The random samples are each made up of independent observations.
• Each sample is independent of one another.
• The population distribution of each population must be nearly
normal, or the size of the sample is large.
37. • To test the null hypothesis that the two population means, μ1 and μ2, are equal:
• 1. Calculate the difference between the two sample means, x̄1 − x̄2.
• 2. Calculate the pooled standard deviation: sp = sqrt( ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ),
where s1 and s2 are the sample standard deviations.
• 3. Calculate the standard error of the difference between the means: SE(x̄1 − x̄2) = sp * sqrt( 1/n1 + 1/n2 )
• 4. Calculate the T-statistic, which is given by T = (x̄1 − x̄2) / SE(x̄1 − x̄2)
• This statistic follows a t-distribution with n1 + n2 − 2 degrees of freedom.
• 5. Use tables of the t-distribution to compare your value for T to the t(n1 + n2 − 2)
distribution. This will give the p-value for the unpaired t-test.
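The steps above can be sketched in Python from summary statistics. The pooled standard deviation and standard error used here are the standard textbook pooled-variance formulas, which is an assumption about what the slide intends. The function name pooled_t_from_summary is hypothetical. For illustration the inputs are the kidney weight summary figures from the drug trial slide, although their unequal SDs make the equal-variance assumption questionable.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, pooled SD) for an unpaired t-test from summary statistics."""
    df = n1 + n2 - 2
    # Step 2: pooled standard deviation (assumes equal population variances).
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    # Step 3: standard error of the difference between the means.
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    # Step 4: t statistic for the difference between the sample means.
    return (mean1 - mean2) / se, df, sp

# Experimental group (370, 24.1, 12) versus control group (318, 10.2, 12).
t, df, sp = pooled_t_from_summary(370, 24.1, 12, 318, 10.2, 12)
print(round(sp, 2), round(t, 2), df)  # 18.5 6.88 22
```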
43. • The two independent samples t-test and z-test are both statistical tests used to compare the means of two
independent samples. The choice between the two tests depends on the characteristics of the data
and the assumptions that we can make about the population.
• In general, a two independent samples z-test is appropriate when we know the population standard deviation
and the sample sizes are large. This is because, when sample sizes are large, the sample means are
approximately normally distributed, which is what the z-test relies on.
• On the other hand, a two samples t-test is more appropriate when we do not know the population standard
deviation and the sample sizes are small. This is because, when the standard deviation has to be estimated
from a small sample, the test statistic follows a t-distribution rather than a normal distribution, and the
t-test accounts for that extra uncertainty.
• Here is the summary of which tests out of z-test or t-test to use in which scenarios:
• Two independent samples z-test:
• Large sample size (typically > 30)
• Known population standard deviation
• Normally distributed population
44. Paired t test
• A paired t-test is used to compare two population means where you have two
samples in which observations in one sample can be paired with observations in
the other sample.
• It suits a comparison of two different methods of measurement or two different
treatments where the measurements/treatments are applied to the same subjects.
• E.g. 1. Pre-test/post-test samples in which a factor is measured before and after an
intervention.
• 2. Cross-over trials in which individuals are randomized to two treatments and
then the same individuals are crossed over to the alternative treatment.
• 3. Matched samples, in which individuals are matched on personal characteristics
such as age and sex.
45. Paired t test
• Suppose a sample of “n” subjects were given an
antihypertensive drug and we want to check blood pressure before and
after treatment. We want to find out the effectiveness of the
treatment by comparing the mean pre- and post-treatment values.
• To test the null hypothesis that the true mean difference is zero, the
procedure is as follows:
• 1. Calculate the difference (di = yi − xi) between the two observations
on each pair.
47. • 2. Calculate the mean difference, d̄.
• 4. Calculate the t-statistic, which is given by T = d̄ / SE(d̄), where
SE(d̄) = s_d / sqrt(n) is the standard error of the mean difference and s_d is
the standard deviation of the differences. Under the null
hypothesis, this statistic follows a t-distribution with n − 1 degrees of
freedom.
• 5. Use tables of the t-distribution to compare your value for T to the
t(n − 1) distribution. This will give the p-value for the paired t-test.
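The paired procedure can be sketched in Python as follows. The blood pressure readings are hypothetical numbers invented for the illustration, and the function name paired_t is an assumption. The standard error is taken as the standard deviation of the differences divided by sqrt(n).

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, degrees of freedom) for a paired t-test on matched readings."""
    if len(before) != len(after):
        raise ValueError("each subject needs one reading before and one after")
    # Step 1: difference d_i = y_i - x_i for each pair (after minus before).
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)                  # mean difference
    se = statistics.stdev(diffs) / math.sqrt(n)     # SE of the mean difference
    return d_bar / se, n - 1

# HYPOTHETICAL systolic blood pressures (mmHg) for 6 subjects.
before = [150, 160, 145, 155, 165, 158]
after = [140, 152, 143, 148, 155, 150]
t, df = paired_t(before, after)
print(round(t, 2), df)  # -6.23 5
```

The negative sign only reflects that the readings fell after treatment. The absolute value of t is what gets compared with the tabulated t-distribution for n − 1 degrees of freedom.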