Unit-2 Biostatistics Probability Definition

UNIT-2
Regression, Probability & Parametric Test

Regression: Curve fitting by the method of least squares, fitting the lines y = a + bx
and x = a + by, Multiple regression, standard error of regression
Probability: Definition of probability, Binomial distribution, Normal
distribution, Poisson’s distribution, properties - problems. Sample, Population,
large sample, small sample, Null hypothesis, alternative hypothesis, sampling,
essence of sampling, types of sampling, Error-I type, Error-II type, Standard
error of mean (SEM) - Pharmaceutical examples
Parametric test: t-test (Sample, Pooled or Unpaired and Paired), ANOVA
(One way and Two way), Least Significance difference

BINOMIAL DISTRIBUTION
The binomial probability of exactly x successes in n trials is the number of ways of
choosing which x trials succeed (nCx), multiplied by the probability of success raised
to the power of the number of successes and the probability of failure raised to the
power of the number of failures (n - x).
$$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x} = \frac{n!}{(n-x)!\, x!}\, p^{x} (1-p)^{n-x}$$
where
n = the number of trials
x = the number of successes desired
p = probability of success in a single trial
q = probability of failure in a single trial (q = 1 - p)
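
As a rough illustration of the formula above, the probability can be computed directly with Python's standard library. The function name `binomial_pmf` and the tablet example (10 tablets, each defective with probability 0.1) are assumptions made for this sketch, not part of the original notes.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): probability of exactly x successes in n independent trials."""
    q = 1.0 - p                      # probability of failure
    return comb(n, x) * p**x * q**(n - x)

# Hypothetical example: 10 tablets, each defective with probability 0.1
n, p = 10, 0.1
prob_two = binomial_pmf(2, n, p)     # exactly 2 defective tablets
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(round(prob_two, 4))            # about 0.1937
print(round(total, 6))               # probabilities over all x add up to 1
```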

Properties of Binomial Distribution
1. Binomial distribution has a fixed number of independent trials; i.e., n.
2. In each trial, there are only two outcomes, success or failure.
3. The probability of success (p) remains constant across all trials.
4. Each trial is independent, with no impact on others.
5. It is a discrete probability distribution with specific, countable values.
6. The probability mass function (PMF) gives the probability of exactly 'x' successes in 'n' trials.
7. Mean (µ) equals np, and Variance (σ²) equals npq.
8. The shape of the binomial curve varies based on 'n' and 'p,' tending towards symmetry with larger 'n.'
9. For large 'n,' it approximates a normal distribution (Central Limit Theorem).
10. Cumulative Distribution Function (CDF) finds cumulative probabilities for ≤ 'x' successes.
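
Properties 7 and 10 can be checked numerically. The sketch below sums the probabilities directly, with illustrative values n = 20 and p = 0.3 chosen only for the example; the helper names `pmf` and `cdf` are assumptions.

```python
from math import comb

def pmf(x, n, p):
    """Probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(x, n, p):
    """Cumulative probability of x or fewer successes."""
    return sum(pmf(k, n, p) for k in range(x + 1))

n, p = 20, 0.3                      # illustrative values
q = 1 - p

# Mean and variance computed from the definition of expectation
mean = sum(k * pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf(k, n, p) for k in range(n + 1))

print(round(mean, 6), n * p)        # both equal np
print(round(var, 6), n * p * q)     # both equal npq
print(round(cdf(n, n, p), 6))       # the CDF reaches 1 at x = n
```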

POISSON DISTRIBUTION
The Poisson distribution is a discrete probability distribution that
calculates the likelihood of a certain number of events happening in a fixed
time or space, assuming the events occur independently and at a constant rate.
Poisson distribution is characterized by a single parameter, lambda (λ), which
represents the average rate of occurrence of the events. The probability mass
function of the Poisson distribution is given by:
$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$
where
• P(X = x) is the Probability of Observing x Events
• e is the Base of the Natural Logarithm (approximately 2.71828)
• λ is the Average Rate of Occurrence of Events
• x is the Number of Events that Occur (x = 0, 1, 2, ...)
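
A minimal sketch of the Poisson probability mass function in Python, assuming an average rate of λ = 2 events per interval purely for illustration; the function name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with average rate lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0                                   # assumed average rate per interval
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 4))

# The probabilities over all possible counts add up to (almost exactly) 1
total = sum(poisson_pmf(x, lam) for x in range(50))
print(round(total, 6))
```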

CONTINUOUS DISTRIBUTION
UNIFORM
In probability theory and statistics, the continuous uniform distributions or
rectangular distributions are a family of symmetric probability distributions.

EXPONENTIAL
The exponential distribution is a probability distribution that models the time
between events that happen continuously and independently at a constant rate.
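
The two continuous distributions above can be sketched with their standard textbook formulas, which are not written out in these notes: the uniform density 1/(b - a) on the interval [a, b], and the exponential cumulative probability 1 - e^(-rate × t). The function names and the numbers in the example are assumptions for illustration.

```python
from math import exp

def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_cdf(t: float, rate: float) -> float:
    """P(waiting time <= t) when events occur at a constant average rate."""
    return 1.0 - exp(-rate * t) if t >= 0 else 0.0

# The uniform density is flat: the same value everywhere inside the interval
print(uniform_pdf(0.5, 0.0, 2.0), uniform_pdf(1.5, 0.0, 2.0))

# Assumed rate of 0.5 events per hour: chance the next event occurs within 2 hours
print(round(exponential_cdf(2.0, 0.5), 4))
```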

NORMAL
The normal distribution is the most commonly encountered distribution of continuous
random variables, hence the name “normal distribution.” It is also called the Gaussian
distribution in statistics and probability. It is used to model a large number of random
variables, and it serves as a foundation for statistics and probability theory.

Properties of Normal Distribution
• Symmetry: The normal distribution is symmetric around its mean. This means the
left side of the distribution mirrors the right side.
• Mean, Median, and Mode: In a normal distribution, the mean, median, and
mode are all equal and located at the center of the distribution.
• Bell-shaped Curve: The curve is bell-shaped, indicating that most of the
observations cluster around the central peak, and the probabilities for values further
away from the mean taper off equally in both directions.
• Standard Deviation: The spread of the distribution is determined by the standard
deviation. About 68% of the data falls within one standard deviation of the mean,
95% within two standard deviations, and 99.7% within three standard deviations.

On the graph of the normal curve, the Empirical Rule divides the data broadly into
three parts: the values within one, two, and three standard deviations of the mean.
For this reason the empirical rule is also called the “68 – 95 – 99.7” rule.
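
The 68 – 95 – 99.7 figures can be reproduced from the standard normal curve. This is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `within` is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * within(k), 1))   # roughly 68.3, 95.4, 99.7 percent
```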

SAMPLING
Sampling is the selection of a subset, or statistical sample (termed sample for
short), of individuals from within a statistical population to estimate
characteristics of the whole population.
Sampling has lower costs and faster data collection compared to recording data
from the entire population (in many cases, collecting the whole population is
impossible, like getting sizes of all stars in the universe), and thus, it can
provide insights in cases where it is infeasible to measure an entire population.

Sampling Methods
Within any of the types of frames identified above, a variety of sampling methods can be
employed individually or in combination. Factors commonly influencing the choice
between these designs include:
• Nature and quality of the frame
• Availability of auxiliary information about units on the frame
• Accuracy requirements, and the need to measure accuracy
• Whether detailed analysis of the sample is expected
• Cost/operational concerns

SAMPLING DESIGN
Components of a sampling design:
• UNIVERSE
• SAMPLING UNIT
• SAMPLING FRAME
• SAMPLE SIZE
• SAMPLING METHOD
• BUDGET

SAMPLING METHODS/TECHNIQUES
PROBABILITY
• SIMPLE RANDOM SAMPLING
• STRATIFIED SAMPLING
• SYSTEMATIC SAMPLING
• MULTI STAGE SAMPLING
• MULTI PHASE SAMPLING
• CLUSTER SAMPLING
NON-PROBABILITY
• PURPOSIVE OR JUDGEMENT
• CONVENIENCE
• QUOTA
• SNOWBALL
• CONSECUTIVE
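
Three of the probability methods listed above can be sketched in a few lines of Python. The function names, the numbered sampling frame of 100 patients, and the two strata are hypothetical, and the stratified version assumes proportional allocation (the same fraction drawn from every stratum), which is only one of several possible designs.

```python
import random

def simple_random_sample(units, n, seed=0):
    """Every unit has the same chance of selection."""
    return random.Random(seed).sample(units, n)

def systematic_sample(units, n, seed=0):
    """Pick a random start, then take every k-th unit, with k = N // n."""
    k = len(units) // n
    start = random.Random(seed).randrange(k)
    return units[start::k][:n]

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, size))
    return chosen

frame = list(range(1, 101))                      # hypothetical frame: patients 1..100
srs = simple_random_sample(frame, 10)
sys_sample = systematic_sample(frame, 10)
strata = {"male": frame[:60], "female": frame[60:]}
strat = stratified_sample(strata, 0.10)

print(srs)
print(sys_sample)                                # units spaced exactly 10 apart
print(strat)                                     # 6 from one stratum, 4 from the other
```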

HYPOTHESIS
The null and alternative hypotheses are two competing claims that
researchers weigh evidence for and against using a statistical test.
• NULL HYPOTHESIS (H0): There is no effect on the population.
• ALTERNATE HYPOTHESIS (H1 or Ha): There is an effect on the population.
The effect is usually the effect of the independent variable on the
dependent variable.

The null and alternative are always claims about the population.
That’s because the goal of hypothesis testing is to make inferences about a population based on
a sample.
Often, we infer whether there’s an effect in the population by looking at differences between groups
or relationships between variables in the sample. It’s critical for your research to write strong
hypotheses.
You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis.
Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis.
However, the hypotheses can also be phrased in a general way that applies to any test.

NULL HYPOTHESIS
The null hypothesis (H0) is the claim that there is no effect in the population.
If the sample provides enough evidence against the claim that there’s no effect in the
population (p ≤ α), then we can reject the null hypothesis. Otherwise, we fail to reject the null
hypothesis.
Null hypotheses often include phrases such as “no effect,” “no difference,” or “no
relationship.” When written in mathematical terms, they always include an equality (usually
=, but sometimes ≥ or ≤).
You can never know with complete certainty whether there is an effect in the population.
Some percentage of the time, your inference about the population will be incorrect. When
you incorrectly reject the null hypothesis, it’s called a type I error. When you
incorrectly fail to reject it, it’s a type II error.
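
The decision rule described above (reject when p ≤ α) is simple enough to write down as a sketch. The function name `decide` and the default significance level of 0.05 are assumptions for illustration; the p-value itself would come from whichever statistical test is being used.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when p <= alpha, otherwise fail to reject."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))          # reject H0
print(decide(0.20))          # fail to reject H0
print(decide(0.03, 0.01))    # fail to reject H0 at a stricter alpha
```

Note that the wording is "fail to reject", not "accept": a large p-value does not prove that the null hypothesis is true.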

NULL HYPOTHESIS
Ex. Does the amount of text highlighted in the textbook affect exam scores?
H0: The amount of text highlighted in the textbook has no effect on exam scores.
Ex. Does daily meditation decrease the incidence of depression?
H0: Daily meditation does not decrease the incidence of depression.

ALTERNATE HYPOTHESIS
The alternative hypothesis (Ha) is the other answer to your research question. It
claims that there’s an effect on the population.
Often, your alternative hypothesis is the same as your research hypothesis. In
other words, it’s the claim that you expect or hope will be true.
The alternative hypothesis is the complement to the null hypothesis. Null and
alternative hypotheses are exhaustive, meaning that together they cover every
possible outcome. They are also mutually exclusive, meaning that only one can
be true at a time.

If you reject the null hypothesis, you can say that the alternative
hypothesis is supported. On the other hand, if you fail to reject the null
hypothesis, then you can say that the alternative hypothesis is not
supported. Never say that you’ve proven or disproven a hypothesis.
Alternative hypotheses often include phrases such as “an effect,” “a
difference,” or “a relationship.” When alternative hypotheses are written in
mathematical terms, they always include an inequality (usually ≠, but
sometimes < or >).
Ex. Does daily meditation decrease the incidence of depression?
Ha: Daily meditation decreases the incidence of depression.

SIMILARITIES AND DIFFERENCES BETWEEN
NULL AND ALTERNATIVE HYPOTHESES
• They’re both answers to the research question.
• They both make claims about the population.
• They’re both evaluated by statistical tests.

| | Null hypotheses | Alternative hypotheses |
| Definition | A claim that there is no effect in the population. | A claim that there is an effect in the population. |
| Symbol | H0 | H1 or Ha |
| Typical phrases | No effect, No difference, No relationship, No change, Does not increase, Does not decrease | An effect, A difference, A relationship, A change, Increases, Decreases |
| Mathematical form | Equality symbol (=, ≥, or ≤) | Inequality symbol (≠, <, or >) |

Does the independent variable affect the dependent variable?
• Null hypothesis (H0): Independent variable does not affect dependent variable.
• Alternative hypothesis (Ha): Independent variable affects dependent variable.
| Statistical test | Null hypothesis | Alternative hypothesis |
| One-way ANOVA with two groups | The mean dependent variable does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2. | The mean dependent variable differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2. |
| One-way ANOVA with three groups | The mean dependent variable does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3. | The mean dependent variables of group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population. |

ERROR
In statistics, a Type I error is a false positive conclusion, while a Type II error is a
false negative conclusion.
Making a statistical decision always involves uncertainties, so the risks of making these
errors are unavoidable in hypothesis testing.
The probability of making a Type I error is the significance level, or alpha (α), while the
probability of making a Type II error is beta (β). These risks can be minimized through
careful planning in your study design.

Example: Type I vs Type II error
You decide to get tested for COVID-19 based on mild symptoms. There are
two errors that could potentially occur:
• Type I error (false positive): the test result says you have coronavirus,
but you actually don’t.
• Type II error (false negative): the test result says you don’t have
coronavirus, but you actually do.
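
The claim that the probability of a Type I error equals α can be illustrated with a small simulation. This sketch assumes a two-sided one-sample z test with a known standard deviation (chosen because it needs only the standard library, not because the notes prescribe it), and all numbers (mean 100, standard deviation 15, samples of 30) are made up. Every simulated sample is drawn with the null hypothesis true, so each rejection is a false positive.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean equals mu0, with sigma known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, trials=2000, n=30, seed=1):
    """Fraction of studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data generated under the null hypothesis: true mean really is 100
        sample = [rng.gauss(100.0, 15.0) for _ in range(n)]
        if z_test_p_value(sample, 100.0, 15.0) <= alpha:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)   # close to alpha = 0.05
```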

Parametric test: t-test (Sample, Pooled or Unpaired and Paired), ANOVA (One way and
Two way), Least Significance difference

T-TEST
A t test is a statistical test that is used to compare the means of two groups.
It is often used in hypothesis testing to determine whether a process or
treatment actually has an effect on the population of interest, or whether
two groups are different from one another.
A t test can only be used when comparing the means of two groups (a.k.a.
pairwise comparison). If you want to compare more than two groups, or if
you want to do multiple pairwise comparisons, use an ANOVA test or a
post-hoc test.

The t test is a parametric test of difference, meaning that it makes the same
assumptions about your data as other parametric tests. The t test assumes your data:
1. are independent
2. are (approximately) normally distributed
3. have a similar amount of variance within each group being compared (a.k.a.
homogeneity of variance)
If your data do not fit these assumptions, you can try a nonparametric alternative to
the t test, such as the Wilcoxon Signed-Rank test for paired data or the Mann-Whitney
U test for independent groups. For data with unequal variances, Welch's t test is an
option.

When choosing a t test, you need to consider two things: whether the groups being
compared come from a single population or two different populations, and whether you
want to test the difference in a specific direction.
One-sample, two-sample, or paired t test?
• If the groups come from a single population (e.g., measuring before and after an
experimental treatment), perform a paired t test. This is a within-subjects design.
• If the groups come from two different populations (e.g., two different species, or
people from two separate cities), perform a two-sample t test (a.k.a. independent
t test). This is a between-subjects design.
• If there is one group being compared against a standard value (e.g., comparing the
acidity of a liquid to a neutral pH of 7), perform a one-sample t test.
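
The one-sample and paired cases can be sketched with the standard textbook statistic t = (mean - reference value) / (standard deviation / √n), which these notes do not write out. A paired t test is treated here as a one-sample test on the within-subject differences. The pH readings and the before/after values are invented for illustration, and the function names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t value and degrees of freedom for comparing a sample mean with mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

def paired_t(before, after):
    """Paired t test: a one-sample t test on the differences (after - before)."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0.0)

# Hypothetical pH readings compared against a neutral pH of 7
t_ph, df_ph = one_sample_t([7.2, 6.9, 7.1, 7.0, 7.3], 7.0)

# Hypothetical blood pressure of four patients before and after treatment
t_bp, df_bp = paired_t([140, 150, 138, 145], [135, 147, 136, 140])

print(round(t_ph, 3), df_ph)
print(round(t_bp, 3), df_bp)
```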

One-tailed or two-tailed t test?
• If you only care whether the two populations are different from one
another, perform a two-tailed t test.
• If you want to know whether one population mean is greater than or
less than the other, perform a one-tailed t test.

Performing a t-test
The t test estimates the true difference between two group means using the ratio of the difference in group means over the
pooled standard error of both groups. You can calculate it manually using a formula, or use statistical analysis software.
The formula for the two-sample t test (a.k.a. the Student’s t-test) is:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
In this formula, t is the t value, x̄1 and x̄2 are the means of the two groups being compared, s² is
the pooled variance of the two groups (the whole denominator is the pooled standard error), and n1 and n2 are the number of observations in each of the groups.
A larger t value shows that the difference between group means is greater than the pooled standard error, indicating a more
significant difference between the groups.
You can compare your calculated t value against the values in a critical value chart (e.g., Student’s t table) to determine
whether your t value is greater than what would be expected by chance. If so, you can reject the null hypothesis and
conclude that the two groups are in fact different.
Most statistical software (R, SPSS, etc.) includes a t test function.
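
For a manual calculation, the pooled two-sample t value can be sketched in Python as follows. The pooled variance is taken as the usual weighted average of the two sample variances, s² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2), which is an assumption about the intended definition since the notes do not spell it out. The two groups of measurements are made-up numbers, and `pooled_t` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Student's two-sample t value and degrees of freedom (equal variances assumed)."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: sample variances weighted by their degrees of freedom
    s2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    pooled_standard_error = sqrt(s2 * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / pooled_standard_error
    return t, n1 + n2 - 2

# Hypothetical measurements from two independent groups
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [4.6, 4.7, 4.5, 4.8, 4.4]

t_value, df = pooled_t(group_a, group_b)
print(round(t_value, 3), df)
```

With 8 degrees of freedom, the two-tailed critical value at α = 0.05 in a standard Student's t table is about 2.306, so a t value larger than that in absolute size would lead to rejecting the null hypothesis.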
