Introduction to Statistics and
Probability
STATISTICS
• It is the science of collecting, organizing, analyzing and interpreting
data.
• There are two types of Statistics:
Inferential Statistics: It is about using sample data from a dataset to make inferences and draw conclusions using probability theory.
Descriptive Statistics: It is used to summarize and represent the
data in an accurate way using charts, tables and graphs.
For example, you might stand in a mall and ask a sample of 100
people if they like shopping at Sears. You could make a bar chart
of the yes or no answers (that would be descriptive statistics), or you
could use your research (and inferential statistics) to reason that
around 75%-80% of the whole population likes shopping at Sears.
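As a rough sketch of the distinction, assuming 78 of the 100 sampled shoppers answered yes (a made-up count): summarizing the sample itself is descriptive, while estimating a range for the whole population is inferential.

```python
# Hypothetical sample: 78 of 100 shoppers say "yes" to liking Sears.
yes, n = 78, 100

# Descriptive statistics: summarize the sample itself
sample_proportion = yes / n          # 0.78

# Inferential statistics: a rough 95% confidence interval for the
# population proportion (normal approximation)
se = (sample_proportion * (1 - sample_proportion) / n) ** 0.5
low, high = sample_proportion - 1.96 * se, sample_proportion + 1.96 * se
print(round(low, 3), round(high, 3))   # roughly 0.699 to 0.861
```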
DESCRIPTIVE STATISTICS
The following measures are used to represent the data set:
• Measure of Position
• Measure of Spread
• Measure of Shape
MEASURE OF POSITION
• Also known as measure of Central Tendency.
• A measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within that
set of data.
• There are three measures of central tendencies: Mean, Median and
Mode.
Median: It is the point that divides the data into two equal halves, while
being less susceptible to outliers compared to the mean.
For ungrouped data: the middle data point of an ordered data set.
For grouped data:
Median = L + ((n/2 - cf) / f) × w
Where,
• L = lower limit of the median class
• n = number of observations
• cf = cumulative frequency of the class preceding the median class
• f = frequency of the median class
• w = class size
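As a sketch, the grouped-data median formula can be computed directly; the class boundaries and frequencies below are invented for illustration.

```python
# Grouped median: Median = L + ((n/2 - cf) / f) * w

def grouped_median(classes):
    """classes: list of (lower_limit, frequency) with equal class width."""
    w = classes[1][0] - classes[0][0]          # class size
    n = sum(f for _, f in classes)             # number of observations
    cum = 0
    for lower, f in classes:
        if cum + f >= n / 2:                   # this is the median class
            L, cf = lower, cum
            return L + ((n / 2 - cf) / f) * w
        cum += f

# e.g. classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 4, 3
print(grouped_median([(0, 5), (10, 8), (20, 4), (30, 3)]))   # 16.25
```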
Mean: It is the point where the mass of the distribution of the data balances.
Mode: It refers to the data item that occurs most frequently in a
given data set.
Mode for ungrouped data: Most frequent observation in the data.
Mode for grouped data:
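A quick illustration of the three measures for ungrouped data, using Python's standard statistics module (the data set is invented):

```python
import statistics

data = [2, 3, 3, 5, 7, 7, 7, 9]   # hypothetical ungrouped data

print(statistics.mean(data))    # point where the mass balances: 5.375
print(statistics.median(data))  # middle of the ordered data: 6.0
print(statistics.mode(data))    # most frequent observation: 7
```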
Example for ungrouped data: (worked example shown as a figure)
Example for grouped data: (worked example shown as a figure)
MEASURE OF DISPERSION
• It refers to how the data deviates from the measure of position, i.e. it
gives an indication of the amount of variation in the process.
• Dispersion of the data set can be described by:
Range: It is the difference between highest and the lowest values.
Standard Deviation: It is a measure of the average distance
between each quantity and the mean, i.e. how the data is spread out from
the mean. The higher the standard deviation, the more the data is spread
out from the mean.
For a normal distribution, when the data is unimodal, z-scores are used to
calculate the probability of a score occurring within a standard
normal distribution, and they help to compare scores from different
samples.
For example, when calculating the probability of randomly obtaining a score
from the distribution:
• about 68% of values fall between -1 and +1 standard deviations from the mean;
• similarly, about 95% fall between -1.96 and +1.96 standard deviations.
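The z-score and the 68%/95% figures can be checked with a short sketch using the standard normal CDF (expressed via the error function):

```python
import math

def z_score(x, mu, sigma):
    # how many standard deviations x lies from the mean
    return (x - mu) / sigma

def std_normal_cdf(z):
    # P(Z <= z) for the standard normal distribution
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# probability of a score falling within +/-1 and +/-1.96 SD of the mean
within_1 = std_normal_cdf(1) - std_normal_cdf(-1)
within_196 = std_normal_cdf(1.96) - std_normal_cdf(-1.96)
print(round(within_1, 4), round(within_196, 4))   # ~0.6827 and ~0.95
```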
MEASURE OF SHAPE
• It is used to characterize the shape of the distribution of a data set.
• Two common statistics that measure the shape of the data are:
Skewness and Kurtosis
Skewness: It is the horizontal displacement of the normal
curve about the mean position. Skewness for a normal distribution
is zero.
The methods to measure Skewness are:
Karl Pearson’s coefficient of Skewness:
Sk = 3(Mean − Median) / σ
The value of this coefficient generally lies between -3 and +3.
Bowley’s coefficient of Skewness: It is based on quartile values.
Sk = (Q3 + Q1 − 2Q2) / (Q3 − Q1)
Where,
Q1 = First quartile
Q2 = Second quartile (the median)
Q3 = Third quartile
The value of this coefficient lies between -1 and +1.
Moment Coefficient of Skewness: It is defined as
g1 = m3 / m2^(3/2)
Where,
m3 = third moment about the mean
m2 = second moment about the mean (the variance)
Kurtosis: It is the vertical distortion of the normal curve without
disturbing its symmetry. The kurtosis for a standard
normal distribution is three.
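The moment coefficients above can be computed directly from raw data; a small sketch with an invented sample:

```python
def central_moment(data, k):
    # k-th moment of the data about its mean
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample
m2 = central_moment(data, 2)          # variance
m3 = central_moment(data, 3)
m4 = central_moment(data, 4)

skewness = m3 / m2 ** 1.5             # moment coefficient of skewness
kurtosis = m4 / m2 ** 2               # equals 3 for a normal distribution
print(skewness, kurtosis)
```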
CORRELATION ANALYSIS
It is a statistical technique that can show whether and how strongly
pairs of variables are related.
If the correlation coefficient (r) is
• positive, the two variables move in the same direction (directly related);
• zero, there is no linear relation between them;
• negative, the two variables move in opposite directions (inversely related).
Correlation: On the basis of number of variables
• Simple Correlation: It is when only two variables are analyzed.
For example, correlation between demand and supply.
• Partial Correlation: It is when three or more variables are
considered for analysis but only two influencing variables are
studied, while the rest are held constant. For example, the correlation
between demand, supply and income, where income is held constant.
• Multiple Correlation: It is when three or more variables are
analyzed simultaneously. For example, rainfall, production of rice
and price of rice are studied simultaneously.
COMPUTATION OF COEFFICIENT OF CORRELATION
There are two methods for computation:
Pearson’s Product Moment Method:
Assumes the data to be normally distributed.
r = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² · Σ(y − ȳ)² ]
Spearman’s Rank Correlation Method
This method does not assume a normal distribution.
For non-repeating ranks:
ρ = 1 − (6 ΣD²) / (n(n² − 1))
Where,
n = number of observations
D = difference between the two ranks of each observation
For repeating ranks, a correction factor of (t³ − t)/12 is added to ΣD²
for each group of tied ranks.
Where,
t = number of times a rank is repeated.
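A minimal sketch of Spearman's formula for non-repeating ranks (it assumes no tied values, so the tie correction is not needed; the data are invented):

```python
def spearman_rho(x, y):
    # assumes no repeated values, so ranks are unique
    def ranks(v):
        order = sorted(v)
        return [order.index(val) + 1 for val in v]
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# perfectly monotone increasing data -> rho = 1
print(spearman_rho([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))   # 1.0
```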
REGRESSION ANALYSIS
The statistical technique of estimating the unknown value of one
variable(i.e. dependent variable) from the known value of other
variable (i.e. independent variable) is called regression analysis.
The regression equation of X on Y is: X = a + bY (X is dependent, Y is independent)
The regression equation of Y on X is: Y = a + bX (Y is dependent, X is independent)
Dependent Variable: The single variable which we wish to
estimate/predict by the regression model.
Independent Variable: The known variable(s) used to predict/estimate
the value of the dependent variable.
Where the regression coefficient of y on x is: byx = r (σy / σx)
and the regression coefficient of x on y is: bxy = r (σx / σy)
Where,
r = coefficient of correlation between x and y
𝛔 = standard deviation
Regression Lines
The line which gives the best estimate of one variable for any
given value of the other variable.
• Y on X: Y − ȳ = byx (X − x̄)
• X on Y: X − x̄ = bxy (Y − ȳ)
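The two regression coefficients can be computed from data; a useful sanity check is that their product equals r². The sample values below are invented:

```python
import math

def regression_coefficients(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    byx = sxy / sxx           # slope of the Y-on-X regression line
    bxy = sxy / syy           # slope of the X-on-Y regression line
    r = sxy / math.sqrt(sxx * syy)
    return byx, bxy, r

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
byx, bxy, r = regression_coefficients(x, y)
print(byx * bxy, r ** 2)      # the product of the coefficients equals r^2
```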
PROBABILITY
Probability is a numerical description of how likely an event is
to occur or how likely it is that a proposition is true.
Some examples are:
Tossing a coin: When a coin is tossed, there are two
possible outcomes: Heads (H) or Tails (T). Thus, the
probability of the coin landing H is ½ and the
probability of the coin landing T is ½.
Rolling a die: When a single die is thrown, there are
six possible outcomes: 1, 2, 3, 4, 5, 6. The probability
of any one of them is 1/6.
TERMINOLOGY
Experiment: A process by which an outcome is obtained.
Sample space: The set S of all possible outcomes of an experiment,
e.g. the sample space for a die roll is {1, 2, 3, 4, 5, 6}.
Event: Any subset E of the sample space, e.g.
Let,
E1 = An even number is rolled.
E2 = A number less than three is rolled.
Outcome: Result of a single trial.
Equally likely outcomes: Two outcomes of a random experiment
are said to be equally likely, if upon performing the experiment a (very)
large number of times, the relative occurrences of the two outcomes
turn out to be equal.
Trial: Performing a random experiment.
EVENTS
Simple Events: If the event E has only a single element of the
sample space, it is called a simple event. E.g.: if S = {56, 78, 96,
54, 89} and E = {78}, then E is a simple event.
Compound Events: Any event consisting of more than one
element of the sample space. E.g.: if S = {56, 78, 96, 54, 89}, E1 =
{56, 54}, E2 = {78, 56, 89}, then E1 and E2 represent two
compound events.
Independent Events and Dependent Events:
If the occurrence of any event is completely unaffected by the
occurrence of any other event, such events are Independent
Events.
The probability of two independent events is given by: P(A ∩ B) = P(A) × P(B)
The events which are affected by other events are Dependent
Events.
The probability of dependent events is given by: P(A ∩ B) = P(A) × P(B | A)
Exhaustive Events: A set of events is called exhaustive if all the
events together cover the entire sample space. E.g.: if A and B are
exhaustive events, then A ∪ B = S.
Mutually Exclusive Events: If the occurrence of one event
excludes the occurrence of another event, i.e. no two events can
occur simultaneously: A ∩ B = ∅.
Where,
S = sample space
Addition Theorem
Theorem 1: If A and B are two mutually exclusive events, then
P(A ∪ B) = P(A) + P(B) = (n1 + n2) / n
Where,
n = Total number of exhaustive cases
n1 = Number of cases favorable to A.
n2 = Number of cases favorable to B.
Theorem 2: If A and B are two events that are not mutually exclusive,
then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Where,
P (A ∩ B) = Probability of events favorable to both A and B
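Both addition theorems can be verified by enumerating a die-roll sample space, using the events E1 (an even number) and E2 (a number less than three) from the terminology slide:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                 # sample space of a die roll
A = {2, 4, 6}                          # an even number is rolled
B = {1, 2}                             # a number less than three is rolled

def P(E):
    # classical probability: favorable cases over total cases
    return Fraction(len(E), len(S))

# Theorem 2 applies: A and B are not mutually exclusive (they share 2)
print(P(A | B), P(A) + P(B) - P(A & B))   # both 2/3
```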
Multiplication Theorem
If A and B are two independent events, then the probability that both
will occur is equal to the product of their individual probabilities.
Example:
The probability of appointing a lecturer who is B.Com, MBA, and PhD,
with individual probabilities 1/20, 1/25 and 1/40, is given by:
Using the multiplication theorem for independent events,
P = (1/20) × (1/25) × (1/40) = 1/20000
Conditional Probability
The conditional probability of an event B is the probability that the event
will occur given the knowledge that an event A has already occurred. It
is represented as P(B | A).
P(B | A) = P(A ∩ B) / P(A)
Where A and B are two dependent events.
Total Probability Theorem
Given k mutually exclusive events A1, A2, …, Ak such that their
probabilities sum to unity and their union is the event space E, i.e.
Ai ∩ Aj = ∅, for all i ≠ j
A1 ∪ A2 ∪ ... ∪ Ak = E
then the Total Probability Theorem, or Law of Total Probability, is:
P(B) = Σi P(B | Ai) · P(Ai)
where B is an arbitrary event, and P(B | Ai) is the conditional probability
of B assuming Ai has already occurred.
Proof of Total Probability Theorem:
We know,
A1 ∪ A2 ∪ A3 ∪ … ∪ Ak = E (the total event space). Then, for any event B,
we have
B = B ∩ E
B = B ∩ (A1 ∪ A2 ∪ A3 ∪ … ∪ Ak)
As intersection is distributive over union,
B = (B ∩ A1) ∪ (B ∩ A2) ∪ … ∪ (B ∩ Ak)
Since all these partitions are disjoint, we have
P(B) = P(B ∩ A1) + P(B ∩ A2) + … + P(B ∩ Ak)
This is the addition theorem of probabilities for a union of disjoint events.
Using conditional probability,
P(B | A) = P(B ∩ A) / P(A)
where P(B | A) gives the probability of occurrence of event B when event A
has already occurred. Hence,
P(B ∩ Ai) = P(B | Ai) · P(Ai) ; i = 1, 2, 3, …, k
Applying this above:
P(B) = Σi P(B | Ai) · P(Ai)
This is the Law of Total Probability.
It is used to evaluate the denominator in Bayes’ Theorem.
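A numeric sketch of the law; the machine shares and defect rates below are invented for illustration:

```python
# Hypothetical setup: three machines produce 50%, 30%, 20% of output,
# with defect rates 1%, 2%, 3% respectively.
P_A = [0.5, 0.3, 0.2]            # P(Ai), a partition of the sample space
P_B_given_A = [0.01, 0.02, 0.03]  # P(B | Ai), B = "item is defective"

# Law of Total Probability: P(B) = sum of P(B|Ai) * P(Ai)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
print(P_B)    # 0.005 + 0.006 + 0.006 = 0.017
```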
BAYES’ THEOREM
It is a mathematical formula for determining conditional probability:
P(A | B) = P(B | A) · P(A) / P(B)
In the above formula, the posterior probability P(A | B) is equal to the
conditional probability of event B given A multiplied by the prior
probability of A, all divided by the prior probability of B.
Science itself is a special case of Bayes’
theorem, because we are revising a prior
probability (hypothesis) in the light of
observation or experience that confirms our
hypothesis (experimental evidence) to develop
a posterior probability (conclusion).
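A sketch applying Bayes' theorem; the disease prevalence and test accuracies are made-up numbers:

```python
# Hypothetical diagnostic test:
# P(Disease) = 0.01, P(Positive | Disease) = 0.95,
# P(Positive | No Disease) = 0.05
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

# denominator via the law of total probability
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# posterior = likelihood * prior / evidence
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 4))
```

Even with an accurate test, the posterior stays modest because the prior is small; this is exactly the prior-revision idea described above.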
Example of Bayes’ Theorem: (worked example shown as a figure)
Probability Distribution
BINOMIAL DISTRIBUTION
OF PROBABILITY
A binomial distribution is the probability of a SUCCESS or FAILURE
outcome in an experiment or survey that is repeated multiple times.
Criteria for binomial distribution:
• The number of observations or trials is fixed
• Each observation or trial is independent.
• The probability of success (tails, heads, fail or pass) is exactly the
same from one trial to another.
If p is the probability of success and q = 1 − p the probability of failure,
the probability of exactly x successes in n trials is:
P(X = x) = nCx · p^x · q^(n − x)
Example:
Q. A coin is tossed 10 times. What is the probability of getting exactly 6
heads?
The number of trials (n) is 10
x = 6
The probability of success (p) (tossing heads) is 0.5
The probability of failure (q) = 1 − p = 0.5
P(x = 6) = 10C6 × 0.5^6 × 0.5^4
= 210 × 0.015625 × 0.0625
= 0.205078125
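The worked example can be reproduced with the standard library's math.comb:

```python
from math import comb

def binomial_pmf(n, x, p):
    # P(X = x) = nCx * p^x * (1-p)^(n-x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# probability of exactly 6 heads in 10 fair coin tosses
print(binomial_pmf(10, 6, 0.5))   # 0.205078125
```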
POISSON DISTRIBUTION
OF PROBABILITY
The Poisson distribution is the discrete probability distribution of the
number of events occurring in a given time period, given the average
number of times the event occurs over that time period.
When the number of trials in a binomial distribution is very large and
the probability of success is very small, then np ≈ npq (as q ≈ 1), so
it is possible to approximate the binomial distribution by a Poisson
distribution:
P(X = x) = (e^(−λ) · λ^x) / x!
Where,
x = 0, 1, 2, 3, …
λ = mean number of occurrences in the interval
e = Euler’s number ≈ 2.71828
Example:
Q. Twenty sheets of aluminum alloy were examined for surface flaws. The
frequency of the number of sheets with a given number of flaws per sheet was
as follows. What is the probability of finding a sheet chosen at random which
contains 3 or more surface flaws?
Flaws per sheet:   0  1  2  3  4  5  6
Number of sheets:  4  3  5  2  4  1  1
The total number of flaws = (0×4)+(1×3)+(2×5)+(3×2)+(4×4)+(5×1)+(6×1) = 46
So the average for 20 sheets (λ) = 46/20 = 2.3
Probability = P(X ≥ 3)
= 1 − (P(X=0) + P(X=1) + P(X=2))
Using the Poisson distribution formula,
= 0.40396
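The flaw-count example can be reproduced numerically:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.3   # average number of flaws per sheet
p_less_than_3 = sum(poisson_pmf(x, lam) for x in range(3))
print(round(1 - p_less_than_3, 5))   # ~0.40396
```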
Continuous Distribution
A probability distribution in which the
random variable X can take on any value
(is continuous) i.e. the probability of X
taking on any one specific value is zero.
Normal Distribution: A continuous random variable x is said to
follow a normal distribution if its probability density function is defined
as follows:
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
Where μ = mean and σ = standard deviation.
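A sketch evaluating the density function; at the mean of a standard normal the density is 1/√(2π) ≈ 0.3989:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    # normal probability density function
    return (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(round(normal_pdf(0, 0, 1), 4))   # 0.3989
```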
Chi- Squared Test:
The Chi-Square statistic is commonly used for testing relationships
between categorical variables.
The null hypothesis of the Chi-Square test is that no relationship
exists between the categorical variables in the population; they are
independent.
The calculation of the Chi-Square statistic is quite straightforward
and intuitive:
χ² = Σ (fo − fe)² / fe
Where,
fo = the observed frequency,
fe = the expected frequency if NO relationship existed
between the variables,
χ² = the Chi-Square statistic.
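A sketch of the statistic for a single row of observed vs. expected counts (the counts are invented):

```python
def chi_square_statistic(observed, expected):
    # sum of (fo - fe)^2 / fe over all cells
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# hypothetical table of observed vs expected counts
observed = [10, 20, 30, 40]
expected = [25, 25, 25, 25]
print(chi_square_statistic(observed, expected))   # 20.0
```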