Random variables and probability distributions
Random Variable
The outcome of an experiment need not be a number, for
example, the outcome when a
coin is tossed can be 'heads' or 'tails'. However, we often want
to represent outcomes
as numbers. A random variable is a function that associates a
unique numerical value
with every outcome of an experiment. The value of the random
variable will vary from
trial to trial as the experiment is repeated.
There are two types of random variable - discrete and
continuous.
A random variable has either an associated probability
distribution (discrete random
variable) or probability density function (continuous random
variable).
Examples
1. A coin is tossed ten times. The random variable X is the
number of tails that are
noted. X can only take the values 0, 1, ..., 10, so X is a discrete
random variable.
2. A light bulb is burned until it burns out. The random variable
Y is its lifetime in
hours. Y can take any positive real value, so Y is a continuous
random variable.
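The first example above can be sketched in code to make the "function from outcomes to numbers" idea concrete. This is a minimal illustration, assuming a fair coin (the text does not say the coin is fair); the names `number_of_tails`, `sample_space` and `value_counts` are made up for this sketch.

```python
from itertools import product
from collections import Counter

def number_of_tails(outcome):
    """The random variable X: maps one outcome (a tuple of 'H'/'T') to a number."""
    return outcome.count("T")

# All 2**10 outcomes of ten tosses; equally likely only under the fair-coin assumption.
sample_space = list(product("HT", repeat=10))

# How many outcomes are mapped to each value of X.
value_counts = Counter(number_of_tails(outcome) for outcome in sample_space)

# The distinct values X can take: the integers 0 to 10, so X is discrete.
print(sorted(value_counts))
```

Each outcome is non-numeric (a sequence of heads and tails), while X assigns exactly one number to it.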
Expected Value
The expected value (or population mean) of a random variable
indicates its average or
central value. It is a useful summary value (a number) of the
variable's distribution.
Stating the expected value gives a general impression of the
behaviour of some random
variable without giving full details of its probability
distribution (if it is discrete) or its
probability density function (if it is continuous).
Two random variables with the same expected value can have
very different
distributions. There are other useful descriptive measures which
describe the shape of the
distribution, for example the variance.
The expected value of a random variable X is symbolised by
E(X) or µ.
If X is a discrete random variable with possible values x1, x2,
x3, ..., xn, and p(xi)
denotes P(X = xi), then the expected value of X is defined by:
$$E(X) = \mu = \sum_{i=1}^{n} x_i \, p(x_i)$$
where the elements are summed over all values of the random
variable X.
If X is a continuous random variable with probability density
function f(x), then the
expected value of X is defined by:
$$E(X) = \mu = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
Example
Discrete case : When a die is thrown, each of the possible faces
1, 2, 3, 4, 5, 6 (the xi's)
has a probability of 1/6 (the p(xi)'s) of showing. The expected
value of the face showing
is therefore:
µ = E(X) = (1 x 1/6) + (2 x 1/6) + (3 x 1/6) + (4 x 1/6) + (5 x
1/6) + (6 x 1/6) = 3.5
Notice that, in this case, E(X) is 3.5, which is not a possible
value of X.
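The die calculation can be reproduced with exact fractions. This is a small sketch of the discrete definition (sum of each value times its probability); the variable names are chosen for illustration only.

```python
from fractions import Fraction

# Possible faces of the die (the xi's) and their probabilities (the p(xi)'s).
faces = [1, 2, 3, 4, 5, 6]
p = {x: Fraction(1, 6) for x in faces}

# E(X) = sum over all values of x * p(x).
expected_value = sum(x * p[x] for x in faces)

print(expected_value)         # 7/2
print(float(expected_value))  # 3.5
```

Using `Fraction` avoids rounding, so the result is exactly 7/2 rather than a floating-point approximation.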
See also sample mean.
Variance
The (population) variance of a random variable is a non-
negative number which gives
an idea of how widely spread the values of the random variable
are likely to be; the
larger the variance, the more scattered the observations on
average.
Stating the variance gives an impression of how closely
concentrated round the
expected value the distribution is; it is a measure of the 'spread'
of a distribution about
its average value.
Variance is symbolised by V(X) or Var(X) or $\sigma^2$.
The variance of the random variable X is defined to be:
$$V(X) = E\left[(X - E(X))^2\right]$$
where E(X) is the expected value of the random variable X.
Notes
a. the larger the variance, the further that individual values of
the random variable
(observations) tend to be from the mean, on average;
b. the smaller the variance, the closer that individual values of
the random variable
(observations) tend to be to the mean, on average;
c. taking the square root of the variance gives the standard
deviation, i.e.: $\sigma = \sqrt{V(X)}$;
d. the variance and standard deviation of a random variable are
always non-
negative.
See also sample variance.
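As an illustration of the definition, the variance and standard deviation of the face shown by a fair die (the same die used in the expected value example) can be computed directly. This is a sketch with illustrative variable names, not part of the original glossary.

```python
import math
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)  # each face is equally likely

# E(X): the expected value.
mean = sum(x * prob for x in faces)

# V(X) = E[(X - E(X))^2]: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * prob for x in faces)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(variance)  # 35/12
```

Both quantities are non-negative, as note (d) states, because the variance is a weighted sum of squares.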
Probability Distribution
The probability distribution of a discrete random variable is a
list of probabilities
associated with each of its possible values. It is also sometimes
called the probability
function or the probability mass function.
More formally, the probability distribution of a discrete random
variable X is a function
which gives the probability p(xi) that the random variable
equals xi, for each value xi:
p(xi) = P(X=xi)
It satisfies the following conditions:
a. $0 \le p(x_i) \le 1$ for each value xi;
b. $\sum_i p(x_i) = 1$.
Cumulative Distribution Function
All random variables (discrete and continuous) have a
cumulative distribution function. It
is a function giving the probability that the random variable X
is less than or equal to x,
for every value x.
Formally, the cumulative distribution function F(x) is defined to
be:
$$F(x) = P(X \le x)$$
for $-\infty < x < \infty$.
For a discrete random variable, the cumulative distribution
function is found by summing
up the probabilities as in the example below.
For a continuous random variable, the cumulative distribution
function is the integral of
its probability density function.
Example
Discrete case : Suppose a random variable X has the following
probability distribution
p(xi):
xi      0      1      2      3      4      5
p(xi)   1/32   5/32   10/32  10/32  5/32   1/32
This is actually a binomial distribution: Bi(5, 0.5) or B(5, 0.5).
The cumulative distribution
function F(x) is then:
xi      0      1      2      3      4      5
F(xi)   1/32   6/32   16/32  26/32  31/32  32/32
F(x) does not change at intermediate values. For example:
F(1.3) = F(1) = 6/32
F(2.86) = F(2) = 16/32
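The summing-up procedure for the discrete case can be sketched as follows, using the probabilities from the example. The function name `F` follows the text; the helper names are illustrative. Note that `Fraction` reduces results to lowest terms, so 6/32 is displayed as 3/16.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Possible values xi and their probabilities p(xi) from the example.
values = [0, 1, 2, 3, 4, 5]
pmf = [Fraction(k, 32) for k in (1, 5, 10, 10, 5, 1)]

# Running totals of the probabilities give F at each possible value.
cdf_at_values = list(accumulate(pmf))

def F(x):
    """Cumulative distribution function P(X <= x) for any real x."""
    count = bisect_right(values, x)  # how many possible values are <= x
    return cdf_at_values[count - 1] if count > 0 else Fraction(0)

print(F(1.3))   # 3/16, i.e. 6/32
print(F(2.86))  # 1/2, i.e. 16/32
```

Because F only jumps at the possible values of X, any x between two of them returns the value of F at the lower one.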
Probability Density Function
The probability density function of a continuous random
variable is a function which can
be integrated to obtain the probability that the random variable
takes a value in a given
interval.
More formally, the probability density function, f(x), of a
continuous random variable X is
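the derivative of the cumulative distribution function F(x):
$$f(x) = \frac{d}{dx} F(x)$$
Since $F(x) = P(X \le x)$, it follows that:
$$\int_a^b f(x) \, dx = F(b) - F(a) = P(a < X < b)$$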
If f(x) is a probability density function then it must obey two
conditions:
a. that the total probability for all possible values of the
continuous random variable
X is 1: $\int_{-\infty}^{\infty} f(x) \, dx = 1$;
b. that the probability density function can never be negative:
$f(x) \ge 0$ for all x.
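A numerical check can make the relationship between density and probability concrete. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1] (zero elsewhere), whose cumulative distribution function is F(x) = x² on that interval; neither appears in the original text. The `integrate` helper is a simple midpoint rule written for this example.

```python
def f(x):
    """Hypothetical probability density: 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def F(x):
    """Cumulative distribution function belonging to f."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x

def integrate(func, a, b, steps=10_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(func(a + (k + 0.5) * width) for k in range(steps)) * width

total_probability = integrate(f, 0.0, 1.0)    # should be close to 1
prob_interval = integrate(f, 0.2, 0.5)        # P(0.2 < X < 0.5)

print(total_probability, prob_interval, F(0.5) - F(0.2))
```

The integral of f over an interval agrees with the difference of F at its endpoints, and the total area under f is 1.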
Discrete Random Variable
A discrete random variable is one which may take on only a
countable number of
distinct values such as 0, 1, 2, 3, 4, ... Discrete random
variables are usually (but not
necessarily) counts. If a random variable can take only a finite
number of distinct values,
then it must be discrete. Examples of discrete random variables
include the number of
children in a family, the Friday night attendance at a cinema,
the number of patients in a
doctor's surgery, the number of defective light bulbs in a box of
ten.
Compare continuous random variable.
Continuous Random Variable
A continuous random variable is one which can take any value in
an interval of real numbers, and so has an uncountably infinite
number of possible values.
Continuous random variables are usually measurements.
Examples include height,
weight, the amount of sugar in an orange, the time required to
run a mile.
Compare discrete random variable.
Independent Random Variables
Two random variables, X and Y say, are said to be independent
if and only if the value of
X has no influence on the value of Y and vice versa.
The cumulative distribution functions of two independent
random variables X and Y are
related by
F(x,y) = G(x).H(y)
where
G(x) and H(y) are the marginal distribution functions of X and
Y for all pairs (x,y).
Knowledge of the value of X does not affect the probability
distribution of Y and vice
versa. Thus there is no relationship between the values of
independent random
variables.
For continuous independent random variables, their probability
density functions are
related by
f(x,y) = g(x).h(y)
where
g(x) and h(y) are the marginal density functions of the random
variables X and Y
respectively, for all pairs (x,y).
For discrete independent random variables, their probabilities
are related by
P(X = xi ; Y = yj) = P(X = xi).P(Y=yj)
for each pair (xi,yj).
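The product rule for discrete random variables can be checked mechanically from a joint probability table. The sketch below uses two hypothetical examples that are not in the original text: two separate fair dice (independent) and a pair where Y is always equal to X (dependent). The function name `is_independent` is illustrative.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint, xs, ys):
    """Check P(X = x ; Y = y) == P(X = x) * P(Y = y) for every pair (x, y)."""
    # Marginal probabilities are obtained by summing the joint table.
    px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}
    return all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(xs, ys))

faces = range(1, 7)

# Two separate fair dice: every pair has probability 1/36.
two_dice = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Y is a copy of X: only the pairs (x, x) can occur.
same_die = {(x, x): Fraction(1, 6) for x in faces}

print(is_independent(two_dice, faces, faces))  # True
print(is_independent(same_die, faces, faces))  # False
```

In the second table, knowing X determines Y completely, so the joint probabilities cannot factor into the product of the marginals.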
Probability-Probability (P-P) Plot
A probability-probability (P-P) plot is used to see if a given set
of data follows some
specified distribution. It should be approximately linear if the
specified distribution is the
correct model.
The probability-probability (P-P) plot is constructed using the
theoretical cumulative
distribution function, F(x), of the specified model. The values
in the sample of data, in
order from smallest to largest, are denoted x(1), x(2), ..., x(n).
For i = 1, 2, ....., n, F(x(i)) is
plotted against (i-0.5)/n.
Compare quantile-quantile (Q-Q) plot.
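The construction of the P-P plot coordinates can be sketched as below. The sample values are made up for illustration, the specified model is assumed to be a standard Normal distribution, and `pp_plot_points` is a hypothetical helper name. `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

def pp_plot_points(sample, model):
    """Return (horizontal, vertical) pairs: ((i - 0.5)/n, F(x(i))) for i = 1..n."""
    ordered = sorted(sample)  # x(1) <= x(2) <= ... <= x(n)
    n = len(ordered)
    return [((i - 0.5) / n, model.cdf(x)) for i, x in enumerate(ordered, start=1)]

# Made-up data; the specified model here is N(0, 1).
sample = [-1.2, 0.3, 0.8, -0.4, 1.5]
points = pp_plot_points(sample, NormalDist(mu=0.0, sigma=1.0))

for empirical, theoretical in points:
    print(round(empirical, 2), round(theoretical, 3))
```

If the model fits, the printed pairs lie close to the line where both coordinates are equal.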
Quantile-Quantile (Q-Q) Plot
A quantile-quantile (Q-Q) plot is used to see if a given set of
data follows some specified
distribution. It should be approximately linear if the specified
distribution is the correct
model.
The quantile-quantile (Q-Q) plot is constructed using the
theoretical cumulative
distribution function, F(x), of the specified model. The values
in the sample of data, in
order from smallest to largest, are denoted x(1), x(2), ..., x(n).
For i = 1, 2, ..., n, x(i) is
plotted against $F^{-1}((i-0.5)/n)$.
Compare probability-probability (P-P) plot.
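The Q-Q plot coordinates can be computed in the same spirit, using the inverse of the model's cumulative distribution function. As before, the data are made up, the model is assumed to be standard Normal, and `qq_plot_points` is a hypothetical helper name.

```python
from statistics import NormalDist

def qq_plot_points(sample, model):
    """Return (horizontal, vertical) pairs: (F^-1((i - 0.5)/n), x(i)) for i = 1..n."""
    ordered = sorted(sample)  # x(1) <= x(2) <= ... <= x(n)
    n = len(ordered)
    return [(model.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]

# Made-up data; the specified model here is N(0, 1).
sample = [-1.2, 0.3, 0.8, -0.4, 1.5]
points = qq_plot_points(sample, NormalDist(mu=0.0, sigma=1.0))

for theoretical_quantile, observed in points:
    print(round(theoretical_quantile, 3), observed)
```

The horizontal coordinate is the quantile the model predicts for the i-th smallest observation; the vertical coordinate is the observation itself.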
Normal Distribution
Normal distributions model (some) continuous random
variables. Strictly, a Normal
random variable should be capable of assuming any value on the
real line, though this
requirement is often waived in practice. For example, height at
a given age for a given
gender in a given racial group is adequately described by a
Normal random variable
even though heights must be positive.
A continuous random variable X, taking all real values in the
range $(-\infty, \infty)$, is said to
follow a Normal distribution with parameters µ and $\sigma^2$ if it has
probability density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
We write $X \sim N(\mu, \sigma^2)$.
This probability density function (p.d.f.) is a symmetrical, bell-shaped curve, centred at
its expected value µ. The variance is $\sigma^2$.
Many distributions arising in practice can be approximated by a
Normal distribution.
Other random variables may be transformed to normality.
The simplest case of the normal distribution, known as the
Standard Normal
Distribution, has expected value zero and variance one. This is
written as N(0,1).
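The Normal probability density function can be written out directly from its standard formula and compared with the standard library. This is a sketch; `normal_pdf` is an illustrative name, and it takes the standard deviation (the square root of the variance) as its third argument.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a Normal distribution with mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2.0 * math.pi))

# Standard Normal distribution N(0, 1): the peak of the bell curve is at x = 0.
peak = normal_pdf(0.0, 0.0, 1.0)

# Symmetry about the expected value: equal density at mu - d and mu + d.
left = normal_pdf(10.0 - 3.0, 10.0, 2.0)
right = normal_pdf(10.0 + 3.0, 10.0, 2.0)

print(peak, left, right, NormalDist(10.0, 2.0).pdf(13.0))
```

The hand-written function and `NormalDist.pdf` agree, and the two symmetric points have the same density.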
Poisson Distribution
Poisson distributions model (some) discrete random variables.
Typically, a Poisson
random variable is a count of the number of events that occur in
a certain time interval
or spatial area. For example, the number of cars passing a fixed
point in a 5 minute
interval, or the number of calls received by a switchboard
during a given period of time.
A discrete random variable X is said to follow a Poisson
distribution with parameter m,
written X ~ Po(m), if it has probability distribution
$$P(X = x) = \frac{e^{-m} m^x}{x!}$$
where
x = 0, 1, 2, ...
m > 0.
The following requirements must be met:
a. the length of the observation period is fixed in advance;
b. the events occur at a constant average rate;
c. the number of events occurring in disjoint intervals are
statistically independent.
The Poisson distribution has expected value E(X) = m and
variance V(X) = m; i.e. E(X)
= V(X) = m.
The Poisson distribution can sometimes be used to approximate
the Binomial
distribution with parameters n and p. When the number of
observations n is large, and
the success probability p is small, the Bi(n,p) distribution
approaches the Poisson
distribution with the parameter given by m = np. This is useful
since the computations
involved in calculating binomial probabilities are greatly
reduced.
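The approximation can be checked numerically. The sketch below uses the standard Poisson and Binomial probability formulas and an arbitrarily chosen n = 1000 and p = 0.003 (so m = np = 3); these numbers and the function names are assumptions made for illustration.

```python
import math

def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m)."""
    return math.exp(-m) * m ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bi(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Large n and small p, chosen for illustration.
n, p = 1000, 0.003
m = n * p

# Largest difference between the two distributions over x = 0, ..., 20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(21))

# Numerical check that E(X) = m (the tail beyond x = 59 is negligible for m = 3).
poisson_mean = sum(x * poisson_pmf(x, m) for x in range(60))

print(max_gap, poisson_mean)
```

With these values the two sets of probabilities differ only slightly, while the Poisson formula avoids the large binomial coefficients.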
Binomial Distribution
Binomial distributions model (some) discrete random variables.
Typically, a binomial random variable is the number of
successes in a series of trials, for
example, the number of 'heads' occurring when a coin is tossed
50 times.
A discrete random variable X is said to follow a Binomial
distribution with parameters n
and p, written X ~ Bi(n,p) or X ~ B(n,p), if it has probability
distribution
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$$
where
x = 0, 1, 2, ..., n
n = 1, 2, 3, ...
p = success probability; 0 < p < 1
The trials must meet the following requirements:
a. the total number of trials is fixed in advance;
b. there are just two outcomes of each trial; success and failure;
c. the outcomes of all the trials are statistically independent;
d. all the trials have the same probability of success.
The Binomial distribution has expected value E(X) = np and
variance V(X) = np(1-p).
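The Bi(5, 0.5) distribution tabulated in the Cumulative Distribution Function example can be regenerated from the standard Binomial formula, and its mean and variance compared with np and np(1-p). This is a sketch; the function and variable names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bi(n, p): choose which x of the n trials succeed."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 2)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the definition, for comparison with the formulas.
mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

print(pmf)             # 1/32, 5/32, 10/32, 10/32, 5/32, 1/32 in lowest terms
print(mean, variance)  # 5/2 5/4
```

The computed mean 5/2 equals np and the computed variance 5/4 equals np(1-p).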
Geometric Distribution
Geometric distributions model (some) discrete random
variables. Typically, a Geometric
random variable is the number of trials required to obtain the
first failure, for example,
the number of tosses of a coin until the first 'tail' is obtained,
or a process where
components from a production line are tested, in turn, until the
first defective item is
found.
A discrete random variable X is said to follow a Geometric
distribution with parameter p,
written X ~ Ge(p), if it has probability distribution
$$P(X = x) = p^{x-1}(1-p)$$
where
x = 1, 2, 3, ...
p = success probability; 0 < p < 1
The trials must meet the following requirements:
a. the total number of trials is potentially infinite;
b. there are just two outcomes of each trial; success and failure;
c. the outcomes of all the trials are statistically independent;
d. all the trials have the same probability of success.
The Geometric distribution has expected value E(X) = 1/(1-p)
and variance $V(X) = p/(1-p)^2$.
The Geometric distribution is related to the Binomial
distribution in that both are based
on independent trials in which the probability of success is
constant and equal to p.
However, a Geometric random variable is the number of trials
until the first failure,
whereas a Binomial random variable is the number of successes
in n trials.
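The stated expected value and variance can be checked numerically. The sketch below assumes the probability distribution P(X = x) = p^(x-1)(1-p), that is, x-1 successes followed by the first failure, which is the form consistent with E(X) = 1/(1-p). The value p = 0.8 and the truncation at 2000 trials are arbitrary choices for illustration.

```python
def geometric_pmf(x, p):
    """P(first failure occurs on trial x): x - 1 successes, then one failure."""
    return p ** (x - 1) * (1 - p)

p = 0.8                # success probability, chosen for illustration
xs = range(1, 2001)    # truncated support; the omitted tail is negligible here

total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
variance = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)

print(total, mean, variance)
print(1 / (1 - p), p / (1 - p) ** 2)  # the formulas for E(X) and V(X)
```

The truncated sums agree closely with 1/(1-p) = 5 and p/(1-p)² = 20 for this choice of p.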
Uniform Distribution
Uniform distributions model (some) continuous random
variables and (some) discrete
random variables. The values of a uniform random variable are
uniformly distributed
over an interval. For example, if buses arrive at a given bus stop
every 15 minutes, and
you arrive at the bus stop at a random time, the time you wait
for the next bus to arrive
could be described by a uniform distribution over the interval
from 0 to 15.
A discrete random variable X is said to follow a Uniform
distribution with integer parameters a
and b (a ≤ b), written X ~ Un(a,b), if it has probability distribution
P(X=x) = 1/(b-a+1)
where
x = a, a+1, a+2, ..., b.
A discrete uniform distribution has equal probability at each of
its n = b-a+1 values.
A continuous random variable X is said to follow a Uniform
distribution with parameters
a and b, written X ~ Un(a,b), if its probability density function
is constant within a finite
interval [a,b], and zero outside this interval (with a less than or
equal to b).
The continuous Uniform distribution has expected value E(X) = (a+b)/2 and
variance $V(X) = (b-a)^2/12$.
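Applied to the bus stop example (a waiting time uniform over the interval from 0 to 15 minutes), the formulas for the continuous case give the following. The helper names are illustrative, and the "wait at most 5 minutes" question is added here only as an example.

```python
def uniform_mean(a, b):
    """Expected value of a continuous Uniform distribution on [a, b]."""
    return (a + b) / 2

def uniform_variance(a, b):
    """Variance of a continuous Uniform distribution on [a, b]."""
    return (b - a) ** 2 / 12

def uniform_cdf(x, a, b):
    """P(X <= x): the constant density 1/(b - a) integrated from a up to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0, 15  # waiting time in minutes
print(uniform_mean(a, b))       # 7.5
print(uniform_variance(a, b))   # 18.75
print(uniform_cdf(5, a, b))     # probability of waiting at most 5 minutes
```

On average the wait is 7.5 minutes, and one wait in three lasts 5 minutes or less.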
Central Limit Theorem
The Central Limit Theorem states that whenever a random
sample of size n is taken
from any distribution with mean µ and finite variance $\sigma^2$, then the
sample mean $\bar{x}$ will be
approximately normally distributed with mean µ and variance
$\sigma^2/n$. The larger the value
of the sample size n, the better the approximation to the normal.
This is very useful when it comes to inference. For example, it
allows us (if the sample
size is fairly large) to use hypothesis tests which assume
normality even if our data
appear non-normal. This is because the tests use the sample
mean $\bar{x}$, which the
Central Limit Theorem tells us will be approximately normally
distributed.
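A small simulation can illustrate the theorem. The sketch below assumes an exponential distribution with rate 1 (mean 1, variance 1) as the non-normal population, a sample size of n = 50, and 4000 repeated samples; all of these choices, and the fixed seed, are made only for illustration.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 50                   # size of each random sample
repetitions = 4000       # number of samples drawn

# Each entry is the mean of one sample of size n from a skewed population.
sample_means = [
    fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = fmean(sample_means)          # should be near the population mean, 1
variance_of_means = pvariance(sample_means)  # should be near variance / n = 1/50

print(mean_of_means, variance_of_means)
```

Although the individual observations are strongly skewed, the sample means cluster around the population mean with a spread close to the population variance divided by n.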
Mid-Term Exam
1. The following data set shows the number of hours of sick
leave that some of the
employees of Bastien’s, Inc. have taken during the first quarter
of the year. (20 points)
19 22 27 24 28 12
23 47 11 55 25 42
36 25 34 16 45 49
12 20 28 29 21 10
59 39 48 32 40 31
a. Develop a frequency distribution for the above data (Let the
width of your class be 10
units and start your first class as 10-19)
b. Develop a relative frequency distribution and percent
frequency distribution for the
data
c. Develop a cumulative frequency distribution
d. How many employees have taken less than 40 hours of sick
leave?
2. A researcher has obtained the number of hours worked per
week during the summer for a
sample of fifteen students. You must use the
manual procedure
explained in chapter 3 of your textbook and show your
mathematical steps
in detail. Your answer will be marked down by 50% if you
include only the final
answer. (20 points)
40 25 35 30 20 40 30 20 40 10 30 20 10 5 20
a. Calculate mean, range, and standard deviation.
b. Calculate the 40th percentile and the median.
3. Assume you have applied for two jobs A and B. The
probability that you get an offer for
job A is 0.23. The probability of being offered job B is 0.19.
The probability of getting at
least one of the jobs is 0.38. You should also show
mathematical steps in detail. Your
answer will be marked down by 50% if you just include the
final answer. (20 points)
a. What is the probability that you will be offered both jobs?
b. Are events A and B mutually exclusive? Why or why not?
Explain.
4. The average starting salary of this year’s graduates of a large
university (LU) is $10,000
with a standard deviation of $5,000. Furthermore, it is known
the starting salaries are
normally distributed. You should also show mathematical steps
in detail. Your
answer will be marked down by 50% if you just include the
final answer. (30 points)
a. What is the probability that a randomly selected LU graduate
will have a starting
salary of at least $20,500?
b. The sample size for a large university (LU) is 150,000 this
year, and individuals with
starting salaries of less than $15,600 receive a low income tax
break. How many of
the LU graduates will receive a low income tax break?


  • 1. Random variables and probability distributions Random Variable The outcome of an experiment need not be a number, for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. A random variable is a function that associates a unique numerical value with every outcome of an experiment. The value of the random variable will vary from trial to trial as the experiment is repeated. There are two types of random variable - discrete and continuous. A random variable has either an associated probability distribution (discrete random variable) or probability density function (continuous random variable). Examples 1. A coin is tossed ten times. The random variable X is the number of tails that are noted. X can only take the values 0, 1, ..., 10, so X is a discrete random variable. 2. A light bulb is burned until it burns out. The random variable Y is its lifetime in
  • 2. hours. Y can take any positive real value, so Y is a continuous random variable. Expected Value The expected value (or population mean) of a random variable indicates its average or central value. It is a useful summary value (a number) of the variable's distribution. Stating the expected value gives a general impression of the behaviour of some random variable without giving full details of its probability distribution (if it is discrete) or its probability density function (if it is continuous). Two random variables with the same expected value can have very different distributions. There are other useful descriptive measures which affect the shape of the distribution, for example variance. The expected value of a random variable X is symbolised by E(X) or µ. If X is a discrete random variable with possible values x1, x2, x3, ..., xn, and p(xi) denotes P(X = xi), then the expected value of X is defined by: where the elements are summed over all values of the random variable X. http://www.stats.gla.ac.uk/steps/glossary/probability_distributio ns.html#discvar
  • 3. http://www.stats.gla.ac.uk/steps/glossary/probability_distributio ns.html#contvar http://www.stats.gla.ac.uk/steps/glossary/probability_distributio ns.html#variance If X is a continuous random variable with probability density function f(x), then the expected value of X is defined by: Example Discrete case : When a die is thrown, each of the possible faces 1, 2, 3, 4, 5, 6 (the xi's) has a probability of 1/6 (the p(xi)'s) of showing. The expected value of the face showing is therefore: µ = E(X) = (1 x 1/6) + (2 x 1/6) + (3 x 1/6) + (4 x 1/6) + (5 x 1/6) + (6 x 1/6) = 3.5 Notice that, in this case, E(X) is 3.5, which is not a possible value of X. See also sample mean. Variance The (population) variance of a random variable is a non- negative number which gives an idea of how widely spread the values of the random variable are likely to be; the larger the variance, the more scattered the observations on average. Stating the variance gives an impression of how closely
  • 4. concentrated round the expected value the distribution is; it is a measure of the 'spread' of a distribution about its average value. Variance is symbolised by V(X) or Var(X) or The variance of the random variable X is defined to be: where E(X) is the expected value of the random variable X. Notes a. the larger the variance, the further that individual values of the random variable (observations) tend to be from the mean, on average; b. the smaller the variance, the closer that individual values of the random variable (observations) tend to be to the mean, on average; c. taking the square root of the variance gives the standard deviation, i.e.: d. the variance and standard deviation of a random variable are always non- negative. See also sample variance. http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html# sampmean http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html# sampvar
  • 5. Probability Distribution
The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
More formally, the probability distribution of a discrete random variable X is a function which gives the probability p(xi) that the random variable equals xi, for each value xi:
p(xi) = P(X = xi)
It satisfies the following conditions:
a. $0 \le p(x_i) \le 1$
b. $\sum_i p(x_i) = 1$
Cumulative Distribution Function
All random variables (discrete and continuous) have a cumulative distribution function. It is a function giving the probability that the random variable X is less than or equal to x, for every value x.
Formally, the cumulative distribution function F(x) is defined to be: $F(x) = P(X \le x)$
  • 6. for $-\infty < x < \infty$.
For a discrete random variable, the cumulative distribution function is found by summing up the probabilities as in the example below.
For a continuous random variable, the cumulative distribution function is the integral of its probability density function.
Example
Discrete case: Suppose a random variable X has the following probability distribution p(xi):
xi      0     1     2      3      4     5
p(xi)   1/32  5/32  10/32  10/32  5/32  1/32
This is actually a binomial distribution: Bi(5, 0.5) or B(5, 0.5).
The cumulative distribution function F(x) is then:
xi      0     1     2      3      4      5
  • 7. F(xi)   1/32  6/32  16/32  26/32  31/32  32/32
F(x) does not change at intermediate values. For example:
F(1.3) = F(1) = 6/32
F(2.86) = F(2) = 16/32
Probability Density Function
The probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval.
More formally, the probability density function, f(x), of a continuous random variable X is the derivative of the cumulative distribution function F(x):
$f(x) = \frac{d}{dx} F(x)$
Since $F(x) = P(X \le x)$ it follows that:
$\int_a^b f(x) \, dx = F(b) - F(a) = P(a < X < b)$
If f(x) is a probability density function then it must obey two conditions:
a. that the total probability for all possible values of the continuous random variable X is 1: $\int_{-\infty}^{\infty} f(x) \, dx = 1$
  • 8. b. that the probability density function can never be negative: $f(x) \ge 0$ for all x.
Discrete Random Variable
A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, the number of defective light bulbs in a box of ten.
Compare continuous random variable.
Continuous Random Variable
A continuous random variable is one which can take any value in a continuum, such as an interval of real numbers, and so has an uncountably infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, the time required to run a mile.
Compare discrete random variable.
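The discrete cumulative distribution example given earlier for Bi(5, 0.5), including the fact that F(x) stays constant between possible values, can be reproduced with a short step-function sketch. The function name `cdf` and the use of a binary search are illustrative choices, not part of the original glossary.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Probability distribution of the example: Bi(5, 0.5).
values = [0, 1, 2, 3, 4, 5]
probs = [Fraction(k, 32) for k in (1, 5, 10, 10, 5, 1)]

# Running totals of the probabilities: F(0), F(1), ..., F(5).
cumulative = list(accumulate(probs))

def cdf(x):
    """F(x) = P(X <= x): sum the probabilities of all values not exceeding x."""
    k = bisect_right(values, x)  # how many possible values are <= x
    return cumulative[k - 1] if k > 0 else Fraction(0)

for x in (-1, 0, 1, 1.3, 2, 2.86, 5, 7):
    print(x, cdf(x))
```

Fractions are printed in lowest terms, so F(1.3) appears as 3/16, which equals 6/32.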
  • 9. Independent Random Variables
Two random variables, X and Y say, are said to be independent if and only if the value of X has no influence on the value of Y and vice versa.
The cumulative distribution functions of two independent random variables X and Y are related by
F(x,y) = G(x)·H(y)
where G(x) and H(y) are the marginal distribution functions of X and Y, for all pairs (x,y).
Knowledge of the value of X does not affect the probability distribution of Y and vice versa. Thus there is no relationship between the values of independent random variables.
For continuous independent random variables, their probability density functions are related by
f(x,y) = g(x)·h(y)
where g(x) and h(y) are the marginal density functions of the random variables X and Y respectively, for all pairs (x,y).
For discrete independent random variables, their probabilities
  • 10. are related by
P(X = xi, Y = yj) = P(X = xi)·P(Y = yj)
for each pair (xi, yj).
Probability-Probability (P-P) Plot
A probability-probability (P-P) plot is used to see if a given set of data follows some specified distribution. It should be approximately linear if the specified distribution is the correct model.
The probability-probability (P-P) plot is constructed using the theoretical cumulative distribution function, F(x), of the specified model. The values in the sample of data, in order from smallest to largest, are denoted x(1), x(2), ..., x(n). For i = 1, 2, ..., n, F(x(i)) is plotted against (i-0.5)/n.
Compare quantile-quantile (Q-Q) plot.
  • 11. Quantile-Quantile (Q-Q) Plot
A quantile-quantile (Q-Q) plot is used to see if a given set of data follows some specified distribution. It should be approximately linear if the specified distribution is the correct model.
The quantile-quantile (Q-Q) plot is constructed using the theoretical cumulative distribution function, F(x), of the specified model. The values in the sample of data, in order from smallest to largest, are denoted x(1), x(2), ..., x(n). For i = 1, 2, ..., n, x(i) is plotted against $F^{-1}((i-0.5)/n)$.
Compare probability-probability (P-P) plot.
Normal Distribution
Normal distributions model (some) continuous random variables. Strictly, a Normal random variable should be capable of assuming any value on the real line, though this requirement is often waived in practice. For example, height at a given age for a given gender in a given racial group is adequately described by a Normal random variable even though heights must be positive.
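The coordinates of the P-P and Q-Q plots described above can be computed directly from their definitions. The sketch below assumes a Normal model whose parameters are estimated from the sample, and the sample values themselves are made up for illustration; neither choice comes from the original text. It uses `statistics.NormalDist` from the Python standard library, which provides both `cdf` and `inv_cdf`.

```python
from statistics import NormalDist, mean, stdev

def pp_points(data, dist):
    """P-P plot points as (horizontal, vertical) = ((i - 0.5)/n, F(x_(i)))."""
    xs = sorted(data)  # order statistics x_(1) <= ... <= x_(n)
    n = len(xs)
    return [((i - 0.5) / n, dist.cdf(x)) for i, x in enumerate(xs, start=1)]

def qq_points(data, dist):
    """Q-Q plot points as (horizontal, vertical) = (F^-1((i - 0.5)/n), x_(i))."""
    xs = sorted(data)
    n = len(xs)
    return [(dist.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Hypothetical sample, invented for this illustration.
sample = [4.9, 5.6, 6.1, 5.2, 5.8, 4.5, 5.0, 5.4]

# Assumption: the specified model is Normal, fitted by sample mean and sample sd.
model = NormalDist(mean(sample), stdev(sample))

pp = pp_points(sample, model)
qq = qq_points(sample, model)

for (u, f), (q, x) in zip(pp, qq):
    print(f"(i-0.5)/n={u:.4f}  F(x)={f:.4f}  |  F^-1={q:.4f}  x={x:.1f}")
```

If the model is adequate, both sets of points should lie close to a straight line when plotted; any plotting library can be used for the final step.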
  • 12. A continuous random variable X, taking all real values in the range $-\infty < x < \infty$, is said to follow a Normal distribution with parameters µ and $\sigma^2$ if it has probability density function
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$
We write $X \sim N(\mu, \sigma^2)$.
This probability density function (p.d.f.) is a symmetrical, bell-shaped curve, centred at its expected value µ. The variance is $\sigma^2$.
Many distributions arising in practice can be approximated by a Normal distribution. Other random variables may be transformed to normality.
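As a quick check of the properties just stated, the standard Normal density formula, f(x) = exp(-(x-µ)²/(2σ²)) / √(2πσ²), can be coded by hand and compared with the standard library. This is only a sketch: the function name `normal_pdf`, the parameter values, and the crude Riemann sum used for the total probability are all illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x; sigma2 is the variance, not the sd."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 3.0, 4.0  # hypothetical parameters

# Agreement with the library implementation (which takes the sd, not the variance).
library_value = NormalDist(mu, math.sqrt(sigma2)).pdf(4.5)
hand_value = normal_pdf(4.5, mu, sigma2)

# Symmetry about the expected value mu.
left, right = normal_pdf(mu - 1.5, mu, sigma2), normal_pdf(mu + 1.5, mu, sigma2)

# Total probability: Riemann sum over mu +/- 10 standard deviations.
step = 0.001
grid = [mu - 20 + k * step for k in range(int(40 / step) + 1)]
total = sum(normal_pdf(x, mu, sigma2) for x in grid) * step

print(hand_value, library_value)
print(left, right)
print(total)
```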
  • 13. The simplest case of the normal distribution, known as the Standard Normal Distribution, has expected value zero and variance one. This is written as N(0,1).
Poisson Distribution
Poisson distributions model (some) discrete random variables. Typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a 5 minute interval, or the number of calls received by a switchboard during a given period of time.
A discrete random variable X is said to follow a Poisson distribution with parameter m, written X ~ Po(m), if it has probability distribution
$P(X = x) = \frac{e^{-m} m^x}{x!}$
where x = 0, 1, 2, ...;
  • 14. m > 0.
The following requirements must be met:
a. the length of the observation period is fixed in advance;
b. the events occur at a constant average rate;
c. the numbers of events occurring in disjoint intervals are statistically independent.
The Poisson distribution has expected value E(X) = m and variance V(X) = m; i.e. E(X) = V(X) = m.
The Poisson distribution can sometimes be used to approximate the Binomial distribution with parameters n and p. When the number of observations n is large, and the success probability p is small, the Bi(n,p) distribution approaches the Poisson distribution with the parameter given by m = np. This is useful since the computations involved in calculating binomial probabilities are greatly reduced.
Binomial Distribution
Binomial distributions model (some) discrete random variables.
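The Poisson approximation to the Binomial distribution mentioned above can be illustrated numerically. The sketch below uses the standard Binomial and Poisson probability formulas; the values n = 1000 and p = 0.002 are hypothetical, chosen only so that n is large and p is small.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bi(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, m):
    """P(X = x) for X ~ Po(m): e^(-m) * m^x / x!."""
    return math.exp(-m) * m ** x / math.factorial(x)

n, p = 1000, 0.002  # hypothetical: large n, small p
m = n * p           # matching Poisson parameter, m = np

for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, m), 5))

# Largest pointwise difference between the two distributions over x = 0..20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(21))
print("largest difference:", max_gap)
```

Increasing p while holding n fixed makes the difference grow, which is why the approximation is recommended only when p is small.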
  • 15. Typically, a binomial random variable is the number of successes in a series of trials, for example, the number of 'heads' occurring when a coin is tossed 50 times.
A discrete random variable X is said to follow a Binomial distribution with parameters n and p, written X ~ Bi(n,p) or X ~ B(n,p), if it has probability distribution
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
where x = 0, 1, 2, ..., n; n = 1, 2, 3, ...; p = success probability; 0 < p < 1.
The trials must meet the following requirements:
a. the total number of trials is fixed in advance;
b. there are just two outcomes of each trial; success and failure;
c. the outcomes of all the trials are statistically independent;
d. all the trials have the same probability of success.
  • 16. The Binomial distribution has expected value E(X) = np and variance V(X) = np(1-p).
Geometric Distribution
Geometric distributions model (some) discrete random variables. Typically, a Geometric random variable is the number of trials required to obtain the first failure, for example, the number of tosses of a coin until the first 'tail' is obtained, or a process where components from a production line are tested, in turn, until the first defective item is found.
A discrete random variable X is said to follow a Geometric distribution with parameter p, written X ~ Ge(p), if it has probability distribution
$P(X = x) = p^{x-1}(1-p)$
where x = 1, 2, 3, ...; p = success probability; 0 < p < 1.
  • 17. The trials must meet the following requirements:
a. the total number of trials is potentially infinite;
b. there are just two outcomes of each trial; success and failure;
c. the outcomes of all the trials are statistically independent;
d. all the trials have the same probability of success.
The Geometric distribution has expected value $E(X) = 1/(1-p)$ and variance $V(X) = p/(1-p)^2$.
The Geometric distribution is related to the Binomial distribution in that both are based on independent trials in which the probability of success is constant and equal to p. However, a Geometric random variable is the number of trials until the first failure, whereas a Binomial random variable is the number of successes in n trials.
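The stated mean 1/(1-p) and variance p/(1-p)² of the Geometric distribution can be checked by summing over the probability distribution P(X = x) = p^(x-1)(1-p), where X counts trials up to and including the first failure. This is a rough sketch: p = 0.75 is a hypothetical value, and the infinite sum is truncated at 2000 terms, which is an approximation (the omitted tail is negligible for this p).

```python
def geometric_pmf(x, p):
    """P(X = x): x - 1 successes (probability p each), then the first failure."""
    return p ** (x - 1) * (1 - p)

p = 0.75              # hypothetical success probability
xs = range(1, 2001)   # truncated support; the true support is 1, 2, 3, ...

total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
variance = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)

print("total probability:", total)
print("mean:", mean, "formula:", 1 / (1 - p))
print("variance:", variance, "formula:", p / (1 - p) ** 2)
```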
  • 18. Uniform Distribution
Uniform distributions model (some) continuous random variables and (some) discrete random variables. The values of a uniform random variable are uniformly distributed over an interval. For example, if buses arrive at a given bus stop every 15 minutes, and you arrive at the bus stop at a random time, the time you wait for the next bus to arrive could be described by a uniform distribution over the interval from 0 to 15.
A discrete random variable X is said to follow a Uniform distribution with parameters a and b, written X ~ Un(a,b), if it has probability distribution
P(X = x) = 1/(b-a+1)
where x = a, a+1, ..., b, with a and b integers and a less than or equal to b.
A discrete uniform distribution has equal probability at each of its n = b-a+1 values.
  • 19. A continuous random variable X is said to follow a Uniform distribution with parameters a and b, written X ~ Un(a,b), if its probability density function is constant within a finite interval [a,b], and zero outside this interval (with a less than b).
The continuous Uniform distribution has expected value $E(X) = (a+b)/2$ and variance $V(X) = (b-a)^2/12$.
Central Limit Theorem
The Central Limit Theorem states that whenever a random sample of size n is taken from any distribution with mean µ and finite variance $\sigma^2$, then the sample mean $\bar{X}$ will be approximately normally distributed with mean µ and variance
  • 20. $\sigma^2/n$. The larger the value of the sample size n, the better the approximation to the normal.
This is very useful when it comes to inference. For example, it allows us (if the sample size is fairly large) to use hypothesis tests which assume normality even if our data appear non-normal. This is because the tests use the sample mean $\bar{x}$, which the Central Limit Theorem tells us will be approximately normally distributed.
Random variables and probability distributions
Mid-Term Exam
1. The following data set shows the number of hours of sick leave that some of the employees of Bastien's, Inc. have taken during the first quarter of the year. (20 points)
19 22 27 24 28 12 23 47 11 55 25 42 36 25 34 16 45 49 12 20 28 29 21 10 59 39 48 32 40 31
  • 21. a. Develop a frequency distribution for the above data. (Let the width of your class be 10 units and start your first class as 10-19.)
b. Develop a relative frequency distribution and percent frequency distribution for the data.
c. Develop a cumulative frequency distribution.
d. How many employees have taken less than 40 hours of sick leave?
2. A researcher has obtained the number of hours worked per week during the summer for a sample of fifteen students. Please make sure you use the manual procedure explained in chapter 3 of your textbook. You should also show mathematical steps in detail. Your answer will be marked down by 50% if you just include the final answer. (20 points)
40 25 35 30 20 40 30 20 40 10 30 20 10 5 20
a. Calculate mean, range, and standard deviation.
  • 22. b. Calculate the 40th percentile and median.
3. Assume you have applied for two jobs A and B. The probability that you get an offer for job A is 0.23. The probability of being offered job B is 0.19. The probability of getting at least one of the jobs is 0.38. You should also show mathematical steps in detail. Your answer will be marked down by 50% if you just include the final answer. (20 points)
a. What is the probability that you will be offered both jobs?
b. Are events A and B mutually exclusive? Why or why not? Explain.
4. The average starting salary of this year's graduates of a large university (LU) is $10,000 with a standard deviation of $5,000. Furthermore, it is known the starting salaries are normally distributed. You should also show mathematical steps in detail. Your
  • 23. answer will be marked down by 50% if you just include the final answer. (30 points) a. What is the probability that a randomly selected LU graduate will have a starting salary of at least $20,500? b. The sample size for a large university (LU) is 150,000 this year, and individuals with starting salaries of less than $15,600 receive a low income tax break. How many of the LU graduates will receive a low income tax break?