Random variables and probability distributions Random Va.docx

Random variables and probability distributions
Random Variable
The outcome of an experiment need not be a number; for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. A random variable is a function that associates a unique numerical value with every outcome of an experiment. The value of the random variable will vary from trial to trial as the experiment is repeated.
There are two types of random variable: discrete and continuous.
A random variable has either an associated probability distribution (discrete random variable) or probability density function (continuous random variable).
Examples
1. A coin is tossed ten times. The random variable X is the number of tails that are noted. X can only take the values 0, 1, ..., 10, so X is a discrete random variable.
2. A light bulb is burned until it burns out. The random variable Y is its lifetime in hours. Y can take any positive real value, so Y is a continuous random variable.
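To make the idea of a random variable as a function concrete, here is a minimal Python sketch of Example 1, assuming a fair coin. The helper names `count_tails` and `toss_coin` are hypothetical, chosen for this illustration; X is simply a function from an outcome (a sequence of ten tosses) to a number.

```python
import random

def count_tails(outcome):
    """The random variable X: maps an outcome (a sequence of 'H'/'T') to its number of tails."""
    return sum(1 for toss in outcome if toss == "T")

def toss_coin(n, rng):
    """Simulate n tosses of a coin, assuming a fair coin (P(T) = 0.5)."""
    return [rng.choice("HT") for _ in range(n)]

rng = random.Random(1)  # fixed seed so the run is reproducible
for trial in range(3):
    outcome = toss_coin(10, rng)
    # The same experiment repeated gives a different value of X on each trial.
    print("".join(outcome), "-> X =", count_tails(outcome))
```

Whatever the tosses turn out to be, X always lands in {0, 1, ..., 10}, which is why it is discrete.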
Expected Value
The expected value (or population mean) of a random variable indicates its average or central value. It is a useful summary value (a number) of the variable's distribution.
Stating the expected value gives a general impression of the behaviour of some random variable without giving full details of its probability distribution (if it is discrete) or its probability density function (if it is continuous).
Two random variables with the same expected value can have very different distributions. Other useful descriptive measures, for example the variance, describe further aspects of the shape of the distribution.
The expected value of a random variable X is symbolised by E(X) or µ.
If X is a discrete random variable with possible values x1, x2, x3, ..., xn, and p(xi) denotes P(X = xi), then the expected value of X is defined by:
$$E(X) = \mu = \sum_{i} x_i \, p(x_i)$$
where the elements are summed over all values of the random variable X.
If X is a continuous random variable with probability density function f(x), then the expected value of X is defined by:
$$E(X) = \mu = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
Example
Discrete case: When a die is thrown, each of the possible faces 1, 2, 3, 4, 5, 6 (the xi's) has a probability of 1/6 (the p(xi)'s) of showing. The expected value of the face showing is therefore:
µ = E(X) = (1 × 1/6) + (2 × 1/6) + (3 × 1/6) + (4 × 1/6) + (5 × 1/6) + (6 × 1/6) = 3.5
Notice that, in this case, E(X) is 3.5, which is not a possible value of X.
See also sample mean.
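The discrete formula for E(X) can be checked with a short Python sketch of the die example above. It uses exact fractions so the result matches the hand calculation; `expected_value` is a hypothetical helper name, and the distribution is assumed to be given as a dictionary mapping each value xi to p(xi).

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
p = {x: Fraction(1, 6) for x in faces}   # fair die: each face has probability 1/6

def expected_value(pmf):
    """E(X) = sum of x_i * p(x_i) over all values, for a pmf given as {value: probability}."""
    return sum(x * px for x, px in pmf.items())

mu = expected_value(p)
print(mu, float(mu))   # 7/2 3.5
```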
Variance
The (population) variance of a random variable is a non-negative number which gives an idea of how widely spread the values of the random variable are likely to be; the larger the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely the distribution is concentrated around the expected value; it is a measure of the 'spread' of a distribution about its average value.
Variance is symbolised by V(X), Var(X) or σ².
The variance of the random variable X is defined to be:
$$V(X) = E\left[(X - E(X))^2\right] = E(X^2) - [E(X)]^2$$
where E(X) is the expected value of the random variable X.
Notes
a. the larger the variance, the further that individual values of the random variable (observations) tend to be from the mean, on average;
b. the smaller the variance, the closer that individual values of the random variable (observations) tend to be to the mean, on average;
c. taking the square root of the variance gives the standard deviation, i.e.:
$$\sigma = \sqrt{V(X)}$$
d. the variance and standard deviation of a random variable are always non-negative.
See also sample variance.
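As a sketch, assuming the same fair die as in the expected-value example, the following Python computes V(X) both from the definition E[(X - E(X))²] and from the shortcut E(X²) - [E(X)]², then takes the square root to get the standard deviation σ. The helper names `mean` and `variance` are illustrative.

```python
import math
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

def mean(pmf):
    """E(X) for a discrete pmf given as {value: probability}."""
    return sum(x * px for x, px in pmf.items())

def variance(pmf):
    """V(X) = E[(X - E(X))^2], summed over the possible values of X."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * px for x, px in pmf.items())

var = variance(pmf)
# Shortcut form V(X) = E(X^2) - [E(X)]^2 gives the same exact answer.
var_shortcut = sum(x**2 * px for x, px in pmf.items()) - mean(pmf) ** 2
sd = math.sqrt(float(var))   # standard deviation
print(var, var_shortcut, round(sd, 4))   # 35/12 35/12 1.7078
```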

Probability Distribution
The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
More formally, the probability distribution of a discrete random variable X is a function which gives the probability p(xi) that the random variable equals xi, for each value xi:
p(xi) = P(X = xi)
It satisfies the following conditions:
a. $0 \le p(x_i) \le 1$ for each $x_i$;
b. $\sum_{i} p(x_i) = 1$, summing over all possible values $x_i$.
Cumulative Distribution Function
All random variables (discrete and continuous) have a cumulative distribution function. It is a function giving the probability that the random variable X is less than or equal to x, for every value x.
Formally, the cumulative distribution function F(x) is defined to be:
$$F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty$$
For a discrete random variable, the cumulative distribution function is found by summing up the probabilities as in the example below.
For a continuous random variable, the cumulative distribution function is the integral of its probability density function.
Example
Discrete case: Suppose a random variable X has the following probability distribution p(xi):
xi       0     1     2      3      4      5
p(xi)    1/32  5/32  10/32  10/32  5/32   1/32
This is actually a binomial distribution: Bi(5, 0.5) or B(5, 0.5). The cumulative distribution function F(x) is then:
xi       0     1     2      3      4      5
F(xi)    1/32  6/32  16/32  26/32  31/32  32/32
F(x) stays constant between the possible values of X (it is a step function). For example:
F(1.3) = F(1) = 6/32
F(2.86) = F(2) = 16/32
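The summing step can be sketched in Python for the Bi(5, 0.5) example above. Running totals of p(xi) give F(xi), and `bisect_right` counts how many possible values lie at or below an arbitrary x, which reproduces the step behaviour (for example F(1.3) = F(1)). The function name `cdf` is an illustrative choice, not a library API.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Probability distribution from the example: X ~ Bi(5, 0.5).
xs = [0, 1, 2, 3, 4, 5]
ps = [Fraction(k, 32) for k in (1, 5, 10, 10, 5, 1)]

# Running totals of p(x_i) give F(x_i) = P(X <= x_i).
cum = list(accumulate(ps))

def cdf(x):
    """F(x) for any real x: the sum of p(x_i) over all x_i <= x."""
    i = bisect_right(xs, x)          # number of possible values <= x
    return cum[i - 1] if i > 0 else Fraction(0)

print([int(F * 32) for F in cum])                 # [1, 6, 16, 26, 31, 32]
print(int(cdf(1.3) * 32), int(cdf(2.86) * 32))    # 6 16
```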
Probability Density Function
The probability density function of a continuous random variable is a function which can be integrated to obtain the probability that the random variable takes a value in a given interval.
More formally, the probability density function, f(x), of a continuous random variable X is the derivative of the cumulative distribution function F(x):
$$f(x) = \frac{d}{dx} F(x)$$
Since $F(x) = P(X \le x)$, it follows that:
$$\int_a^b f(x) \, dx = F(b) - F(a) = P(a < X \le b)$$
If f(x) is a probability density function then it must obey two conditions:
a. that the total probability for all possible values of the continuous random variable X is 1:
$$\int_{-\infty}^{\infty} f(x) \, dx = 1$$
b. that the probability density function can never be negative: $f(x) \ge 0$ for all x.
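The relationship between f(x) and F(x) can be checked numerically. This sketch assumes, purely for illustration, that the light-bulb lifetime Y from the first section is exponential with mean 1000 hours (rate `lam` = 1/1000); nothing in the text fixes this model. It approximates the integral of f with a simple midpoint rule, compares it with F(b) - F(a), and checks that the total probability is (very nearly) 1.

```python
import math

# Hypothetical model: exponential lifetime with mean 1000 hours (an assumption for this demo).
lam = 1 / 1000

def f(x):
    """Probability density function: lam * exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def F(x):
    """Cumulative distribution function P(X <= x)."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# P(500 < X <= 1500) two ways: integrating the density, and F(b) - F(a).
p_interval = integrate(f, 500, 1500)
print(p_interval, F(1500) - F(500))

# Total probability over [0, 50000]; the tail beyond that point is only e^-50.
total = integrate(f, 0, 50_000)
print(total)
```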
Discrete Random Variable
A discrete random variable is one which may take on only a countable number of distinct values such as 0, 1, 2, 3, 4, ... Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, and the number of defective light bulbs in a box of ten.
Compare continuous random variable.
Continuous Random Variable
A continuous random variable is one which can take any value in an interval of the real line, so it has an infinite (uncountable) number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.
Compare discrete random variable.

Independent Random Variables
Two random variables, X and Y say, are said to be independent if and only if the value of X has no influence on the value of Y and vice versa.
The joint cumulative distribution function of two independent random variables X and Y is related to their marginal distribution functions by
F(x,y) = G(x)H(y)
where G(x) and H(y) are the marginal distribution functions of X and Y, for all pairs (x,y).
Knowledge of the value of X does not affect the probability distribution of Y and vice versa. Thus there is no relationship between the values of independent random variables.
For continuous independent random variables, their probability density functions are related by
f(x,y) = g(x)h(y)
where g(x) and h(y) are the marginal density functions of the random variables X and Y respectively, for all pairs (x,y).
For discrete independent random variables, their probabilities are related by
P(X = xi, Y = yj) = P(X = xi) P(Y = yj)
for each pair (xi,yj).
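A minimal sketch of the discrete independence condition, assuming two fair dice thrown separately: the joint table P(X = xi, Y = yj) is compared with the product of its marginals for every pair. The helpers `marginals` and `is_independent` are hypothetical names; the second case (Y equal to X) shows a dependent pair failing the check.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice thrown separately: every pair has probability 1/36.
faces = range(1, 7)
joint = {(i, j): Fraction(1, 36) for i, j in product(faces, faces)}

def marginals(joint):
    """Marginal distributions P(X = x_i) and P(Y = y_j) from a joint table."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return px, py

def is_independent(joint):
    """True if P(X = x_i, Y = y_j) = P(X = x_i) P(Y = y_j) for every pair (missing pairs count as 0)."""
    px, py = marginals(joint)
    return all(joint.get((x, y), 0) == px[x] * py[y] for x in px for y in py)

print(is_independent(joint))   # True

# Counter-example: Y is the same die as X, so knowing X fixes Y.
same = {(i, i): Fraction(1, 6) for i in faces}
print(is_independent(same))    # False
```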
Probability-Probability (P-P) Plot
A probability-probability (P-P) plot is used to see if a given set of data follows some specified distribution. It should be approximately linear if the specified distribution is the correct model.
The probability-probability (P-P) plot is constructed using the theoretical cumulative distribution function, F(x), of the specified model. The values in the sample of data, in order from smallest to largest, are denoted x(1), x(2), ..., x(n). For i = 1, 2, ..., n, F(x(i)) is plotted against (i - 0.5)/n.
Compare quantile-quantile (Q-Q) plot.
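The P-P construction can be sketched in Python. The sample values are hypothetical and the N(0, 1) model is an assumption for illustration; `statistics.NormalDist.cdf` supplies the theoretical F(x). Each returned pair is ((i - 0.5)/n, F(x(i))), and `pp_points` is an illustrative name.

```python
from statistics import NormalDist

def pp_points(data, cdf):
    """Return ((i - 0.5)/n, F(x_(i))) pairs for a P-P plot, with x_(1) <= ... <= x_(n)."""
    xs = sorted(data)
    n = len(xs)
    return [((i - 0.5) / n, cdf(x)) for i, x in enumerate(xs, start=1)]

# Hypothetical sample, checked against an assumed N(0, 1) model.
sample = [0.3, -1.2, 0.8, -0.4, 1.5, 0.1, -0.7, 2.1]
model = NormalDist(mu=0, sigma=1)
for p, Fx in pp_points(sample, model.cdf):
    print(f"{p:.4f}  {Fx:.4f}")
```

If the model fits, the points should lie close to the straight line from (0, 0) to (1, 1).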

Quantile-Quantile (Q-Q) Plot
A quantile-quantile (Q-Q) plot is used to see if a given set of data follows some specified distribution. It should be approximately linear if the specified distribution is the correct model.
The quantile-quantile (Q-Q) plot is constructed using the theoretical cumulative distribution function, F(x), of the specified model. The values in the sample of data, in order from smallest to largest, are denoted x(1), x(2), ..., x(n). For i = 1, 2, ..., n, x(i) is plotted against $F^{-1}((i - 0.5)/n)$.
Compare probability-probability (P-P) plot.
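Similarly, here is a sketch of the Q-Q construction, using a hypothetical sample and an assumed N(0, 1) model; `statistics.NormalDist.inv_cdf` plays the role of F^{-1}. Each returned pair is (F^{-1}((i - 0.5)/n), x(i)), so the theoretical quantile goes on the horizontal axis. The name `qq_points` is illustrative.

```python
from statistics import NormalDist

def qq_points(data, inv_cdf):
    """Return (F^-1((i - 0.5)/n), x_(i)) pairs for a Q-Q plot, with x_(1) <= ... <= x_(n)."""
    xs = sorted(data)
    n = len(xs)
    return [(inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

# Hypothetical sample, checked against an assumed N(0, 1) model.
sample = [0.3, -1.2, 0.8, -0.4, 1.5, 0.1, -0.7, 2.1]
model = NormalDist(mu=0, sigma=1)
for q, x in qq_points(sample, model.inv_cdf):
    print(f"{q:+.4f}  {x:+.4f}")
```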
Normal Distribution
Normal distributions model (some) continuous random variables. Strictly, a Normal random variable should be capable of assuming any value on the real line, though this requirement is often waived in practice. For example, height at a given age for a given gender in a given racial group is adequately described by a Normal random variable even though heights must be positive.
A continuous random variable X, taking all real values in the range $-\infty < x < \infty$, is said to follow a Normal distribution with parameters µ and σ² if it has probability density function
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
We write
$$X \sim N(\mu, \sigma^2)$$
This probability density function (p.d.f.) is a symmetrical, bell-shaped curve, centred at its expected value µ. The variance is σ².
Many distributions arising in practice can be approximated by a Normal distribution. Other random variables may be transformed to normality.

The simplest case of the normal distribution, known as the Standard Normal Distribution, has expected value zero and variance one. This is written as N(0,1).
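The density formula can be written out directly and compared with Python's `statistics.NormalDist`, which is parameterised by the standard deviation σ rather than the variance σ². The parameters µ = 170 and σ = 10 are hypothetical. The sketch also shows standardisation, Z = (X - µ)/σ ~ N(0,1).

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

# Hypothetical parameters: X ~ N(170, 10^2).
mu, sigma = 170.0, 10.0
X = NormalDist(mu, sigma)
Z = NormalDist()          # standard Normal N(0, 1)

print(normal_pdf(175, mu, sigma), X.pdf(175))   # the two values agree

# Standardising: P(X <= 185) = P(Z <= (185 - mu) / sigma) = P(Z <= 1.5).
print(X.cdf(185), Z.cdf((185 - mu) / sigma))

# About 68% of the probability lies within one standard deviation of the mean.
print(X.cdf(mu + sigma) - X.cdf(mu - sigma))
```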
Poisson Distribution
Poisson distributions model (some) discrete random variables. Typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a 5-minute interval, or the number of calls received by a switchboard during a given period of time.
A discrete random variable X is said to follow a Poisson distribution with parameter m, written X ~ Po(m), if it has probability distribution
$$P(X = x) = \frac{m^x e^{-m}}{x!}$$
where
x = 0, 1, 2, ...
m > 0.
The following requirements must be met:
a. the length of the observation period is fixed in advance;
b. the events occur at a constant average rate;
c. the numbers of events occurring in disjoint intervals are statistically independent.
The Poisson distribution has expected value E(X) = m and variance V(X) = m; i.e. E(X) = V(X) = m.
The Poisson distribution can sometimes be used to approximate the Binomial distribution with parameters n and p. When the number of observations n is large, and the success probability p is small, the Bi(n,p) distribution approaches the Poisson distribution with the parameter given by m = np. This is useful since the computations involved in calculating binomial probabilities are greatly reduced.
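A short Python sketch of the Poisson probability function and of the approximation to Bi(n, p) described above. The values n = 1000 and p = 0.003 (so m = np = 3) are hypothetical, chosen only so that n is large and p is small; `math.comb` requires Python 3.8 or later.

```python
import math

def poisson_pmf(x, m):
    """P(X = x) = m^x e^(-m) / x!  for x = 0, 1, 2, ...  (m > 0)."""
    return m**x * math.exp(-m) / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)  for x = 0, 1, ..., n."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Large n, small p: Bi(n, p) is close to Po(m) with m = n p.
n, p = 1000, 0.003        # hypothetical values; m = 3
m = n * p
for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, m), 5))

# E(X) = V(X) = m, checked numerically (terms beyond x = 99 are negligible for m = 3).
mean = sum(x * poisson_pmf(x, m) for x in range(100))
var = sum((x - mean) ** 2 * poisson_pmf(x, m) for x in range(100))
print(round(mean, 6), round(var, 6))
```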
Binomial Distribution
Binomial distributions model (some) discrete random variables. Typically, a binomial random variable is the number of successes in a series of trials, for example, the number of 'heads' occurring when a coin is tossed 50 times.
A discrete random variable X is said to follow a Binomial distribution with parameters n and p, written X ~ Bi(n,p) or X ~ B(n,p), if it has probability distribution
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$$
where
x = 0, 1, 2, ..., n
n = 1, 2, 3, ...
p = success probability; 0 < p < 1
The trials must meet the following requirements:
a. the total number of trials is fixed in advance;
b. there are just two outcomes of each trial: success and failure;
c. the outcomes of all the trials are statistically independent;
d. all the trials have the same probability of success.

The Binomial distribution has expected value E(X) = np and
variance V(X) = np(1-p).
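As a check on the formula, this sketch evaluates the Bi(5, 0.5) probabilities exactly (the distribution tabulated in the cumulative distribution function example) and confirms E(X) = np and V(X) = np(1-p) by direct summation. `binomial_pmf` is an illustrative helper name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x), for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 2)          # Bi(5, 0.5)
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
print([int(px * 32) for px in pmf.values()])   # [1, 5, 10, 10, 5, 1], i.e. numerators over 32

mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean) ** 2 * px for x, px in pmf.items())
print(mean, var)   # 5/2 5/4, i.e. np and np(1 - p)
```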
Geometric Distribution
Geometric distributions model (some) discrete random variables. Typically, a Geometric random variable is the number of trials required to obtain the first failure, for example, the number of tosses of a coin until the first 'tail' is obtained, or a process where components from a production line are tested, in turn, until the first defective item is found.
A discrete random variable X is said to follow a Geometric distribution with parameter p, written X ~ Ge(p), if it has probability distribution
$$P(X = x) = p^{x-1}(1 - p)$$
where
x = 1, 2, 3, ...
p = success probability; 0 < p < 1

The trials must meet the following requirements:
a. the total number of trials is potentially infinite;
b. there are just two outcomes of each trial: success and failure;
c. the outcomes of all the trials are statistically independent;
d. all the trials have the same probability of success.
The Geometric distribution has expected value and variance
$$E(X) = \frac{1}{1 - p}, \qquad V(X) = \frac{p}{(1 - p)^2}$$
The Geometric distribution is related to the Binomial distribution in that both are based on independent trials in which the probability of success is constant and equal to p. However, a Geometric random variable is the number of trials until the first failure, whereas a Binomial random variable is the number of successes in n trials.
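A minimal sketch of the Geometric distribution as defined here (X counts trials up to and including the first failure, with success probability p on each trial). The value p = 0.5 is a hypothetical fair-coin example, and the helper names are illustrative; both the truncated sum of x P(X = x) and the simulated average should come out close to E(X) = 1/(1-p) = 2.

```python
import random

def geometric_pmf(x, p):
    """P(X = x) = p^(x-1) (1 - p): x - 1 successes followed by the first failure."""
    return p ** (x - 1) * (1 - p)

def trials_until_first_failure(p, rng):
    """Run independent trials (success probability p) and count trials up to the first failure."""
    x = 1
    while rng.random() < p:   # this trial is a success, so keep going
        x += 1
    return x

p = 0.5   # hypothetical: fair coin, 'head' = success, 'tail' = failure
mean_exact = 1 / (1 - p)
var_exact = p / (1 - p) ** 2

# Truncated sum of x * P(X = x); for p = 0.5 the tail beyond x = 200 is negligible.
mean_sum = sum(x * geometric_pmf(x, p) for x in range(1, 201))

rng = random.Random(42)
draws = [trials_until_first_failure(p, rng) for _ in range(100_000)]
print(mean_exact, var_exact, round(mean_sum, 6), sum(draws) / len(draws))
```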

Uniform Distribution
Uniform distributions model (some) continuous random variables and (some) discrete random variables. The values of a uniform random variable are uniformly distributed over an interval. For example, if buses arrive at a given bus stop every 15 minutes, and you arrive at the bus stop at a random time, the time you wait for the next bus to arrive could be described by a uniform distribution over the interval from 0 to 15 minutes.
A discrete random variable X is said to follow a Uniform distribution with parameters a and b, written X ~ Un(a,b), if it takes the integer values a, a+1, ..., b with probability distribution
$$P(X = x) = \frac{1}{b - a + 1}$$
where
x = a, a+1, ..., b.
A discrete uniform distribution has equal probability at each of its n = b - a + 1 values.
A continuous random variable X is said to follow a Uniform distribution with parameters a and b, written X ~ Un(a,b), if its probability density function is constant within a finite interval [a,b], and zero outside this interval (with a less than b):
$$f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b, \qquad f(x) = 0 \text{ otherwise}$$
The continuous Uniform distribution has expected value and variance
$$E(X) = \frac{a + b}{2}, \qquad V(X) = \frac{(b - a)^2}{12}$$
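A sketch of both cases, assuming a discrete uniform on the integers a, ..., b (here a die, a = 1, b = 6) and the continuous bus-waiting example on [0, 15] minutes. For the continuous case the simulated mean and variance should be close to (a+b)/2 = 7.5 and (b-a)²/12 = 18.75; the seed and sample size are arbitrary choices for the demo.

```python
import random
from fractions import Fraction
from statistics import fmean, pvariance

# Discrete uniform on a, a+1, ..., b: each of the n = b - a + 1 values has probability 1/n.
a, b = 1, 6
n = b - a + 1
pmf = {x: Fraction(1, n) for x in range(a, b + 1)}
print(sum(pmf.values()))   # 1

# Continuous uniform on [0, 15]: waiting time (minutes) for a bus that comes every 15 minutes.
lo, hi = 0.0, 15.0
print((lo + hi) / 2, (hi - lo) ** 2 / 12)     # 7.5 18.75

rng = random.Random(0)
waits = [rng.uniform(lo, hi) for _ in range(100_000)]
print(round(fmean(waits), 2), round(pvariance(waits), 2))
```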
Central Limit Theorem
The Central Limit Theorem states that whenever a random sample of size n is taken from any distribution with mean µ and (finite) variance σ², then the sample mean $\bar{x}$ will be approximately normally distributed with mean µ and variance σ²/n. The larger the value of the sample size n, the better the approximation to the normal.
This is very useful when it comes to inference. For example, it allows us (if the sample size is fairly large) to use hypothesis tests which assume normality even if our data appear non-normal. This is because the tests use the sample mean $\bar{x}$, which the Central Limit Theorem tells us will be approximately normally distributed.
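The theorem can be illustrated by simulation. This sketch draws repeated samples of size n = 50 from an exponential distribution with µ = 1 and σ² = 1 (a skewed, clearly non-normal population chosen as an assumption for the demo) and looks at the distribution of the sample means, which should be roughly N(1, 1/50). The helper `sample_means` is a hypothetical name.

```python
import random
from statistics import fmean, pvariance

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an Exponential(1) population (mu = 1, sigma^2 = 1)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2024)
n = 50
means = sample_means(n, 5_000, rng)

# The CLT suggests these are roughly N(mu, sigma^2 / n) = N(1, 0.02).
print(round(fmean(means), 3), round(pvariance(means), 4))

# Rough normality check: about 95% of the means should lie within mu +/- 1.96 * sigma / sqrt(n).
half_width = 1.96 * (1 / n) ** 0.5
inside = sum(abs(m - 1) <= half_width for m in means) / len(means)
print(round(inside, 3))
```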
Mid-Term Exam
1. The following data set shows the number of hours of sick leave that some of the employees of Bastien’s, Inc. have taken during the first quarter of the year. (20 points)
19 22 27 24 28 12
23 47 11 55 25 42
36 25 34 16 45 49
12 20 28 29 21 10
59 39 48 32 40 31

a. Develop a frequency distribution for the above data (let the width of each class be 10 units and start the first class at 10-19).
b. Develop a relative frequency distribution and a percent frequency distribution for the data.
c. Develop a cumulative frequency distribution.
d. How many employees have taken less than 40 hours of sick leave?
2. A researcher has obtained the number of hours worked per week during the summer for a sample of fifteen students. Please make sure you use the manual procedure explained in Chapter 3 of your textbook. You should also show the mathematical steps in detail. Your answer will be marked down by 50% if you include only the final answer. (20 points)
40 25 35 30 20 40 30 20 40 10 30 20 10 5 20
a. Calculate mean, range, and standard deviation.

b. Calculate the 40th percentile and the median.
3. Assume you have applied for two jobs, A and B. The probability that you get an offer for job A is 0.23. The probability of being offered job B is 0.19. The probability of getting at least one of the jobs is 0.38. You should also show the mathematical steps in detail. Your answer will be marked down by 50% if you include only the final answer. (20 points)
a. What is the probability that you will be offered both jobs?
b. Are events A and B mutually exclusive? Why or why not? Explain.
4. The average starting salary of this year’s graduates of a large university (LU) is $10,000 with a standard deviation of $5,000. Furthermore, it is known that the starting salaries are normally distributed. You should also show the mathematical steps in detail. Your answer will be marked down by 50% if you include only the final answer. (30 points)
a. What is the probability that a randomly selected LU graduate will have a starting salary of at least $20,500?
b. The sample size for LU is 150,000 this year, and individuals with starting salaries of less than $15,600 receive a low-income tax break. How many of the LU graduates will receive a low-income tax break?
