Probability Theory


Convergence of Random Variables
              Phong VO
      vdphong@fit.hcmus.edu.vn

          September 11, 2010




Markov and Chebyshev Inequalities


Theorem 1. (Markov’s Inequality). If X is a r.v that takes only
nonnegative values, then for any value a > 0,

                            P(X ≥ a) ≤ E(X)/a




Proof 1. We give a proof for the case where X is continuous with density
f:

              E(X) = ∫_0^∞ x f(x) dx
                   = ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
                   ≥ ∫_a^∞ x f(x) dx
                   ≥ ∫_a^∞ a f(x) dx
                   = a ∫_a^∞ f(x) dx
                   = a P(X ≥ a)

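As a quick numerical sanity check (an addition, not part of the original slides), the sketch below draws from an Exponential distribution, which is nonnegative, and compares the empirical tail probability P(X ≥ a) with the Markov bound E(X)/a for a few arbitrary thresholds a.

```python
import numpy as np

# Hypothetical illustration: Exponential(scale=2) is a nonnegative r.v with E(X) = 2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

for a in (1.0, 2.0, 4.0, 8.0):
    empirical = np.mean(x >= a)   # estimate of P(X >= a)
    bound = x.mean() / a          # Markov bound E(X)/a
    print(f"a={a:4.1f}  P(X >= a) ~ {empirical:.4f}  bound {bound:.4f}")
```

The bound is typically loose, since it uses only the mean and the nonnegativity of X.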
Theorem 2. (Chebyshev’s Inequality). If X is a r.v with mean µ and
variance σ², then, for any value k > 0,

                          P(|X − µ| ≥ k) ≤ σ²/k²

Proof 2. Since (X − µ)² is a nonnegative random variable, we can apply
Markov’s inequality to obtain

                     P((X − µ)² ≥ k²) ≤ E[(X − µ)²]/k²

   But since (X − µ)² ≥ k² if and only if |X − µ| ≥ k, the preceding is
equivalent to
                 P(|X − µ| ≥ k) ≤ E[(X − µ)²]/k² = σ²/k²

and the proof is complete.




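Similarly, a small simulation (again an addition; the Normal(5, 2²) choice is arbitrary) compares the empirical probability P(|X − µ| ≥ k) with the Chebyshev bound σ²/k².

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, size=100_000)

for k in (2.0, 4.0, 6.0):
    empirical = np.mean(np.abs(x - mu) >= k)   # estimate of P(|X - mu| >= k)
    bound = sigma**2 / k**2                    # Chebyshev bound
    print(f"k={k:.1f}  empirical {empirical:.4f}  bound {bound:.4f}")
```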
Motivation



• Since statistics and data mining are all about gathering data, we are
  naturally interested in what happens as we gather more and more data.

• This lecture is about the behavior of sequences of random variables.




The Weak Law of Large Numbers (WLLN)



• This is one of the most important theorems in probability theory.

• It says that the mean of a large sample is close to the mean of the
  distribution.

• The proportion of heads in a large number of tosses is expected to be
  close to 1/2.




Let X1, X2, . . . be an IID sample and let E(Xi) = µ and V(Xi) = σ².
Recall that the sample mean is defined as X̄n = (1/n) Σ_{i=1}^n Xi and that
E(X̄n) = µ and V(X̄n) = σ²/n.

Theorem 3. If X1, X2, . . . are IID, then X̄n → µ in probability as n → ∞.

   Interpretation of WLLN: The distribution of X̄n becomes more
concentrated around µ as n gets large.




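To see the WLLN in action (a sketch added here, not in the original slides), simulate tosses of a fair coin and track the running proportion of heads; it settles near 1/2 as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
tosses = rng.integers(0, 2, size=100_000)                        # 1 = heads, fair coin
running_mean = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}   proportion of heads = {running_mean[n - 1]:.4f}")
```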
The Central Limit Theorem (CLT)


Theorem 4. Let X1, X2, . . . be IID with mean µ and variance σ². Let
X̄n = (1/n) Σ_{i=1}^n Xi. Then the distribution of

                        Zn ≡ √n (X̄n − µ) / σ

converges to that of Z, where Z ∼ N(0, 1). In other words,

       lim_{n→∞} P(Zn ≤ z) = Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx




   Interpretation of CLT: Probability statements about X̄n can be
approximated using a Normal distribution. It is the probability statements
that we are approximating, not the random variable itself.

• This theorem provides a simple method for computing approximate
  probabilities for sums of independent random variables.

• It explains the remarkable fact that the empirical frequencies of so many
  natural “populations” exhibit a bell-shaped curve.

• This theorem holds for any distribution of the Xi’s.




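The sketch below (an addition; the Exponential(1) choice for the Xi’s is arbitrary, picked because it is clearly non-Normal) standardizes many sample means and compares the empirical CDF of Zn with Φ(z) at a few points.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0                      # Exponential(1) has mean 1 and sd 1

samples = rng.exponential(scale=1.0, size=(reps, n))
z_n = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma   # standardized sample means

for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}   P(Zn <= z) ~ {np.mean(z_n <= z):.4f}   Phi(z) = {norm.cdf(z):.4f}")
```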
Example 1. (Normal Approximation to the Binomial) Let X be the
number of times that a fair coin, flipped 40 times, lands heads. Find the
probability that X = 20. Use the normal approximation and then compare
it to the exact solution.

Example 2. Let Xi, i = 1, 2, . . . , 10 be independent r.vs, each being
uniformly distributed over (0, 1). Estimate P(Σ_{i=1}^{10} Xi > 7).

Example 3. The lifetime of a special type of battery is a r.v with mean
40 hours and standard deviation 20 hours. A battery is used until it fails,
at which point it is replaced by a new one. Assuming a stockpile of 25
such batteries, the lifetimes of which are independent, approximate the
probability that over 1100 hours of use can be obtained.




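For Example 1, a short calculation (my addition, using scipy; the same recipe with the appropriate mean and variance handles Examples 2 and 3) compares the exact binomial probability with the Normal approximation, using the continuity correction P(X = 20) ≈ P(19.5 < Y < 20.5) for Y ∼ N(20, 10).

```python
from scipy.stats import binom, norm

n, p = 40, 0.5
exact = binom.pmf(20, n, p)                       # exact P(X = 20)

mu = n * p                                        # 20
sigma = (n * p * (1 - p)) ** 0.5                  # sqrt(10)
# Continuity correction: treat {X = 20} as the interval (19.5, 20.5).
approx = norm.cdf(20.5, mu, sigma) - norm.cdf(19.5, mu, sigma)

print(f"exact  {exact:.4f}")    # both come out near 0.125
print(f"approx {approx:.4f}")
```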
Stochastic Processes


• A stochastic process {X(t), t ∈ T } is a collection of r.vs. For each t ∈ T ,
  X(t) is a r.v.

• We interpret t as time and X(t) as the state of the process at time t.

• T is called the index set of the process; discrete-time process: T is a
  countable set; continuous-time process: T is an interval of the real line

• The state space of a stochastic process is defined as the set of all possible
  values that the r.v X(t) can assume.

• A stochastic process is a family of r.vs that describes the evolution through
  time of some (physical) process.

Example 4. Consider a particle that moves along a set of m + 1 nodes,
labeled 0, 1, . . . , m, that are arranged around a circle. At each step the
particle is equally likely to move one position in either the clockwise or
counterclockwise direction. That is, if Xn is the position of the particle after
its nth step, then

           P(Xn+1 = i + 1 | Xn = i) = P(Xn+1 = i − 1 | Xn = i) = 1/2


   where i + 1 ≡ 0 when i = m and i − 1 ≡ m when i = 0. Suppose now
that the particle starts at 0 and continues to move around according to the
preceding rules until all the nodes 1, 2, . . . , m have been visited. What is
the probability that node i, i = 1, 2, . . . , m, is the last one visited?


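A Monte Carlo sketch for Example 4 (an addition; it only estimates the answer, it proves nothing): simulate the circular walk from node 0 until every node has been visited and record which node was visited last. The estimates let one check the guess that each of the nodes 1, . . . , m is equally likely to be last.

```python
import numpy as np
from collections import Counter

def last_visited(m, rng):
    """Walk on nodes 0..m arranged in a circle, starting at node 0,
    until all nodes are visited; return the node visited last."""
    visited = {0}
    pos, last = 0, 0
    while len(visited) < m + 1:
        pos = (pos + rng.choice((-1, 1))) % (m + 1)   # clockwise or counterclockwise
        if pos not in visited:
            visited.add(pos)
            last = pos
    return last

rng = np.random.default_rng(4)
m, reps = 5, 20_000
counts = Counter(last_visited(m, rng) for _ in range(reps))
for node in range(1, m + 1):
    print(f"node {node}: estimated P(last visited) = {counts[node] / reps:.3f}")
```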
