Probability Theory


Convergence of Random Variables
              Phong VO
      vdphong@fit.hcmus.edu.vn

          September 11, 2010




Markov and Chebyshev Inequalities


Theorem 1. (Markov’s Inequality). If X is a r.v that takes only
nonnegative values, then for any value a > 0,

                            P(X ≥ a) ≤ E(X)/a




Proof 1. We give a proof for the case where X is continuous with density
f:

              E(X) = ∫_0^∞ x f(x) dx
                   = ∫_0^a x f(x) dx + ∫_a^∞ x f(x) dx
                   ≥ ∫_a^∞ x f(x) dx
                   ≥ ∫_a^∞ a f(x) dx
                   = a ∫_a^∞ f(x) dx
                   = a P(X ≥ a)

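As a quick numerical sanity check (an addition, not part of the original slides), the sketch below draws from an Exponential distribution, which is nonnegative, and compares the empirical tail probability P(X ≥ a) with the Markov bound E(X)/a for a few arbitrary thresholds a.

```python
import numpy as np

# Hypothetical illustration: Exponential(scale=2) is a nonnegative r.v with E(X) = 2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

for a in (1.0, 2.0, 4.0, 8.0):
    empirical = np.mean(x >= a)   # estimate of P(X >= a)
    bound = x.mean() / a          # Markov bound E(X)/a
    print(f"a={a:4.1f}  P(X >= a) ~ {empirical:.4f}  bound {bound:.4f}")
```

The bound is typically loose, since it uses only the mean and the nonnegativity of X.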
Theorem 2. (Chebyshev’s Inequality). If X is a r.v with mean µ and
variance σ², then, for any value k > 0,

                          P(|X − µ| ≥ k) ≤ σ²/k²

Proof 2. Since (X − µ)² is a nonnegative random variable, we can apply
Markov’s inequality to obtain

                     P((X − µ)² ≥ k²) ≤ E[(X − µ)²]/k²

   But since (X − µ)² ≥ k² if and only if |X − µ| ≥ k, the preceding is
equivalent to
                 P(|X − µ| ≥ k) ≤ E[(X − µ)²]/k² = σ²/k²

and the proof is complete.




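Similarly, a small simulation (again an addition; the Normal(5, 2²) choice is arbitrary) compares the empirical probability P(|X − µ| ≥ k) with the Chebyshev bound σ²/k².

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, size=100_000)

for k in (2.0, 4.0, 6.0):
    empirical = np.mean(np.abs(x - mu) >= k)   # estimate of P(|X - mu| >= k)
    bound = sigma**2 / k**2                    # Chebyshev bound
    print(f"k={k:.1f}  empirical {empirical:.4f}  bound {bound:.4f}")
```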
Motivation



• Since statistics and data mining are all about gathering data, we are
  naturally interested in what happens as we gather more and more data.

• This lecture is about the behavior of sequences of random variables.




The Weak Law of Large Numbers (WLLN)



• This is one of the most important theorems in probability theory.

• It says that the mean of a large sample is close to the mean of the
  distribution.

• The proportion of heads in a large number of tosses is expected to be
  close to 1/2.




Let X1, X2, . . . be an IID sample and let E(Xi) = µ and V(Xi) = σ².
Recall that the sample mean is defined as X̄n = (1/n) Σ_{i=1}^n Xi and that
E(X̄n) = µ and V(X̄n) = σ²/n.

Theorem 3. If X1, X2, . . . are IID, then X̄n → µ in probability as n → ∞.

   Interpretation of WLLN: The distribution of X̄n becomes more
concentrated around µ as n gets large.




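To see the WLLN in action (a sketch added here, not in the original slides), simulate tosses of a fair coin and track the running proportion of heads; it settles near 1/2 as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)
tosses = rng.integers(0, 2, size=100_000)                        # 1 = heads, fair coin
running_mean = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}   proportion of heads = {running_mean[n - 1]:.4f}")
```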
The Central Limit Theorem (CLT)


Theorem 4. Let X1, X2, . . . be IID with mean µ and variance σ². Let
X̄n = (1/n) Σ_{i=1}^n Xi. Then the distribution of

                        Zn ≡ √n (X̄n − µ) / σ

converges to that of Z, where Z ∼ N(0, 1). In other words,

       lim_{n→∞} P(Zn ≤ z) = Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx




   Interpretation of CLT: Probability statements about X̄n can be
approximated using a Normal distribution. It is the probability statements
that we are approximating, not the random variable itself.

• This theorem provides a simple method for computing approximate
  probabilities for sums of independent random variables.

• It explains the remarkable fact that the empirical frequencies of so many
  natural “populations” exhibit a bell-shaped curve.

• This theorem holds for any distribution of the Xi’s.




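The sketch below (an addition; the Exponential(1) choice for the Xi’s is arbitrary, picked because it is clearly non-Normal) standardizes many sample means and compares the empirical CDF of Zn with Φ(z) at a few points.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, reps = 50, 20_000
mu, sigma = 1.0, 1.0                      # Exponential(1) has mean 1 and sd 1

samples = rng.exponential(scale=1.0, size=(reps, n))
z_n = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma   # standardized sample means

for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}   P(Zn <= z) ~ {np.mean(z_n <= z):.4f}   Phi(z) = {norm.cdf(z):.4f}")
```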
Example 1. (Normal Approximation to the Binomial) Let X be the
number of times that a fair coin, flipped 40 times, lands heads. Find the
probability that X = 20. Use the normal approximation and then compare
it to the exact solution.

Example 2. Let Xi, i = 1, 2, . . . , 10 be independent r.vs, each being
uniformly distributed over (0, 1). Estimate P(Σ_{i=1}^{10} Xi > 7).

Example 3. The lifetime of a special type of battery is a r.v with mean
40 hours and standard deviation 20 hours. A battery is used until it fails,
at which point it is replaced by a new one. Assuming a stockpile of 25
such batteries, the lifetimes of which are independent, approximate the
probability that over 1100 hours of use can be obtained.




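For Example 1, a short calculation (my addition, using scipy; the same recipe with the appropriate mean and variance handles Examples 2 and 3) compares the exact binomial probability with the Normal approximation, using the continuity correction P(X = 20) ≈ P(19.5 < Y < 20.5) for Y ∼ N(20, 10).

```python
from scipy.stats import binom, norm

n, p = 40, 0.5
exact = binom.pmf(20, n, p)                       # exact P(X = 20)

mu = n * p                                        # 20
sigma = (n * p * (1 - p)) ** 0.5                  # sqrt(10)
# Continuity correction: treat {X = 20} as the interval (19.5, 20.5).
approx = norm.cdf(20.5, mu, sigma) - norm.cdf(19.5, mu, sigma)

print(f"exact  {exact:.4f}")    # both come out near 0.125
print(f"approx {approx:.4f}")
```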
Stochastic Processes


• A stochastic process {X(t), t ∈ T } is a collection of r.vs. For each t ∈ T ,
  X(t) is a r.v.

• We interpret t as time and X(t) as the state of the process at time t.

• T is called the index set of the process; discrete-time process: T is a
  countable set; continuous-time process: T is an interval of the real line

• The state space of a stochastic process is defined as the set of all possible
  values that the r.v X(t) can assume.

• A stochastic process is a family of r.vs that describes the evolution through
  time of some (physical) process.

Example 4. Consider a particle that moves along a set of m + 1 nodes,
labeled 0, 1, . . . , m, that are arranged around a circle. At each step the
particle is equally likely to move one position in either the clockwise or
counterclockwise direction. That is, if Xn is the position of the particle after
its nth step, then

           P(Xn+1 = i + 1 | Xn = i) = P(Xn+1 = i − 1 | Xn = i) = 1/2


   where i + 1 ≡ 0 when i = m and i − 1 ≡ m when i = 0. Suppose now
that the particle starts at 0 and continues to move around according to the
preceding rules until all the nodes 1, 2, . . . , m have been visited. What is
the probability that node i, i = 1, 2, . . . , m, is the last one visited?


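A Monte Carlo sketch for Example 4 (an addition; it only estimates the answer, it proves nothing): simulate the circular walk from node 0 until every node has been visited and record which node was visited last. The estimates let one check the guess that each of the nodes 1, . . . , m is equally likely to be last.

```python
import numpy as np
from collections import Counter

def last_visited(m, rng):
    """Walk on nodes 0..m arranged in a circle, starting at node 0,
    until all nodes are visited; return the node visited last."""
    visited = {0}
    pos, last = 0, 0
    while len(visited) < m + 1:
        pos = (pos + rng.choice((-1, 1))) % (m + 1)   # clockwise or counterclockwise
        if pos not in visited:
            visited.add(pos)
            last = pos
    return last

rng = np.random.default_rng(4)
m, reps = 5, 20_000
counts = Counter(last_visited(m, rng) for _ in range(reps))
for node in range(1, m + 1):
    print(f"node {node}: estimated P(last visited) = {counts[node] / reps:.3f}")
```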
