Concentration Inequality in ML
Subject- Machine Learning
Dr. Varun Kumar
Subject- Machine Learning Dr. Varun Kumar 1 / 12
Outlines
1 Meaning of Concentration in Probability Context
2 Markov Inequality
3 Chebyshev Inequality
4 Moment Generating Function (MGF)
5 Chernoff's Inequality
6 References
Introduction to concentration inequality
Key features
⇒ Concentration inequalities are widely employed in non-asymptotic analyses in mathematical statistics, across a wide range of settings.
⇒ They are a method for bounding a random quantity, i.e. turning a distribution-dependent question into a distribution-free bound.
⇒ Sums and averages of random variables with other distributions, such as exponential, Gamma, and Weibull, concentrate and can be approximated as Gaussian.
⇒ They work where the probability mass is most concentrated around the mean.
The density functions referred to above:

$$f_X(x) = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}}_{\text{Gaussian}}, \qquad f_X(x) = \underbrace{\frac{1}{\beta}\, e^{-x/\beta}}_{\text{Exponential}},$$

$$f_X(x) = \underbrace{\frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}}_{\text{Gamma}}, \qquad f_X(x) = \underbrace{\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}}_{\text{Weibull}}$$
Usage of inequality in machine learning
⇒ Decision-making plays an important role in machine learning (especially in solving classification problems).
⇒ Inequality relations help in deciding whether an outcome is favorable or non-favorable.
⇒ Applying Chebyshev's inequality requires only the variance of the data sequence; it is independent of the type of distribution.
⇒ Applying Markov's inequality requires only the mean value to bound a probability; it is likewise independent of the density function.
Mathematical description for a given random variable
Mathematical description
General definitions for a continuous random variable:

$$\text{Mean} = E(X) = \mu = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \quad (1)$$

$$\text{Variance} = \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2\, f_X(x)\, dx \quad (2)$$
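As a quick sanity check, the integrals in Eqs. (1) and (2) can be approximated by sample averages. The sketch below uses numpy and an exponential distribution with an illustrative scale β = 2 (for which E(X) = β and σ² = β²); the distribution, scale, and sample size are assumptions for the demo, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential density f_X(x) = (1/beta) * exp(-x/beta), illustrative beta = 2
beta = 2.0
x = rng.exponential(scale=beta, size=1_000_000)

# Sample estimates of Eq. (1) and Eq. (2)
mean = x.mean()   # should be close to E(X) = beta = 2
var = x.var()     # should be close to sigma^2 = beta^2 = 4
print(mean, var)
```

With a million samples the empirical mean and variance land close to the closed-form values.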
Markov Inequality
Statement: Let X be a non-negative random variable, i.e. X ≥ 0, with probability density function f_X(x), and let a > 0 be an arbitrary constant. Then

$$P(X \geq a) \leq \frac{E(X)}{a} \quad (3)$$

Proof: By the definition of expectation,

$$E(X) = \int_0^{\infty} x\, f_X(x)\, dx \geq \int_a^{\infty} x\, f_X(x)\, dx \quad (4)$$

Since x ≥ a over the region of integration,

$$E(X) = \int_0^{\infty} x\, f_X(x)\, dx \geq a \int_a^{\infty} f_X(x)\, dx = a\, P(X \geq a) \quad (5)$$

or

$$P(X \geq a) \leq \frac{E(X)}{a}$$
Example–
Q The number of customers visiting a shop in a day is a RV with mean 40. Bound the probability that the number of customers exceeds 60.
Ans Let X be the RV; we want P(X ≥ 60). From Markov's inequality,

$$P(X \geq 60) \leq \frac{E(X)}{60} = \frac{40}{60}$$

Maximum probability = 2/3.
Question framing in a training and testing data set:

Day               D1  D2  D3  D4  D5  ...  ...  Dn
No. of customers  34  25  38  66  64  ...  ...  43
Table: Training data set

Given mean E(X) = µ = 40, for an unlabeled input the bound on the number of customers is
P(X ≥ 60) ≤ µ/60 = 2/3
Chebyshev Inequality
Statement: Let X be a random variable with mean µ, variance σ², and probability density function f_X(x). Let ε > 0 be an arbitrary constant. Then

$$P(|X - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2} \quad (6)$$

Proof:

$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f_X(x)\, dx \geq \int_{|x-\mu| \geq \epsilon} (x-\mu)^2 f_X(x)\, dx \quad (7)$$

Since (x − µ)² ≥ ε² over the region of integration,

$$\sigma^2 \geq \int_{|x-\mu| \geq \epsilon} (x-\mu)^2 f_X(x)\, dx \geq \int_{|x-\mu| \geq \epsilon} \epsilon^2 f_X(x)\, dx = \epsilon^2\, P(|X-\mu| \geq \epsilon) \quad (8)$$

Hence

$$P(|X-\mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}$$
Example–
The complementary form of Chebyshev's inequality is

$$P(|X - \mu| \leq \epsilon) \geq 1 - \frac{\sigma^2}{\epsilon^2}$$

Q The number of cars X a manufacturer produces in a week is a RV with variance 100 and mean 40. What is the maximum probability that production deviates from the mean by 20 or more units (e.g. reaches 60), and the minimum probability that it stays within 15 units of the mean (between 25 and 55)?
Ans According to the question, µ = 40 and σ² = 100.
P(|X − 40| ≥ 20) = ?
P(25 ≤ X ≤ 55) = P(|X − 40| ≤ 15) = ?
From Chebyshev's inequality,
P(|X − 40| ≥ 20) ≤ σ²/ε² = 100/20² = 0.25
Similarly,
P(|X − 40| ≤ 15) ≥ 1 − σ²/ε² = 1 − 100/15² ≈ 0.56
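The car-production numbers above can be checked in simulation. This sketch assumes a normal distribution with µ = 40 and σ = 10 purely for illustration; Chebyshev's inequality itself makes no distributional assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 40.0, 10.0  # mean 40, variance 100, as in the example
x = rng.normal(mu, sigma, size=1_000_000)  # assumed Gaussian for the demo

eps = 20.0
empirical = (np.abs(x - mu) >= eps).mean()   # true deviation probability
bound = sigma**2 / eps**2                    # Chebyshev bound = 0.25

print(empirical, bound)
```

For the Gaussian the true probability of a 2σ deviation is about 0.046, comfortably below the distribution-free bound of 0.25.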
Moment generating function (MGF)
Let X be a RV; its MGF is defined as

$$M_X(t) = E(e^{tX}) = E\left[1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots \right]$$

where t is a constant. Differentiating n times with respect to t and evaluating at t = 0 yields the nth moment:

$$\frac{d^n M_X(t)}{dt^n}\Big|_{t=0} = E[X^n]$$

Chernoff's inequality
Let X be a RV; then e^{tX} is also a RV for any constant t > 0. Applying Markov's inequality,

$$P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq \frac{E(e^{tX})}{e^{ta}} \quad (9)$$
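Eq. (9) holds for every t > 0, so the tightest (Chernoff) bound comes from minimizing over t. The sketch below does this numerically with an empirical MGF for an assumed standard normal RV and threshold a = 2; the grid of t values is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=1_000_000)  # assumed standard normal
a = 2.0

# Each t > 0 gives a valid bound E(e^{tX}) / e^{ta} via Eq. (9);
# the Chernoff bound is the minimum over t (here, over a grid).
ts = np.linspace(0.1, 3.0, 30)
bounds = [np.exp(t * x).mean() / np.exp(t * a) for t in ts]
chernoff = min(bounds)

empirical = (x >= a).mean()
print(empirical, chernoff)
```

For the standard normal the optimal t is a = 2, giving a bound near e^{-2} ≈ 0.135, against a true tail probability of about 0.023.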
Jensen's inequality
For a real convex function ϕ, numbers x1, x2, . . . , xn in its domain, and positive weights ai, Jensen's inequality can be stated as:

$$\phi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \leq \frac{\sum a_i\, \phi(x_i)}{\sum a_i} \quad (10)$$

and the inequality is reversed if ϕ is concave:

$$\phi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \geq \frac{\sum a_i\, \phi(x_i)}{\sum a_i} \quad (11)$$

Equality holds if and only if x1 = x2 = · · · = xn or ϕ is linear on a domain containing x1, x2, · · · , xn.
Ex- Let ϕ(x) = log x, a concave function; then from (11) with equal weights,

$$\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \geq \frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n} = \log\,(x_1 x_2 \cdots x_n)^{1/n} \quad (12)$$

so

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \geq (x_1 x_2 \cdots x_n)^{1/n}$$
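The AM–GM consequence of Eq. (12) is easy to verify numerically. The sketch below uses an arbitrary set of positive numbers as an illustrative input.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0])  # any positive numbers work

arithmetic_mean = x.mean()
# Geometric mean via the concave log, exactly as in Eq. (12):
geometric_mean = np.exp(np.log(x).mean())   # = (x1 * ... * xn)^(1/n)

# Jensen's inequality with phi = log implies AM >= GM
print(arithmetic_mean, geometric_mean)
```

Here the arithmetic mean is 7.5 while the geometric mean is about 4.9, consistent with the inequality; equality would require all entries to coincide.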
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, Machine Learning, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.
Subject- Machine Learning Dr. Varun Kumar 12 / 12