Concentration Inequality in ML
Subject- Machine Learning
Dr. Varun Kumar
Subject- Machine Learning Dr. Varun Kumar 1 / 12
Outlines
1 Meaning of Concentration in Probability Context
2 Markov Inequality
3 Chebyshev Inequality
4 Moment Generating Function (MGF)
5 Chernoff's Inequality
6 References
Introduction to concentration inequality
Key features
⇒ Concentration inequalities are widely employed in non-asymptotic analyses in mathematical statistics, across a wide range of settings.
⇒ They are a method for bounding a random quantity, i.e. turning a distribution-dependent question into a distribution-free bound.
⇒ Sums and averages of random variables with other distributions, such as exponential, Gamma, and Weibull, concentrate and can be approximated as Gaussian.
⇒ They work where the probability mass is most concentrated around the mean.
The density functions referred to above:

$$f_X(x) = \underbrace{\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}}_{\text{Gaussian}}, \qquad f_X(x) = \underbrace{\frac{1}{\beta}\, e^{-x/\beta}}_{\text{Exponential}},$$

$$f_X(x) = \underbrace{\frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}}_{\text{Gamma}}, \qquad f_X(x) = \underbrace{\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}}_{\text{Weibull}}$$
Usage of inequality in machine learning
⇒ Decision-making plays an important role in machine learning (especially in solving classification problems).
⇒ Inequality relations help in deciding whether an outcome is favorable or non-favorable.
⇒ Applying Chebyshev's inequality requires only the variance of the data sequence; it is independent of the type of distribution.
⇒ Applying Markov's inequality requires only the mean value to bound a probability; it is likewise independent of the density function.
Mathematical description for a given random variable
Mathematical description
General definitions for a continuous random variable:

$$\text{Mean} = E(X) = \mu = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \quad (1)$$

$$\text{Variance} = \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2\, f_X(x)\, dx \quad (2)$$
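As a quick sanity check, the integrals in Eqs. (1) and (2) can be approximated by sample averages. The sketch below uses numpy and an exponential distribution with an illustrative scale β = 2 (for which E(X) = β and σ² = β²); the distribution, scale, and sample size are assumptions for the demo, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential density f_X(x) = (1/beta) * exp(-x/beta), illustrative beta = 2
beta = 2.0
x = rng.exponential(scale=beta, size=1_000_000)

# Sample estimates of Eq. (1) and Eq. (2)
mean = x.mean()   # should be close to E(X) = beta = 2
var = x.var()     # should be close to sigma^2 = beta^2 = 4
print(mean, var)
```

With a million samples the empirical mean and variance land close to the closed-form values.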
Markov Inequality
Statement: Let X be a non-negative random variable, i.e. X ≥ 0, with probability density function f_X(x), and let a > 0 be an arbitrary constant. Then

$$P(X \geq a) \leq \frac{E(X)}{a} \quad (3)$$

Proof: By the definition of expectation,

$$E(X) = \int_0^{\infty} x\, f_X(x)\, dx \geq \int_a^{\infty} x\, f_X(x)\, dx \quad (4)$$

Since x ≥ a over the region of integration,

$$E(X) = \int_0^{\infty} x\, f_X(x)\, dx \geq a \int_a^{\infty} f_X(x)\, dx = a\, P(X \geq a) \quad (5)$$

or

$$P(X \geq a) \leq \frac{E(X)}{a}$$
Example–
Q The number of customers visiting a shop in a day is a RV with mean 40. Bound the probability that the number of customers exceeds 60.
Ans Let X be the RV; we want P(X ≥ 60). From Markov's inequality,

$$P(X \geq 60) \leq \frac{E(X)}{60} = \frac{40}{60}$$

Maximum probability = 2/3.
Question framing in a training and testing data set:

Day               D1  D2  D3  D4  D5  ...  ...  Dn
No. of customers  34  25  38  66  64  ...  ...  43
Table: Training data set

Given mean E(X) = µ = 40, for an unlabeled input the bound on the number of customers is
P(X ≥ 60) ≤ µ/60 = 2/3
Chebyshev Inequality
Statement: Let X be a random variable with mean µ, variance σ², and probability density function f_X(x). Let ε > 0 be an arbitrary constant. Then

$$P(|X - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2} \quad (6)$$

Proof:

$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f_X(x)\, dx \geq \int_{|x-\mu| \geq \epsilon} (x-\mu)^2 f_X(x)\, dx \quad (7)$$

Since (x − µ)² ≥ ε² over the region of integration,

$$\sigma^2 \geq \int_{|x-\mu| \geq \epsilon} (x-\mu)^2 f_X(x)\, dx \geq \int_{|x-\mu| \geq \epsilon} \epsilon^2 f_X(x)\, dx = \epsilon^2\, P(|X-\mu| \geq \epsilon) \quad (8)$$

Hence

$$P(|X-\mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}$$
Example–
The complementary form of Chebyshev's inequality is

$$P(|X - \mu| \leq \epsilon) \geq 1 - \frac{\sigma^2}{\epsilon^2}$$

Q The number of cars X a manufacturer produces in a week is a RV with variance 100 and mean 40. What is the maximum probability that production deviates from the mean by 20 or more units (e.g. reaches 60), and the minimum probability that it stays within 15 units of the mean (between 25 and 55)?
Ans According to the question, µ = 40 and σ² = 100.
P(|X − 40| ≥ 20) = ?
P(25 ≤ X ≤ 55) = P(|X − 40| ≤ 15) = ?
From Chebyshev's inequality,
P(|X − 40| ≥ 20) ≤ σ²/ε² = 100/20² = 0.25
Similarly,
P(|X − 40| ≤ 15) ≥ 1 − σ²/ε² = 1 − 100/15² ≈ 0.56
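The car-production numbers above can be checked in simulation. This sketch assumes a normal distribution with µ = 40 and σ = 10 purely for illustration; Chebyshev's inequality itself makes no distributional assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 40.0, 10.0  # mean 40, variance 100, as in the example
x = rng.normal(mu, sigma, size=1_000_000)  # assumed Gaussian for the demo

eps = 20.0
empirical = (np.abs(x - mu) >= eps).mean()   # true deviation probability
bound = sigma**2 / eps**2                    # Chebyshev bound = 0.25

print(empirical, bound)
```

For the Gaussian the true probability of a 2σ deviation is about 0.046, comfortably below the distribution-free bound of 0.25.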
Moment generating function (MGF)
Let X be a RV; its MGF is defined as

$$M_X(t) = E(e^{tX}) = E\left[1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + \cdots \right]$$

where t is a constant. Differentiating n times with respect to t and evaluating at t = 0 yields the nth moment:

$$\frac{d^n M_X(t)}{dt^n}\Big|_{t=0} = E[X^n]$$

Chernoff's inequality
Let X be a RV; then e^{tX} is also a RV for any constant t > 0. Applying Markov's inequality,

$$P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq \frac{E(e^{tX})}{e^{ta}} \quad (9)$$
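Eq. (9) holds for every t > 0, so the tightest (Chernoff) bound comes from minimizing over t. The sketch below does this numerically with an empirical MGF for an assumed standard normal RV and threshold a = 2; the grid of t values is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=1_000_000)  # assumed standard normal
a = 2.0

# Each t > 0 gives a valid bound E(e^{tX}) / e^{ta} via Eq. (9);
# the Chernoff bound is the minimum over t (here, over a grid).
ts = np.linspace(0.1, 3.0, 30)
bounds = [np.exp(t * x).mean() / np.exp(t * a) for t in ts]
chernoff = min(bounds)

empirical = (x >= a).mean()
print(empirical, chernoff)
```

For the standard normal the optimal t is a = 2, giving a bound near e^{-2} ≈ 0.135, against a true tail probability of about 0.023.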
Jensen's inequality
For a real convex function ϕ, numbers x1, x2, . . . , xn in its domain, and positive weights ai, Jensen's inequality can be stated as:

$$\phi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \leq \frac{\sum a_i\, \phi(x_i)}{\sum a_i} \quad (10)$$

and the inequality is reversed if ϕ is concave:

$$\phi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \geq \frac{\sum a_i\, \phi(x_i)}{\sum a_i} \quad (11)$$

Equality holds if and only if x1 = x2 = · · · = xn or ϕ is linear on a domain containing x1, x2, · · · , xn.
Ex- Let ϕ(x) = log x, a concave function; then from (11) with equal weights,

$$\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \geq \frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n} = \log\,(x_1 x_2 \cdots x_n)^{1/n} \quad (12)$$

so

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \geq (x_1 x_2 \cdots x_n)^{1/n}$$
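The AM–GM consequence of Eq. (12) is easy to verify numerically. The sketch below uses an arbitrary set of positive numbers as an illustrative input.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0])  # any positive numbers work

arithmetic_mean = x.mean()
# Geometric mean via the concave log, exactly as in Eq. (12):
geometric_mean = np.exp(np.log(x).mean())   # = (x1 * ... * xn)^(1/n)

# Jensen's inequality with phi = log implies AM >= GM
print(arithmetic_mean, geometric_mean)
```

Here the arithmetic mean is 7.5 while the geometric mean is about 4.9, consistent with the inequality; equality would require all entries to coincide.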
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, Machine Learning, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.
Subject- Machine Learning Dr. Varun Kumar 12 / 12