Bias Variance Trade-off in Machine Learning
Subject: Machine Learning
Dr. Varun Kumar
Outline
1 Introduction to Underfitting and Overfitting
2 Introduction to Bias and Variance
3 Mathematical Intuition
4 Bias Variance Trade-off relation
5 References
Introduction to underfitting, overfitting, and best fit
Regression problem: underfitting, overfitting, best fitting
1 Underfitting: A model or an ML algorithm is said to underfit
when it cannot capture the underlying trend of the data.
2 Overfitting: A model or an ML algorithm is said to overfit when it fits
the training data so closely that it captures the noise along with the
underlying trend.
3 Best fit: A model or an ML algorithm is said to have the best fit when it
captures the underlying trend of the data without modeling the noise.
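As a quick illustration (a sketch added here, not part of the original slides), the following numpy snippet fits polynomials of degree 1, 3, and 15 to noisy samples of a sine trend: degree 1 underfits, degree 15 overfits, and degree 3 is close to the best fit. The trend, noise level, and degrees are assumptions chosen for the demo.

```python
# Contrast underfitting, overfitting, and a good fit by varying model
# complexity. Uses only numpy; the degrees are chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)  # true trend + noise

x_tr, y_tr = x[::2], y[::2]    # training half
x_te, y_te = x[1::2], y[1::2]  # testing half

for degree in (1, 3, 15):      # underfit, reasonable fit, overfit
    coeffs = np.polyfit(x_tr, y_tr, degree)  # least-squares polynomial fit
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
```

The degree-1 fit has high error on both halves, while the degree-15 fit drives the training error down but inflates the test error.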
Bias and variance
1 Bias: It reflects the systematic error that shows up on the training data,
i.e., how far the model's average prediction lies from the underlying trend.
(a) Based on the training data, a suitable model may be created for the
regression or classification problem.
(b) Regression: linear, logistic, polynomial
(c) Classification: decision tree, random forest, naive Bayes, KNN, SVM
2 Variance: It reflects the fluctuation of the model's predictions, which
shows up as error on the testing data.
(i) Testing data validates the accuracy of a model that has been built
with the help of the training data set.
(ii) Testing data consists of samples that were not seen during training.
Underfitting, overfitting, and best fit in terms of bias and variance
Note:
⇒ The objective of an ML algorithm is to fit not only the training data but
also the testing data.
⇒ In other words, low bias and low variance is the appropriate solution.

           Underfitting   Overfitting   Best fit
Bias       High           Very low      Low
Variance   High           High          Low
Underfitting, overfitting, and best fit in a classification problem
A classifier (decision tree, random forest, naive Bayes, KNN, SVM) works
on two types of data:
Training data
Testing data
Example:
A classifier falls into the category of underfitting, overfitting, or best fit
based on its training and testing accuracy, for example:

              Underfitting   Overfitting   Best fit
Train error   25%            1%            8%
Test error    27%            23%           9%
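A hedged example of this diagnosis in code (my addition; the data set and model settings are illustrative, not from the slides): an unconstrained decision tree typically lands in the overfitting column, while a depth-limited one moves toward the best-fit column.

```python
# Measure train vs. test error of a decision tree to see which regime it
# falls into. A fully grown tree memorizes the training set (overfitting);
# limiting its depth regularizes it.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (None, 3):  # None = grow the tree fully
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    err_tr = 1 - clf.score(X_tr, y_tr)
    err_te = 1 - clf.score(X_te, y_te)
    print(f"max_depth={depth}: train error {err_tr:.2%}, test error {err_te:.2%}")
```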
Mathematical intuition

\mathrm{bias}\big(\hat{f}(x)\big) = E[\hat{f}(x)] - f(x)    (1)

\mathrm{variance}\big(\hat{f}(x)\big) = E\Big[\big(\hat{f}(x) - E[\hat{f}(x)]\big)^2\Big]    (2)

⇒ \hat{f}(x) → output observed through the trained model
⇒ For a linear model, \hat{f}(x) = w_1 x + w_0
⇒ For a complex model, \hat{f}(x) = \sum_{i=1}^{p} w_i x^i + w_0
⇒ We have no idea regarding the true f(x).
⇒ Simple model: high bias & low variance
⇒ Complex model: low bias & high variance

E[(y - \hat{f}(x))^2] = \mathrm{bias}^2 + \mathrm{variance} + \sigma^2 \ (\text{irreducible error})    (3)
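Equation (3) can be checked numerically. The sketch below (an addition, assuming a sine as the true f and a degree-1 fit as the simple model) repeatedly refits the model on fresh noisy data sets and compares bias² + variance + σ² with the measured expected squared error at a single point x₀.

```python
# Monte Carlo check of Eq. (3): E[(y - f_hat(x0))^2] = bias^2 + variance + sigma^2.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)   # true function (assumed for the demo)
sigma, n, trials, x0 = 0.2, 30, 5000, 0.25

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    w1, w0 = np.polyfit(x, y, 1)      # simple (high-bias) linear model
    preds[t] = w1 * x0 + w0           # prediction at the probe point x0

bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
variance = preds.var()                # variance of the prediction at x0
# Expected squared error against a fresh noisy observation y0 at x0:
mse = np.mean((f(x0) + rng.normal(0, sigma, trials) - preds) ** 2)
print(f"bias^2 + variance + sigma^2 = {bias2 + variance + sigma**2:.4f}")
print(f"empirical E[(y - f_hat)^2]  = {mse:.4f}")
```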
Continued–
⇒ Let there be n + m sample data in a given data set, in which n and m
samples are taken for the training and testing (validation) purposes,
respectively. Then

\mathrm{train}_{err} = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{f}(x_i)\big)^2
\qquad
\mathrm{test}_{err} = \frac{1}{m} \sum_{i=n+1}^{n+m} \big(y_i - \hat{f}(x_i)\big)^2

⇒ If model complexity is increased, the training error becomes too optimistic
and gives a wrong picture of how close \hat{f} is to f.
⇒ Let D = \{(x_i, y_i)\}_{i=1}^{n+m} be the given data set, where
y_i = f(x_i) + \varepsilon_i, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)
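A direct transcription of these two formulas into code (an added sketch; the cubic fit and sine trend are assumptions): the first n samples train \hat{f} and the remaining m validate it.

```python
# train_err and test_err exactly as defined above: the model is fitted on the
# first n samples only, then both averages are computed.
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma = 60, 40, 0.3
x = rng.uniform(0, 1, n + m)
f = lambda x: np.sin(2 * np.pi * x)      # stand-in for the unknown f
y = f(x) + rng.normal(0, sigma, n + m)   # y_i = f(x_i) + eps_i

coeffs = np.polyfit(x[:n], y[:n], 3)     # f_hat fitted on training data only
f_hat = lambda x: np.polyval(coeffs, x)

train_err = np.mean((y[:n] - f_hat(x[:n])) ** 2)  # (1/n) * sum over i=1..n
test_err = np.mean((y[n:] - f_hat(x[n:])) ** 2)   # (1/m) * sum over i=n+1..n+m
print(f"train_err = {train_err:.4f}, test_err = {test_err:.4f}")
```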
Continued–
⇒ We use \hat{f} to approximate f, where the training set T ⊂ D is such that
\hat{y}_i = \hat{f}(x_i)
⇒ We are interested to know
E\Big[\big(\hat{f}(x_i) - f(x_i)\big)^2\Big]
⇒ We cannot estimate the above expression directly, because we don't know
f(x_i). However, since y_i = f(x_i) + \varepsilon_i,

E[(\hat{y}_i - y_i)^2] = E[(\hat{f}(x_i) - f(x_i) - \varepsilon_i)^2]
= E[(\hat{f}(x_i) - f(x_i))^2 - 2\varepsilon_i(\hat{f}(x_i) - f(x_i)) + \varepsilon_i^2]
= E[(\hat{f}(x_i) - f(x_i))^2] - 2E[\varepsilon_i(\hat{f}(x_i) - f(x_i))] + E[\varepsilon_i^2]

Rearranging,

E[(\hat{f}(x_i) - f(x_i))^2] = E[(\hat{y}_i - y_i)^2] + 2E[\varepsilon_i(\hat{f}(x_i) - f(x_i))] - E[\varepsilon_i^2]    (4)
Continued–
⇒ Empirical estimate: let Z = \{z_i\}_{i=1}^{n}; then
E(Z) = \frac{1}{n} \sum_{i=1}^{n} z_i
⇒ We can empirically evaluate the R.H.S. of (4) using training or test
observations.
Case 1: Using test observations:

\underbrace{E\big[(\hat{f}(x_i) - f(x_i))^2\big]}_{\text{true error}}
= E[(\hat{y}_i - y_i)^2] + 2E[\varepsilon_i(\hat{f}(x_i) - f(x_i))] - E[\varepsilon_i^2]
= \underbrace{\frac{1}{m} \sum_{i=n+1}^{n+m} (\hat{y}_i - y_i)^2}_{\text{empirical error}}
- \underbrace{\frac{1}{m} \sum_{i=n+1}^{n+m} \varepsilon_i^2}_{\text{small constant}}
+ \underbrace{2E\big[\varepsilon_i(\hat{f}(x_i) - f(x_i))\big]}_{\text{covariance}}

∵ Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)]; let X = \varepsilon_i and
Y = \hat{f}(x_i) - f(x_i).
Continued–
⇒ E[(X - \mu_x)(Y - \mu_y)] = E[X(Y - \mu_y)] = E[XY] - \mu_y E[X] = E[XY],
since \mu_x = E[\varepsilon_i] = 0.
⇒ None of the test data participated in estimating \hat{f}(x_i).
⇒ \hat{f}(x_i) is estimated only using the training data.
⇒ ∴ \varepsilon_i ⊥ (\hat{f}(x_i) - f(x_i)), and therefore
E[XY] = E[X] E[Y] = 0
⇒ True error = empirical error − small constant ← test data
Case 2: For training observations:
E[XY] ≠ 0
Using Stein's lemma,
\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \big(\hat{f}(x_i) - f(x_i)\big) = \frac{\sigma^2}{n} \sum_{i=1}^{n} \frac{\partial \hat{f}(x_i)}{\partial y_i}
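For a concrete (hedged) instance of Case 2, consider a linear smoother \hat{y} = Hy fitted by least squares, for which \sum_i \partial\hat{f}(x_i)/\partial y_i = \mathrm{trace}(H) equals the number of parameters. The sketch below (my addition, with an assumed sine trend) compares a Monte Carlo estimate of the covariance term with \sigma^2 \mathrm{trace}(H)/n, as Stein's lemma predicts.

```python
# For least squares, y_hat = H y with H = X (X^T X)^{-1} X^T, so
# sum_i d y_hat_i / d y_i = trace(H) = number of fitted parameters.
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 50, 3, 0.4
x = rng.uniform(0, 1, n)
X = np.vander(x, p + 1)               # degree-p polynomial features
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix of least squares
print(f"trace(H) = {np.trace(H):.2f} (expected {p + 1})")

# Monte Carlo estimate of (1/n) * sum_i E[eps_i * (f_hat(x_i) - f(x_i))]
f_x = np.sin(2 * np.pi * x)
trials, cov_sum = 20_000, 0.0
for _ in range(trials):
    eps = rng.normal(0, sigma, n)
    resid_fit = H @ (f_x + eps) - f_x  # f_hat(x_i) - f(x_i) on training points
    cov_sum += np.mean(eps * resid_fit)
print(f"MC covariance term = {cov_sum / trials:.5f}")
print(f"sigma^2*tr(H)/n    = {sigma**2 * np.trace(H) / n:.5f}")
```

This nonzero covariance is exactly the optimism of the training error: the more flexible the smoother, the larger trace(H) and the more the training error flatters the model.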
Bias and variance trade-off relation
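The original slide conveys this relation with a figure. As a stand-in sketch (my addition, with assumed data), sweeping the polynomial degree shows the training error falling monotonically while the test error traces the characteristic U shape; its minimum marks the bias-variance sweet spot.

```python
# Sweep model complexity: train error decreases monotonically, test error
# is U-shaped (high bias on the left, high variance on the right).
import numpy as np

rng = np.random.default_rng(5)
f = lambda x: np.sin(2 * np.pi * x)
x_tr, x_te = rng.uniform(0, 1, 40), rng.uniform(0, 1, 400)
y_tr = f(x_tr) + rng.normal(0, 0.3, 40)
y_te = f(x_te) + rng.normal(0, 0.3, 400)

for degree in range(1, 13):
    c = np.polyfit(x_tr, y_tr, degree)
    tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(c, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train {tr:.3f}  test {te:.3f}")
```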
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, 2006, vol. 9.