Bias-variance decomposition 
in Random Forests 
Gilles Louppe @glouppe 
Paris ML S02E04, December 8, 2014 
1 / 14
Motivation 
In supervised learning, combining the predictions of several 
randomized models often achieves better results than a single 
non-randomized model. 
Why?
2 / 14
Supervised learning 
• The inputs are random variables $X = X_1, \dots, X_p$;
• The output is a random variable $Y$.
• Data comes as a finite learning set
  $\mathcal{L} = \{(x_i, y_i) \mid i = 0, \dots, N - 1\}$,
  where $x_i \in \mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_p$ and $y_i \in \mathcal{Y}$ are randomly drawn from $P_{X,Y}$.
• The goal is to find a model $\varphi_{\mathcal{L}} : \mathcal{X} \mapsto \mathcal{Y}$ minimizing
  $\mathrm{Err}(\varphi_{\mathcal{L}}) = \mathbb{E}_{X,Y}\{L(Y, \varphi_{\mathcal{L}}(X))\}$.
3 / 14
Performance evaluation 
Classification
• Symbolic output (e.g., $\mathcal{Y} = \{\text{yes}, \text{no}\}$)
• Zero-one loss
  $L(Y, \varphi_{\mathcal{L}}(X)) = 1(Y \neq \varphi_{\mathcal{L}}(X))$
Regression
• Numerical output (e.g., $\mathcal{Y} = \mathbb{R}$)
• Squared error loss
  $L(Y, \varphi_{\mathcal{L}}(X)) = (Y - \varphi_{\mathcal{L}}(X))^2$
4 / 14
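As a minimal illustration (not part of the original slides), the two losses can be evaluated directly; the labels and predictions below are made up:

```python
# Hedged sketch: evaluating the zero-one loss and the squared error loss
# on arbitrary, made-up predictions.
import numpy as np

# Classification: L(Y, phi(X)) = 1(Y != phi(X)); its average is the error rate.
y_true = np.array(["yes", "no", "no", "yes"])
y_pred = np.array(["yes", "yes", "no", "no"])
print(np.mean(y_true != y_pred))              # 0.5

# Regression: L(Y, phi(X)) = (Y - phi(X))^2; its average is the MSE.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.5])
print(np.mean((y_true - y_pred) ** 2))        # ~0.1
```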
Decision trees 
[Figure: a decision tree over two inputs. The root $t_1$ tests $X_1 \leq 0.7$ and node $t_2$ tests $X_2 \leq 0.5$; the leaves $t_3$, $t_4$, $t_5$ partition the input space and, for an input $\mathbf{x}$, the leaf reached outputs $p(Y = c \mid X = \mathbf{x})$. Legend: split node, leaf node.]
• $t \in \varphi$: nodes of the tree $\varphi$
• $X_t$: split variable at $t$
• $v_t \in \mathbb{R}$: split threshold at $t$
• $\varphi(x) = \arg\max_{c \in \mathcal{Y}} p(Y = c \mid X = x)$
5 / 14
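To make the prediction rule $\varphi(x) = \arg\max_{c} p(Y = c \mid X = x)$ concrete, here is a small sketch with scikit-learn's DecisionTreeClassifier; the dataset and the depth limit are assumptions made for illustration, not part of the slides:

```python
# Hedged sketch: a fitted tree predicts the class maximizing the class
# probability p(Y = c | X = x) estimated in the leaf reached by x.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = ((X[:, 0] <= 0.7) & (X[:, 1] <= 0.5)).astype(int)   # axis-aligned concept

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

x = np.array([[0.3, 0.2]])
proba = tree.predict_proba(x)[0]        # p(Y = c | X = x) at the leaf of x
print(proba)
print(tree.classes_[np.argmax(proba)])  # same as tree.predict(x)[0]
```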
Bias-variance decomposition in regression 
Theorem. For the squared error loss, the bias-variance 
decomposition of the expected generalization error at X = x is 
$\mathbb{E}_{\mathcal{L}}\{\mathrm{Err}(\varphi_{\mathcal{L}}(x))\} = \mathrm{noise}(x) + \mathrm{bias}^2(x) + \mathrm{var}(x)$
where
$\mathrm{noise}(x) = \mathrm{Err}(\varphi_B(x))$, with $\varphi_B$ the Bayes model,
$\mathrm{bias}^2(x) = (\varphi_B(x) - \mathbb{E}_{\mathcal{L}}\{\varphi_{\mathcal{L}}(x)\})^2$,
$\mathrm{var}(x) = \mathbb{E}_{\mathcal{L}}\{(\mathbb{E}_{\mathcal{L}}\{\varphi_{\mathcal{L}}(x)\} - \varphi_{\mathcal{L}}(x))^2\}$.
6 / 14
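The three terms can be estimated empirically at a fixed point $x$ by averaging over many independent learning sets. The sketch below is illustrative only: the data-generating process, the noise level, and the use of DecisionTreeRegressor as $\varphi_{\mathcal{L}}$ are all assumptions.

```python
# Hedged sketch: Monte Carlo estimate of noise(x), bias^2(x) and var(x)
# at a fixed point x, over 500 independently drawn learning sets L.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
f = np.sin                      # assumed true regression function, phi_B(x) = sin(x)
sigma = 0.3                     # assumed noise level, so noise(x) = sigma**2

def sample_learning_set(n=100):
    X = rng.uniform(0, 6, size=(n, 1))
    return X, f(X[:, 0]) + sigma * rng.randn(n)

x = np.array([[2.5]])           # point at which the error is decomposed
preds = []
for _ in range(500):
    X, y = sample_learning_set()
    preds.append(DecisionTreeRegressor().fit(X, y).predict(x)[0])
preds = np.array(preds)

noise = sigma ** 2                           # Err(phi_B(x))
bias2 = (f(x[0, 0]) - preds.mean()) ** 2     # (phi_B(x) - E_L{phi_L(x)})^2
var = preds.var()                            # E_L{(E_L{phi_L(x)} - phi_L(x))^2}
print(noise, bias2, var, noise + bias2 + var)
```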
Bias-variance decomposition 
7 / 14
Diagnosing the generalization error of a decision tree 
• (Residual error: the lowest achievable error, independent of $\varphi_{\mathcal{L}}$.)
• Bias: decision trees usually have low bias.
• Variance: they often suffer from high variance.
• Solution: combine the predictions of several randomized trees into a single model (see the sketch below).
8 / 14
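The sketch below illustrates this diagnosis on synthetic data (all concrete choices, including the default bagged decision trees of BaggingRegressor as the ensemble, are assumptions): across independent learning sets, the ensemble's prediction at a fixed point typically varies far less than a single tree's.

```python
# Hedged sketch: spread of a single tree's prediction at x across learning
# sets, versus that of an ensemble of bagged trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.RandomState(0)

def sample_learning_set(n=200):
    X = rng.uniform(0, 6, size=(n, 1))
    return X, np.sin(X[:, 0]) + 0.3 * rng.randn(n)

x = np.array([[2.5]])
single, bagged = [], []
for seed in range(200):                       # 200 independent learning sets
    X, y = sample_learning_set()
    single.append(DecisionTreeRegressor(random_state=seed).fit(X, y).predict(x)[0])
    ens = BaggingRegressor(n_estimators=50,   # defaults to bagged decision trees
                           random_state=seed).fit(X, y)
    bagged.append(ens.predict(x)[0])

print(np.var(single), np.var(bagged))         # the ensemble variance is smaller
```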
Random forests 
[Figure: an input $\mathbf{x}$ is fed to $M$ randomized trees $\varphi_1, \dots, \varphi_M$; their individual estimates $p_{\varphi_m}(Y = c \mid X = \mathbf{x})$ are aggregated ($\Sigma$) into the ensemble estimate $p_{\psi}(Y = c \mid X = \mathbf{x})$.]
Randomization (both variants are sketched in code below)
• Bootstrap samples and random selection of $K \leq p$ split variables → Random Forests
• Additionally, random selection of the split threshold → Extra-Trees
9 / 14
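In scikit-learn terms, these randomization schemes map onto the RandomForest* and ExtraTrees* estimators. The sketch below uses a made-up dataset and illustrative parameter values; it is not a recommendation from the slides.

```python
# Hedged sketch: Random Forests = bootstrap samples + K <= p candidate split
# variables (max_features); Extra-Trees additionally draw split thresholds at
# random (and do not bootstrap by default).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

rng = np.random.RandomState(0)
X = rng.rand(500, 10)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            max_features="sqrt",   # K = sqrt(p) split variables
                            random_state=0).fit(X, y)
et = ExtraTreesClassifier(n_estimators=100,
                          max_features="sqrt",
                          random_state=0).fit(X, y)

x = np.array([[0.5] * 10])
print(rf.predict_proba(x), et.predict_proba(x))    # averaged p(Y = c | X = x)
```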
Bias-variance decomposition (cont.) 
Theorem. For the squared error loss, the bias-variance decomposition of the expected generalization error $\mathbb{E}_{\mathcal{L}}\{\mathrm{Err}(\psi_{\mathcal{L},\theta_1,\dots,\theta_M}(x))\}$ at $X = x$ of an ensemble of $M$ randomized models $\varphi_{\mathcal{L},\theta_m}$ is
$\mathbb{E}_{\mathcal{L}}\{\mathrm{Err}(\psi_{\mathcal{L},\theta_1,\dots,\theta_M}(x))\} = \mathrm{noise}(x) + \mathrm{bias}^2(x) + \mathrm{var}(x),$
where
$\mathrm{noise}(x) = \mathrm{Err}(\varphi_B(x))$,
$\mathrm{bias}^2(x) = (\varphi_B(x) - \mathbb{E}_{\mathcal{L},\theta}\{\varphi_{\mathcal{L},\theta}(x)\})^2$,
$\mathrm{var}(x) = \rho(x)\,\sigma^2_{\mathcal{L},\theta}(x) + \dfrac{1 - \rho(x)}{M}\,\sigma^2_{\mathcal{L},\theta}(x)$,
where $\sigma^2_{\mathcal{L},\theta}(x) = \mathbb{V}_{\mathcal{L},\theta}\{\varphi_{\mathcal{L},\theta}(x)\}$ is the variance of a single randomized model and $\rho(x)$ is the Pearson correlation coefficient between the predictions of two randomized trees built on the same learning set.
10 / 14
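A quick numeric illustration of the variance term (with made-up values for $\rho(x)$ and $\sigma^2_{\mathcal{L},\theta}(x)$): as $M$ grows, only the correlated part $\rho(x)\,\sigma^2_{\mathcal{L},\theta}(x)$ remains.

```python
# Hedged sketch: var(x) = rho * sigma2 + (1 - rho) / M * sigma2
# evaluated for illustrative values of rho and sigma2.
sigma2 = 1.0
for rho in (0.1, 0.5, 0.9):
    for M in (1, 10, 100, 1000):
        var = rho * sigma2 + (1 - rho) / M * sigma2
        print(f"rho={rho:.1f}  M={M:4d}  var(x)={var:.3f}")
# As M -> infinity, var(x) -> rho * sigma2.
```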
Interpretation of $\rho(x)$ (Louppe, 2014)
Theorem. $\rho(x) = \dfrac{\mathbb{V}_{\mathcal{L}}\{\mathbb{E}_{\theta|\mathcal{L}}\{\varphi_{\mathcal{L},\theta}(x)\}\}}{\mathbb{V}_{\mathcal{L}}\{\mathbb{E}_{\theta|\mathcal{L}}\{\varphi_{\mathcal{L},\theta}(x)\}\} + \mathbb{E}_{\mathcal{L}}\{\mathbb{V}_{\theta|\mathcal{L}}\{\varphi_{\mathcal{L},\theta}(x)\}\}}$
In other words, it is the ratio between
• the variance due to the learning set and
• the total variance, accounting for random effects due to both the learning set and the random perturbations.
Consequently:
• $\rho(x) \to 1$ when the variance is mostly due to the learning set;
• $\rho(x) \to 0$ when the variance is mostly due to the random perturbations;
• $\rho(x) \geq 0$.
11 / 14
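This ratio can be estimated by Monte Carlo: draw several learning sets $\mathcal{L}$ and, for each of them, fit several trees with different random seeds $\theta$. The sketch below does this with scikit-learn's ExtraTreeRegressor on synthetic data; every concrete choice (data, sizes, estimator) is an assumption made for illustration.

```python
# Hedged sketch: rho(x) estimated as the share of the prediction variance at x
# that is due to the learning set rather than to the randomization theta.
import numpy as np
from sklearn.tree import ExtraTreeRegressor

rng = np.random.RandomState(0)

def sample_learning_set(n=200):
    X = rng.uniform(0, 6, size=(n, 1))
    return X, np.sin(X[:, 0]) + 0.3 * rng.randn(n)

x = np.array([[2.5]])
preds = np.empty((100, 25))            # 100 learning sets L x 25 seeds theta
for i in range(100):
    X, y = sample_learning_set()
    for j in range(25):                # same L, different randomization theta
        preds[i, j] = ExtraTreeRegressor(random_state=j).fit(X, y).predict(x)[0]

var_L = preds.mean(axis=1).var()       # V_L{ E_{theta|L}{phi_{L,theta}(x)} }
var_theta = preds.var(axis=1).mean()   # E_L{ V_{theta|L}{phi_{L,theta}(x)} }
print(var_L / (var_L + var_theta))     # rho(x), between 0 and 1
```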

Diagnosing the generalization error of random forests
• Bias: identical to the bias of a single randomized tree.
• Variance: $\mathrm{var}(x) = \rho(x)\,\sigma^2_{\mathcal{L},\theta}(x) + \dfrac{1 - \rho(x)}{M}\,\sigma^2_{\mathcal{L},\theta}(x)$.
  As $M \to \infty$, $\mathrm{var}(x) \to \rho(x)\,\sigma^2_{\mathcal{L},\theta}(x)$.
  The stronger the randomization, $\rho(x) \to 0$ and $\mathrm{var}(x) \to 0$.
  The weaker the randomization, $\rho(x) \to 1$ and $\mathrm{var}(x) \to \sigma^2_{\mathcal{L},\theta}(x)$.
• Bias-variance trade-off. Randomization increases bias but makes it possible to reduce the variance of the corresponding ensemble model through averaging. The crux of the problem is to find the right trade-off.
• Tip: tune max_features in Random Forests (a minimal tuning sketch follows below).
12 / 14
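A minimal tuning sketch (hypothetical data; GridSearchCV and the grid of values are illustrative choices, not from the slides):

```python
# Hedged sketch: tuning max_features, the main knob controlling the strength
# of the randomization in a Random Forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(500, 10)
y = np.sin(3 * X[:, 0]) + X[:, 1] + 0.1 * rng.randn(500)

search = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=0),
    param_grid={"max_features": [1, 3, 5, 8, 10]},  # strong -> weak randomization
    cv=5, scoring="neg_mean_squared_error",
).fit(X, y)
print(search.best_params_)
```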
Bias-variance decomposition in classification
Theorem. For the zero-one loss and binary classification, the expected generalization error $\mathbb{E}_{\mathcal{L}}\{\mathrm{Err}(\varphi_{\mathcal{L}}(x))\}$ at $X = x$ decomposes as follows:
$\mathbb{E}_{\mathcal{L}}\{\mathrm{Err}(\varphi_{\mathcal{L}}(x))\} = P(\varphi_B(x) \neq Y) + \Phi\!\left(\dfrac{0.5 - \mathbb{E}_{\mathcal{L}}\{\hat{p}_{\mathcal{L}}(Y = \varphi_B(x))\}}{\sqrt{\mathbb{V}_{\mathcal{L}}\{\hat{p}_{\mathcal{L}}(Y = \varphi_B(x))\}}}\right)(2P(\varphi_B(x) = Y) - 1)$
where $\Phi$ denotes the standard normal cumulative distribution function.
For $\mathbb{E}_{\mathcal{L}}\{\hat{p}_{\mathcal{L}}(Y = \varphi_B(x))\} > 0.5$, $\mathbb{V}_{\mathcal{L}}\{\hat{p}_{\mathcal{L}}(Y = \varphi_B(x))\} \to 0$ makes $\Phi \to 0$ and the expected generalization error tends to the error of the Bayes model. Conversely, for $\mathbb{E}_{\mathcal{L}}\{\hat{p}_{\mathcal{L}}(Y = \varphi_B(x))\} < 0.5$, $\mathbb{V}_{\mathcal{L}}\{\hat{p}_{\mathcal{L}}(Y = \varphi_B(x))\} \to 0$ makes $\Phi \to 1$ and the error is maximal.
13 / 14
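To make the behaviour of the $\Phi$ term concrete, here is a small numeric illustration with made-up values; it simply evaluates the formula above under the reconstruction of $\Phi$ as the standard normal CDF.

```python
# Hedged sketch: the extra error term shrinks to zero as the variance of
# p_hat_L(Y = phi_B(x)) vanishes, provided its mean is above 0.5.
from math import erf, sqrt

def Phi(z):                              # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

bayes_error = 0.1                        # assumed P(phi_B(x) != Y)
mean_p = 0.7                             # assumed E_L{p_hat_L(Y = phi_B(x))} > 0.5
for var_p in (0.1, 0.01, 1e-6):          # shrinking V_L{p_hat_L(Y = phi_B(x))}
    extra = Phi((0.5 - mean_p) / sqrt(var_p)) * (2 * (1 - bayes_error) - 1)
    print(var_p, bayes_error + extra)
# The expected error tends to the Bayes error (0.1) as var_p -> 0.
```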