Interaction Lab. Kumoh National Institute of Technology
Hands-On Machine Learning
with Scikit-Learn, Keras & TensorFlow
Chapter 6. Decision Tree
Chapter 7. Ensemble and random forest
Jeong Jaeyeop
■Decision tree
■Ensemble and random forest
Agenda
Decision tree
Ensemble and random forest
■Decision tree
 Machine learning algorithms
• Classification
• Regression
 Components of a random forest
Intro
■Decision tree training
 Uses the iris dataset (a minimal training sketch follows this slide)
• Species: setosa, versicolor, virginica
• Properties: sepal length, sepal width, petal length, petal width
Decision tree training and visualization(1/2)
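A minimal training sketch along the lines of this slide, assuming scikit-learn is installed; using petal length/width and max_depth=2 follows the book's example, and the variable names are just illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]   # petal length and petal width
y = iris.target        # 0 = setosa, 1 = versicolor, 2 = virginica

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)
```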
■Decision tree visualization
 Use export_graphviz in sklearn (see the export sketch below)
Decision tree training and visualization(2/2)
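A sketch of exporting the tree trained above with export_graphviz; iris_tree.dot is an arbitrary output name, and turning the .dot file into an image needs the separate graphviz `dot` tool.

```python
from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,                              # the classifier from the previous sketch
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],  # petal length, petal width
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)
# Then, outside Python: dot -Tpng iris_tree.dot -o iris_tree.png
```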
■How does the tree make predictions?
 Start at the root node (depth = 0)
 samples counts how many training instances the node applies to
 value lists how many training instances of each class the node contains
 gini measures the node's impurity
• If gini is 0, the node is pure (all of its samples belong to one class)
 class is the class the node predicts
Prediction(1/2)
■Gini calculation method
 1 −
0
54
2
−
49
54
2
−
5
54
2
≈ 0.168
 𝐺𝑖 = 1 − 𝑘=1
𝑛
𝑃𝑖,𝑘
2
Prediction(2/2)
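A quick numeric check of the formula above, using a hypothetical helper in plain Python (not a scikit-learn API), for a node holding 0, 49, and 5 samples of the three classes.

```python
def gini(class_counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1 - sum((count / total) ** 2 for count in class_counts)

print(gini([0, 49, 5]))   # ~0.168, matching the slide
```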
■CART algorithm
 Decision tree training in sklearn
• Splits a subset using a single property $k$ and a threshold $t_k$
 Cost function
• $J(k, t_k) = \frac{m_{left}}{m} G_{left} + \frac{m_{right}}{m} G_{right}$
• where $G_{left/right}$ is the impurity of the left/right subset and $m_{left/right}$ is the number of samples in the left/right subset
 Repeats recursively until it reaches the maximum depth or can no longer find a split that reduces impurity
CART training algorithm
■Choosing the impurity measure
 The default setting is gini
 Entropy
• An alternative impurity measure, borrowed from thermodynamics and information theory
• When a system is stable and orderly, its entropy is close to 0
• If a set contains samples of only one class, its entropy is 0
• $H_i = -\sum_{k=1,\; p_{i,k} \neq 0}^{n} p_{i,k} \log_2(p_{i,k})$
• Example: $-\frac{49}{54}\log_2\!\left(\frac{49}{54}\right) - \frac{5}{54}\log_2\!\left(\frac{5}{54}\right) \approx 0.445$ (checked numerically below)
 Gini and entropy usually produce similar trees; entropy tends to give slightly more balanced ones
Gini or entropy
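A sketch that checks the entropy example numerically and shows that, in scikit-learn, choosing entropy is just the criterion hyperparameter (iris and max_depth=2 reused from the earlier sketch).

```python
import math

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Numeric check of the example above (a node with 49 and 5 samples of two classes)
probs = [49 / 54, 5 / 54]
print(-sum(p * math.log2(p) for p in probs))   # ~0.445

# criterion="gini" is the default; "entropy" switches the impurity measure
iris = load_iris()
tree_entropy = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=42)
tree_entropy.fit(iris.data[:, 2:], iris.target)
```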
■Nonparametric model
 The model structure is not fixed before training
 Limit the decision tree's degrees of freedom to avoid overfitting
 Regularization hyperparameters in sklearn (a regularized-tree sketch follows this slide)
• min_samples_split
• min_samples_leaf
• min_weight_fraction_leaf
• max_leaf_nodes
• max_features
Regularization hyperparameters
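A sketch of a regularized tree using the hyperparameters listed above; the specific values and the moons dataset are illustrative choices, not from the slides.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

reg_tree = DecisionTreeClassifier(
    min_samples_split=10,   # a node needs at least 10 samples to be split
    min_samples_leaf=4,     # every leaf keeps at least 4 samples
    max_leaf_nodes=20,      # cap on the total number of leaves
    max_features=None,      # properties considered at each split (None = all)
    random_state=42,
)
reg_tree.fit(X, y)
```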
■Decision tree regressor in sklearn
 DecisionTreeRegressor (usage sketch below)
Regression(1/2)
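A minimal regression sketch; the noisy quadratic data is made up for illustration.

```python
import numpy as np

from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(200, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(200)   # noisy quadratic target

tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X, y)
print(tree_reg.predict([[0.15]]))   # predicts the mean target value of the matching leaf
```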
■CART algorithm
 Uses MSE instead of Gini impurity to split subsets
 Cost function
• $J(k, t_k) = \frac{m_{left}}{m} \mathrm{MSE}_{left} + \frac{m_{right}}{m} \mathrm{MSE}_{right}$
• where $\mathrm{MSE}_{node} = \sum_{i \in node} \left(\hat{y}_{node} - y^{(i)}\right)^{2}$ and $\hat{y}_{node} = \frac{1}{m_{node}} \sum_{i \in node} y^{(i)}$
Regression(2/2)
■Sensitive to rotation of the training set and to small variations in the training data
Instability
Ensemble and random forest
■Ensemble
 Aggregate the predictions of several estimators to obtain better predictions than any single estimator
 Different algorithms trained on the same data, or
 The same algorithm trained on different subsets of the data
■Random forest
 Ensemble of decision trees
■Ensemble method
 Bagging
 Boosting
 Stacking
Intro
■Training several classifiers
Voting-based classifiers(1/3)
■Prediction with several classifiers
 Weak learner
• Performs only slightly better than random guessing
 Strong learner
• High performance
Voting-based classifiers(2/3)
■Voting-based classifiers in sklearn (see the VotingClassifier sketch below)
 Hard voting
• Predicts the class that receives the most votes
• voting = 'hard'
 Soft voting
• Predicts the class with the highest averaged class probability
• voting = 'soft'
Voting-based classifiers(3/3)
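A sketch of a voting ensemble in scikit-learn; the three member classifiers and the moons dataset are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probability=True is needed for soft voting
    ],
    voting="hard",   # majority vote; use voting="soft" to average class probabilities
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```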
■Same algorithm, different subsets of the data
 Bagging (bootstrap aggregating)
• Sampling the training set with replacement
 Pasting
• Sampling the training set without replacement
Bagging and pasting(1/2)
■Bagging and pasting in sklearn (BaggingClassifier sketch below)
Bagging and pasting(2/2)
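A bagging sketch: 500 trees, each trained on 100 samples drawn with replacement; setting bootstrap=False would turn it into pasting. The dataset and values are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=100,    # training samples drawn for each predictor
    bootstrap=True,     # True = bagging (with replacement), False = pasting
    n_jobs=-1,
    random_state=42,
)
bag_clf.fit(X, y)
```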
■OOB (out of bag)
 Samples that were never drawn into a predictor's bootstrap sample
• On average, about 37% of the training samples are not selected for a given predictor
• These unused samples can serve as a validation set (see the oob_score sketch below)
OOB evaluation
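The same bagging setup with oob_score=True, so the roughly 37% of samples each predictor never saw are used as a built-in validation set; the dataset is again illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    bootstrap=True,
    oob_score=True,     # evaluate each predictor on its out-of-bag samples
    n_jobs=-1,
    random_state=42,
)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)   # validation-style accuracy without a separate hold-out set
```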
■Ensemble of decision trees
 Usually trained via bagging
 Example: 500 trees with max_leaf_nodes = 16 (sketch below)
 Node splitting
• Searches for the best property among a randomly selected subset of candidate properties
Random forest(1/3)
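A sketch matching the numbers on this slide (500 trees, max_leaf_nodes=16); the moons dataset is an illustrative choice.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

rnd_clf = RandomForestClassifier(
    n_estimators=500,
    max_leaf_nodes=16,
    n_jobs=-1,
    random_state=42,
)
rnd_clf.fit(X, y)
```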
■Extra-trees
 Extremely randomized trees
 Use random thresholds for each candidate property instead of searching for the best ones
 Faster to train than a regular random forest
 Extra-trees in sklearn (sketch below)
• ExtraTreesClassifier()
• ExtraTreesRegressor()
Random forest(2/3)
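Extra-trees share the random-forest interface; a sketch with the same illustrative settings as the previous one.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Same interface as RandomForestClassifier, but splits use random thresholds
extra_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16,
                                 n_jobs=-1, random_state=42)
extra_clf.fit(X, y)
```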
■Property importance
 Random forests make it easy to measure the relative importance of each property
 Exposed as feature_importances_ (sketch below)
 For iris, the petal properties are more important than the sepal properties
Random forest(3/3)
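A sketch of reading feature_importances_ after training on the full iris data; the exact scores depend on the run, but the petal properties come out higher.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf.fit(iris.data, iris.target)

for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(f"{name}: {score:.3f}")   # petal length/width score higher than sepal length/width
```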
■Concept
 Combining multiple weak learners to build a strong learner
 Estimators are trained sequentially, each one trying to correct its predecessor
■Popular kind
 AdaBoost(adaptive boosting)
 Gradient boosting
Boosting(1/6)
■AdaBoost
 Increases the weights of the training samples that the previous model handled poorly (underfitted)
 Makes predictions like bagging and pasting, except that each predictor's vote is weighted
Boosting(2/6)
■AdaBoost algorithm (see the AdaBoostClassifier sketch after this slide)
 Sample weights $w^{(i)}$ are initialized to $\frac{1}{m}$
 Train the first estimator
• Its weighted error rate $r_1$ is computed on the training set
 Weighted error rate of the $j^{\text{th}}$ estimator
• $r_j = \dfrac{\sum_{i=1,\; \hat{y}_j^{(i)} \neq y^{(i)}}^{m} w^{(i)}}{\sum_{i=1}^{m} w^{(i)}}$
 Estimator weight ($\eta$ is the learning rate)
• $\alpha_j = \eta \log \dfrac{1 - r_j}{r_j}$
 Weight update rule (weights are then renormalized)
• $w^{(i)} \leftarrow \begin{cases} w^{(i)} & \text{if } \hat{y}_j^{(i)} = y^{(i)} \\ w^{(i)} \exp(\alpha_j) & \text{if } \hat{y}_j^{(i)} \neq y^{(i)} \end{cases}$
Boosting(3/6)
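An AdaBoost sketch with decision stumps as the weak learners; the stump depth, number of estimators, learning rate, and dataset are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),   # decision stumps as weak learners
    n_estimators=200,
    learning_rate=0.5,                     # the eta in the estimator-weight formula
    random_state=42,
)
ada_clf.fit(X, y)
```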
■Gradient boosting
 Each new predictor is trained on the residual errors made by the previous predictors
 Example: fitting regression trees to noisy quadratic data (see the sketch after this slide)
Boosting(4/6)
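A sketch of the residual idea on made-up quadratic data: each regression tree fits the errors left by the trees before it, and the ensemble prediction is the sum of all trees.

```python
import numpy as np

from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)   # noisy quadratic data

tree1 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree1.fit(X, y)

y2 = y - tree1.predict(X)                      # residuals of the first tree
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree2.fit(X, y2)

y3 = y2 - tree2.predict(X)                     # residuals of the second tree
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree3.fit(X, y3)

X_new = np.array([[0.2]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
print(y_pred)
```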
■Gradient boosting
Boosting(5/6)
■Gradient boosting
 Learning rate
• A low learning rate scales down each tree's contribution, so more trees are needed (sketch below)
 Popular library
• XGBoost
Boosting(6/6)
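The scikit-learn equivalent of the manual residual sketch; learning_rate scales each tree's contribution, so a lower value needs more trees (XGBoost exposes a very similar interface). Data and values are illustrative.

```python
import numpy as np

from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)

gbrt = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=200,     # a low learning rate needs more trees
    learning_rate=0.1,
    random_state=42,
)
gbrt.fit(X, y)
```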
■Concept
 Train a model to aggregate the predictions of the ensemble's predictors
 This final model is called a blender or meta learner
Stacking(1/3)
■Training
 Split the training data into two subsets
 Train the first-layer predictors on the first subset
 Use the trained first layer to make predictions on the second subset, then train the blender on those predictions (see the StackingClassifier sketch below)
Stacking(2/3)
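A stacking sketch using scikit-learn's StackingClassifier, which relies on cross-validated predictions rather than an explicit two-subset split, a close variant of the procedure on this slide. Estimators and data are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

stack_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(),   # the blender / meta learner
    cv=5,   # out-of-fold predictions play the role of the held-out subset
)
stack_clf.fit(X, y)
```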
■Multi-layer stacking ensemble prediction
Stacking(3/3)
Editor's Notes
• #11: Entropy measures molecular disorder and is originally a thermodynamics concept; when molecules are stable and orderly, the entropy is close to 0.