Decision Tree Ensembles
• Tree-based models for supervised learning
• Single decision trees are unstable and may lack accuracy
• Ensemble methods combine multiple trees for better performance
Why Ensembles?
• High variance in decision trees: small data changes → big prediction changes
• Ensemble = Combine weak learners (trees) into a strong model
• Trade-off: Interpretability ↓, Accuracy ↑
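A toy simulation (not from the slides) makes the variance argument concrete: averaging B unbiased but noisy predictors shrinks the standard deviation of the prediction by roughly a factor of √B. The "weak learner" here is just a Gaussian noise stand-in, not a real tree.

```python
import random
import statistics

random.seed(0)

TRUE_VALUE = 5.0   # quantity each weak learner tries to predict
NOISE_SD = 2.0     # per-learner noise
B = 25             # ensemble size

def weak_prediction():
    # one noisy "tree": unbiased but high-variance
    return TRUE_VALUE + random.gauss(0, NOISE_SD)

# variance of a single learner vs. variance of a B-learner average
singles = [weak_prediction() for _ in range(2000)]
averages = [statistics.mean(weak_prediction() for _ in range(B))
            for _ in range(2000)]

print(statistics.stdev(singles))   # roughly NOISE_SD
print(statistics.stdev(averages))  # roughly NOISE_SD / sqrt(B)
```

This is exactly the mechanism bagging exploits: individual trees stay high-variance, but their average is much more stable.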
Bagging (Bootstrap Aggregating)
• Randomly sample data (with replacement) to create bootstrap samples
• Train a decision tree on each sample
• Average (regression) or vote (classification) for final prediction
• Out-of-bag (OOB) samples = built-in validation set
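The bootstrap and OOB bookkeeping above can be sketched in a few lines of stdlib Python; the voting step uses a list of made-up tree outputs as a stand-in for real fitted trees.

```python
import random
from collections import Counter

random.seed(1)

def bootstrap_sample(n):
    """Draw n row indices with replacement; return (in-bag, out-of-bag)."""
    in_bag = [random.randrange(n) for _ in range(n)]
    oob = set(range(n)) - set(in_bag)
    return in_bag, oob

n = 100
in_bag, oob = bootstrap_sample(n)

# Each bootstrap sample leaves out (1 - 1/n)^n ≈ 36.8% of the rows;
# those out-of-bag rows act as a built-in validation set for that tree.
print(len(oob))

# Final prediction: majority vote across the ensemble (classification);
# for regression you would average the trees' numeric outputs instead.
tree_votes = ["cat", "dog", "cat", "cat", "dog"]  # one label per tree
prediction = Counter(tree_votes).most_common(1)[0][0]
print(prediction)  # "cat"
```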
Random Forest
• Bagging + randomness at each split (use random subset of features)
• More diverse trees → stronger ensemble
• Handles categorical & continuous variables
• Shows variable importance
• Relatively robust to outliers and noisy features (handling of missing data depends on the implementation)
• Limitation: Interpretability ↓ compared to a single tree
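The extra randomness at each split can be sketched as below. The feature names are hypothetical, and m = √p is the common default subset size for classification (p/3 is typical for regression).

```python
import math
import random

random.seed(2)

# hypothetical feature names, for illustration only
features = ["age", "income", "tenure", "region", "plan",
            "usage", "support_calls", "contract", "device"]

def candidate_features(features):
    """At each split, consider only a random subset of m features;
    a common classification default is m = sqrt(p)."""
    m = max(1, round(math.sqrt(len(features))))
    return random.sample(features, m)

# Different splits see different candidate sets, which decorrelates
# the trees and makes their average stronger than plain bagging.
print(candidate_features(features))
print(candidate_features(features))
```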
Boosted Trees
• Sequentially build trees, each learning from previous errors
• Variants: AdaBoost, Gradient Boosting, and implementations such as XGBoost
• Typically more accurate than bagging/random forest
• Sensitive to overfitting and requires careful tuning
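One round of the "learn from previous errors" idea can be shown with the AdaBoost.M1 weight update. This is a toy sketch: the sample weights and the stump's per-example correctness are made up, and a real booster would refit a new stump on the reweighted data each round.

```python
import math

def adaboost_update(weights, correct):
    """One AdaBoost round: weight the learner by its weighted accuracy,
    then up-weight the examples it got wrong."""
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)  # learner's vote weight
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)
    return alpha, [w / total for w in new]  # renormalize to sum to 1

weights = [0.25, 0.25, 0.25, 0.25]  # start uniform
correct = [True, True, True, False]  # stump misclassifies one example
alpha, weights = adaboost_update(weights, correct)

print(alpha)    # accurate learner -> large vote weight
print(weights)  # the missed example now carries much more weight
```

The next tree in the sequence trains against these new weights, so it concentrates on exactly the examples its predecessors got wrong.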
Key Hyperparameters
• Number of trees (B)
• Tree depth (splits per tree)
• Learning rate (for boosting)
• Minimum split size
Evaluation & Overfitting
• Use validation data (not training data) to tune hyperparameters
• Too few splits: Underfitting
• Too many splits: Overfitting
Model Assessment
• Metrics: Misclassification rate, AUC, confusion matrix
• Random forest provides feature importance
• Always check if model's accuracy is better than a naive model
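The "beat a naive model" check is easy to make concrete. The labels below are hypothetical; the baseline always predicts the majority class of the true labels.

```python
from collections import Counter

# hypothetical true and predicted labels, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# confusion matrix as counts of (actual, predicted) pairs
confusion = Counter(zip(y_true, y_pred))
print(confusion)  # (1, 1) = true positives, (0, 1) = false positives, ...

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# naive baseline: always predict the majority class
majority = Counter(y_true).most_common(1)[0][0]
baseline = sum(t == majority for t in y_true) / len(y_true)

print(accuracy, baseline)  # a useful model must beat the baseline
```

On a heavily imbalanced dataset the baseline can be very high, which is why raw accuracy alone is a poor yardstick and AUC or the full confusion matrix is worth inspecting.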
Comparison Table
• Bagging: Bootstrapped samples, deep trees | Accuracy: Moderate | Interpretability: Medium
• Random Forest: Bootstrapping + feature randomness | Accuracy: High | Interpretability: Lower
• Boosted Trees: Sequential, error-correcting | Accuracy: Highest | Interpretability: Lowest
Takeaways
• Ensembles improve tree-based models significantly
• Random Forest: Good default choice for accuracy & feature importance
• Boosted Trees: Best for accuracy but needs more tuning
• Always evaluate on validation data and check for overfitting

