Machine Learning - V
Random Forest
Random Forest
To understand Random Forest, let's first understand the ensemble model.
An ensemble model combines the outputs of multiple models to produce
more accurate predictions.
Ensemble models are in high demand because multiple models can be
implemented with little time and effort while delivering high
prediction accuracy.
A decision tree is a branching method built from one or more
if-then-else statements on the predictors.
* It is very useful for data exploration, breaking the dataset down
into smaller and smaller associated subsets.
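To make the aggregation idea concrete, here is a minimal Python sketch.
The three stand-in models are hypothetical; in a real ensemble each
would be a trained decision tree.

```python
# Minimal sketch of the ensemble idea: aggregate the outputs of several
# models into a single, usually more accurate, prediction.
def ensemble_predict(models, x):
    """Average the predictions of all models for input x."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Hypothetical stand-in models; in practice these would be trained trees.
models = [lambda x: 2.0 * x,
          lambda x: 2.0 * x + 1.0,
          lambda x: 2.0 * x - 0.5]
print(ensemble_predict(models, 10))  # (20 + 21 + 19.5) / 3 = 20.17 approx
```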
Single Decision Tree
In decision trees, the branching of the tree is decided by Information
Gain.
Information Gain = Entropy of the parent node – weighted Entropy of
the children after the split
Entropy is a measure of how disorganized the system is.
Entropy ranges from 0 to 1: a pure node has an entropy of 0, while a
maximally impure node has an entropy of 1.
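As a worked illustration, here is a small Python sketch of these two
measures; the labels are made up, and the split entropy weights each
child by its share of the parent's instances.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: 0 for a pure node, 1 for a 50/50 two-class node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    split_entropy = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - split_entropy

parent = ['yes'] * 5 + ['no'] * 5          # maximally impure: entropy = 1.0
children = [['yes'] * 5, ['no'] * 5]       # pure children: entropy = 0.0
print(entropy(parent))                     # 1.0
print(information_gain(parent, children))  # 1.0 -> a perfect split
```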
* The core algorithm for building decision trees is known as ID3, by
J. R. Quinlan.
• It uses a top-down approach and can be used to build classification
and regression decision trees.
Decision Making in Regression Trees
As we know, the main aim in a regression tree is to reduce the standard
deviation, while in a classification tree the main aim is to reduce
entropy.
Random Forest is suitable for numerical values, and since a random
forest is a collection of decision trees, let's first understand how
numerical values work in a single decision tree.
A decision tree for numerical values, i.e. a regression tree, uses
standard deviation scores to do the splitting. The attribute with the
largest standard deviation reduction is chosen for the next decision
node (a node that can be split further). A branch whose standard
deviation is greater than 0 usually needs further splitting.
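A minimal sketch of that splitting score, with made-up target values,
assuming the usual definition of standard deviation reduction: the
parent's standard deviation minus the size-weighted standard deviation
of the subsets a candidate split produces.

```python
import statistics

def sd_reduction(parent, subsets):
    """Parent standard deviation minus the size-weighted SD of the subsets."""
    n = len(parent)
    weighted_sd = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted_sd

# Hypothetical target values, partitioned by one candidate attribute.
parent = [25, 30, 46, 45, 52, 23, 43, 35, 38, 46]
subsets = [[25, 30, 23, 35, 38], [46, 45, 52, 43, 46]]
# The attribute with the largest reduction becomes the next decision node.
print(round(sd_reduction(parent, subsets), 2))  # approx 4.94
```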
Decision Making in Regression Trees
 A stopping/pruning criterion, usually size-based, is provided to
stop the tree from growing further, since unchecked growth leads to
overfitting problems.
 The process of splitting decision nodes runs recursively until it
reaches the terminal/leaf nodes (nodes that cannot be split further).
When there is more than one instance at a leaf node, we take their
average as the final value for the target.
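A hedged sketch of both points using scikit-learn (the toy data is made
up): min_samples_leaf is one such size-based stopping criterion, and the
prediction at each leaf is the average of the training targets that land
there.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([5.0, 5.5, 6.0, 6.2, 20.0, 21.0, 19.5, 20.5])

# Size-based stopping criteria keep the tree from growing until it overfits.
tree = DecisionTreeRegressor(min_samples_leaf=4, max_depth=3)
tree.fit(X, y)

# Each leaf predicts the mean of the training targets it contains.
print(tree.predict([[2]]))  # mean(5.0, 5.5, 6.0, 6.2)     = 5.675
print(tree.predict([[7]]))  # mean(20.0, 21.0, 19.5, 20.5) = 20.25
```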
Decision Tree Algorithms
ID3, or Iterative Dichotomiser 3, is one of the first decision tree
algorithms, developed by J. R. Quinlan in 1986.
C4.5 is the next version, also developed by J. R. Quinlan, which
handles both continuous and discrete features and improves on the
overfitting problem by using a bottom-up approach known as pruning.
CART or Classification & Regression Trees
 The CART implementation is similar to C4.5; it prunes the tree by
imposing a complexity penalty based on the number of leaves in the
tree.
 CART uses the Gini method to create binary splits and is the most
commonly used decision tree algorithm.
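A minimal sketch of the Gini impurity that CART uses (the labels are
made up); CART picks the binary split that most lowers the size-weighted
Gini of the two children.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.
    0 for a pure node; 0.5 for a balanced two-class node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(['a'] * 10))             # 0.0 -> pure node
print(gini(['a'] * 5 + ['b'] * 5))  # 0.5 -> maximally impure binary node
```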
Advantages of Single DT
 It is a non-parametric method, i.e. independent of the type and size
of the underlying population, so it can be used even when the sample
size is low. It is therefore very fast and easy to understand and
implement.
 It can handle outliers and missing values, therefore requiring less
data preparation than other machine learning methods, and it can be
used for both continuous and categorical data types.
Now let's focus on the disadvantages of a decision tree to motivate a
solution.
Disadvantages of Single DT
 As we know, decision trees are easily prone to overfitting, which
therefore needs to be controlled by pruning techniques.
 For continuous numerical variables, the tree splits on ranges of
values rather than actual values, hence it is sometimes not very
effective for estimating continuous values.
 The robustness to outliers and skewness comes at the cost of
throwing away some of the information in the dataset.
 When an input variable has too many possible values, they need to
be aggregated into groups; otherwise the result is too many splits,
which may lead to poor predictive performance.
These disadvantages of a single decision tree have given rise to
ensemble methods.
Ensemble Methods
* A collection of several models, in this case a collection of decision
trees, is used in order to increase predictive power, and the final
score is obtained by aggregating their outputs.
• This is known as an ensemble method in machine learning.
Random Forest, along with Bagging and Boosting, is among the most
popular ensemble methods.
However, the basic functionality remains the same, i.e. the original
concept of creating a tree by using entropy and information gain.
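As a hedged sketch of that aggregation (synthetic data, scikit-learn
trees): several trees are trained on bootstrap samples and their
predictions averaged, which is the essence of bagging and the backbone
of a random forest.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)            # synthetic feature
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=100)  # noisy target

# Train each tree on a bootstrap sample (drawn with replacement) ...
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# ... then aggregate: the ensemble prediction is the average over all trees.
ensemble_prediction = np.mean([t.predict(X) for t in trees], axis=0)
print(ensemble_prediction[:5])
```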
Random Forest in brief
• The goal of random forest is to improve prediction accuracy by using
a collection of un-pruned decision trees combined with a rule-based
criterion.
So let's understand the goals of random forest in detail.
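As a preview, a minimal scikit-learn sketch on synthetic data (the
parameter choices are illustrative): n_estimators sets the number of
un-pruned trees, and max_features is the rule-based criterion limiting
how many predictors each split may consider.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rf = RandomForestRegressor(
    n_estimators=200,     # number of un-pruned trees in the forest
    max_features="sqrt",  # predictors each split is allowed to consider
    random_state=42,
)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))  # R^2 on held-out data
```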