16. Bagging: Bootstrap Aggregating: wisdom of the crowd
1. Sample records with replacement (aka "bootstrap" the training data)
Sampling is the process of selecting a subset of items from a larger collection.
Bootstrap = sampling with replacement: after a data point is drawn it is put back, so it can appear more than once in the same sample (and in other drawn samples as well).
2. Fit an overgrown (deep, unpruned) tree to each resampled data set
3. Average the predictions (for classification, take a majority vote)
17. Bagging: Bootstrap Aggregating: wisdom of the crowd
As we add more trees... our average prediction error decreases
18. Random Forest
• A random forest is a collection of decision trees. Each tree estimates a classification, and this estimate is called a "vote". We consider the vote from every tree and choose the most-voted classification (majority voting).
• Random forests follow the same bagging process as bagged decision trees, but each time a split is to be performed, the search for the split variable is limited to a random subset of m of the p attributes (variables or features), aka split-attribute randomization:
• classification trees: m = √p
• regression trees: m = p/3
• m is commonly referred to as mtry (a code sketch follows after this slide)
• Random Forests produce many unique trees.
19. Bagging vs Random Forest
• Bagging introduces randomness into the rows of the data.
• Random forest introduces randomness into the rows and columns of the data.
• Combined, this provides a more diverse set of trees that almost always lowers our prediction error (see the sketch below).
[Chart: split-attribute randomization vs. prediction error]