DECISION TREES & RANDOM FORESTS
Max Pagels, Data Science Specialist
max.pagels@sc5.io
12.6.2016
A TREE IN THE REAL WORLD
A TREE IN COMPUTER SCIENCE
[Diagram: a tree data structure with its root, leaves, edges and nodes labelled.]
DECISION TREES
A decision tree is a learning algorithm that constructs a tree of decision
rules from training data.
Decision trees are popular because:
• They are naturally non-linear, so you can use them to solve
complex problems
• They are easy to visualise
• How they work is easily explained
• They can be used for regression (predict a number) and
classification (predict a class)
A decision tree algorithm is an explicit version of “ten questions”.
A TOY EXAMPLE
Weekend? Evening? Food? Date?
Yes Yes Yes Yes
No Yes No No
No No Yes No
Yes No Yes Yes
Yes No No No
Yes Yes No No
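For the sketches that follow, the toy table might be encoded in Python like this (an illustrative encoding, with 1 for Yes and 0 for No):

# Toy dataset from the table above: Weekend?, Evening?, Food? -> Date?
# Encoded with 1 = Yes and 0 = No.
X = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
y = [1, 0, 0, 1, 0, 0]  # Date? (the label we want to predict)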
BUILDING A DECISION TREE
Basic approach (starting with a root node):
Loop over all leaf nodes:
1. Select the best attribute A
2. Assign A as the decision attribute for the node we are currently traversing
3. For each value of A, create a descendant node (leaf node)
4. Sort the training examples to the leaf nodes
5. If a stopping criterion is hit, stop; otherwise continue
(A Python sketch of this loop follows the table below.)
Weekend? Evening? Food? Date?
Yes Yes Yes Yes
No Yes No No
No No Yes No
Yes No Yes Yes
Yes No No No
Yes Yes No No
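A rough Python sketch of that loop, written recursively. This is a simplified, hypothetical ID3-style implementation, assuming categorical attributes and information gain as the "best attribute" criterion; the splits it picks need not match the diagrams later in the deck.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def best_attribute(rows, labels, attributes):
    # Step 1: pick the attribute with the highest information gain.
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    # Step 5: stop on a pure node, or when no attributes are left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    attribute = best_attribute(rows, labels, attributes)      # step 1
    tree = {attribute: {}}                                    # step 2
    for value in set(row[attribute] for row in rows):         # step 3
        # Step 4: sort the training examples into the new descendant node.
        sub_rows = [r for r in rows if r[attribute] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[attribute] == value]
        remaining = [a for a in attributes if a != attribute]
        tree[attribute][value] = build_tree(sub_rows, sub_labels, remaining)
    return tree

Called with rows such as {"Weekend": "Yes", "Evening": "Yes", "Food": "Yes"}, their Date? labels, and the attribute list ["Weekend", "Evening", "Food"], it returns a nested dict whose inner keys are attributes and whose leaves are Date/No Date labels.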
DECISION TREES IN SCIKIT
sklearn.tree.DecisionTreeClassifier(
    criterion='gini',
    splitter='best',
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features=None,
    random_state=None,
    max_leaf_nodes=None,
    min_impurity_split=1e-07,
    class_weight=None,
    presort=False
)
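The parameter list above reflects an older scikit-learn release (for instance, presort and min_impurity_split have since been removed), but basic usage is unchanged. A minimal sketch on the toy data from the earlier slide:

from sklearn.tree import DecisionTreeClassifier

# Toy data: Weekend?, Evening?, Food? encoded as 1 = Yes, 0 = No.
X = [[1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 1, 0, 0]  # Date?

clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

# Predict for a new day: weekend, not evening, no food planned.
print(clf.predict([[1, 0, 0]]))  # expected: [0], i.e. no date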
BUILDING A DECISION TREE
[A sequence of diagrams, each repeating the toy table above, grows the tree one split at a time: the root splits on Weekend?, further splits (Food?, Evening?) follow, and every branch ends in a Date or No Date leaf.]
ID3
[Diagram: the completed decision tree for the toy data, as produced by the ID3 algorithm.]
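Since the diagram does not survive in text form, one way to inspect a comparable learned tree is scikit-learn's export_text (a sketch; note that scikit-learn grows CART-style binary trees, so its splits need not match the ID3-style diagrams described above):

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data from the earlier slides, encoded with 1 = Yes, 0 = No.
X = [[1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 1, 0, 0]  # Date?

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned tree as indented text, one line per split or leaf.
print(export_text(clf, feature_names=["Weekend", "Evening", "Food"]))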
The ID3 algorithm, and many other decision tree algorithms, are
prone to overfitting: trees become too deep and start to capture
noise in the training data.
Overfitting means a trained algorithm will fail to generalise well to
new examples.
One way of combatting overfitting is to use an ensemble method.
RANDOM FORESTS
A FOREST IN THE REAL WORLD
A FOREST IN COMPUTER SCIENCE
RANDOM FORESTS
A random forest is an ensemble method based on decision trees.
The basic idea is deceptively simple:
1. Construct N decision trees
• Randomly sample a subset of the training data (with replacement)
• Construct/train a decision tree using the decision tree algorithm and the sampled subset of data
2. Predict by asking all trees in the forest for their opinion
• For regression problems, take the mean (average) of all trees’ predictions
• For classification problems, take the mode of all trees’ predictions (i.e. vote)
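A minimal scikit-learn sketch of the idea, reusing the toy encoding from earlier (n_estimators is N, the number of trees; by default each tree is trained on a bootstrap sample and considers a random subset of features at each split):

from sklearn.ensemble import RandomForestClassifier

# Toy data from the earlier slides: Weekend?, Evening?, Food? (1 = Yes, 0 = No).
X = [[1, 1, 1], [0, 1, 0], [0, 0, 1], [1, 0, 1], [1, 0, 0], [1, 1, 0]]
y = [1, 0, 0, 1, 0, 0]  # Date?

# Build a forest of 100 trees, each trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X, y)

# Every tree votes; the majority class (the mode) is returned.
print(forest.predict([[1, 1, 1]]))  # most likely [1]

For regression problems, RandomForestRegressor works the same way but averages the trees’ numeric predictions instead of voting.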
CLASSIFICATION
[Diagram: three trees in the forest predict Y, Y and N; the forest returns the mode: mode({Y, Y, N}) = Y]
REGRESSION
[Diagram: three trees predict 2.1, 1.8 and 1.9; the forest returns the mean: µ({2.1, 1.8, 1.9}) = 1.933…]
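In code, the two aggregation rules are just a mode and a mean (a trivial sketch using Python's statistics module):

from statistics import mean, mode

# Classification: three trees vote Y, Y, N; the majority (mode) wins.
print(mode(["Y", "Y", "N"]))   # Y

# Regression: three trees predict 2.1, 1.8 and 1.9; take the mean.
print(mean([2.1, 1.8, 1.9]))   # 1.9333...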
SUMMARY
Decision trees are easy-to-understand learning algorithms that can
be used for regression and classification, even for non-linear
problems.
Random forests are ensemble learning algorithms that help prevent
overfitting by training many decision trees on different random samples of
the data and averaging (for regression) or voting on (for classification)
their predictions.
If you are just getting started with machine learning, decision trees
are an excellent starting point.
THANK YOU!
Questions?