Advanced Machine Learning with Python
Session 9: Decision Trees
SIGKDD
Carlos Santillan
Bentley Systems Inc
csantill@gmail.com
Decision Trees
A decision support model based on a tree-like graph of decisions and their possible outcomes
Growing a Tree
Types of Decision Trees
There are two main types:
• Classification tree (categorical target variable)
• Regression tree (continuous target variable)
CART (Classification and Regression Tree) is used to refer to both.
The type of a decision tree is determined by the type of the target variable.
Decision Tree Terms
Nodes:
1. Root node
2. Internal node (decision node)
3. Leaf (terminal node)
Depth: length of the longest path from root to leaf
Decision stump: a one-level decision tree
Decision Tree Algorithm
The basic greedy algorithm is as follows:
1. Start at node N and find the "best attribute" to split on
2. Repartition N into N1, N2, … according to the best split
3. Repeat for each node N until a "stop condition" is met
Growing an optimal decision tree is an NP-complete problem.
Fortunately, greedy algorithms achieve good accuracy and performance.
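A minimal sketch of this greedy loop in Python, using entropy (one of the split criteria covered on the following slides) on a single numeric feature. All names here (`grow_tree`, `best_split`) are illustrative, not from any library:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sample of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Try every threshold on one numeric feature; return the threshold
    with the highest information gain (None if no split helps)."""
    parent = entropy(ys)
    best_gain, best_t = 0.0, None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        n = len(ys)
        children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if parent - children > best_gain:
            best_gain, best_t = parent - children, t
    return best_t, best_gain

def grow_tree(xs, ys, depth=0, max_depth=3):
    """Recursively repartition the node until a stop condition is met."""
    majority = Counter(ys).most_common(1)[0][0]
    if len(set(ys)) == 1 or depth >= max_depth:  # pure node or depth limit
        return majority
    t, gain = best_split(xs, ys)
    if t is None:                                # no useful split left
        return majority
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return {"split": t,
            "left": grow_tree(*zip(*left), depth + 1, max_depth),
            "right": grow_tree(*zip(*right), depth + 1, max_depth)}
```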
What is the "Best Attribute" to split on?
There are several criteria that can be used to determine the best
attribute to split on:
• Information Gain
• Gini Index
• Classification Error
• Gain Ratio (Normalized Information Gain)
• Variance Reduction
Purity
Entropy
Definition: a measure of impurity in a sample
• Entropy = 0: all elements belong to the same class
• Entropy = 1: elements evenly split between two classes (the maximum is log₂ k for k classes)
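As a quick Python check of the two endpoints (a sketch; the 16-sample node with a 10-vs-6 class split used in the last line is an assumption that reproduces the parent entropy of 0.95 on the following slides):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a sample of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

entropy(["+"] * 8)               # pure node: entropy 0
entropy(["+"] * 8 + ["-"] * 8)   # 50/50 split: entropy 1
entropy(["+"] * 10 + ["-"] * 6)  # ~0.95, the parent value on the next slides
```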
Information Gain
Information Gain = Entropy(Parent) − [Weighted Average] Entropy(Children)
If we split at X < 4:
• Entropy (X < 4) = 0.86
• Entropy (X > 4) = 0
Information Gain = 0.95 − (14/16)(0.86) − (2/16)(0)
Information Gain ≈ 0.20
Information Gain
IG = Entropy(Parent) − [Weighted Average] Entropy(Children)
If we split at X < 3:
• Entropy (X < 3) = 0
• Entropy (X > 3) = 0.811
Information Gain = 0.95 − (8/16)(0) − (8/16)(0.811)
Information Gain ≈ 0.545
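The same computation can be sketched from raw per-class counts. The child counts below ((10, 4) and (0, 2) for the X < 4 split of an assumed 16-sample, 10-vs-6 node) are reconstructed from the entropies shown on the slides (0.95 parent, 0.86 child) and should be treated as assumptions, not the original data:

```python
from math import log2

def entropy(counts):
    """Entropy from per-class counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def information_gain(parent_counts, splits):
    """IG = Entropy(parent) - weighted-average Entropy(children).
    splits: one per-class count tuple per child node."""
    n = sum(parent_counts)
    children = sum(sum(s) / n * entropy(s) for s in splits)
    return entropy(parent_counts) - children

round(entropy((10, 6)), 2)                    # 0.95, the parent entropy
round(entropy((10, 4)), 2)                    # 0.86, the X < 4 child
information_gain((10, 6), [(10, 4), (0, 2)])  # gain for the X < 4 split
```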
Gini Index
Definition: expected error rate
• Gini = 0: all elements belong to the same class
• Gini = 0.5: elements evenly split between two classes
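The same endpoints as a quick Python check (Gini impurity is 1 − Σ pᵢ² over class probabilities); the 10-vs-6 counts in the last line are an assumption that reproduces the 0.4687 parent value used on the following slides:

```python
def gini(counts):
    """Gini impurity from per-class counts: 1 - sum of squared class probabilities."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

gini((8, 0))   # pure node: 0.0
gini((8, 8))   # 50/50 split: 0.5
gini((10, 6))  # ~0.4687, the parent value on the next slides
```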
Gini Gain
If we split at X < 4:
• Gini (X < 4) = 0.4081
• Gini (X > 4) = 0
Gini Gain = 0.4687 − (14/16)(0.4081) − (2/16)(0)
Gini Gain ≈ 0.11
Gini Gain
If we split at X < 3:
• Gini (X < 3) = 0
• Gini (X > 3) = 0.375
Gini Gain = 0.4687 − (8/16)(0) − (8/16)(0.375)
Gini Gain ≈ 0.281
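Gini gain mirrors information gain with Gini impurity in place of entropy. A sketch, again from per-class counts; the child counts ((8, 0) and (2, 6)) are reconstructed from the child Gini values shown above (0 and 0.375) for an assumed 16-sample, 10-vs-6 node, and should be treated as assumptions:

```python
def gini(counts):
    """Gini impurity from per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_gain(parent_counts, splits):
    """Gini(parent) - weighted-average Gini(children)."""
    n = sum(parent_counts)
    return gini(parent_counts) - sum(sum(s) / n * gini(s) for s in splits)

gini_gain((10, 6), [(8, 0), (2, 6)])  # gain for the X < 3 split
```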
When to use which?
● Gini works well for continuous attributes
● Entropy works well for categorical attributes
● Entropy is slower to compute than Gini (it requires logarithms)
● Gini can behave poorly when some class probabilities are very small
● The two criteria disagree in only about 2% of cases (theoretically)
When to stop growing?
• All data points at the leaf are pure
• The tree reaches depth k
• The number of cases in a node falls below a minimum
• The splitting criterion falls below a certain threshold
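In scikit-learn these stop conditions correspond directly to `DecisionTreeClassifier` hyperparameters; a sketch (the values here are arbitrary examples):

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=5,                 # stop when the tree reaches depth k
    min_samples_split=10,        # minimum cases needed in a node to split it
    min_samples_leaf=5,          # minimum cases allowed in a leaf
    min_impurity_decrease=0.01,  # splitting-criterion threshold
)
```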
Pruning
Prevents overfitting; smaller trees may generalize better.
Strategies:
• Prepruning: stop growing when information becomes unreliable
• Postpruning: fully grow the tree, then remove unreliable parts
Note: pruning was not supported by scikit-learn at the time of this talk
(cost-complexity pruning was added later, in version 0.22)
Algorithms
• ID3 (Iterative Dichotomiser 3): greedy algorithm, categorical attributes (entropy)
• C4.5: improves on ID3; supports both categorical and continuous attributes (entropy / gain ratio)
• C5.0 (See5): commercial successor to C4.5
• CART: similar to C4.5 (Gini impurity)
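scikit-learn's `DecisionTreeClassifier` is a CART-style implementation; its `criterion` parameter switches between Gini impurity and entropy. A sketch on toy, linearly separable data:

```python
from sklearn.tree import DecisionTreeClassifier

X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X, y)
    print(criterion, clf.score(X, y))  # both separate this data perfectly
```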
Pros
• Easy to understand (white box)
• Supports both numerical and categorical data
• Fast (greedy) algorithms
• Performs well on large datasets
• Accurate
• Provides feature importance
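The feature-importance pro is exposed in scikit-learn as `feature_importances_`: each feature's total (normalized) impurity reduction across the tree. A hedged sketch on made-up data:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: feature 0 fully determines the label; feature 1 is constant,
# so all of the impurity reduction is credited to feature 0.
X = [[0, 5], [1, 5], [2, 5], [3, 5]] * 4
y = [0, 0, 1, 1] * 4
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.feature_importances_)  # importances are normalized to sum to 1
```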
Cons
• Prone to overfitting without pruning or cross-validation
• Information gain is biased toward features with many distinct values
• Sensitive to small changes in the data (high variance)
DEMO
Resources
• https://github.com/csantill/AustinSIGKDD-DecisionTrees
• Decision Forests for Classification, Regression, Density
Estimation, Manifold Learning and Semi-Supervised Learning
• Classification and Regression Trees
• A Visual Introduction to Machine Learning
• A Complete Tutorial on Tree Based Modeling from Scratch
• Theoretical Comparison between the Gini Index and
Information Gain Criteria
Thank You
Carlos Santillan
