Decision Tree
By Dr. Ramkumar T
ramkumar.thirunavukarasu@vit.ac.in
Decision Tree
• Relatively fast compared to other classification
models
• Achieves similar, and sometimes better, accuracy
than other models
• Simple and easy to understand
• Can be converted into simple, easy-to-understand
classification rules
Decision Tree
[Figure: example of a decision tree]
Decision Tree
• The tree has three types of nodes:
• A root node - has no incoming edges
and one or more outgoing edges.
• Internal nodes - each has exactly
one incoming edge and two or more
outgoing edges.
• Leaf (terminal) nodes - each has
exactly one incoming edge and no
outgoing edges.
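As a concrete illustration of these node types, here is a minimal Python sketch (not from the original slides); the Node class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    """A decision-tree node.

    Root and internal nodes test an attribute and have outgoing edges;
    leaf nodes have no outgoing edges and carry a class label.
    """
    attribute: Optional[str] = None   # attribute tested at this node (None for leaves)
    children: Dict[str, "Node"] = field(default_factory=dict)  # outgoing edges, keyed by attribute value
    label: Optional[str] = None       # class label (set only for leaves)

    def is_leaf(self) -> bool:
        return not self.children
```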
How Does a Decision Tree Work?
• Classifying a test record is straightforward once a
decision tree has been constructed.
• Starting from the root node, we apply the test
condition to the record and follow the appropriate
branch based on the outcome of the test.
• This will lead us either to another internal node,
for which a new test condition is applied, or to a
leaf node.
• The class label associated with the leaf node is
then assigned to the record.
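The traversal described above can be sketched in a few lines, assuming the hypothetical Node class from the previous section; unseen attribute values and other edge cases are ignored for brevity.

```python
def classify(node: Node, record: dict) -> str:
    """Route a record from the root to a leaf and return the leaf's class label."""
    while not node.is_leaf():
        outcome = record[node.attribute]   # apply this node's test condition to the record
        node = node.children[outcome]      # follow the branch matching the outcome
    return node.label

# Example usage (root is a previously constructed tree):
# classify(root, {"outlook": "sunny", "humidity": "high"})
```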
Requirements of a Decision Tree
• Attribute-value description - each object or case must
be expressible in terms of a fixed collection of
properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values) - the target
function has discrete output values (Boolean or
multiclass).
• Sufficient data - enough training cases must be
provided to learn the model.
How Is a Decision Tree Induced?
• Tree is constructed in a top-down recursive
divide-and-conquer manner
• At start, all the training examples are at the root
• Attributes are categorical (if continuous-valued,
they are discretized in advance)
• Attributes are selected on the basis of a heuristic
or statistical measure (e.g., information gain)
• Examples are partitioned recursively based on
selected attributes
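A simplified sketch of this top-down, recursive, divide-and-conquer procedure is given below; it assumes categorical attributes, reuses the hypothetical Node class from earlier, takes the attribute-selection heuristic as a parameter, and omits pruning.

```python
from collections import Counter

def induce_tree(examples, attributes, select_attribute):
    """Build a tree by recursive partitioning.

    examples         : list of (record_dict, class_label) pairs
    attributes       : attribute names still available for splitting
    select_attribute : heuristic (e.g. information gain) that picks the split attribute
    """
    labels = [lbl for _, lbl in examples]
    # Stop when all examples share one class or no attributes remain: majority-class leaf
    if len(set(labels)) == 1 or not attributes:
        return Node(label=Counter(labels).most_common(1)[0][0])

    best = select_attribute(examples, attributes)
    node = Node(attribute=best)
    remaining = [a for a in attributes if a != best]
    # Partition the examples on the chosen attribute and recurse on each partition
    for value in {rec[best] for rec, _ in examples}:
        subset = [(rec, lbl) for rec, lbl in examples if rec[best] == value]
        node.children[value] = induce_tree(subset, remaining, select_attribute)
    return node
```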
Decision Tree Induction – Measures
• Entropy (Information Theory) - one of the techniques
for selecting an attribute on which to split a node.
• “If you have uncertainty – you have information.”
• The information carried by an event is defined as −p_i log(p_i),
where p_i is the probability of that event.
• The information of any event that is likely to have several
possible outcomes is given by the sum over those outcomes:
− Σ_i p_i log2(p_i)
• Information Gain – a measure of how good an attribute
is at predicting the class of the training data.
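The entropy formula above can be computed directly from the class labels; the sketch below and its 9-versus-5 class split are only illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(D) = -sum_i p_i * log2(p_i) over the class distribution of D."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

# Illustration: 9 "yes" and 5 "no" labels give an entropy of about 0.940 bits
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94
```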
DT – Measures (ID3 Algorithm)
• Expected information (entropy) needed to
classify a tuple in D:
Entropy(D) = − Σ_{i=1..m} p_i log2(p_i)
• Information needed (after using A to split D
into v partitions) to classify D:
Entropy_A(D) = Σ_{j=1..v} (|D_j| / |D|) · Entropy(D_j)
• Information gained by branching on attribute A:
Gain(A) = Entropy(D) − Entropy_A(D)
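Putting the three formulas together, here is a sketch of the gain computation, reusing the illustrative entropy function and the (record, label) example format from the earlier sketches.

```python
def information_gain(examples, attribute):
    """Gain(A) = Entropy(D) - Entropy_A(D), where Entropy_A(D) is the entropy of
    the partitions induced by A, weighted by their relative sizes."""
    total = len(examples)
    labels = [lbl for _, lbl in examples]
    entropy_a = 0.0
    for value in {rec[attribute] for rec, _ in examples}:
        subset = [lbl for rec, lbl in examples if rec[attribute] == value]
        entropy_a += (len(subset) / total) * entropy(subset)
    return entropy(labels) - entropy_a
```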
Gain Ratio Measure (C4.5)
• The information gain measure is biased towards attributes with a
large number of values
• C4.5 (a successor of ID3) uses the gain ratio to overcome this
problem (it normalizes the information gain)
GainRatio(A) = Gain(A) / SplitEntropy_A(D), where
SplitEntropy_A(D) = − Σ_{j=1..v} (|D_j| / |D|) · log2(|D_j| / |D|)
• Ex. The income attribute splits the 14 training tuples into
partitions of size 4, 6 and 4:
SplitEntropy_income(D) = −(4/14) log2(4/14) − (6/14) log2(6/14) − (4/14) log2(4/14) ≈ 1.557
Gain_ratio(income) = 0.029 / 1.557 ≈ 0.019
• The attribute with the maximum gain ratio is selected as the
splitting attribute
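A short sketch of the gain-ratio computation, reusing the illustrative information_gain function above; the final line evaluates the split entropy for partitions of sizes 4, 6 and 4 out of 14 tuples, as in the income example.

```python
from collections import Counter
from math import log2

def split_entropy(partition_sizes):
    """SplitEntropy_A(D) = -sum_j (|D_j|/|D|) * log2(|D_j|/|D|)."""
    total = sum(partition_sizes)
    return -sum((n / total) * log2(n / total) for n in partition_sizes)

def gain_ratio(examples, attribute):
    """GainRatio(A) = Gain(A) / SplitEntropy_A(D)."""
    sizes = list(Counter(rec[attribute] for rec, _ in examples).values())
    return information_gain(examples, attribute) / split_entropy(sizes)

# Income example: partitions of 4, 6 and 4 among 14 training tuples
print(round(split_entropy([4, 6, 4]), 3))   # 1.557
```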
Training Data
[Table: the training data used in the examples, shown as an image in the original slide]