Decision Tree (ID3)
Xueping Peng
Xueping.peng@uts.edu.au
Outline
 What is a decision tree
 How to Use a Decision Tree
 How to Generate a Decision Tree
 Sum Up and Some Drawbacks
What is a decision tree (1/3)
 A decision tree is a hierarchical tree structure used to assign class labels
based on a series of questions (or rules) about the attributes of the data.
 The attributes can be variables of any type: binary, nominal, ordinal, or
quantitative.
 The class must be qualitative (categorical, binary, or ordinal).
 In short, given data records with their attributes and classes, a decision
tree produces a sequence of rules (or series of questions) that can be used
to recognize the class.
What is a decision tree (2/3)
 Example data: four attributes and one class (Transportation Mode)

Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Male     0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Female   1               Cheap                Medium         Train
Female   0               Cheap                Low            Bus
Male     1               Cheap                Medium         Bus
Female   0               Standard             Medium         Train
Female   1               Standard             Medium         Train
Female   1               Expensive            High           Car
Male     2               Expensive            Medium         Car
Female   2               Expensive            High           Car
What is a decision tree (3/3)
How to Use a Decision Tree
 Test data

Person Name   Gender   Car Ownership   Travel Cost ($)/km   Income Level   Transportation Mode
Alex          Male     1               Standard             High           ?
Buddy         Male     0               Cheap                Medium         ?
Cherry        Female   1               Cheap                High           ?

 What transportation mode would Alex, Buddy and Cherry use? (See the sketch
below, which traces each record down the tree generated in the rest of the deck.)
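 The answers can be read off by walking each record down the tree grown in the
following slides. The sketch below encodes that tree as nested conditionals; note
that the Cheap/Female branch has a gain tie between Car Ownership and Income Level,
so splitting on Car Ownership there is an assumption, and income_level is not
needed on these paths.

```python
def predict(gender, car_ownership, travel_cost, income_level):
    """Walk one record down the decision tree grown later in the deck."""
    if travel_cost == "Expensive":
        return "Car"
    if travel_cost == "Standard":
        return "Train"
    # Travel Cost = Cheap: split on Gender, then (assumed) on Car Ownership
    if gender == "Male":
        return "Bus"
    return "Train" if car_ownership >= 1 else "Bus"

print(predict("Male",   1, "Standard", "High"))    # Alex   -> Train
print(predict("Male",   0, "Cheap",    "Medium"))  # Buddy  -> Bus
print(predict("Female", 1, "Cheap",    "High"))    # Cherry -> Train
```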
How to Generate a Decision Tree(1/13)
 Description of ID3: grow the tree top-down, at each node choosing the attribute
with the highest information gain to split on (a sketch of the recursion follows below)
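 The original slide presents the procedure as a figure; below is a minimal Python
sketch of the recursion, under the assumption that rows are tuples or dicts indexed
by attribute (the helper names entropy, information_gain and id3 are illustrative,
not from the slides).

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_j p_j log2 p_j over the class labels in S."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Expected reduction in entropy from partitioning on attribute `attr`."""
    gain = entropy(labels)
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(rows, labels, attributes):
    """Grow the tree top-down: pick the highest-gain attribute, split, recurse."""
    if len(set(labels)) == 1:                 # pure node: return its class
        return labels[0]
    if not attributes:                        # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in keep], [labels[i] for i in keep],
                              [a for a in attributes if a != best])
    return {best: branches}
```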
How to Generate a Decision Tree(2/13)
 Which is the best choice?
 We have 29 positive examples and 35 negative ones
 Should attribute A1 or attribute A2 be used to split at this node?
How to Generate a Decision Tree(3/13)
 Use entropy to measure the degree of impurity of a node
 Entropy: H(S) = – Σj pj log2 pj, where pj is the probability of class j in S
How to Generate a Decision Tree(4/13)
 What does Entropy mean?
 Entropy is the minimum number of bits needed to encode the
classification of a randomly drawn member of S.
 P+ = 1: the receiver knows the class, no message is sent, Entropy = 0.
 P+ = 0.5: one bit is needed.
 An optimal-length code assigns –log2 p bits to a message with probability p.
 The idea is to assign shorter codes to the more probable messages
and longer codes to the less likely ones.
 Thus, the expected number of bits to encode + or – for a random
member of S is:
 H(S) = p+ (–log2 p+) + p– (–log2 p–)
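 As a quick check of these values, a minimal sketch (binary_entropy is an
illustrative helper name, not from the slides):

```python
import math

def binary_entropy(p_pos):
    """H(S) = p+ * (-log2 p+) + p- * (-log2 p-), treating 0 * log2 0 as 0."""
    h = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:
            h -= p * math.log2(p)
    return h

print(binary_entropy(1.0))      # 0.0   -> class is certain, no bits needed
print(binary_entropy(0.5))      # 1.0   -> one bit needed
print(binary_entropy(29 / 64))  # ~0.993, the H(S) used on the next slides
```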
How to Generate a Decision Tree(5/13)
 Information Gain
 Measures the expected reduction in entropy caused by partitioning
the examples according to the given attribute
 IG(S|A): the number of bits saved when encoding the target value of
an arbitrary member of S, knowing the value of attribute A.
 Expected reduction in entropy caused by knowing the value of A
 IG(S|A) = H(S) – Σj Prob(A=vj) H(S | A=vj)
How to Generate a Decision Tree(6/13)
 Which is the best choice?
 We have 29 positive examples and 35 negative ones
 Should attribute A1 or attribute A2 be used to split at this node?
IG(A1) = 0.993 – 26/64 * 0.70 – 38/64 * 0.74 ≈ 0.27
IG(A2) = 0.993 – 51/64 * 0.93 – 13/64 * 0.61 ≈ 0.13
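 The sketch below recomputes both gains from the quantities quoted above, assuming
the rounded subset entropies from the slide and taking the A1 weights as 26/64 and
38/64 so that they sum to 1:

```python
import math

def entropy(pos, neg):
    """Entropy of a node with `pos` positive and `neg` negative examples."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            h -= (n / total) * math.log2(n / total)
    return h

def gain(parent_entropy, weighted_children):
    """IG(S|A) = H(S) - sum_j Prob(A=v_j) * H(S | A=v_j)."""
    return parent_entropy - sum(w * h for w, h in weighted_children)

h_s = entropy(29, 35)                                 # ~0.993
print(gain(h_s, [(26/64, 0.70), (38/64, 0.74)]))      # ~0.27 for A1
print(gain(h_s, [(51/64, 0.93), (13/64, 0.61)]))      # ~0.13 for A2
```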
How to Generate a Decision Tree(7/13)
 Specific Conditional Entropy H(Y|X=v)
 Y is class, X is attribute and v is value of X
 H(Y |X=v) = The entropy of Y among only those records in which X
has value v
 H(Class | Travel Cost = Cheap) =
–0.8 * log2 0.8 – 0.2 * log2 0.2 = 0.722
 H(Class | Travel Cost = Expensive) =
–1 * log2 1 = 0
 H(Class | Travel Cost = Standard) =
–1 * log2 1 = 0
How to Generate a Decision Tree(8/13)
 Conditional Entropy H(Y|X)
 H(Y|X) = the average specific conditional entropy of Y
= Σj Prob(X=vj) H(Y | X = vj)
 e.g. H(Class|Travel Cost) =
prob(Travel Cost=Cheap) * H(Class|Travel Cost=Cheap) +
prob(Travel Cost=Expensive) * H(Class|Travel Cost=Expensive) +
prob(Travel Cost=Standard) * H(Class|Travel Cost=Standard)
= 0.5 * 0.722 + 0.3 * 0 + 0.2 * 0 = 0.361
How to Generate a Decision Tree(9/13)
 Information Gain IG(Y|X)
 IG(Y|X) = H(Y) - H(Y | X)
 e.g.
 H(Class) = – 0.4 log2 (0.4) – 0.3 log2 (0.3) – 0.3 log2 (0.3) = 1.571
 IG(Class|Travel Cost) = H(Class) – H(Class|Travel Cost)
= 1.571 – 0.361 = 1.210
 Results of the first iteration (the Travel Cost figures are recomputed in the sketch below):

Gain   Gender   Car Ownership   Travel Cost ($)/km   Income Level
IG     0.125    0.534           1.210                0.695
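 The Travel Cost figures can be recomputed directly from the training table; a
minimal sketch follows, assuming a tuple encoding of the table with Travel Cost at
index 2 and the class last:

```python
import math
from collections import Counter

# (Gender, Car Ownership, Travel Cost, Income Level, Transportation Mode)
data = [
    ("Male",   0, "Cheap",     "Low",    "Bus"),
    ("Male",   1, "Cheap",     "Medium", "Bus"),
    ("Female", 1, "Cheap",     "Medium", "Train"),
    ("Female", 0, "Cheap",     "Low",    "Bus"),
    ("Male",   1, "Cheap",     "Medium", "Bus"),
    ("Female", 0, "Standard",  "Medium", "Train"),
    ("Female", 1, "Standard",  "Medium", "Train"),
    ("Female", 1, "Expensive", "High",   "Car"),
    ("Male",   2, "Expensive", "Medium", "Car"),
    ("Female", 2, "Expensive", "High",   "Car"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

classes = [row[-1] for row in data]
h_class = entropy(classes)                                    # ~1.571

# H(Class | Travel Cost) = sum over values v of P(v) * H(Class | Travel Cost = v)
h_cond = 0.0
for value in {row[2] for row in data}:
    subset = [row[-1] for row in data if row[2] == value]
    h_cond += (len(subset) / len(data)) * entropy(subset)     # Cheap: 0.722, others: 0

print(round(h_class, 3), round(h_cond, 3), round(h_class - h_cond, 3))  # 1.571 0.361 1.21
```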
How to Generate a Decision Tree(10/13)
 Root node: Travel Cost ($)/km, the attribute with the highest information gain
 Split the records into the Cheap, Standard and Expensive branches
How to Generate a Decision Tree(11/13)
 Second iteration: recurse on the Cheap branch, which is still impure (4 Bus, 1 Train)
How to Generate a Decision Tree(12/13)
 Results of the second iteration (on the Cheap branch; recomputed in the sketch below)
 Split node: Gender has the highest gain
 Update the decision tree

Gain   Gender   Car Ownership   Income Level
IG     0.322    0.171           0.171
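 These gains can be verified on the five Travel Cost = Cheap rows; a small sketch,
assuming the same tuple encoding as before but without the Travel Cost column:

```python
import math
from collections import Counter

# The Travel Cost = Cheap rows: (Gender, Car Ownership, Income Level, Transportation Mode)
cheap = [
    ("Male",   0, "Low",    "Bus"),
    ("Male",   1, "Medium", "Bus"),
    ("Female", 1, "Medium", "Train"),
    ("Female", 0, "Low",    "Bus"),
    ("Male",   1, "Medium", "Bus"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    g = entropy(labels)                        # 0.722 for this 4-Bus / 1-Train node
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        g -= (len(subset) / len(rows)) * entropy(subset)
    return g

for name, idx in [("Gender", 0), ("Car Ownership", 1), ("Income Level", 2)]:
    print(name, round(gain(cheap, idx), 3))    # 0.322, 0.171, 0.171
```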
How to Generate a Decision Tree(13/13)
 Third iteration: split the remaining impure node (the Cheap, Female branch)
 Update the decision tree
To Sum Up
 ID3 is a strong system that
 Uses hill-climbing search based on the information gain measure
to search through the space of decision trees
 Outputs a single hypothesis
 Never backtracks. It converges to locally optimal solutions
 Uses all training examples at each step, contrary to methods that
make decisions incrementally
 Uses statistical properties of all examples: the search is less
sensitive to errors in individual training examples
Some Drawbacks
 It can only deal with nominal data
 It may not be robust in the presence of noise
 It is not able to deal with noisy data sets
References
 Tutorial on Decision Tree,
http://people.revoledu.com/kardi/tutorial/DecisionTree/index.html
 Information Gain,
http://www.autonlab.org/tutorials/infogain11.pdf
 http://www.slideshare.net/aorriols/lecture5-c45