Entropy and Information Gain in Decision Tree
OMega TechEd
Entropy
Entropy is a machine learning metric that measures the unpredictability, or impurity, of the information in a dataset. It determines how a decision tree chooses to split the data.
[Figure: a high-entropy (mixed-class) dataset vs. a low-entropy (mostly pure) dataset]
Entropy
A random variable with only one value (a coin that always comes up heads) has no uncertainty, so its entropy is defined as zero; observing its value gives us no information.
For a two-class dataset, entropy lies between 0 and 1; with more classes it can exceed 1 (its maximum is log2 of the number of classes).
In general, the entropy of a random variable V with values vk, each with probability P(vk), is defined as:
H(V) = −Σk P(vk) log2 P(vk)
Entropy of a fair coin flip:
H(Fair) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
How to calculate Entropy?
H(V) = −Σk P(vk) log2 P(vk)
Example:
If we had a total of 10 data points in our dataset, with 3 belonging to the positive class and 7 to the negative class:
−3/10 × log2(3/10) − 7/10 × log2(7/10) ≈ 0.881
The entropy is approximately 0.88.
High entropy means a low level of purity.
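As an illustration (not part of the original slides), the calculation above can be sketched in a few lines of Python; the function name entropy and the use of math.log2 are my own choices:

```python
import math

def entropy(counts):
    """Entropy H = -sum(p * log2(p)) over the class proportions in `counts`."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # skip empty classes; 0 * log2(0) is treated as 0
            p = c / total
            h -= p * math.log2(p)
    return h

print(entropy([3, 7]))   # ~0.881 (the 3-positive / 7-negative example)
print(entropy([5, 5]))   # 1.0    (evenly split)
print(entropy([10, 0]))  # 0.0    (pure node)
```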
Entropy (Cont.)
Different cases
[Figure: three example datasets — entropy = 0 (pure), entropy = 1 (evenly mixed), entropy = 0.88 (3:7 mix)]
If a dataset contains equal numbers of positive and negative data points, its entropy is 1.
If a dataset contains only positive or only negative data points, its entropy is 0.
Information Gain
Information gain is the reduction in entropy achieved by splitting the dataset on an attribute.
Mathematically, information gain can be expressed with the formula below:
Information Gain = (entropy of parent node) − (weighted average entropy of child nodes)
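As a hedged sketch (my own illustration, building on the entropy helper above), the formula can be written in Python; children_counts is assumed to hold one class-count list per branch of the split:

```python
def information_gain(parent_counts, children_counts):
    """IG = H(parent) - weighted average of H(child) over the branches."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted
```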
Decision tree using information gain
1. Select the attribute with the highest information gain as the root (parent) node.
2. Build a child node for every value of that attribute A.
3. Repeat recursively on each child until the whole tree is constructed (a sketch follows below).
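A compact recursive sketch of these three steps, reusing entropy and information_gain from the snippets above. The data layout (a list of (features_dict, label) pairs) and names such as build_tree are illustrative assumptions, not part of the slides:

```python
from collections import Counter

def best_attribute(rows, attributes):
    """Step 1: pick the attribute with the highest information gain."""
    parent = list(Counter(label for _, label in rows).values())
    def gain(attr):
        branches = {}
        for features, label in rows:
            branches.setdefault(features[attr], Counter())[label] += 1
        children = [list(c.values()) for c in branches.values()]
        return information_gain(parent, children)
    return max(attributes, key=gain)

def build_tree(rows, attributes):
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attributes:  # pure node, or nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes)      # step 1
    subtree = {}
    for value in {features[attr] for features, _ in rows}:
        subset = [(f, l) for f, l in rows if f[attr] == value]                      # step 2
        subtree[value] = build_tree(subset, [a for a in attributes if a != attr])   # step 3
    return (attr, subtree)
```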
Choosing the best attribute
We need a measure of "good" and "bad" for attributes. One way to do this is to compute the information gain.
Example:
At the root node of the restaurant problem, there are 6 True samples and 6 False samples.
Entropy(Parent) = 1
[Figure: split on Patrons — root: 6 positive / 6 negative; None (2): 0 positive / 2 negative; Some (4): 4 positive / 0 negative; Full (6): 2 positive / 4 negative]
Choosing the best attribute
E(Patrons=None) = 0
E(Patrons=Some) = 0
E(Patrons=Full)
= −2/6 × log2(2/6) − 4/6 × log2(4/6)
= −1/3 × (−1.59) − 2/3 × (−0.59)
= 0.53 + 0.39 ≈ 0.92
Weighted average of entropy over the child nodes:
E(Patrons) = 2/12 × 0 + 4/12 × 0 + 6/12 × 0.92 ≈ 0.46
Choosing the best attribute
Information Gain = (entropy of parent node) − (weighted average entropy of child nodes)
Gain(Patrons) = 1 − 0.46 ≈ 0.54
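Using the entropy helper sketched earlier, these numbers can be checked directly; the (positive, negative) counts below are read off the Patrons split:

```python
parent = entropy([6, 6])                  # 1.0
branches = [[0, 2], [4, 0], [2, 4]]       # None, Some, Full
e_patrons = sum(sum(b) / 12 * entropy(b) for b in branches)
print(round(e_patrons, 2))                # 0.46
print(round(parent - e_patrons, 2))       # 0.54 -> Gain(Patrons)
```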
Choosing the best attribute
E(Type=French) = 1
E(Type=Italian) = 1
E(Type=Thai) = 1
E(Type=Burger) =1
Weighted average of entropy over the child nodes:
E(Type) = 2/12 × 1 + 2/12 × 1 + 4/12 × 1 + 4/12 × 1 = 1
[Figure: split on Type — root: 6 positive / 6 negative; French (2): 1 positive / 1 negative; Italian (2): 1 positive / 1 negative; Thai (4): 2 positive / 2 negative; Burger (4): 2 positive / 2 negative]
Choosing the best attribute
Information Gain = (entropy of parent node) − (weighted average entropy of child nodes)
Gain(Type) = 1 − 1 = 0
This confirms that Patrons is a better attribute than Type; in fact, Patrons gives the highest information gain of any attribute at the root.
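The same check for Type (again a sketch using the entropy helper above) confirms the comparison:

```python
branches = [[1, 1], [1, 1], [2, 2], [2, 2]]   # French, Italian, Thai, Burger
e_type = sum(sum(b) / 12 * entropy(b) for b in branches)
print(e_type)                                 # 1.0, so Gain(Type) = 1 - 1 = 0
```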
Thank you
Reference:
Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd ed.
