Machine Learning
Submitted To
Neelam Ma'am
Assistant Prof.
SCRIET, Meerut
Submitted By
Ravindra Singh Kushwaha
B.Tech (IT), 8th Sem
SCRIET, Meerut
Issues in Decision Tree Learning
Issues in Decision Tree Learning
• Overfitting
• Incorporating continuous-valued attributes
• Attributes with many values
• Handling attributes with costs
• Handling examples with missing attribute values
Overfitting
• Consider a hypothesis h evaluated over
• the training data: error_train(h)
• the entire distribution D of data: error_D(h)
• The hypothesis h ∈ H overfits the training data if there is an alternative hypothesis h' ∈ H such that
• error_train(h) < error_train(h') AND
• error_D(h) > error_D(h')
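To make the definition concrete, here is a minimal sketch (not from the slides) that exhibits overfitting in a decision tree: as the tree is allowed to grow, training accuracy keeps rising while accuracy on held-out data eventually falls. It assumes scikit-learn is installed; the synthetic dataset and the depth values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data; the held-out split stands in for the distribution D.
X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

for depth in (1, 2, 4, 8, 16, None):   # None = grow the full, unpruned tree
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, h.score(X_train, y_train), h.score(X_val, y_val))

# Deeper trees drive training accuracy toward 1.0 while held-out accuracy
# eventually degrades: error_train(h) < error_train(h') but error_D(h) > error_D(h').
```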
Overfitting in decision tree learning
Avoiding Overfitting
• Causes
1. The training data contains errors or noise.
2. Small numbers of training examples are associated with leaf nodes, so coincidental regularities get learned.
• Avoiding overfitting
1. Stop growing the tree when a data split is not statistically significant (pre-pruning).
2. Grow the full tree, then post-prune it.
• Selecting the best tree
1. Measure performance over the training data.
2. Measure performance over a separate validation data set.
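A minimal sketch combining the two ideas above — pre-pruning via a stopping criterion and selecting the best tree on separate validation data — again assuming scikit-learn (the slides name no library); min_samples_leaf and the candidate values are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=0)

# Each candidate tree stops growing once its leaves would get too small.
candidates = [DecisionTreeClassifier(min_samples_leaf=m, random_state=0).fit(X_train, y_train)
              for m in (1, 2, 5, 10, 20, 50)]

# Select the best tree on the validation data, not the training data.
best = max(candidates, key=lambda h: h.score(X_val, y_val))
print(best.get_params()["min_samples_leaf"], best.score(X_val, y_val))
```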
Reduced-Error Pruning
• Split the data into training and validation sets
• Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (replacing its subtree with a leaf labeled with the majority class)
2. Greedily remove the node whose removal most improves validation-set accuracy (see the sketch below)
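A minimal, self-contained sketch of that loop on a hand-rolled tree. The Node class, the classify/accuracy helpers, and the strict-improvement stopping rule are illustrative choices, not taken from the slides; validation is a list of (example_dict, label) pairs.

```python
class Node:
    """A toy decision-tree node: attr is None for leaves."""
    def __init__(self, attr=None, branches=None, label=None):
        self.attr = attr                  # attribute tested at this node
        self.branches = branches or {}    # attribute value -> child Node
        self.label = label                # majority class at this node

def classify(node, example):
    while node.attr is not None:
        child = node.branches.get(example.get(node.attr))
        if child is None:                 # unseen value: fall back to majority class
            return node.label
        node = child
    return node.label

def accuracy(tree, examples):
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)

def internal_nodes(node):
    if node.attr is not None:
        yield node
        for child in node.branches.values():
            yield from internal_nodes(child)

def reduced_error_prune(root, validation):
    """Greedily turn internal nodes into leaves while validation accuracy improves."""
    while True:
        base = accuracy(root, validation)
        best_gain, best_node = 0.0, None
        for node in internal_nodes(root):
            saved, node.attr = node.attr, None         # temporarily prune to a leaf
            gain = accuracy(root, validation) - base
            node.attr = saved                          # undo the temporary prune
            if gain > best_gain:
                best_gain, best_node = gain, node
        if best_node is None:                          # further pruning is harmful
            return root
        best_node.attr, best_node.branches = None, {}  # prune permanently
```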
Effect of Reduced-Error Pruning
Rule Post-Pruning
• The major drawback of Reduced-Error Pruning is that when data is limited, holding out a validation set further reduces the number of examples available for training.
Hence Rule Post-Pruning:
• Convert the tree to an equivalent set of rules (one rule per root-to-leaf path)
• Prune each rule independently of the others
• Sort the final rules into the desired sequence for use
Converting a tree to rules
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
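A minimal sketch of the conversion, reusing the toy Node class from the pruning sketch above; the helper names and the PlayTennis target are illustrative. Each root-to-leaf path becomes one rule, and each rule can then be pruned independently by dropping any precondition whose removal does not hurt estimated accuracy.

```python
def tree_to_rules(node, preconditions=()):
    """Yield one (preconditions, label) rule per root-to-leaf path."""
    if node.attr is None:
        yield list(preconditions), node.label
        return
    for value, child in node.branches.items():
        yield from tree_to_rules(child, preconditions + ((node.attr, value),))

def format_rule(preconditions, label, target="PlayTennis"):
    body = " ∧ ".join(f"({a} = {v})" for a, v in preconditions)
    return f"IF {body} THEN {target} = {label}" if body else f"THEN {target} = {label}"

# for pre, label in tree_to_rules(root):
#     print(format_rule(pre, label))
```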
Continuous-Valued Attributes
• Create a new discrete-valued attribute that tests the continuous one against a threshold, e.g. (Temperature > c); candidate thresholds are chosen to maximize information gain
• So if Temperature = 75 falls on the side of the chosen threshold associated with the positive examples,
• we can infer that PlayTennis = Yes
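A minimal sketch of the discretization step: candidate thresholds are the midpoints between adjacent sorted values where the class label changes, and the one with the highest information gain is kept. The temperature/PlayTennis values below follow the classic textbook example and are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels, threshold):
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))
    # Midpoints between adjacent values whose labels differ are the only
    # thresholds worth testing.
    candidates = [(v1 + v2) / 2 for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]) if y1 != y2]
    return max(candidates, key=lambda t: info_gain(values, labels, t))

temperature = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temperature, play))   # 54.0, so the new attribute is (Temperature > 54)
```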
Attributes with many values
• Problem:
• If an attribute has many values, Gain will tend to select it even when it generalizes poorly
• Example: using a Date attribute, which splits the training data into very small subsets
• One approach: Gain Ratio
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
SplitInformation(S, A) = −Σ_{i=1..c} (|S_i| / |S|) · log2(|S_i| / |S|)
where S_i is the subset of S for which attribute A has value v_i
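A minimal sketch of the gain-ratio computation; the entropy helper is repeated from the threshold sketch above, and the guard against a zero split information is an illustrative choice.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_information(values):
    # Entropy of S with respect to the values of attribute A itself.
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(values, labels):
    n = len(labels)
    by_value = defaultdict(list)
    for v, y in zip(values, labels):
        by_value[v].append(y)
    gain = entropy(labels) - sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    si = split_information(values)
    return gain / si if si > 0 else 0.0   # a single-valued attribute gets no credit
```

A Date-like attribute with a distinct value per example has a very large SplitInformation, so its gain ratio is penalized even though its raw Gain is maximal.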
Attributes with costs
• Problem:
• Medical diagnosis: BloodTest has cost $150
• Robotics: Width_from_1ft has cost 23 sec
• One approach: replace Gain with a cost-sensitive measure
• Tan and Schlimmer (1990): Gain²(S, A) / Cost(A)
• Nunez (1988): (2^Gain(S, A) − 1) / (Cost(A) + 1)^w
• where w ∈ [0, 1] is a constant that determines the relative importance of cost versus information gain
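A minimal sketch of the two cost-sensitive measures named above; gain and cost are assumed to be computed elsewhere (e.g. by the information-gain helpers in the earlier sketches), and the default w is illustrative.

```python
def tan_schlimmer(gain, cost):
    # Gain^2(S, A) / Cost(A)
    return gain ** 2 / cost

def nunez(gain, cost, w=0.5):
    # (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, where w in [0, 1] trades off
    # cost against information gain.
    return (2 ** gain - 1) / (cost + 1) ** w
```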
Examples with missing attribute values
• What if some examples are missing values of attribute A?
• Use the training examples anyway and sort them through the tree:
• If node n tests A, assign the example the most common value of A among the examples at node n
• Or assign a probability p_i to each possible value v_i of A, and pass a fraction p_i of the example down each corresponding branch
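A minimal sketch of the second strategy (fractional examples) at classification time, reusing the toy Node class from the pruning sketch; value_probs, a per-node map of training-time value frequencies, is an illustrative field not defined in the slides.

```python
from collections import Counter

def classify_with_missing(node, example, weight=1.0, votes=None):
    votes = Counter() if votes is None else votes
    if node.attr is None:
        votes[node.label] += weight        # a leaf collects this fraction of the example
        return votes
    value = example.get(node.attr)
    if value is not None and value in node.branches:
        return classify_with_missing(node.branches[value], example, weight, votes)
    # Missing (or unseen) value: send a fraction p_i of the example down each branch.
    for v, child in node.branches.items():
        p = getattr(node, "value_probs", {}).get(v, 1.0 / len(node.branches))
        classify_with_missing(child, example, weight * p, votes)
    return votes

# The prediction is the class with the largest accumulated fractional weight:
# predicted = classify_with_missing(root, example).most_common(1)[0][0]
```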
Some Recent Applications
• Gesture recognition
• Motion detection
• Xbox 360 Kinect
Thank You