WEKA: Practical Machine Learning Tools and Techniques
Decision Trees: Dealing with numeric attributes
- Standard method: binary splits
- Steps to decide where to split:
  - Evaluate the information gain for every possible split point of the attribute
  - Choose the "best" split point
- This exhaustive evaluation is computationally intensive
Decision Trees: Example
Split on the temperature attribute:

  64  65  68  69  70  71 | 72  72  75  75  80  81  83  85
  Yes No  Yes Yes Yes No | No  Yes Yes Yes No  Yes Yes No

temperature < 71.5: 4 yes, 2 no
temperature > 71.5: 5 yes, 3 no

Info([4,2],[5,3]) = (6/14) info([4,2]) + (8/14) info([5,3]) = 0.939 bits
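A minimal Python sketch (not WEKA code) of the computation above: it scores every candidate split point by the weighted entropy of the two subsets it creates, reproducing the 0.939 bits for a split at 71.5.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution, e.g. [4, 2]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

temps  = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = ["y", "n", "y", "y", "y", "n", "n",
          "y", "y", "y", "n", "y", "y", "n"]

def split_info(threshold):
    """Weighted average entropy after a binary split at `threshold`."""
    below = [l for t, l in zip(temps, labels) if t < threshold]
    above = [l for t, l in zip(temps, labels) if t >= threshold]
    info = 0.0
    for part in (below, above):
        counts = [part.count("y"), part.count("n")]
        info += len(part) / len(temps) * entropy(counts)
    return info

# Evaluate every candidate split point (midpoints between distinct values)
candidates = sorted({(a + b) / 2 for a, b in zip(temps, temps[1:]) if a != b})
for c in candidates:
    print(f"split at {c}: {split_info(c):.3f} bits")
# split at 71.5 -> 0.939 bits, matching the slide
```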
Decision Trees: Dealing with missing values
- Split instances with missing values into pieces
- A piece going down a branch receives a weight proportional to the popularity of the branch
- The weights sum to 1
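An illustrative sketch of that weighting scheme (the function and branch counts are my own, not from the slides):

```python
def split_missing(instance_weight, branch_counts):
    """Distribute an instance with a missing attribute value across
    branches, in proportion to how many training instances took each
    branch (the branch's "popularity")."""
    total = sum(branch_counts.values())
    return {branch: instance_weight * count / total
            for branch, count in branch_counts.items()}

# e.g. 6 training instances went left and 8 went right at this node
pieces = split_missing(1.0, {"left": 6, "right": 8})
print(pieces)  # {'left': 0.428..., 'right': 0.571...} -- weights sum to 1
```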
Decision Trees: Pruning
- Pruning makes the decision tree less complex by removing branches that overfit the training data
- Two types of pruning:
  - Prepruning: deciding during tree building
  - Postpruning: pruning after the tree has been constructed
- The two postpruning operations generally used are:
  - Subtree replacement
  - Subtree raising
- To decide whether to postprune, compare the error rate before and after pruning
Decision Trees: Subtree raising (figure omitted in this transcript)
Decision Trees: Subtree replacement (figure omitted in this transcript)
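A hedged sketch of the subtree-replacement decision, in the reduced-error style suggested by the error-rate comparison above; the predictor representation is my own simplification:

```python
from collections import Counter

def error_count(predict, data):
    """Misclassifications of a predictor on (instance, label) pairs."""
    return sum(1 for x, y in data if predict(x) != y)

def maybe_replace(subtree_predict, pruning_data):
    """Subtree replacement: if a single leaf predicting the majority
    class does no worse than the subtree on a pruning set, prefer the
    leaf (a simpler tree with the same or lower estimated error)."""
    majority = Counter(y for _, y in pruning_data).most_common(1)[0][0]
    leaf_predict = lambda x: majority
    if error_count(leaf_predict, pruning_data) <= error_count(subtree_predict, pruning_data):
        return leaf_predict   # replace the subtree with a leaf
    return subtree_predict    # keep the subtree
```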
Classification rules: Criteria for choosing tests
- p/t ratio: maximizes the proportion of positive instances covered, with the stress on accuracy
- Information gain, p[log(p/t) - log(P/T)]: maximizes the number of positive instances covered, at some cost in accuracy
  (p and t are the positive and total instances covered after adding the test; P and T are the counts before)
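A small sketch contrasting the two criteria on hypothetical candidate tests (the counts are invented for illustration):

```python
import math

def accuracy_criterion(p, t):
    """p/t: fraction of covered instances that are positive."""
    return p / t

def info_gain_criterion(p, t, P, T):
    """p * [log2(p/t) - log2(P/T)]: gain weighted by coverage p,
    favouring tests that keep many positive instances."""
    return p * (math.log2(p / t) - math.log2(P / T))

# candidate tests starting from a rule covering P=10 positives of T=20
for p, t in [(9, 10), (6, 6)]:
    print(p, t, accuracy_criterion(p, t),
          round(info_gain_criterion(p, t, 10, 20), 3))
# (6,6) wins on accuracy (1.0); (9,10) wins on information gain (7.63 > 6.0)
```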
Classification rules: Generating good rules
- Overfitting can be removed either by pruning rules during construction or after they have been fully constructed
- To prune during construction, check each newly added test: if the error rate on the pruning set increases because of the new test, remove the test
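A sketch of that incremental check, assuming a hypothetical rule_error helper that measures a rule's error rate on a held-out pruning set:

```python
def grow_rule(candidate_tests, rule_error):
    """Prune while growing: add tests one at a time, keeping a test
    only if it does not increase the error rate on the pruning set.
    `rule_error(tests)` is assumed to evaluate the conjunction of
    tests on that pruning set and return an error rate."""
    rule = []
    for test in candidate_tests:
        if rule_error(rule + [test]) <= rule_error(rule):
            rule.append(test)   # the test helps (or is neutral): keep it
        # otherwise discard the test and try the next candidate
    return rule
```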
Classification rules: Obtaining rules from partial decision trees (algorithm listing omitted in this transcript)
Classification rules: Partial-tree example (figure omitted in this transcript)
- Because node 4 was not replaced, expansion stops at this stage
- Each leaf node now gives a possible rule
- Choose the leaf that covers the greatest number of instances
Extending linear models: Support vector machines
- Support vector machines are algorithms for learning linear classifiers
- They use the maximum-margin hyperplane, which helps avoid overfitting
- The instances closest to the maximum-margin hyperplane are the support vectors; all other instances can be ignored
Extending linear models (figure omitted in this transcript)
Extending linear models: Support vector machines
- The hyperplane can be written as x = b + Σ α(i) y(i) (a(i) · a), summed over the support vectors
- Support vectors: all instances for which α(i) > 0
- b and the α(i) are determined using software (optimization) packages
- Using a kernel K, the hyperplane can also be written as x = b + Σ α(i) y(i) K(a(i), a)
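A hedged numpy sketch of the kernel form above; the support vectors, multipliers, bias, and polynomial kernel are all made-up stand-ins for what an optimization package would return:

```python
import numpy as np

def poly_kernel(u, v, d=2):
    """Polynomial kernel K(u, v) = (u . v)^d -- one common choice."""
    return (u @ v) ** d

def svm_decision(a, support_vecs, alphas, ys, b, kernel=poly_kernel):
    """Kernel form of the hyperplane:
       x = b + sum_i alpha_i * y_i * K(a(i), a), over support vectors."""
    return b + sum(al * y * kernel(sv, a)
                   for sv, al, y in zip(support_vecs, alphas, ys))

# toy values standing in for an optimizer's output
svs    = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
alphas = [0.7, 0.4]
ys     = [+1, -1]
print(svm_decision(np.array([0.5, 1.0]), svs, alphas, ys, b=0.1))
```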
Extending linear models: Multilayer perceptron
- A network of perceptrons can approximate arbitrary target concepts
- The multilayer perceptron is an example of an artificial neural network
- It consists of an input layer, one or more hidden layers, and an output layer
- The structure of an MLP is usually found by experimentation
- Its parameters (the weights) can be found using backpropagation
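A minimal sketch of the layered structure: a forward pass through a one-hidden-layer MLP with made-up weights (training those weights is the job of backpropagation, covered below):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, hidden_w, output_w):
    """Forward pass of a one-hidden-layer MLP: each hidden unit is a
    sigmoid perceptron over the inputs, and the output unit is a
    sigmoid perceptron over the hidden activations."""
    hidden = [sigmoid(sum(w * a for w, a in zip(ws, inputs)))
              for ws in hidden_w]
    return sigmoid(sum(w * h for w, h in zip(output_w, hidden)))

# toy network: 2 inputs, 2 hidden units, 1 output (arbitrary weights)
print(mlp_forward([1.0, 0.0],
                  hidden_w=[[2.0, -1.0], [-1.5, 3.0]],
                  output_w=[1.0, -1.0]))
```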
Extending linear models: Examples (network diagrams omitted in this transcript)
Extending linear models: Backpropagation
- Sigmoid activation: f(x) = 1/(1 + exp(-x))
- Squared error: E = 1/2 (y - f(x))^2
- Minimizing the error gives the gradient dE/dw(i) = -(y - f(x)) f(x)(1 - f(x)) a(i), where x = Σ w(i) a(i)
- Compute this expression for all training instances and update each weight: w(i) = w(i) - L (dE/dw(i)), where L is the learning rate
- The weights w are given (e.g., small random) starting values
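A runnable sketch of exactly that update rule, restricted to a single sigmoid unit to keep it short (a full MLP applies the same chain rule layer by layer):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(data, n_inputs, L=0.5, epochs=1000):
    """Gradient descent for one sigmoid unit, following the slide:
    E = 1/2 (y - f(x))^2 with x = sum_i w_i * a_i, so
    dE/dw_i = -(y - f(x)) * f(x) * (1 - f(x)) * a_i
    and the update is w_i <- w_i - L * dE/dw_i."""
    w = [random.uniform(-0.05, 0.05) for _ in range(n_inputs)]
    for _ in range(epochs):
        for a, y in data:
            fx = sigmoid(sum(wi * ai for wi, ai in zip(w, a)))
            grad = [-(y - fx) * fx * (1 - fx) * ai for ai in a]
            w = [wi - L * gi for wi, gi in zip(w, grad)]
    return w

# toy example: learn OR (first input is a bias fixed at 1)
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = train_unit(data, n_inputs=3)
print([round(sigmoid(sum(wi * ai for wi, ai in zip(w, a))), 2)
       for a, _ in data])  # approaches [0, 1, 1, 1]
```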
Clustering: Incremental clustering steps
- The tree starts as an empty root node
- Add instances one by one, updating the tree appropriately at each stage
- To update, find the right leaf for the instance; this may involve restructuring the tree
- Restructuring operations: merging and replacement
- Decisions are made using the category utility measure
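A sketch of category utility for nominal attributes, under the assumption that the slides use the standard form CU = (1/k) Σ_l P(C_l) Σ_{i,j} [P(a_i=v_ij|C_l)^2 - P(a_i=v_ij)^2]:

```python
def category_utility(clusters, attrs):
    """Category utility: how much knowing an instance's cluster
    improves the ability to guess its attribute values, averaged
    over the k clusters. `clusters` is a list of lists of instances,
    each instance a dict mapping attribute name -> nominal value."""
    all_insts = [x for c in clusters for x in c]
    n, k = len(all_insts), len(clusters)
    total = 0.0
    for cluster in clusters:
        p_cl = len(cluster) / n
        gain = 0.0
        for a in attrs:
            for v in {x[a] for x in all_insts}:
                p_given = sum(x[a] == v for x in cluster) / len(cluster)
                p_plain = sum(x[a] == v for x in all_insts) / n
                gain += p_given ** 2 - p_plain ** 2
        total += p_cl * gain
    return total / k

# two toy clusters over one nominal attribute
c1 = [{"outlook": "sunny"}, {"outlook": "sunny"}]
c2 = [{"outlook": "rainy"}, {"outlook": "rainy"}]
print(category_utility([c1, c2], ["outlook"]))  # 0.25: clusters are informative
```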
Clustering: Example of incremental clustering (figure omitted in this transcript)
EM Algorithm
- EM = Expectation-Maximization; it generalizes k-means to a probabilistic setting
- Iterative procedure:
  - E ("expectation") step: calculate the cluster probability for each instance
  - M ("maximization") step: estimate the distribution parameters from those cluster probabilities
- Cluster probabilities are stored as instance weights
- Stop when the improvement is negligible
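A self-contained sketch of EM for a two-component 1-D Gaussian mixture (my own toy example; it uses a fixed iteration count rather than the negligible-improvement test):

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, iters=50):
    """E step: cluster probabilities (weights) for each instance.
    M step: re-estimate means, variances, and priors using those
    probabilities as instance weights."""
    mu = [min(xs), max(xs)]          # crude initialization
    sigma = [1.0, 1.0]
    prior = [0.5, 0.5]
    for _ in range(iters):
        # E step: probability that each instance belongs to each cluster
        w = []
        for x in xs:
            p = [prior[k] * gauss(x, mu[k], sigma[k]) for k in range(2)]
            s = sum(p)
            w.append([pk / s for pk in p])
        # M step: weighted parameter estimates
        for k in range(2):
            wk = sum(wi[k] for wi in w)
            mu[k] = sum(wi[k] * x for wi, x in zip(w, xs)) / wk
            sigma[k] = math.sqrt(sum(wi[k] * (x - mu[k]) ** 2
                                     for wi, x in zip(w, xs)) / wk) or 1e-6
            prior[k] = wk / len(xs)
    return mu, sigma, prior

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
print(em_two_gaussians(xs))  # means approach ~1.0 and ~5.1
```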
Visit more self-help tutorials
- Pick a tutorial of your choice and browse through it at your own pace
- The tutorials section is free, self-guiding, and does not involve any additional support
- Visit us at www.dataminingtools.net
