WEKA: Practical Machine Learning Tools and Techniques
Decision Trees: Dealing with numeric attributes
- Standard method: binary splits
- Steps to decide where to split:
  - Evaluate the information gain for every possible split point of the attribute
  - Choose the "best" split point
- This exhaustive evaluation is computationally intensive
Decision Trees: Example
Split on the temperature attribute:

  64  65  68  69  70  71 | 72  72  75  75  80  81  83  85
  Yes No  Yes Yes Yes No | No  Yes Yes Yes No  Yes Yes No

temperature < 71.5: 4 yes, 2 no
temperature > 71.5: 5 yes, 3 no

Info([4,2],[5,3]) = (6/14) info([4,2]) + (8/14) info([5,3]) = 0.939 bits
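A minimal Python sketch (not WEKA code) of the computation above: it scores every candidate split point by the weighted entropy of the two subsets it creates, reproducing the 0.939 bits for a split at 71.5.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution, e.g. [4, 2]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

temps  = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = ["y", "n", "y", "y", "y", "n", "n",
          "y", "y", "y", "n", "y", "y", "n"]

def split_info(threshold):
    """Weighted average entropy after a binary split at `threshold`."""
    below = [l for t, l in zip(temps, labels) if t < threshold]
    above = [l for t, l in zip(temps, labels) if t >= threshold]
    info = 0.0
    for part in (below, above):
        counts = [part.count("y"), part.count("n")]
        info += len(part) / len(temps) * entropy(counts)
    return info

# Evaluate every candidate split point (midpoints between distinct values)
candidates = sorted({(a + b) / 2 for a, b in zip(temps, temps[1:]) if a != b})
for c in candidates:
    print(f"split at {c}: {split_info(c):.3f} bits")
# split at 71.5 -> 0.939 bits, matching the slide
```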
Decision Trees: Dealing with missing values
- Split instances with missing values into pieces
- A piece going down a branch receives a weight proportional to the popularity of the branch
- The weights sum to 1
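An illustrative sketch of that weighting scheme (the function and branch counts are my own, not from the slides):

```python
def split_missing(instance_weight, branch_counts):
    """Distribute an instance with a missing attribute value across
    branches, in proportion to how many training instances took each
    branch (the branch's "popularity")."""
    total = sum(branch_counts.values())
    return {branch: instance_weight * count / total
            for branch, count in branch_counts.items()}

# e.g. 6 training instances went left and 8 went right at this node
pieces = split_missing(1.0, {"left": 6, "right": 8})
print(pieces)  # {'left': 0.428..., 'right': 0.571...} -- weights sum to 1
```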
Decision Trees: Pruning
- Pruning makes the decision tree less complex by removing branches that overfit the training data
- Two types of pruning:
  - Prepruning: deciding during tree building
  - Postpruning: pruning after the tree has been constructed
- The two postpruning operations generally used are:
  - Subtree replacement
  - Subtree raising
- To decide whether to postprune, compare the error rate before and after pruning
Decision Trees: Subtree raising (figure omitted in this transcript)
Decision Trees: Subtree replacement (figure omitted in this transcript)
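A hedged sketch of the subtree-replacement decision, in the reduced-error style suggested by the error-rate comparison above; the predictor representation is my own simplification:

```python
from collections import Counter

def error_count(predict, data):
    """Misclassifications of a predictor on (instance, label) pairs."""
    return sum(1 for x, y in data if predict(x) != y)

def maybe_replace(subtree_predict, pruning_data):
    """Subtree replacement: if a single leaf predicting the majority
    class does no worse than the subtree on a pruning set, prefer the
    leaf (a simpler tree with the same or lower estimated error)."""
    majority = Counter(y for _, y in pruning_data).most_common(1)[0][0]
    leaf_predict = lambda x: majority
    if error_count(leaf_predict, pruning_data) <= error_count(subtree_predict, pruning_data):
        return leaf_predict   # replace the subtree with a leaf
    return subtree_predict    # keep the subtree
```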
Classification rules: Criteria for choosing tests
- p/t ratio: maximizes the proportion of positive instances covered, with the stress on accuracy
- Information gain, p[log(p/t) - log(P/T)]: maximizes the number of positive instances covered, at some cost in accuracy
  (p and t are the positive and total instances covered after adding the test; P and T are the counts before)
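A small sketch contrasting the two criteria on hypothetical candidate tests (the counts are invented for illustration):

```python
import math

def accuracy_criterion(p, t):
    """p/t: fraction of covered instances that are positive."""
    return p / t

def info_gain_criterion(p, t, P, T):
    """p * [log2(p/t) - log2(P/T)]: gain weighted by coverage p,
    favouring tests that keep many positive instances."""
    return p * (math.log2(p / t) - math.log2(P / T))

# candidate tests starting from a rule covering P=10 positives of T=20
for p, t in [(9, 10), (6, 6)]:
    print(p, t, accuracy_criterion(p, t),
          round(info_gain_criterion(p, t, 10, 20), 3))
# (6,6) wins on accuracy (1.0); (9,10) wins on information gain (7.63 > 6.0)
```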
Classification rules: Generating good rules
- Overfitting can be removed either by pruning rules during construction or after they have been fully constructed
- To prune during construction, check each newly added test: if the error rate on the pruning set increases because of the new test, remove the test
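A sketch of that incremental check, assuming a hypothetical rule_error helper that measures a rule's error rate on a held-out pruning set:

```python
def grow_rule(candidate_tests, rule_error):
    """Prune while growing: add tests one at a time, keeping a test
    only if it does not increase the error rate on the pruning set.
    `rule_error(tests)` is assumed to evaluate the conjunction of
    tests on that pruning set and return an error rate."""
    rule = []
    for test in candidate_tests:
        if rule_error(rule + [test]) <= rule_error(rule):
            rule.append(test)   # the test helps (or is neutral): keep it
        # otherwise discard the test and try the next candidate
    return rule
```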
Classification rules: Obtaining rules from partial decision trees (algorithm listing omitted in this transcript)
Classification rules: Partial-tree example (figure omitted in this transcript)
- Because node 4 was not replaced, expansion stops at this stage
- Each leaf node now gives a possible rule
- Choose the leaf that covers the greatest number of instances
Extending linear models: Support vector machines
- Support vector machines are algorithms for learning linear classifiers
- They use the maximum-margin hyperplane, which helps avoid overfitting
- The instances closest to the maximum-margin hyperplane are the support vectors; all other instances can be ignored
Extending linear models (figure omitted in this transcript)
Extending linear models: Support vector machines
- The hyperplane can be written as x = b + Σ α(i) y(i) (a(i) · a), summed over the support vectors
- Support vectors: all instances for which α(i) > 0
- b and the α(i) are determined using software (optimization) packages
- Using a kernel K, the hyperplane can also be written as x = b + Σ α(i) y(i) K(a(i), a)
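A hedged numpy sketch of the kernel form above; the support vectors, multipliers, bias, and polynomial kernel are all made-up stand-ins for what an optimization package would return:

```python
import numpy as np

def poly_kernel(u, v, d=2):
    """Polynomial kernel K(u, v) = (u . v)^d -- one common choice."""
    return (u @ v) ** d

def svm_decision(a, support_vecs, alphas, ys, b, kernel=poly_kernel):
    """Kernel form of the hyperplane:
       x = b + sum_i alpha_i * y_i * K(a(i), a), over support vectors."""
    return b + sum(al * y * kernel(sv, a)
                   for sv, al, y in zip(support_vecs, alphas, ys))

# toy values standing in for an optimizer's output
svs    = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
alphas = [0.7, 0.4]
ys     = [+1, -1]
print(svm_decision(np.array([0.5, 1.0]), svs, alphas, ys, b=0.1))
```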
Extending linear models: Multilayer perceptron
- A network of perceptrons can approximate arbitrary target concepts
- The multilayer perceptron is an example of an artificial neural network
- It consists of an input layer, one or more hidden layers, and an output layer
- The structure of an MLP is usually found by experimentation
- Its parameters (the weights) can be found using backpropagation
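A minimal sketch of the layered structure: a forward pass through a one-hidden-layer MLP with made-up weights (training those weights is the job of backpropagation, covered below):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, hidden_w, output_w):
    """Forward pass of a one-hidden-layer MLP: each hidden unit is a
    sigmoid perceptron over the inputs, and the output unit is a
    sigmoid perceptron over the hidden activations."""
    hidden = [sigmoid(sum(w * a for w, a in zip(ws, inputs)))
              for ws in hidden_w]
    return sigmoid(sum(w * h for w, h in zip(output_w, hidden)))

# toy network: 2 inputs, 2 hidden units, 1 output (arbitrary weights)
print(mlp_forward([1.0, 0.0],
                  hidden_w=[[2.0, -1.0], [-1.5, 3.0]],
                  output_w=[1.0, -1.0]))
```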
Extending linear models: Examples (network diagrams omitted in this transcript)
Extending linear models: Backpropagation
- Sigmoid activation: f(x) = 1/(1 + exp(-x))
- Squared error: E = 1/2 (y - f(x))^2
- Minimizing the error gives the gradient dE/dw(i) = -(y - f(x)) f(x)(1 - f(x)) a(i), where x = Σ w(i) a(i)
- Compute this expression for all training instances and update each weight: w(i) = w(i) - L (dE/dw(i)), where L is the learning rate
- The weights w are given (e.g., small random) starting values
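A runnable sketch of exactly that update rule, restricted to a single sigmoid unit to keep it short (a full MLP applies the same chain rule layer by layer):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_unit(data, n_inputs, L=0.5, epochs=1000):
    """Gradient descent for one sigmoid unit, following the slide:
    E = 1/2 (y - f(x))^2 with x = sum_i w_i * a_i, so
    dE/dw_i = -(y - f(x)) * f(x) * (1 - f(x)) * a_i
    and the update is w_i <- w_i - L * dE/dw_i."""
    w = [random.uniform(-0.05, 0.05) for _ in range(n_inputs)]
    for _ in range(epochs):
        for a, y in data:
            fx = sigmoid(sum(wi * ai for wi, ai in zip(w, a)))
            grad = [-(y - fx) * fx * (1 - fx) * ai for ai in a]
            w = [wi - L * gi for wi, gi in zip(w, grad)]
    return w

# toy example: learn OR (first input is a bias fixed at 1)
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = train_unit(data, n_inputs=3)
print([round(sigmoid(sum(wi * ai for wi, ai in zip(w, a))), 2)
       for a, _ in data])  # approaches [0, 1, 1, 1]
```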
Clustering: Incremental clustering steps
- The tree starts as an empty root node
- Add instances one by one, updating the tree appropriately at each stage
- To update, find the right leaf for the instance; this may involve restructuring the tree
- Restructuring operations: merging and replacement
- Decisions are made using the category utility measure
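A sketch of category utility for nominal attributes, under the assumption that the slides use the standard form CU = (1/k) Σ_l P(C_l) Σ_{i,j} [P(a_i=v_ij|C_l)^2 - P(a_i=v_ij)^2]:

```python
def category_utility(clusters, attrs):
    """Category utility: how much knowing an instance's cluster
    improves the ability to guess its attribute values, averaged
    over the k clusters. `clusters` is a list of lists of instances,
    each instance a dict mapping attribute name -> nominal value."""
    all_insts = [x for c in clusters for x in c]
    n, k = len(all_insts), len(clusters)
    total = 0.0
    for cluster in clusters:
        p_cl = len(cluster) / n
        gain = 0.0
        for a in attrs:
            for v in {x[a] for x in all_insts}:
                p_given = sum(x[a] == v for x in cluster) / len(cluster)
                p_plain = sum(x[a] == v for x in all_insts) / n
                gain += p_given ** 2 - p_plain ** 2
        total += p_cl * gain
    return total / k

# two toy clusters over one nominal attribute
c1 = [{"outlook": "sunny"}, {"outlook": "sunny"}]
c2 = [{"outlook": "rainy"}, {"outlook": "rainy"}]
print(category_utility([c1, c2], ["outlook"]))  # 0.25: clusters are informative
```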
Clustering: Example of incremental clustering (figure omitted in this transcript)
EM Algorithm
- EM = Expectation-Maximization; it generalizes k-means to a probabilistic setting
- Iterative procedure:
  - E ("expectation") step: calculate the cluster probability for each instance
  - M ("maximization") step: estimate the distribution parameters from those cluster probabilities
- Cluster probabilities are stored as instance weights
- Stop when the improvement is negligible
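A self-contained sketch of EM for a two-component 1-D Gaussian mixture (my own toy example; it uses a fixed iteration count rather than the negligible-improvement test):

```python
import math

def gauss(x, mu, sigma):
    """Gaussian density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, iters=50):
    """E step: cluster probabilities (weights) for each instance.
    M step: re-estimate means, variances, and priors using those
    probabilities as instance weights."""
    mu = [min(xs), max(xs)]          # crude initialization
    sigma = [1.0, 1.0]
    prior = [0.5, 0.5]
    for _ in range(iters):
        # E step: probability that each instance belongs to each cluster
        w = []
        for x in xs:
            p = [prior[k] * gauss(x, mu[k], sigma[k]) for k in range(2)]
            s = sum(p)
            w.append([pk / s for pk in p])
        # M step: weighted parameter estimates
        for k in range(2):
            wk = sum(wi[k] for wi in w)
            mu[k] = sum(wi[k] * x for wi, x in zip(w, xs)) / wk
            sigma[k] = math.sqrt(sum(wi[k] * (x - mu[k]) ** 2
                                     for wi, x in zip(w, xs)) / wk) or 1e-6
            prior[k] = wk / len(xs)
    return mu, sigma, prior

xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
print(em_two_gaussians(xs))  # means approach ~1.0 and ~5.1
```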
Visit more self-help tutorials
- Pick a tutorial of your choice and browse through it at your own pace
- The tutorials section is free, self-guiding, and does not involve any additional support
- Visit us at www.dataminingtools.net
