Incremental Learning using WEKA
CS267: Data Mining Presentation
Guided By: Dr. Tran
- Rohit Vobbilisetty
Overview
• WEKA - Definition
• Incremental Learning - Definition
• Incremental Learning in WEKA
• Steps to train an UpdateableClassifier
• Stochastic Gradient Descent
• Sample Code, Result and Demo
What is WEKA?
• Weka (Waikato Environment for Knowledge Analysis) is a collection of machine learning algorithms for data mining tasks.
• Weka 3.7 (developer version)
Incremental Learning
Definition and Need
• The model is trained one instance at a time: it is updated with each instance in the dataset instead of being built from the whole dataset at once.
• Suitable for large datasets that do not fit into the computer's memory.
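As a minimal sketch of the idea, the hypothetical class below keeps only running counts and is updated one instance at a time, so the full dataset never has to be held in memory. The class name TinyIncrementalNB, its methods, and the toy records are illustrative assumptions; it is not WEKA's NaiveBayesUpdateable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration (not WEKA code): a tiny Naive Bayes for nominal
// features that learns from one instance at a time.
class TinyIncrementalNB {
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Key "label|featureIndex|value" -> number of times seen.
    private final Map<String, Integer> featureCounts = new HashMap<>();
    private int total = 0;

    // Update the model with a single labelled instance; only counts are stored.
    public void update(String[] features, String label) {
        classCounts.merge(label, 1, Integer::sum);
        for (int i = 0; i < features.length; i++) {
            featureCounts.merge(label + "|" + i + "|" + features[i], 1, Integer::sum);
        }
        total++;
    }

    // Predict the label with the highest log-probability, using add-one
    // smoothing and assuming each feature has two possible values (y, n).
    public String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : classCounts.entrySet()) {
            String label = e.getKey();
            int n = e.getValue();
            double score = Math.log((double) n / total);
            for (int i = 0; i < features.length; i++) {
                int c = featureCounts.getOrDefault(label + "|" + i + "|" + features[i], 0);
                score += Math.log((c + 1.0) / (n + 2.0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public int instancesSeen() {
        return total;
    }

    public static void main(String[] args) {
        TinyIncrementalNB model = new TinyIncrementalNB();
        // Toy stream of records; in practice these would be read one by one from disk.
        model.update(new String[] {"y", "n"}, "republican");
        model.update(new String[] {"y", "y"}, "republican");
        model.update(new String[] {"n", "y"}, "democrat");
        model.update(new String[] {"n", "n"}, "democrat");
        System.out.println(model.predict(new String[] {"y", "n"}));
    }
}
```

Memory use depends on the number of distinct (label, feature, value) combinations, not on the number of instances processed.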
Incremental Learning - Weka
• Applicable to models implementing the interface weka.classifiers.UpdateableClassifier
(http://weka.sourceforge.net/doc.dev/weka/classifiers/UpdateableClassifier.html)
• Models implementing this interface: HoeffdingTree, IBk, KStar, LWL, MultiClassClassifierUpdateable, NaiveBayesMultinomialText, NaiveBayesMultinomialUpdateable, NaiveBayesUpdateable, SGD, SGDText
Steps to train an UpdateableClassifier
• Initialize an ArffLoader object.
• Retrieve the loader's structure and set its class index, i.e. the attribute to be predicted (setClassIndex()).
• Initialize the classifier with this structure (buildClassifier()).
• Iteratively retrieve an instance from the training set and update the classifier (updateClassifier()).
• Evaluate the trained model against the test dataset.
Stochastic Gradient Descent
• Stochastic gradient descent (SGD) is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions, Q(w) = Σ Qi(w).
• It is applicable to large datasets, since each iteration processes only a single instance of the training dataset.
• w: the parameter to be estimated.
• Qi(w): the term of the objective function contributed by the i-th instance of the data.
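To make the per-instance update concrete, here is a minimal plain-Java sketch of one common form of the SGD step, w := w - η ∇Qi(w), using the hinge loss on a linear model. It assumes labels encoded as -1/+1, a fixed learning rate η, and no regularization; the class and method names are hypothetical, and this is not WEKA's SGD implementation.

```java
// Hypothetical illustration (not WEKA code): SGD for a linear model with hinge loss.
class HingeSgdSketch {
    final double[] w;          // weights, one per feature
    double bias = 0.0;
    final double learningRate; // step size (eta), assumed fixed

    HingeSgdSketch(int numFeatures, double learningRate) {
        this.w = new double[numFeatures];
        this.learningRate = learningRate;
    }

    // Raw linear score w.x + bias.
    public double score(double[] x) {
        double s = bias;
        for (int j = 0; j < w.length; j++) {
            s += w[j] * x[j];
        }
        return s;
    }

    // One SGD step on a single instance (x, y) with y in {-1, +1}.
    // Hinge loss Qi(w) = max(0, 1 - y * score(x)); its subgradient is
    // -y * x when the margin y * score(x) is below 1, and 0 otherwise.
    public void update(double[] x, int y) {
        if (y * score(x) < 1.0) {
            for (int j = 0; j < w.length; j++) {
                w[j] += learningRate * y * x[j];
            }
            bias += learningRate * y;
        }
    }

    public int predict(double[] x) {
        return score(x) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Toy, linearly separable data: the label follows the first feature.
        double[][] xs = {{1, 0}, {1, 1}, {-1, 0}, {-1, 1}};
        int[] ys = {1, 1, -1, -1};
        HingeSgdSketch model = new HingeSgdSketch(2, 0.1);
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                model.update(xs[i], ys[i]);
            }
        }
        for (int i = 0; i < xs.length; i++) {
            System.out.println(ys[i] + " -> " + model.predict(xs[i]));
        }
    }
}
```

Each call to update() touches only one instance, which is why the method scales to datasets that are streamed rather than loaded in full.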
Sample Dataset Description
• Name: vote.arff (17 attributes, including the class)
• Attributes, with the number of values of each:
  • Class Name: 2 (democrat, republican)
  • handicapped-infants: 2 (y, n)
  • water-project-cost-sharing: 2 (y, n)
  • adoption-of-the-budget-resolution: 2 (y, n)
  • physician-fee-freeze: 2 (y, n)
  • el-salvador-aid: 2 (y, n)
  • religious-groups-in-schools: 2 (y, n)
  • anti-satellite-test-ban: 2 (y, n)
  • aid-to-nicaraguan-contras: 2 (y, n)
  • mx-missile: 2 (y, n)
  • immigration: 2 (y, n)
  • synfuels-corporation-cutback: 2 (y, n)
  • education-spending: 2 (y, n)
  • superfund-right-to-sue: 2 (y, n)
  • crime: 2 (y, n)
  • duty-free-exports: 2 (y, n)
  • export-administration-act-south-africa: 2 (y, n)
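Sample Code - SGD
import java.io.File;
import weka.classifiers.UpdateableClassifier;
import weka.classifiers.functions.SGD;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ArffLoader;

public class IncrementalSgd {
  public static void main(String[] args) throws Exception {
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("Training File Path"));
    Instances structure = loader.getStructure(); // Header only, no data rows
    structure.setClassIndex(16); // Set the attribute to be predicted
    SGD classifier = new SGD(); // Configure the classifier
    classifier.setEpochs(500);
    classifier.setEpsilon(0.001);
    // Required if dealing with a binary class
    classifier.setLossFunction(new SelectedTag(SGD.HINGE, SGD.TAGS_SELECTION));
    classifier.buildClassifier(structure); // Initialize from the empty structure

    // Incrementally update the classifier, one instance at a time
    Instance current;
    while ((current = loader.getNextInstance(structure)) != null) {
      ((UpdateableClassifier) classifier).updateClassifier(current);
    }
  }
}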
Sample Output
Class =
-0.26 handicapped-infants
+ -0.09 water-project-cost-sharing
+ -0.51 adoption-of-the-budget-resolution
+ 0.73 physician-fee-freeze
+ 0.33 el-salvador-aid
+ 0.04 religious-groups-in-schools
+ -0.14 anti-satellite-test-ban
+ -0.33 aid-to-nicaraguan-contras
+ -0.28 mx-missile
+ 0.1 immigration
+ -0.37 synfuels-corporation-cutback
+ 0.33 education-spending
+ 0.15 superfund-right-to-sue
+ 0.18 crime
+ -0.25 duty-free-exports
+ 0.02 export-administration-act-south-africa
- 0.11
Correctly Classified Instances          401       92.1839 %
Incorrectly Classified Instances         34        7.8161 %
Kappa statistic                           0.838
Mean absolute error                       0.0782
Root mean squared error                   0.2796
Relative absolute error                  16.482  %
Root relative squared error              57.4214 %
Coverage of cases (0.95 level)           92.1839 %
Mean rel. region size (0.95 level)       50      %
Total Number of Instances               435

Confusion Matrix:
242.0   25.0
  9.0  159.0
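The headline figures can be re-derived from the confusion matrix alone. The sketch below is a hedged illustration with hypothetical names, assuming rows are actual classes and columns are predicted classes; it recomputes the accuracy and Cohen's kappa from the four counts shown on the slide.

```java
import java.util.Locale;

// Hypothetical helper (not part of WEKA): recompute summary statistics
// from a square confusion matrix (rows = actual, columns = predicted).
class ConfusionMatrixStats {
    static double total(double[][] m) {
        double t = 0;
        for (double[] row : m) {
            for (double v : row) {
                t += v;
            }
        }
        return t;
    }

    // Fraction of instances on the diagonal, i.e. correctly classified.
    static double accuracy(double[][] m) {
        double correct = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
        }
        return correct / total(m);
    }

    // Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).
    static double kappa(double[][] m) {
        double n = total(m);
        double chance = 0;
        for (int i = 0; i < m.length; i++) {
            double rowSum = 0;
            double colSum = 0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[i][j];
                colSum += m[j][i];
            }
            chance += (rowSum / n) * (colSum / n);
        }
        double observed = accuracy(m);
        return (observed - chance) / (1 - chance);
    }

    public static void main(String[] args) {
        double[][] m = {{242, 25}, {9, 159}}; // counts from the slide
        System.out.printf(Locale.ROOT, "Accuracy: %.4f %%%n", 100 * accuracy(m));
        System.out.printf(Locale.ROOT, "Kappa:    %.3f%n", kappa(m));
    }
}
```

Here 242 + 159 = 401 of 435 instances lie on the diagonal, which gives the reported 92.1839 %, and the kappa value rounds to the reported 0.838.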
Challenges Faced
• The SGD class does not support a numeric class attribute unless it is configured to use Huber loss or squared loss.
• The learning rate must be neither too small (slow convergence) nor too large (overshooting the minimum).
• Some errors had to be resolved by consulting the WEKA Java source code.
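The learning-rate trade-off can be seen on a toy problem. The sketch below is a hypothetical illustration, unrelated to WEKA's code: it runs plain gradient descent on f(w) = w^2 (gradient 2w) starting from w = 1, with three assumed step sizes.

```java
// Hypothetical illustration: effect of the learning rate on gradient descent
// for f(w) = w^2, whose minimum is at w = 0 and whose gradient is 2w.
class LearningRateDemo {
    // Run a fixed number of steps from the given start and return the final w.
    static double descend(double start, double learningRate, int steps) {
        double w = start;
        for (int i = 0; i < steps; i++) {
            w -= learningRate * 2 * w; // w := w - eta * f'(w)
        }
        return w;
    }

    public static void main(String[] args) {
        double[] rates = {0.001, 0.1, 1.1}; // too small, reasonable, too large
        for (double rate : rates) {
            System.out.println("rate " + rate + " -> w after 50 steps: "
                    + descend(1.0, rate, 50));
        }
    }
}
```

With these assumed values, the smallest rate has barely moved away from 1 after 50 steps, the moderate rate ends close to 0, and the largest rate overshoots on every step so that the magnitude of w keeps growing.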
References
• Wikipedia: http://en.wikipedia.org/wiki/Stochastic_gradient_descent
• Weka Wiki: http://weka.wikispaces.com/Use+Weka+in+your+Java+code
Thank You
