An Introduction to WEKA
Contributed by Yizhou Sun, 2008
Content
- What is WEKA?
- The Explorer:
  - Preprocess data
  - Classification
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
- References and Resources
04/26/10
What is WEKA?
- Waikato Environment for Knowledge Analysis
- It is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
- The weka is also a bird found only on the islands of New Zealand.
Download and Install WEKA
- Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
- Supports multiple platforms (written in Java): Windows, Mac OS X, and Linux
Main Features
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 3 algorithms for finding association rules
- 15 attribute/subset evaluators + 10 search algorithms for feature selection
Main GUI
Three graphical user interfaces:
- “The Explorer” (exploratory data analysis)
- “The Experimenter” (experimental environment)
- “The KnowledgeFlow” (new process-model-inspired interface)
Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
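To make two of the filter types above concrete, here is a minimal Python sketch of min-max normalization and equal-width discretization. It illustrates the general techniques only and is not WEKA's implementation; the function names `min_max_normalize` and `equal_width_bins` are hypothetical, and the sample ages are taken from the heart-disease example in this tutorial.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [63, 67, 67, 38]
normalized_ages = min_max_normalize(ages)
age_bins = equal_width_bins(ages, 3)
print(normalized_ages)
print(age_bins)
```

Equal-width binning is only one possible discretization strategy; it is used here because it is the simplest to show.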
Flat file in ARFF format
WEKA only deals with “flat” files.

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...

Here age and cholesterol are numeric attributes; sex, chest_pain_type, exercise_induced_angina, and class are nominal attributes.
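The structure of such a file can be illustrated with a small reader. The sketch below is a hypothetical `parse_arff` helper written for this tutorial, not WEKA's own loader. It assumes a simplified subset of ARFF: unquoted values, no sparse rows, `%` comment lines, and `?` for a missing value. The sample string is a shortened version of the file shown above.

```python
def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, kind); kind is a type name or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)  # missing value
                elif kind in ("numeric", "real", "integer"):
                    row.append(float(value))
                else:
                    row.append(value)  # nominal label (or other type) kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


sample = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

relation, attributes, rows = parse_arff(sample)
print(relation)
print(attributes)
print(rows)
```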
04/26/10 University of Waikato
Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
Decision Tree Induction: Training Dataset
This follows an example of Quinlan’s ID3 (Playing Tennis).
Output: A Decision Tree for “buys_computer”

    age?
      <=30: student?
        no: no
        yes: yes
      31..40: yes
      >40: credit rating?
        excellent: no
        fair: yes
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
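The attribute-selection step can be sketched in Python using information gain as the measure. This is an illustration under stated assumptions, not WEKA's code: the helper names `entropy` and `information_gain` are hypothetical, and the rows are the four complete records from the heart-disease ARFF example, restricted to their nominal attributes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    after = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return before - after


rows = [
    {"sex": "male", "chest_pain_type": "typ_angina", "exercise_induced_angina": "no", "class": "not_present"},
    {"sex": "male", "chest_pain_type": "asympt", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "male", "chest_pain_type": "asympt", "exercise_induced_angina": "yes", "class": "present"},
    {"sex": "female", "chest_pain_type": "non_anginal", "exercise_induced_angina": "no", "class": "not_present"},
]

candidates = ["sex", "chest_pain_type", "exercise_induced_angina"]
gains = {a: information_gain(rows, a, "class") for a in candidates}
# max() returns the first attribute with the highest gain when there is a tie.
best = max(candidates, key=lambda a: gains[a])
print(gains)
print(best)
```

A full tree builder would call this selection recursively on each partition until the partitions are pure or no attributes remain; only the selection step is shown to keep the sketch short.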
Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in four steps:
1. Partition the objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when no assignment changes
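The four steps can be sketched in plain Python as follows. This is a minimal illustration, not WEKA's SimpleKMeans. The round-robin initial partition, the squared Euclidean distance, the `max_iter` cap, and the six sample points are all assumptions made for this example.

```python
def centroid(points):
    """Mean point of a nonempty list of equal-length numeric tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100):
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step 1: partition the objects into k nonempty subsets (round-robin here).
    assignment = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = centroid(members)
        # Step 3: assign each object to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 4: stop when no assignment changes, otherwise repeat from Step 2.
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignment, centroids = k_means(points, 2)
print(assignment)
print(centroids)
```

On this toy data the two well-separated groups are recovered after one reassignment; real data generally needs several passes, and the result depends on the initial partition.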
Demo Now. (Demo Online)
Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter → bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
Basic Concepts: Frequent Patterns
- itemset: a set of one or more items
- k-itemset: X = {x_1, …, x_k}
- (absolute) support, or support count, of X: the frequency (number of occurrences) of an itemset X
- (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
- An itemset X is frequent if X’s support is no less than a minsup threshold

    Tid   Items bought
    10    Beer, Nuts, Diaper
    20    Beer, Coffee, Diaper
    30    Beer, Diaper, Eggs
    40    Nuts, Eggs, Milk
    50    Nuts, Coffee, Diaper, Eggs, Milk

Figure: Venn diagram of “Customer buys beer” and “Customer buys diaper”, with the overlap labeled “Customer buys both”.
Basic Concepts: Association Rules
- Find all the rules X → Y with minimum support and confidence
  - support, s: the probability that a transaction contains X ∪ Y
  - confidence, c: the conditional probability that a transaction containing X also contains Y
- Let minsup = 50%, minconf = 50%
- Frequent patterns in the five transactions above: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
- Association rules (many more!):
  - Beer → Diaper (60%, 100%)
  - Diaper → Beer (60%, 75%)
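The support and confidence figures for the five transactions can be checked with a short Python sketch. Note that it enumerates every candidate itemset by brute force, which is workable only for a toy example; it is not the Apriori algorithm, which prunes candidates level by level. The function names are hypothetical.

```python
from itertools import combinations

# The five transactions from the slide, keyed by transaction id (Tid).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset):
    """Absolute support: number of transactions containing every item in itemset."""
    return sum(1 for items in transactions.values() if set(itemset) <= items)


def support(itemset):
    """Relative support: fraction of transactions containing itemset."""
    return support_count(itemset) / len(transactions)


def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs U rhs) / support(lhs)."""
    return support_count(set(lhs) | set(rhs)) / support_count(lhs)


def frequent_itemsets(minsup):
    """Brute-force search for all itemsets with relative support >= minsup."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate)
            if s >= minsup:
                result[candidate] = s
    return result


frequent = frequent_itemsets(0.5)
print(frequent)
print(support({"Beer", "Diaper"}), confidence({"Beer"}, {"Diaper"}))
print(support({"Beer", "Diaper"}), confidence({"Diaper"}, {"Beer"}))
```

With minsup = 50%, an itemset must appear in at least three of the five transactions, which is why Coffee and Milk (two transactions each) are not frequent.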
Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
Explorer: data visualization
- Visualization is very useful in practice: e.g., it helps to determine the difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function
References and Resources
References:
- WEKA website: http://www.cs.waikato.ac.nz/~ml/weka/index.html
- WEKA Tutorial:
  - Machine Learning with WEKA: a presentation demonstrating all graphical user interfaces (GUI) in Weka.
  - A presentation which explains how to use Weka for exploratory data mining.
- WEKA Data Mining Book: Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Second Edition)
- WEKA Wiki: http://weka.sourceforge.net/wiki/index.php/Main_Page
Others:
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed.
