WEKA
BY: Keshab Kumar Gaurav
(ISSA, DRDO)
INTRODUCTION TO WEKA
 A collection of many open-source data
mining and machine learning algorithms,
including
> Data pre-processing
> Classification
> Clustering
> Association rule extraction
> 3D visualization
 Developed by researchers at the University
of Waikato in New Zealand
 Pure Java based (also open source).
Weka Main Features
 71 data pre-processing tools
 52 classification/regression algorithms
 7 clustering algorithms
 9 attribute/subset evaluators + 3 search
algorithms for feature selection.
 3 algorithms for finding association rules
 3 graphical user interfaces
“The Explorer”
“The Experimenter”
“The Knowledge Flow”
Weka : Download and Installation
 Download Weka (the stable version) from
http://www.cs.waikato.ac.nz/ml/weka/
– Choose a self-extracting executable (including
Java VM)
 After the download is complete, run the
self-extracting file to install Weka, and use
the default settings.
GOAL
The program aims to build a state-of-the-art
facility for developing machine learning
techniques and investigating their use in
key application areas.
Specifically, we will create a workbench for
machine learning, determine the factors that
contribute to its successful application in
agriculture, industry, and scientific research,
and develop new machine learning methods and
ways of assessing their effectiveness.
Start Weka
From the Windows desktop
– click “Start”, choose “All programs”
– Choose “Weka 3.7.9” to start Weka
Then the first interface window appears:
Weka GUI Chooser
WEKA APPLICATION
INTERFACES
 Explorer
– Environment for exploring data with WEKA. It gives
access to all the facilities using menu selection and
form filling.
 Experimenter
– It helps answer the question: Which
methods and parameter values work best for the given
problem?
 Knowledge Flow
– Same functionality as the Explorer. Supports incremental
learning. It allows designing configurations for
streamed data processing. Incremental algorithms can
be used to process very large datasets.
 Simple CLI
– It provides a simple Command Line Interface for
directly executing WEKA commands.
WEKA Application Interface
WEKA FUNCTIONS AND
TOOLS
 Preprocessing Filters
 Attribute selection
 Classification/Regression
 Clustering
 Association discovery
 Visualization
LOAD DATA FILE AND
PREPROCESSING
 Load data files in these formats: ARFF, CSV,
C4.5, binary
 Import from URL or SQL database (using
JDBC)
 Preprocessing filters
o Adding/removing attributes
o Attribute value substitution
o Discretization
o Time series filters (delta, shift)
o Sampling, randomization
o Missing value management
o Normalization and other numeric
transformations.
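Two of the filters listed above, normalization and discretization, can be sketched in plain Java. This is only an illustration of the idea, not Weka's own filter code: the class name PreprocessSketch, the choice of min-max scaling to [0, 1], and the equal-width binning are all assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative sketch only (hypothetical class, not part of Weka).
final class PreprocessSketch {

    // Min-max normalization: rescale values linearly into [0, 1].
    // A constant column (max == min) is mapped to all zeros.
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Equal-width discretization: map each value to a bin index in [0, bins - 1].
    static int[] discretize(double[] values, int bins) {
        double[] scaled = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // The maximum value scales to 1.0, so clamp it into the last bin.
            out[i] = Math.min(bins - 1, (int) (scaled[i] * bins));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] temperature = {64, 72, 85}; // made-up sample values
        System.out.println(Arrays.toString(normalize(temperature)));
        System.out.println(Arrays.toString(discretize(temperature, 3)));
    }
}
```

In Weka itself these steps are applied by choosing a filter in the Preprocess panel rather than by writing code.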
WEKA DATA FORMATS
FOUR FORMATS
– ARFF (Attribute-Relation File Format) has two sections
• The Header section defines the relation name and the
attribute names and types.
• The Data section lists the data records.
– CSV: Comma-Separated Values (text file)
– C4.5: The format used by the C4.5 decision tree induction algorithm;
requires two separate files
• Names file: defines the names of the attributes
• Data file: lists the records (samples)
– Binary
– Data can also be read from a URL or from an SQL database
(using JDBC).
ATTRIBUTE RELATION FILE FORMAT (arff)
An ARFF file consists of two distinct sections
• The Header section defines the relation name and the attribute
names and types. Each declaration starts with a keyword:
@relation <data-name>
@attribute <attribute-name> <type> or {nominal values}
• The Data section lists the data records. It starts with the keyword
@data, followed by the list of data instances
Example
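The example figure from this slide is not available here. As a stand-in, the sketch below embeds a small made-up ARFF file (a cut-down weather data set, an assumption and not the slide's original example) and reads its two sections with plain Java. The class ArffSketch is hypothetical and is not Weka's ARFF loader: it ignores quoting, sparse data, and other parts of the format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only (hypothetical class, not Weka's ARFF reader).
final class ArffSketch {

    // Made-up sample file: header section first, then the data section.
    static final String SAMPLE =
        "% lines starting with a percent sign are comments\n"
        + "@relation weather\n"
        + "@attribute outlook {sunny, overcast, rainy}\n"
        + "@attribute temperature numeric\n"
        + "@attribute play {yes, no}\n"
        + "@data\n"
        + "sunny,85,no\n"
        + "overcast,83,yes\n"
        + "rainy,70,yes\n";

    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase(Locale.ROOT); // keywords are case-insensitive
            if (inData) {
                result.rows.add(line.split("\\s*,\\s*"));
            } else if (lower.startsWith("@relation")) {
                result.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                String rest = line.substring("@attribute".length()).trim();
                result.attributeNames.add(rest.split("\\s+", 2)[0]);
            } else if (lower.startsWith("@data")) {
                inData = true; // everything after this keyword is a data record
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ArffSketch a = parse(SAMPLE);
        System.out.println(a.relation + " " + a.attributeNames + " rows=" + a.rows.size());
    }
}
```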
WEKA SYSTEM HIERARCHY
Weka : A machine learning algorithms for data mining
Role of WEKA
INPUT
Raw data
Data Mining by WEKA
•Pre-processing
•Classification
•Regression
•Clustering
•Association Rules
•Visualization
OUTPUT
Result
KDD Process of WEKA
Data → Selection → Preprocessing → Transformation → Data Mining → Interpretation/Evaluation → Knowledge
CLASSIFICATION
 Predicted target must be categorical
 Implemented methods
 decision trees (J48) and rules
 Naive Bayes
 neural networks
 instance-based classifier
 Evaluation methods
 test data set
 cross validation
 (Example)
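The cross-validation idea listed above can be sketched as follows: the instances are shuffled and split into k folds, and each fold in turn is held out for testing while the rest is used for training. This is a plain, non-stratified illustration in standard Java. The class CrossValidationSketch and its method names are hypothetical and are not Weka's evaluation API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only (hypothetical class, not part of Weka).
final class CrossValidationSketch {

    // Split instance indices 0..n-1 into k folds after a seeded shuffle.
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i)); // deal indices out round-robin
        }
        return folds;
    }

    // Training indices for one round: every fold except the held-out one.
    static List<Integer> trainingSet(List<List<Integer>> folds, int heldOut) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3, 42L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("test=" + folds.get(f) + " train=" + trainingSet(folds, f));
        }
    }
}
```

A classifier would be trained on each training set and scored on the matching held-out fold, and the k scores would then be averaged.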
CLUSTERING
 Clustering allows a user to make groups of data to
determine patterns from the data.
 Clustering has its advantages when the data set is
defined and a general pattern needs to be
determined from the data.
 You can create a specific number of groups,
depending on your business needs.
 One defining benefit of clustering over classification
is that every attribute in the data set is used to
analyze the data (whereas in the classification
method, only a subset of the attributes is used in
the model).
Clustering SimpleKMeans
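The k-means idea behind this slide can be sketched in a few lines of plain Java: assign each point to its nearest centroid, recompute the centroids, and repeat until nothing changes. The class KMeansSketch is hypothetical and is not Weka's SimpleKMeans implementation. The initialization (the first k points) and the sample data are assumptions chosen to keep the example deterministic, and the sketch expects at least k points.

```java
import java.util.Arrays;

// Illustrative sketch only (hypothetical class, not Weka's SimpleKMeans).
final class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    // Returns the cluster index assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // naive init: first k points
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                for (int d = 0; d < dim && counts[c] > 0; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {8, 8}, {1.5, 2}, {9, 9}, {1, 0.5}, {8, 9.5}};
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
    }
}
```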
ASSOCIATION
A few association rule algorithms are
implemented in WEKA. They try to find
associations between different attributes instead
of trying to predict the value of the class
attribute.
Association Rules (A=>B)
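Rules of the form A => B are commonly scored by support (how often A and B occur together) and confidence (how often B occurs when A does). The sketch below computes both in plain Java over a made-up list of shopping baskets. The class AssociationSketch and the sample data are assumptions for illustration only. It does not search for rules the way a full rule-mining algorithm does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only (hypothetical class, not part of Weka).
final class AssociationSketch {

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Made-up transactions used by the example.
    static List<Set<String>> sample() {
        List<Set<String>> t = new ArrayList<>();
        t.add(items("bread", "milk"));
        t.add(items("bread", "butter"));
        t.add(items("bread", "milk", "butter"));
        t.add(items("milk"));
        return t;
    }

    // Fraction of transactions that contain every item in the itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    // confidence(A => B) = support(A and B together) / support(A)
    static double confidence(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return support(transactions, both) / support(transactions, a);
    }

    public static void main(String[] args) {
        List<Set<String>> t = sample();
        System.out.println("support(bread, milk) = " + support(t, items("bread", "milk")));
        System.out.println("confidence(bread => milk) = "
            + confidence(t, items("bread"), items("milk")));
    }
}
```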
3D Visualising
Conclusion
The overall goal of Weka is to build a
state-of-the-art facility for developing machine
learning (ML) techniques and to allow people to
apply them to real-world data mining
problems.
Thank You !!!
