MSc IT Part – I, Semester-1 Page No:- ________ 
DATA MINING Date:- ____________ 
Sonali. Parab. 
PRACTICAL NO: 1 
Aim: Build the data mining model structure, build a decision tree with proper decision nodes, and infer at least five different types of reports. Implement using RTool.
Solution: 
Dataset Used: Iris
Step 1: Display the structure of the iris data.
Fig 1.1: Structure of iris data 
Step 2: The random seed is set to a fixed value below to make the results reproducible.
Fig 1.2: Random Seed Set
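The figures above are screenshots; a minimal R sketch of these two steps follows. The seed value and the 70/30 split ratio are illustrative assumptions, not values stated in the figures.

```r
# Step 1: display the structure of the iris data
str(iris)

# Step 2: fix the random seed so the split below is reproducible,
# then divide the data into training and test sets (70/30 here)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
testData  <- iris[ind == 2, ]
```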
Step 3: Install the party package if it is not already installed. Load the party package, build a decision tree, and check the prediction result.
Fig 1.3: Load Party library 
Fig 1.4: iris table 
Step 4: Print the rules and plot the tree.
Fig 1.5: Rules of data
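A sketch of Steps 3–4 with the party package; the formula spells out all four predictors, and the seed/split are the same illustrative assumptions as in Step 2.

```r
# install.packages("party")   # if not already installed
library(party)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]

# build a conditional-inference decision tree for Species
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)

# check the prediction result against the training labels
table(predict(iris_ctree), trainData$Species)

print(iris_ctree)   # the rules
plot(iris_ctree)    # the decision tree (Report 1)
```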
A. Report 1 
Fig 1.6: Decision Tree
Step 5: Plot the decision tree in simple style.
Fig 1.7: Command to plot decision tree in simple style 
B. Report 2 
Fig 1.8: Decision tree (Simple Style)
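The simple-style plot is a one-argument variation on the same plot() call; a self-contained sketch (fitting on the full iris data for brevity):

```r
library(party)
iris_ctree <- ctree(Species ~ ., data = iris)
plot(iris_ctree, type = "simple")   # compact node labels (Report 2)
```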
Step 6: Plot the iris species in a bar plot.
Fig 1.9: bar plot command 
C. Report 3 
Fig 1.10:Barplot of Species
Step 7: Plot the iris species in a pie chart.
Fig 1.11: Command for pie chart 
D. Report 4 
Fig 1.12: Pie Chart
Step 8: Plot a histogram of iris petal length.
Fig 1.13: Command to plot histogram 
E. Report 5 
Fig 1.14: Histogram of iris Petal Length
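Reports 3–5 are all base-R graphics calls on the same species counts; a sketch:

```r
counts <- table(iris$Species)          # 50 observations per species
barplot(counts, main = "Species")      # Report 3: bar plot
pie(counts, main = "Species")          # Report 4: pie chart
hist(iris$Petal.Length,
     main = "Histogram of Petal Length",
     xlab = "Petal.Length")            # Report 5: histogram
```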
PRACTICAL NO: 2 
Aim: Build the data mining model structure and implement the Naïve Bayes algorithm. Implement using WEKA.
Solution: 
Dataset Used: Diabetes.arff
Step 1: Pre-processing
Go to Weka → Open file → go to the weka folder → select the diabetes.arff dataset → Open
Fig 2.1 Choosing diabetes.arff dataset
Step 2: Filter the data
Filters → supervised → discretize → Apply
Fig 2.2 Selecting the Filter 
Fig 2.3 Structure of Filtered Diabetes.arff Dataset
Step 3: Classify the data using the Naïve Bayes algorithm
Fig 2.4 Select Classification Algorithm 
Fig 2.5 Running and Displaying Result
=== Run information === 
Scheme:weka.classifiers.bayes.NaiveBayes 
Relation: pima_diabetes-weka.filters.supervised.attribute.Discretize-Rfirst-last 
Instances: 768 
Attributes: 9 
preg 
plas 
pres 
skin 
insu 
mass 
pedi 
age 
class 
Test mode:10-fold cross-validation 
=== Classifier model (full training set) === 
Naive Bayes Classifier 
Class 
Attribute tested_negative tested_positive 
(0.65) (0.35) 
==================================================== 
preg
'(-inf-6.5]' 427.0 174.0 
'(6.5-inf)' 75.0 96.0 
[total] 502.0 270.0 
plas 
'(-inf-99.5]' 182.0 17.0 
'(99.5-127.5]' 211.0 79.0 
'(127.5-154.5]' 86.0 77.0 
'(154.5-inf)' 25.0 99.0 
[total] 504.0 272.0 
pres 
'All' 501.0 269.0 
[total] 501.0 269.0 
skin 
'All' 501.0 269.0 
[total] 501.0 269.0 
insu 
'(-inf-14.5]' 237.0 140.0 
'(14.5-121]' 165.0 28.0 
'(121-inf)' 101.0 103.0 
[total] 503.0 271.0 
mass 
'(-inf-27.85]' 196.0 28.0
'(27.85-inf)' 306.0 242.0 
[total] 502.0 270.0 
pedi 
'(-inf-0.5275]' 362.0 149.0 
'(0.5275-inf)' 140.0 121.0 
[total] 502.0 270.0 
age 
'(-inf-28.5]' 297.0 72.0 
'(28.5-inf)' 205.0 198.0 
[total] 502.0 270.0 
Time taken to build model: 0 seconds
Step 4: Visualize classifier errors
Fig 2.6 Visualization of Classification Errors
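This practical is driven through the Weka GUI, but the same model can be sketched in R with the e1071 package. The mlbench copy of the Pima diabetes data stands in for diabetes.arff here; that substitution is an assumption, as the two distributions of the dataset may differ slightly.

```r
library(e1071)    # naiveBayes()
library(mlbench)  # PimaIndiansDiabetes

data(PimaIndiansDiabetes)   # 768 instances, class column "diabetes" (neg/pos)
model <- naiveBayes(diabetes ~ ., data = PimaIndiansDiabetes)

# confusion matrix on the training data
pred <- predict(model, PimaIndiansDiabetes)
table(pred, PimaIndiansDiabetes$diabetes)
```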
PRACTICAL NO: 3 
Aim: Implement the clustering algorithm using the Weka tool.
Solution: 
Dataset Used: Iris.arff
Step 1: Preprocess
Open file → go to the weka folder → select the iris dataset → Choose →
Filters → supervised → discretize
Fig 3.1: Structure of iris data
Fig 3.2: Filtering the Data 
Fig 3.3: Filtered Dataset
Step 2: Cluster
Select the Cluster tab → Choose button → clusterers → select SimpleKMeans → click the
"Use training set" radio button → right-click → "Properties" → numClusters = 3 → click the
Start button.
Fig 3.4 Configuring Clustering Algorithm 
Fig 3.5 Generating Result
=== Run information === 
Scheme:weka.clusterers.SimpleKMeans -N 3 -A "weka.core.EuclideanDistance -R first-last" 
-I 500 -S 10 
Relation: iris-weka.filters.supervised.attribute.Discretize-Rfirst-last 
Instances: 150 
Attributes: 5 
sepallength 
sepalwidth 
petallength 
petalwidth 
class 
Test mode:evaluate on training data 
=== Model and evaluation on training set === 
kMeans 
====== 
Number of iterations: 5 
Within cluster sum of squared errors: 109.0 
Missing values globally replaced with mean/mode 
Cluster centroids: 
Cluster#
Attribute Full Data 0 1 2 
(150) (50) (50) (50) 
===================================================== 
sepallength '(-inf-5.55]' '(-inf-5.55]' '(5.55-6.15]' '(6.15-inf)' 
sepalwidth '(-inf-2.95]' '(3.35-inf)' '(-inf-2.95]' '(2.95-3.35]' 
petallength '(4.75-inf)' '(-inf-2.45]' '(2.45-4.75]' '(4.75-inf)' 
petalwidth '(0.8-1.75]' '(-inf-0.8]' '(0.8-1.75]' '(1.75-inf)' 
class Iris-setosa Iris-setosa Iris-versicolor Iris-virginica 
Time taken to build model (full training data) : 0 seconds 
=== Model and evaluation on training set === 
Clustered Instances 
0 50 ( 33%) 
1 50 ( 33%) 
2 50 ( 33%)
Step 3: Visualizing the Result
Right-click on the result → Visualize cluster assignments
Fig 3.6 Selecting Visualization 
Fig 3.7 Displaying Visualization Result
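The Weka SimpleKMeans run above has a direct base-R counterpart, sketched below; the seed matches Weka's -S 10 option only in spirit, since R's kmeans uses its own initialization.

```r
set.seed(10)
# cluster the four numeric attributes into k = 3 clusters
km <- kmeans(iris[, 1:4], centers = 3)
# cross-tabulate the clusters against the true species
table(km$cluster, iris$Species)
```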
PRACTICAL NO: 4 
Aim: Build the basic time series model structure and create predictions for the BodyFat dataset using RTool.
Solution: 
Dataset Used: BodyFat
Step 1: Load the package mboost.
Fig 4.1: Loading the mboost package
Step 2: Show the data stored in the BodyFat dataset.
Fig 4.2: The data stored in the BodyFat dataset
Step 3: Display the summary of the BodyFat dataset.
Fig 4.3: Summary of the BodyFat dataset
Step 4: Apply the prediction method and plot a graph on the BodyFat dataset.
Fig 4.4: Prediction method and plot formula applied to the BodyFat dataset
Step 5: Prediction graph for the BodyFat dataset.
Fig 4.5: The prediction graph for the BodyFat dataset
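A sketch of this practical in R; note that in recent releases the bodyfat data ships with the TH.data package rather than mboost itself, so the data() call below is an assumption about the installed versions.

```r
library(mboost)
data("bodyfat", package = "TH.data")

# Steps 2-3: inspect the data
head(bodyfat)
summary(bodyfat)

# Steps 4-5: fit a boosted linear model for DEXfat and plot predictions
model <- glmboost(DEXfat ~ ., data = bodyfat)
pred  <- predict(model)
plot(bodyfat$DEXfat, pred,
     xlab = "observed DEXfat", ylab = "predicted DEXfat")
```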
PRACTICAL NO: 5 
Aim: Build the data mining model and implement k-nearest neighbour using the Weka tool.
Solution: 
Dataset Used: ContactLenses.arff
Step 1: Preprocess
Open file → go to the weka folder → select the contact-lenses dataset → Choose →
Filters → supervised → discretize
Fig 5.1: Structure of contact lens dataset
Fig 5.2: Filtering the Data 
Fig 5.3:Filtered Dataset
Step 2: Classify
Select the Classify tab → Choose button → expand the Lazy folder → select IBk → click the
"Use training set" radio button → click the Start button.
Fig 5.4 Choosing K-nearest neighbour algorithm 
Fig 5.5 Generating Result
=== Run information === 
Scheme:weka.classifiers.lazy.IBk -K 1 -W 0 -A 
"weka.core.neighboursearch.LinearNNSearch -A "weka.core.EuclideanDistance -R first-last"" 
Relation: contact-lenses-weka.filters.supervised.attribute.Discretize-Rfirst-last 
Instances: 24 
Attributes: 5 
age 
spectacle-prescrip 
astigmatism 
tear-prod-rate 
contact-lenses 
Test mode:evaluate on training data 
=== Classifier model (full training set) === 
IB1 instance-based classifier 
using 1 nearest neighbour(s) for classification 
Time taken to build model: 0 seconds 
=== Evaluation on training set === 
=== Summary === 
Correctly Classified Instances 24 100 %
Incorrectly Classified Instances 0 0 % 
Kappa statistic 1 
Mean absolute error 0.0494 
Root mean squared error 0.0524 
Relative absolute error 13.4078 % 
Root relative squared error 12.3482 % 
Total Number of Instances 24 
=== Detailed Accuracy By Class === 
TP Rate FP Rate Precision Recall F-Measure ROC Area Class 
1 0 1 1 1 1 soft 
1 0 1 1 1 1 hard 
1 0 1 1 1 1 none 
Weighted Avg. 1 0 1 1 1 1 
=== Confusion Matrix === 
a b c <-- classified as 
5 0 0 | a = soft 
0 4 0 | b = hard 
0 0 15 | c = none
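Weka's IBk with K = 1 corresponds to a 1-nearest-neighbour classifier. Since contact-lenses.arff ships with Weka rather than R, the sketch below uses iris purely for illustration; the seed and split are assumptions.

```r
library(class)  # knn() ships with R as a recommended package

set.seed(1234)
ind   <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[ind == 1, 1:4]
test  <- iris[ind == 2, 1:4]

# classify each test row by its single nearest training neighbour
pred <- knn(train, test, cl = iris$Species[ind == 1], k = 1)
table(pred, iris$Species[ind == 2])
```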
PRACTICAL NO: 6 
Aim: Build the data mining model and implement the Apriori association rule algorithm using the Weka tool.
Solution: 
Dataset Used: Supermarket.arff
Step 1: Preprocess
Open file → go to the Weka folder → select the Supermarket dataset → Choose → Filters → All Filter
Fig 6.1: Structure of Supermarket dataset
Fig 6.2: Filtering the Data 
Fig 6.3: Filtered Dataset
Step 2: Associate
Select the Associate tab → choose the Apriori algorithm → properties → configure the
algorithm according to requirements → click Start.
Fig 6.4 Choosing Apriori Algorithm 
Fig 6.5 Configuring Algorithm
Fig 6.6 Displaying Association Results 
=== Run information === 
Scheme: weka.associations.Apriori -N 12 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 
Relation: supermarket-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka. 
filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka. 
filters.AllFilter 
Instances: 4627 
Attributes: 217 
[list of attributes omitted] 
=== Associator model (full training set) ===
Apriori 
======= 
Minimum support: 0.15 (694 instances) 
Minimum metric <confidence>: 0.9 
Number of cycles performed: 17 
Generated sets of large itemsets: 
Size of set of large itemsets L(1): 44 
Size of set of large itemsets L(2): 380 
Size of set of large itemsets L(3): 910 
Size of set of large itemsets L(4): 633 
Size of set of large itemsets L(5): 105 
Size of set of large itemsets L(6): 1 
Best rules found: 
1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92) 
2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92) 
3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 
conf:(0.92)
4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92) 
5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91) 
6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 
conf:(0.91) 
7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 
conf:(0.91) 
8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91) 
9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 
conf:(0.91) 
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91) 
11. baking needs=t fruit=t vegetables=t total=high 831 ==> bread and cake=t 752 conf:(0.9) 
12. biscuits=t milk-cream=t total=high 907 ==> bread and cake=t 820 conf:(0.9)
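The run above used minimum support 0.15 and minimum confidence 0.9. For comparison, the same thresholds map directly onto R's arules package; the Groceries transactions bundled with arules stand in for supermarket.arff, which ships only with Weka, so the rule counts will differ.

```r
library(arules)
data(Groceries)   # a transactions object bundled with arules

# same thresholds as the Weka run: support 0.15, confidence 0.9
rules <- apriori(Groceries,
                 parameter = list(support = 0.15, confidence = 0.9))
summary(rules)    # may well be empty at these strict thresholds
```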
PRACTICAL NO: 7 
Aim: Build the data mining model and implement the Apriori association rule algorithm using RTool.
Solution: 
Dataset Used: Titanic
Step 1: Preprocess
Loading the Data in Data Frame 
Transforming the Data into Suitable Format 
Fig 7.1: Structure of Titanic dataset
Fig 7.2 Summary of Titanic Dataset 
Step 2: Associate
Loading library ‘arules’ that contains functions for Association mining 
Function used to apply Apriori Algorithm with Default Configuration 
Fig 7.3 Choosing Apriori Algorithm
Fig 7.4 Inspecting the Results of Apriori Algorithm 
Fig 7.5 Applying Settings to Display Rules with RHS containing survived only
Step 3: Finding and Removing Redundant Rules
Code to Find Redundant Rules 
Code to Remove Redundant Rules 
Fig 7.6 Finding & Removing Redundant Rules
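The R commands behind the figures in Steps 2–3 can be sketched as follows. The support and confidence values are illustrative assumptions, and datasets::Titanic is a contingency table, so it is first expanded into one row per passenger.

```r
library(arules)

# expand the 4-way Titanic table into one row per passenger
df    <- as.data.frame(Titanic)
cases <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]

# Apriori, keeping only rules with Survived on the right-hand side
rules <- apriori(cases,
                 parameter  = list(support = 0.005, confidence = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                   default = "lhs"))
inspect(head(sort(rules, by = "lift")))

# find and remove redundant rules
rules.pruned <- rules[!is.redundant(rules)]
```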
Step 4: Visualizing
Loading the library arulesViz, which contains functions for visualizing association results
Function to plot the results as a scatter plot
X axis: Support
Y axis: Confidence
Fig 7.7 Scatter Plot
Function to plot the association results as a graph
Fig 7.8 Graph Plot Showing How Data Items Are Associated
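The two arulesViz plots can be sketched self-containedly as below, rebuilding a small rule set first; the thresholds are the same illustrative assumptions as in the earlier steps.

```r
library(arules)
library(arulesViz)

df    <- as.data.frame(Titanic)
cases <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]
rules <- apriori(cases, parameter = list(support = 0.005, confidence = 0.8))

plot(rules)                     # scatter plot: support (x) vs confidence (y)
plot(rules, method = "graph")   # graph plot of how items are associated
```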
PRACTICAL NO: 8 
Aim: Consider suitable data for text mining and implement the text mining technique using R-Tool.
Solution: 
Dataset Used: Plain text file (www.txt)
Step 1: Loading the Text File
Loading the essential libraries for text mining: tm, SnowballC, and twitteR
Loading the data from the text file into RTool using readLines()
Fig 8.1: Using the tail() and head() functions to display the start and end of paragraphs
Step 2: Transforming
Loading the tm library and transforming the document into a corpus (corpusdoc)
Fig 8.2 Inspecting Corpusdoc 
Function to Remove Punctuation
Fig 8.3 Removing Punctuation
Function to Strip White Spaces 
Fig 8.4 Stripping White Spaces 
Function to Remove Stop Words from Document 
Fig 8.5 Removing Stop Words From Document
Function to Stem the Document 
Fig 8.6 Stemming the Document 
Function to Convert corpusdoc to TermDocumentMatrix 
Fig 8.7 Inspecting TermDocumentMatrix
Step 3: Finding Frequent Terms in the Document
Fig 8.8 Finding Frequent Terms in the Document
Step 4: Finding Associations Among Terms
Function to find associations among different terms in the document
Fig 8.9 Result of How Strongly Terms Are Associated with the Term “information”
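The whole text-mining pipeline of this practical can be sketched in a dozen lines of R. The file name www.txt comes from the steps above; the lowfreq and correlation thresholds are illustrative assumptions.

```r
library(tm)
library(SnowballC)

# Step 1: load the plain-text file
text <- readLines("www.txt")
head(text); tail(text)

# Step 2: transform into a corpus and clean it up
corpusdoc <- Corpus(VectorSource(text))
corpusdoc <- tm_map(corpusdoc, removePunctuation)
corpusdoc <- tm_map(corpusdoc, stripWhitespace)
corpusdoc <- tm_map(corpusdoc, removeWords, stopwords("english"))
corpusdoc <- tm_map(corpusdoc, stemDocument)
tdm <- TermDocumentMatrix(corpusdoc)

# Steps 3-4: frequent terms, and terms associated with "information"
findFreqTerms(tdm, lowfreq = 5)
findAssocs(tdm, "information", 0.5)
```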

More Related Content

PDF
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
PDF
Iterative improved learning algorithm for petrographic image classification a...
PDF
IRJET- Image Classification – Cat and Dog Images
DOCX
Distributed systems
PPT
Download It
PPT
ensemble learning
PDF
Introduction to Some Tree based Learning Method
PDF
ランダムフォレストとそのコンピュータビジョンへの応用
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
Iterative improved learning algorithm for petrographic image classification a...
IRJET- Image Classification – Cat and Dog Images
Distributed systems
Download It
ensemble learning
Introduction to Some Tree based Learning Method
ランダムフォレストとそのコンピュータビジョンへの応用

Viewers also liked (16)

PDF
L4. Ensembles of Decision Trees
PPTX
[Women in Data Science Meetup ATX] Decision Trees
PPTX
Ensemble modeling overview, Big Data meetup
PDF
From decision trees to random forests
PPTX
Decision trees and random forests
PDF
Machine Learning and Data Mining: 16 Classifiers Ensembles
PPTX
Lecture 6: Ensemble Methods
PDF
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
PPT
2.8 accuracy and ensemble methods
PPTX
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
PPTX
Machine learning overview (with SAS software)
PPSX
Election algorithms
PDF
Understanding Random Forests: From Theory to Practice
PDF
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
PDF
5.4 Arbres et forêts aléatoires
PDF
Data Science - Part V - Decision Trees & Random Forests
L4. Ensembles of Decision Trees
[Women in Data Science Meetup ATX] Decision Trees
Ensemble modeling overview, Big Data meetup
From decision trees to random forests
Decision trees and random forests
Machine Learning and Data Mining: 16 Classifiers Ensembles
Lecture 6: Ensemble Methods
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
2.8 accuracy and ensemble methods
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning overview (with SAS software)
Election algorithms
Understanding Random Forests: From Theory to Practice
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
5.4 Arbres et forêts aléatoires
Data Science - Part V - Decision Trees & Random Forests
Ad

Similar to Data Mining (20)

PDF
Machine Learning, K-means Algorithm Implementation with R
PDF
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
PDF
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
PDF
EKON22 Introduction to Machinelearning
PDF
3 Data scientist associate - Case GoalZone - Fitness class attendance study.pdf
PDF
Machine_Learning_Co__
PDF
Machine_Learning_Trushita
PDF
Classification and Prediction Based Data Mining Algorithm in Weka Tool
PDF
Human_Activity_Recognition_Predictive_Model
DOC
Cis247 a ilab 3 overloaded methods and static methods variables
DOC
Cis247 i lab 3 overloaded methods and static methods variables
DOC
Cis247 a ilab 3 overloaded methods and static methods variables
PDF
AIML4 CNN lab256 1hr (111-1).pdf
PDF
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
PDF
Image Classification using Deep Learning
PDF
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
DOCX
Artifical_intiligence_worksheet-exp-9.docx
PDF
Machine learning key to your formulation challenges
PDF
Artificial Intelligence based Pattern Recognition
DOCX
Cis247 i lab 2 of 7 employee class
Machine Learning, K-means Algorithm Implementation with R
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
EKON22 Introduction to Machinelearning
3 Data scientist associate - Case GoalZone - Fitness class attendance study.pdf
Machine_Learning_Co__
Machine_Learning_Trushita
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Human_Activity_Recognition_Predictive_Model
Cis247 a ilab 3 overloaded methods and static methods variables
Cis247 i lab 3 overloaded methods and static methods variables
Cis247 a ilab 3 overloaded methods and static methods variables
AIML4 CNN lab256 1hr (111-1).pdf
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
Image Classification using Deep Learning
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
Artifical_intiligence_worksheet-exp-9.docx
Machine learning key to your formulation challenges
Artificial Intelligence based Pattern Recognition
Cis247 i lab 2 of 7 employee class
Ad

More from Sonali Parab (18)

PPT
Forensic laboratory setup requirements
DOCX
Forensic laboratory setup requirements
DOCX
Firewalls
DOCX
Embedded System
DOCX
Advance Database Management Systems -Object Oriented Principles In Database
PDF
Cloud and Ubiquitous Computing manual
PPT
Advance Database Management Systems -Object Oriented Principles In Database
PPT
Default and On demand routing - Advance Computer Networks
DOCX
Cloud Computing And Virtualization
DOCX
Protocols in Bluetooth
PPT
Protols used in bluetooth
PPT
Public Cloud Provider
DOCX
Public Cloud Provider
DOCX
Minning www
DOCX
Remote Method Invocation
DOCX
Agile testing
PPT
Minning WWW
PPTX
Remote Method Invocation (Java RMI)
Forensic laboratory setup requirements
Forensic laboratory setup requirements
Firewalls
Embedded System
Advance Database Management Systems -Object Oriented Principles In Database
Cloud and Ubiquitous Computing manual
Advance Database Management Systems -Object Oriented Principles In Database
Default and On demand routing - Advance Computer Networks
Cloud Computing And Virtualization
Protocols in Bluetooth
Protols used in bluetooth
Public Cloud Provider
Public Cloud Provider
Minning www
Remote Method Invocation
Agile testing
Minning WWW
Remote Method Invocation (Java RMI)

Recently uploaded (20)

PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
RMMM.pdf make it easy to upload and study
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
Institutional Correction lecture only . . .
PDF
Insiders guide to clinical Medicine.pdf
PDF
Basic Mud Logging Guide for educational purpose
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
Module 4: Burden of Disease Tutorial Slides S2 2025
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
Microbial disease of the cardiovascular and lymphatic systems
human mycosis Human fungal infections are called human mycosis..pptx
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
TR - Agricultural Crops Production NC III.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Week 4 Term 3 Study Techniques revisited.pptx
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
FourierSeries-QuestionsWithAnswers(Part-A).pdf
RMMM.pdf make it easy to upload and study
Anesthesia in Laparoscopic Surgery in India
Abdominal Access Techniques with Prof. Dr. R K Mishra
Pharmacology of Heart Failure /Pharmacotherapy of CHF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
VCE English Exam - Section C Student Revision Booklet
Institutional Correction lecture only . . .
Insiders guide to clinical Medicine.pdf
Basic Mud Logging Guide for educational purpose
2.FourierTransform-ShortQuestionswithAnswers.pdf

Data Mining

  • 1. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 1 Aim:Build the data mining model structure and built the decision tree with proper decision nodes and infer at least five different types of reports. Implement Using RTool. Solution: Dataset Used :Iris Step 1:Display the Structure of iris data. Fig 1.1: Structure of iris data Step 2:The random seed is set to a fixed value below to make the results reproducible. Fig 1.2:Random Seed Set
  • 2. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Step 3:Install the party package if it is not installed. Load the party package, build adecision tree, and check the prediction result. Sonali. Parab. Fig 1.3: Load Party library Fig 1.4: iris table Step 4:printing the rules and plot the tree Fig 1.5: Rules of data
  • 3. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. A. Report 1 Fig 1.6: Decision Tree
  • 4. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 5:Plot Decision tree in simple style Fig 1.7: Command to plot decision tree in simple style B. Report 2 Fig 1.8: Decision tree (Simple Style)
  • 5. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 6:Plot iris species in bar plot Fig 1.9: bar plot command C. Report 3 Fig 1.10:Barplot of Species
  • 6. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 7:Plot iris Species in pie chart Fig 1.11: Command for pie chart D. Report 4 Fig 1.12: Pie Chart
  • 7. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 8:Plot histogram of iris Petal Length Fig 1.13: Command to plot histogram E. Report 5 Fig 1.14: Histogram of iris Petal Length
  • 8. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 2 Aim:Build the data mining model structure and Implement Naïve Bayes Algorithm. Implement Using WEKA. Solution: Dataset Used :Diabetes.arff Step 1:Pre-processing Go to WekaOpen file go to weka folder select diabetes.arff dataset open Fig 2.1 Choosing diabetes.arff dataset
  • 9. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 2:Filter the data FilterssuperviseddiscretizeApply Fig 2.2 Selecting the Filter Fig 2.3 Structure of Filtered Diabetes.arff Dataset
  • 10. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 3:Classify the data using Naïve Bayes Algorithm Fig 2.4 Select Classification Algorithm Fig 2.5 Running and Displaying Result
  • 11. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. === Run information === Scheme:weka.classifiers.bayes.NaiveBayes Relation: pima_diabetes-weka.filters.supervised.attribute.Discretize-Rfirst-last Instances: 768 Attributes: 9 preg plas pres skin insu mass pedi age class Test mode:10-fold cross-validation === Classifier model (full training set) === Naive Bayes Classifier Class Attribute tested_negative tested_positive (0.65) (0.35) ==================================================== preg
  • 12. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. '(-inf-6.5]' 427.0 174.0 '(6.5-inf)' 75.0 96.0 [total] 502.0 270.0 plas '(-inf-99.5]' 182.0 17.0 '(99.5-127.5]' 211.0 79.0 '(127.5-154.5]' 86.0 77.0 '(154.5-inf)' 25.0 99.0 [total] 504.0 272.0 pres 'All' 501.0 269.0 [total] 501.0 269.0 skin 'All' 501.0 269.0 [total] 501.0 269.0 insu '(-inf-14.5]' 237.0 140.0 '(14.5-121]' 165.0 28.0 '(121-inf)' 101.0 103.0 [total] 503.0 271.0 mass '(-inf-27.85]' 196.0 28.0
  • 13. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. '(27.85-inf)' 306.0 242.0 [total] 502.0 270.0 pedi '(-inf-0.5275]' 362.0 149.0 '(0.5275-inf)' 140.0 121.0 [total] 502.0 270.0 age '(-inf-28.5]' 297.0 72.0 '(28.5-inf)' 205.0 198.0 [total] 502.0 270.0 Time taken to build model: 0 seconds
  • 14. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 4: Visualize classifiers errors Fig 2.6 Visualization of Classification Errors
  • 15. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 3 Aim:Implement the clustering Algorithm By Using Weka Tool. Solution: Dataset Used :Iris.arff Step 1:Preprocess Open file go to weka folder select iris dataset Choose  Filterssuperviseddiscretize Fig 3.1: Structure of iris data
  • 16. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Fig 3.2: Filtering the Data Fig 3.3: Filtered Dataset
  • 17. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 2:Cluster Select cluster tabchoose button clusterers  select simplekmeans click radio button use training setright click “Poperties” numClusters= 3click start button. Fig 3.4 Configuring Clustering Algorithm Fig 3.5 Generating Result
  • 18. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. === Run information === Scheme:weka.clusterers.SimpleKMeans -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10 Relation: iris-weka.filters.supervised.attribute.Discretize-Rfirst-last Instances: 150 Attributes: 5 sepallength sepalwidth petallength petalwidth class Test mode:evaluate on training data === Model and evaluation on training set === kMeans ====== Number of iterations: 5 Within cluster sum of squared errors: 109.0 Missing values globally replaced with mean/mode Cluster centroids: Cluster#
  • 19. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Attribute Full Data 0 1 2 (150) (50) (50) (50) ===================================================== sepallength '(-inf-5.55]' '(-inf-5.55]' '(5.55-6.15]' '(6.15-inf)' sepalwidth '(-inf-2.95]' '(3.35-inf)' '(-inf-2.95]' '(2.95-3.35]' petallength '(4.75-inf)' '(-inf-2.45]' '(2.45-4.75]' '(4.75-inf)' petalwidth '(0.8-1.75]' '(-inf-0.8]' '(0.8-1.75]' '(1.75-inf)' class Iris-setosa Iris-setosa Iris-versicolor Iris-virginica Time taken to build model (full training data) : 0 seconds === Model and evaluation on training set === Clustered Instances 0 50 ( 33%) 1 50 ( 33%) 2 50 ( 33%)
  • 20. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 4:Visualizing the Result Right click on resultvisualize cluster assignments Fig 3.6 Selecting Visualization Fig 3.7 Displaying Visualization Result
  • 21. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 4 Aim :Build the basic Time series model structure and create the predictions BodyFatDataset.By Using RTool. Solution: Dataset Used :BodyFat Step 1 :load Package mboost. Fig 4.1 : Show the load Of Package mboost.
  • 22. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step2 :To Show Data stored in BodyFat Dataset. Fig 4.2 : Show The Data stored in BodyFat Dataset. Step 3 :Select the Summary Of BodyFat Dataset. Fig 4.3 :Show The Summary Of BodyFat Dataset.
  • 23. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step4 :Using Predication Method And Plot Graph On BodyFat Dataset. Fig 4.4 : Show Predication Method And Plot Graph Formula ApplyOn BodyFat Dataset. Step5 :Predication Graph For BodyFat Dataset. Fig 4.5 :Show The Predication Graph For BodyFat Dataset.
PRACTICAL NO: 5
Aim: Build the data mining model and implement k-nearest neighbour using the Weka tool.
Solution:
Dataset Used: ContactLenses.arff
Step 1: Preprocess
Open file → go to the Weka data folder → select the contact-lenses dataset.
Choose → Filters → supervised → Discretize
Fig 5.1: Structure of the contact-lenses dataset
Fig 5.2: Filtering the Data
Fig 5.3: Filtered Dataset
Step 2: Classify
Select the Classify tab → Choose button → expand the lazy folder → select IBk → select the "Use training set" radio button → click the Start button.
Fig 5.4: Choosing the k-nearest neighbour algorithm
Fig 5.5: Generating the Result
=== Run information ===

Scheme: weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A "weka.core.EuclideanDistance -R first-last""
Relation: contact-lenses-weka.filters.supervised.attribute.Discretize-Rfirst-last
Instances: 24
Attributes: 5
            age
            spectacle-prescrip
            astigmatism
            tear-prod-rate
            contact-lenses
Test mode: evaluate on training data

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances      24      100      %
Incorrectly Classified Instances     0        0      %
Kappa statistic                      1
Mean absolute error                  0.0494
Root mean squared error              0.0524
Relative absolute error             13.4078 %
Root relative squared error         12.3482 %
Total Number of Instances           24

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0        1          1       1          1         soft
               1        0        1          1       1          1         hard
               1        0        1          1       1          1         none
Weighted Avg.  1        0        1          1       1          1

=== Confusion Matrix ===

  a  b  c   <-- classified as
  5  0  0 |  a = soft
  0  4  0 |  b = hard
  0  0 15 |  c = none
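The same 1-nearest-neighbour idea can be sketched in R with the class package. knn() requires numeric predictors, so the sketch below uses the iris data as an illustration rather than the categorical contact-lenses attributes used in Weka:

```r
# Illustrative 1-NN classification in R (the practical itself uses Weka's IBk)
library(class)
set.seed(1)
idx   <- sample(nrow(iris), 100)       # 100 rows for training, rest for testing
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]

# k = 1 mirrors IBk's "-K 1" setting above
pred <- knn(train, test, cl = iris$Species[idx], k = 1)
table(pred, iris$Species[-idx])        # confusion matrix on the hold-out rows
```

Note that the 100% accuracy reported by Weka above comes from evaluating on the training set itself; with 1-NN every training instance is its own nearest neighbour, so a hold-out split as in this sketch gives a more honest estimate.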
PRACTICAL NO: 6
Aim: Build the data mining model and implement the Apriori association rule algorithm using the Weka tool.
Solution:
Dataset Used: Supermarket.arff
Step 1: Preprocess
Open file → go to the Weka data folder → select the Supermarket dataset.
Choose → Filters → AllFilter
Fig 6.1: Structure of the Supermarket dataset
Fig 6.2: Filtering the Data
Fig 6.3: Filtered Dataset
Step 2: Associate
Select the Associate tab → choose the Apriori algorithm → open its properties → configure the algorithm according to requirements → click Start.
Fig 6.4: Choosing the Apriori Algorithm
Fig 6.5: Configuring the Algorithm
Fig 6.6: Displaying Association Results

=== Run information ===

Scheme: weka.associations.Apriori -N 12 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: supermarket-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter
Instances: 4627
Attributes: 217 [list of attributes omitted]

=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.15 (694 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 44
Size of set of large itemsets L(2): 380
Size of set of large itemsets L(3): 910
Size of set of large itemsets L(4): 633
Size of set of large itemsets L(5): 105
Size of set of large itemsets L(6): 1

Best rules found:

 1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)
 2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92)
 3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 conf:(0.92)
 4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92)
 5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91)
 6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 conf:(0.91)
 7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 conf:(0.91)
 8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)
 9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 conf:(0.91)
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)
11. baking needs=t fruit=t vegetables=t total=high 831 ==> bread and cake=t 752 conf:(0.9)
12. biscuits=t milk-cream=t total=high 907 ==> bread and cake=t 820 conf:(0.9)
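The same kind of market-basket analysis can be sketched with R's arules package. The Groceries transaction data that ships with arules stands in for Supermarket.arff here, and the support/confidence thresholds are illustrative (the 0.15/0.9 settings from the Weka run above are specific to that dataset):

```r
# Apriori association rules in R's arules package (a sketch on sample data)
library(arules)
data("Groceries")                       # 9835 supermarket transactions

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01,   # minimum support
                                  conf = 0.5))   # minimum confidence

# Show the five rules with the highest confidence
inspect(head(sort(rules, by = "confidence"), 5))
```

As in the Weka output, each rule reads "LHS items ==> RHS item" with its support and confidence.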
PRACTICAL NO: 7
Aim: Build the data mining model and implement association rule mining (Apriori) on the Titanic dataset using RTool.
Solution:
Dataset Used: Titanic
Step 1: Preprocess
Loading the data into a data frame
Transforming the data into a suitable format
Fig 7.1: Structure of the Titanic dataset
Fig 7.2: Summary of the Titanic Dataset
Step 2: Associate
Loading the 'arules' library, which contains functions for association mining
Function used to apply the Apriori algorithm with its default configuration
Fig 7.3: Choosing the Apriori Algorithm
Fig 7.4: Inspecting the Results of the Apriori Algorithm
Fig 7.5: Applying settings to display only rules whose RHS contains "survived"
Step 3: Finding and Removing Redundant Rules
Code to find redundant rules
Code to remove redundant rules
Fig 7.6: Finding and Removing Redundant Rules
Step 4: Visualizing
Loading the arulesViz library, which contains functions for visualizing association results
Function to plot the results as a scatter plot (X axis: Support, Y axis: Confidence)
Fig 7.7: Scatter Plot
Function to plot the association results as a graph plot
Fig 7.8: Graph plot showing how data items are associated
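The R code for this practical is visible only in the figures. A minimal sketch of the whole workflow, assuming the Titanic data is expanded from base R's Titanic contingency table into one row per passenger (the thresholds are illustrative), might look like:

```r
# Association rules on the Titanic data with arules / arulesViz (a sketch)
library(arules)
library(arulesViz)

# Expand the 4-way contingency table into one row per passenger
tt <- as.data.frame(Titanic)
titanic.raw <- tt[rep(seq_len(nrow(tt)), tt$Freq), 1:4]

# Step 2: Apriori, keeping only rules with "Survived" on the right-hand side
rules <- apriori(titanic.raw,
                 parameter  = list(minlen = 2, supp = 0.005, conf = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                   default = "lhs"))
rules.sorted <- sort(rules, by = "lift")
inspect(head(rules.sorted))

# Step 3: remove rules subsumed by simpler rules with at least equal confidence
rules.pruned <- rules.sorted[!is.redundant(rules.sorted)]

# Step 4: scatter plot (support vs. confidence) and graph plot
plot(rules.pruned)
plot(rules.pruned, method = "graph")
```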
PRACTICAL NO: 8
Aim: Consider suitable data for text mining and implement the text mining technique using RTool.
Solution:
Dataset Used: Plain Text File (www.txt)
Step 1: Loading the Text File
Loading the essential libraries for text mining: tm, SnowballC and twitteR
Loading the data from the text file into RTool using readLines()
Fig 8.1: Using the head() and tail() functions to display the start and end paragraphs
Step 2: Transforming
Loading the tm library and transforming the document into the corpus corpusdoc
Fig 8.2: Inspecting corpusdoc
Function to remove punctuation
Fig 8.3: Removing Punctuation
Function to strip white space
Fig 8.4: Stripping White Space
Function to remove stop words from the document
Fig 8.5: Removing Stop Words from the Document
Function to stem the document
Fig 8.6: Stemming the Document
Function to convert corpusdoc to a TermDocumentMatrix
Fig 8.7: Inspecting the TermDocumentMatrix
Step 3: Finding Frequent Terms in the Document
Fig 8.8: Finding Frequent Terms in the Document
Step 4: Finding Associations Among Terms
Function to find associations among different terms in the document
Fig 8.9: Result showing how strongly terms are associated with the term "information"
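The tm pipeline shown in the figures can be sketched end to end as follows. Since www.txt is not reproduced here, a tiny in-memory vector of sentences stands in for the file read with readLines(); stemDocument() requires the SnowballC package:

```r
# End-to-end sketch of the text mining pipeline from Practical 8
library(tm)          # text mining framework
library(SnowballC)   # stemming backend for stemDocument()

txt <- c("Text mining extracts useful information from text.",
         "Mining text data requires cleaning, stop word removal and stemming.")

corpusdoc <- Corpus(VectorSource(txt))                        # Step 2: build the corpus
corpusdoc <- tm_map(corpusdoc, content_transformer(tolower))  # normalize case
corpusdoc <- tm_map(corpusdoc, removePunctuation)             # Fig 8.3
corpusdoc <- tm_map(corpusdoc, stripWhitespace)               # Fig 8.4
corpusdoc <- tm_map(corpusdoc, removeWords,
                    stopwords("english"))                     # Fig 8.5
corpusdoc <- tm_map(corpusdoc, stemDocument)                  # Fig 8.6

tdm <- TermDocumentMatrix(corpusdoc)   # Fig 8.7: terms x documents matrix

findFreqTerms(tdm, lowfreq = 2)        # Step 3: terms occurring at least twice
findAssocs(tdm, "text", 0.1)           # Step 4: terms correlated with "text"
```

findAssocs() reports the correlation of each term's occurrence pattern with the given term, which is what Fig 8.9 shows for the term "information" in www.txt.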