MSc IT Part – I, Semester-1 Page No:- ________ 
DATA MINING Date:- ____________ 
Sonali. Parab. 
PRACTICAL NO: 1 
Aim: Build the data mining model structure, build a decision tree with proper decision nodes, and infer at least five different types of reports. Implement using RTool.
Solution: 
Dataset Used: Iris
Step 1: Display the structure of the iris data.
Fig 1.1: Structure of iris data 
Step 2: The random seed is set to a fixed value below to make the results reproducible.
Fig 1.2: Random Seed Set
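The figures above are screenshots; a minimal R sketch of these two steps follows. The seed value and the 70/30 split ratio are illustrative assumptions, not values stated in the figures.

```r
# Step 1: display the structure of the iris data
str(iris)

# Step 2: fix the random seed so the split below is reproducible,
# then divide the data into training and test sets (70/30 here)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]
testData  <- iris[ind == 2, ]
```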
Step 3: Install the party package if it is not already installed. Load the party package, build a decision tree, and check the prediction result.
Fig 1.3: Load Party library 
Fig 1.4: iris table 
Step 4: Print the rules and plot the tree.
Fig 1.5: Rules of data
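A sketch of Steps 3–4 with the party package; the formula spells out all four predictors, and the seed/split are the same illustrative assumptions as in Step 2.

```r
# install.packages("party")   # if not already installed
library(party)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]

# build a conditional-inference decision tree for Species
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)

# check the prediction result against the training labels
table(predict(iris_ctree), trainData$Species)

print(iris_ctree)   # the rules
plot(iris_ctree)    # the decision tree (Report 1)
```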
A. Report 1 
Fig 1.6: Decision Tree
Step 5: Plot the decision tree in simple style.
Fig 1.7: Command to plot decision tree in simple style 
B. Report 2 
Fig 1.8: Decision tree (Simple Style)
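The simple-style plot is a one-argument variation on the same plot() call; a self-contained sketch (fitting on the full iris data for brevity):

```r
library(party)
iris_ctree <- ctree(Species ~ ., data = iris)
plot(iris_ctree, type = "simple")   # compact node labels (Report 2)
```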
Step 6: Plot the iris species in a bar plot.
Fig 1.9: bar plot command 
C. Report 3 
Fig 1.10:Barplot of Species
Step 7: Plot the iris species in a pie chart.
Fig 1.11: Command for pie chart 
D. Report 4 
Fig 1.12: Pie Chart
Step 8: Plot a histogram of iris petal length.
Fig 1.13: Command to plot histogram 
E. Report 5 
Fig 1.14: Histogram of iris Petal Length
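Reports 3–5 are all base-R graphics calls on the same species counts; a sketch:

```r
counts <- table(iris$Species)          # 50 observations per species
barplot(counts, main = "Species")      # Report 3: bar plot
pie(counts, main = "Species")          # Report 4: pie chart
hist(iris$Petal.Length,
     main = "Histogram of Petal Length",
     xlab = "Petal.Length")            # Report 5: histogram
```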
PRACTICAL NO: 2 
Aim: Build the data mining model structure and implement the Naïve Bayes algorithm. Implement using WEKA.
Solution: 
Dataset Used: Diabetes.arff
Step 1: Pre-processing
Go to Weka → Open file → go to the weka folder → select the diabetes.arff dataset → Open
Fig 2.1 Choosing diabetes.arff dataset
Step 2: Filter the data
Filters → supervised → discretize → Apply
Fig 2.2 Selecting the Filter 
Fig 2.3 Structure of Filtered Diabetes.arff Dataset
Step 3: Classify the data using the Naïve Bayes algorithm
Fig 2.4 Select Classification Algorithm 
Fig 2.5 Running and Displaying Result
=== Run information === 
Scheme:weka.classifiers.bayes.NaiveBayes 
Relation: pima_diabetes-weka.filters.supervised.attribute.Discretize-Rfirst-last 
Instances: 768 
Attributes: 9 
preg 
plas 
pres 
skin 
insu 
mass 
pedi 
age 
class 
Test mode:10-fold cross-validation 
=== Classifier model (full training set) === 
Naive Bayes Classifier 
Class 
Attribute tested_negative tested_positive 
(0.65) (0.35) 
==================================================== 
preg
'(-inf-6.5]' 427.0 174.0 
'(6.5-inf)' 75.0 96.0 
[total] 502.0 270.0 
plas 
'(-inf-99.5]' 182.0 17.0 
'(99.5-127.5]' 211.0 79.0 
'(127.5-154.5]' 86.0 77.0 
'(154.5-inf)' 25.0 99.0 
[total] 504.0 272.0 
pres 
'All' 501.0 269.0 
[total] 501.0 269.0 
skin 
'All' 501.0 269.0 
[total] 501.0 269.0 
insu 
'(-inf-14.5]' 237.0 140.0 
'(14.5-121]' 165.0 28.0 
'(121-inf)' 101.0 103.0 
[total] 503.0 271.0 
mass 
'(-inf-27.85]' 196.0 28.0
'(27.85-inf)' 306.0 242.0 
[total] 502.0 270.0 
pedi 
'(-inf-0.5275]' 362.0 149.0 
'(0.5275-inf)' 140.0 121.0 
[total] 502.0 270.0 
age 
'(-inf-28.5]' 297.0 72.0 
'(28.5-inf)' 205.0 198.0 
[total] 502.0 270.0 
Time taken to build model: 0 seconds
Step 4: Visualize classifier errors
Fig 2.6 Visualization of Classification Errors
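This practical is driven through the Weka GUI, but the same model can be sketched in R with the e1071 package. The mlbench copy of the Pima diabetes data stands in for diabetes.arff here; that substitution is an assumption, as the two distributions of the dataset may differ slightly.

```r
library(e1071)    # naiveBayes()
library(mlbench)  # PimaIndiansDiabetes

data(PimaIndiansDiabetes)   # 768 instances, class column "diabetes" (neg/pos)
model <- naiveBayes(diabetes ~ ., data = PimaIndiansDiabetes)

# confusion matrix on the training data
pred <- predict(model, PimaIndiansDiabetes)
table(pred, PimaIndiansDiabetes$diabetes)
```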
PRACTICAL NO: 3 
Aim: Implement the clustering algorithm using the Weka tool.
Solution: 
Dataset Used: Iris.arff
Step 1: Preprocess
Open file → go to the weka folder → select the iris dataset → Choose →
Filters → supervised → discretize
Fig 3.1: Structure of iris data
Fig 3.2: Filtering the Data 
Fig 3.3: Filtered Dataset
Step 2: Cluster
Select the Cluster tab → Choose button → clusterers → select SimpleKMeans → click the
"Use training set" radio button → right-click → "Properties" → numClusters = 3 → click the
Start button.
Fig 3.4 Configuring Clustering Algorithm 
Fig 3.5 Generating Result
=== Run information === 
Scheme:weka.clusterers.SimpleKMeans -N 3 -A "weka.core.EuclideanDistance -R first-last" 
-I 500 -S 10 
Relation: iris-weka.filters.supervised.attribute.Discretize-Rfirst-last 
Instances: 150 
Attributes: 5 
sepallength 
sepalwidth 
petallength 
petalwidth 
class 
Test mode:evaluate on training data 
=== Model and evaluation on training set === 
kMeans 
====== 
Number of iterations: 5 
Within cluster sum of squared errors: 109.0 
Missing values globally replaced with mean/mode 
Cluster centroids: 
Cluster#
Attribute Full Data 0 1 2 
(150) (50) (50) (50) 
===================================================== 
sepallength '(-inf-5.55]' '(-inf-5.55]' '(5.55-6.15]' '(6.15-inf)' 
sepalwidth '(-inf-2.95]' '(3.35-inf)' '(-inf-2.95]' '(2.95-3.35]' 
petallength '(4.75-inf)' '(-inf-2.45]' '(2.45-4.75]' '(4.75-inf)' 
petalwidth '(0.8-1.75]' '(-inf-0.8]' '(0.8-1.75]' '(1.75-inf)' 
class Iris-setosa Iris-setosa Iris-versicolor Iris-virginica 
Time taken to build model (full training data) : 0 seconds 
=== Model and evaluation on training set === 
Clustered Instances 
0 50 ( 33%) 
1 50 ( 33%) 
2 50 ( 33%)
Step 3: Visualizing the Result
Right-click on the result → Visualize cluster assignments
Fig 3.6 Selecting Visualization 
Fig 3.7 Displaying Visualization Result
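The Weka SimpleKMeans run above has a direct base-R counterpart, sketched below; the seed matches Weka's -S 10 option only in spirit, since R's kmeans uses its own initialization.

```r
set.seed(10)
# cluster the four numeric attributes into k = 3 clusters
km <- kmeans(iris[, 1:4], centers = 3)
# cross-tabulate the clusters against the true species
table(km$cluster, iris$Species)
```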
PRACTICAL NO: 4 
Aim: Build the basic time series model structure and create predictions for the BodyFat dataset using RTool.
Solution: 
Dataset Used: BodyFat
Step 1: Load the package mboost.
Fig 4.1: Loading the mboost package
Step 2: Show the data stored in the BodyFat dataset.
Fig 4.2: The data stored in the BodyFat dataset
Step 3: Display the summary of the BodyFat dataset.
Fig 4.3: Summary of the BodyFat dataset
Step 4: Apply the prediction method and plot a graph on the BodyFat dataset.
Fig 4.4: Prediction method and plot formula applied to the BodyFat dataset
Step 5: Prediction graph for the BodyFat dataset.
Fig 4.5: The prediction graph for the BodyFat dataset
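A sketch of this practical in R; note that in recent releases the bodyfat data ships with the TH.data package rather than mboost itself, so the data() call below is an assumption about the installed versions.

```r
library(mboost)
data("bodyfat", package = "TH.data")

# Steps 2-3: inspect the data
head(bodyfat)
summary(bodyfat)

# Steps 4-5: fit a boosted linear model for DEXfat and plot predictions
model <- glmboost(DEXfat ~ ., data = bodyfat)
pred  <- predict(model)
plot(bodyfat$DEXfat, pred,
     xlab = "observed DEXfat", ylab = "predicted DEXfat")
```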
PRACTICAL NO: 5 
Aim: Build the data mining model and implement k-nearest neighbour using the Weka tool.
Solution: 
Dataset Used: ContactLenses.arff
Step 1: Preprocess
Open file → go to the weka folder → select the contact-lenses dataset → Choose →
Filters → supervised → discretize
Fig 5.1: Structure of contact lens dataset
Fig 5.2: Filtering the Data 
Fig 5.3:Filtered Dataset
Step 2: Classify
Select the Classify tab → Choose button → expand the Lazy folder → select IBk → click the
"Use training set" radio button → click the Start button.
Fig 5.4 Choosing K-nearest neighbour algorithm 
Fig 5.5 Generating Result
=== Run information === 
Scheme:weka.classifiers.lazy.IBk -K 1 -W 0 -A 
"weka.core.neighboursearch.LinearNNSearch -A "weka.core.EuclideanDistance -R first-last"" 
Relation: contact-lenses-weka.filters.supervised.attribute.Discretize-Rfirst-last 
Instances: 24 
Attributes: 5 
age 
spectacle-prescrip 
astigmatism 
tear-prod-rate 
contact-lenses 
Test mode:evaluate on training data 
=== Classifier model (full training set) === 
IB1 instance-based classifier 
using 1 nearest neighbour(s) for classification 
Time taken to build model: 0 seconds 
=== Evaluation on training set === 
=== Summary === 
Correctly Classified Instances 24 100 %
Incorrectly Classified Instances 0 0 % 
Kappa statistic 1 
Mean absolute error 0.0494 
Root mean squared error 0.0524 
Relative absolute error 13.4078 % 
Root relative squared error 12.3482 % 
Total Number of Instances 24 
=== Detailed Accuracy By Class === 
TP Rate FP Rate Precision Recall F-Measure ROC Area Class 
1 0 1 1 1 1 soft 
1 0 1 1 1 1 hard 
1 0 1 1 1 1 none 
Weighted Avg. 1 0 1 1 1 1 
=== Confusion Matrix === 
a b c <-- classified as 
5 0 0 | a = soft 
0 4 0 | b = hard 
0 0 15 | c = none
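Weka's IBk with K = 1 corresponds to a 1-nearest-neighbour classifier. Since contact-lenses.arff ships with Weka rather than R, the sketch below uses iris purely for illustration; the seed and split are assumptions.

```r
library(class)  # knn() ships with R as a recommended package

set.seed(1234)
ind   <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[ind == 1, 1:4]
test  <- iris[ind == 2, 1:4]

# classify each test row by its single nearest training neighbour
pred <- knn(train, test, cl = iris$Species[ind == 1], k = 1)
table(pred, iris$Species[ind == 2])
```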
PRACTICAL NO: 6 
Aim: Build the data mining model and implement the Apriori association rule algorithm using the Weka tool.
Solution: 
Dataset Used: Supermarket.arff
Step 1: Preprocess
Open file → go to the Weka folder → select the Supermarket dataset → Choose → Filters → All Filter
Fig 6.1: Structure of Supermarket dataset
Fig 6.2: Filtering the Data 
Fig 6.3: Filtered Dataset
Step 2: Associate
Select the Associate tab → choose the Apriori algorithm → properties → configure the
algorithm according to requirements → click Start.
Fig 6.4 Choosing Apriori Algorithm 
Fig 6.5 Configuring Algorithm
Fig 6.6 Displaying Association Results 
=== Run information === 
Scheme: weka.associations.Apriori -N 12 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 
Relation: supermarket-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka. 
filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka. 
filters.AllFilter 
Instances: 4627 
Attributes: 217 
[list of attributes omitted] 
=== Associator model (full training set) ===
Apriori 
======= 
Minimum support: 0.15 (694 instances) 
Minimum metric <confidence>: 0.9 
Number of cycles performed: 17 
Generated sets of large itemsets: 
Size of set of large itemsets L(1): 44 
Size of set of large itemsets L(2): 380 
Size of set of large itemsets L(3): 910 
Size of set of large itemsets L(4): 633 
Size of set of large itemsets L(5): 105 
Size of set of large itemsets L(6): 1 
Best rules found: 
1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92) 
2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92) 
3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 
conf:(0.92)
4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92) 
5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91) 
6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 
conf:(0.91) 
7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 
conf:(0.91) 
8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91) 
9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 
conf:(0.91) 
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91) 
11. baking needs=t fruit=t vegetables=t total=high 831 ==> bread and cake=t 752 conf:(0.9) 
12. biscuits=t milk-cream=t total=high 907 ==> bread and cake=t 820 conf:(0.9)
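The run above used minimum support 0.15 and minimum confidence 0.9. For comparison, the same thresholds map directly onto R's arules package; the Groceries transactions bundled with arules stand in for supermarket.arff, which ships only with Weka, so the rule counts will differ.

```r
library(arules)
data(Groceries)   # a transactions object bundled with arules

# same thresholds as the Weka run: support 0.15, confidence 0.9
rules <- apriori(Groceries,
                 parameter = list(support = 0.15, confidence = 0.9))
summary(rules)    # may well be empty at these strict thresholds
```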
PRACTICAL NO: 7 
Aim: Build the data mining model and implement the Apriori association rule algorithm using RTool.
Solution: 
Dataset Used: Titanic
Step 1: Preprocess
Loading the Data in Data Frame 
Transforming the Data into Suitable Format 
Fig 7.1: Structure of Titanic dataset
Fig 7.2 Summary of Titanic Dataset 
Step 2: Associate
Loading library ‘arules’ that contains functions for Association mining 
Function used to apply Apriori Algorithm with Default Configuration 
Fig 7.3 Choosing Apriori Algorithm
Fig 7.4 Inspecting the Results of Apriori Algorithm 
Fig 7.5 Applying Settings to Display Rules with RHS containing survived only
Step 3: Finding and Removing Redundant Rules
Code to Find Redundant Rules 
Code to Remove Redundant Rules 
Fig 7.6 Finding & Removing Redundant Rules
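The R commands behind the figures in Steps 2–3 can be sketched as follows. The support and confidence values are illustrative assumptions, and datasets::Titanic is a contingency table, so it is first expanded into one row per passenger.

```r
library(arules)

# expand the 4-way Titanic table into one row per passenger
df    <- as.data.frame(Titanic)
cases <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]

# Apriori, keeping only rules with Survived on the right-hand side
rules <- apriori(cases,
                 parameter  = list(support = 0.005, confidence = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                   default = "lhs"))
inspect(head(sort(rules, by = "lift")))

# find and remove redundant rules
rules.pruned <- rules[!is.redundant(rules)]
```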
Step 4: Visualizing
Loading the library arulesViz, which contains functions for visualizing association results
Function to plot the results as a scatter plot
X axis: Support
Y axis: Confidence
Fig 7.7 Scatter Plot
Function to plot the association results as a graph
Fig 7.8 Graph Plot Showing How Data Items Are Associated
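The two arulesViz plots can be sketched self-containedly as below, rebuilding a small rule set first; the thresholds are the same illustrative assumptions as in the earlier steps.

```r
library(arules)
library(arulesViz)

df    <- as.data.frame(Titanic)
cases <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]
rules <- apriori(cases, parameter = list(support = 0.005, confidence = 0.8))

plot(rules)                     # scatter plot: support (x) vs confidence (y)
plot(rules, method = "graph")   # graph plot of how items are associated
```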
PRACTICAL NO: 8 
Aim: Consider suitable data for text mining and implement the text mining technique using R-Tool.
Solution: 
Dataset Used: Plain text file (www.txt)
Step 1: Loading the Text File
Loading the essential libraries for text mining: tm, SnowballC, and twitteR
Loading the data from the text file into RTool using readLines()
Fig 8.1: Using the tail() and head() functions to display the start and end of paragraphs
Step 2: Transforming
Loading the tm library and transforming the document into a corpus (corpusdoc)
Fig 8.2 Inspecting Corpusdoc 
Function to Remove Punctuation
Fig 8.3 Removing Punctuation
Function to Strip White Spaces 
Fig 8.4 Stripping White Spaces 
Function to Remove Stop Words from Document 
Fig 8.5 Removing Stop Words From Document
Function to Stem the Document 
Fig 8.6 Stemming the Document 
Function to Convert corpusdoc to TermDocumentMatrix 
Fig 8.7 Inspecting TermDocumentMatrix
Step 3: Finding Frequent Terms in the Document
Fig 8.8 Finding Frequent Terms in the Document
Step 4: Finding Associations Among Terms
Function to find associations among different terms in the document
Fig 8.9 Result of How Strongly Terms Are Associated with the Term “information”
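The whole text-mining pipeline of this practical can be sketched in a dozen lines of R. The file name www.txt comes from the steps above; the lowfreq and correlation thresholds are illustrative assumptions.

```r
library(tm)
library(SnowballC)

# Step 1: load the plain-text file
text <- readLines("www.txt")
head(text); tail(text)

# Step 2: transform into a corpus and clean it up
corpusdoc <- Corpus(VectorSource(text))
corpusdoc <- tm_map(corpusdoc, removePunctuation)
corpusdoc <- tm_map(corpusdoc, stripWhitespace)
corpusdoc <- tm_map(corpusdoc, removeWords, stopwords("english"))
corpusdoc <- tm_map(corpusdoc, stemDocument)
tdm <- TermDocumentMatrix(corpusdoc)

# Steps 3-4: frequent terms, and terms associated with "information"
findFreqTerms(tdm, lowfreq = 5)
findAssocs(tdm, "information", 0.5)
```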

More Related Content

PDF
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
PDF
Iterative improved learning algorithm for petrographic image classification a...
PDF
IRJET- Image Classification – Cat and Dog Images
DOCX
Distributed systems
PPT
Download It
PPT
ensemble learning
PDF
Introduction to Some Tree based Learning Method
PDF
ランダムフォレストとそのコンピュータビジョンへの応用
FUZZY IMAGE SEGMENTATION USING VALIDITY INDEXES CORRELATION
Iterative improved learning algorithm for petrographic image classification a...
IRJET- Image Classification – Cat and Dog Images
Distributed systems
Download It
ensemble learning
Introduction to Some Tree based Learning Method
ランダムフォレストとそのコンピュータビジョンへの応用

Viewers also liked (16)

PDF
L4. Ensembles of Decision Trees
PPTX
[Women in Data Science Meetup ATX] Decision Trees
PPTX
Ensemble modeling overview, Big Data meetup
PDF
From decision trees to random forests
PPTX
Decision trees and random forests
PDF
Machine Learning and Data Mining: 16 Classifiers Ensembles
PPTX
Lecture 6: Ensemble Methods
PDF
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
PPT
2.8 accuracy and ensemble methods
PPTX
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
PPTX
Machine learning overview (with SAS software)
PPSX
Election algorithms
PDF
Understanding Random Forests: From Theory to Practice
PDF
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
PDF
5.4 Arbres et forêts aléatoires
PDF
Data Science - Part V - Decision Trees & Random Forests
L4. Ensembles of Decision Trees
[Women in Data Science Meetup ATX] Decision Trees
Ensemble modeling overview, Big Data meetup
From decision trees to random forests
Decision trees and random forests
Machine Learning and Data Mining: 16 Classifiers Ensembles
Lecture 6: Ensemble Methods
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
2.8 accuracy and ensemble methods
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning overview (with SAS software)
Election algorithms
Understanding Random Forests: From Theory to Practice
2013-1 Machine Learning Lecture 06 - Lucila Ohno-Machado - Ensemble Methods
5.4 Arbres et forêts aléatoires
Data Science - Part V - Decision Trees & Random Forests
Ad

Similar to Data Mining (20)

PDF
Machine Learning, K-means Algorithm Implementation with R
PDF
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
PDF
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
PDF
EKON22 Introduction to Machinelearning
PDF
3 Data scientist associate - Case GoalZone - Fitness class attendance study.pdf
PDF
Machine_Learning_Co__
PDF
Machine_Learning_Trushita
PDF
Classification and Prediction Based Data Mining Algorithm in Weka Tool
PDF
Human_Activity_Recognition_Predictive_Model
DOC
Cis247 a ilab 3 overloaded methods and static methods variables
DOC
Cis247 i lab 3 overloaded methods and static methods variables
DOC
Cis247 a ilab 3 overloaded methods and static methods variables
PDF
AIML4 CNN lab256 1hr (111-1).pdf
PDF
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
PDF
Image Classification using Deep Learning
PDF
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
DOCX
Artifical_intiligence_worksheet-exp-9.docx
PDF
Machine learning key to your formulation challenges
PDF
Artificial Intelligence based Pattern Recognition
DOCX
Cis247 i lab 2 of 7 employee class
Machine Learning, K-means Algorithm Implementation with R
STOCK PRICE PREDICTION USING MACHINE LEARNING [RANDOM FOREST REGRESSION MODEL]
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
EKON22 Introduction to Machinelearning
3 Data scientist associate - Case GoalZone - Fitness class attendance study.pdf
Machine_Learning_Co__
Machine_Learning_Trushita
Classification and Prediction Based Data Mining Algorithm in Weka Tool
Human_Activity_Recognition_Predictive_Model
Cis247 a ilab 3 overloaded methods and static methods variables
Cis247 i lab 3 overloaded methods and static methods variables
Cis247 a ilab 3 overloaded methods and static methods variables
AIML4 CNN lab256 1hr (111-1).pdf
IRJET - Comparative Study of Flight Delay Prediction using Back Propagati...
Image Classification using Deep Learning
INTRODUCTION TO MACHINE LEARNING FOR MATERIALS SCIENCE
Artifical_intiligence_worksheet-exp-9.docx
Machine learning key to your formulation challenges
Artificial Intelligence based Pattern Recognition
Cis247 i lab 2 of 7 employee class
Ad

More from Sonali Parab (18)

PPT
Forensic laboratory setup requirements
DOCX
Forensic laboratory setup requirements
DOCX
Firewalls
DOCX
Embedded System
DOCX
Advance Database Management Systems -Object Oriented Principles In Database
PDF
Cloud and Ubiquitous Computing manual
PPT
Advance Database Management Systems -Object Oriented Principles In Database
PPT
Default and On demand routing - Advance Computer Networks
DOCX
Cloud Computing And Virtualization
DOCX
Protocols in Bluetooth
PPT
Protols used in bluetooth
PPT
Public Cloud Provider
DOCX
Public Cloud Provider
DOCX
Minning www
DOCX
Remote Method Invocation
DOCX
Agile testing
PPT
Minning WWW
PPTX
Remote Method Invocation (Java RMI)
Forensic laboratory setup requirements
Forensic laboratory setup requirements
Firewalls
Embedded System
Advance Database Management Systems -Object Oriented Principles In Database
Cloud and Ubiquitous Computing manual
Advance Database Management Systems -Object Oriented Principles In Database
Default and On demand routing - Advance Computer Networks
Cloud Computing And Virtualization
Protocols in Bluetooth
Protols used in bluetooth
Public Cloud Provider
Public Cloud Provider
Minning www
Remote Method Invocation
Agile testing
Minning WWW
Remote Method Invocation (Java RMI)

Recently uploaded (20)

PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PDF
Microbial disease of the cardiovascular and lymphatic systems
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
RMMM.pdf make it easy to upload and study
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
Institutional Correction lecture only . . .
PDF
Insiders guide to clinical Medicine.pdf
PDF
Basic Mud Logging Guide for educational purpose
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
Module 4: Burden of Disease Tutorial Slides S2 2025
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
Microbial disease of the cardiovascular and lymphatic systems
human mycosis Human fungal infections are called human mycosis..pptx
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
TR - Agricultural Crops Production NC III.pdf
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Week 4 Term 3 Study Techniques revisited.pptx
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
FourierSeries-QuestionsWithAnswers(Part-A).pdf
RMMM.pdf make it easy to upload and study
Anesthesia in Laparoscopic Surgery in India
Abdominal Access Techniques with Prof. Dr. R K Mishra
Pharmacology of Heart Failure /Pharmacotherapy of CHF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
VCE English Exam - Section C Student Revision Booklet
Institutional Correction lecture only . . .
Insiders guide to clinical Medicine.pdf
Basic Mud Logging Guide for educational purpose
2.FourierTransform-ShortQuestionswithAnswers.pdf

Data Mining

  • 1. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 1 Aim:Build the data mining model structure and built the decision tree with proper decision nodes and infer at least five different types of reports. Implement Using RTool. Solution: Dataset Used :Iris Step 1:Display the Structure of iris data. Fig 1.1: Structure of iris data Step 2:The random seed is set to a fixed value below to make the results reproducible. Fig 1.2:Random Seed Set
  • 2. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Step 3:Install the party package if it is not installed. Load the party package, build adecision tree, and check the prediction result. Sonali. Parab. Fig 1.3: Load Party library Fig 1.4: iris table Step 4:printing the rules and plot the tree Fig 1.5: Rules of data
  • 3. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. A. Report 1 Fig 1.6: Decision Tree
  • 4. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 5:Plot Decision tree in simple style Fig 1.7: Command to plot decision tree in simple style B. Report 2 Fig 1.8: Decision tree (Simple Style)
  • 5. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 6:Plot iris species in bar plot Fig 1.9: bar plot command C. Report 3 Fig 1.10:Barplot of Species
  • 6. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 7:Plot iris Species in pie chart Fig 1.11: Command for pie chart D. Report 4 Fig 1.12: Pie Chart
  • 7. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 8:Plot histogram of iris Petal Length Fig 1.13: Command to plot histogram E. Report 5 Fig 1.14: Histogram of iris Petal Length
  • 8. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 2 Aim:Build the data mining model structure and Implement Naïve Bayes Algorithm. Implement Using WEKA. Solution: Dataset Used :Diabetes.arff Step 1:Pre-processing Go to WekaOpen file go to weka folder select diabetes.arff dataset open Fig 2.1 Choosing diabetes.arff dataset
  • 9. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 2:Filter the data FilterssuperviseddiscretizeApply Fig 2.2 Selecting the Filter Fig 2.3 Structure of Filtered Diabetes.arff Dataset
  • 10. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 3:Classify the data using Naïve Bayes Algorithm Fig 2.4 Select Classification Algorithm Fig 2.5 Running and Displaying Result
  • 11. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. === Run information === Scheme:weka.classifiers.bayes.NaiveBayes Relation: pima_diabetes-weka.filters.supervised.attribute.Discretize-Rfirst-last Instances: 768 Attributes: 9 preg plas pres skin insu mass pedi age class Test mode:10-fold cross-validation === Classifier model (full training set) === Naive Bayes Classifier Class Attribute tested_negative tested_positive (0.65) (0.35) ==================================================== preg
  • 12. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. '(-inf-6.5]' 427.0 174.0 '(6.5-inf)' 75.0 96.0 [total] 502.0 270.0 plas '(-inf-99.5]' 182.0 17.0 '(99.5-127.5]' 211.0 79.0 '(127.5-154.5]' 86.0 77.0 '(154.5-inf)' 25.0 99.0 [total] 504.0 272.0 pres 'All' 501.0 269.0 [total] 501.0 269.0 skin 'All' 501.0 269.0 [total] 501.0 269.0 insu '(-inf-14.5]' 237.0 140.0 '(14.5-121]' 165.0 28.0 '(121-inf)' 101.0 103.0 [total] 503.0 271.0 mass '(-inf-27.85]' 196.0 28.0
  • 13. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. '(27.85-inf)' 306.0 242.0 [total] 502.0 270.0 pedi '(-inf-0.5275]' 362.0 149.0 '(0.5275-inf)' 140.0 121.0 [total] 502.0 270.0 age '(-inf-28.5]' 297.0 72.0 '(28.5-inf)' 205.0 198.0 [total] 502.0 270.0 Time taken to build model: 0 seconds
  • 14. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 4: Visualize classifiers errors Fig 2.6 Visualization of Classification Errors
  • 15. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 3 Aim:Implement the clustering Algorithm By Using Weka Tool. Solution: Dataset Used :Iris.arff Step 1:Preprocess Open file go to weka folder select iris dataset Choose  Filterssuperviseddiscretize Fig 3.1: Structure of iris data
  • 16. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Fig 3.2: Filtering the Data Fig 3.3: Filtered Dataset
  • 17. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 2:Cluster Select cluster tabchoose button clusterers  select simplekmeans click radio button use training setright click “Poperties” numClusters= 3click start button. Fig 3.4 Configuring Clustering Algorithm Fig 3.5 Generating Result
  • 18. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. === Run information === Scheme:weka.clusterers.SimpleKMeans -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10 Relation: iris-weka.filters.supervised.attribute.Discretize-Rfirst-last Instances: 150 Attributes: 5 sepallength sepalwidth petallength petalwidth class Test mode:evaluate on training data === Model and evaluation on training set === kMeans ====== Number of iterations: 5 Within cluster sum of squared errors: 109.0 Missing values globally replaced with mean/mode Cluster centroids: Cluster#
  • 19. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Attribute Full Data 0 1 2 (150) (50) (50) (50) ===================================================== sepallength '(-inf-5.55]' '(-inf-5.55]' '(5.55-6.15]' '(6.15-inf)' sepalwidth '(-inf-2.95]' '(3.35-inf)' '(-inf-2.95]' '(2.95-3.35]' petallength '(4.75-inf)' '(-inf-2.45]' '(2.45-4.75]' '(4.75-inf)' petalwidth '(0.8-1.75]' '(-inf-0.8]' '(0.8-1.75]' '(1.75-inf)' class Iris-setosa Iris-setosa Iris-versicolor Iris-virginica Time taken to build model (full training data) : 0 seconds === Model and evaluation on training set === Clustered Instances 0 50 ( 33%) 1 50 ( 33%) 2 50 ( 33%)
  • 20. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step 4:Visualizing the Result Right click on resultvisualize cluster assignments Fig 3.6 Selecting Visualization Fig 3.7 Displaying Visualization Result
  • 21. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. PRACTICAL NO: 4 Aim :Build the basic Time series model structure and create the predictions BodyFatDataset.By Using RTool. Solution: Dataset Used :BodyFat Step 1 :load Package mboost. Fig 4.1 : Show the load Of Package mboost.
  • 22. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step2 :To Show Data stored in BodyFat Dataset. Fig 4.2 : Show The Data stored in BodyFat Dataset. Step 3 :Select the Summary Of BodyFat Dataset. Fig 4.3 :Show The Summary Of BodyFat Dataset.
  • 23. MSc IT Part – I, Semester-1 Page No:- ________ DATA MINING Date:- ____________ Sonali. Parab. Step4 :Using Predication Method And Plot Graph On BodyFat Dataset. Fig 4.4 : Show Predication Method And Plot Graph Formula ApplyOn BodyFat Dataset. Step5 :Predication Graph For BodyFat Dataset. Fig 4.5 :Show The Predication Graph For BodyFat Dataset.
PRACTICAL NO: 5
Aim: Build the data mining model and implement k-nearest neighbour using the Weka tool.
Solution:
Dataset Used: ContactLenses.arff
Step 1: Preprocess
Open file → go to the Weka data folder → select the contact-lenses dataset.
Choose → Filters → supervised → Discretize
Fig 5.1: Structure of the contact-lenses dataset
Fig 5.2: Filtering the Data
Fig 5.3: Filtered Dataset
Step 2: Classify
Select the Classify tab → Choose button → expand the lazy folder → select IBk → select the "Use training set" radio button → click the Start button.
Fig 5.4: Choosing the k-nearest neighbour algorithm
Fig 5.5: Generating the Result
=== Run information ===

Scheme: weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A "weka.core.EuclideanDistance -R first-last""
Relation: contact-lenses-weka.filters.supervised.attribute.Discretize-Rfirst-last
Instances: 24
Attributes: 5
            age
            spectacle-prescrip
            astigmatism
            tear-prod-rate
            contact-lenses
Test mode: evaluate on training data

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances      24      100      %
Incorrectly Classified Instances     0        0      %
Kappa statistic                      1
Mean absolute error                  0.0494
Root mean squared error              0.0524
Relative absolute error             13.4078 %
Root relative squared error         12.3482 %
Total Number of Instances           24

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        0        1          1       1          1         soft
               1        0        1          1       1          1         hard
               1        0        1          1       1          1         none
Weighted Avg.  1        0        1          1       1          1

=== Confusion Matrix ===

  a  b  c   <-- classified as
  5  0  0 |  a = soft
  0  4  0 |  b = hard
  0  0 15 |  c = none
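The same 1-nearest-neighbour idea can be sketched in R with the class package. knn() requires numeric predictors, so the sketch below uses the iris data as an illustration rather than the categorical contact-lenses attributes used in Weka:

```r
# Illustrative 1-NN classification in R (the practical itself uses Weka's IBk)
library(class)
set.seed(1)
idx   <- sample(nrow(iris), 100)       # 100 rows for training, rest for testing
train <- iris[idx, 1:4]
test  <- iris[-idx, 1:4]

# k = 1 mirrors IBk's "-K 1" setting above
pred <- knn(train, test, cl = iris$Species[idx], k = 1)
table(pred, iris$Species[-idx])        # confusion matrix on the hold-out rows
```

Note that the 100% accuracy reported by Weka above comes from evaluating on the training set itself; with 1-NN every training instance is its own nearest neighbour, so a hold-out split as in this sketch gives a more honest estimate.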
PRACTICAL NO: 6
Aim: Build the data mining model and implement the Apriori association rule algorithm using the Weka tool.
Solution:
Dataset Used: Supermarket.arff
Step 1: Preprocess
Open file → go to the Weka data folder → select the Supermarket dataset.
Choose → Filters → AllFilter
Fig 6.1: Structure of the Supermarket dataset
Fig 6.2: Filtering the Data
Fig 6.3: Filtered Dataset
Step 2: Associate
Select the Associate tab → choose the Apriori algorithm → open its properties → configure the algorithm according to requirements → click Start.
Fig 6.4: Choosing the Apriori Algorithm
Fig 6.5: Configuring the Algorithm
Fig 6.6: Displaying Association Results

=== Run information ===

Scheme: weka.associations.Apriori -N 12 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: supermarket-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter
Instances: 4627
Attributes: 217 [list of attributes omitted]

=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.15 (694 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 44
Size of set of large itemsets L(2): 380
Size of set of large itemsets L(3): 910
Size of set of large itemsets L(4): 633
Size of set of large itemsets L(5): 105
Size of set of large itemsets L(6): 1

Best rules found:

 1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)
 2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92)
 3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 conf:(0.92)
 4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92)
 5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91)
 6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 conf:(0.91)
 7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 conf:(0.91)
 8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)
 9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 conf:(0.91)
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)
11. baking needs=t fruit=t vegetables=t total=high 831 ==> bread and cake=t 752 conf:(0.9)
12. biscuits=t milk-cream=t total=high 907 ==> bread and cake=t 820 conf:(0.9)
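The same kind of market-basket analysis can be sketched with R's arules package. The Groceries transaction data that ships with arules stands in for Supermarket.arff here, and the support/confidence thresholds are illustrative (the 0.15/0.9 settings from the Weka run above are specific to that dataset):

```r
# Apriori association rules in R's arules package (a sketch on sample data)
library(arules)
data("Groceries")                       # 9835 supermarket transactions

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01,   # minimum support
                                  conf = 0.5))   # minimum confidence

# Show the five rules with the highest confidence
inspect(head(sort(rules, by = "confidence"), 5))
```

As in the Weka output, each rule reads "LHS items ==> RHS item" with its support and confidence.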
PRACTICAL NO: 7
Aim: Build the data mining model and implement association rule mining (Apriori) on the Titanic dataset using RTool.
Solution:
Dataset Used: Titanic
Step 1: Preprocess
Loading the data into a data frame
Transforming the data into a suitable format
Fig 7.1: Structure of the Titanic dataset
Fig 7.2: Summary of the Titanic Dataset
Step 2: Associate
Loading the 'arules' library, which contains functions for association mining
Function used to apply the Apriori algorithm with its default configuration
Fig 7.3: Choosing the Apriori Algorithm
Fig 7.4: Inspecting the Results of the Apriori Algorithm
Fig 7.5: Applying settings to display only rules whose RHS contains "survived"
Step 3: Finding and Removing Redundant Rules
Code to find redundant rules
Code to remove redundant rules
Fig 7.6: Finding and Removing Redundant Rules
Step 4: Visualizing
Loading the arulesViz library, which contains functions for visualizing association results
Function to plot the results as a scatter plot (X axis: Support, Y axis: Confidence)
Fig 7.7: Scatter Plot
Function to plot the association results as a graph plot
Fig 7.8: Graph plot showing how data items are associated
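The R code for this practical is visible only in the figures. A minimal sketch of the whole workflow, assuming the Titanic data is expanded from base R's Titanic contingency table into one row per passenger (the thresholds are illustrative), might look like:

```r
# Association rules on the Titanic data with arules / arulesViz (a sketch)
library(arules)
library(arulesViz)

# Expand the 4-way contingency table into one row per passenger
tt <- as.data.frame(Titanic)
titanic.raw <- tt[rep(seq_len(nrow(tt)), tt$Freq), 1:4]

# Step 2: Apriori, keeping only rules with "Survived" on the right-hand side
rules <- apriori(titanic.raw,
                 parameter  = list(minlen = 2, supp = 0.005, conf = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                   default = "lhs"))
rules.sorted <- sort(rules, by = "lift")
inspect(head(rules.sorted))

# Step 3: remove rules subsumed by simpler rules with at least equal confidence
rules.pruned <- rules.sorted[!is.redundant(rules.sorted)]

# Step 4: scatter plot (support vs. confidence) and graph plot
plot(rules.pruned)
plot(rules.pruned, method = "graph")
```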
PRACTICAL NO: 8
Aim: Consider suitable data for text mining and implement the text mining technique using RTool.
Solution:
Dataset Used: Plain Text File (www.txt)
Step 1: Loading the Text File
Loading the essential libraries for text mining: tm, SnowballC and twitteR
Loading the data from the text file into RTool using readLines()
Fig 8.1: Using the head() and tail() functions to display the start and end paragraphs
Step 2: Transforming
Loading the tm library and transforming the document into the corpus corpusdoc
Fig 8.2: Inspecting corpusdoc
Function to remove punctuation
Fig 8.3: Removing Punctuation
Function to strip white space
Fig 8.4: Stripping White Space
Function to remove stop words from the document
Fig 8.5: Removing Stop Words from the Document
Function to stem the document
Fig 8.6: Stemming the Document
Function to convert corpusdoc to a TermDocumentMatrix
Fig 8.7: Inspecting the TermDocumentMatrix
Step 3: Finding Frequent Terms in the Document
Fig 8.8: Finding Frequent Terms in the Document
Step 4: Finding Associations Among Terms
Function to find associations among different terms in the document
Fig 8.9: Result showing how strongly terms are associated with the term "information"
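The tm pipeline shown in the figures can be sketched end to end as follows. Since www.txt is not reproduced here, a tiny in-memory vector of sentences stands in for the file read with readLines(); stemDocument() requires the SnowballC package:

```r
# End-to-end sketch of the text mining pipeline from Practical 8
library(tm)          # text mining framework
library(SnowballC)   # stemming backend for stemDocument()

txt <- c("Text mining extracts useful information from text.",
         "Mining text data requires cleaning, stop word removal and stemming.")

corpusdoc <- Corpus(VectorSource(txt))                        # Step 2: build the corpus
corpusdoc <- tm_map(corpusdoc, content_transformer(tolower))  # normalize case
corpusdoc <- tm_map(corpusdoc, removePunctuation)             # Fig 8.3
corpusdoc <- tm_map(corpusdoc, stripWhitespace)               # Fig 8.4
corpusdoc <- tm_map(corpusdoc, removeWords,
                    stopwords("english"))                     # Fig 8.5
corpusdoc <- tm_map(corpusdoc, stemDocument)                  # Fig 8.6

tdm <- TermDocumentMatrix(corpusdoc)   # Fig 8.7: terms x documents matrix

findFreqTerms(tdm, lowfreq = 2)        # Step 3: terms occurring at least twice
findAssocs(tdm, "text", 0.1)           # Step 4: terms correlated with "text"
```

findAssocs() reports the correlation of each term's occurrence pattern with the given term, which is what Fig 8.9 shows for the term "information" in www.txt.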