Clustering and Association Rules 
Case 4 
NOVEMBER 24, 2014 
GROUP 7 
Sushmita Dey 
Nikolaos Minas 
Allan Kuo 
Prof. Shaonan Tian
Clustering 
• Clustering is a popular unsupervised learning method. 
• It groups a set of points together in a cluster. Objects that differ from each other are placed in different clusters. A distance measure is used as the metric for separating objects into clusters.
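To make the role of distance concrete, here is a minimal sketch in R. It assumes Euclidean distance (the slide does not name the measure) and picks four iris rows purely for illustration.

```r
# Euclidean distances between a few iris flowers (numeric columns only).
# Rows 1 and 2 are setosa, row 51 is versicolor, row 101 is virginica.
data(iris)
measurements <- iris[c(1, 2, 51, 101), 1:4]

# dist() returns the pairwise distances; "euclidean" is an assumed choice
d <- dist(measurements, method = "euclidean")
print(round(as.matrix(d), 2))
```

Flowers of the same species should show a smaller distance to each other than to flowers of another species, which is the property a clustering method relies on.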
Clustering 
• Objects within the same cluster are closer to each other than to objects in different clusters. 
• We used […] from the iris data set to apply […]
K-Means Clustering 
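• We use the kmeans() function, which ships with R's stats package; the "fpc" package provides cluster-plotting helpers such as plotcluster(). 
• We started with the number of clusters equal to […], and the result was […] of pure cluster, […] of slightly less pure cluster, and a mixture of […] and […]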
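The k-means step can be sketched as follows. This is an illustration, not the group's actual script: the choice of three clusters, the seed, and nstart = 25 are assumptions, and the cross-table is one common way to judge how pure each cluster is.

```r
data(iris)
features <- iris[, 1:4]          # drop the Species label before clustering

set.seed(42)                     # k-means starts from random centers
km <- kmeans(features, centers = 3, nstart = 25)   # centers = 3 is assumed

# Cross-tabulate true species against assigned cluster to judge purity:
# a pure cluster has all of its members in a single species row
purity_table <- table(Species = iris$Species, Cluster = km$cluster)
print(purity_table)

# Optional plot on discriminant coordinates (needs the fpc package):
# fpc::plotcluster(features, km$cluster)
```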
K-Means Clustering 
• Figure 1: k-means cluster plot, points labeled 1 to 3 (axes: dc 1, dc 2) 
• Figure 2: k-means cluster plot, points labeled 1 to 4 (axes: dc 1, dc 2)
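Hierarchical Clustering with hclust() 
• We used the hclust() function, which ships with R's stats package; the "fpc" package provides cluster-plotting helpers such as plotcluster(). 
• We used Ward's minimum variance method to create clusters. 
• We started with 3 clusters and went up to 4.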
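A minimal sketch of the hierarchical step, under stated assumptions: Euclidean distances and the "ward.D2" variant of Ward's method (the slide does not say which Ward variant was used).

```r
data(iris)
features <- iris[, 1:4]

d  <- dist(features)                     # Euclidean distance matrix
hc <- hclust(d, method = "ward.D2")      # Ward's minimum variance criterion

# Cut the same tree at two levels instead of refitting for each k
groups3 <- cutree(hc, k = 3)
groups4 <- cutree(hc, k = 4)

# Compare the 3-cluster solution with the known species
print(table(Species = iris$Species, Cluster = groups3))
```

Because the tree is built once, moving from 3 to 4 clusters only splits one existing cluster; no observation moves between the others.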
Hierarchical Clustering 
• Figure 5: Centroid plot with 3 clusters (axes: dc 1, dc 2) 
• Figure 6: Centroid plot with 4 clusters (axes: dc 1, dc 2)
Association Rules 
• Association rule learning is a popular unsupervised learning method. 
• Association rules are used in market basket analysis in retail stores to find which items are frequently purchased together.
Association Rules 
• Association rules are best suited to finding associations between items in a large set of transactional data. 
• A typical rule may be represented as: 
• {peanut butter, jelly} -> {…} 
• If peanut butter and jelly are purchased, then the item on the right-hand side is also likely to be purchased.
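The strength of such a rule is usually measured with support, confidence, and lift. The sketch below computes them by hand in base R. The five baskets and the right-hand item (bread) are made up for illustration and are not taken from the Groceries data.

```r
# Hypothetical toy transactions
baskets <- list(
  c("peanut butter", "jelly", "bread"),
  c("peanut butter", "jelly", "bread", "milk"),
  c("peanut butter", "bread"),
  c("jelly", "milk"),
  c("milk", "bread")
)

# Share of baskets that contain every item in `items`
contains <- function(basket, items) all(items %in% basket)
support  <- function(items) mean(sapply(baskets, contains, items = items))

lhs <- c("peanut butter", "jelly")
rhs <- "bread"                                   # hypothetical right-hand side

rule_support    <- support(c(lhs, rhs))          # baskets with all three items
rule_confidence <- rule_support / support(lhs)   # P(rhs | lhs)
rule_lift       <- rule_confidence / support(rhs)

print(c(support = rule_support,
        confidence = rule_confidence,
        lift = rule_lift))
```

A lift above 1 means the right-hand item appears with the left-hand items more often than its overall frequency alone would suggest.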
Apriori Algorithm 
• The Apriori algorithm is used to learn association rules in a large transactional dataset. 
• The Apriori algorithm employs a simple a priori belief as a heuristic: all subsets of a frequent itemset must also be frequent. 
• We used the arules package for R to analyze the Groceries dataset.
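The pruning idea can be sketched in a few lines of base R. This is a simplified two-level illustration, not the arules implementation; the baskets and the 0.5 support threshold are hypothetical.

```r
# Hypothetical toy transactions
baskets <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "eggs"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "eggs")
)
min_support <- 0.5   # assumed threshold

support <- function(items) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

# Level 1: keep only single items that meet the support threshold
items     <- sort(unique(unlist(baskets)))
frequent1 <- items[sapply(items, support) >= min_support]

# Level 2: build candidate pairs from frequent single items only. A pair
# containing an infrequent item cannot be frequent, so it is never counted.
candidates2 <- combn(frequent1, 2, simplify = FALSE)
frequent2   <- Filter(function(p) support(p) >= min_support, candidates2)

print(frequent1)
print(frequent2)
```

Here "eggs" falls below the threshold at level 1, so no pair containing it is ever generated, which is where the savings on large datasets come from.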
Groceries Data Sets
Data Exploration 
• We install and load the package using the commands install.packages("arules") and library(arules). 
• We use R functions to explore the Groceries dataset. 
• We use the dim() function to find the dimensions of the Groceries dataset. 
• We use the inspect() function from the "arules" package to view the first 10 transactions in the dataset.
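The exploration steps above might look like this in an R session. It assumes the arules package is already installed and uses the Groceries data that ships with it.

```r
# install.packages("arules")   # one-time install, left commented out
library(arules)

data(Groceries)                # transactions object bundled with arules

# Number of transactions (rows) and distinct items (columns)
print(dim(Groceries))

# Show the first 10 transactions as item sets
inspect(Groceries[1:10])
```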
Data Exploration 
• We use the output of the summary() function on the dataset to find the most frequently purchased item ([…]), the average number of items per transaction ([…]), and the number of items in the largest transaction (32). 
• We use the itemFrequencyPlot() function to create plots from the dataset for visual exploration. 
• We plotted item frequency for all items and for items with at least 10% support.
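A sketch of the summary and plotting calls, assuming arules is installed. The 10% threshold matches the plot title on the following slide; everything else uses default arguments.

```r
library(arules)
data(Groceries)

# Item frequencies, transaction size distribution, and density
summary(Groceries)

# Size of the largest transaction, computed directly
largest <- max(size(Groceries))
print(largest)

# Frequency plot for every item, then only items with at least 10% support
itemFrequencyPlot(Groceries)
itemFrequencyPlot(Groceries, support = 0.10)
```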
Item frequency plot (all items)
Item frequency plot (items with 10% support)
Association Rules 
• We use the Apriori algorithm from the arules package to generate a set of association rules. 
• We generated […] rules using support = […] and confidence = […], chosen by trying out different values of support and confidence.
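Rule generation with arules can be sketched as below. The support, confidence, and minlen values are placeholders chosen for illustration; they are not the values the group settled on.

```r
library(arules)
data(Groceries)

# Thresholds below are assumed placeholders; in practice they are tuned
# by trial, as described above
rules <- apriori(
  Groceries,
  parameter = list(support = 0.006, confidence = 0.25, minlen = 2)
)

# Number of rules found at these thresholds
print(length(rules))
```

Raising either threshold yields fewer, stronger rules; lowering them yields more rules at the cost of more noise.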
Association Rules 
• We use the summary() function on the rule set to find the rule length distribution, with […] rules containing one item. 
• We found that the generated rule set has a lift quality metric of […] 
• We use the inspect() and sort() functions to list […] sorted by […].
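Inspecting the rule set might look like the sketch below. Sorting by lift and showing the top five rules are assumptions, as are the apriori() thresholds, which are repeated here so the block runs on its own.

```r
library(arules)
data(Groceries)

# Placeholder thresholds, assumed for illustration
rules <- apriori(
  Groceries,
  parameter = list(support = 0.006, confidence = 0.25, minlen = 2)
)

# Rule length distribution plus summary statistics for support,
# confidence, and lift
summary(rules)

# Sort in decreasing order of lift (assumed sort key) and show the top five
top_by_lift <- sort(rules, by = "lift")[1:5]
inspect(top_by_lift)
```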

