Clustering and Association Rule

Clustering and Association Rules
Case 4
NOVEMBER 24, 2014
GROUP 7
Sushmita Dey
Nikolaos Minas
Allan Kuo
Prof Shaonan Tian

Clustering
• Clustering is a popular
method.
• It groups a set of points
together in a cluster. Objects different
from each other are grouped in
different clusters. The distance is used
as a metric to separate objects into
clusters.

Clustering
• Objects within the same cluster are closer
to each other than to objects in
different clusters.
• We used from the iris data
set to apply

K-Means Clustering
• We use the kmeans() function (part of R's
“stats” package) together with the “fpc” package.
• We started with number of cluster
equal to and the result was
of pure cluster,
of slightly less pure
cluster and the mixture of
and
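
The k-means step might look like the following minimal sketch. It assumes the four numeric iris measurements are clustered and that k = 3; the slide's own starting value is not shown, so 3 is an assumption, as is the seed. It uses kmeans() from base R's “stats” package. The plotcluster() call from “fpc” is optional and only runs if that package is installed.

```r
# Sketch only: k = 3 and the seed are assumptions, not values from the slides.
set.seed(42)
iris_features <- iris[, 1:4]          # numeric columns only; Species is held out

# kmeans() lives in base R's stats package; nstart restarts reduce bad local optima
km <- kmeans(iris_features, centers = 3, nstart = 25)

# Compare assigned clusters with the known species to judge cluster purity
purity_table <- table(Species = iris$Species, Cluster = km$cluster)
print(purity_table)

# Optional discriminant-coordinate plot (axes "dc 1" and "dc 2") from fpc
if (requireNamespace("fpc", quietly = TRUE)) {
  fpc::plotcluster(iris_features, km$cluster)
}
```

Reading the table row by row shows how many flowers of each species landed in each cluster, which is one way to judge how pure a cluster is.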

K-Means Clustering
• Figure 1: cluster plot, points labeled 1 to 3 (axes: dc 1, dc 2)
• Figure 2: cluster plot, points labeled 1 to 4 (axes: dc 1, dc 2)

Hierarchical Clustering with
hclust()
• We used the hclust() function (part of R's
“stats” package) together with the “fpc” package
• We used Ward’s minimum variance
method to create clusters
• We started with and
went up to
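
A minimal sketch of hierarchical clustering with hclust() on the iris measurements follows. It assumes Euclidean distances and method = "ward.D2", one of the two Ward variants that hclust() offers; the slides do not say which variant was used. The cuts at 3 and 4 clusters are also assumptions, chosen to match the cluster counts in the figure captions.

```r
# Sketch only: the distance measure, Ward variant, and cut levels are assumptions.
iris_features <- iris[, 1:4]

dist_matrix <- dist(iris_features)               # Euclidean distance by default
hc <- hclust(dist_matrix, method = "ward.D2")    # Ward's minimum variance criterion

# Cut the dendrogram into a fixed number of clusters
clusters_3 <- cutree(hc, k = 3)
clusters_4 <- cutree(hc, k = 4)

# Compare each cut with the known species
print(table(Species = iris$Species, Cluster = clusters_3))
print(table(Species = iris$Species, Cluster = clusters_4))
```

Because the tree is built once and cut afterwards, trying another number of clusters only needs another cutree() call.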

Hierarchical Clustering
• Figure 5: Centroid Plot with 3 Clusters (axes: dc 1, dc 2)
• Figure 6: Centroid Plot with 4 Clusters (axes: dc 1, dc 2)

Association Rules
• Association rule is a popular
unsupervised
• Association rule is used in
in retail stores to
find which items are
.

Association Rules
• Association rules are mostly suited to
find between items in
large sets of transactional data
• A typical rule may be represented as:
• {peanut butter, jelly}-> { }
• If peanut butter and jelly are
purchased then

Apriori Algorithm
• The Apriori algorithm is used to learn
association rules in a large
transactional dataset.
• The Apriori algorithm employs a simple a
priori belief as a heuristic: all
subsets of a frequent itemset
must also be frequent.
• We used the arules package from R to
analyze the Groceries dataset.
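
The pruning idea behind Apriori can be illustrated in base R with a small hand-made example. The transactions and the helper itemset_support() below are hypothetical, made up for this sketch; they are not part of arules or of the Groceries data.

```r
# Hypothetical toy transactions, made up for illustration only
toy_transactions <- list(
  c("tea", "sugar", "lemon"),
  c("tea", "sugar"),
  c("tea", "lemon"),
  c("sugar"),
  c("tea", "sugar", "lemon")
)

# Support: share of transactions that contain every item in the itemset
itemset_support <- function(itemset, transactions) {
  mean(vapply(transactions,
              function(basket) all(itemset %in% basket),
              logical(1)))
}

s_tea       <- itemset_support("tea", toy_transactions)
s_tea_sugar <- itemset_support(c("tea", "sugar"), toy_transactions)
s_all_three <- itemset_support(c("tea", "sugar", "lemon"), toy_transactions)

# Adding items can never raise support, so an infrequent itemset
# rules out all of its supersets without counting them.
print(c(s_tea, s_tea_sugar, s_all_three))
```

Support can only stay the same or shrink as an itemset grows, which is why the algorithm can skip every superset of an itemset that already falls below the support threshold.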

Data Exploration
• We install and load the package using the
commands install.packages("arules")
and library(arules).
• We use R functions to explore the Groceries
dataset.
• We use the dim() function to find the
dimensions of the Groceries dataset.
• We use the inspect() function from the
“arules” package to view the first 10
transactions in the dataset.
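
These exploration steps can be sketched as follows, assuming the arules package is already installed (the install line is left commented out). The Groceries dataset ships with arules.

```r
# install.packages("arules")   # one-time install, left commented out
library(arules)

data("Groceries")              # transactions object bundled with arules

# Number of transactions (rows) and distinct items (columns)
print(dim(Groceries))

# Show the first 10 transactions as item lists
inspect(Groceries[1:10])
```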

Data Exploration
• We use the output of the summary()
function on the dataset to find the most
frequently purchased item (
), the items per average
transaction ( ) and the number of items in the
largest transaction (32)
• We use the itemFrequencyPlot()
function to create plots from the dataset for visual
exploration
• We plotted item frequency plots for all the items
and for items with 10% support
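
A minimal sketch of the summary and plotting calls, again assuming arules is installed. The support = 0.1 argument corresponds to the 10% support plot; size() is used here only as one convenient way to read off transaction lengths.

```r
library(arules)
data("Groceries")

# Overall summary: most frequent items, transaction size distribution, density
print(summary(Groceries))

# Number of items in each transaction; the maximum is the largest basket
basket_sizes <- size(Groceries)
print(max(basket_sizes))

# Relative frequency of every item, then only items with at least 10% support
itemFrequencyPlot(Groceries)
itemFrequencyPlot(Groceries, support = 0.1)
```

The plot of all items is crowded; the support filter keeps only the few items that appear in at least one transaction in ten.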

Item frequency plot (all items)

Item frequency plot (items with
10% support)

Association Rules
• We use the Apriori algorithm from the
arules package to generate a set of
association rules.
• We generated rules using
support = and confidence =
by trying out different values
of support and confidence.
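
Rule generation with apriori() might be sketched as below. The support and confidence thresholds are placeholders chosen for illustration only; they are not the values used in the slides, which are not shown.

```r
library(arules)
data("Groceries")

# Placeholder thresholds (assumptions, not the slide's values):
# a rule must appear in at least 1% of transactions and hold at least 25% of the time
grocery_rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.25)
)

# Number of rules that pass both thresholds
print(length(grocery_rules))
```

Lowering either threshold yields more rules and raising it yields fewer, so trying several values is a practical way to reach a rule set of manageable size.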

Association Rules
• We use the summary() function on the rule set
to find the rule length distribution,
with rules containing one item.
• We found that the generated rule sets
have quality metric of lift as
• We use the inspect() and
sort() functions to generate
sorted by .
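
A minimal sketch of inspecting a rule set. It rebuilds the rules with placeholder thresholds (assumptions, not the slide's values) so that it runs on its own. Sorting by "lift" and showing five rules are also assumptions made for illustration.

```r
library(arules)
data("Groceries")

# Placeholder thresholds, used only so this sketch is self-contained
grocery_rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.25),
  control = list(verbose = FALSE)
)

# Rule length distribution and summary statistics of the quality measures
print(summary(grocery_rules))

# sort() orders rules in decreasing order of the chosen measure by default
top_rules <- sort(grocery_rules, by = "lift")[1:5]
inspect(top_rules)
```

Other quality measures such as "support" or "confidence" can be passed to the by argument in the same way.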
