clustering in research cluster analysis.ppt

© 2007 Prentice Hall 20-1
Chapter Outline
1) Overview
2) Basic Concept
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
i. Formulating the Problem
ii. Selecting a Distance or Similarity Measure
iii. Selecting a Clustering Procedure
iv. Deciding on the Number of Clusters
v. Interpreting and Profiling the Clusters
vi. Assessing Reliability and Validity

Statistics Associated with Cluster Analysis
• Agglomeration schedule. An agglomeration schedule
gives information on the objects or cases being combined
at each stage of a hierarchical clustering process.
• Cluster centroid. The cluster centroid is the set of mean
values of the variables for all the cases or objects in a
particular cluster.
• Cluster centers. The cluster centers are the initial
starting points in nonhierarchical clustering. Clusters are
built around these centers, or seeds.
• Cluster membership. Cluster membership indicates the
cluster to which each object or case belongs.
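
A minimal sketch of cluster membership and cluster centroids, using five cases from Table 20.1. The assignment of cases to clusters "A" and "B" is an assumption made for illustration only; it is not a result reported in the slides, and the names ratings, membership, and centroid are hypothetical.

```python
from statistics import mean

# Rows are the V1..V6 ratings of five cases copied from Table 20.1.
ratings = {
    1: [6, 4, 7, 3, 2, 3],
    2: [2, 3, 1, 4, 5, 4],
    3: [7, 2, 6, 4, 1, 3],
    5: [1, 3, 2, 2, 6, 4],
    8: [7, 3, 7, 4, 1, 4],
}

# Cluster membership: case number -> cluster label.
# ASSUMPTION: this assignment is invented for the example.
membership = {1: "A", 3: "A", 8: "A", 2: "B", 5: "B"}


def centroid(rows):
    """Return the mean of each variable (column) across the given cases."""
    return [mean(column) for column in zip(*rows)]


centroids = {}
for label in sorted(set(membership.values())):
    members = [ratings[case] for case, lab in membership.items() if lab == label]
    centroids[label] = centroid(members)

for label, center in centroids.items():
    print(label, [round(value, 2) for value in center])
```

Each centroid has one entry per variable, so it can be read as the "average respondent" of its cluster.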

• Dendrogram. A dendrogram, or tree graph, is a
graphical device for displaying clustering results.
Vertical lines represent clusters that are joined
together. The position of the line on the scale
indicates the distances at which clusters were joined.
The dendrogram is read from left to right. Figure
20.8 is a dendrogram.
• Distances between cluster centers. These
distances indicate how separated the individual pairs
of clusters are. Clusters that are widely separated
are distinct, and therefore desirable.
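
The sketch below shows the kind of information an agglomeration schedule holds, and that a dendrogram plots: which clusters are joined at each stage and at what distance. It assumes single linkage (nearest neighbor) with Euclidean distance; this section of the slides does not fix a linkage rule, so that choice is an assumption. The five cases come from Table 20.1, and the names points, single_linkage, and schedule are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# Five cases from Table 20.1 (V1..V6).
points = {
    1: (6, 4, 7, 3, 2, 3),
    2: (2, 3, 1, 4, 5, 4),
    3: (7, 2, 6, 4, 1, 3),
    5: (1, 3, 2, 2, 6, 4),
    8: (7, 3, 7, 4, 1, 4),
}


def single_linkage(a, b):
    """Smallest distance between any member of cluster a and any member of b."""
    return min(dist(points[i], points[j]) for i in a for j in b)


# Start with every case in its own cluster, then merge the closest pair
# repeatedly until one cluster remains.
clusters = [frozenset([case]) for case in points]
schedule = []  # one (stage, cluster_a, cluster_b, distance) row per merge
stage = 0
while len(clusters) > 1:
    a, b = min(combinations(clusters, 2), key=lambda pair: single_linkage(*pair))
    stage += 1
    schedule.append((stage, sorted(a), sorted(b), round(single_linkage(a, b), 3)))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for row in schedule:
    print(row)
```

A large jump in the joining distance between consecutive stages suggests that two well-separated groups are being forced together at that stage.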

• Icicle diagram. An icicle diagram is a graphical
display of clustering results, so called because it
resembles a row of icicles hanging from the eaves of
a house. The columns correspond to the objects
being clustered, and the rows correspond to the
number of clusters. An icicle diagram is read from
bottom to top. Figure 20.7 is an icicle diagram.
• Similarity/distance coefficient matrix. A
similarity/distance coefficient matrix is a lower-
triangle matrix containing pairwise distances between
objects or cases.
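
As an illustration of the lower-triangle layout, the sketch below computes pairwise Euclidean distances for the first four cases of Table 20.1. Only distances to earlier cases are stored, because the matrix is symmetric and its diagonal is zero. The names cases and lower are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

# First four cases from Table 20.1 (V1..V6).
cases = {
    1: (6, 4, 7, 3, 2, 3),
    2: (2, 3, 1, 4, 5, 4),
    3: (7, 2, 6, 4, 1, 3),
    4: (4, 6, 4, 5, 3, 6),
}
ids = list(cases)

# Row i holds the distances from case i to every earlier case j,
# which gives the lower triangle of the full distance matrix.
lower = {
    i: {j: dist(cases[i], cases[j]) for j in ids[:pos]}
    for pos, i in enumerate(ids)
}

for i in ids:
    print(i, " ".join(f"{d:6.3f}" for d in lower[i].values()))
```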

Conducting Cluster Analysis
Formulate the Problem
Select a Distance Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile Clusters
Assess the Validity of Clustering
Fig. 20.3

Attitudinal Data For Clustering
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7
Table 20.1

Formulate the Problem
• Perhaps the most important part of formulating the
clustering problem is selecting the variables on which
the clustering is based.
• Inclusion of even one or two irrelevant variables may
distort an otherwise useful clustering solution.
• Basically, the set of variables selected should describe
the similarity between objects in terms that are
relevant to the marketing research problem.
• The variables should be selected based on past
research, theory, or a consideration of the hypotheses
being tested. In exploratory research, the researcher
should exercise judgment and intuition.

Select a Distance or Similarity Measure
• The most commonly used measure of similarity (strictly, of
dissimilarity, since larger values mean the objects are less
alike) is the Euclidean distance or its square. The Euclidean
distance is the square root of the sum of the squared differences
in values for each variable. Other distance measures are also
available. The city-block or Manhattan distance between two
objects is the sum of the absolute differences in values for each
variable. The Chebychev distance between two objects is the
maximum absolute difference in values for any variable.
• If the variables are measured in vastly different units, the
clustering solution will be influenced by the units of
measurement. In these cases, before clustering respondents,
we must standardize the data by rescaling each variable to have
a mean of zero and a standard deviation of one. It is also
desirable to eliminate outliers (cases with atypical values).
• Use of different distance measures may lead to different
clustering results. Hence, it is advisable to use different
measures and compare the results.
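
The three distance measures and the standardization step can be sketched as follows. The function names are hypothetical, and the standardization assumes the population standard deviation (dividing by n); statistical packages may divide by n - 1 instead, which changes the rescaled values slightly. A variable that is constant across all cases has zero standard deviation and would need to be dropped before this step.

```python
from math import sqrt
from statistics import mean, pstdev


def euclidean(x, y):
    """Square root of the sum of squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """City-block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebychev(x, y):
    """Maximum absolute difference on any single variable."""
    return max(abs(a - b) for a, b in zip(x, y))


def standardize(rows):
    """Rescale each variable (column) to mean 0 and standard deviation 1.

    ASSUMPTION: uses the population standard deviation (pstdev).
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Cases 1, 2 and 3 from Table 20.1 (V1..V6).
case1 = [6, 4, 7, 3, 2, 3]
case2 = [2, 3, 1, 4, 5, 4]
case3 = [7, 2, 6, 4, 1, 3]

print(euclidean(case1, case2))   # 8.0
print(manhattan(case1, case2))   # 16
print(chebychev(case1, case2))   # 6

z = standardize([case1, case2, case3])
print([round(v, 3) for v in z[0]])
```

Here the three measures agree that cases 1 and 2 are far apart, but on larger data sets they can rank pairs differently, which is why the slide advises comparing results across measures.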
