© 2007 Prentice Hall 20-1
Chapter Outline
1) Overview
2) Basic Concept
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
i. Formulating the Problem
ii. Selecting a Distance or Similarity Measure
iii. Selecting a Clustering Procedure
iv. Deciding on the Number of Clusters
v. Interpreting and Profiling the Clusters
vi. Assessing Reliability and Validity
Statistics Associated with Cluster Analysis
• Agglomeration schedule. An agglomeration schedule
gives information on the objects or cases being combined
at each stage of a hierarchical clustering process.
• Cluster centroid. The cluster centroid is the vector of
mean values of the variables for all the cases or objects
in a particular cluster.
• Cluster centers. The cluster centers are the initial
starting points in nonhierarchical clustering. Clusters are
built around these centers, or seeds.
• Cluster membership. Cluster membership indicates the
cluster to which each object or case belongs.
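The relationship between seeds, membership, and centroids can be sketched in a few lines of Python. This is only an illustration: it uses cases 1, 2, 3, 5, and 6 from Table 20.1, and the choice of cases 1 and 2 as seeds, the cluster labels "A" and "B", and the helper names are assumptions made for the example, not part of the original slides.

```python
# Five cases from Table 20.1 (variables V1..V6), keyed by case number.
cases = {
    1: [6, 4, 7, 3, 2, 3],
    2: [2, 3, 1, 4, 5, 4],
    3: [7, 2, 6, 4, 1, 3],
    5: [1, 3, 2, 2, 6, 4],
    6: [6, 4, 6, 3, 3, 4],
}

# Hypothetical cluster centers (seeds): cases 1 and 2.
seeds = {"A": cases[1], "B": cases[2]}


def squared_distance(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_seed(values):
    """Label of the seed closest to the given case."""
    return min(seeds, key=lambda name: squared_distance(values, seeds[name]))


def centroid(rows):
    """Mean of each variable over the cases in one cluster."""
    return [sum(column) / len(rows) for column in zip(*rows)]


# Cluster membership: the cluster to which each case belongs.
membership = {case_id: nearest_seed(values) for case_id, values in cases.items()}

# Cluster centroids: variable means over the members of each cluster.
centroids = {
    name: centroid([cases[c] for c, label in membership.items() if label == name])
    for name in seeds
}

print(membership)
print(centroids)
```

A full nonhierarchical procedure would typically repeat the assignment step using the updated centroids; the sketch stops after one pass to keep the three terms distinct.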
Statistics Associated with Cluster Analysis
• Dendrogram. A dendrogram, or tree graph, is a
graphical device for displaying clustering results.
Vertical lines represent clusters that are joined
together. The position of the line on the scale
indicates the distances at which clusters were joined.
The dendrogram is read from left to right. Figure
20.8 is a dendrogram.
• Distances between cluster centers. These
distances indicate how separated the individual pairs
of clusters are. Clusters that are widely separated
are distinct, and therefore desirable.
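The joining distances that a dendrogram displays are the same numbers an agglomeration schedule records stage by stage. The sketch below builds such a schedule in plain Python for cases 1, 2, 3, 5, and 6 of Table 20.1. The slides do not name a linkage rule at this point, so single linkage (nearest neighbor) with Euclidean distance is an assumption here, as are the function names.

```python
import math
from itertools import combinations

# Five cases from Table 20.1 (variables V1..V6), keyed by case number.
cases = {
    1: [6, 4, 7, 3, 2, 3],
    2: [2, 3, 1, 4, 5, 4],
    3: [7, 2, 6, 4, 1, 3],
    5: [1, 3, 2, 2, 6, 4],
    6: [6, 4, 6, 3, 3, 4],
}


def euclidean(a, b):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_schedule(data):
    """Return a list of (cluster_a, cluster_b, joining_distance) stages."""
    clusters = [frozenset([case_id]) for case_id in data]
    schedule = []

    def linkage(pair):
        # Single linkage: distance between the closest pair of members.
        a, b = pair
        return min(euclidean(data[i], data[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=linkage)
        schedule.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return schedule


schedule = single_linkage_schedule(cases)
for stage, (a, b, distance) in enumerate(schedule, start=1):
    print(f"stage {stage}: join {a} and {b} at distance {distance:.3f}")
```

Each stage's distance is the position on the dendrogram scale at which the corresponding line joins two clusters. A different linkage rule would generally produce different joining distances and possibly a different merge order.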
Statistics Associated with Cluster Analysis
• Icicle diagram. An icicle diagram is a graphical
display of clustering results, so called because it
resembles a row of icicles hanging from the eaves of
a house. The columns correspond to the objects
being clustered, and the rows correspond to the
number of clusters. An icicle diagram is read from
bottom to top. Figure 20.7 is an icicle diagram.
• Similarity/distance coefficient matrix. A
similarity/distance coefficient matrix is a
lower-triangle matrix containing pairwise distances
between objects or cases.
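As a small illustration of the lower-triangle layout, the following Python sketch computes pairwise Euclidean distances for the first four cases of Table 20.1. The choice of Euclidean distance and the variable names are assumptions made for the example.

```python
import math

# First four cases from Table 20.1 (variables V1..V6), in case order.
cases = [
    [6, 4, 7, 3, 2, 3],  # case 1
    [2, 3, 1, 4, 5, 4],  # case 2
    [7, 2, 6, 4, 1, 3],  # case 3
    [4, 6, 4, 5, 3, 6],  # case 4
]


def euclidean(a, b):
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Row i holds the distances from case i+1 to every earlier case, so only
# the entries below the diagonal are stored.
lower_triangle = [
    [euclidean(cases[i], cases[j]) for j in range(i)]
    for i in range(len(cases))
]

for i, row in enumerate(lower_triangle, start=1):
    print(f"case {i}:", "  ".join(f"{d:6.3f}" for d in row))
```

Only the lower triangle is needed because a distance is symmetric (the distance from case 1 to case 2 equals the distance from case 2 to case 1) and the diagonal is zero.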
Conducting Cluster Analysis
i. Formulate the Problem
ii. Select a Distance Measure
iii. Select a Clustering Procedure
iv. Decide on the Number of Clusters
v. Interpret and Profile Clusters
vi. Assess the Validity of Clustering
Fig. 20.3
Attitudinal Data For Clustering
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7
Table 20.1
Conducting Cluster Analysis
Formulate the Problem
• Perhaps the most important part of formulating the
clustering problem is selecting the variables on which
the clustering is based.
• Inclusion of even one or two irrelevant variables may
distort an otherwise useful clustering solution.
• The set of variables selected should describe
the similarity between objects in terms that are
relevant to the marketing research problem.
• The variables should be selected based on past
research, theory, or a consideration of the hypotheses
being tested. In exploratory research, the researcher
should exercise judgment and intuition.
Conducting Cluster Analysis
Select a Distance or Similarity Measure
• Similarity is most commonly measured by the Euclidean
distance or its square; the smaller the distance, the more
similar the objects. The Euclidean distance is the square
root of the sum of the squared differences in values for each
variable. Other distance measures are also available. The
city-block or Manhattan distance between two objects is the
sum of the absolute differences in values for each variable.
The Chebychev distance between two objects is the maximum
absolute difference in values for any variable.
• If the variables are measured in vastly different units, the
clustering solution will be influenced by the units of
measurement. In these cases, before clustering respondents,
we must standardize the data by rescaling each variable to have
a mean of zero and a standard deviation of unity. It is also
desirable to eliminate outliers (cases with atypical values).
• Use of different distance measures may lead to different
clustering results. Hence, it is advisable to use different
measures and compare the results.
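The three distance measures and the standardization step can be written out directly. The sketch below applies them to cases 1 and 2 of Table 20.1 and to variable V1 for cases 1 through 6. The function names are illustrative, and dividing by n - 1 (the sample standard deviation) in `standardize` is an assumption, since the slides do not say which form of the standard deviation is intended.

```python
import math

# Cases 1 and 2 from Table 20.1 (variables V1..V6).
case_1 = [6, 4, 7, 3, 2, 3]
case_2 = [2, 3, 1, 4, 5, 4]


def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebychev(a, b):
    """Maximum absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))


def standardize(column):
    """Rescale one variable to mean 0 and (sample) standard deviation 1."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]


print(euclidean(case_1, case_2))  # 8.0
print(manhattan(case_1, case_2))  # 16
print(chebychev(case_1, case_2))  # 6

# V1 for cases 1..6 of Table 20.1, standardized.
v1 = [6, 2, 7, 4, 1, 6]
z_v1 = standardize(v1)
print([round(z, 3) for z in z_v1])
```

The same pair of cases yields 8.0, 16, and 6 under the three measures, which shows why two analyses of the same data can group objects differently when the distance measure changes.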
