Optimizing Market Segmentation

Optimizing Segmentation
Insight Research Group

2

Why is Segmentation Difficult?
 Infinite number of possible solutions

 Hundreds of possible variables to use

 Clearly defined clusters are rarely present in real-life datasets

3

Technical Challenges

 Challenge: Incorporating fundamentally
categorical variables
 Ethnicity, Religion, Political Party, Etc.

 Standard methods assume continuous data (ideal case) and
require at least interval-level data (e.g. ratings scales)
 Correlation, linear regression, k-means clustering
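A small sketch of why distance-based methods struggle with nominal variables. The three party labels are hypothetical. Arbitrary integer codes make some categories look "further apart" than others, while one-hot (indicator) coding keeps every pair equally distant.

```python
import math

# Hypothetical nominal variable; the labels and their order carry no meaning.
CATEGORIES = ["Party A", "Party B", "Party C"]


def integer_code(label):
    """Code a category as an arbitrary integer (what interval methods would see)."""
    return [float(CATEGORIES.index(label))]


def one_hot(label):
    """Code a category as an indicator vector, one column per category."""
    return [1.0 if label == c else 0.0 for c in CATEGORIES]


def dist(u, v):
    """Euclidean distance, the quantity k-means works with."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


int_ab = dist(integer_code("Party A"), integer_code("Party B"))
int_ac = dist(integer_code("Party A"), integer_code("Party C"))
hot_ab = dist(one_hot("Party A"), one_hot("Party B"))
hot_ac = dist(one_hot("Party A"), one_hot("Party C"))

print("integer codes:", int_ab, int_ac)  # A-C looks twice as far as A-B
print("one-hot codes:", hot_ab, hot_ac)  # all pairs equally far apart
```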

4

Technical Solutions
 Challenge: Incorporating fundamentally categorical
variables

 Multiple Correspondence Analysis (factor analysis for categorical
data)

 Pro: handles both demographic (categorical) and ratings variables
 Would allow treating sets of variables separately (i.e. demographic,
behavioral, psychological) – these sets could be used as inputs to
clustering method

 Con: segmentation would be based on extracted components
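A minimal numpy sketch of MCA, computed as a correspondence analysis of the indicator (one-hot) matrix. The six respondents and their answers are made up for illustration, and the function names are hypothetical. The respondent coordinates it returns are the "extracted components" a clustering method would then work on.

```python
import numpy as np


def indicator_matrix(rows):
    """One-hot code a list of equal-length tuples of category labels."""
    n_vars = len(rows[0])
    columns = []  # one (variable index, category label) pair per indicator column
    for q in range(n_vars):
        for cat in sorted({row[q] for row in rows}):
            columns.append((q, cat))
    Z = np.array(
        [[1.0 if row[q] == cat else 0.0 for q, cat in columns] for row in rows]
    )
    return Z, columns


def mca(Z):
    """Correspondence analysis of an indicator matrix via SVD."""
    P = Z / Z.sum()                # correspondence matrix
    r = P.sum(axis=1)              # row (respondent) masses
    c = P.sum(axis=0)              # column (category) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sing) / np.sqrt(r)[:, None]     # respondents
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]  # categories
    return row_coords, col_coords, sing ** 2          # sing**2 = principal inertias


# Hypothetical survey: (gender, party, agrees with a statement)
respondents = [
    ("F", "A", "yes"),
    ("F", "B", "yes"),
    ("M", "A", "no"),
    ("M", "C", "no"),
    ("F", "C", "yes"),
    ("M", "B", "no"),
]
Z, columns = indicator_matrix(respondents)
row_coords, col_coords, inertia = mca(Z)
print("first two dimensions per respondent:")
print(np.round(row_coords[:, :2], 3))
```

A quick sanity check on any implementation: for an indicator matrix with Q variables and J categories in total, the principal inertias sum to J/Q - 1.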

5


 Determining the number of clusters/segments in the
data

 Standard methods require the user to specify the number of clusters
to extract

 Our standard practice results in fewer clusters than input variables
 e.g. AMC segmentation solutions required ~12 variables to find ~5
segments
 This ratio of features-to-segments will 'water down' the effect of the
individual variables (segments do not differ significantly on most items)

6

Technical Solution 1
 Challenge: Determining the number of clusters/segments
in the data

 Solution: fit a probabilistic mixture model and compute a
complexity-penalized likelihood (AIC / BIC scores)
 The model with the best AIC / BIC score is our best guess for the
number of natural clusters in the data

 Gaussian mixture models for continuous data
 Latent Class Models for categorical data
 Latent class models can handle both categorical and continuous data if the
continuous data is binned.

 Both of the above return BIC scores to determine the number of
clusters
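The BIC selection step can be sketched in plain Python. This is an illustrative one-dimensional Gaussian mixture fitted by EM, not the software used for the slides. The synthetic data (four sources), the quantile-based starting values, and the variance floor are all assumptions made for the example.

```python
import math
import random


def _mix_density(x, weights, means, variances):
    """Weighted Gaussian density of each component at x."""
    return [
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def gmm_loglik_1d(xs, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(xs)
    srt = sorted(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    # Heuristic start (an assumption): means at evenly spaced quantiles,
    # component spread shrunk with k, equal weights.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var / k ** 2] * k
    weights = [1.0 / k] * k
    floor = 1e-3 * grand_var  # stops a component collapsing onto a single point
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = _mix_density(x, weights, means, variances)
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, floor)
    return sum(math.log(sum(_mix_density(x, weights, means, variances))) for x in xs)


def bic(loglik, k, n):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters in 1-D."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


# Synthetic data from four well-separated sources (made up for the example).
rng = random.Random(0)
data = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0, 30.0) for _ in range(60)]

scores = {k: bic(gmm_loglik_1d(data, k), k, len(data)) for k in range(1, 7)}
best_k = min(scores, key=scores.get)  # lower BIC is better
for k, score in scores.items():
    print(k, round(score, 1))
print("lowest BIC at k =", best_k)
```

With sources this far apart, models with too few components pay a large likelihood cost, while extra components gain little and are charged by the penalty term. That trade-off is what lets the BIC point to a cluster count.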

7

 How many clusters do you see? (4 sources generated the data, duh)

8

 The BIC infers 4 clusters (4 clusters solution had the best BIC score)

9

 How many clusters do you see? (4 sources generated the data, not so obvious)

10

 The BIC says 4! (4 clusters had the best BIC score, thanks BIC!)

11

 Challenge: Determining the number of clusters/segments
in the data
 Solution: ensure there are fewer input variables than extracted
clusters
2(+) segments can be obtained from
a single variable.

That is a 2-to-1 ratio of segments to
variables.

For AMC & MTV we got 5 segments
from ~12 variables, a ratio of about 0.4 to 1.
- That is less than 1 segment for
every two variables…

 Also See: Van Buuren & Heiser (1989); Vichi & Kiers (2001); Hwang, Dillon, &
Takane (2006).

12

 Respondents vary in their use of
ratings scales

 Some respondents only use part
of the scale,
 Either top or bottom of range

 Segmentation method will find the
high/low scale-use respondents
and define segments for them
 See AMC segments

13

Psychographic
banner for AMC
segments.

These items were not
used to define the cluster
solution.

14

 Challenge: Respondents vary in their use of ratings
scales

 Calibrate respondents to equate ratings scale across sample
 Overcoming Scale Use Heterogeneity (2003) Peter E. Rossi

 Pro: Improves the accuracy and validity of standard methods
 E.g. correlation, regression, clustering

 Con: requires complex and computationally expensive models
 i.e. hierarchical Bayesian models – available as R package
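The hierarchical Bayesian calibration is not reproduced here. As a much simpler stand-in, the sketch below standardizes each respondent's ratings against that respondent's own mean and spread (within-respondent standardization). The two respondents are made up. Note the trade-off: this also removes any genuine difference in overall level between respondents.

```python
from statistics import mean, pstdev


def standardize_within_respondent(ratings):
    """Center and scale one respondent's ratings by their own mean and spread."""
    m = mean(ratings)
    s = pstdev(ratings)
    if s == 0:
        # A respondent who gives the same answer everywhere carries no pattern.
        return [0.0 for _ in ratings]
    return [(x - m) / s for x in ratings]


# Hypothetical respondents with the same preference pattern on a 10-point scale,
# one using only the top of the scale, the other only the bottom.
high_user = [9, 10, 8, 10, 9]
low_user = [2, 3, 1, 3, 2]

z_high = standardize_within_respondent(high_user)
z_low = standardize_within_respondent(low_user)

print([round(z, 2) for z in z_high])
print([round(z, 2) for z in z_low])
```

On the raw ratings these two respondents would land in different clusters purely because of scale use. After standardization their profiles coincide.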

15

 Challenge: Respondents vary in their use of ratings
scales

 Abandon rating scales – use simple Agree/Disagree variables
 Focus on methods for categorical variables

 Multiple Correspondence Analysis (factor analysis for categorical data)

 Pro: handles both demographic (categorical) and ratings variables
 Would allow treating sets of variables separately (i.e. demographic, behavioral,
psychological) – these sets could be used as inputs to clustering methods

16

What Slows Us Down?
 Each segmentation iteration consumes resources

 Producing new segmentation variable for each respondent
 0.5 man-hours

 Producing new banners
 Generating tables – 0.25 man-hours
 Formatting and printing – 1+ man-hours

 Analyzing full banner for new segmentation
 Requires entire research team, 6+ man-hours

17

How to Speed it up
 Producing new segmentation variable for each respondent
 0.5 man-hours – Not the bottleneck

 Producing new banners
 Generating tables – 0.25 man-hours – Not the bottleneck
 Formatting and printing – 1+ man-hours – Potential for automation

 Analyzing full banner for the new segmentation
 Requires entire research team, 6+ man-hours – Workflow bottleneck
 Ideas / brainstorm
 Success criteria are often vague
 When the goal is well defined, quant methods can increase efficiency
 If you can formalize it, you can solve it
 Time invested in the planning phase will reap productivity gains during analysis
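The bottleneck claim can be checked with quick arithmetic on the hours listed above. The "1+" and "6+" figures are treated as 1 and 6, so the totals are lower bounds; the step names are paraphrased.

```python
# Lower-bound hours per segmentation iteration, taken from the slide.
hours = {
    "new segmentation variable": 0.5,
    "generate tables": 0.25,
    "format and print banners": 1.0,
    "team analysis of full banner": 6.0,
}

total = sum(hours.values())
bottleneck = max(hours, key=hours.get)
share = hours[bottleneck] / total

print(f"at least {total} man-hours per iteration")
print(f"{bottleneck}: {share:.0%} of the total")
```

On these figures the team analysis step accounts for roughly three quarters of each iteration, which is why a clearer success criterion pays off more than automating the formatting.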

18

Hypothetical Case Study
 Goals Brainstorm:
 Client and previous research says:
 “segmentation should differentiate enthusiasts (early adopters) and utility
consumers (late adopters)”
 “also, segmentation should include demographics that are known to influence
technology adoption.”
 Age, Gender, Income, Education

 Quant answers:
 “OK, let's write a battery of questions addressing consumers' perceptions of and
relation to technology products – this will be distilled into a single ‘tech
enthusiasm’ measure.”
 “Also, all relevant demographic information can be reduced to one (or more)
demo factors.”
 “Segments will be defined from a ‘reduced dimensionality’ representation of the
data (MCA).”
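The last step, defining segments from the reduced representation, can be sketched as follows. The slides do not name a clustering algorithm, so plain k-means is used here as a stand-in. The respondent scores (a 'tech enthusiasm' dimension and one demo factor) are made up, and the farthest-point initialization is an assumption chosen to keep the example repeatable.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assign each respondent to the nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical reduced scores per respondent: (tech enthusiasm, demo factor).
scores_2d = [
    (2.0, 1.9), (2.2, 2.1), (1.8, 2.0),
    (-2.0, 0.1), (-2.1, -0.2), (-1.9, 0.0),
    (0.1, -2.0), (0.0, -2.2), (-0.1, -1.9),
]

labels, centers = kmeans(scores_2d, k=3)
print(labels)
print([tuple(round(v, 2) for v in c) for c in centers])
```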

20
Categories graph

21
Combined graph

23

MCA for Segmentation
 (2006). An extension of multiple correspondence analysis for identifying
heterogeneous subgroups of respondents

 (2010). Traveler segmentation strategy with nominal variables through
correspondence analysis

 (2010). Fuzzy cluster multiple correspondence analysis

 (2010). Simultaneous two-way clustering of multiple correspondence
analysis

 (2005). A simultaneous approach to constrained multiple correspondence
analysis and cluster analysis for market segmentation

 (2002). Analysis of categorical marketing data by generalized constrained
multiple correspondence analysis

24

Further Directions
 Extension to Multiple Correspondence Analysis
 Methods that let us combine nominal, numeric, and ordinal
variables
 Methods that let us group variables into sets.
 E.g. could ensure that psychographic, behavioral, and demographic variables
have an equal influence on the final solution.

 Methods that simultaneously perform dimensionality
reduction and cluster discovery
 Optimizes the entire analysis to discover the most distinctive
clusters
 Very promising approach
 Con: I have not found an implementation of these methods.
