Optimizing Segmentation
Insight Research Group




Why is Segmentation Difficult?
 Infinite number of possible solutions

 Hundreds of possible variables to use

 Clearly defined clusters are rarely present in real-life
  data sets




Technical Challenges

 Challenge: Incorporating fundamentally
  categorical variables
     Ethnicity, religion, political party, etc.


   Standard methods assume continuous data (ideal case) and
   require at least interval-level data (e.g. rating scales)
      Correlation, linear regression, k-means clustering
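
To make the problem concrete, here is a minimal Python sketch (the respondents and party labels are made up for illustration). Integer-coding a nominal variable imposes an arbitrary ordering on the categories, whereas indicator (one-hot) coding keeps every pair of distinct categories equally far apart:

```python
# Hypothetical respondents; the category labels are illustrative only.
party = ["Democrat", "Republican", "Independent", "Democrat"]

levels = sorted(set(party))                 # alphabetical, hence arbitrary, order
codes = [levels.index(p) for p in party]    # integer codes: 0, 2, 1, 0


def one_hot(value, levels):
    """Return a 0/1 indicator vector with a single 1 at the value's level."""
    return [1 if value == level else 0 for level in levels]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


indicators = [one_hot(p, levels) for p in party]

# Integer codes: the distance depends on the arbitrary ordering of the labels.
code_dist_dem_rep = (codes[0] - codes[1]) ** 2              # 4
code_dist_dem_ind = (codes[0] - codes[2]) ** 2              # 1

# Indicator coding: every pair of distinct categories is equally far apart.
ind_dist_dem_rep = sq_dist(indicators[0], indicators[1])    # 2
ind_dist_dem_ind = sq_dist(indicators[0], indicators[2])    # 2
```

Distance-based methods such as k-means would treat "Democrat" as closer to "Independent" than to "Republican" under the integer coding, purely because of alphabetical order.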




Technical Solutions
 Challenge: Incorporating fundamentally categorical
  variables

   Multiple Correspondence Analysis (factor analysis for categorical
    data)

   Pro: handles both demographic (categorical) and ratings variables
    Would allow treating sets of variables separately (i.e. demographic,
     behavioral, psychological) – these sets could be used as inputs to
     clustering method


   Con: segmentation would be based on extracted components
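
A minimal sketch of one common formulation of MCA, namely correspondence analysis of the 0/1 indicator matrix via SVD, assuming NumPy is available. The function name `mca_row_coordinates` and the toy respondents are illustrative and not part of the original analysis:

```python
import numpy as np


def mca_row_coordinates(rows, n_components=2):
    """Respondent coordinates from correspondence analysis of the indicator matrix.

    rows: list of respondents, each a list of category labels (one per question).
    Returns (coordinates, principal inertias) for the leading components.
    """
    n_vars = len(rows[0])

    # One 0/1 indicator column per observed (variable, level) pair.
    columns = []
    for j in range(n_vars):
        for level in sorted({row[j] for row in rows}):
            columns.append([1.0 if row[j] == level else 0.0 for row in rows])
    Z = np.array(columns).T

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses

    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    coords = (U / np.sqrt(r)[:, None]) * sing   # principal row coordinates
    return coords[:, :n_components], sing[:n_components] ** 2


# Hypothetical respondents: (age band, gender, agree/disagree item).
respondents = [
    ["18-34", "F", "agree"],
    ["18-34", "M", "agree"],
    ["35-54", "F", "disagree"],
    ["35-54", "M", "agree"],
    ["55+", "F", "disagree"],
    ["55+", "M", "disagree"],
]
coords, inertia = mca_row_coordinates(respondents)
```

The rows of `coords` are the extracted components for each respondent. Passing them to a clustering method is what the "Con" above refers to: segments are then defined on components rather than on the original variables.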




Technical Challenges

 Determining the number of clusters/segments in the
  data

     Standard methods require the user to specify the number of clusters
     to extract

     Our standard practice results in fewer clusters than input variables
       e.g. AMC segmentation solutions required ~12 variables to find ~5
       segments
       This ratio of features to segments will "water down" the effect of the
       individual variables (segments do not differ significantly on most items)

Technical Solution 1
 Challenge: Determining the number of clusters/segments
  in the data

 Solution: fit a probabilistic mixture model and compute a
  complexity penalized likelihood (AIC / BIC scores)
   The model with the best AIC / BIC score is our best guess for the
    number of natural clusters in the data

   Gaussian mixture models for continuous data
   Latent Class Models for categorical data
     Latent class models can handle both categorical and continuous data if the
      continuous data is binned.


   Both of the above return BIC scores to determine the number of
    clusters
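
The selection procedure can be sketched as follows, assuming NumPy and a one-dimensional Gaussian mixture fitted by a small hand-written EM loop (the helper names, the simulated data, and the convention BIC = p ln(n) - 2 ln(L), where lower is better, are assumptions for this illustration, not the deck's actual tooling):

```python
import numpy as np


def gmm_log_likelihood(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM and return its log-likelihood."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() / k ** 2)
    weights = np.full(k, 1.0 / k)

    def weighted_densities():
        diff = x[:, None] - means
        return weights * np.exp(-0.5 * diff ** 2 / variances) / np.sqrt(2 * np.pi * variances)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        dens = weighted_densities()
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances (small floor on variance).
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return np.log(weighted_densities().sum(axis=1)).sum()


def bic(log_lik, k, n):
    """BIC = p * ln(n) - 2 * ln(L); lower is better under this convention."""
    n_params = 3 * k - 1   # k means + k variances + (k - 1) free weights
    return n_params * np.log(n) - 2.0 * log_lik


# Simulated data from 4 well-separated sources (an assumption for the demo).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(loc, 1.0, size=100) for loc in (0.0, 10.0, 20.0, 30.0)])

scores = {k: bic(gmm_log_likelihood(x, k), k, x.size) for k in range(1, 7)}
best_k = min(scores, key=scores.get)   # candidate with the lowest BIC
```

The same loop applies to latent class models: fit each candidate number of classes, compute the penalized likelihood, and keep the best-scoring candidate.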


Technical Solution 1
 How many clusters do you see? (4 sources generated the data –duh)


Technical Solution 1
 The BIC infers 4 clusters (4 clusters solution had the best BIC score)


Technical Solution 1
 How many clusters do you see? (4 sources generated the data –not so obvious)


Technical Solution 1
 The BIC says 4! (4 clusters had the best BIC score, thanks BIC!)


Technical Solution 2
 Challenge: Determining the number of clusters/segments
  in the data
    Solution: ensure there are fewer input variables than extracted
    clusters
       2(+) segments can be obtained from a single variable.
       That is a 2:1 ratio of segments to variables.

       For AMC & MTV we got 5 segments from ~12 variables, a ratio of about 0.4:1.
       That is less than 1 segment for every two variables…
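
The ratios quoted above are simple to reproduce. A small sketch (the function name is illustrative, and 12 is the approximate variable count from the slide):

```python
def segments_per_variable(n_segments, n_variables):
    """Ratio of extracted segments to input variables."""
    return n_segments / n_variables


single_variable = segments_per_variable(2, 1)    # 2.0, i.e. 2:1
amc_mtv = segments_per_variable(5, 12)           # about 0.42, i.e. roughly 0.4:1
segments_per_two_variables = 2 * amc_mtv         # about 0.83, less than 1
```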



       Also See: Van Buuren & Heiser (1989); Vichi & Kiers (2001); Hwang, Dillon, &
        Takane (2006).



Technical Challenges
 Respondents vary in their use of
  rating scales

 Some respondents only use part
  of the scale:
     Either the top or the bottom of the range


 Segmentation methods will find the
  high/low scale-use respondents
  and define segments for them
     See the AMC segments
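
A small illustration of why this happens, using made-up 1-5 ratings: two respondents with the same preference pattern but different scale use end up farther apart than two respondents with opposite patterns who both use the top of the scale.

```python
import math

# Hypothetical 1-5 ratings on four items.
high_user_a = [5, 4, 5, 4]   # top of the scale, prefers items 1 and 3
low_user_a = [2, 1, 2, 1]    # same preference pattern, bottom of the scale
high_user_b = [4, 5, 4, 5]   # opposite pattern, top of the scale


def euclidean(u, v):
    """Euclidean distance between two rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


same_pattern_distance = euclidean(high_user_a, low_user_a)      # 6.0
same_scale_use_distance = euclidean(high_user_a, high_user_b)   # 2.0
```

A distance-based clustering method would therefore group the two high scale users together, even though their preferences are opposite.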




[Figure: psychographic banner for AMC segments. These items were not used to define the cluster solution.]




Technical Solution 1
 Challenge: Respondents vary in their use of rating
  scales

   Calibrate respondents to equate rating-scale use across the sample
      Overcoming Scale Use Heterogeneity (2003) Peter E. Rossi


   Pro: Improves the accuracy and validity of standard methods
       E.g. correlation, regression, clustering


   Con: requires complex and computationally expensive models
       i.e. hierarchical Bayesian models – available as an R package
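
For intuition only, here is a much simpler stand-in for calibration: standardizing each respondent's ratings by their own mean and spread. This is not the hierarchical Bayesian model cited above, and it has a known cost (it also removes genuine differences in overall level), but it shows what "equating scale use" means. The function name and ratings are hypothetical:

```python
from statistics import mean, pstdev


def standardize_within_respondent(ratings):
    """Center and scale one respondent's ratings by their own mean and spread."""
    m = mean(ratings)
    s = pstdev(ratings)
    if s == 0:
        # A respondent who gives the same answer everywhere carries no pattern.
        return [0.0 for _ in ratings]
    return [(r - m) / s for r in ratings]


high_user = [5, 4, 5, 4]   # uses the top of the scale
low_user = [2, 1, 2, 1]    # same pattern, bottom of the scale

z_high = standardize_within_respondent(high_user)   # [1.0, -1.0, 1.0, -1.0]
z_low = standardize_within_respondent(low_user)     # [1.0, -1.0, 1.0, -1.0]
```

After standardization the two respondents are identical, so a clustering method would no longer separate them on scale use alone.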




Technical Solution 2
 Challenge: Respondents vary in their use of rating
  scales

   Abandon rating scales – use simple Agree/Disagree variables
    Focus on methods for categorical variables

    Multiple Correspondence Analysis (factor analysis for categorical data)

    Pro: handles both demographic (categorical) and ratings variables
      Would allow treating sets of variables separately (i.e. demographic, behavioral,
       psychological) – these sets could be used as inputs to clustering methods




What Slows Us Down?
 Each segmentation iteration consumes resources

 Producing a new segmentation variable for each respondent
    0.5 man-hours

 Producing new banners
    Generating tables: 0.25 hours
    Formatting and printing: 1+ man-hours

 Analyzing the full banner for a new segmentation
    Requires the entire research team, 6+ man-hours



How to Speed It Up
 Producing a new segmentation variable for each respondent
     0.5 man-hours – not the bottleneck


 Producing new banners
     Generating tables: 0.25 hours – not the bottleneck
     Formatting and printing: 1+ man-hours – potential for automation


 Analyzing the full banner for the new segmentation
     Requires the entire research team, 6+ man-hours – the workflow bottleneck
     Ideas / brainstorm
       Success criteria are often vague
       When the goal is well defined, quant methods can increase efficiency
          If you can formalize it, you can solve it
       Time invested in the planning phase will yield productivity gains during analysis
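
A quick tally of the figures above, treating the "1+" and "6+" estimates as lower bounds (so the analysis share is, if anything, understated). The dictionary keys are just labels for this sketch:

```python
# Hours per segmentation iteration, taken from the estimates above.
hours = {
    "producing the segmentation variable": 0.5,
    "generating tables": 0.25,
    "formatting and printing": 1.0,     # slide says 1+
    "analyzing the full banner": 6.0,   # slide says 6+
}

total_hours = sum(hours.values())                    # 7.75
bottleneck = max(hours, key=hours.get)               # largest single step
bottleneck_share = hours[bottleneck] / total_hours   # about 0.77
```

On these numbers, analysis accounts for roughly three quarters of each iteration, which is why speeding up the analysis step matters more than automating table production.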


Hypothetical Case Study
 Goals Brainstorm:
   Client and previous research say:
    “Segmentation should differentiate enthusiasts (early adopters) and utility
     consumers (late adopters).”
    “Also, segmentation should include demographics that are known to influence
     technology adoption.”
       Age, Gender, Income, Education

   Quant answers:
    “OK, let's write a battery of questions addressing consumers' perceptions of and
     relation to technology products – this will be distilled into a single ‘tech
     enthusiasm’ measure.”
    “Also, all relevant demographic information can be reduced into one (or more)
     demo factors.”
    “Segments will be defined from a ‘reduced dimensionality’ representation of the
     data (MCA).”
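
The last step of the plan, defining segments from the reduced representation, could be sketched like this. The slides do not say which clustering method would be used, so plain k-means is an assumption here, and the two-column scores (a "tech enthusiasm" score and one demo factor) are invented for illustration:

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point start."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign each respondent to the nearest center.
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical component scores: [tech enthusiasm, demo factor].
scores = [
    [1.2, 0.8], [1.0, 1.1], [0.9, 0.7],        # enthusiast-like respondents
    [-1.1, -0.9], [-0.8, -1.2], [-1.0, -0.6],  # utility-like respondents
]
labels, centers = kmeans(scores, k=2)
```

With scores this well separated, the two recovered groups line up with the enthusiast and utility respondents. Real component scores would rarely be this clean.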
Hypothetical Case Study
 [Figure: categories graph]
 [Figure: combined graph]



MCA for Segmentation
 (2006). An extension of multiple correspondence analysis for identifying
  heterogeneous subgroups of respondents

 (2010). Traveler segmentation strategy with nominal variables through
  correspondence analysis

 (2010). Fuzzy cluster multiple correspondence analysis

 (2010). Simultaneous two-way clustering of multiple correspondence
  analysis

 (2005). A simultaneous approach to constrained multiple correspondence
  analysis and cluster analysis for market segmentation

 (2002). Analysis of categorical marketing data by generalized constrained
  multiple correspondence analysis




Further Directions
 Extension to Multiple Correspondence Analysis
   Methods that let us combine nominal, numeric, and ordinal
    variables
   Methods that let us group variables into sets.
    E.g. could ensure that psychographic, behavioral and demographic
     variables have an equal influence on the final solution.


 Methods that simultaneously perform dimensionality
  reduction and cluster discovery
   Optimizes the entire analysis to discover the most distinctive
    clusters
   Very promising approach
       Con: I have not found an implementation of these methods.

