Multivariate Analysis
Segmenting Stores in Soup Case Study
D3M
This is the store demographics (store demo) file
Variables in Store Demo File
Objective
• Segment the 2000 IRI stores into smaller groups
• Interpret the segments you created
• Compute the price elasticity for each segment and discuss the pricing strategy that Progresso should pursue to maximize profits
• State the assumptions used in deriving optimal prices for profit maximization
• Discuss the practicality of your recommended pricing strategy
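The slides do not spell out the pricing math. One standard textbook derivation is shown below. It assumes constant-elasticity demand q = A p^ε with ε < -1, a constant marginal cost c, and competitors' prices held fixed. These assumptions are ours, not the case's, and they are exactly the kind of assumptions the objective asks you to state:

```latex
\begin{aligned}
\pi(p) &= (p - c)\,q(p), \qquad q(p) = A\,p^{\varepsilon}, \quad \varepsilon < -1 \\
\frac{d\pi}{dp} &= q + (p - c)\,\frac{\varepsilon\,q}{p} = 0
  \;\Longrightarrow\; \frac{p - c}{p} = -\frac{1}{\varepsilon} \\
p^{*} &= \frac{\varepsilon}{1 + \varepsilon}\,c
\end{aligned}
```

For example, ε = -3 gives p* = 1.5c. If demand is inelastic (-1 ≤ ε < 0), there is no interior optimum under these assumptions.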
Approach
• Questions you should ask:
o Segmentation based on what?
o How many segments?
• Always start by summarizing the variables in your data and understanding the basic relationships
• Understand the correlations between variables: store demographics & market shares
o These are what we will use for segmentation
As usual, start by summarizing the data
Several of the demographic variables are highly correlated
Correlations of Market Shares Across 2000 Stores
What can we learn about Progresso’s Competitors from just correlations?
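Here is a minimal sketch of the correlation step in plain Python. The brand columns, the store values and the `median_income` column are all made up for illustration. A store's brand shares sum to at most 1, so some negative correlation between brands is mechanical. A strongly negative pair is therefore a hint that two brands compete head-to-head, not proof.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns maps a variable name to its values, one value per store."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up values for five hypothetical stores (not the IRI data).
store_vars = {
    "progresso": [0.30, 0.25, 0.10, 0.35, 0.20],       # market share
    "campbell": [0.50, 0.55, 0.70, 0.45, 0.60],        # market share
    "median_income": [62.0, 58.0, 41.0, 70.0, 50.0],   # $000s, hypothetical
}
corr = correlation_matrix(store_vars)
print(round(corr[("progresso", "campbell")], 3))  # -1.0
```

In this toy data the two brand shares always sum to 0.80, so their correlation is exactly -1.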
Campbell does well in the Midwest & South
Progresso is strong in the East, followed by the West
Segmentation of IRI Stores
Factor & Cluster Analysis
Learning Objectives
• Unsupervised Learning Methods
o Principal Component Analysis, Factor Analysis, & Clustering
• Objective is Dimension Reduction
o Reduce the number of collinear variables (PCA/Factor Analysis)
o Group your rows (e.g., customers, markets, counties): Cluster Analysis
Additional Learning Resources
• MIT Open Courses: Lectures 11 & 14
• Data Mining class at U of Chicago (Lecture notes 7 & 8)
• Stanford course on Machine Learning: watch Lecture 10 on “Unsupervised Learning”
Note the Difference between Cluster and PCA/Factor analysis
[Figure: data table with variables V1, V2, V3, V4, V5, …, V20 as columns. Cluster analysis groups subjects (rows); factor analysis groups variables (columns).]
Variable Reduction Techniques
You are working with columns here
We will look at 2 techniques:
• Principal Component Analysis
• Factor Analysis
PCA/Factor Analysis
• Our demographic variables are highly correlated
• If we were to use them in a regression model, for example, we would have high multicollinearity
• Principal Component Analysis (PCA) and Factor Analysis are useful techniques for reducing the number of variables
• PCA/Factor Analysis summarizes the information contained in a larger number of variables in a smaller number of ‘factors’ without significant loss of information
• These are widely used techniques in fields ranging from psychometrics to the analysis of unstructured data like text or images
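The following is a minimal NumPy sketch of PCA on the correlation matrix, which is the same as PCA on standardized variables. We assume that is what the software does here. The data are synthetic: 6 variables are built from 2 hidden factors, so most of the variance should land in the first 2 components.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X (rows = stores, columns =
    variables), largest first, plus the cumulative share of total variance."""
    R = np.corrcoef(X, rowvar=False)          # correlation matrix = PCA on standardized variables
    eigvals = np.linalg.eigvalsh(R)[::-1]     # eigvalsh returns ascending order; reverse it
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, cumulative

# Synthetic stand-in for the demographics (not the case data):
# 6 variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2000, 2))
weights = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = hidden @ weights + 0.3 * rng.normal(size=(2000, 6))

eigvals, cumulative = pca_eigenvalues(X)
print(np.round(eigvals, 2))
print(np.round(cumulative, 3))
```

The `cumulative` array is where a statement like "3 components capture about 84%" comes from. You read off the entry for the number of components you keep.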
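If we use 3 components, we capture approximately 84% of the information contained in the 10 demographics.
Eigenvalues of a matrix are also called characteristic roots. Each eigenvalue represents the variance accounted for by one linear combination of the variables (a component). A common rule of thumb is to keep the components whose eigenvalue is greater than 1. With standardized variables, each original variable contributes a variance of 1, so such a component explains more variance than a single original variable. In our case, that gives 3 components.
Principal Component Analysis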
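The eigenvalue-greater-than-1 rule (often called the Kaiser criterion) and the cumulative-share figure take only a few lines to compute. The eigenvalues below are hypothetical values for 10 standardized variables. They were chosen so that they sum to 10, and they are not the case output.

```python
def kaiser_components(eigenvalues):
    """Number of components whose eigenvalue exceeds 1 (Kaiser rule of thumb)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

def share_explained(eigenvalues, k):
    """Share of total variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical eigenvalues for 10 standardized variables (they sum to 10).
eigs = [4.0, 2.5, 1.5, 0.8, 0.5, 0.3, 0.2, 0.1, 0.06, 0.04]
k = kaiser_components(eigs)
print(k, round(share_explained(eigs, k), 2))  # 3 0.8
```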
Look for large positive or negative numbers in each factor's column, and use the corresponding variable names to interpret the underlying ‘factor’.
These numbers are called factor “loadings”. Each loading measures the correlation between a demographic variable and the underlying “factor”. It is our job to interpret the factors and put a label on each.
Factor Analysis
Using 3 “factors” instead of 10 demographics, we capture approx. 84% of the information.
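This sketch shows where loadings come from, assuming PCA extraction without rotation. Factor-analysis software may extract and rotate factors differently, so its numbers can differ. For PCA on the correlation matrix, the loading of variable j on component c is the eigenvector entry multiplied by the square root of the eigenvalue. That product equals the correlation between variable j and the component's score. The data are synthetic.

```python
import numpy as np

def pca_loadings(X, k):
    """Loadings (variables x k) of the first k principal components of the
    correlation matrix: eigenvector entries scaled by sqrt(eigenvalue)."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest eigenvalues
    return eigvecs[:, top] * np.sqrt(eigvals[top])

# Synthetic data: 4 variables driven by 2 hidden factors (illustrative only).
rng = np.random.default_rng(1)
hidden = rng.normal(size=(1000, 2))
mix = np.array([[1.0, 1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]])
X = hidden @ mix + 0.5 * rng.normal(size=(1000, 4))

loadings = pca_loadings(X, 2)
print(np.round(loadings, 2))   # rows = variables, columns = components
```

Eigenvectors are defined only up to sign. Read the large magnitudes together with their relative signs within a column.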
What do these techniques do?
• Take a large number of highly correlated variables and create new variables
• The new variables (components or factors) are linear combinations of our current variables
• The goal is to retain most of the variability (information) in the data
• Reduce the dimension of the problem with little loss of information
• The newly created variables are orthogonal (uncorrelated)
Note: Our current application, with 10 demographic variables, is quite trivial. We will see larger problems where these methods are more useful.
These are the new variables in our data. Our job is to interpret them. The new variables (factors) are standardized and uncorrelated. We can use them in further analysis, for example, to segment the stores in our data.
Examine the Factor Scores
The new variables (factors) have a mean of 0 and a standard deviation of 1.
They are orthogonal to each other (zero correlation)
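This sketch shows how standardized scores with these properties can be produced, again assuming PCA extraction on synthetic data. Each raw component score is divided by the square root of its eigenvalue. The result has mean 0, standard deviation 1, and zero correlation between components.

```python
import numpy as np

def standardized_scores(X, k):
    """First k principal-component scores rescaled to mean 0 and std 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each column
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    top = np.argsort(eigvals)[::-1][:k]           # k largest eigenvalues
    raw = Z @ eigvecs[:, top]                     # column c has variance eigvals[top][c]
    return raw / np.sqrt(eigvals[top])            # rescale each column to unit variance

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 5))   # synthetic correlated columns
scores = standardized_scores(X, 3)
print(np.round(scores.mean(axis=0), 6))
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```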
Variable Clustering Algorithm
We can use Median Income, % Kids Under 18, and % Black. Each of these 3 variables is representative of the other demographics in its cluster.
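The slide picks one variable to represent each variable cluster. A simplified way to make that choice is to keep the member with the highest average absolute correlation with the other members of its cluster. This is a stand-in, not necessarily the criterion your software uses. The variable names and values below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def most_representative(columns, cluster):
    """Member of `cluster` with the highest mean |correlation| with the other members."""
    if len(cluster) == 1:
        return cluster[0]
    def avg_abs_corr(name):
        others = [o for o in cluster if o != name]
        return sum(abs(pearson(columns[name], columns[o])) for o in others) / len(others)
    return max(cluster, key=avg_abs_corr)

# Hypothetical store-level values for one variable cluster.
columns = {
    "median_income": [1, 2, 3, 4, 5],
    "pct_college": [1, 3, 2, 4, 5],
    "home_value": [2, 1, 3, 5, 4],
}
print(most_representative(columns, ["median_income", "pct_college", "home_value"]))
# median_income
```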
Cluster Analysis
Segmentation of IRI Stores
Now we are interested in grouping rows (Stores in our case)
[Figure: data table with variables V1, V2, V3, V4, V5, …, V20 as columns. Cluster analysis groups subjects (rows); factor analysis groups variables (columns).]
Cluster Analysis
Cluster analysis is a technique used to identify groups of ‘similar’ customers in a market (i.e., market segmentation).
Cluster analysis encompasses a number of different algorithms and methods for grouping similar objects into categories.
General question: how do we organize observed data into meaningful structures?
• Examples:
o In food stores, items of a similar nature, such as different types of meat or vegetables, are displayed in the same or nearby locations.
o Biologists have to organize the different species of animals: man belongs to the primates, the mammals, the amniotes, the vertebrates, and the animals.
o In medicine, clustering diseases, cures for diseases, or symptoms of diseases can lead to very useful taxonomies.
o In psychiatry, the correct diagnosis of clusters of symptoms such as paranoia, schizophrenia, etc. is essential for successful therapy.
o Collaborative filtering & recommendation systems
Cluster Analysis
Cluster analysis works on the principle of maximizing the between-cluster variance while minimizing the within-cluster variance
Methods: Hierarchical & K-means Clustering
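For a fixed data set, maximizing the between-cluster variance and minimizing the within-cluster variance are the same goal. The total sum of squares splits exactly into a within-cluster part and a between-cluster part, so lowering one raises the other. Here is a small plain-Python check on made-up 2-D points:

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]

def ss_decomposition(points, labels):
    """Return (total, within, between) sums of squares for a clustering."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = between = 0.0
    for lab in set(labels):
        members = [p for p, lab_p in zip(points, labels) if lab_p == lab]
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)       # spread inside the cluster
        between += len(members) * sq_dist(c, grand)         # cluster mean vs overall mean
    return total, within, between

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # separates the two obvious groups
bad = [0, 1, 0, 1, 0, 1]    # mixes them
for labels in (good, bad):
    print([round(v, 2) for v in ss_decomposition(points, labels)])
```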
Clustering Methods
• Hierarchical clustering is an iterative process that starts with each observation in its own cluster. At each stage, the algorithm combines the two clusters that are closest together. At the final stage, all observations are in one cluster.
o Useful for small data sets; takes a long time for large tables.
• K-means clustering starts with a known number of clusters, k. The algorithm picks k cluster seed points, then assigns each observation to the cluster with the nearest seed. It then replaces the cluster seeds with the cluster means and repeats until the clusters stabilize.
o Works well with large data sets
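This is a minimal plain-Python sketch of the k-means loop (Lloyd's algorithm). The seeds are drawn at random from the data, with a fixed `seed` for repeatability. Real tools usually add smarter seeding and several restarts, which this sketch omits. In practice you would also standardize the clustering variables first so that no single variable dominates the distances.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centers) from Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]   # k seed points taken from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break                                        # clusters have stabilized
        labels = new_labels
        # Update step: replace each center with the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                                  # keep the old center if a cluster empties
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
```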
Hierarchical Clustering of Stores
Questions to Ask: Clustering based on what? How Many Segments?
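Here is a naive plain-Python sketch of agglomerative (hierarchical) clustering. It stops merging once `n_clusters` groups remain, which is the same as cutting the tree at a chosen number of segments, for example 5. We assume average linkage (the mean pairwise Euclidean distance between clusters) for simplicity. Ward's method is another common choice, and your software's default may differ. The repeated all-pairs search in this naive version shows why hierarchical clustering gets slow on large tables.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean Euclidean distance between members of clusters a and b (index lists)."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain; return labels."""
    clusters = [[i] for i in range(len(points))]      # every store starts in its own cluster
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points))
        clusters[i].extend(clusters[j])               # merge cluster j into cluster i (i < j)
        del clusters[j]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(points, 2))   # [0, 0, 0, 1, 1, 1]
```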
Exercise
• Conduct a hierarchical cluster analysis based on:
o Saved factor scores & market shares of brands
o To keep things manageable, let's use a 5-segment solution
• Interpret the clusters based on:
o Median Income, % Kids Under 18, % White, & Market Shares
o Which segment has the highest appeal for Progresso?
• Save the cluster membership and merge the file with the transaction data
• Redo the regression analysis and analyze the own- and cross-price elasticities in each segment
• Suggest an optimal pricing strategy for Progresso for each segment
• Discuss practical considerations in using such a segmentation/pricing scheme
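The last steps of the exercise can be sketched under assumptions that we state up front. Demand is log-log, so the slope on log price is the elasticity. The rival brand's price enters as a cross-price term. Competitors do not react, and marginal cost is constant. The tuple layout, `store_segment`, the marginal cost of 1.0 and the synthetic data are all hypothetical, and your course regression may use a different specification. The merge is a plain dictionary lookup that drops transactions from stores without a segment label. The pricing helper applies the constant-elasticity markup rule p* = c·ε/(1 + ε), which is valid only when ε < -1.

```python
import numpy as np

def segment_elasticities(transactions, store_segment):
    """Fit log(units) = a + b_own*log(own_price) + b_cross*log(rival_price)
    separately for each segment; return {segment: (b_own, b_cross)}.

    transactions: iterable of (store_id, units, own_price, rival_price)
    store_segment: dict store_id -> saved cluster membership
    """
    rows = {}
    for store, units, p_own, p_rival in transactions:
        seg = store_segment.get(store)
        if seg is None:
            continue                          # unmatched store: dropped, like an inner join
        rows.setdefault(seg, []).append((units, p_own, p_rival))
    result = {}
    for seg, data in rows.items():
        units, p_own, p_rival = (np.array(col, dtype=float) for col in zip(*data))
        X = np.column_stack([np.ones(len(units)), np.log(p_own), np.log(p_rival)])
        coef, *_ = np.linalg.lstsq(X, np.log(units), rcond=None)
        result[seg] = (coef[1], coef[2])      # own- and cross-price elasticities
    return result

def optimal_price(marginal_cost, own_elasticity):
    """Constant-elasticity markup rule p* = c * e / (1 + e); requires e < -1."""
    if own_elasticity >= -1:
        raise ValueError("no interior optimum unless demand is elastic (e < -1)")
    return marginal_cost * own_elasticity / (1 + own_elasticity)

# Synthetic example: two hypothetical segments with different true elasticities.
rng = np.random.default_rng(3)
store_segment = {s: ("A" if s < 50 else "B") for s in range(100)}
truth = {"A": (-3.0, 0.5), "B": (-2.0, 1.0)}
transactions = []
for store, seg in store_segment.items():
    e_own, e_cross = truth[seg]
    for _ in range(20):
        p_own, p_rival = rng.uniform(1.5, 2.5, size=2)
        log_q = 4.0 + e_own * np.log(p_own) + e_cross * np.log(p_rival) + rng.normal(0, 0.05)
        transactions.append((store, np.exp(log_q), p_own, p_rival))

est = segment_elasticities(transactions, store_segment)
for seg, (b_own, b_cross) in sorted(est.items()):
    print(seg, round(b_own, 2), round(b_cross, 2), round(optimal_price(1.0, b_own), 2))
```

Estimating each segment separately lets the elasticities differ by segment. Under this rule, the less elastic segment gets the higher markup: ε = -2 implies p* = 2c, while ε = -3 implies p* = 1.5c.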

Uploaded by veesingh
