Wholesale Customer Data
Clustering
By - Divya Ganjoo
Data
https://archive.ics.uci.edu/ml/datasets/Wholesale+customers#
Attribute Information:
1) FRESH: annual spending (m.u.) on fresh products (Continuous);
2) MILK: annual spending (m.u.) on milk products (Continuous);
3) GROCERY: annual spending (m.u.) on grocery products (Continuous);
4) FROZEN: annual spending (m.u.) on frozen products (Continuous);
5) DETERGENTS_PAPER: annual spending (m.u.) on detergents and paper products (Continuous);
6) DELICATESSEN: annual spending (m.u.) on delicatessen products (Continuous);
7) CHANNEL: customers’ Channel - Horeca (Hotel/Restaurant/Café) or Retail channel (Nominal);
8) REGION: customers’ Region - Lisbon, Oporto or Other (Nominal)
Exploring data
● Exploring the data shows that spending varies more across channels than across regions (example graph above). Similar patterns emerge for the other variables.
● Strong correlations are found between Grocery & Det_Paper (0.92), Milk & Grocery (0.73) and Milk & Det_Paper (0.66)
● Graphs such as the one above suggest roughly 3 clusters
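The pairwise correlations quoted above are Pearson coefficients in the usual sense. A minimal sketch of the calculation, assuming plain Python lists; the numbers below are made-up illustrative values, not rows from the UCI data set, and the variable names are hypothetical:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up annual spending figures (assumption, for illustration only).
grocery = [1000, 2000, 3000, 4000, 5000]
det_paper = [300, 700, 800, 1500, 1700]

r = pearson(grocery, det_paper)
print(f"Grocery vs Det_Paper: r = {r:.2f}")
```

A value close to 1 means the two categories rise and fall together across customers, as reported for Grocery and Det_Paper.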
Hierarchical
● Different hierarchical clustering dendrograms roughly split the data into 3 or 4 groups
● Some observations are separated out consistently (even though it's hard to read here)
● The dendrograms above are based on Milk and Grocery, and on Milk and Det_Paper
Fig: Dendrogram: Milk and Grocery
Fig: Dendrogram: Milk and Detergent/Paper
Fig: Heatmap: Milk and Grocery
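To make the idea behind the dendrograms concrete, here is a minimal sketch of agglomerative (bottom-up) clustering on two variables. The slides do not state the linkage or distance used, so complete linkage with Euclidean distance is an assumption, and the points are made-up (Milk, Grocery) pairs rather than real observations:

```python
import math

def complete_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the largest pairwise point distance (complete linkage).
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

# Made-up (Milk, Grocery) pairs in arbitrary units (assumption).
points = [(1, 1), (1.5, 1.2), (1.2, 0.8),
          (8, 9), (8.5, 9.5), (9, 8.7),
          (1, 9), (1.3, 9.4)]

groups = complete_linkage(points, 3)
print(groups)
```

Cutting a dendrogram at a given height corresponds to stopping the merges at a given number of clusters, which is what the `n_clusters` argument does here.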
K-means
● 3 appears to be the best number of clusters, as seen from the plot of within-cluster sum of squares (Within SS) against the number of clusters
● Running kmeans with 3 clusters, we get the following centers >>>
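The Within SS plot can be reproduced in spirit with a small k-means loop. This is a sketch under stated assumptions, not the code behind the slides: it uses Lloyd's algorithm with a deterministic farthest-point start (a choice made here so the result is repeatable) and made-up two-dimensional points.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def within_ss(points, centers, labels):
    """Total squared distance from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Made-up points forming three visible groups (assumption).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 9)]

wss = {k: within_ss(points, *kmeans(points, k)) for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 2))
```

The "elbow" is the value of k after which Within SS stops dropping sharply; on these toy points that happens at k = 3.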
Conclusion
Cluster 1: High Milk, Grocery, Det_Paper
Cluster 2: High Fresh, Frozen, Deli
Cluster 3: Low spenders

            Channel 1   Channel 2
Cluster 1        2          48
Cluster 2       52           8
Cluster 3      244          86

Cluster 1: High spenders in the Retail channel (Channel 2) tend to spend on the Grocery, Milk and Det_Paper categories
Cluster 2: High spenders in the Horeca channel (Channel 1) tend to spend more on the Fresh product category
Cluster 3: Low spenders
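The link between clusters and channels is easier to read as row shares. A minimal sketch using the counts from the cluster-by-channel table above; the dictionary layout and function name are hypothetical:

```python
# Counts copied from the cluster-by-channel table in the conclusion.
counts = {
    "Cluster 1": {"Channel 1": 2, "Channel 2": 48},
    "Cluster 2": {"Channel 1": 52, "Channel 2": 8},
    "Cluster 3": {"Channel 1": 244, "Channel 2": 86},
}

def row_shares(table):
    """Convert each row of counts into fractions that sum to 1."""
    shares = {}
    for row, cols in table.items():
        total = sum(cols.values())
        shares[row] = {col: n / total for col, n in cols.items()}
    return shares

shares = row_shares(counts)
for cluster, cols in shares.items():
    print(cluster, {channel: f"{share:.1%}" for channel, share in cols.items()})

# Total number of customers covered by the table.
n_customers = sum(n for cols in counts.values() for n in cols.values())
print("Customers:", n_customers)
```

Cluster 1 is 96% Retail (48 of 50) and Cluster 2 is about 87% Horeca (52 of 60), which supports the channel-based reading of the two high-spending clusters.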
Appendix
● Observation: If we run Kmeans on Grocery, Det_Paper and Milk, we can capture 97% of the variability in the data
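One common way to quantify "variability captured" by a clustering is the ratio of between-cluster sum of squares to total sum of squares. Whether the 97% figure was computed this way is an assumption; the sketch below only shows how such a ratio is obtained, using made-up points and labels:

```python
def explained_share(points, labels):
    """Return between-cluster SS divided by total SS for a given labelling."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def ss(rows, center):
        # Sum of squared deviations of every coordinate from the center.
        return sum((v - c) ** 2 for row in rows for v, c in zip(row, center))

    total = ss(points, centroid(points))
    within = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        within += ss(members, centroid(members))
    # Between SS is whatever part of the total is not left inside clusters.
    return (total - within) / total

# Made-up points and cluster labels (assumption, not the real data).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 9)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

share = explained_share(points, labels)
print(f"Between SS / Total SS = {share:.1%}")
```

A ratio near 100% means the points sit close to their cluster centers relative to the overall spread of the data.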
